Age | Commit message (Collapse) | Author |
|
[Why & How]
Fix a typo for dcn315 line buffer bpp.
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
[Why]
After gpu-reset, sometimes the driver fails to enable vblank irq,
causing flip_done timed out and the desktop freezed.
During gpu-reset, we disable and enable vblank irq in dm_suspend() and
dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we check
irqs' refcount and decide to enable or disable the irqs again.
However, we have 2 sets of API for controling vblank irq, one is
dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has
its own refcount and flag to store the state of vblank irq, and they
are not synchronized.
In drm we use the first API to control vblank irq but in
amdgpu_irq_gpu_reset_resume_helper() we use the second set of API.
The failure happens when vblank irq was enabled by dm_vblank_get()
before gpu-reset, we have vblank->enabled true. However, during
gpu-reset, in amdgpu_irq_gpu_reset_resume_helper() vblank irq's state
checked from amdgpu_irq_update() is DISABLED. So finally it disables
vblank irq again. After gpu-reset, if there is a cursor plane commit,
the driver will try to enable vblank irq by calling drm_vblank_enable(),
but the vblank->enabled is still true, so it fails to turn on vblank
irq and causes flip_done can't be completed in vblank irq handler and
desktop become freezed.
[How]
Combining the 2 vblank control APIs by letting drm's API finally calls
amdgpu_irq's API, so the irq's refcount and state of both APIs can be
synchronized. Also add a check to prevent refcount from being less then
0 in amdgpu_irq_put().
v2:
- Add warning in amdgpu_irq_enable() if the irq is already disabled.
- Call dc_interrupt_set() in dm_set_vblank() to avoid refcount change
if it is in gpu-reset.
v3:
- Improve commit message and code comments.
Signed-off-by: Alan Liu <HaoPing.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
It is preferred to use typed property access functions (i.e.
of_property_read_<type> functions) rather than low-level
of_get_property()/of_find_property() functions for reading properties. As
part of this, convert of_get_property()/of_find_property() calls to the
recently added of_property_present() helper when we just want to test for
presence of a property and nothing more.
Link: https://lore.kernel.org/r/20230310144719.1544443-1-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> # pcie-mediatek
|
|
smatch reports
drivers/gpu/drm/amd/amdgpu/../display/dc/dcn314/dcn314_dccg.c:277:6: warning: symbol
'dccg314_init' was not declared. Should it be static?
This variable is only used in one file so should be static.
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This commit replaces spaces with tabs in multiple functions and adjusts
the indentation in some other parts of the code to improve readability.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When using FPO, there is some misconfiguration that happens for the lack
of configuration of the MCLK switch in some circumstances. This commit
adds the required field update when using the MCLK switch.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
All clock values came from firmware, but bounding box values can be
helpful in some debug situations. This commit updates some of the values
associated with clock speed and memory channels.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[WHY]
Function "amdgpu_irq_update()" called by "amdgpu_device_ip_late_init()" is an atomic context.
We shouldn't access registers through KIQ since "msleep()" may be called in "amdgpu_kiq_rreg()".
[HOW]
Move function "amdgpu_virt_release_full_gpu()" after function "amdgpu_device_ip_late_init()",
to ensure that registers be accessed through RLCG instead of KIQ.
Call Trace:
<TASK>
show_stack+0x52/0x69
dump_stack_lvl+0x49/0x6d
dump_stack+0x10/0x18
__schedule_bug.cold+0x4f/0x6b
__schedule+0x473/0x5d0
? __wake_up_klogd.part.0+0x40/0x70
? vprintk_emit+0xbe/0x1f0
schedule+0x68/0x110
schedule_timeout+0x87/0x160
? timer_migration_handler+0xa0/0xa0
msleep+0x2d/0x50
amdgpu_kiq_rreg+0x18d/0x1f0 [amdgpu]
amdgpu_device_rreg.part.0+0x59/0xd0 [amdgpu]
amdgpu_device_rreg+0x3a/0x50 [amdgpu]
amdgpu_sriov_rreg+0x3c/0xb0 [amdgpu]
gfx_v10_0_set_gfx_eop_interrupt_state.constprop.0+0x16c/0x190 [amdgpu]
gfx_v10_0_set_eop_interrupt_state+0xa5/0xb0 [amdgpu]
amdgpu_irq_update+0x53/0x80 [amdgpu]
amdgpu_irq_get+0x7c/0xb0 [amdgpu]
amdgpu_fence_driver_hw_init+0x58/0x90 [amdgpu]
amdgpu_device_init.cold+0x16b7/0x2022 [amdgpu]
Signed-off-by: Chong Li <chongli2@amd.com>
Reviewed-by: JingWen.Chen2@amd.com
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
gcc with W=1 reports
In file included from drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0.c:36:
./drivers/gpu/drm/amd/amdgpu/../pm/swsmu/inc/smu_v13_0.h:66:18: error:
‘pmfw_decoded_link_width’ defined but not used [-Werror=unused-const-variable=]
66 | static const int pmfw_decoded_link_width[7] = {0, 1, 2, 4, 8, 12, 16};
| ^~~~~~~~~~~~~~~~~~~~~~~
./drivers/gpu/drm/amd/amdgpu/../pm/swsmu/inc/smu_v13_0.h:65:18: error:
‘pmfw_decoded_link_speed’ defined but not used [-Werror=unused-const-variable=]
65 | static const int pmfw_decoded_link_speed[5] = {1, 2, 3, 4, 5};
| ^~~~~~~~~~~~~~~~~~~~~~~
These variables are defined and used in smu_v13_0_7_ppt.c and smu_v13_0_0_ppt.c.
There should be only one definition. So define the variables as globals
in smu_v13_0.c
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
add jpeg table size to ctx table size rather than override it
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: JingWen Chen <JingWen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-rockchip
Pull a couple Rockchip clk driver updates from Heiko Stübner:
Reparenting fix for the clock supplying camera modules on the rk3399
and more critical (bus-)clocks on the rk3588.
* tag 'v6.4-rockchip-clk1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
clk: rockchip: rk3588: make gate linked clocks critical
clk: rockchip: rk3399: allow clk_cifout to force clk_cifout_src to reparent
|
|
Andrii Nakryiko says:
====================
This patch set moves bpf_for(), bpf_for_each(), and bpf_repeat() macros from
selftests-internal bpf_misc.h header to libbpf-provided bpf_helpers.h header.
To do this in a way to allow users to feature-detect and guard such
bpf_for()/bpf_for_each() uses on old kernels we also extend libbpf to improve
unresolved kfunc calls handling and reporting. This lets us mark
bpf_iter_num_{new,next,destroy}() declarations as __weak, and thus not fail
program loading outright if such kfuncs are missing on the host kernel.
Patches #1 and #2 do some simple clean ups and logging improvements. Patch #3
adds kfunc call poisoning and log fixup logic and is the hear of this patch
set, effectively. Patch #4 adds selftest for this logic. Patches #4 and #5
move bpf_for()/bpf_for_each()/bpf_repeat() into bpf_helpers.h header and mark
kfuncs as __weak to allow users to feature-detect and guard their uses.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Mark bpf_iter_num_{new,next,destroy}() kfuncs declared for
bpf_for()/bpf_repeat() macros as __weak to allow users to feature-detect
their presence and guard bpf_for()/bpf_repeat() loops accordingly for
backwards compatibility with old kernels.
Now that libbpf supports kfunc calls poisoning and better reporting of
unresolved (but called) kfuncs, declaring number iterator kfuncs in
bpf_helpers.h won't degrade user experience and won't cause unnecessary
kernel feature dependencies.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
To make it easier for bleeding-edge BPF applications, such as sched_ext,
to utilize open-coded iterators, move bpf_for(), bpf_for_each(), and
bpf_repeat() macros from selftests/bpf-internal bpf_misc.h helper, to
libbpf-provided bpf_helpers.h header.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add test validating that libbpf correctly poisons and reports __weak
unresolved kfuncs in post-processed verifier log.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, libbpf leaves `call #0` instruction for __weak unresolved
kfuncs, which might lead to a confusing verifier log situations, where
invalid `call #0` will be treated as successfully validated.
We can do better. Libbpf already has an established mechanism of
poisoning instructions that failed some form of resolution (e.g., CO-RE
relocation and BPF map set to not be auto-created). Libbpf doesn't fail
them outright to allow users to guard them through other means, and as
long as BPF verifier can prove that such poisoned instructions cannot be
ever reached, this doesn't consistute an invalid BPF program. If user
didn't guard such code, libbpf will extract few pieces of information to
tie such poisoned instructions back to additional information about what
entitity wasn't resolved (e.g., BPF map name, or CO-RE relocation
information).
__weak unresolved kfuncs fit this model well, so this patch extends
libbpf with poisioning and log fixup logic for kfunc calls.
Note, this poisoning is done only for kfunc *calls*, not kfunc address
resolution (ldimm64 instructions). The former cannot be ever valid, if
reached, so it's safe to poison them. The latter is a valid mechanism to
check if __weak kfunc ksym was resolved, and do necessary guarding and
work arounds based on this result, supported in most recent kernels. As
such, libbpf keeps such ldimm64 instructions as loading zero, never
poisoning them.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently libbpf always reports "kernel" as a source of ksym BTF type,
which is ambiguous given ksym's BTF can come from either vmlinux or
kernel module BTFs. Make this explicit and log module name, if used BTF
is from kernel module.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Normalize internal constants, field names, and comments related to log
fixup. Also add explicit `ext_idx` alias for relocation where relocation
is pointing to extern description for additional information.
No functional changes, just a clean up before subsequent additions.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This was caught by randconfig builds but does not show up in
build testing without CONFIG_MODULE_DECOMPRESS:
kernel/module/stats.c: In function 'mod_stat_bump_invalid':
kernel/module/stats.c:229:42: error: 'invalid_mod_byte' undeclared (first use in this function); did you mean 'invalid_mod_bytes'?
229 | atomic_long_add(info->compressed_len, &invalid_mod_byte);
| ^~~~~~~~~~~~~~~~
| invalid_mod_bytes
Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
clang build reports
kernel/module/stats.c:307:34: error: variable
'len' is uninitialized when used here [-Werror,-Wuninitialized]
len = scnprintf(buf + 0, size - len,
^~~
At the start of this sequence, neither the '+ 0', nor the '- len' are needed.
So remove them and fix using 'len' uninitalized.
Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure")
Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
The new module statistics code mixes 64-bit types and wordsized 'long'
variables, which leads to build failures on 32-bit architectures:
kernel/module/stats.c: In function 'read_file_mod_stats':
kernel/module/stats.c:291:29: error: passing argument 1 of 'atomic64_read' from incompatible pointer type [-Werror=incompatible-pointer-types]
291 | total_size = atomic64_read(&total_mod_size);
x86_64-linux-ld: kernel/module/stats.o: in function `read_file_mod_stats':
stats.c:(.text+0x2b2): undefined reference to `__udivdi3'
To fix this, the code has to use one of the two types consistently.
Change them all to word-size types here.
Fixes: df3e764d8e5c ("module: add debug stats to help identify memory pressure")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
MODULE_INIT_COMPRESSED_FILE is defined in the uapi header, which
is not included indirectly from the normal linux/module.h, but
has to be pulled in explicitly:
kernel/module/stats.c: In function 'mod_stat_bump_invalid':
kernel/module/stats.c:227:14: error: 'MODULE_INIT_COMPRESSED_FILE' undeclared (first use in this function)
227 | if (flags & MODULE_INIT_COMPRESSED_FILE)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
The finit_module() system call can create unnecessary virtual memory
pressure for duplicate modules. This is because load_module() can in
the worse case allocate more than twice the size of a module in virtual
memory. This saves at least a full size of the module in wasted vmalloc
space memory by trying to avoid duplicates as soon as we can validate
the module name in the read module structure.
This can only be an issue if a system is getting hammered with userspace
loading modules. There are two ways to load modules typically on systems,
one is the kernel moduile auto-loading (*request_module*() calls in-kernel)
and the other is things like udev. The auto-loading is in-kernel, but that
pings back to userspace to just call modprobe. We already have a way to
restrict the amount of concurrent kernel auto-loads in a given time, however
that still allows multiple requests for the same module to go through
and force two threads in userspace racing to call modprobe for the same
exact module. Even though libkmod which both modprobe and udev does check
if a module is already loaded prior calling finit_module() races are
still possible and this is clearly evident today when you have multiple
CPUs.
To avoid memory pressure for such stupid cases put a stop gap for them.
The *earliest* we can detect duplicates from the modules side of things
is once we have blessed the module name, sadly after the first vmalloc
allocation. We can check for the module being present *before* a secondary
vmalloc() allocation.
There is a linear relationship between wasted virtual memory bytes and
the number of CPU counts. The reason is that udev ends up racing to call
tons of the same modules for each of the CPUs.
We can see the different linear relationships between wasted virtual
memory and CPU count during after boot in the following graph:
+----------------------------------------------------------------------------+
14GB |-+ + + + + *+ +-|
| **** |
| *** |
| ** |
12GB |-+ ** +-|
| ** |
| ** |
| ** |
| ** |
10GB |-+ ** +-|
| ** |
| ** |
| ** |
8GB |-+ ** +-|
waste | ** ### |
| ** #### |
| ** ####### |
6GB |-+ **** #### +-|
| * #### |
| * #### |
| ***** #### |
4GB |-+ ** #### +-|
| ** #### |
| ** #### |
| ** #### |
2GB |-+ ** ##### +-|
| * #### |
| * #### Before ******* |
| **## + + + + After ####### |
+----------------------------------------------------------------------------+
0 50 100 150 200 250 300
CPUs count
On the y-axis we can see gigabytes of wasted virtual memory during boot
due to duplicate module requests which just end up failing. Trying to
infer the slope this ends up being about ~463 MiB per CPU lost prior
to this patch. After this patch we only loose about ~230 MiB per CPU, for
a total savings of about ~233 MiB per CPU. This is all *just on bootup*!
On a 8vcpu 8 GiB RAM system using kdevops and testing against selftests
kmod.sh -t 0008 I see a saving in the *highest* side of memory
consumption of up to ~ 84 MiB with the Linux kernel selftests kmod
test 0008. With the new stress-ng module test I see a 145 MiB difference
in max memory consumption with 100 ops. The stress-ng module ops tests can be
pretty pathalogical -- it is not realistic, however it was used to
finally successfully reproduce issues which are only reported to happen on
system with over 400 CPUs [0] by just usign 100 ops on a 8vcpu 8 GiB RAM
system. Running out of virtual memory space is no surprise given the
above graph, since at least on x86_64 we're capped at 128 MiB, eventually
we'd hit a series of errors and once can use the above graph to
guestimate when. This of course will vary depending on the features
you have enabled. So for instance, enabling KASAN seems to make this
much worse.
The results with kmod and stress-ng can be observed and visualized below.
The time it takes to run the test is also not affected.
The kmod tests 0008:
The gnuplot is set to a range from 400000 KiB (390 Mib) - 580000 (566 Mib)
given the tests peak around that range.
cat kmod.plot
set term dumb
set output fileout
set yrange [400000:580000]
plot filein with linespoints title "Memory usage (KiB)"
Before:
root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-before.txt ^C
root@kmod ~ # sort -n -r log-0008-before.txt | head -1
528732
So ~516.33 MiB
After:
root@kmod ~ # /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > log-0008-after.txt ^C
root@kmod ~ # sort -n -r log-0008-after.txt | head -1
442516
So ~432.14 MiB
That's about 84 ~MiB in savings in the worst case. The graphs:
root@kmod ~ # gnuplot -e "filein='log-0008-before.txt'; fileout='graph-0008-before.txt'" kmod.plot
root@kmod ~ # gnuplot -e "filein='log-0008-after.txt'; fileout='graph-0008-after.txt'" kmod.plot
root@kmod ~ # cat graph-0008-before.txt
580000 +-----------------------------------------------------------------+
| + + + + + + + |
560000 |-+ Memory usage (KiB) ***A***-|
| |
540000 |-+ +-|
| |
| *A *AA*AA*A*AA *A*AA A*A*A *AA*A*AA*A A |
520000 |-+A*A*AA *AA*A *A*AA*A*AA *A*A A *A+-|
|*A |
500000 |-+ +-|
| |
480000 |-+ +-|
| |
460000 |-+ +-|
| |
| |
440000 |-+ +-|
| |
420000 |-+ +-|
| + + + + + + + |
400000 +-----------------------------------------------------------------+
0 5 10 15 20 25 30 35 40
root@kmod ~ # cat graph-0008-after.txt
580000 +-----------------------------------------------------------------+
| + + + + + + + |
560000 |-+ Memory usage (KiB) ***A***-|
| |
540000 |-+ +-|
| |
| |
520000 |-+ +-|
| |
500000 |-+ +-|
| |
480000 |-+ +-|
| |
460000 |-+ +-|
| |
| *A *A*A |
440000 |-+A*A*AA*A A A*A*AA A*A*AA*A*AA*A*AA*A*AA*AA*A*AA*A*AA-|
|*A *A*AA*A |
420000 |-+ +-|
| + + + + + + + |
400000 +-----------------------------------------------------------------+
0 5 10 15 20 25 30 35 40
The stress-ng module tests:
This is used to run the test to try to reproduce the vmap issues
reported by David:
echo 0 > /proc/sys/vm/oom_dump_tasks
./stress-ng --module 100 --module-name xfs
Prior to this commit:
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > baseline-stress-ng.txt
root@kmod ~ # sort -n -r baseline-stress-ng.txt | head -1
5046456
After this commit:
root@kmod ~ # free -k -s 1 -c 40 | grep Mem | awk '{print $3}' > after-stress-ng.txt
root@kmod ~ # sort -n -r after-stress-ng.txt | head -1
4896972
5046456 - 4896972
149484
149484/1024
145.98046875000000000000
So this commit using stress-ng reveals saving about 145 MiB in memory
using 100 ops from stress-ng which reproduced the vmap issue reported.
cat kmod.plot
set term dumb
set output fileout
set yrange [4700000:5070000]
plot filein with linespoints title "Memory usage (KiB)"
root@kmod ~ # gnuplot -e "filein='baseline-stress-ng.txt'; fileout='graph-stress-ng-before.txt'" kmod-simple-stress-ng.plot
root@kmod ~ # gnuplot -e "filein='after-stress-ng.txt'; fileout='graph-stress-ng-after.txt'" kmod-simple-stress-ng.plot
root@kmod ~ # cat graph-stress-ng-before.txt
+---------------------------------------------------------------+
5.05e+06 |-+ + A + + + + + + +-|
| * Memory usage (KiB) ***A*** |
| * A |
5e+06 |-+ ** ** +-|
| ** * * A |
4.95e+06 |-+ * * A * A* +-|
| * * A A * * * * A |
| * * * * * * *A * * * A * |
4.9e+06 |-+ * * * A*A * A*AA*A A *A **A **A*A *+-|
| A A*A A * A * * A A * A * ** |
| * ** ** * * * * * * * |
4.85e+06 |-+ A A A ** * * ** *-|
| * * * * ** * |
| * A * * * * |
4.8e+06 |-+ * * * A A-|
| * * * |
4.75e+06 |-+ * * * +-|
| * ** |
| * + + + + + + ** + |
4.7e+06 +---------------------------------------------------------------+
0 5 10 15 20 25 30 35 40
root@kmod ~ # cat graph-stress-ng-after.txt
+---------------------------------------------------------------+
5.05e+06 |-+ + + + + + + + +-|
| Memory usage (KiB) ***A*** |
| |
5e+06 |-+ +-|
| |
4.95e+06 |-+ +-|
| |
| |
4.9e+06 |-+ *AA +-|
| A*AA*A*A A A*AA*AA*A*AA*A A A A*A *AA*A*A A A*AA*AA |
| * * ** * * * ** * *** * |
4.85e+06 |-+* *** * * * * *** A * * +-|
| * A * * ** * * A * * |
| * * * * ** * * |
4.8e+06 |-+* * * A * * * +-|
| * * * A * * |
4.75e+06 |-* * * * * +-|
| * * * * * |
| * + * *+ + + + + * *+ |
4.7e+06 +---------------------------------------------------------------+
0 5 10 15 20 25 30 35 40
[0] https://lkml.kernel.org/r/20221013180518.217405-1-david@redhat.com
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
Loading modules with finit_module() can end up using vmalloc(), vmap()
and vmalloc() again, for a total of up to 3 separate allocations in the
worst case for a single module. We always kernel_read*() the module,
that's a vmalloc(). Then vmap() is used for the module decompression,
and if so the last read buffer is freed as we use the now decompressed
module buffer to stuff data into our copy module. The last allocation is
specific to each architectures but pretty much that's generally a series
of vmalloc() calls or a variation of vmalloc to handle ELF sections with
special permissions.
Evaluation with new stress-ng module support [1] with just 100 ops
is proving that you can end up using GiBs of data easily even with all
care we have in the kernel and userspace today in trying to not load modules
which are already loaded. 100 ops seems to resemble the sort of pressure a
system with about 400 CPUs can create on module loading. Although issues
relating to duplicate module requests due to each CPU inucurring a new
module reuest is silly and some of these are being fixed, we currently lack
proper tooling to help diagnose easily what happened, when it happened
and who likely is to blame -- userspace or kernel module autoloading.
Provide an initial set of stats which use debugfs to let us easily scrape
post-boot information about failed loads. This sort of information can
be used on production worklaods to try to optimize *avoiding* redundant
memory pressure using finit_module().
There's a few examples that can be provided:
A 255 vCPU system without the next patch in this series applied:
Startup finished in 19.143s (kernel) + 7.078s (userspace) = 26.221s
graphical.target reached after 6.988s in userspace
And 13.58 GiB of virtual memory space lost due to failed module loading:
root@big ~ # cat /sys/kernel/debug/modules/stats
Mods ever loaded 67
Mods failed on kread 0
Mods failed on decompress 0
Mods failed on becoming 0
Mods failed on load 1411
Total module size 11464704
Total mod text size 4194304
Failed kread bytes 0
Failed decompress bytes 0
Failed becoming bytes 0
Failed kmod bytes 14588526272
Virtual mem wasted bytes 14588526272
Average mod size 171115
Average mod text size 62602
Average fail load bytes 10339140
Duplicate failed modules:
module-name How-many-times Reason
kvm_intel 249 Load
kvm 249 Load
irqbypass 8 Load
crct10dif_pclmul 128 Load
ghash_clmulni_intel 27 Load
sha512_ssse3 50 Load
sha512_generic 200 Load
aesni_intel 249 Load
crypto_simd 41 Load
cryptd 131 Load
evdev 2 Load
serio_raw 1 Load
virtio_pci 3 Load
nvme 3 Load
nvme_core 3 Load
virtio_pci_legacy_dev 3 Load
virtio_pci_modern_dev 3 Load
t10_pi 3 Load
virtio 3 Load
crc32_pclmul 6 Load
crc64_rocksoft 3 Load
crc32c_intel 40 Load
virtio_ring 3 Load
crc64 3 Load
The following screen shot, of a simple 8vcpu 8 GiB KVM guest with the
next patch in this series applied, shows 226.53 MiB are wasted in virtual
memory allocations which due to duplicate module requests during boot.
It also shows an average module memory size of 167.10 KiB and an an
average module .text + .init.text size of 61.13 KiB. The end shows all
modules which were detected as duplicate requests and whether or not
they failed early after just the first kernel_read*() call or late after
we've already allocated the private space for the module in
layout_and_allocate(). A system with module decompression would reveal
more wasted virtual memory space.
We should put effort now into identifying the source of these duplicate
module requests and trimming these down as much possible. Larger systems
will obviously show much more wasted virtual memory allocations.
root@kmod ~ # cat /sys/kernel/debug/modules/stats
Mods ever loaded 67
Mods failed on kread 0
Mods failed on decompress 0
Mods failed on becoming 83
Mods failed on load 16
Total module size 11464704
Total mod text size 4194304
Failed kread bytes 0
Failed decompress bytes 0
Failed becoming bytes 228959096
Failed kmod bytes 8578080
Virtual mem wasted bytes 237537176
Average mod size 171115
Average mod text size 62602
Avg fail becoming bytes 2758544
Average fail load bytes 536130
Duplicate failed modules:
module-name How-many-times Reason
kvm_intel 7 Becoming
kvm 7 Becoming
irqbypass 6 Becoming & Load
crct10dif_pclmul 7 Becoming & Load
ghash_clmulni_intel 7 Becoming & Load
sha512_ssse3 6 Becoming & Load
sha512_generic 7 Becoming & Load
aesni_intel 7 Becoming
crypto_simd 7 Becoming & Load
cryptd 3 Becoming & Load
evdev 1 Becoming
serio_raw 1 Becoming
nvme 3 Becoming
nvme_core 3 Becoming
t10_pi 3 Becoming
virtio_pci 3 Becoming
crc32_pclmul 6 Becoming & Load
crc64_rocksoft 3 Becoming
crc32c_intel 3 Becoming
virtio_pci_modern_dev 2 Becoming
virtio_pci_legacy_dev 1 Becoming
crc64 2 Becoming
virtio 2 Becoming
virtio_ring 2 Becoming
[0] https://github.com/ColinIanKing/stress-ng.git
[1] echo 0 > /proc/sys/vm/oom_dump_tasks
./stress-ng --module 100 --module-name xfs
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
The patient module check inside add_unformed_module() is large
enough as we need it. It is a bit hard to read too, so just
move it to a helper and do the inverse checks first to help
shift the code and make it easier to read. The new helper then
is module_patient_check_exists().
To make this work we need to mvoe the finished_loading() up,
we do that without making any functional changes to that routine.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
Simplify the concurrency delimiter we use for kmod with the semaphore.
I had used the kmod strategy to try to implement a similar concurrency
delimiter for the kernel_read*() calls from the finit_module() path
so to reduce vmalloc() memory pressure. That effort didn't provide yet
conclusive results, but one thing that became clear is we can use
the suggested alternative solution with semaphores which Linus hinted
at instead of using the atomic / wait strategy.
I've stress tested this with kmod test 0008:
time /data/linux-next/tools/testing/selftests/kmod/kmod.sh -t 0008
And I get only a *slight* delay. That delay however is small, a few
seconds for a full test loop run that runs 150 times, for about ~30-40
seconds. The small delay is worth the simplfication IMHO.
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
Fundamentally semaphores are a counted primitive, but
DEFINE_SEMAPHORE() does not expose this and explicitly creates a
binary semaphore.
Change DEFINE_SEMAPHORE() to take a number argument and use that in the
few places that open-coded it using __SEMAPHORE_INITIALIZER().
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[mcgrof: add some tribal knowledge about why some folks prefer
binary sempahores over mutexes]
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
|
|
flag
For veth pairs, NETDEV_XDP_ACT_NDO_XMIT is supported by the current
device if the peer one is running a XDP program or if it has GRO enabled.
Fix the xdp_features flags reporting considering peer device and not
current one for NETDEV_XDP_ACT_NDO_XMIT.
Fixes: fccca038f300 ("veth: take into account device reconfiguration for xdp_features flag")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/4f1ca6f6f6b42ae125bfdb5c7782217c83968b2e.1681767806.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>:
These patches make some small changes to align with the B0
silicon revision.
|
|
Add a compatible for the Power Domain Controller on SA8775p platforms.
Increase the number of PDC pin mappings.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230417094635.302344-1-brgl@bgdev.pl
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Remove the stih415 and stih416 reset dt-bindings since those
two platforms are no more supported.
Signed-off-by: Alain Volmat <avolmat@me.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230416200442.61554-1-avolmat@me.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Remove compatible for stih415/stih416 and stid127 which are
no more supported.
Signed-off-by: Alain Volmat <avolmat@me.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Link: https://lore.kernel.org/r/20230416195857.61284-1-avolmat@me.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Remove bindings for the stih415/stih416/stid127 since they are
not supported within the kernel anymore.
Signed-off-by: Alain Volmat <avolmat@me.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230416190950.18929-1-avolmat@me.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Jonathan notes that cxl_cdat_get_length() and cxl_cdat_read_table()
allocate 32 dwords for the DOE response even though it may be smaller.
In the case of cxl_cdat_get_length(), only the second dword of the
response is of interest (it contains the length). So reduce the
allocation to 2 dwords and let DOE discard the remainder.
In the case of cxl_cdat_read_table(), a correctly sized allocation for
the full CDAT already exists. Let DOE write each table entry directly
into that allocation. There's a snag in that the table entry is
preceded by a Table Access Response Header (1 dword, CXL 3.0 table 8-14).
Save the last dword of the previous table entry, let DOE overwrite it
with the header of the next entry and restore it afterwards.
The resulting CDAT is preceded by 4 unavoidable useless bytes. Increase
the allocation size accordingly.
The buffer overflow check in cxl_cdat_read_table() becomes unnecessary
because the remaining bytes in the allocation are tracked in "length",
which is passed to DOE and limits how many bytes it writes to the
allocation. Additionally, cxl_cdat_read_table() bails out if the DOE
response is truncated due to insufficient space.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/7a4e1f86958a79a70f29b96a92199522f08f8322.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The cdat.table and cdat.length fields in struct cxl_port are set before
CDAT retrieval and must therefore be unset on failure.
Simplify by setting only on success.
Suggested-by: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
Link: https://lore.kernel.org/linux-cxl/20230209113422.00007017@Huawei.com/
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[lukas: rebase and rephrase commit message]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/7a5c7104fb6a3016dbaec1c5d0ed34619ea11a0c.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
An upcoming user of DOE is CMA (Component Measurement and Authentication,
PCIe r6.0 sec 6.31).
It builds on SPDM (Security Protocol and Data Model):
https://www.dmtf.org/dsp/DSP0274
SPDM message sizes are not always a multiple of dwords. To transport
them over DOE without using bounce buffers, allow sending requests and
receiving responses whose final dword is only partially populated.
To be clear, PCIe r6.0 sec 6.30.1 specifies the Data Object Header 2
"Length" in dwords and pci_doe_send_req() and pci_doe_recv_resp()
read/write dwords. So from a spec point of view, DOE is still specified
in dwords and allowing non-dword request/response buffers is merely for
the convenience of callers.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/151b1a6a1794afb65d941287ecbc032c5b8004b9.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The PCI core has just been amended to create a pci_doe_mb struct for
every DOE instance on device enumeration. CXL (the only in-tree DOE
user so far) has been migrated to use those mailboxes instead of
creating its own.
That leaves pcim_doe_create_mb() and pci_doe_for_each_off() without any
callers, so drop them.
pci_doe_supports_prot() is now only used internally, so declare it
static.
pci_doe_destroy_mb() is no longer used as callback for
devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
instead of a generic void pointer.
Because pci_doe_create_mb() is only called on device enumeration, i.e.
before driver binding, the workqueue name never contains a driver name.
So replace dev_driver_string() with dev_bus_name() when generating the
workqueue name.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/64f614b6584982986c55d2c6229b4ee2b276dd59.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The PCI core has just been amended to create a pci_doe_mb struct for
every DOE instance on device enumeration.
Drop creation of a (duplicate) CDAT DOE mailbox on cxl probing in favor
of the one already created by the PCI core.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/becaf70e8faf9681d474200117d62d7eaac46cca.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Currently a DOE instance cannot be shared by multiple drivers because
each driver creates its own pci_doe_mb struct for a given DOE instance.
For the same reason a DOE instance cannot be shared between the PCI core
and a driver.
Moreover, finding out which protocols a DOE instance supports requires
creating a pci_doe_mb for it. If a device has multiple DOE instances,
a driver looking for a specific protocol may need to create a pci_doe_mb
for each of the device's DOE instances and then destroy those which
do not support the desired protocol. That's obviously an inefficient
way to do things.
Overcome these issues by creating mailboxes in the PCI core on device
enumeration.
Provide a pci_find_doe_mailbox() API call to allow drivers to get a
pci_doe_mb for a given (pci_dev, vendor, protocol) triple. This API is
modeled after pci_find_capability() and can later be amended with a
pci_find_next_doe_mailbox() call to iterate over all mailboxes of a
given pci_dev which support a specific protocol.
On removal, destroy the mailboxes in pci_destroy_dev(), after the driver
is unbound. This allows drivers to use DOE in their ->remove() hook.
On surprise removal, cancel ongoing DOE exchanges and prevent new ones
from being scheduled. Thereby ensure that a hot-removed device doesn't
needlessly wait for a running exchange to time out.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/40a6f973f72ef283d79dd55e7e6fddc7481199af.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
DOE mailbox creation is currently only possible through a devres-managed
API. The lifetime of mailboxes thus ends with driver unbinding.
An upcoming commit will create DOE mailboxes upon device enumeration by
the PCI core. Their lifetime shall not be limited by a driver.
Therefore rework pcim_doe_create_mb() into the non-devres-managed
pci_doe_create_mb(). Add pci_doe_destroy_mb() for mailbox destruction
on device removal.
Provide a devres-managed wrapper under the existing pcim_doe_create_mb()
name.
The error path of pcim_doe_create_mb() previously called xa_destroy() if
alloc_ordered_workqueue() failed. That's unnecessary because the xarray
is still empty at that point. It doesn't need to be destroyed until
it's been populated by pci_doe_cache_protocols(). Arrange the error
path of the new pci_doe_create_mb() accordingly.
pci_doe_cancel_tasks() is no longer used as callback for
devm_add_action(), so refactor it to accept a struct pci_doe_mb pointer
instead of a generic void pointer.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/7c9a63867d70233c5e9d26cd8bf956742cd6d650.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
When a DOE mailbox is torn down, its workqueue is flushed once in
pci_doe_flush_mb() through a call to flush_workqueue() and subsequently
flushed once more in pci_doe_destroy_workqueue() through a call to
destroy_workqueue().
Deduplicate by dropping flush_workqueue() from pci_doe_flush_mb().
Rename pci_doe_flush_mb() to pci_doe_cancel_tasks() to more aptly
describe what it now does.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/1f009f60b326d1c6d776641d4b20aff27de0c234.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A synchronous API for DOE has just been introduced. CXL (the only
in-tree DOE user so far) was converted to use it instead of the
asynchronous API.
Consequently, pci_doe_submit_task() as well as the pci_doe_task struct
are only used internally, so make them private.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/cc19544068483681e91dfe27545c2180cd09f931.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A synchronous API for DOE has just been introduced. Convert CXL CDAT
retrieval over to it.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/c329c0a21c11c3b524ce2336b0bbb3c80a28c415.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The DOE API only allows asynchronous exchanges and forces callers to
provide a completion callback. Yet all existing callers only perform
synchronous exchanges. Upcoming commits for CMA (Component Measurement
and Authentication, PCIe r6.0 sec 6.31) likewise require only
synchronous DOE exchanges.
Provide a synchronous pci_doe() API call which builds on the internal
asynchronous machinery.
Convert the internal pci_doe_discovery() to the new call.
The new API allows submission of const-declared requests, necessitating
the addition of a const qualifier in struct pci_doe_task.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/0f444206da9615c56301fbaff459c0f45d27f122.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
A recent debug session yielded a couple debug messages that were useful
for determining the reason why the driver was or was not falling back
to CXL range register emulation, and for identifying decoder setting
enumeration problems.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/168149845668.792294.11814353796371419167.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
Do not assume that a single-target port falls back to a passthrough
decoder configuration. Scan for decoders and only fallback after probing
that the HDM decoder capability is not present.
One user visible affect of this bug is the inability to enumerate
present CXL regions as the decoder settings for the present decoders are
skipped.
Fixes: d17d0540a0db ("cxl/core/hdm: Add CXL standard decoder enumeration to the core")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: http://lore.kernel.org/r/20230227153128.8164-1-Jonathan.Cameron@huawei.com
Cc: <stable@vger.kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/168149845130.792294.3210421233937427962.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|