summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-01drm/amd/pm: drop unneeded dpm features disablement for SMU 13.0.4/11Tim Huang
PMFW will handle the features disablement properly for gpu reset case, driver involvement may cause some unexpected issues. Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd/display: Reset DMUB mailbox SW state after HW resetNicholas Kazlauskas
[Why] Otherwise we can be out of sync with what's in the hardware, leading to us rerunning every command that's presently in the ringbuffer. [How] Reset software state for the mailboxes in hw_reset callback. This is already done as part of the mailbox init in hw_init, but we do need to remember to reset the last cached wptr value as well here. Reviewed-by: Hansen Dsouza <hansen.dsouza@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd/display: Unassign does_plane_fit_in_mall function from dcn3.2George Shen
[Why] The hwss function does_plane_fit_in_mall not applicable to dcn3.2 asics. Using it with dcn3.2 can result in undefined behaviour. [How] Assign the function pointer to NULL. Reviewed-by: Alvin Lee <Alvin.Lee2@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd/display: Adjust downscaling limits for dcn314Daniel Miess
[Why] Lower max_downscale_ratio and ARGB888 downscale factor to prevent cases where underflow may occur on dcn314 [How] Set max_downscale_ratio to 400 and ARGB downscale factor to 250 for dcn314 Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Daniel Miess <Daniel.Miess@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amd/display: Add missing brackets in calculationDaniel Miess
[Why] Brackets missing in the calculation for MIN_DST_Y_NEXT_START [How] Add missing brackets for this calculation Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Daniel Miess <Daniel.Miess@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-01drm/amdgpu: update wave data type to 3 for gfx11Graham Sider
SQ_WAVE_INST_DW0 isn't present on gfx11 compared to gfx10, so update wave data type to signify a difference. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Mukul Joshi <Mukul.Joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-02-01blk-cgroup: don't update io stat for root cgroupMing Lei
We source root cgroup stats from the system-wide stats, see blkcg_print_stat and blkcg_rstat_flush, so don't update io state for root cgroup. Fixes blkg leak issue introduced in commit 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") which starts to grab blkg's reference when adding iostat_cpu into percpu blkcg list, but this state won't be consumed by blkcg_rstat_flush() where the blkg reference is dropped. Tested-by: Bart van Assche <bvanassche@acm.org> Reported-by: Bart van Assche <bvanassche@acm.org> Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()") Cc: Tejun Heo <tj@kernel.org> Cc: Waiman Long <longman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230202021804.278582-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-02powerpc/64s: Reconnect tlb_flush() to hash__tlb_flush()Michael Ellerman
Commit baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions") removed some empty hash MMU flushing routines, but got a bit overeager and also removed the call to hash__tlb_flush() from tlb_flush(). In regular use this doesn't lead to any noticable breakage, which is a little concerning. Presumably there are flushes happening via other paths such as arch_leave_lazy_mmu_mode(), and/or a bit of luck. Fix it by reinstating the call to hash__tlb_flush(). Fixes: baf1ed24b27d ("powerpc/mm: Remove empty hash__ functions") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230131111407.806770-1-mpe@ellerman.id.au
2023-02-01perf symbols: Allow for static executables with .pltAdrian Hunter
A statically linked executable can have a .plt due to IFUNCs, in which case .symtab is used not .dynsym. Check the section header link to see if that is the case, and then use symtab instead. Example: Before: $ cat tstifunc.c #include <stdio.h> void thing1(void) { printf("thing1\n"); } void thing2(void) { printf("thing2\n"); } typedef void (*thing_fn_t)(void); thing_fn_t thing_ifunc(void) { int x; if (x & 1) return thing2; return thing1; } void thing(void) __attribute__ ((ifunc ("thing_ifunc"))); int main() { thing(); return 0; } $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ gcc -static -Wall -Wextra -Wno-uninitialized -o tstifuncstatic tstifunc.c $ readelf -SW tstifuncstatic | grep 'Name\|plt\|dyn' [Nr] Name Type Address Off Size ES Flg Lk Inf Al [ 4] .rela.plt RELA 00000000004002e8 0002e8 000258 18 AI 29 20 8 [ 6] .plt PROGBITS 0000000000401020 001020 000190 00 AX 0 0 16 [20] .got.plt PROGBITS 00000000004c5000 0c4000 0000e0 08 WA 0 0 8 $ perf record -e intel_pt//u --filter 'filter main @ ./tstifuncstatic' ./tstifuncstatic thing1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.008 MB perf.data ] $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 15786.690189535: tr strt 0 [unknown] => 4017cd main+0x0 15786.690189535: tr end call 4017d5 main+0x8 => 401170 [unknown] 15786.690197660: tr strt 0 [unknown] => 4017da main+0xd 15786.690197660: tr end return 4017e0 main+0x13 => 401c1a __libc_start_call_main+0x6a After: $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 15786.690189535: tr strt 0 [unknown] => 4017cd main+0x0 15786.690189535: tr end call 4017d5 main+0x8 => 401170 thing_ifunc@plt+0x0 15786.690197660: tr strt 0 [unknown] => 4017da main+0xd 15786.690197660: tr end return 4017e0 main+0x13 => 401c1a __libc_start_call_main+0x6a Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230131131625.6964-8-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf symbols: Allow for .plt without headerAdrian Hunter
A static executable can have a .plt due to the presence of IFUNCs. In that case the .plt does not have a header. Check for whether there is a header by comparing the number of entries to the number of relocation entries. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230131131625.6964-7-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf symbols: Add support for IFUNC symbols for x86_64Adrian Hunter
For x86_64, the GNU linker is putting IFUNC information in the relocation addend, so use it to try to find a symbol for plt entries that refer to IFUNCs. Example: Before: $ cat tstpltlib.c void fn1(void) {} void fn2(void) {} void fn3(void) {} void fn4(void) {} $ cat tstpltifunc.c #include <stdio.h> void thing1(void) { printf("thing1\n"); } void thing2(void) { printf("thing2\n"); } typedef void (*thing_fn_t)(void); thing_fn_t thing_ifunc(void) { int x; if (x & 1) return thing2; return thing1; } void thing(void) __attribute__ ((ifunc ("thing_ifunc"))); void fn1(void); void fn2(void); void fn3(void); void fn4(void); int main() { fn4(); fn1(); thing(); fn2(); fn3(); return 0; } $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ gcc -Wall -Wextra -shared -o libtstpltlib.so tstpltlib.c $ gcc -Wall -Wextra -Wno-uninitialized -o tstpltifunc tstpltifunc.c -L . -ltstpltlib -Wl,-rpath="$(pwd)" $ readelf -rW tstpltifunc | grep -A99 plt Relocation section '.rela.plt' at offset 0x738 contains 8 entries: Offset Info Type Symbol's Value Symbol's Name + Addend 0000000000003f98 0000000300000007 R_X86_64_JUMP_SLOT 0000000000000000 puts@GLIBC_2.2.5 + 0 0000000000003fa8 0000000400000007 R_X86_64_JUMP_SLOT 0000000000000000 __stack_chk_fail@GLIBC_2.4 + 0 0000000000003fb0 0000000500000007 R_X86_64_JUMP_SLOT 0000000000000000 fn1 + 0 0000000000003fb8 0000000600000007 R_X86_64_JUMP_SLOT 0000000000000000 fn3 + 0 0000000000003fc0 0000000800000007 R_X86_64_JUMP_SLOT 0000000000000000 fn4 + 0 0000000000003fc8 0000000900000007 R_X86_64_JUMP_SLOT 0000000000000000 fn2 + 0 0000000000003fd0 0000000b00000007 R_X86_64_JUMP_SLOT 0000000000000000 getrandom@GLIBC_2.25 + 0 0000000000003fa0 0000000000000025 R_X86_64_IRELATIVE 125d $ perf record -e intel_pt//u --filter 'filter main @ ./tstpltifunc' ./tstpltifunc thing2 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.016 MB perf.data ] $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 21860.073683659: tr strt 0 [unknown] => 561e212c42be main+0x0 21860.073683659: tr end call 561e212c42c6 main+0x8 => 561e212c4110 fn4@plt+0x0 21860.073683661: tr strt 0 [unknown] => 561e212c42cb main+0xd 21860.073683661: tr end call 561e212c42cb main+0xd => 561e212c40f0 fn1@plt+0x0 21860.073683661: tr strt 0 [unknown] => 561e212c42d0 main+0x12 21860.073683661: tr end call 561e212c42d0 main+0x12 => 561e212c40d0 offset_0x10d0@plt+0x0 21860.073698451: tr strt 0 [unknown] => 561e212c42d5 main+0x17 21860.073698451: tr end call 561e212c42d5 main+0x17 => 561e212c4120 fn2@plt+0x0 21860.073698451: tr strt 0 [unknown] => 561e212c42da main+0x1c 21860.073698451: tr end call 561e212c42da main+0x1c => 561e212c4100 fn3@plt+0x0 21860.073698452: tr strt 0 [unknown] => 561e212c42df main+0x21 21860.073698452: tr end return 561e212c42e5 main+0x27 => 7fb51cc29d90 __libc_start_call_main+0x80 After: $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 21860.073683659: tr strt 0 [unknown] => 561e212c42be main+0x0 21860.073683659: tr end call 561e212c42c6 main+0x8 => 561e212c4110 fn4@plt+0x0 21860.073683661: tr strt 0 [unknown] => 561e212c42cb main+0xd 21860.073683661: tr end call 561e212c42cb main+0xd => 561e212c40f0 fn1@plt+0x0 21860.073683661: tr strt 0 [unknown] => 561e212c42d0 main+0x12 21860.073683661: tr end call 561e212c42d0 main+0x12 => 561e212c40d0 thing_ifunc@plt+0x0 21860.073698451: tr strt 0 [unknown] => 561e212c42d5 main+0x17 21860.073698451: tr end call 561e212c42d5 main+0x17 => 561e212c4120 fn2@plt+0x0 21860.073698451: tr strt 0 [unknown] => 561e212c42da main+0x1c 21860.073698451: tr end call 561e212c42da main+0x1c => 561e212c4100 fn3@plt+0x0 21860.073698452: tr strt 0 [unknown] => 561e212c42df main+0x21 21860.073698452: tr end return 561e212c42e5 main+0x27 => 7fb51cc29d90 __libc_start_call_main+0x80 Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230131131625.6964-6-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf symbols: Record whether a symbol is an alias for an IFUNC symbolAdrian Hunter
To assist with synthesizing plt symbols for IFUNCs, record whether a symbol is an alias of an IFUNC symbol. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230131131625.6964-5-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf symbols: Sort plt relocations for x86Adrian Hunter
For x86, with the addition of IFUNCs, relocation information becomes disordered with respect to plt. Correct that by sorting the relocations by offset. Example: Before: $ cat tstpltlib.c void fn1(void) {} void fn2(void) {} void fn3(void) {} void fn4(void) {} $ cat tstpltifunc.c #include <stdio.h> void thing1(void) { printf("thing1\n"); } void thing2(void) { printf("thing2\n"); } typedef void (*thing_fn_t)(void); thing_fn_t thing_ifunc(void) { int x; if (x & 1) return thing2; return thing1; } void thing(void) __attribute__ ((ifunc ("thing_ifunc"))); void fn1(void); void fn2(void); void fn3(void); void fn4(void); int main() { fn4(); fn1(); thing(); fn2(); fn3(); return 0; } $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ gcc -Wall -Wextra -shared -o libtstpltlib.so tstpltlib.c $ gcc -Wall -Wextra -Wno-uninitialized -o tstpltifunc tstpltifunc.c -L . -ltstpltlib -Wl,-rpath="$(pwd)" $ readelf -rW tstpltifunc | grep -A99 plt Relocation section '.rela.plt' at offset 0x738 contains 8 entries: Offset Info Type Symbol's Value Symbol's Name + Addend 0000000000003f98 0000000300000007 R_X86_64_JUMP_SLOT 0000000000000000 puts@GLIBC_2.2.5 + 0 0000000000003fa8 0000000400000007 R_X86_64_JUMP_SLOT 0000000000000000 __stack_chk_fail@GLIBC_2.4 + 0 0000000000003fb0 0000000500000007 R_X86_64_JUMP_SLOT 0000000000000000 fn1 + 0 0000000000003fb8 0000000600000007 R_X86_64_JUMP_SLOT 0000000000000000 fn3 + 0 0000000000003fc0 0000000800000007 R_X86_64_JUMP_SLOT 0000000000000000 fn4 + 0 0000000000003fc8 0000000900000007 R_X86_64_JUMP_SLOT 0000000000000000 fn2 + 0 0000000000003fd0 0000000b00000007 R_X86_64_JUMP_SLOT 0000000000000000 getrandom@GLIBC_2.25 + 0 0000000000003fa0 0000000000000025 R_X86_64_IRELATIVE 125d $ perf record -e intel_pt//u --filter 'filter main @ ./tstpltifunc' ./tstpltifunc thing2 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.029 MB perf.data ] $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 20417.302513948: tr strt 0 [unknown] => 5629a74892be main+0x0 20417.302513948: tr end call 5629a74892c6 main+0x8 => 5629a7489110 fn2@plt+0x0 20417.302513949: tr strt 0 [unknown] => 5629a74892cb main+0xd 20417.302513949: tr end call 5629a74892cb main+0xd => 5629a74890f0 fn3@plt+0x0 20417.302513950: tr strt 0 [unknown] => 5629a74892d0 main+0x12 20417.302513950: tr end call 5629a74892d0 main+0x12 => 5629a74890d0 __stack_chk_fail@plt+0x0 20417.302528114: tr strt 0 [unknown] => 5629a74892d5 main+0x17 20417.302528114: tr end call 5629a74892d5 main+0x17 => 5629a7489120 getrandom@plt+0x0 20417.302528115: tr strt 0 [unknown] => 5629a74892da main+0x1c 20417.302528115: tr end call 5629a74892da main+0x1c => 5629a7489100 fn4@plt+0x0 20417.302528115: tr strt 0 [unknown] => 5629a74892df main+0x21 20417.302528115: tr end return 5629a74892e5 main+0x27 => 7ff14da29d90 __libc_start_call_main+0x80 After: $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 20417.302513948: tr strt 0 [unknown] => 5629a74892be main+0x0 20417.302513948: tr end call 5629a74892c6 main+0x8 => 5629a7489110 fn4@plt+0x0 20417.302513949: tr strt 0 [unknown] => 5629a74892cb main+0xd 20417.302513949: tr end call 5629a74892cb main+0xd => 5629a74890f0 fn1@plt+0x0 20417.302513950: tr strt 0 [unknown] => 5629a74892d0 main+0x12 20417.302513950: tr end call 5629a74892d0 main+0x12 => 5629a74890d0 offset_0x10d0@plt+0x0 20417.302528114: tr strt 0 [unknown] => 5629a74892d5 main+0x17 20417.302528114: tr end call 5629a74892d5 main+0x17 => 5629a7489120 fn2@plt+0x0 20417.302528115: tr strt 0 [unknown] => 5629a74892da main+0x1c 20417.302528115: tr end call 5629a74892da main+0x1c => 5629a7489100 fn3@plt+0x0 20417.302528115: tr strt 0 [unknown] => 5629a74892df main+0x21 20417.302528115: tr end return 5629a74892e5 main+0x27 => 7ff14da29d90 __libc_start_call_main+0x80 Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230131131625.6964-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf symbols: Add support for x86 .plt.secAdrian Hunter
The section .plt.sec was originally added for MPX and was first called .plt.bnd. While MPX has been deprecated, .plt.sec is now also used for IBT. On x86_64, IBT may be enabled by default, but can be switched off using gcc option -fcf-protection=none, or switched on by -z ibt or -z ibtplt. On 32-bit, option -z ibt or -z ibtplt will enable IBT. With .plt.sec, calls are made into .plt.sec instead of .plt, so it makes more sense to put the symbols there instead of .plt. A notable difference is that .plt.sec does not have a header entry. For x86, when synthesizing symbols for plt, use offset and entry size of .plt.sec instead of .plt when there is a .plt.sec section. Example on Ubuntu 22.04 gcc 11.3: Before: $ cat tstpltlib.c void fn1(void) {} void fn2(void) {} void fn3(void) {} void fn4(void) {} $ cat tstplt.c void fn1(void); void fn2(void); void fn3(void); void fn4(void); int main() { fn4(); fn1(); fn2(); fn3(); return 0; } $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ gcc -Wall -Wextra -shared -o libtstpltlib.so tstpltlib.c $ gcc -Wall -Wextra -z ibt -o tstplt tstplt.c -L . -ltstpltlib -Wl,-rpath=$(pwd) $ readelf -SW tstplt | grep 'plt\|Name' [Nr] Name Type Address Off Size ES Flg Lk Inf Al [11] .rela.plt RELA 0000000000000698 000698 000060 18 AI 6 24 8 [13] .plt PROGBITS 0000000000001020 001020 000050 10 AX 0 0 16 [14] .plt.got PROGBITS 0000000000001070 001070 000010 10 AX 0 0 16 [15] .plt.sec PROGBITS 0000000000001080 001080 000040 10 AX 0 0 16 $ perf record -e intel_pt//u --filter 'filter main @ ./tstplt' ./tstplt [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.015 MB perf.data ] $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 38970.522546686: tr strt 0 [unknown] => 55fc222a81a9 main+0x0 38970.522546686: tr end call 55fc222a81b1 main+0x8 => 55fc222a80a0 [unknown] 38970.522546687: tr strt 0 [unknown] => 55fc222a81b6 main+0xd 38970.522546687: tr end call 55fc222a81b6 main+0xd => 55fc222a8080 [unknown] 38970.522546688: tr strt 0 [unknown] => 55fc222a81bb main+0x12 38970.522546688: tr end call 55fc222a81bb main+0x12 => 55fc222a80b0 [unknown] 38970.522546688: tr strt 0 [unknown] => 55fc222a81c0 main+0x17 38970.522546688: tr end call 55fc222a81c0 main+0x17 => 55fc222a8090 [unknown] 38970.522546689: tr strt 0 [unknown] => 55fc222a81c5 main+0x1c 38970.522546894: tr end return 55fc222a81cb main+0x22 => 7f3a4dc29d90 __libc_start_call_main+0x80 After: $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 38970.522546686: tr strt 0 [unknown] => 55fc222a81a9 main+0x0 38970.522546686: tr end call 55fc222a81b1 main+0x8 => 55fc222a80a0 fn4@plt+0x0 38970.522546687: tr strt 0 [unknown] => 55fc222a81b6 main+0xd 38970.522546687: tr end call 55fc222a81b6 main+0xd => 55fc222a8080 fn1@plt+0x0 38970.522546688: tr strt 0 [unknown] => 55fc222a81bb main+0x12 38970.522546688: tr end call 55fc222a81bb main+0x12 => 55fc222a80b0 fn2@plt+0x0 38970.522546688: tr strt 0 [unknown] => 55fc222a81c0 main+0x17 38970.522546688: tr end call 55fc222a81c0 main+0x17 => 55fc222a8090 fn3@plt+0x0 38970.522546689: tr strt 0 [unknown] => 55fc222a81c5 main+0x1c 38970.522546894: tr end return 55fc222a81cb main+0x22 => 7f3a4dc29d90 __libc_start_call_main+0x80 Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230131131625.6964-3-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf symbols: Correct plt entry sizes for x86Adrian Hunter
In 32-bit executables the .plt entry size can be set to 4 when it is really 16. In fact the only sizes used for x86 (32 or 64 bit) are 8 or 16, so check for those and, if not, use the alignment to choose which it is. Example on Ubuntu 22.04 gcc 11.3: Before: $ cat tstpltlib.c void fn1(void) {} void fn2(void) {} void fn3(void) {} void fn4(void) {} $ cat tstplt.c void fn1(void); void fn2(void); void fn3(void); void fn4(void); int main() { fn4(); fn1(); fn2(); fn3(); return 0; } $ gcc --version gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0 Copyright (C) 2021 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ gcc -m32 -Wall -Wextra -shared -o libtstpltlib32.so tstpltlib.c $ gcc -m32 -Wall -Wextra -o tstplt32 tstplt.c -L . -ltstpltlib32 -Wl,-rpath=$(pwd) $ perf record -e intel_pt//u --filter 'filter main @ ./tstplt32' ./tstplt32 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.011 MB perf.data ] $ readelf -SW tstplt32 | grep 'plt\|Name' [Nr] Name Type Addr Off Size ES Flg Lk Inf Al [10] .rel.plt REL 0000041c 00041c 000028 08 AI 5 22 4 [12] .plt PROGBITS 00001030 001030 000060 04 AX 0 0 16 <- ES is 0x04, should be 0x10 [13] .plt.got PROGBITS 00001090 001090 000008 08 AX 0 0 8 $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 17894.383903029: tr strt 0 [unknown] => 565b81cd main+0x0 17894.383903029: tr end call 565b81d4 main+0x7 => 565b80d0 __x86.get_pc_thunk.bx+0x0 17894.383903031: tr strt 0 [unknown] => 565b81d9 main+0xc 17894.383903031: tr end call 565b81df main+0x12 => 565b8070 [unknown] 17894.383903032: tr strt 0 [unknown] => 565b81e4 main+0x17 17894.383903032: tr end call 565b81e4 main+0x17 => 565b8050 [unknown] 17894.383903033: tr strt 0 [unknown] => 565b81e9 main+0x1c 17894.383903033: tr end call 565b81e9 main+0x1c => 565b8080 [unknown] 17894.383903033: tr strt 0 [unknown] => 565b81ee main+0x21 17894.383903033: tr end call 565b81ee main+0x21 => 565b8060 [unknown] 17894.383903237: tr strt 0 [unknown] => 565b81f3 main+0x26 17894.383903237: tr end return 565b81fc main+0x2f => f7c21519 [unknown] After: $ perf script --itrace=be --ns -F+flags,-event,+addr,-period,-comm,-tid,-cpu,-dso 17894.383903029: tr strt 0 [unknown] => 565b81cd main+0x0 17894.383903029: tr end call 565b81d4 main+0x7 => 565b80d0 __x86.get_pc_thunk.bx+0x0 17894.383903031: tr strt 0 [unknown] => 565b81d9 main+0xc 17894.383903031: tr end call 565b81df main+0x12 => 565b8070 fn4@plt+0x0 17894.383903032: tr strt 0 [unknown] => 565b81e4 main+0x17 17894.383903032: tr end call 565b81e4 main+0x17 => 565b8050 fn1@plt+0x0 17894.383903033: tr strt 0 [unknown] => 565b81e9 main+0x1c 17894.383903033: tr end call 565b81e9 main+0x1c => 565b8080 fn2@plt+0x0 17894.383903033: tr strt 0 [unknown] => 565b81ee main+0x21 17894.383903033: tr end call 565b81ee main+0x21 => 565b8060 fn3@plt+0x0 17894.383903237: tr strt 0 [unknown] => 565b81f3 main+0x26 17894.383903237: tr end return 565b81fc main+0x2f => f7c21519 [unknown] Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230131131625.6964-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf tests shell: Fix check for libtracevent supportAthira Rajeev
Test “Use vfs_getname probe to get syscall args filenames” fails in environment with missing libtraceevent support as below: 82: Use vfs_getname probe to get syscall args filenames : --- start --- test child forked, pid 304726 Recording open file: event syntax error: 'probe:vfs_getname*' \___ unsupported tracepoint libtraceevent is necessary for tracepoint support Run 'perf list' for a list of valid events Usage: perf record [<options>] [<command>] or: perf record [<options>] -- <command> [<options>] -e, --event <event> event selector. use 'perf list' to list available events test child finished with -1 ---- end ---- Use vfs_getname probe to get syscall args filenames: FAILED! The environment has debuginfo but is missing the libtraceevent devel. Hence perf is compiled without libtraceevent support. The test tries to add probe “probe:vfs_getname” and then uses it with “perf record”. This fails at function “parse_events_add_tracepoint" due to missing libtraceevent. Similarly "probe libc's inet_pton & backtrace it with ping" test slso fails with same reason. Add a function in 'perf test shell' library to check if perf record with —dry-run reports any error on missing support for libtraceevent. Update both the tests to use this new function “skip_no_probe_record_support” before proceeding With using probe point via perf builtin record. With the change, 82: Use vfs_getname probe to get syscall args filenames : --- start --- test child forked, pid 305014 Recording open file: libtraceevent is necessary for tracepoint support test child finished with -2 ---- end ---- Use vfs_getname probe to get syscall args filenames: Skip 81: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 305036 libtraceevent is necessary for tracepoint support test child finished with -2 ---- end ---- probe libc's inet_pton & backtrace it with ping: Skip Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: kjain@linux.ibm.com, Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/r/20230201180421.59640-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf tests shell: Add check for perf data file in ↵Athira Rajeev
record+probe_libc_inet_pton test The "probe libc's inet_pton & backtrace it with ping" test installs a uprobe and uses perf record/script to check the backtrace. Currently even if the "perf record" fails, the test reports success. Logs below: # ./perf test -v "probe libc's inet_pton & backtrace it with ping" 81: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 304211 failed to open /tmp/perf.data.Btf: No such file or directory test child finished with 0 ---- end ---- probe libc's inet_pton & backtrace it with ping: Ok Fix this by adding check for presence of perf.data file before proceeding with "perf script". With the patch changes, test reports fail correctly. # ./perf test -v "probe libc's inet_pton & backtrace it with ping" 81: probe libc's inet_pton & backtrace it with ping : --- start --- test child forked, pid 304358 FAIL: perf record failed to create "/tmp/perf.data.Uoi" test child finished with -1 ---- end ---- probe libc's inet_pton & backtrace it with ping: FAILED! Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Disha Goel <disgoel@linux.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/r/20230201180421.59640-1-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf test: Add pipe mode test to the Intel PT test suiteNamhyung Kim
The test_pipe() function will check perf report and perf inject with pipe input. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230131023350.1903992-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf session: Avoid calling lseek(2) for pipeNamhyung Kim
We should not call lseek(2) for pipes as it won't work. And we already in the proper place to read the data for AUXTRACE. Add the comment like in the PERF_RECORD_HEADER_TRACING_DATA. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230131023350.1903992-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf intel-pt: Do not try to queue auxtrace data on pipeNamhyung Kim
When it processes AUXTRACE_INFO, it calls to auxtrace_queue_data() to collect AUXTRACE data first. That won't work with pipe since it needs lseek() to read the scattered aux data. $ perf record -o- -e intel_pt// true | perf report -i- --itrace=i100 # To display the perf.data header info, please use --header/--header-only options. # 0x4118 [0xa0]: failed to process type: 70 Error: failed to process sample For the pipe mode, it can handle the aux data as it gets. But there's no guarantee it can get the aux data in time. So the following warning will be shown at the beginning: WARNING: Intel PT with pipe mode is not recommended. The output cannot relied upon. In particular, time stamps and the order of events may be incorrect. Fixes: dbd134322e74f19d ("perf intel-pt: Add support for decoding AUX area samples") Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230131023350.1903992-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01perf inject: Use perf_data__read() for auxtraceNamhyung Kim
In copy_bytes(), it reads the data from the (input) fd and writes it to the output file. But it does with the read(2) unconditionally which caused a problem of mixing buffered vs unbuffered I/O together. You can see the problem when using pipes. $ perf record -e intel_pt// -o- true | perf inject -b > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] 0x45c0 [0x30]: failed to process type: 71 It should use perf_data__read() to honor the 'use_stdio' setting. Fixes: 601366678c93618f ("perf data: Allow to use stdio functions for pipe mode") Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230131023350.1903992-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-01parisc: Wire up PTRACE_GETREGS/PTRACE_SETREGS for compat caseHelge Deller
Wire up the missing ptrace requests PTRACE_GETREGS, PTRACE_SETREGS, PTRACE_GETFPREGS and PTRACE_SETFPREGS when running 32-bit applications on 64-bit kernels. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # 4.7+
2023-02-01parisc: Replace hardcoded value with PRIV_USER constant in ptrace.cHelge Deller
Prefer usage of the PRIV_USER constant over the hard-coded value to set the lowest 2 bits for the userspace privilege. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org # 5.16+
2023-02-01Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Just small bugfixes all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa: ifcvf: Do proper cleanup if IFCVF init fails vhost-scsi: unbreak any layout for response tools/virtio: fix the vringh test for virtio ring changes vhost/net: Clear the pending messages when the backend is removed
2023-02-01Merge tag 'sound-6.2-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A bit higher volume of changes than wished, but each change is relatively small and the fix targets are mostly device-specific, so those should be safe as a late stage merge. The most significant LoC is about the memalloc helper fix, which is applied only to Xen PV. The other major parts are ASoC Intel SOF and AVS fixes that are scattered as various small code changes. The rest are device-specific fixes and quirks for HD- and USB-audio, FireWire and ASoC AMD / HDMI" * tag 'sound-6.2-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (30 commits) ALSA: firewire-motu: fix unreleased lock warning in hwdep device ALSA: memalloc: Workaround for Xen PV ASoC: cs42l56: fix DT probe ASoC: codecs: wsa883x: correct playback min/max rates ALSA: hda/realtek: Add Acer Predator PH315-54 ASoC: amd: yc: Add Xiaomi Redmi Book Pro 15 2022 into DMI table ALSA: hda: Do not unset preset when cleaning up codec ASoC: SOF: sof-audio: prepare_widgets: Check swidget for NULL on sink failure ASoC: hdmi-codec: zero clear HDMI pdata ASoC: SOF: ipc4-mtrace: prevent underflow in sof_ipc4_priority_mask_dfs_write() ASoC: Intel: sof_ssp_amp: always set dpcm_capture for amplifiers ASoC: Intel: sof_nau8825: always set dpcm_capture for amplifiers ASoC: Intel: sof_cs42l42: always set dpcm_capture for amplifiers ASoC: Intel: sof_rt5682: always set dpcm_capture for amplifiers ALSA: hda/via: Avoid potential array out-of-bound in add_secret_dac_path() ALSA: usb-audio: Add FIXED_RATE quirk for JBL Quantum610 Wireless ALSA: hda/realtek: fix mute/micmute LEDs, speaker don't work for a HP platform ASoC: SOF: keep prepare/unprepare widgets in sink path ASoC: SOF: sof-audio: skip prepare/unprepare if swidget is NULL ASoC: SOF: sof-audio: unprepare when swidget->use_count > 0 ...
2023-02-01ARM: dts: wpcm450: Add nuvoton,shm = <&shm> to FIU nodeJonathan Neuschäfer
The Flash Interface Unit (FIU) should have a reference to the Shared Memory controller (SHM) so that flash access from the host (x86 computer managed by the WPCM450 BMC) can be blocked during flash access by the FIU driver. Fixes: 38abcb0d68767 ("ARM: dts: wpcm450: Add FIU SPI controller node") Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Link: https://lore.kernel.org/r/20230129112611.1176517-1-j.neuschaefer@gmx.net Signed-off-by: Joel Stanley <joel@jms.id.au> Link: https://lore.kernel.org/r/20230201044158.962417-1-joel@jms.id.au Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-01MAINTAINERS: Update entry for MediaTek SoC supportMatthias Brugger
The linux-mediatek IRC channel has moved to liber.chat for quite some time. Apart from that, not all patches are also send to LKML, so add this ML explicitly. And last but not least: Angelo does a wunderfull job in reviewing patches for all kind of devices from MediaTek. Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20230201152256.19514-1-matthias.bgg@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-02-01nvme-auth: use workqueue dedicated to authenticationShin'ichiro Kawasaki
NVMe In-Band authentication uses two kinds of works: chap->auth_work and ctrl->dhchap_auth_work. The latter work flushes or cancels the former work. However, the both works are queued to the same workqueue nvme-wq. It results in the lockdep WARNING as follows: WARNING: possible recursive locking detected 6.2.0-rc4+ #1 Not tainted -------------------------------------------- kworker/u16:7/69 is trying to acquire lock: ffff902d52e65548 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: start_flush_work+0x2c5/0x380 but task is already holding lock: ffff902d52e65548 ((wq_completion)nvme-wq){+.+.}-{0:0}, at: process_one_work+0x210/0x410 To avoid the WARNING, introduce a new workqueue nvme-auth-wq dedicated to chap->auth_work. Reported-by: Daniel Wagner <dwagner@suse.de> Link: https://lore.kernel.org/linux-nvme/20230130110802.paafkiipmitwtnwr@carbon.lan/ Fixes: f50fff73d620 ("nvme: implement In-Band authentication") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme: clear the request_queue pointers on failure in nvme_alloc_io_tag_setMaurizio Lombardi
In nvme_alloc_io_tag_set(), the connect_q pointer should be set to NULL in case of error to avoid potential invalid pointer dereferences. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme: clear the request_queue pointers on failure in nvme_alloc_admin_tag_setMaurizio Lombardi
If nvme_alloc_admin_tag_set() fails, the admin_q and fabrics_q pointers are left with an invalid, non-NULL value. Other functions may then check the pointers and dereference them, e.g. in nvme_probe() -> out_disable: -> nvme_dev_remove_admin(). Fix the bug by setting admin_q and fabrics_q to NULL in case of error. Also use the set variable to free the tag_set as ctrl->admin_tagset isn't initialized yet. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01nvme-fc: fix a missing queue put in nvmet_fc_ls_create_associationAmit Engel
As part of nvmet_fc_ls_create_association there is a case where nvmet_fc_alloc_target_queue fails right after a new association with an admin queue is created. In this case, no one releases the get taken in nvmet_fc_alloc_target_assoc. This fix is adding the missing put. Signed-off-by: Amit Engel <Amit.Engel@dell.com> Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-02-01drm/panel: boe-tv101wum-nl6: Ensure DSI writes succeed during disableStephen Boyd
The unprepare sequence has started to fail after moving to panel bridge code in the msm drm driver (commit 007ac0262b0d ("drm/msm/dsi: switch to DRM_PANEL_BRIDGE")). You'll see messages like this in the kernel logs: panel-boe-tv101wum-nl6 ae94000.dsi.0: failed to set panel off: -22 This is because boe_panel_enter_sleep_mode() needs an operating DSI link to set the panel into sleep mode. Performing those writes in the unprepare phase of bridge ops is too late, because the link has already been torn down by the DSI controller in post_disable, i.e. the PHY has been disabled, etc. See dsi_mgr_bridge_post_disable() for more details on the DSI . Split the unprepare function into a disable part and an unprepare part. For now, just the DSI writes to enter sleep mode are put in the disable function. This fixes the panel off routine and keeps the panel happy. My Wormdingler has an integrated touchscreen that stops responding to touch if the panel is only half disabled too. This patch fixes it. And finally, this saves power when the screen is off because without this fix the regulators for the panel are left enabled when nothing is being displayed on the screen. Fixes: 007ac0262b0d ("drm/msm/dsi: switch to DRM_PANEL_BRIDGE") Fixes: a869b9db7adf ("drm/panel: support for boe tv101wum-nl6 wuxga dsi video mode panel") Cc: yangcong <yangcong5@huaqin.corp-partner.google.com> Cc: Douglas Anderson <dianders@chromium.org> Cc: Jitao Shi <jitao.shi@mediatek.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Rob Clark <robdclark@chromium.org> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230106030108.2542081-1-swboyd@chromium.org (cherry picked from commit c913cd5489930abbb557ef144a333846286754c3) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-01-31Merge patch "riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y"Palmer Dabbelt
This is a single fix, but it conflicts with some recent features. I'm merging it on top of the commit it fixes to ease backporting. * b4-shazam-merge: riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=y Link: https://lore.kernel.org/r/20220922060958.44203-1-samuel@sholland.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31riscv: Fix build with CONFIG_CC_OPTIMIZE_FOR_SIZE=ySamuel Holland
commit 8eb060e10185 ("arch/riscv: add Zihintpause support") broke building with CONFIG_CC_OPTIMIZE_FOR_SIZE enabled (gcc 11.1.0): CC arch/riscv/kernel/vdso/vgettimeofday.o In file included from <command-line>: ./arch/riscv/include/asm/jump_label.h: In function 'cpu_relax': ././include/linux/compiler_types.h:285:33: warning: 'asm' operand 0 probably does not match constraints 285 | #define asm_volatile_goto(x...) asm goto(x) | ^~~ ./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto' 41 | asm_volatile_goto( | ^~~~~~~~~~~~~~~~~ ././include/linux/compiler_types.h:285:33: error: impossible constraint in 'asm' 285 | #define asm_volatile_goto(x...) asm goto(x) | ^~~ ./arch/riscv/include/asm/jump_label.h:41:9: note: in expansion of macro 'asm_volatile_goto' 41 | asm_volatile_goto( | ^~~~~~~~~~~~~~~~~ make[1]: *** [scripts/Makefile.build:249: arch/riscv/kernel/vdso/vgettimeofday.o] Error 1 make: *** [arch/riscv/Makefile:128: vdso_prepare] Error 2 Having a static branch in cpu_relax() is problematic because that function is widely inlined, including in some quite complex functions like in the VDSO. A quick measurement shows this static branch is responsible by itself for around 40% of the jump table. Drop the static branch, which ends up being the same number of instructions anyway. If Zihintpause is supported, we trade the nop from the static branch for a div. If Zihintpause is unsupported, we trade the jump from the static branch for (what gets interpreted as) a nop. Fixes: 8eb060e10185 ("arch/riscv: add Zihintpause support") Signed-off-by: Samuel Holland <samuel@sholland.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Release bridge info once packet escapes the br_netfilter path, from Florian Westphal. 2) Revert incorrect fix for the SCTP connection tracking chunk iterator, also from Florian. First path fixes a long standing issue, the second path addresses a mistake in the previous pull request for net. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: Revert "netfilter: conntrack: fix bug in for_each_sctp_chunk" netfilter: br_netfilter: disable sabotage_in hook after first suppression ==================== Link: https://lore.kernel.org/r/20230131133158.4052-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-31net: phy: meson-gxl: Add generic dummy stubs for MMD register accessChris Healy
The Meson G12A Internal PHY does not support standard IEEE MMD extended register access, therefore add generic dummy stubs to fail the read and write MMD calls. This is necessary to prevent the core PHY code from erroneously believing that EEE is supported by this PHY even though this PHY does not support EEE, as MMD register access returns all FFFFs. Fixes: 5c3407abb338 ("net: phy: meson-gxl: add g12a support") Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Chris Healy <healych@amazon.com> Reviewed-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://lore.kernel.org/r/20230130231402.471493-1-cphealy@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-31net: fix NULL pointer in skb_segment_listYan Zhai
Commit 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") introduced UDP listifyed GRO. The segmentation relies on frag_list being untouched when passing through the network stack. This assumption can be broken sometimes, where frag_list itself gets pulled into linear area, leaving frag_list being NULL. When this happens it can trigger following NULL pointer dereference, and panic the kernel. Reverse the test condition should fix it. [19185.577801][ C1] BUG: kernel NULL pointer dereference, address: ... [19185.663775][ C1] RIP: 0010:skb_segment_list+0x1cc/0x390 ... [19185.834644][ C1] Call Trace: [19185.841730][ C1] <TASK> [19185.848563][ C1] __udp_gso_segment+0x33e/0x510 [19185.857370][ C1] inet_gso_segment+0x15b/0x3e0 [19185.866059][ C1] skb_mac_gso_segment+0x97/0x110 [19185.874939][ C1] __skb_gso_segment+0xb2/0x160 [19185.883646][ C1] udp_queue_rcv_skb+0xc3/0x1d0 [19185.892319][ C1] udp_unicast_rcv_skb+0x75/0x90 [19185.900979][ C1] ip_protocol_deliver_rcu+0xd2/0x200 [19185.910003][ C1] ip_local_deliver_finish+0x44/0x60 [19185.918757][ C1] __netif_receive_skb_one_core+0x8b/0xa0 [19185.927834][ C1] process_backlog+0x88/0x130 [19185.935840][ C1] __napi_poll+0x27/0x150 [19185.943447][ C1] net_rx_action+0x27e/0x5f0 [19185.951331][ C1] ? mlx5_cq_tasklet_cb+0x70/0x160 [mlx5_core] [19185.960848][ C1] __do_softirq+0xbc/0x25d [19185.968607][ C1] irq_exit_rcu+0x83/0xb0 [19185.976247][ C1] common_interrupt+0x43/0xa0 [19185.984235][ C1] asm_common_interrupt+0x22/0x40 ... [19186.094106][ C1] </TASK> Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/Y9gt5EUizK1UImEP@debian Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-31net: fman: memac: free mdio device if lynx_pcs_create() failsVladimir Oltean
When memory allocation fails in lynx_pcs_create() and it returns NULL, there remains a dangling reference to the mdiodev returned by of_mdio_find_device() which is leaked as soon as memac_pcs_create() returns empty-handed. Fixes: a7c2a32e7f22 ("net: fman: memac: Use lynx pcs driver") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Sean Anderson <sean.anderson@seco.com> Acked-by: Madalin Bucur <madalin.bucur@oss.nxp.com> Link: https://lore.kernel.org/r/20230130193051.563315-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-31sctp: do not check hb_timer.expires when resetting hb_timerXin Long
It tries to avoid the frequently hb_timer refresh in commit ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often"), and it only allows mod_timer when the new expires is after hb_timer.expires. It means even a much shorter interval for hb timer gets applied, it will have to wait until the current hb timer to time out. In sctp_do_8_2_transport_strike(), when a transport enters PF state, it expects to update the hb timer to resend a heartbeat every rto after calling sctp_transport_reset_hb_timer(), which will not work as the change mentioned above. The frequently hb_timer refresh was caused by sctp_transport_reset_timers() called in sctp_outq_flush() and it was already removed in the commit above. So we don't have to check hb_timer.expires when resetting hb_timer as it is now not called very often. Fixes: ba6f5e33bdbb ("sctp: avoid refreshing heartbeat timer too often") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/d958c06985713ec84049a2d5664879802710179a.1675095933.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-01powerpc/kexec_file: Count hot-pluggable memory in FDT estimateSourabh Jain
On Systems where online memory is lesser compared to max memory, the kexec_file_load system call may fail to load the kdump kernel with the below errors: "Failed to update fdt with linux,drconf-usable-memory property" "Error setting up usable-memory property for kdump kernel" This happens because the size estimation for usable memory properties for the kdump kernel's FDT is based on the online memory whereas the usable memory properties include max memory. In short, the hot-pluggable memory is not accounted for while estimating the size of the usable memory properties. The issue is addressed by calculating usable memory property size using max hotplug address instead of the last online memory address. Fixes: 2377c92e37fe ("powerpc/kexec_file: fix FDT size estimation for kdump kernel") Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20230131030615.729894-1-sourabhjain@linux.ibm.com
2023-01-31mm: memcg: fix NULL pointer in mem_cgroup_track_foreign_dirty_slowpath()Kefeng Wang
As commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"), hwpoison will forcibly uncharg a LRU hwpoisoned page, the folio_memcg could be NULl, then, mem_cgroup_track_foreign_dirty_slowpath() could occurs a NULL pointer dereference, let's do not record the foreign writebacks for folio memcg is null in mem_cgroup_track_foreign_dirty() to fix it. Link: https://lkml.kernel.org/r/20230129040945.180629-1-wangkefeng.wang@huawei.com Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: Ma Wupeng <mawupeng1@huawei.com> Tested-by: Miko Larsson <mikoxyzzz@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Ma Wupeng <mawupeng1@huawei.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31Kconfig.debug: fix the help description in SCHED_DEBUGye xingchen
The correct file path for SCHED_DEBUG is /sys/kernel/debug/sched. Link: https://lkml.kernel.org/r/202301291013573466558@zte.com.cn Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31mm/swapfile: add cond_resched() in get_swap_pages()Longlong Xia
The softlockup still occurs in get_swap_pages() under memory pressure. 64 CPU cores, 64GB memory, and 28 zram devices, the disksize of each zram device is 50MB with same priority as si. Use the stress-ng tool to increase memory pressure, causing the system to oom frequently. The plist_for_each_entry_safe() loops in get_swap_pages() could reach tens of thousands of times to find available space (extreme case: cond_resched() is not called in scan_swap_map_slots()). Let's add cond_resched() into get_swap_pages() when failed to find available space to avoid softlockup. Link: https://lkml.kernel.org/r/20230128094757.1060525-1-xialonglong1@huawei.com Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31mm: use stack_depot_early_init for kmemleakZhaoyang Huang
Mirsad report the below error which is caused by stack_depot_init() failure in kvcalloc. Solve this by having stackdepot use stack_depot_early_init(). On 1/4/23 17:08, Mirsad Goran Todorovac wrote: I hate to bring bad news again, but there seems to be a problem with the output of /sys/kernel/debug/kmemleak: [root@pc-mtodorov ~]# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff951c118568b0 (size 16): comm "kworker/u12:2", pid 56, jiffies 4294893952 (age 4356.548s) hex dump (first 16 bytes): 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0....... backtrace: [root@pc-mtodorov ~]# Apparently, backtrace of called functions on the stack is no longer printed with the list of memory leaks. This appeared on Lenovo desktop 10TX000VCR, with AlmaLinux 8.7 and BIOS version M22KT49A (11/10/2022) and 6.2-rc1 and 6.2-rc2 builds. This worked on 6.1 with the same CONFIG_KMEMLEAK=y and MGLRU enabled on a vanilla mainstream kernel from Mr. Torvalds' tree. I don't know if this is deliberate feature for some reason or a bug. Please find attached the config, lshw and kmemleak output. [vbabka@suse.cz: remove stack_depot_init() call] Link: https://lore.kernel.org/all/5272a819-ef74-65ff-be61-4d2d567337de@alu.unizg.hr/ Link: https://lkml.kernel.org/r/1674091345-14799-2-git-send-email-zhaoyang.huang@unisoc.com Fixes: 56a61617dd22 ("mm: use stack_depot for recording kmemleak's backtrace") Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: ke.wang <ke.wang@unisoc.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31Squashfs: fix handling and sanity checking of xattr_ids countPhillip Lougher
A Sysbot [1] corrupted filesystem exposes two flaws in the handling and sanity checking of the xattr_ids count in the filesystem. Both of these flaws cause computation overflow due to incorrect typing. In the corrupted filesystem the xattr_ids value is 4294967071, which stored in a signed variable becomes the negative number -225. Flaw 1 (64-bit systems only): The signed integer xattr_ids variable causes sign extension. This causes variable overflow in the SQUASHFS_XATTR_*(A) macros. The variable is first multiplied by sizeof(struct squashfs_xattr_id) where the type of the sizeof operator is "unsigned long". On a 64-bit system this is 64-bits in size, and causes the negative number to be sign extended and widened to 64-bits and then become unsigned. This produces the very large number 18446744073709548016 or 2^64 - 3600. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows and produces a length of 0 (stored in len). Flaw 2 (32-bit systems only): On a 32-bit system the integer variable is not widened by the unsigned long type of the sizeof operator (32-bits), and the signedness of the variable has no effect due it always being treated as unsigned. The above corrupted xattr_ids value of 4294967071, when multiplied overflows and produces the number 4294963696 or 2^32 - 3400. This number when rounded up by SQUASHFS_METADATA_SIZE - 1 (8191 bytes) and divided by SQUASHFS_METADATA_SIZE overflows again and produces a length of 0. The effect of the 0 length computation: In conjunction with the corrupted xattr_ids field, the filesystem also has a corrupted xattr_table_start value, where it matches the end of filesystem value of 850. This causes the following sanity check code to fail because the incorrectly computed len of 0 matches the incorrect size of the table reported by the superblock (0 bytes). len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids); indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids); /* * The computed size of the index table (len bytes) should exactly * match the table start and end points */ start = table_start + sizeof(*id_table); end = msblk->bytes_used; if (len != (end - start)) return ERR_PTR(-EINVAL); Changing the xattr_ids variable to be "usigned int" fixes the flaw on a 64-bit system. This relies on the fact the computation is widened by the unsigned long type of the sizeof operator. Casting the variable to u64 in the above macro fixes this flaw on a 32-bit system. It also means 64-bit systems do not implicitly rely on the type of the sizeof operator to widen the computation. [1] https://lore.kernel.org/lkml/000000000000cd44f005f1a0f17f@google.com/ Link: https://lkml.kernel.org/r/20230127061842.10965-1-phillip@squashfs.org.uk Fixes: 506220d2ba21 ("squashfs: add more sanity checks in xattr id lookup") Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: <syzbot+082fa4af80a5bb1a9843@syzkaller.appspotmail.com> Cc: Alexey Khoroshilov <khoroshilov@ispras.ru> Cc: Fedor Pchelkin <pchelkin@ispras.ru> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31sh: define RUNTIME_DISCARD_EXITTom Saeger
sh vmlinux fails to link with GNU ld < 2.40 (likely < 2.36) since commit 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv"). This is similar to fixes for powerpc and s390: commit 4b9880dbf3bd ("powerpc/vmlinux.lds: Define RUNTIME_DISCARD_EXIT"). commit a494398bde27 ("s390: define RUNTIME_DISCARD_EXIT to fix link error with GNU ld < 2.36"). $ sh4-linux-gnu-ld --version | head -n1 GNU ld (GNU Binutils for Debian) 2.35.2 $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- microdev_defconfig $ make ARCH=sh CROSS_COMPILE=sh4-linux-gnu- `.exit.text' referenced in section `__bug_table' of crypto/algboss.o: defined in discarded section `.exit.text' of crypto/algboss.o `.exit.text' referenced in section `__bug_table' of drivers/char/hw_random/core.o: defined in discarded section `.exit.text' of drivers/char/hw_random/core.o make[2]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1 make[1]: *** [Makefile:1252: vmlinux] Error 2 arch/sh/kernel/vmlinux.lds.S keeps EXIT_TEXT: /* * .exit.text is discarded at runtime, not link time, to deal with * references from __bug_table */ .exit.text : AT(ADDR(.exit.text)) { EXIT_TEXT } However, EXIT_TEXT is thrown away by DISCARD(include/asm-generic/vmlinux.lds.h) because sh does not define RUNTIME_DISCARD_EXIT. GNU ld 2.40 does not have this issue and builds fine. This corresponds with Masahiro's comments in a494398bde27: "Nathan [Chancellor] also found that binutils commit 21401fc7bf67 ("Duplicate output sections in scripts") cured this issue, so we cannot reproduce it with binutils 2.36+, but it is better to not rely on it." Link: https://lkml.kernel.org/r/9166a8abdc0f979e50377e61780a4bba1dfa2f52.1674518464.git.tom.saeger@oracle.com Fixes: 99cb0d917ffa ("arch: fix broken BuildID for arm64 and riscv") Link: https://lore.kernel.org/all/Y7Jal56f6UBh1abE@dev-arch.thelio-3990X/ Link: https://lore.kernel.org/all/20230123194218.47ssfzhrpnv3xfez@oracle.com/ Signed-off-by: Tom Saeger <tom.saeger@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Dennis Gilmore <dennis@ausil.us> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31highmem: round down the address passed to kunmap_flush_on_unmap()Matthew Wilcox (Oracle)
We already round down the address in kunmap_local_indexed() which is the other implementation of __kunmap_local(). The only implementation of kunmap_flush_on_unmap() is PA-RISC which is expecting a page-aligned address. This may be causing PA-RISC to be flushing the wrong addresses currently. Link: https://lkml.kernel.org/r/20230126200727.1680362-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Fixes: 298fa1ad5571 ("highmem: Provide generic variant of kmap_atomic*") Reviewed-by: Ira Weiny <ira.weiny@intel.com> Cc: "Fabio M. De Francesco" <fmdefrancesco@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Helge Deller <deller@gmx.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: David Sterba <dsterba@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31migrate: hugetlb: check for hugetlb shared PMD in node migrationMike Kravetz
migrate_pages/mempolicy semantics state that CAP_SYS_NICE is required to move pages shared with another process to a different node. page_mapcount > 1 is being used to determine if a hugetlb page is shared. However, a hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. As a result, hugetlb pages shared by multiple processes and mapped with a shared PMD can be moved by a process without CAP_SYS_NICE. To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found consider the page shared. Link: https://lkml.kernel.org/r/20230126222721.222195-3-mike.kravetz@oracle.com Fixes: e2d8cf405525 ("migrate: add hugepage migration code to migrate_pages()") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smapsMike Kravetz
Patch series "Fixes for hugetlb mapcount at most 1 for shared PMDs". This issue of mapcount in hugetlb pages referenced by shared PMDs was discussed in [1]. The following two patches address user visible behavior caused by this issue. [1] https://lore.kernel.org/linux-mm/Y9BF+OCdWnCSilEu@monkey/ This patch (of 2): A hugetlb page will have a mapcount of 1 if mapped by multiple processes via a shared PMD. This is because only the first process increases the map count, and subsequent processes just add the shared PMD page to their page table. page_mapcount is being used to decide if a hugetlb page is shared or private in /proc/PID/smaps. Pages referenced via a shared PMD were incorrectly being counted as private. To fix, check for a shared PMD if mapcount is 1. If a shared PMD is found count the hugetlb page as shared. A new helper to check for a shared PMD is added. [akpm@linux-foundation.org: simplification, per David] [akpm@linux-foundation.org: hugetlb.h: include page_ref.h for page_count()] Link: https://lkml.kernel.org/r/20230126222721.222195-2-mike.kravetz@oracle.com Fixes: 25ee01a2fca0 ("mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Acked-by: Peter Xu <peterx@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: James Houghton <jthoughton@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-31mm/MADV_COLLAPSE: catch !none !huge !bad pmd lookupsZach O'Keefe
In commit 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") we make the following change to find_pmd_or_thp_or_none(): - if (!pmd_present(pmde)) - return SCAN_PMD_NULL; + if (pmd_none(pmde)) + return SCAN_PMD_NONE; This was for-use by MADV_COLLAPSE file/shmem codepaths, where MADV_COLLAPSE might identify a pte-mapped hugepage, only to have khugepaged race-in, free the pte table, and clear the pmd. Such codepaths include: A) If we find a suitably-aligned compound page of order HPAGE_PMD_ORDER already in the pagecache. B) In retract_page_tables(), if we fail to grab mmap_lock for the target mm/address. In these cases, collapse_pte_mapped_thp() really does expect a none (not just !present) pmd, and we want to suitably identify that case separate from the case where no pmd is found, or it's a bad-pmd (of course, many things could happen once we drop mmap_lock, and the pmd could plausibly undergo multiple transitions due to intervening fault, split, etc). Regardless, the code is prepared install a huge-pmd only when the existing pmd entry is either a genuine pte-table-mapping-pmd, or the none-pmd. However, the commit introduces a logical hole; namely, that we've allowed !none- && !huge- && !bad-pmds to be classified as genuine pte-table-mapping-pmds. One such example that could leak through are swap entries. The pmd values aren't checked again before use in pte_offset_map_lock(), which is expecting nothing less than a genuine pte-table-mapping-pmd. We want to put back the !pmd_present() check (below the pmd_none() check), but need to be careful to deal with subtleties in pmd transitions and treatments by various arch. The issue is that __split_huge_pmd_locked() temporarily clears the present bit (or otherwise marks the entry as invalid), but pmd_present() and pmd_trans_huge() still need to return true while the pmd is in this transitory state. For example, x86's pmd_present() also checks the _PAGE_PSE , riscv's version also checks the _PAGE_LEAF bit, and arm64 also checks a PMD_PRESENT_INVALID bit. Covering all 4 cases for x86 (all checks done on the same pmd value): 1) pmd_present() && pmd_trans_huge() All we actually know here is that the PSE bit is set. Either: a) We aren't racing with __split_huge_page(), and PRESENT or PROTNONE is set. => huge-pmd b) We are currently racing with __split_huge_page(). The danger here is that we proceed as-if we have a huge-pmd, but really we are looking at a pte-mapping-pmd. So, what is the risk of this danger? The only relevant path is: madvise_collapse() -> collapse_pte_mapped_thp() Where we might just incorrectly report back "success", when really the memory isn't pmd-backed. This is fine, since split could happen immediately after (actually) successful madvise_collapse(). So, it should be safe to just assume huge-pmd here. 2) pmd_present() && !pmd_trans_huge() Either: a) PSE not set and either PRESENT or PROTNONE is. => pte-table-mapping pmd (or PROT_NONE) b) devmap. This routine can be called immediately after unlocking/locking mmap_lock -- or called with no locks held (see khugepaged_scan_mm_slot()), so previous VMA checks have since been invalidated. 3) !pmd_present() && pmd_trans_huge() Not possible. 4) !pmd_present() && !pmd_trans_huge() Neither PRESENT nor PROTNONE set => not present I've checked all archs that implement pmd_trans_huge() (arm64, riscv, powerpc, longarch, x86, mips, s390) and this logic roughly translates (though devmap treatment is unique to x86 and powerpc, and (3) doesn't necessarily hold in general -- but that doesn't matter since !pmd_present() always takes failure path). Also, add a comment above find_pmd_or_thp_or_none() to help future travelers reason about the validity of the code; namely, the possible mutations that might happen out from under us, depending on how mmap_lock is held (if at all). Link: https://lkml.kernel.org/r/20230125225358.2576151-1-zokeefe@google.com Fixes: 34488399fa08 ("mm/madvise: add file and shmem support to MADV_COLLAPSE") Signed-off-by: Zach O'Keefe <zokeefe@google.com> Reported-by: Hugh Dickins <hughd@google.com> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>