summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-18arm64: drop ranges in definition of ARCH_FORCE_MAX_ORDERMike Rapoport (IBM)
It is not a good idea to change fundamental parameters of core memory management. Having predefined ranges suggests that the values within those ranges are sensible, but one has to *really* understand implications of changing MAX_ORDER before actually amending it and ranges don't help here. Drop ranges in definition of ARCH_FORCE_MAX_ORDER and make its prompt visible only if EXPERT=y Link: https://lkml.kernel.org/r/20230324052233.2654090-3-rppt@kernel.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rich Felker <dalias@libc.org> Cc: "Russell King (Oracle)" <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18arm: reword ARCH_FORCE_MAX_ORDER prompt and help textMike Rapoport (IBM)
Patch series "arch,mm: cleanup Kconfig entries for ARCH_FORCE_MAX_ORDER", v3. Several architectures have ARCH_FORCE_MAX_ORDER in their Kconfig and they all have wrong and misleading prompt and help text for this option. Besides, some define insane limits for possible values of ARCH_FORCE_MAX_ORDER, some carefully define ranges only for a subset of possible configurations, some make this option configurable by users for no good reason. This set updates the prompt and help text everywhere and does its best to update actual definitions of ranges where applicable. kbuild generated a bunch of false positives because it assigns -1 to ARCH_FORCE_MAX_ORDER, hopefully this will be fixed soon. This patch (of 14): The prompt and help text of ARCH_FORCE_MAX_ORDER are not even close to describe this configuration option. Update both to actually describe what this option does. Link: https://lkml.kernel.org/r/20230325060828.2662773-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20230324052233.2654090-1-rppt@kernel.org Link: https://lkml.kernel.org/r/20230324052233.2654090-2-rppt@kernel.org Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Miller <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rich Felker <dalias@libc.org> Cc: "Russell King (Oracle)" <linux@armlinux.org.uk> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18memcg: do not drain charge pcp caches on remote isolated cpusMichal Hocko
Leonardo Bras has noticed that pcp charge cache draining might be disruptive on workloads relying on 'isolated cpus', a feature commonly used on workloads that are sensitive to interruption and context switching such as vRAN and Industrial Control Systems. There are essentially two ways how to approach the issue. We can either allow the pcp cache to be drained on a different rather than a local cpu or avoid remote flushing on isolated cpus. The current pcp charge cache is really optimized for high performance and it always relies to stick with its cpu. That means it only requires local_lock (preempt_disable on !RT) and draining is handed over to pcp WQ to drain locally again. The former solution (remote draining) would require to add an additional locking to prevent local charges from racing with the draining. This adds an atomic operation to otherwise simple arithmetic fast path in the try_charge path. Another concern is that the remote draining can cause a lock contention for the isolated workloads and therefore interfere with it indirectly via user space interfaces. Another option is to avoid draining scheduling on isolated cpus altogether. That means that those remote cpus would keep their charges even after drain_all_stock returns. This is certainly not optimal either but it shouldn't really cause any major problems. In the worst case (many isolated cpus with charges - each of them with MEMCG_CHARGE_BATCH i.e 64 page) the memory consumption of a memcg would be artificially higher than can be immediately used from other cpus. Theoretically a memcg OOM killer could be triggered pre-maturely. Currently it is not really clear whether this is a practical problem though. Tight memcg limit would be really counter productive to cpu isolated workloads pretty much by definition because any memory reclaimed induced by memcg limit could break user space timing expectations as those usually expect execution in the userspace most of the time. Also charges could be left behind on memcg removal. Any future charge on those isolated cpus will drain that pcp cache so this won't be a permanent leak. Considering cons and pros of both approaches this patch is implementing the second option and simply do not schedule remote draining if the target cpu is isolated. This solution is much more simpler. It doesn't add any new locking and it is more more predictable from the user space POV. Should the pre-mature memcg OOM become a real life problem, we can revisit this decision. [akpm@linux-foundation.org: memcontrol.c needs sched/isolation.h] Link: https://lore.kernel.org/oe-kbuild-all/202303180617.7E3aIlHf-lkp@intel.com/ Link: https://lkml.kernel.org/r/20230317134448.11082-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reported-by: Leonardo Bras <leobras@redhat.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18sched/isolation: add cpu_is_isolated() APIFrederic Weisbecker
Patch series "memcg, cpuisol: do not interfere pcp cache charges draining with cpuisol workloads". Leonardo has reported [1] that pcp memcg charge draining can interfere with cpu isolated workloads. The said draining is done from a WQ context with a pcp worker scheduled on each CPU which holds any cached charges for a specific memcg hierarchy. Operation is not really a common operation [2]. It can be triggered from the userspace though so some care is definitely due. Leonardo has tried to address the issue by allowing remote charge draining [3]. This approach requires an additional locking to synchronize pcp caches sync from a remote cpu from local pcp consumers. Even though the proposed lock was per-cpu there is still potential for contention and less predictable behavior. This patchset addresses the issue from a different angle. Rather than dealing with a potential synchronization, cpus which are isolated are simply never scheduled to be drained. This means that a small amount of charges could be laying around and waiting for a later use or they are flushed when a different memcg is charged from the same cpu. More details are in patch 2. The first patch from Frederic is implementing an abstraction to tell whether a specific cpu has been isolated and therefore require a special treatment. This patch (of 2): Provide this new API to check if a CPU has been isolated either through isolcpus= or nohz_full= kernel parameter. It aims at avoiding kernel load deemed to be safely spared on CPUs running sensitive workload that can't bear any disturbance, such as pcp cache draining. Link: https://lkml.kernel.org/r/20230317134448.11082-1-mhocko@kernel.org Link: https://lkml.kernel.org/r/20230317134448.11082-2-mhocko@kernel.org Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Suggested-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Leonardo Bras <leobras@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()Ivan Orlov
Syzkaller reported the following issue: kernel BUG at mm/khugepaged.c:1823! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted 6.2.0-syzkaller-13154-g857f1268a591 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/16/2023 RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline] RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233 Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89 RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093 RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80 RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1 RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3 R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000 R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000 FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693 madvise_vma_behavior mm/madvise.c:1086 [inline] madvise_walk_vmas mm/madvise.c:1260 [inline] do_madvise+0x9e5/0x4680 mm/madvise.c:1439 __do_sys_madvise mm/madvise.c:1452 [inline] __se_sys_madvise mm/madvise.c:1450 [inline] __x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The xas_store() call during page cache scanning can potentially translate 'xas' into the error state (with the reproducer provided by the syzkaller the error code is -ENOMEM). However, there are no further checks after the 'xas_store', and the next call of 'xas_next' at the start of the scanning cycle doesn't increase the xa_index, and the issue occurs. This patch will add the xarray state error checking after the xas_store() and the corresponding result error code. Tested via syzbot. [akpm@linux-foundation.org: update include/trace/events/huge_memory.h's SCAN_STATUS] Link: https://lkml.kernel.org/r/20230329145330.23191-1-ivan.orlov0322@gmail.com Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com> Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com Tested-by: Zach O'Keefe <zokeefe@google.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Himadri Pandya <himadrispandya@gmail.com> Cc: Ivan Orlov <ivan.orlov0322@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Song Liu <songliubraving@fb.com> Cc: Rik van Riel <riel@surriel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18kasan: remove hwasan-kernel-mem-intrinsic-prefix=1 for clang-14Arnd Bergmann
Some unknown -mllvm options (i.e. those starting with the letter "h") don't cause an error to be returned by clang, so the cc-option helper adds the unknown hwasan-kernel-mem-intrinsic-prefix=1 flag to CFLAGS with compilers that are new enough for hwasan but too old for this option. This causes a rather unreadable build failure: fixdep: error opening file: scripts/mod/.empty.o.d: No such file or directory make[4]: *** [/home/arnd/arm-soc/scripts/Makefile.build:252: scripts/mod/empty.o] Error 2 fixdep: error opening file: scripts/mod/.devicetable-offsets.s.d: No such file or directory make[4]: *** [/home/arnd/arm-soc/scripts/Makefile.build:114: scripts/mod/devicetable-offsets.s] Error 2 Add a version check to only allow this option with clang-15, gcc-13 or later versions. Link: https://lkml.kernel.org/r/20230418122350.1646391-1-arnd@kernel.org Fixes: 51287dcb00cc ("kasan: emit different calls for instrumentable memintrinsics") Link: https://lore.kernel.org/all/CANpmjNMwYosrvqh4ogDO8rgn+SeDHM2b-shD21wTypm_6MMe=g@mail.gmail.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Rix <trix@redhat.com> Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18zsmalloc: reset compaction source zspage pointer after putback_zspage()Sergey Senozhatsky
The current implementation of the compaction loop fails to set the source zspage pointer to NULL in all cases, leading to a potential issue where __zs_compact() could use a stale zspage pointer. This pointer could even point to a previously freed zspage, causing unexpected behavior in the putback_zspage() and migrate_write_unlock() functions after returning from the compaction loop. Address the issue by ensuring that the source zspage pointer is always set to NULL when it should be. Link: https://lkml.kernel.org/r/20230417130850.1784777-1-senozhatsky@chromium.org Fixes: 5a845e9f2d66 ("zsmalloc: rework compaction algorithm") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reported-by: Yu Zhao <yuzhao@google.com> Tested-by: Yu Zhao <yuzhao@google.com> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: make arch_has_descending_max_zone_pfns() staticArnd Bergmann
clang produces a build failure on x86 for some randconfig builds after a change that moves around code to mm/mm_init.c: Cannot find symbol for section 2: .text. mm/mm_init.o: failed I have not been able to figure out why this happens, but the __weak annotation on arch_has_descending_max_zone_pfns() is the trigger here. Removing the weak function in favor of an open-coded Kconfig option check avoids the problem and becomes clearer as well as better to optimize by the compiler. [arnd@arndb.de: fix logic bug] Link: https://lkml.kernel.org/r/20230415081904.969049-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20230414080418.110236-1-arnd@kernel.org Fixes: 9420f89db2dd ("mm: move most of core MM initialization to mm/mm_init.c") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: SeongJae Park <sj@kernel.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: kernel test robot <oliver.sang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: avoid passing 0 to __ffs()Kirill A. Shutemov
23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") results in various boot failures (hang) on arm targets Debug messages reveal the reason. ########### MAX_ORDER=10 start=0 __ffs(start)=-1 min()=10 min_t=-1 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ If start==0, __ffs(start) returns 0xfffffff or (as int) -1, which min_t() interprets as such, while min() apparently uses the returned unsigned long value. Obviously a negative order isn't received well by the rest of the code. [akpm@linux-foundation.org: fix comment, per Mike] Link: https://lkml.kernel.org/r/ZDBa7HWZK69dKKzH@kernel.org Link: https://lkml.kernel.org/r/20230406072529.vupqyrzqnhyozeyh@box.shutemov.name Fixes: 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") Signed-off-by: "Kirill A. Shutemov" <kirill@shutemov.name> Reported-by: Guenter Roeck <linux@roeck-us.net> Link: https://lkml.kernel.org/r/9460377a-38aa-4f39-ad57-fb73725f92db@roeck-us.net Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18RISC-V: Add hwprobe vDSO function and dataEvan Green
Add a vDSO function __vdso_riscv_hwprobe, which can sit in front of the riscv_hwprobe syscall and answer common queries. We stash a copy of static answers for the "all CPUs" case in the vDSO data page. This data is private to the vDSO, so we can decide later to change what's stored there or under what conditions we defer to the syscall. Currently all data can be discovered at boot, so the vDSO function answers all queries when the cpumask is set to the "all CPUs" hint. There's also a boolean in the data that lets the vDSO function know that all CPUs are the same. In that case, the vDSO will also answer queries for arbitrary CPU masks in addition to the "all CPUs" hint. Signed-off-by: Evan Green <evan@rivosinc.com> Link: https://lore.kernel.org/r/20230407231103.2622178-7-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18selftests: Test the new RISC-V hwprobe interfaceEvan Green
This adds a test for the recently added RISC-V interface for probing hardware capabilities. It happens to be the first selftest we have for RISC-V, so I've added some infrastructure for those as well. Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Evan Green <evan@rivosinc.com> Link: https://lore.kernel.org/r/20230407231103.2622178-6-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18RISC-V: hwprobe: Support probing of misaligned access performanceEvan Green
This allows userspace to select various routines to use based on the performance of misaligned access on the target hardware. Rather than adding DT bindings, this change taps into the alternatives mechanism used to probe CPU errata. Add a new function pointer alongside the vendor-specific errata_patch_func() that probes for desirable errata (otherwise known as "features"). Unlike the errata_patch_func(), this function is called on each CPU as it comes up, so it can save feature information per-CPU. The T-head C906 has fast unaligned access, both as defined by GCC [1], and in performing a basic benchmark, which determined that byte copies are >50% slower than a misaligned word copy of the same data size (source for this test at [2]): bytecopy size f000 count 50000 offset 0 took 31664899 us wordcopy size f000 count 50000 offset 0 took 5180919 us wordcopy size f000 count 50000 offset 1 took 13416949 us [1] https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.cc#L353 [2] https://pastebin.com/EPXvDHSW Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com> Link: https://lore.kernel.org/r/20230407231103.2622178-5-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMAEvan Green
We have an implicit set of base behaviors that userspace depends on, which are mostly defined in various ISA specifications. Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com> Link: https://lore.kernel.org/r/20230407231103.2622178-4-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18RISC-V: Add a syscall for HW probingEvan Green
We don't have enough space for these all in ELF_HWCAP{,2} and there's no system call that quite does this, so let's just provide an arch-specific one to probe for hardware capabilities. This currently just provides m{arch,imp,vendor}id, but with the key-value pairs we can pass more in the future. Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com> Link: https://lore.kernel.org/r/20230407231103.2622178-3-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18RISC-V: Move struct riscv_cpuinfo to new headerEvan Green
In preparation for tracking and exposing microarchitectural details to userspace (like whether or not unaligned accesses are fast), move the riscv_cpuinfo struct out to its own new cpufeatures.h header. It will need to be used by more than just cpu.c. Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu> Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com> Link: https://lore.kernel.org/r/20230407231103.2622178-2-evan@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18sync mm-stable with mm-hotfixes-stable to pick up depended-upon upstream changesAndrew Morton
2023-04-18nilfs2: initialize unused bytes in segment summary blocksRyusuke Konishi
Syzbot still reports uninit-value in nilfs_add_checksums_on_logs() for KMSAN enabled kernels after applying commit 7397031622e0 ("nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field"). This is because the unused bytes at the end of each block in segment summaries are not initialized. So this fixes the issue by padding the unused bytes with null bytes. Link: https://lkml.kernel.org/r/20230417173513.12598-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b Cc: Alexander Potapenko <glider@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: page_alloc: skip regions with hugetlbfs pages when allocating 1G pagesMel Gorman
A bug was reported by Yuanxi Liu where allocating 1G pages at runtime is taking an excessive amount of time for large amounts of memory. Further testing allocating huge pages that the cost is linear i.e. if allocating 1G pages in batches of 10 then the time to allocate nr_hugepages from 10->20->30->etc increases linearly even though 10 pages are allocated at each step. Profiles indicated that much of the time is spent checking the validity within already existing huge pages and then attempting a migration that fails after isolating the range, draining pages and a whole lot of other useless work. Commit eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig") removed two checks, one which ignored huge pages for contiguous allocations as huge pages can sometimes migrate. While there may be value on migrating a 2M page to satisfy a 1G allocation, it's potentially expensive if the 1G allocation fails and it's pointless to try moving a 1G page for a new 1G allocation or scan the tail pages for valid PFNs. Reintroduce the PageHuge check and assume any contiguous region with hugetlbfs pages is unsuitable for a new 1G allocation. The hpagealloc test allocates huge pages in batches and reports the average latency per page over time. This test happens just after boot when fragmentation is not an issue. Units are in milliseconds. hpagealloc 6.3.0-rc6 6.3.0-rc6 6.3.0-rc6 vanilla hugeallocrevert-v1r1 hugeallocsimple-v1r2 Min Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%) 1st-qrtle Latency 356.61 ( 0.00%) 5.34 ( 98.50%) 19.85 ( 94.43%) 2nd-qrtle Latency 697.26 ( 0.00%) 5.47 ( 99.22%) 20.44 ( 97.07%) 3rd-qrtle Latency 972.94 ( 0.00%) 5.50 ( 99.43%) 20.81 ( 97.86%) Max-1 Latency 26.42 ( 0.00%) 5.07 ( 80.82%) 18.94 ( 28.30%) Max-5 Latency 82.14 ( 0.00%) 5.11 ( 93.78%) 19.31 ( 76.49%) Max-10 Latency 150.54 ( 0.00%) 5.20 ( 96.55%) 19.43 ( 87.09%) Max-90 Latency 1164.45 ( 0.00%) 5.53 ( 99.52%) 20.97 ( 98.20%) Max-95 Latency 1223.06 ( 0.00%) 5.55 ( 99.55%) 21.06 ( 98.28%) Max-99 Latency 1278.67 ( 0.00%) 5.57 ( 99.56%) 22.56 ( 98.24%) Max Latency 1310.90 ( 0.00%) 8.06 ( 99.39%) 26.62 ( 97.97%) Amean Latency 678.36 ( 0.00%) 5.44 * 99.20%* 20.44 * 96.99%* 6.3.0-rc6 6.3.0-rc6 6.3.0-rc6 vanilla revert-v1 hugeallocfix-v2 Duration User 0.28 0.27 0.30 Duration System 808.66 17.77 35.99 Duration Elapsed 830.87 18.08 36.33 The vanilla kernel is poor, taking up to 1.3 second to allocate a huge page and almost 10 minutes in total to run the test. Reverting the problematic commit reduces it to 8ms at worst and the patch takes 26ms. This patch fixes the main issue with skipping huge pages but leaves the page_count() out because a page with an elevated count potentially can migrate. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=217022 Link: https://lkml.kernel.org/r/20230414141429.pwgieuwluxwez3rj@techsingularity.net Fixes: eb14d4eefdc4 ("mm,page_alloc: drop unnecessary checks from pfn_range_valid_contig") Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reported-by: Yuanxi Liu <y.liu@naruida.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/mmap: regression fix for unmapped_area{_topdown}Liam R. Howlett
The maple tree limits the gap returned to a window that specifically fits what was asked. This may not be optimal in the case of switching search directions or a gap that does not satisfy the requested space for other reasons. Fix the search by retrying the operation and limiting the search window in the rare occasion that a conflict occurs. Link: https://lkml.kernel.org/r/20230414185919.4175572-1-Liam.Howlett@oracle.com Fixes: 3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18maple_tree: fix mas_empty_area() searchLiam R. Howlett
The internal function of mas_awalk() was incorrectly skipping the last entry in a node, which could potentially be NULL. This is only a problem for the left-most node in the tree - otherwise that NULL would not exist. Fix mas_awalk() by using the metadata to obtain the end of the node for the loop and the logical pivot as apposed to the raw pivot value. Link: https://lkml.kernel.org/r/20230414145728.4067069-2-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18maple_tree: make maple state reusable after mas_empty_area_rev()Liam R. Howlett
Stop using maple state min/max for the range by passing through pointers for those values. This will allow the maple state to be reused without resetting. Also add some logic to fail out early on searching with invalid arguments. Link: https://lkml.kernel.org/r/20230414145728.4067069-1-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: kmsan: handle alloc failures in kmsan_ioremap_page_range()Alexander Potapenko
Similarly to kmsan_vmap_pages_range_noflush(), kmsan_ioremap_page_range() must also properly handle allocation/mapping failures. In the case of such, it must clean up the already created metadata mappings and return an error code, so that the error can be propagated to ioremap_page_range(). Without doing so, KMSAN may silently fail to bring the metadata for the page range into a consistent state, which will result in user-visible crashes when trying to access them. Link: https://lkml.kernel.org/r/20230413131223.4135168-2-glider@google.com Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush()Alexander Potapenko
As reported by Dipanjan Das, when KMSAN is used together with kernel fault injection (or, generally, even without the latter), calls to kcalloc() or __vmap_pages_range_noflush() may fail, leaving the metadata mappings for the virtual mapping in an inconsistent state. When these metadata mappings are accessed later, the kernel crashes. To address the problem, we return a non-zero error code from kmsan_vmap_pages_range_noflush() in the case of any allocation/mapping failure inside it, and make vmap_pages_range_noflush() return an error if KMSAN fails to allocate the metadata. This patch also removes KMSAN_WARN_ON() from vmap_pages_range_noflush(), as these allocation failures are not fatal anymore. Link: https://lkml.kernel.org/r/20230413131223.4135168-1-glider@google.com Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18tools/Makefile: do missed s/vm/mm/SeongJae Park
Commit 799fb82aa132 ("tools/vm: rename tools/vm to tools/mm") missed renaming 'vm' in 'tools/Makefile' to 'mm'. As a result, 'make clean' under 'tools/' directory fails as below: $ make -C tools clean DESCEND vm make[1]: Entering directory '/linux/tools/vm' make[1]: *** No rule to make target 'clean'. Stop. make[1]: Leaving directory '/linux/tools/vm' make: *** [Makefile:173: vm_clean] Error 2 make: Leaving directory '/linux/tools' Do the missed rename. Link: https://lkml.kernel.org/r/20230415203110.13858-1-sj@kernel.org Fixes: 799fb82aa132 ("tools/vm: rename tools/vm to tools/mm") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Ricardo Pardini <ricardo@pardini.net> Link: https://lore.kernel.org/linux-mm/20230415202454.13558-1-sj@kernel.org/ Tested-by: Ricardo Pardini <ricardo@pardini.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: fix memory leak on mm_init error handlingMathieu Desnoyers
commit f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter") introduces a memory leak by missing a call to destroy_context() when a percpu_counter fails to allocate. Before introducing the per-cpu counter allocations, init_new_context() was the last call that could fail in mm_init(), and thus there was no need to ever invoke destroy_context() in the error paths. Adding the following percpu counter allocations adds error paths after init_new_context(), which means its associated destroy_context() needs to be called when percpu counters fail to allocate. Link: https://lkml.kernel.org/r/20230330133822.66271-1-mathieu.desnoyers@efficios.com Fixes: f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Shakeel Butt <shakeelb@google.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlockTetsuo Handa
syzbot is reporting circular locking dependency which involves zonelist_update_seq seqlock [1], for this lock is checked by memory allocation requests which do not need to be retried. One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler. CPU0 ---- __build_all_zonelists() { write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd // e.g. timer interrupt handler runs at this moment some_timer_func() { kmalloc(GFP_ATOMIC) { __alloc_pages_slowpath() { read_seqbegin(&zonelist_update_seq) { // spins forever because zonelist_update_seq.seqcount is odd } } } } // e.g. timer interrupt handler finishes write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even } This deadlock scenario can be easily eliminated by not calling read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation requests. But Michal Hocko does not know whether we should go with this approach. Another deadlock scenario which syzbot is reporting is a race between kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with port->lock held and printk() from __build_all_zonelists() with zonelist_update_seq held. CPU0 CPU1 ---- ---- pty_write() { tty_insert_flip_string_and_push_buffer() { __build_all_zonelists() { write_seqlock(&zonelist_update_seq); build_zonelists() { printk() { vprintk() { vprintk_default() { vprintk_emit() { console_unlock() { console_flush_all() { console_emit_next_record() { con->write() = serial8250_console_write() { spin_lock_irqsave(&port->lock, flags); tty_insert_flip_string() { tty_insert_flip_string_fixed_flag() { __tty_buffer_request_room() { tty_buffer_alloc() { kmalloc(GFP_ATOMIC | __GFP_NOWARN) { __alloc_pages_slowpath() { zonelist_iter_begin() { read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held } } } } } } } } spin_unlock_irqrestore(&port->lock, flags); // message is printed to console spin_unlock_irqrestore(&port->lock, flags); } } } } } } } } } write_sequnlock(&zonelist_update_seq); } } } This deadlock scenario can be eliminated by preventing interrupt context from calling kmalloc(GFP_ATOMIC) and preventing printk() from calling console_flush_all() while zonelist_update_seq.seqcount is odd. Since Petr Mladek thinks that __build_all_zonelists() can become a candidate for deferring printk() [2], let's address this problem by disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC) and disabling synchronous printk() in order to avoid console_flush_all() . As a side effect of minimizing duration of zonelist_update_seq.seqcount being odd by disabling synchronous printk(), latency at read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and __GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e. do not record unnecessary locking dependency) from interrupt context is still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq) section... Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2] Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Petr Mladek <pmladek@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Patrick Daly <quic_pdaly@quicinc.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()Ondrej Mosnacek
Linux Security Modules (LSMs) that implement the "capable" hook will usually emit an access denial message to the audit log whenever they "block" the current task from using the given capability based on their security policy. The occurrence of a denial is used as an indication that the given task has attempted an operation that requires the given access permission, so the callers of functions that perform LSM permission checks must take care to avoid calling them too early (before it is decided if the permission is actually needed to perform the requested operation). The __sys_setres[ug]id() functions violate this convention by first calling ns_capable_setid() and only then checking if the operation requires the capability or not. It means that any caller that has the capability granted by DAC (task's capability set) but not by MAC (LSMs) will generate a "denied" audit record, even if is doing an operation for which the capability is not required. Fix this by reordering the checks such that ns_capable_setid() is checked last and -EPERM is returned immediately if it returns false. While there, also do two small optimizations: * move the capability check before prepare_creds() and * bail out early in case of a no-op. Link: https://lkml.kernel.org/r/20230217162154.837549-1-omosnace@redhat.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18drm/amd/display: fix a divided-by-zero errorAlex Hung
[Why & How] timing.dsc_cfg.num_slices_v can be zero and it is necessary to check before using it. This fixes the error "divide error: 0000 [#1] PREEMPT SMP NOPTI". Reviewed-by: Aurabindo Pillai <Aurabindo.Pillai@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-04-18drm/amd/display: limit timing for single dimm memoryDaniel Miess
[Why] 1. It could hit bandwidth limitdation under single dimm memory when connecting 8K external monitor. 2. IsSupportedVidPn got validation failed with 2K240Hz eDP + 8K24Hz external monitor. 3. It's better to filter out such combination in EnumVidPnCofuncModality 4. For short term, filter out in dc bandwidth validation. [How] Force 2K@240Hz+8K@24Hz timing validation false in dc. Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Daniel Miess <Daniel.Miess@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-04-18drm/amd/display: set dcn315 lb bpp to 48Dmytro Laktyushkin
[Why & How] Fix a typo for dcn315 line buffer bpp. Reviewed-by: Jun Lei <Jun.Lei@amd.com> Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com> Signed-off-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-04-18drm/amdgpu: Fix desktop freezed after gpu-resetAlan Liu
[Why] After gpu-reset, sometimes the driver fails to enable vblank irq, causing flip_done timed out and the desktop freezed. During gpu-reset, we disable and enable vblank irq in dm_suspend() and dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we check irqs' refcount and decide to enable or disable the irqs again. However, we have 2 sets of API for controling vblank irq, one is dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has its own refcount and flag to store the state of vblank irq, and they are not synchronized. In drm we use the first API to control vblank irq but in amdgpu_irq_gpu_reset_resume_helper() we use the second set of API. The failure happens when vblank irq was enabled by dm_vblank_get() before gpu-reset, we have vblank->enabled true. However, during gpu-reset, in amdgpu_irq_gpu_reset_resume_helper() vblank irq's state checked from amdgpu_irq_update() is DISABLED. So finally it disables vblank irq again. After gpu-reset, if there is a cursor plane commit, the driver will try to enable vblank irq by calling drm_vblank_enable(), but the vblank->enabled is still true, so it fails to turn on vblank irq and causes flip_done can't be completed in vblank irq handler and desktop become freezed. [How] Combining the 2 vblank control APIs by letting drm's API finally calls amdgpu_irq's API, so the irq's refcount and state of both APIs can be synchronized. Also add a check to prevent refcount from being less then 0 in amdgpu_irq_put(). v2: - Add warning in amdgpu_irq_enable() if the irq is already disabled. - Call dc_interrupt_set() in dm_set_vblank() to avoid refcount change if it is in gpu-reset. v3: - Improve commit message and code comments. Signed-off-by: Alan Liu <HaoPing.Liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-04-18PCI: Use of_property_present() for testing DT property presenceRob Herring
It is preferred to use typed property access functions (i.e. of_property_read_<type> functions) rather than low-level of_get_property()/of_find_property() functions for reading properties. As part of this, convert of_get_property()/of_find_property() calls to the recently added of_property_present() helper when we just want to test for presence of a property and nothing more. Link: https://lore.kernel.org/r/20230310144719.1544443-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> # pcie-mediatek
2023-04-18drm/amd/display: set variable dccg314_init storage-class-specifier to staticTom Rix
smatch reports drivers/gpu/drm/amd/amdgpu/../display/dc/dcn314/dcn314_dccg.c:277:6: warning: symbol 'dccg314_init' was not declared. Should it be static? This variable is only used in one file so should be static. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amd/display: Use pointer in the memcpyRodrigo Siqueira
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amd/display: Remove wrong assignment of DP link rateRodrigo Siqueira
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amd/display: Set dp_rate to dm_dp_rate_na by defaultRodrigo Siqueira
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amd/display: Set maximum VStartup if is DCN201Rodrigo Siqueira
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amd/display: Adjust code identation and other minor detailsRodrigo Siqueira
This commit replaces spaces with tabs in multiple functions and adjusts the indentation in some other parts of the code to improve readability. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amd/display: Add missing mclk updateRodrigo Siqueira
When using FPO, there is some misconfiguration that happens for the lack of configuration of the MCLK switch in some circumstances. This commit adds the required field update when using the MCLK switch. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amd/display: Update bouding box values for DCN32Rodrigo Siqueira
All clock values came from firmware, but bounding box values can be helpful in some debug situations. This commit updates some of the values associated with clock speed and memory channels. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu: release gpu full access after "amdgpu_device_ip_late_init"Chong Li
[WHY] Function "amdgpu_irq_update()" called by "amdgpu_device_ip_late_init()" is an atomic context. We shouldn't access registers through KIQ since "msleep()" may be called in "amdgpu_kiq_rreg()". [HOW] Move function "amdgpu_virt_release_full_gpu()" after function "amdgpu_device_ip_late_init()", to ensure that registers be accessed through RLCG instead of KIQ. Call Trace: <TASK> show_stack+0x52/0x69 dump_stack_lvl+0x49/0x6d dump_stack+0x10/0x18 __schedule_bug.cold+0x4f/0x6b __schedule+0x473/0x5d0 ? __wake_up_klogd.part.0+0x40/0x70 ? vprintk_emit+0xbe/0x1f0 schedule+0x68/0x110 schedule_timeout+0x87/0x160 ? timer_migration_handler+0xa0/0xa0 msleep+0x2d/0x50 amdgpu_kiq_rreg+0x18d/0x1f0 [amdgpu] amdgpu_device_rreg.part.0+0x59/0xd0 [amdgpu] amdgpu_device_rreg+0x3a/0x50 [amdgpu] amdgpu_sriov_rreg+0x3c/0xb0 [amdgpu] gfx_v10_0_set_gfx_eop_interrupt_state.constprop.0+0x16c/0x190 [amdgpu] gfx_v10_0_set_eop_interrupt_state+0xa5/0xb0 [amdgpu] amdgpu_irq_update+0x53/0x80 [amdgpu] amdgpu_irq_get+0x7c/0xb0 [amdgpu] amdgpu_fence_driver_hw_init+0x58/0x90 [amdgpu] amdgpu_device_init.cold+0x16b7/0x2022 [amdgpu] Signed-off-by: Chong Li <chongli2@amd.com> Reviewed-by: JingWen.Chen2@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amd/pm: change pmfw_decoded_link_width, speed variables to globalsTom Rix
gcc with W=1 reports In file included from drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu13/smu_v13_0.c:36: ./drivers/gpu/drm/amd/amdgpu/../pm/swsmu/inc/smu_v13_0.h:66:18: error: ‘pmfw_decoded_link_width’ defined but not used [-Werror=unused-const-variable=] 66 | static const int pmfw_decoded_link_width[7] = {0, 1, 2, 4, 8, 12, 16}; | ^~~~~~~~~~~~~~~~~~~~~~~ ./drivers/gpu/drm/amd/amdgpu/../pm/swsmu/inc/smu_v13_0.h:65:18: error: ‘pmfw_decoded_link_speed’ defined but not used [-Werror=unused-const-variable=] 65 | static const int pmfw_decoded_link_speed[5] = {1, 2, 3, 4, 5}; | ^~~~~~~~~~~~~~~~~~~~~~~ These variables are defined and used in smu_v13_0_7_ppt.c and smu_v13_0_0_ppt.c. There should be only one definition. So define the variables as globals in smu_v13_0.c Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu/vcn: fix mmsch ctx table sizeJane Jian
add jpeg table size to ctx table size rather than override it Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: JingWen Chen <JingWen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18Merge tag 'v6.4-rockchip-clk1' of ↵Stephen Boyd
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-rockchip Pull a couple Rockchip clk driver updates from Heiko Stübner: Reparenting fix for the clock supplying camera modules on the rk3399 and more critical (bus-)clocks on the rk3588. * tag 'v6.4-rockchip-clk1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: clk: rockchip: rk3588: make gate linked clocks critical clk: rockchip: rk3399: allow clk_cifout to force clk_cifout_src to reparent
2023-04-18Merge branch 'Provide bpf_for() and bpf_for_each() by libbpf'Alexei Starovoitov
Andrii Nakryiko says: ==================== This patch set moves bpf_for(), bpf_for_each(), and bpf_repeat() macros from selftests-internal bpf_misc.h header to libbpf-provided bpf_helpers.h header. To do this in a way to allow users to feature-detect and guard such bpf_for()/bpf_for_each() uses on old kernels we also extend libbpf to improve unresolved kfunc calls handling and reporting. This lets us mark bpf_iter_num_{new,next,destroy}() declarations as __weak, and thus not fail program loading outright if such kfuncs are missing on the host kernel. Patches #1 and #2 do some simple clean ups and logging improvements. Patch #3 adds kfunc call poisoning and log fixup logic and is the hear of this patch set, effectively. Patch #4 adds selftest for this logic. Patches #4 and #5 move bpf_for()/bpf_for_each()/bpf_repeat() into bpf_helpers.h header and mark kfuncs as __weak to allow users to feature-detect and guard their uses. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-18libbpf: mark bpf_iter_num_{new,next,destroy} as __weakAndrii Nakryiko
Mark bpf_iter_num_{new,next,destroy}() kfuncs declared for bpf_for()/bpf_repeat() macros as __weak to allow users to feature-detect their presence and guard bpf_for()/bpf_repeat() loops accordingly for backwards compatibility with old kernels. Now that libbpf supports kfunc calls poisoning and better reporting of unresolved (but called) kfuncs, declaring number iterator kfuncs in bpf_helpers.h won't degrade user experience and won't cause unnecessary kernel feature dependencies. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230418002148.3255690-7-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-18libbpf: move bpf_for(), bpf_for_each(), and bpf_repeat() into bpf_helpers.hAndrii Nakryiko
To make it easier for bleeding-edge BPF applications, such as sched_ext, to utilize open-coded iterators, move bpf_for(), bpf_for_each(), and bpf_repeat() macros from selftests/bpf-internal bpf_misc.h helper, to libbpf-provided bpf_helpers.h header. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230418002148.3255690-6-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-18selftests/bpf: add missing __weak kfunc log fixup testAndrii Nakryiko
Add test validating that libbpf correctly poisons and reports __weak unresolved kfuncs in post-processed verifier log. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230418002148.3255690-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-18libbpf: improve handling of unresolved kfuncsAndrii Nakryiko
Currently, libbpf leaves `call #0` instruction for __weak unresolved kfuncs, which might lead to a confusing verifier log situations, where invalid `call #0` will be treated as successfully validated. We can do better. Libbpf already has an established mechanism of poisoning instructions that failed some form of resolution (e.g., CO-RE relocation and BPF map set to not be auto-created). Libbpf doesn't fail them outright to allow users to guard them through other means, and as long as BPF verifier can prove that such poisoned instructions cannot be ever reached, this doesn't consistute an invalid BPF program. If user didn't guard such code, libbpf will extract few pieces of information to tie such poisoned instructions back to additional information about what entitity wasn't resolved (e.g., BPF map name, or CO-RE relocation information). __weak unresolved kfuncs fit this model well, so this patch extends libbpf with poisioning and log fixup logic for kfunc calls. Note, this poisoning is done only for kfunc *calls*, not kfunc address resolution (ldimm64 instructions). The former cannot be ever valid, if reached, so it's safe to poison them. The latter is a valid mechanism to check if __weak kfunc ksym was resolved, and do necessary guarding and work arounds based on this result, supported in most recent kernels. As such, libbpf keeps such ldimm64 instructions as loading zero, never poisoning them. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230418002148.3255690-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-18libbpf: report vmlinux vs module name when dealing with ksymsAndrii Nakryiko
Currently libbpf always reports "kernel" as a source of ksym BTF type, which is ambiguous given ksym's BTF can come from either vmlinux or kernel module BTFs. Make this explicit and log module name, if used BTF is from kernel module. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230418002148.3255690-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>