summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-18mm: ksm: support hwpoison for ksm pageLonglong Xia
hwpoison_user_mappings() is updated to support ksm pages, and add collect_procs_ksm() to collect processes when the error hit an ksm page. The difference from collect_procs_anon() is that it also needs to traverse the rmap-item list on the stable node of the ksm page. At the same time, add_to_kill_ksm() is added to handle ksm pages. And task_in_to_kill_list() is added to avoid duplicate addition of tsk to the to_kill list. This is because when scanning the list, if the pages that make up the ksm page all come from the same process, they may be added repeatedly. Link: https://lkml.kernel.org/r/20230414021741.2597273-3-xialonglong1@huawei.com Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: memory-failure: refactor add_to_kill()Longlong Xia
Patch series "mm: ksm: support hwpoison for ksm page", v2. Currently, ksm does not support hwpoison. As ksm is being used more widely for deduplication at the system level, container level, and process level, supporting hwpoison for ksm has become increasingly important. However, ksm pages were not processed by hwpoison in 2009 [1]. The main method of implementation: 1. Refactor add_to_kill() and add new add_to_kill_*() to better accommodate the handling of different types of pages. 2. Add collect_procs_ksm() to collect processes when the error hit an ksm page. 3. Add task_in_to_kill_list() to avoid duplicate addition of tsk to the to_kill list. 4. Try_to_unmap ksm page (already supported). 5. Handle related processes such as sending SIGBUS. Tested with poisoning to ksm page from 1) different process 2) one process and with/without memory_failure_early_kill set, the processes are killed as expected with the patchset. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?h=01e00f880ca700376e1845cf7a2524ebe68e47d6 This patch (of 2): The page_address_in_vma() is used to find the user virtual address of page in add_to_kill(), but it doesn't support ksm due to the ksm page->index unusable, add an ksm_addr as parameter to add_to_kill(), let's the caller to pass it, also rename the function to __add_to_kill(), and adding add_to_kill_anon_file() for handling anonymous pages and file pages, adding add_to_kill_fsdax() for handling fsdax pages. Link: https://lkml.kernel.org/r/20230414021741.2597273-1-xialonglong1@huawei.com Link: https://lkml.kernel.org/r/20230414021741.2597273-2-xialonglong1@huawei.com Signed-off-by: Longlong Xia <xialonglong1@huawei.com> Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Nanyong Sun <sunnanyong@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/memfd: fix test_sysctlJeff Xu
sysctl memfd_noexec is pid-namespaced, non-reservable, and inherent to the child process. Move the inherence test from init ns to child ns, so init ns can keep the default value. Link: https://lkml.kernel.org/r/20230414022801.2545257-1-jeffxu@google.com Signed-off-by: Jeff Xu <jeffxu@google.com> Reported-by: kernel test robot <yujie.liu@intel.com> Link: https://lore.kernel.org/oe-lkp/202303312259.441e35db-yujie.liu@intel.com Tested-by: Yujie Liu <yujie.liu@intel.com> Cc: Daniel Verkamp <dverkamp@chromium.org> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jann Horn <jannh@google.com> Cc: Jorge Lucangeli Obes <jorgelo@chromium.org> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: run hugetlb testcases of va switchChaitanya S Prakash
The va_high_addr_switch selftest is used to test mmap across 128TB boundary. It divides the selftest cases into two main categories on the basis of size. One set is used to create mappings that are multiples of PAGE_SIZE while the other creates mappings that are multiples of HUGETLB_SIZE. In order to run the hugetlb testcases the binary must be appended with "--run-hugetlb" but the file that used to run the test only invokes the binary, thereby completely skipping the hugetlb testcases. Hence, the required statement has been added. Link: https://lkml.kernel.org/r/20230323105243.2807166-6-chaitanyas.prakash@arm.com Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: configure nr_hugepages for arm64Chaitanya S Prakash
Arm64 has a default hugepage size of 512MB when CONFIG_ARM64_64K_PAGES=y is enabled. While testing on arm64 platforms having up to 4PB of virtual address space, a minimum of 6 hugepages were required for all test cases to pass. Support for this requirement has been added. Link: https://lkml.kernel.org/r/20230323105243.2807166-5-chaitanyas.prakash@arm.com Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: add platform independent in code commentsChaitanya S Prakash
The in code comments for the selftest were made on the basis of 128TB switch, an architecture feature specific to PowerPc and x86 platforms. Keeping in mind the support added for arm64 platforms which implements a 256TB switch, a more generic explanation has been provided. Link: https://lkml.kernel.org/r/20230323105243.2807166-4-chaitanyas.prakash@arm.com Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: rename va_128TBswitch to va_high_addr_switchChaitanya S Prakash
As the initial selftest only took into consideration PowperPC and x86 architectures, on adding support for arm64, a platform independent naming convention is chosen. Link: https://lkml.kernel.org/r/20230323105243.2807166-3-chaitanyas.prakash@arm.com Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: add support for arm64 platform on va switchChaitanya S Prakash
Patch series "selftests/mm: Implement support for arm64 on va". The va_128TBswitch selftest is designed and implemented for PowerPC and x86 architectures which support a 128TB switch, up to 256TB of virtual address space and hugepage sizes of 16MB and 2MB respectively. Arm64 platforms on the other hand support a 256Tb switch, up to 4PB of virtual address space and a default hugepage size of 512MB when 64k pagesize is enabled. These architectural differences require introducing support for arm64 platforms, after which a more generic naming convention is suggested. The in code comments are amended to provide a more platform independent explanation of the working of the code and nr_hugepages are configured as required. Finally, the file running the testcase is modified in order to prevent skipping of hugetlb testcases of va_high_addr_switch. This patch (of 5): Arm64 platforms have the ability to support 64kb pagesize, 512MB default hugepage size and up to 4PB of virtual address space. The address switch occurs at 256TB as opposed to 128TB. Hence, the necessary support has been added. Link: https://lkml.kernel.org/r/20230323105243.2807166-1-chaitanyas.prakash@arm.com Link: https://lkml.kernel.org/r/20230323105243.2807166-2-chaitanyas.prakash@arm.com Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18dt-bindings: display: simplify compatibles syntaxKrzysztof Kozlowski
Lists (items) with one item should be just const or enum because it is shorter and simpler. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://lore.kernel.org/r/20230414104230.23165-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-18dt-bindings: display: mediatek: simplify compatibles syntaxKrzysztof Kozlowski
Lists (items) with one item should be just enum because it is shorter, simpler and does not confuse, if one wants to add new entry with a fallback. Convert all of them to enums. OTOH, leave unused "oneOf" entries in anticipation of further growth of the entire binding. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20230414083311.12197-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-18dt-bindings: drm/bridge: ti-sn65dsi86: Fix the video-interfaces.yaml referencesFabio Estevam
video-interface.txt does not exist anymore, as it has been converted to video-interfaces.yaml. Instead of referencing video-interfaces.yaml multiple times, pass it as a $ref to the schema. Signed-off-by: Fabio Estevam <festevam@denx.de> Link: https://lore.kernel.org/r/20230412175800.2537812-1-festevam@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-18dt-bindings: timer: Drop unneeded quotesRob Herring
Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20230327170146.4104556-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-18dt-bindings: interrupt-controller: qcom,pdc: document qcom,qdu1000-pdcKrzysztof Kozlowski
Add QDU1000 PDC, already used in upstreamed DTS. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230416102831.105136-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-18checkpatch: introduce proper bindings license checkDmitry Rokosov
All headers from 'include/dt-bindings/' must be verified by checkpatch together with Documentation bindings, because all of them are part of the whole DT bindings system. The requirement is dual licensed and matching patterns: * Schemas: /GPL-2\.0(?:-only)? OR BSD-2-Clause/ * Headers: /GPL-2\.0(?:-only)? OR \S+/ Above patterns suggested by Rob at: https://lore.kernel.org/all/CAL_Jsq+-YJsBO+LuPJ=ZQ=eb-monrwzuCppvReH+af7hYZzNaQ@mail.gmail.com The issue was found during patch review: https://lore.kernel.org/all/20230313201259.19998-4-ddrokosov@sberdevices.ru/ Link: https://lkml.kernel.org/r/20230404191715.7319-1-ddrokosov@sberdevices.ru Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru> Reviewed-by: Rob Herring <robh@kernel.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18epoll: rename global epmutexDavidlohr Bueso
As of 4f04cbaf128 ("epoll: use refcount to reduce ep_mutex contention"), this lock is now specific to nesting cases - inserting an epoll fd onto another epoll fd. Rename the lock to be less generic. Link: https://lkml.kernel.org/r/20230411234159.20421-1-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry()Glenn Washburn
$lx_dentry_name() generates a full VFS path from a given dentry pointer, and $lx_i_dentry() returns the dentry pointer associated with the given inode pointer, if there is one. Link: https://lkml.kernel.org/r/c9a5ad8efbfbd2cc6559e082734eed7628f43a16.1677631565.git.development@efficientek.com Signed-off-by: Glenn Washburn <development@efficientek.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Antonio Borneo <antonio.borneo@foss.st.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: create linux/vfs.py for VFS related GDB helpersGlenn Washburn
Patch series "GDB VFS utils". I've created a couple GDB convenience functions that I found useful when debugging some VFS issues and figure others might find them useful. For instance, they are useful in setting conditional breakpoints on VFS functions where you only care if the dentry path is a certain value. I took the opportunity to create a new "vfs" python module to give VFS related utilities a home. This patch (of 2): This will allow for more VFS specific GDB helpers to be collected in one place. Move utils.dentry_name into the vfs modules. Also a local variable in proc.py was changed from vfs to mnt to prevent a naming collision with the new vfs module. [akpm@linux-foundation.org: add SPDX-License-Identifier] Link: https://lkml.kernel.org/r/cover.1677631565.git.development@efficientek.com Link: https://lkml.kernel.org/r/7bba4c065a8c2c47f1fc5b03a7278005b04db251.1677631565.git.development@efficientek.com Signed-off-by: Glenn Washburn <development@efficientek.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Antonio Borneo <antonio.borneo@foss.st.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18uapi/linux/const.h: prefer ISO-friendly __typeof__Kevin Brodsky
typeof is (still) a GNU extension, which means that it cannot be used when building ISO C (e.g. -std=c99). It should therefore be avoided in uapi headers in favour of the ISO-friendly __typeof__. Unfortunately this issue could not be detected by CONFIG_UAPI_HEADER_TEST=y as the __ALIGN_KERNEL() macro is not expanded in any uapi header. This matters from a userspace perspective, not a kernel one. uapi headers and their contents are expected to be usable in a variety of situations, and in particular when building ISO C applications (with -std=c99 or similar). This particular problem can be reproduced by trying to use the __ALIGN_KERNEL macro directly in application code, say: #include <linux/const.h> int align(int x, int a) { return __KERNEL_ALIGN(x, a); } and trying to build that with -std=c99. Link: https://lkml.kernel.org/r/20230411092747.3759032-1-kevin.brodsky@arm.com Fixes: a79ff731a1b2 ("netfilter: xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()") Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reported-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com> Tested-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Tested-by: Petr Vorel <pvorel@suse.cz> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18delayacct: track delays from IRQ/SOFTIRQYang Yang
Delay accounting does not track the delay of IRQ/SOFTIRQ. While IRQ/SOFTIRQ could have obvious impact on some workloads productivity, such as when workloads are running on system which is busy handling network IRQ/SOFTIRQ. Get the delay of IRQ/SOFTIRQ could help users to reduce such delay. Such as setting interrupt affinity or task affinity, using kernel thread for NAPI etc. This is inspired by "sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure"[1]. Also fix some code indent problems of older code. And update tools/accounting/getdelays.c: / # ./getdelays -p 156 -di print delayacct stats ON printing IO accounting PID 156 CPU count real total virtual total delay total delay average 15 15836008 16218149 275700790 18.380ms IO count delay total delay average 0 0 0.000ms SWAP count delay total delay average 0 0 0.000ms RECLAIM count delay total delay average 0 0 0.000ms THRASHING count delay total delay average 0 0 0.000ms COMPACT count delay total delay average 0 0 0.000ms WPCOPY count delay total delay average 36 7586118 0.211ms IRQ count delay total delay average 42 929161 0.022ms [1] commit 52b1364ba0b1("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure") Link: https://lkml.kernel.org/r/202304081728353557233@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Jiang Xuexin <jiang.xuexin@zte.com.cn> Cc: wangyong <wang.yong12@zte.com.cn> Cc: junhua huang <huang.junhua@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: timerlist: convert int chunks to strAmjad Ouled-Ameur
join() expects strings but integers are given. Convert chunks list to strings before passing it to join() Link: https://lkml.kernel.org/r/20230406221217.1585486-4-f.fainelli@gmail.com Signed-off-by: Amjad Ouled-Ameur <aouledameur@baylibre.com> Signed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: print interruptsFlorian Fainelli
This GDB script prints the interrupts in the system in the same way that /proc/interrupts does. This does include the architecture specific part done by arch_show_interrupts() for x86, ARM, ARM64 and MIPS. Example output from an ARM64 system: (gdb) lx-interruptlist CPU0 CPU1 CPU2 CPU3 10: 3167 1225 1276 2629 GICv2 30 Level arch_timer 13: 0 0 0 0 GICv2 36 Level arm-pmu 14: 0 0 0 0 GICv2 37 Level arm-pmu 15: 0 0 0 0 GICv2 38 Level arm-pmu 16: 0 0 0 0 GICv2 39 Level arm-pmu 28: 0 0 0 0 interrupt-controller@8410640 5 Edge brcmstb-gpio-wake 30: 125 0 0 0 GICv2 128 Level ttyS0 31: 0 0 0 0 interrupt-controller@8416000 0 Level mspi_done 32: 0 0 0 0 interrupt-controller@8410640 3 Edge brcmstb-waketimer 33: 0 0 0 0 interrupt-controller@8418580 8 Edge brcmstb-waketimer-rtc 34: 872 0 0 0 GICv2 230 Level brcm_scmi@0 35: 0 0 0 0 interrupt-controller@8410640 10 Edge 8d0f200.usb-phy 37: 0 0 0 0 GICv2 97 Level PCIe PME 42: 0 0 0 0 GICv2 145 Level xhci-hcd:usb1 43: 94 0 0 0 GICv2 71 Level mmc1 44: 0 0 0 0 GICv2 70 Level mmc0 IPI0: 23 666 154 98 Rescheduling interrupts IPI1: 247 1053 1701 634 Function call interrupts IPI2: 0 0 0 0 CPU stop interrupts IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts IPI4: 0 0 0 0 Timer broadcast interrupts IPI5: 7 9 5 0 IRQ work interrupts IPI6: 0 0 0 0 CPU wake-up interrupts ERR: 0 Link: https://lkml.kernel.org/r/20230406220451.1583239-1-f.fainelli@gmail.com Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: raise error with reduced debugging informationFlorian Fainelli
If CONFIG_DEBUG_INFO_REDUCED is enabled in the kernel configuration, we will typically not be able to load vmlinux-gdb.py and will fail with: Traceback (most recent call last): File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/vmlinux-gdb.py", line 25, in <module> import linux.utils File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/scripts/gdb/linux/utils.py", line 131, in <module> atomic_long_counter_offset = atomic_long_type.get_type()['counter'].bitpos KeyError: 'counter' Rather be left wondering what is happening only to find out that reduced debug information is the cause, raise an eror. This was not typically a problem until e3c8d33e0d62 ("scripts/gdb: fix 'lx-dmesg' on 32 bits arch") but it has since then. Link: https://lkml.kernel.org/r/20230406215252.1580538-1-f.fainelli@gmail.com Fixes: e3c8d33e0d62 ("scripts/gdb: fix 'lx-dmesg' on 32 bits arch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Antonio Borneo <antonio.borneo@foss.st.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: add a Radix Tree ParserKieran Bingham
Linux makes use of the Radix Tree data structure to store pointers indexed by integer values. This structure is utilised across many structures in the kernel including the IRQ descriptor tables, and several filesystems. This module provides a method to lookup values from a structure given its head node. Usage: The function lx_radix_tree_lookup, must be given a symbol of type struct radix_tree_root, and an index into that tree. The object returned is a generic integer value, and must be cast correctly to the type based on the storage in the data structure. For example, to print the irq descriptor in the sparse irq_desc_tree at index 18, try the following: (gdb) print (struct irq_desc)$lx_radix_tree_lookup(irq_desc_tree, 18) This script previously existed under commit e127a73d41ac471d7e3ba950cf128f42d6ee3448 ("scripts/gdb: add a Radix Tree Parser") and was later reverted with b447e02548a3304c47b78b5e2d75a4312a8f17e1i (Revert "scripts/gdb: add a Radix Tree Parser"). This version expects the XArray based radix tree implementation and has been verified using QEMU/x86 on Linux 6.3-rc5. [f.fainelli@gmail.com: revive and update for xarray implementation] [f.fainelli@gmail.com: guard against a NULL node in the while loop] Link: https://lkml.kernel.org/r/20230405222743.1191674-1-f.fainelli@gmail.com Link: https://lkml.kernel.org/r/20230404214049.1016811-1-f.fainelli@gmail.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18lib/rbtree: use '+' instead of '|' for setting color.Noah Goldstein
This has a slight benefit for x86 and has no effect on other targets. The benefit to x86 is it change the codegen for setting a node to block from `mov %r0, %r1; or $RB_BLACK, %r1` to `lea RB_BLACK(%r0), %r1` which saves an instructions. In all other cases it just replace ALU with ALU (or -> and) which perform the same on all machines I am aware of. Total instructions in rbtree.o: Before - 802 After - 782 so it saves about 20 `mov` instructions. Link: https://lkml.kernel.org/r/20230404221350.3806566-1-goldstein.w.n@gmail.com Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Cc: Michel Lespinasse <michel@lespinasse.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18proc/stat: remove arch_idle_time()Heiko Carstens
The last (only) architecture specific arch_idle_time() implementation was removed with commit be76ea614460 ("s390/idle: remove arch_cpu_idle_time() and corresponding code"). Therefore remove the now dead code in fs/proc/stat.c as well. Link: https://lkml.kernel.org/r/20230405143452.2677172-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18checkpatch: check for misuse of the link tagsMatthieu Baerts
"Link:" and "Closes:" tags have to be used with public URLs. It is difficult to make sure the link is public but at least we can verify the tag is followed by 'http(s)://'. With that, we avoid such a tag that is not allowed [1]: Closes: <number> Now that we check the "link" tags are followed by a URL, we can relax the check linked to "Reported-by being followed by a link tag" to only verify if a "link" tag is present after the "Reported-by" one. Link: https://lore.kernel.org/linux-doc/CAHk-=wh0v1EeDV3v8TzK81nDC40=XuTdY2MCr0xy3m3FiBV3+Q@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-5-d26d1fa66f9f@tessares.net Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18checkpatch: allow Closes tags with linksMatthieu Baerts
As a follow-up of a previous patch modifying the documentation to allow using the "Closes:" tag, checkpatch.pl is updated accordingly. checkpatch.pl now no longer complain when the "Closes:" tag is used by itself: commit 76f381bb77a0 ("checkpatch: warn when unknown tags are used for links") ... or after the "Reported-by:" tag: commit d7f1d71e5ef6 ("checkpatch: warn when Reported-by: is not followed by Link:") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/373 Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-4-d26d1fa66f9f@tessares.net Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18checkpatch: use a list of "link" tagsMatthieu Baerts
The following commit will allow the use of a similar "link" tag. Because there is a possibility that other similar tags will be added in the future and to reduce the number of places where the code will be modified to allow this new tag, a list with all these "link" tags is now used. Two variables are created from it: one to search for such tags and one to print all tags in a warning message. Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-3-d26d1fa66f9f@tessares.net Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Joe Perches <joe@perches.com> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18checkpatch: don't print the next line if not definedMatthieu Baerts
When checking if "Reported-by" tag is followed by "Link:", there is no need to print the next line if there is no next line. While at it, also mention in this case that the "Link:" tag should be followed by a URL, similar to the next warning. By doing that, the code is now similar to what is done above when checking if the Co-developed-by tag is properly used. Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-2-d26d1fa66f9f@tessares.net Fixes: d7f1d71e5ef6 ("checkpatch: warn when Reported-by: is not followed by Link:") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18docs: process: allow Closes tags with linksMatthieu Baerts
Since v6.3, checkpatch.pl now complains about the use of "Closes:" tags followed by a link [1]. It also complains if a "Reported-by:" tag is followed by a "Closes:" one [2]. As detailed in the first patch, this "Closes:" tag is used for a bit of time, mainly by DRM and MPTCP subsystems. It is used by some bug trackers to automate the closure of issues when a patch is accepted. It is even planned to use this tag with bugzilla.kernel.org [3]. The first patch updates the documentation to explain what is this "Closes:" tag and how/when to use it. The second patch modifies checkpatch.pl to stop complaining about it. The DRM maintainers and their mailing list have been added in Cc as they are probably interested by these two patches as well. [1] https://lore.kernel.org/all/3b036087d80b8c0e07a46a1dbaaf4ad0d018f8d5.1674217480.git.linux@leemhuis.info/ [2] https://lore.kernel.org/all/bb5dfd55ea2026303ab2296f4a6df3da7dd64006.1674217480.git.linux@leemhuis.info/ [3] https://lore.kernel.org/linux-doc/20230315181205.f3av7h6owqzzw64p@meerkat.local/ This patch (of 5): Making sure a bug tracker is up to date is not an easy task. For example, a first version of a patch fixing a tracked issue can be sent a long time after having created the issue. But also, it can take some time to have this patch accepted upstream in its final form. When it is done, someone -- probably not the person who accepted the patch -- has to remember about closing the corresponding issue. This task of closing and tracking the patch can be done automatically by bug trackers like GitLab [1], GitHub [2] and hopefully soon [3] bugzilla.kernel.org when the appropriated tag is used. The two first ones accept multiple tags but it is probably better to pick one. According to commit 76f381bb77a0 ("checkpatch: warn when unknown tags are used for links"), the "Closes" tag seems to have been used in the past by a few people and it is supported by popular bug trackers. Here is how it has been used in the past: $ git log --no-merges --format=email -P --grep='^Closes: http' | \ grep '^Closes: http' | cut -d/ -f3-5 | sort | uniq -c | sort -rn 391 gitlab.freedesktop.org/drm/intel 79 github.com/multipath-tcp/mptcp_net-next 8 gitlab.freedesktop.org/drm/msm 3 gitlab.freedesktop.org/drm/amd 2 gitlab.freedesktop.org/mesa/mesa 1 patchwork.freedesktop.org/series/73320 1 gitlab.freedesktop.org/lima/linux 1 gitlab.freedesktop.org/drm/nouveau 1 github.com/ClangBuiltLinux/linux 1 bugzilla.netfilter.org/show_bug.cgi?id=1579 1 bugzilla.netfilter.org/show_bug.cgi?id=1543 1 bugzilla.netfilter.org/show_bug.cgi?id=1436 1 bugzilla.netfilter.org/show_bug.cgi?id=1427 1 bugs.debian.org/625804 Likely here, the "Closes" tag was only properly used with GitLab and GitHub. We can also see that it has been used quite a few times (and still used recently) and this is then not a "random tag that makes no sense" like it was the case with "BugLink" recently [4]. It has also been misused but that was a long time ago, when it was common to use many different random tags. checkpatch.pl script should then stop complaining about this "Closes" tag. As suggested by Thorsten [5], if this tag is accepted, it should first be described in the documentation. This is what is done here in this patch. To avoid confusion, the "Closes" should be used with any public bug report. No need to check if the underlying bug tracker supports automations. Having this tag with any kind of public bug reports allows bots like regzbot to clearly identify patches fixing a specific bug and avoid false-positives, e.g. patches mentioning it is related to an issue but not fixing it. As suggested by Thorsten [6] again, if we follow the same logic, the "Closes" tag should then be used after a "Reported-by" one. Note that thanks to this "Closes" tag, the mentioned bug trackers can also locate where a patch has been applied in different branches and repositories. If only the "Link" tag is used, the tracking can also be done but the ticket will not be closed and a manual operation will be needed. Also, these bug trackers have some safeguards: the closure is only done if a commit having the "Closes:" tag is applied in a specific branch. It will then not be closed if a random commit having the same tag is published elsewhere. Also in case of closure, a notification is sent to the owners. Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-0-d26d1fa66f9f@tessares.net Link: https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#default-closing-pattern [1] Link: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests [2] Link: https://lore.kernel.org/linux-doc/20230315181205.f3av7h6owqzzw64p@meerkat.local/ [3] Link: https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/ [4] Link: https://lore.kernel.org/all/688cd6cb-90ab-6834-a6f5-97080e39ca8e@leemhuis.info/ [5] Link: https://lore.kernel.org/linux-doc/2194d19d-f195-1a1e-41fc-7827ae569351@leemhuis.info/ [6] Link: https://github.com/multipath-tcp/mptcp_net-next/issues/373 Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-1-d26d1fa66f9f@tessares.net Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Thorsten Leemhuis <linux@leemhuis.info> Acked-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: fix lx-timerlist for HRTIMER_MAX_CLOCK_BASES printingPeng Liu
HRTIMER_MAX_CLOCK_BASES is of enum type hrtimer_base_type. To print it as an integer, HRTIMER_MAX_CLOCK_BASES should be converted first. Link: https://lkml.kernel.org/r/TYCP286MB214640FF0E7F04AC3926A39EC6819@TYCP286MB2146.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Peng Liu <liupeng17@lenovo.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: fix lx-timerlist for Python3Peng Liu
Below incompatibilities between Python2 and Python3 made lx-timerlist fail to run under Python3. o xrange() is replaced by range() in Python3 o bytes and str are different types in Python3 o the return value of Inferior.read_memory() is memoryview object in Python3 akpm: cc stable so that older kernels are properly debuggable under newer Python. Link: https://lkml.kernel.org/r/TYCP286MB2146EE1180A4D5176CBA8AB2C6819@TYCP286MB2146.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Peng Liu <liupeng17@lenovo.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: fix lx-timerlist for struct timequeue_head changePeng Liu
commit 511885d7061e ("lib/timerqueue: Rely on rbtree semantics for next timer") changed struct timerqueue_head, and so print_active_timers() should be changed accordingly with its way to interpret the structure. Link: https://lkml.kernel.org/r/TYCP286MB21463BD277330B26DDC18903C6819@TYCP286MB2146.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Peng Liu <liupeng17@lenovo.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18memfd: pass argument of memfd_fcntl as intLuca Vizzarro
The interface for fcntl expects the argument passed for the command F_ADD_SEALS to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. This commit changes the signature of all the related and helper functions so that they treat the argument as int instead of long. Link: https://lkml.kernel.org/r/20230414152459.816046-5-Luca.Vizzarro@arm.com Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: David Laight <David.Laight@ACULAB.com> Cc: Mark Rutland <Mark.Rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: Multi-gen LRU: remove wait_event_killable()Kalesh Singh
Android 14 and later default to MGLRU [1] and field telemetry showed occasional long tail latency (>100ms) in the reclaim path. Tracing revealed priority inversion in the reclaim path. In try_to_inc_max_seq(), when high priority tasks were blocked on wait_event_killable(), the preemption of the low priority task to call wake_up_all() caused those high priority tasks to wait longer than necessary. In general, this problem is not different from others of its kind, e.g., one caused by mutex_lock(). However, it is specific to MGLRU because it introduced the new wait queue lruvec->mm_state.wait. The purpose of this new wait queue is to avoid the thundering herd problem. If many direct reclaimers rush into try_to_inc_max_seq(), only one can succeed, i.e., the one to wake up the rest, and the rest who failed might cause premature OOM kills if they do not wait. So far there is no evidence supporting this scenario, based on how often the wait has been hit. And this begs the question how useful the wait queue is in practice. Based on Minchan's recommendation, which is in line with his commit 6d4675e60135 ("mm: don't be stuck to rmap lock on reclaim path") and the rest of the MGLRU code which also uses trylock when possible, remove the wait queue. [1] https://android-review.googlesource.com/q/I7ed7fbfd6ef9ce10053347528125dd98c39e50bf Link: https://lkml.kernel.org/r/20230413214326.2147568-1-kaleshsingh@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Minchan Kim <minchan@kernel.org> Reported-by: Wei Wang <wvw@google.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Cc: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: workingset: update description of the source fileYang Yang
The calculation of workingset size is the core logic of handling refault, it had been updated several times[1][2] after workingset.c was created[3]. But the description hadn't been updated accordingly, this mismatch may confuse the readers. So we update the description to make it consistent to the code. [1] commit 34e58cac6d8f ("mm: workingset: let cache workingset challenge anon") [2] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU") [3] commit a528910e12ec ("mm: thrash detection-based file cache sizing") Link: https://lkml.kernel.org/r/202304131634494948454@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18printk: export console trace point for kcsan/kasan/kfence/kmsanPavankumar Kondeti
The console tracepoint is used by kcsan/kasan/kfence/kmsan test modules. Since this tracepoint is not exported, these modules iterate over all available tracepoints to find the console trace point. Export the trace point so that it can be directly used. Link: https://lkml.kernel.org/r/20230413100859.1492323-1-quic_pkondeti@quicinc.com Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Marco Elver <elver@google.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: vmscan: refactor updating current->reclaim_stateYosry Ahmed
During reclaim, we keep track of pages reclaimed from other means than LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab, which we stash a pointer to in current task_struct. However, we keep track of more than just reclaimed slab pages through this. We also use it for clean file pages dropped through pruned inodes, and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add a helper function that wraps updating it through current, so that future changes to this logic are contained within include/linux/swap.h. Link: https://lkml.kernel.org/r/20230413104034.1086717-4-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: vmscan: move set_task_reclaim_state() near flush_reclaim_state()Yosry Ahmed
Move set_task_reclaim_state() near flush_reclaim_state() so that all helpers manipulating reclaim_state are in close proximity. Link: https://lkml.kernel.org/r/20230413104034.1086717-3-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: vmscan: ignore non-LRU-based reclaim in memcg reclaimYosry Ahmed
Patch series "Ignore non-LRU-based reclaim in memcg reclaim", v6. Upon running some proactive reclaim tests using memory.reclaim, we noticed some tests flaking where writing to memory.reclaim would be successful even though we did not reclaim the requested amount fully Looking further into it, I discovered that *sometimes* we overestimate the number of reclaimed pages in memcg reclaim. Reclaimed pages through other means than LRU-based reclaim are tracked through reclaim_state in struct scan_control, which is stashed in current task_struct. These pages are added to the number of reclaimed pages through LRUs. For memcg reclaim, these pages generally cannot be linked to the memcg under reclaim and can cause an overestimated count of reclaimed pages. This short series tries to address that. Patch 1 ignores pages reclaimed outside of LRU reclaim in memcg reclaim. The pages are uncharged anyway, so even if we end up under-reporting reclaimed pages we will still succeed in making progress during charging. Patches 2-3 are just refactoring. Patch 2 moves set_reclaim_state() helper next to flush_reclaim_state(). Patch 3 adds a helper that wraps updating current->reclaim_state, and renames reclaim_state->reclaimed_slab to reclaim_state->reclaimed. This patch (of 3): We keep track of different types of reclaimed pages through reclaim_state->reclaimed_slab, and we add them to the reported number of reclaimed pages. For non-memcg reclaim, this makes sense. For memcg reclaim, we have no clue if those pages are charged to the memcg under reclaim. Slab pages are shared by different memcgs, so a freed slab page may have only been partially charged to the memcg under reclaim. The same goes for clean file pages from pruned inodes (on highmem systems) or xfs buffer pages, there is no simple way to currently link them to the memcg under reclaim. Stop reporting those freed pages as reclaimed pages during memcg reclaim. This should make the return value of writing to memory.reclaim, and may help reduce unnecessary reclaim retries during memcg charging. Writing to memory.reclaim on the root memcg is considered as cgroup_reclaim(), but for this case we want to include any freed pages, so use the global_reclaim() check instead of !cgroup_reclaim(). Generally, this should make the return value of try_to_free_mem_cgroup_pages() more accurate. In some limited cases (e.g. freed a slab page that was mostly charged to the memcg under reclaim), the return value of try_to_free_mem_cgroup_pages() can be underestimated, but this should be fine. The freed pages will be uncharged anyway, and we can charge the memcg the next time around as we usually do memcg reclaim in a retry loop. Link: https://lkml.kernel.org/r/20230413104034.1086717-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20230413104034.1086717-2-yosryahmed@google.com Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: apply __must_check to vmap_pages_range_noflush()Alexander Potapenko
To prevent errors when vmap_pages_range_noflush() or __vmap_pages_range_noflush() silently fail (see the link below for an example), annotate them with __must_check so that the callers do not unconditionally assume the mapping succeeded. Link: https://lkml.kernel.org/r/20230413131223.4135168-4-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: kmsan: apply __must_check to non-void functionsAlexander Potapenko
Non-void KMSAN hooks may return error codes that indicate that KMSAN failed to reflect the changed memory state in the metadata (e.g. it could not create the necessary memory mappings). In such cases the callers should handle the errors to prevent the tool from using the inconsistent metadata in the future. We mark non-void hooks with __must_check so that error handling is not skipped. Link: https://lkml.kernel.org/r/20230413131223.4135168-3-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dipanjan Das <mail.dipanjan.das@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: hwpoison: support recovery from HugePage copy-on-write faultsLiu Shixin
copy-on-write of hugetlb user pages with uncorrectable errors will result in a kernel crash. This is because the copy is performed in kernel mode and in general we can not handle accessing memory with such errors while in kernel mode. Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults") introduced the routine copy_user_highpage_mc() to gracefully handle copying of user pages with uncorrectable errors. However, the separate hugetlb copy-on-write code paths were not modified as part of commit a873dfe1032a. Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage() so that they can also gracefully handle uncorrectable errors in user pages. This involves changing the hugetlb specific routine copy_user_large_folio() from type void to int so that it can return an error. Modify the hugetlb userfaultfd code in the same way so that it can return -EHWPOISON if it encounters an uncorrectable error. Link: https://lkml.kernel.org/r/20230413131349.2524210-1-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18memcg: page_cgroup_ino() get memcg from the page's folioYosry Ahmed
In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we observed a warning from page_cgroup_ino() when reading /proc/kpagecgroup. This warning was added to catch fragile reads of a page memcg. Make page_cgroup_ino() get memcg from the page's folio using folio_memcg_check(): that gives it the correct memcg for each page of a folio, so is the right fix. Note that page_folio() is racy, the page's folio can change from under us, but the entire function is racy and documented as such. I dithered between the right fix and the safer "fix": it's unlikely but conceivable that some userspace has learnt that /proc/kpagecgroup gives no memcg on tail pages, and compensates for that in some (racy) way: so continuing to give no memcg on tails, without warning, might be safer. But hwpoison_filter_task(), the only other user of page_cgroup_ino(), persuaded me. It looks as if it currently leaves out tail pages of the selected memcg, by mistake: whereas hwpoison_inject() uses compound_head() and expects the tails to be included. So hwpoison testing coverage has probably been restricted by the wrong output from page_cgroup_ino() (if that memcg filter is used at all): in the short term, it might be safer not to enable wider coverage there, but long term we would regret that. This is based on a patch originally written by Hugh Dickins and retains most of the original commit log [1] The patch was changed to use folio_memcg_check(page_folio(page)) instead of page_memcg_check(compound_head(page)) based on discussions with Matthew Wilcox; where he stated that callers of page_memcg_check() should stop using it due to the ambiguity around tail pages -- instead they should use folio_memcg_check() and handle tail pages themselves. Link: https://lkml.kernel.org/r/20230412003451.4018887-1-yosryahmed@google.com Link: https://lore.kernel.org/linux-mm/20230313083452.1319968-1-yosryahmed@google.com/ [1] Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAPAneesh Kumar K.V
Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to indicate devdax and hugetlb vmemmap optimization support. Hence rename that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Tarun Sahu <tsahu@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/vmemmap/devdax: fix kernel crash when probing devdax devicesAneesh Kumar K.V
commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") added support for using optimized vmmemap for devdax devices. But how vmemmap mappings are created are architecture specific. For example, powerpc with hash translation doesn't have vmemmap mappings in init_mm page table instead they are bolted table entries in the hardware page table vmemmap_populate_compound_pages() used by vmemmap optimization code is not aware of these architecture-specific mapping. Hence allow architecture to opt for this feature. I selected architectures supporting HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as also supporting this feature. This patch fixes the below crash on ppc64. BUG: Unable to handle kernel data access on write at 0xc00c000100400038 Faulting instruction address: 0xc000000001269d90 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000 REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24842228 XER: 00000000 CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0 .... NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0 Call Trace: [c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable) [c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250 [c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0 [c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0 [c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0 [c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140 [c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520 [c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230 [c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120 [c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0 [c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130 [c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250 [c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110 [c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10 [c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530 [c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270 [c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70 [c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0 [c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520 [c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230 [c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120 [c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240 [c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130 [c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50 [c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300 [c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0 [c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100 [c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48 [c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320 [c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400 [c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0 [c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64 Link: https://lkml.kernel.org/r/20230411142214.64464-1-aneesh.kumar@linux.ibm.com Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Tarun Sahu <tsahu@linux.ibm.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: add uffdio register ioctls testPeter Xu
This new test tests against the returned ioctls from UFFDIO_REGISTER, where put into uffdio_register.ioctls. This also tests the expected failure cases of UFFDIO_REGISTER, aka: - Register with empty mode should fail with -EINVAL - Register minor without page cache (anon) should fail with -EINVAL Link: https://lkml.kernel.org/r/20230412164548.329376-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: add shmem-private test to uffd-stressPeter Xu
The userfaultfd stress test never tested private shmem, which I think was overlooked long due. Add it so it matches with uffd unit test and it'll cover all memory supported with the three memory types. Meanwhile, rename the memory types a bit. Considering shared mem is the major use case for both shmem / hugetlbfs, changing from: anon, hugetlb, hugetlb_shared, shmem To (with shmem-private added): anon, hugetlb, hugetlb-private, shmem, shmem-private Add the shmem-private to run_vmtests.sh too. Link: https://lkml.kernel.org/r/20230412164546.329355-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: drop sys/dev test in uffd-stress testPeter Xu
With the new uffd unit test covering the /dev/userfaultfd path and syscall path of uffd initializations, we can safely drop the devnode test in the old stress test. One thing is to avoid duplication of running the stress test twice which is an overkill to only test the /dev/ interface in run_vmtests.sh. The other benefit is now all uffd tests (that uses userfaultfd_open) can run automatically as long as any type of interface is enabled (either syscall or dev), so it's more likely to succeed rather than fail due to unprivilege. With this patch lands, we can drop all the "mem_type:XXX" handlings too. Link: https://lkml.kernel.org/r/20230412164525.329176-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: allow uffd test to skip properly with no privilegePeter Xu
Allow skip a unit test properly due to no privilege (e.g. sigbus and events tests). [colin.i.king@gmail.com: fix spelling mistake "priviledge" -> "privilege"] Link: https://lkml.kernel.org/r/20230414081506.1678998-1-colin.i.king@gmail.com Link: https://lkml.kernel.org/r/20230412164520.329163-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>