summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-04-18dt-bindings: display: mediatek: simplify compatibles syntaxKrzysztof Kozlowski
Lists (items) with one item should be just enum because it is shorter, simpler and does not confuse, if one wants to add new entry with a fallback. Convert all of them to enums. OTOH, leave unused "oneOf" entries in anticipation of further growth of the entire binding. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/20230414083311.12197-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-18dt-bindings: drm/bridge: ti-sn65dsi86: Fix the video-interfaces.yaml referencesFabio Estevam
video-interface.txt does not exist anymore, as it has been converted to video-interfaces.yaml. Instead of referencing video-interfaces.yaml multiple times, pass it as a $ref to the schema. Signed-off-by: Fabio Estevam <festevam@denx.de> Link: https://lore.kernel.org/r/20230412175800.2537812-1-festevam@gmail.com Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-18dt-bindings: timer: Drop unneeded quotesRob Herring
Cleanup bindings dropping unneeded quotes. Once all these are fixed, checking for this can be enabled in yamllint. Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20230327170146.4104556-1-robh@kernel.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-18dt-bindings: interrupt-controller: qcom,pdc: document qcom,qdu1000-pdcKrzysztof Kozlowski
Add QDU1000 PDC, already used in upstreamed DTS. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20230416102831.105136-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-18checkpatch: introduce proper bindings license checkDmitry Rokosov
All headers from 'include/dt-bindings/' must be verified by checkpatch together with Documentation bindings, because all of them are part of the whole DT bindings system. The requirement is dual licensed and matching patterns: * Schemas: /GPL-2\.0(?:-only)? OR BSD-2-Clause/ * Headers: /GPL-2\.0(?:-only)? OR \S+/ Above patterns suggested by Rob at: https://lore.kernel.org/all/CAL_Jsq+-YJsBO+LuPJ=ZQ=eb-monrwzuCppvReH+af7hYZzNaQ@mail.gmail.com The issue was found during patch review: https://lore.kernel.org/all/20230313201259.19998-4-ddrokosov@sberdevices.ru/ Link: https://lkml.kernel.org/r/20230404191715.7319-1-ddrokosov@sberdevices.ru Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru> Reviewed-by: Rob Herring <robh@kernel.org> Cc: Andy Whitcroft <apw@canonical.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Joe Perches <joe@perches.com> Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18epoll: rename global epmutexDavidlohr Bueso
As of 4f04cbaf128 ("epoll: use refcount to reduce ep_mutex contention"), this lock is now specific to nesting cases - inserting an epoll fd onto another epoll fd. Rename the lock to be less generic. Link: https://lkml.kernel.org/r/20230411234159.20421-1-dave@stgolabs.net Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: add GDB convenience functions $lx_dentry_name() and $lx_i_dentry()Glenn Washburn
$lx_dentry_name() generates a full VFS path from a given dentry pointer, and $lx_i_dentry() returns the dentry pointer associated with the given inode pointer, if there is one. Link: https://lkml.kernel.org/r/c9a5ad8efbfbd2cc6559e082734eed7628f43a16.1677631565.git.development@efficientek.com Signed-off-by: Glenn Washburn <development@efficientek.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Antonio Borneo <antonio.borneo@foss.st.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: create linux/vfs.py for VFS related GDB helpersGlenn Washburn
Patch series "GDB VFS utils". I've created a couple GDB convenience functions that I found useful when debugging some VFS issues and figure others might find them useful. For instance, they are useful in setting conditional breakpoints on VFS functions where you only care if the dentry path is a certain value. I took the opportunity to create a new "vfs" python module to give VFS related utilities a home. This patch (of 2): This will allow for more VFS specific GDB helpers to be collected in one place. Move utils.dentry_name into the vfs modules. Also a local variable in proc.py was changed from vfs to mnt to prevent a naming collision with the new vfs module. [akpm@linux-foundation.org: add SPDX-License-Identifier] Link: https://lkml.kernel.org/r/cover.1677631565.git.development@efficientek.com Link: https://lkml.kernel.org/r/7bba4c065a8c2c47f1fc5b03a7278005b04db251.1677631565.git.development@efficientek.com Signed-off-by: Glenn Washburn <development@efficientek.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Antonio Borneo <antonio.borneo@foss.st.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18uapi/linux/const.h: prefer ISO-friendly __typeof__Kevin Brodsky
typeof is (still) a GNU extension, which means that it cannot be used when building ISO C (e.g. -std=c99). It should therefore be avoided in uapi headers in favour of the ISO-friendly __typeof__. Unfortunately this issue could not be detected by CONFIG_UAPI_HEADER_TEST=y as the __ALIGN_KERNEL() macro is not expanded in any uapi header. This matters from a userspace perspective, not a kernel one. uapi headers and their contents are expected to be usable in a variety of situations, and in particular when building ISO C applications (with -std=c99 or similar). This particular problem can be reproduced by trying to use the __ALIGN_KERNEL macro directly in application code, say: #include <linux/const.h> int align(int x, int a) { return __KERNEL_ALIGN(x, a); } and trying to build that with -std=c99. Link: https://lkml.kernel.org/r/20230411092747.3759032-1-kevin.brodsky@arm.com Fixes: a79ff731a1b2 ("netfilter: xtables: make XT_ALIGN() usable in exported headers by exporting __ALIGN_KERNEL()") Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reported-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com> Tested-by: Ruben Ayrapetyan <ruben.ayrapetyan@arm.com> Reviewed-by: Petr Vorel <pvorel@suse.cz> Tested-by: Petr Vorel <pvorel@suse.cz> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18delayacct: track delays from IRQ/SOFTIRQYang Yang
Delay accounting does not track the delay of IRQ/SOFTIRQ. While IRQ/SOFTIRQ could have obvious impact on some workloads productivity, such as when workloads are running on system which is busy handling network IRQ/SOFTIRQ. Get the delay of IRQ/SOFTIRQ could help users to reduce such delay. Such as setting interrupt affinity or task affinity, using kernel thread for NAPI etc. This is inspired by "sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure"[1]. Also fix some code indent problems of older code. And update tools/accounting/getdelays.c: / # ./getdelays -p 156 -di print delayacct stats ON printing IO accounting PID 156 CPU count real total virtual total delay total delay average 15 15836008 16218149 275700790 18.380ms IO count delay total delay average 0 0 0.000ms SWAP count delay total delay average 0 0 0.000ms RECLAIM count delay total delay average 0 0 0.000ms THRASHING count delay total delay average 0 0 0.000ms COMPACT count delay total delay average 0 0 0.000ms WPCOPY count delay total delay average 36 7586118 0.211ms IRQ count delay total delay average 42 929161 0.022ms [1] commit 52b1364ba0b1("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure") Link: https://lkml.kernel.org/r/202304081728353557233@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Jiang Xuexin <jiang.xuexin@zte.com.cn> Cc: wangyong <wang.yong12@zte.com.cn> Cc: junhua huang <huang.junhua@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: timerlist: convert int chunks to strAmjad Ouled-Ameur
join() expects strings but integers are given. Convert chunks list to strings before passing it to join() Link: https://lkml.kernel.org/r/20230406221217.1585486-4-f.fainelli@gmail.com Signed-off-by: Amjad Ouled-Ameur <aouledameur@baylibre.com> Signed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: print interruptsFlorian Fainelli
This GDB script prints the interrupts in the system in the same way that /proc/interrupts does. This does include the architecture specific part done by arch_show_interrupts() for x86, ARM, ARM64 and MIPS. Example output from an ARM64 system: (gdb) lx-interruptlist CPU0 CPU1 CPU2 CPU3 10: 3167 1225 1276 2629 GICv2 30 Level arch_timer 13: 0 0 0 0 GICv2 36 Level arm-pmu 14: 0 0 0 0 GICv2 37 Level arm-pmu 15: 0 0 0 0 GICv2 38 Level arm-pmu 16: 0 0 0 0 GICv2 39 Level arm-pmu 28: 0 0 0 0 interrupt-controller@8410640 5 Edge brcmstb-gpio-wake 30: 125 0 0 0 GICv2 128 Level ttyS0 31: 0 0 0 0 interrupt-controller@8416000 0 Level mspi_done 32: 0 0 0 0 interrupt-controller@8410640 3 Edge brcmstb-waketimer 33: 0 0 0 0 interrupt-controller@8418580 8 Edge brcmstb-waketimer-rtc 34: 872 0 0 0 GICv2 230 Level brcm_scmi@0 35: 0 0 0 0 interrupt-controller@8410640 10 Edge 8d0f200.usb-phy 37: 0 0 0 0 GICv2 97 Level PCIe PME 42: 0 0 0 0 GICv2 145 Level xhci-hcd:usb1 43: 94 0 0 0 GICv2 71 Level mmc1 44: 0 0 0 0 GICv2 70 Level mmc0 IPI0: 23 666 154 98 Rescheduling interrupts IPI1: 247 1053 1701 634 Function call interrupts IPI2: 0 0 0 0 CPU stop interrupts IPI3: 0 0 0 0 CPU stop (for crash dump) interrupts IPI4: 0 0 0 0 Timer broadcast interrupts IPI5: 7 9 5 0 IRQ work interrupts IPI6: 0 0 0 0 CPU wake-up interrupts ERR: 0 Link: https://lkml.kernel.org/r/20230406220451.1583239-1-f.fainelli@gmail.com Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: raise error with reduced debugging informationFlorian Fainelli
If CONFIG_DEBUG_INFO_REDUCED is enabled in the kernel configuration, we will typically not be able to load vmlinux-gdb.py and will fail with: Traceback (most recent call last): File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/vmlinux-gdb.py", line 25, in <module> import linux.utils File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/scripts/gdb/linux/utils.py", line 131, in <module> atomic_long_counter_offset = atomic_long_type.get_type()['counter'].bitpos KeyError: 'counter' Rather be left wondering what is happening only to find out that reduced debug information is the cause, raise an eror. This was not typically a problem until e3c8d33e0d62 ("scripts/gdb: fix 'lx-dmesg' on 32 bits arch") but it has since then. Link: https://lkml.kernel.org/r/20230406215252.1580538-1-f.fainelli@gmail.com Fixes: e3c8d33e0d62 ("scripts/gdb: fix 'lx-dmesg' on 32 bits arch") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Antonio Borneo <antonio.borneo@foss.st.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: add a Radix Tree ParserKieran Bingham
Linux makes use of the Radix Tree data structure to store pointers indexed by integer values. This structure is utilised across many structures in the kernel including the IRQ descriptor tables, and several filesystems. This module provides a method to lookup values from a structure given its head node. Usage: The function lx_radix_tree_lookup, must be given a symbol of type struct radix_tree_root, and an index into that tree. The object returned is a generic integer value, and must be cast correctly to the type based on the storage in the data structure. For example, to print the irq descriptor in the sparse irq_desc_tree at index 18, try the following: (gdb) print (struct irq_desc)$lx_radix_tree_lookup(irq_desc_tree, 18) This script previously existed under commit e127a73d41ac471d7e3ba950cf128f42d6ee3448 ("scripts/gdb: add a Radix Tree Parser") and was later reverted with b447e02548a3304c47b78b5e2d75a4312a8f17e1i (Revert "scripts/gdb: add a Radix Tree Parser"). This version expects the XArray based radix tree implementation and has been verified using QEMU/x86 on Linux 6.3-rc5. [f.fainelli@gmail.com: revive and update for xarray implementation] [f.fainelli@gmail.com: guard against a NULL node in the while loop] Link: https://lkml.kernel.org/r/20230405222743.1191674-1-f.fainelli@gmail.com Link: https://lkml.kernel.org/r/20230404214049.1016811-1-f.fainelli@gmail.com Signed-off-by: Kieran Bingham <kieran.bingham@linaro.org> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18lib/rbtree: use '+' instead of '|' for setting color.Noah Goldstein
This has a slight benefit for x86 and has no effect on other targets. The benefit to x86 is it change the codegen for setting a node to block from `mov %r0, %r1; or $RB_BLACK, %r1` to `lea RB_BLACK(%r0), %r1` which saves an instructions. In all other cases it just replace ALU with ALU (or -> and) which perform the same on all machines I am aware of. Total instructions in rbtree.o: Before - 802 After - 782 so it saves about 20 `mov` instructions. Link: https://lkml.kernel.org/r/20230404221350.3806566-1-goldstein.w.n@gmail.com Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Cc: Michel Lespinasse <michel@lespinasse.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18proc/stat: remove arch_idle_time()Heiko Carstens
The last (only) architecture specific arch_idle_time() implementation was removed with commit be76ea614460 ("s390/idle: remove arch_cpu_idle_time() and corresponding code"). Therefore remove the now dead code in fs/proc/stat.c as well. Link: https://lkml.kernel.org/r/20230405143452.2677172-1-hca@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18checkpatch: check for misuse of the link tagsMatthieu Baerts
"Link:" and "Closes:" tags have to be used with public URLs. It is difficult to make sure the link is public but at least we can verify the tag is followed by 'http(s)://'. With that, we avoid such a tag that is not allowed [1]: Closes: <number> Now that we check the "link" tags are followed by a URL, we can relax the check linked to "Reported-by being followed by a link tag" to only verify if a "link" tag is present after the "Reported-by" one. Link: https://lore.kernel.org/linux-doc/CAHk-=wh0v1EeDV3v8TzK81nDC40=XuTdY2MCr0xy3m3FiBV3+Q@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-5-d26d1fa66f9f@tessares.net Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18checkpatch: allow Closes tags with linksMatthieu Baerts
As a follow-up of a previous patch modifying the documentation to allow using the "Closes:" tag, checkpatch.pl is updated accordingly. checkpatch.pl now no longer complain when the "Closes:" tag is used by itself: commit 76f381bb77a0 ("checkpatch: warn when unknown tags are used for links") ... or after the "Reported-by:" tag: commit d7f1d71e5ef6 ("checkpatch: warn when Reported-by: is not followed by Link:") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/373 Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-4-d26d1fa66f9f@tessares.net Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18checkpatch: use a list of "link" tagsMatthieu Baerts
The following commit will allow the use of a similar "link" tag. Because there is a possibility that other similar tags will be added in the future and to reduce the number of places where the code will be modified to allow this new tag, a list with all these "link" tags is now used. Two variables are created from it: one to search for such tags and one to print all tags in a warning message. Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-3-d26d1fa66f9f@tessares.net Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Joe Perches <joe@perches.com> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18checkpatch: don't print the next line if not definedMatthieu Baerts
When checking if "Reported-by" tag is followed by "Link:", there is no need to print the next line if there is no next line. While at it, also mention in this case that the "Link:" tag should be followed by a URL, similar to the next warning. By doing that, the code is now similar to what is done above when checking if the Co-developed-by tag is properly used. Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-2-d26d1fa66f9f@tessares.net Fixes: d7f1d71e5ef6 ("checkpatch: warn when Reported-by: is not followed by Link:") Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Cc: Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18docs: process: allow Closes tags with linksMatthieu Baerts
Since v6.3, checkpatch.pl now complains about the use of "Closes:" tags followed by a link [1]. It also complains if a "Reported-by:" tag is followed by a "Closes:" one [2]. As detailed in the first patch, this "Closes:" tag is used for a bit of time, mainly by DRM and MPTCP subsystems. It is used by some bug trackers to automate the closure of issues when a patch is accepted. It is even planned to use this tag with bugzilla.kernel.org [3]. The first patch updates the documentation to explain what is this "Closes:" tag and how/when to use it. The second patch modifies checkpatch.pl to stop complaining about it. The DRM maintainers and their mailing list have been added in Cc as they are probably interested by these two patches as well. [1] https://lore.kernel.org/all/3b036087d80b8c0e07a46a1dbaaf4ad0d018f8d5.1674217480.git.linux@leemhuis.info/ [2] https://lore.kernel.org/all/bb5dfd55ea2026303ab2296f4a6df3da7dd64006.1674217480.git.linux@leemhuis.info/ [3] https://lore.kernel.org/linux-doc/20230315181205.f3av7h6owqzzw64p@meerkat.local/ This patch (of 5): Making sure a bug tracker is up to date is not an easy task. For example, a first version of a patch fixing a tracked issue can be sent a long time after having created the issue. But also, it can take some time to have this patch accepted upstream in its final form. When it is done, someone -- probably not the person who accepted the patch -- has to remember about closing the corresponding issue. This task of closing and tracking the patch can be done automatically by bug trackers like GitLab [1], GitHub [2] and hopefully soon [3] bugzilla.kernel.org when the appropriated tag is used. The two first ones accept multiple tags but it is probably better to pick one. According to commit 76f381bb77a0 ("checkpatch: warn when unknown tags are used for links"), the "Closes" tag seems to have been used in the past by a few people and it is supported by popular bug trackers. Here is how it has been used in the past: $ git log --no-merges --format=email -P --grep='^Closes: http' | \ grep '^Closes: http' | cut -d/ -f3-5 | sort | uniq -c | sort -rn 391 gitlab.freedesktop.org/drm/intel 79 github.com/multipath-tcp/mptcp_net-next 8 gitlab.freedesktop.org/drm/msm 3 gitlab.freedesktop.org/drm/amd 2 gitlab.freedesktop.org/mesa/mesa 1 patchwork.freedesktop.org/series/73320 1 gitlab.freedesktop.org/lima/linux 1 gitlab.freedesktop.org/drm/nouveau 1 github.com/ClangBuiltLinux/linux 1 bugzilla.netfilter.org/show_bug.cgi?id=1579 1 bugzilla.netfilter.org/show_bug.cgi?id=1543 1 bugzilla.netfilter.org/show_bug.cgi?id=1436 1 bugzilla.netfilter.org/show_bug.cgi?id=1427 1 bugs.debian.org/625804 Likely here, the "Closes" tag was only properly used with GitLab and GitHub. We can also see that it has been used quite a few times (and still used recently) and this is then not a "random tag that makes no sense" like it was the case with "BugLink" recently [4]. It has also been misused but that was a long time ago, when it was common to use many different random tags. checkpatch.pl script should then stop complaining about this "Closes" tag. As suggested by Thorsten [5], if this tag is accepted, it should first be described in the documentation. This is what is done here in this patch. To avoid confusion, the "Closes" should be used with any public bug report. No need to check if the underlying bug tracker supports automations. Having this tag with any kind of public bug reports allows bots like regzbot to clearly identify patches fixing a specific bug and avoid false-positives, e.g. patches mentioning it is related to an issue but not fixing it. As suggested by Thorsten [6] again, if we follow the same logic, the "Closes" tag should then be used after a "Reported-by" one. Note that thanks to this "Closes" tag, the mentioned bug trackers can also locate where a patch has been applied in different branches and repositories. If only the "Link" tag is used, the tracking can also be done but the ticket will not be closed and a manual operation will be needed. Also, these bug trackers have some safeguards: the closure is only done if a commit having the "Closes:" tag is applied in a specific branch. It will then not be closed if a random commit having the same tag is published elsewhere. Also in case of closure, a notification is sent to the owners. Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-0-d26d1fa66f9f@tessares.net Link: https://docs.gitlab.com/ee/user/project/issues/managing_issues.html#default-closing-pattern [1] Link: https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests [2] Link: https://lore.kernel.org/linux-doc/20230315181205.f3av7h6owqzzw64p@meerkat.local/ [3] Link: https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/ [4] Link: https://lore.kernel.org/all/688cd6cb-90ab-6834-a6f5-97080e39ca8e@leemhuis.info/ [5] Link: https://lore.kernel.org/linux-doc/2194d19d-f195-1a1e-41fc-7827ae569351@leemhuis.info/ [6] Link: https://github.com/multipath-tcp/mptcp_net-next/issues/373 Link: https://lkml.kernel.org/r/20230314-doc-checkpatch-closes-tag-v4-1-d26d1fa66f9f@tessares.net Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Suggested-by: Thorsten Leemhuis <linux@leemhuis.info> Acked-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Acked-by: Joe Perches <joe@perches.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Bagas Sanjaya <bagasdotme@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Dwaipayan Ray <dwaipayanray1@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kai Wasserbäch <kai@dev.carbon-project.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: fix lx-timerlist for HRTIMER_MAX_CLOCK_BASES printingPeng Liu
HRTIMER_MAX_CLOCK_BASES is of enum type hrtimer_base_type. To print it as an integer, HRTIMER_MAX_CLOCK_BASES should be converted first. Link: https://lkml.kernel.org/r/TYCP286MB214640FF0E7F04AC3926A39EC6819@TYCP286MB2146.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Peng Liu <liupeng17@lenovo.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Kieran Bingham <kbingham@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: fix lx-timerlist for Python3Peng Liu
Below incompatibilities between Python2 and Python3 made lx-timerlist fail to run under Python3. o xrange() is replaced by range() in Python3 o bytes and str are different types in Python3 o the return value of Inferior.read_memory() is memoryview object in Python3 akpm: cc stable so that older kernels are properly debuggable under newer Python. Link: https://lkml.kernel.org/r/TYCP286MB2146EE1180A4D5176CBA8AB2C6819@TYCP286MB2146.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Peng Liu <liupeng17@lenovo.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18scripts/gdb: fix lx-timerlist for struct timequeue_head changePeng Liu
commit 511885d7061e ("lib/timerqueue: Rely on rbtree semantics for next timer") changed struct timerqueue_head, and so print_active_timers() should be changed accordingly with its way to interpret the structure. Link: https://lkml.kernel.org/r/TYCP286MB21463BD277330B26DDC18903C6819@TYCP286MB2146.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Peng Liu <liupeng17@lenovo.com> Reviewed-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18memfd: pass argument of memfd_fcntl as intLuca Vizzarro
The interface for fcntl expects the argument passed for the command F_ADD_SEALS to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int. This commit changes the signature of all the related and helper functions so that they treat the argument as int instead of long. Link: https://lkml.kernel.org/r/20230414152459.816046-5-Luca.Vizzarro@arm.com Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: David Laight <David.Laight@ACULAB.com> Cc: Mark Rutland <Mark.Rutland@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: Multi-gen LRU: remove wait_event_killable()Kalesh Singh
Android 14 and later default to MGLRU [1] and field telemetry showed occasional long tail latency (>100ms) in the reclaim path. Tracing revealed priority inversion in the reclaim path. In try_to_inc_max_seq(), when high priority tasks were blocked on wait_event_killable(), the preemption of the low priority task to call wake_up_all() caused those high priority tasks to wait longer than necessary. In general, this problem is not different from others of its kind, e.g., one caused by mutex_lock(). However, it is specific to MGLRU because it introduced the new wait queue lruvec->mm_state.wait. The purpose of this new wait queue is to avoid the thundering herd problem. If many direct reclaimers rush into try_to_inc_max_seq(), only one can succeed, i.e., the one to wake up the rest, and the rest who failed might cause premature OOM kills if they do not wait. So far there is no evidence supporting this scenario, based on how often the wait has been hit. And this begs the question how useful the wait queue is in practice. Based on Minchan's recommendation, which is in line with his commit 6d4675e60135 ("mm: don't be stuck to rmap lock on reclaim path") and the rest of the MGLRU code which also uses trylock when possible, remove the wait queue. [1] https://android-review.googlesource.com/q/I7ed7fbfd6ef9ce10053347528125dd98c39e50bf Link: https://lkml.kernel.org/r/20230413214326.2147568-1-kaleshsingh@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Suggested-by: Minchan Kim <minchan@kernel.org> Reported-by: Wei Wang <wvw@google.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Jan Alexander Steffens (heftig) <heftig@archlinux.org> Cc: Oleksandr Natalenko <oleksandr@natalenko.name> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: workingset: update description of the source fileYang Yang
The calculation of workingset size is the core logic of handling refault, it had been updated several times[1][2] after workingset.c was created[3]. But the description hadn't been updated accordingly, this mismatch may confuse the readers. So we update the description to make it consistent to the code. [1] commit 34e58cac6d8f ("mm: workingset: let cache workingset challenge anon") [2] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU") [3] commit a528910e12ec ("mm: thrash detection-based file cache sizing") Link: https://lkml.kernel.org/r/202304131634494948454@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18printk: export console trace point for kcsan/kasan/kfence/kmsanPavankumar Kondeti
The console tracepoint is used by kcsan/kasan/kfence/kmsan test modules. Since this tracepoint is not exported, these modules iterate over all available tracepoints to find the console trace point. Export the trace point so that it can be directly used. Link: https://lkml.kernel.org/r/20230413100859.1492323-1-quic_pkondeti@quicinc.com Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: John Ogness <john.ogness@linutronix.de> Cc: Marco Elver <elver@google.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: vmscan: refactor updating current->reclaim_stateYosry Ahmed
During reclaim, we keep track of pages reclaimed from other means than LRU-based reclaim through scan_control->reclaim_state->reclaimed_slab, which we stash a pointer to in current task_struct. However, we keep track of more than just reclaimed slab pages through this. We also use it for clean file pages dropped through pruned inodes, and xfs buffer pages freed. Rename reclaimed_slab to reclaimed, and add a helper function that wraps updating it through current, so that future changes to this logic are contained within include/linux/swap.h. Link: https://lkml.kernel.org/r/20230413104034.1086717-4-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: vmscan: move set_task_reclaim_state() near flush_reclaim_state()Yosry Ahmed
Move set_task_reclaim_state() near flush_reclaim_state() so that all helpers manipulating reclaim_state are in close proximity. Link: https://lkml.kernel.org/r/20230413104034.1086717-3-yosryahmed@google.com Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: vmscan: ignore non-LRU-based reclaim in memcg reclaimYosry Ahmed
Patch series "Ignore non-LRU-based reclaim in memcg reclaim", v6. Upon running some proactive reclaim tests using memory.reclaim, we noticed some tests flaking where writing to memory.reclaim would be successful even though we did not reclaim the requested amount fully Looking further into it, I discovered that *sometimes* we overestimate the number of reclaimed pages in memcg reclaim. Reclaimed pages through other means than LRU-based reclaim are tracked through reclaim_state in struct scan_control, which is stashed in current task_struct. These pages are added to the number of reclaimed pages through LRUs. For memcg reclaim, these pages generally cannot be linked to the memcg under reclaim and can cause an overestimated count of reclaimed pages. This short series tries to address that. Patch 1 ignores pages reclaimed outside of LRU reclaim in memcg reclaim. The pages are uncharged anyway, so even if we end up under-reporting reclaimed pages we will still succeed in making progress during charging. Patches 2-3 are just refactoring. Patch 2 moves set_reclaim_state() helper next to flush_reclaim_state(). Patch 3 adds a helper that wraps updating current->reclaim_state, and renames reclaim_state->reclaimed_slab to reclaim_state->reclaimed. This patch (of 3): We keep track of different types of reclaimed pages through reclaim_state->reclaimed_slab, and we add them to the reported number of reclaimed pages. For non-memcg reclaim, this makes sense. For memcg reclaim, we have no clue if those pages are charged to the memcg under reclaim. Slab pages are shared by different memcgs, so a freed slab page may have only been partially charged to the memcg under reclaim. The same goes for clean file pages from pruned inodes (on highmem systems) or xfs buffer pages, there is no simple way to currently link them to the memcg under reclaim. Stop reporting those freed pages as reclaimed pages during memcg reclaim. This should make the return value of writing to memory.reclaim, and may help reduce unnecessary reclaim retries during memcg charging. Writing to memory.reclaim on the root memcg is considered as cgroup_reclaim(), but for this case we want to include any freed pages, so use the global_reclaim() check instead of !cgroup_reclaim(). Generally, this should make the return value of try_to_free_mem_cgroup_pages() more accurate. In some limited cases (e.g. freed a slab page that was mostly charged to the memcg under reclaim), the return value of try_to_free_mem_cgroup_pages() can be underestimated, but this should be fine. The freed pages will be uncharged anyway, and we can charge the memcg the next time around as we usually do memcg reclaim in a retry loop. Link: https://lkml.kernel.org/r/20230413104034.1086717-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20230413104034.1086717-2-yosryahmed@google.com Fixes: f2fe7b09a52b ("mm: memcg/slab: charge individual slab objects instead of pages") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Lameter <cl@linux.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: David Rientjes <rientjes@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: NeilBrown <neilb@suse.de> Cc: Peter Xu <peterx@redhat.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: apply __must_check to vmap_pages_range_noflush()Alexander Potapenko
To prevent errors when vmap_pages_range_noflush() or __vmap_pages_range_noflush() silently fail (see the link below for an example), annotate them with __must_check so that the callers do not unconditionally assume the mapping succeeded. Link: https://lkml.kernel.org/r/20230413131223.4135168-4-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reported-by: Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: kmsan: apply __must_check to non-void functionsAlexander Potapenko
Non-void KMSAN hooks may return error codes that indicate that KMSAN failed to reflect the changed memory state in the metadata (e.g. it could not create the necessary memory mappings). In such cases the callers should handle the errors to prevent the tool from using the inconsistent metadata in the future. We mark non-void hooks with __must_check so that error handling is not skipped. Link: https://lkml.kernel.org/r/20230413131223.4135168-3-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Reviewed-by: Marco Elver <elver@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dipanjan Das <mail.dipanjan.das@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm: hwpoison: support recovery from HugePage copy-on-write faultsLiu Shixin
copy-on-write of hugetlb user pages with uncorrectable errors will result in a kernel crash. This is because the copy is performed in kernel mode and in general we can not handle accessing memory with such errors while in kernel mode. Commit a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults") introduced the routine copy_user_highpage_mc() to gracefully handle copying of user pages with uncorrectable errors. However, the separate hugetlb copy-on-write code paths were not modified as part of commit a873dfe1032a. Modify hugetlb copy-on-write code paths to use copy_mc_user_highpage() so that they can also gracefully handle uncorrectable errors in user pages. This involves changing the hugetlb specific routine copy_user_large_folio() from type void to int so that it can return an error. Modify the hugetlb userfaultfd code in the same way so that it can return -EHWPOISON if it encounters an uncorrectable error. Link: https://lkml.kernel.org/r/20230413131349.2524210-1-liushixin2@huawei.com Signed-off-by: Liu Shixin <liushixin2@huawei.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18memcg: page_cgroup_ino() get memcg from the page's folioYosry Ahmed
In a kernel with added WARN_ON_ONCE(PageTail) in page_memcg_check(), we observed a warning from page_cgroup_ino() when reading /proc/kpagecgroup. This warning was added to catch fragile reads of a page memcg. Make page_cgroup_ino() get memcg from the page's folio using folio_memcg_check(): that gives it the correct memcg for each page of a folio, so is the right fix. Note that page_folio() is racy, the page's folio can change from under us, but the entire function is racy and documented as such. I dithered between the right fix and the safer "fix": it's unlikely but conceivable that some userspace has learnt that /proc/kpagecgroup gives no memcg on tail pages, and compensates for that in some (racy) way: so continuing to give no memcg on tails, without warning, might be safer. But hwpoison_filter_task(), the only other user of page_cgroup_ino(), persuaded me. It looks as if it currently leaves out tail pages of the selected memcg, by mistake: whereas hwpoison_inject() uses compound_head() and expects the tails to be included. So hwpoison testing coverage has probably been restricted by the wrong output from page_cgroup_ino() (if that memcg filter is used at all): in the short term, it might be safer not to enable wider coverage there, but long term we would regret that. This is based on a patch originally written by Hugh Dickins and retains most of the original commit log [1] The patch was changed to use folio_memcg_check(page_folio(page)) instead of page_memcg_check(compound_head(page)) based on discussions with Matthew Wilcox; where he stated that callers of page_memcg_check() should stop using it due to the ambiguity around tail pages -- instead they should use folio_memcg_check() and handle tail pages themselves. Link: https://lkml.kernel.org/r/20230412003451.4018887-1-yosryahmed@google.com Link: https://lore.kernel.org/linux-mm/20230313083452.1319968-1-yosryahmed@google.com/ [1] Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/hugetlb_vmemmap: rename ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAPAneesh Kumar K.V
Now we use ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP config option to indicate devdax and hugetlb vmemmap optimization support. Hence rename that to a generic ARCH_WANT_OPTIMIZE_VMEMMAP Link: https://lkml.kernel.org/r/20230412050025.84346-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Tarun Sahu <tsahu@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18mm/vmemmap/devdax: fix kernel crash when probing devdax devicesAneesh Kumar K.V
commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") added support for using optimized vmmemap for devdax devices. But how vmemmap mappings are created are architecture specific. For example, powerpc with hash translation doesn't have vmemmap mappings in init_mm page table instead they are bolted table entries in the hardware page table vmemmap_populate_compound_pages() used by vmemmap optimization code is not aware of these architecture-specific mapping. Hence allow architecture to opt for this feature. I selected architectures supporting HUGETLB_PAGE_OPTIMIZE_VMEMMAP option as also supporting this feature. This patch fixes the below crash on ppc64. BUG: Unable to handle kernel data access on write at 0xc00c000100400038 Faulting instruction address: 0xc000000001269d90 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 7 PID: 1 Comm: swapper/0 Not tainted 6.3.0-rc5-150500.34-default+ #2 5c90a668b6bbd142599890245c2fb5de19d7d28a Hardware name: IBM,9009-42G POWER9 (raw) 0x4e0202 0xf000005 of:IBM,FW950.40 (VL950_099) hv:phyp pSeries NIP: c000000001269d90 LR: c0000000004c57d4 CTR: 0000000000000000 REGS: c000000003632c30 TRAP: 0300 Not tainted (6.3.0-rc5-150500.34-default+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24842228 XER: 00000000 CFAR: c0000000004c57d0 DAR: c00c000100400038 DSISR: 42000000 IRQMASK: 0 .... NIP [c000000001269d90] __init_single_page.isra.74+0x14/0x4c LR [c0000000004c57d4] __init_zone_device_page+0x44/0xd0 Call Trace: [c000000003632ed0] [c000000003632f60] 0xc000000003632f60 (unreliable) [c000000003632f10] [c0000000004c5ca0] memmap_init_zone_device+0x170/0x250 [c000000003632fe0] [c0000000005575f8] memremap_pages+0x2c8/0x7f0 [c0000000036330c0] [c000000000557b5c] devm_memremap_pages+0x3c/0xa0 [c000000003633100] [c000000000d458a8] dev_dax_probe+0x108/0x3e0 [c0000000036331a0] [c000000000d41430] dax_bus_probe+0xb0/0x140 [c0000000036331d0] [c000000000cef27c] really_probe+0x19c/0x520 [c000000003633260] [c000000000cef6b4] __driver_probe_device+0xb4/0x230 [c0000000036332e0] [c000000000cef888] driver_probe_device+0x58/0x120 [c000000003633320] [c000000000cefa6c] __device_attach_driver+0x11c/0x1e0 [c0000000036333a0] [c000000000cebc58] bus_for_each_drv+0xa8/0x130 [c000000003633400] [c000000000ceefcc] __device_attach+0x15c/0x250 [c0000000036334a0] [c000000000ced458] bus_probe_device+0x108/0x110 [c0000000036334f0] [c000000000ce92dc] device_add+0x7fc/0xa10 [c0000000036335b0] [c000000000d447c8] devm_create_dev_dax+0x1d8/0x530 [c000000003633640] [c000000000d46b60] __dax_pmem_probe+0x200/0x270 [c0000000036337b0] [c000000000d46bf0] dax_pmem_probe+0x20/0x70 [c0000000036337d0] [c000000000d2279c] nvdimm_bus_probe+0xac/0x2b0 [c000000003633860] [c000000000cef27c] really_probe+0x19c/0x520 [c0000000036338f0] [c000000000cef6b4] __driver_probe_device+0xb4/0x230 [c000000003633970] [c000000000cef888] driver_probe_device+0x58/0x120 [c0000000036339b0] [c000000000cefd08] __driver_attach+0x1d8/0x240 [c000000003633a30] [c000000000cebb04] bus_for_each_dev+0xb4/0x130 [c000000003633a90] [c000000000cee564] driver_attach+0x34/0x50 [c000000003633ab0] [c000000000ced878] bus_add_driver+0x218/0x300 [c000000003633b40] [c000000000cf1144] driver_register+0xa4/0x1b0 [c000000003633bb0] [c000000000d21a0c] __nd_driver_register+0x5c/0x100 [c000000003633c10] [c00000000206a2e8] dax_pmem_init+0x34/0x48 [c000000003633c30] [c0000000000132d0] do_one_initcall+0x60/0x320 [c000000003633d00] [c0000000020051b0] kernel_init_freeable+0x360/0x400 [c000000003633de0] [c000000000013764] kernel_init+0x34/0x1d0 [c000000003633e50] [c00000000000de14] ret_from_kernel_thread+0x5c/0x64 Link: https://lkml.kernel.org/r/20230411142214.64464-1-aneesh.kumar@linux.ibm.com Fixes: 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Tarun Sahu <tsahu@linux.ibm.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: add uffdio register ioctls testPeter Xu
This new test tests against the returned ioctls from UFFDIO_REGISTER, where put into uffdio_register.ioctls. This also tests the expected failure cases of UFFDIO_REGISTER, aka: - Register with empty mode should fail with -EINVAL - Register minor without page cache (anon) should fail with -EINVAL Link: https://lkml.kernel.org/r/20230412164548.329376-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: add shmem-private test to uffd-stressPeter Xu
The userfaultfd stress test never tested private shmem, which I think was overlooked long due. Add it so it matches with uffd unit test and it'll cover all memory supported with the three memory types. Meanwhile, rename the memory types a bit. Considering shared mem is the major use case for both shmem / hugetlbfs, changing from: anon, hugetlb, hugetlb_shared, shmem To (with shmem-private added): anon, hugetlb, hugetlb-private, shmem, shmem-private Add the shmem-private to run_vmtests.sh too. Link: https://lkml.kernel.org/r/20230412164546.329355-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: drop sys/dev test in uffd-stress testPeter Xu
With the new uffd unit test covering the /dev/userfaultfd path and syscall path of uffd initializations, we can safely drop the devnode test in the old stress test. One thing is to avoid duplication of running the stress test twice which is an overkill to only test the /dev/ interface in run_vmtests.sh. The other benefit is now all uffd tests (that uses userfaultfd_open) can run automatically as long as any type of interface is enabled (either syscall or dev), so it's more likely to succeed rather than fail due to unprivilege. With this patch lands, we can drop all the "mem_type:XXX" handlings too. Link: https://lkml.kernel.org/r/20230412164525.329176-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: allow uffd test to skip properly with no privilegePeter Xu
Allow skip a unit test properly due to no privilege (e.g. sigbus and events tests). [colin.i.king@gmail.com: fix spelling mistake "priviledge" -> "privilege"] Link: https://lkml.kernel.org/r/20230414081506.1678998-1-colin.i.king@gmail.com Link: https://lkml.kernel.org/r/20230412164520.329163-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: workaround no way to detect uffd-minor + wpPeter Xu
Userfaultfd minor+wp mode was very recently added. The test will fail on the old kernels at ioctl(UFFDIO_CONTINUE) which is misterious. Unfortunately there's no feature bit to detect for this support. Add a hack to leverage WP_UNPOPULATED to detect whether that feature existed, since WP_UNPOPULATED was merged right after minor+wp. Link: https://lkml.kernel.org/r/20230412164517.329152-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: move zeropage test into uffd unit testsPeter Xu
Simplifies it a bit along the way, e.g., drop the never used offset field (which was always the 1st page so offset=0). Introduce uffd_register_with_ioctls() out of uffd_register() to detect uffdio_register.ioctls got returned. Check that automatically when testing UFFDIO_ZEROPAGE on different types of memory (and kernel). Link: https://lkml.kernel.org/r/20230412164404.328815-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: move uffd sig/events tests into uffd unit testsPeter Xu
Move the two tests into the unit test, and convert it into 20 standalone tests: - events test on all 5 mem types, with wp on/off - signal test on all 5 mem types, with wp on/off Testing sigbus on anon... done Testing sigbus on shmem... done Testing sigbus on shmem-private... done Testing sigbus on hugetlb... done Testing sigbus on hugetlb-private... done Testing sigbus-wp on anon... done Testing sigbus-wp on shmem... done Testing sigbus-wp on shmem-private... done Testing sigbus-wp on hugetlb... done Testing sigbus-wp on hugetlb-private... done Testing events on anon... done Testing events on shmem... done Testing events on shmem-private... done Testing events on hugetlb... done Testing events on hugetlb-private... done Testing events-wp on anon... done Testing events-wp on shmem... done Testing events-wp on shmem-private... done Testing events-wp on hugetlb... done Testing events-wp on hugetlb-private... done It'll also remove a lot of global references along the way, e.g. test_uffdio_wp will be replaced with the wp value passed over. Link: https://lkml.kernel.org/r/20230412164400.328798-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: move uffd minor test to unit testPeter Xu
This moves the minor test to the new unit test. Rewrite the content check with char* opeartions to avoid fiddling with my_bcmp(). Drop global vars test_uffdio_minor and test_collapse, just assume test them always in common code for now. OTOH make this single test into five tests: - minor test on [shmem, hugetlb] with wp=false - minor test on [shmem, hugetlb] with wp=true - minor test + collapse on shmem only One thing to mention that we used to test COLLAPSE+WP but that doesn't sound right at all. It's possible it's silently broken but unnoticed because COLLAPSE is not part of the default test suite. Make the MADV_COLLAPSE test fail-able (by skip it when failing), because it's not guaranteed to success anyway. Drop a bunch of useless code after the move, because the unit test always use aligned num of pages and has nothing to do with n_cpus. Link: https://lkml.kernel.org/r/20230412164357.328779-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: move uffd pagemap test to unit testPeter Xu
Move it over and make it split into two tests, one for pagemap and one for the new WP_UNPOPULATED (to be a separate one). The thp pagemap test wasn't really working (with MADV_HUGEPAGE). Let's just drop it (since it never really worked anyway..) and leave that for later. Link: https://lkml.kernel.org/r/20230412164352.328733-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: add framework for uffd-unit-testPeter Xu
Add a framework to be prepared to move unit tests from uffd-stress.c into uffd-unit-tests.c. The goal is to allow detection of uffd features for each test, and also loop over specified types of memory that a test support. Link: https://lkml.kernel.org/r/20230412164348.328710-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: allow allocate_area() to fail properlyPeter Xu
Mostly to detect hugetlb allocation errors and skip hugetlb tests when pages are not allocated. Link: https://lkml.kernel.org/r/20230412164345.328659-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: let uffd_handle_page_fault() take wp parameterPeter Xu
Make the handler optionally apply WP bit when resolving page faults for either missing or minor page faults. This moves towards removing global test_uffdio_wp outside of the common code. Link: https://lkml.kernel.org/r/20230412164341.328618-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-04-18selftests/mm: rename uffd_stats to uffd_argsPeter Xu
Prepare for adding more fields into the struct. Link: https://lkml.kernel.org/r/20230412164337.328607-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>