summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2022-03-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes RISC-V: - Prevent KVM_COMPAT from being selected - Optimize __kvm_riscv_switch_to() implementation - RISC-V SBI v0.3 support s390: - memop selftest - fix SCK locking - adapter interruptions virtualization for secure guests - add Claudio Imbrenda as maintainer - first step to do proper storage key checking x86: - Continue switching kvm_x86_ops to static_call(); introduce static_call_cond() and __static_call_ret0 when applicable. - Cleanup unused arguments in several functions - Synthesize AMD 0x80000021 leaf - Fixes and optimization for Hyper-V sparse-bank hypercalls - Implement Hyper-V's enlightened MSR bitmap for nested SVM - Remove MMU auditing - Eager splitting of page tables (new aka "TDP" MMU only) when dirty page tracking is enabled - Cleanup the implementation of the guest PGD cache - Preparation for the implementation of Intel IPI virtualization - Fix some segment descriptor checks in the emulator - Allow AMD AVIC support on systems with physical APIC ID above 255 - Better API to disable virtualization quirks - Fixes and optimizations for the zapping of page tables: - Zap roots in two passes, avoiding RCU read-side critical sections that last too long for very large guests backed by 4 KiB SPTEs. - Zap invalid and defunct roots asynchronously via concurrency-managed work queue. - Allowing yielding when zapping TDP MMU roots in response to the root's last reference being put. - Batch more TLB flushes with an RCU trick. Whoever frees the paging structure now holds RCU as a proxy for all vCPUs running in the guest, i.e. to prolongs the grace period on their behalf. It then kicks the the vCPUs out of guest mode before doing rcu_read_unlock(). Generic: - Introduce __vcalloc and use it for very large allocations that need memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) KVM: use kvcalloc for array allocations KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 kvm: x86: Require const tsc for RT KVM: x86: synthesize CPUID leaf 0x80000021h if useful KVM: x86: add support for CPUID leaf 0x80000021 KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()" kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU KVM: arm64: fix typos in comments KVM: arm64: Generalise VM features into a set of flags KVM: s390: selftests: Add error memop tests KVM: s390: selftests: Add more copy memop tests KVM: s390: selftests: Add named stages for memop test KVM: s390: selftests: Add macro as abstraction for MEM_OP KVM: s390: selftests: Split memop tests KVM: s390x: fix SCK locking RISC-V: KVM: Implement SBI HSM suspend call RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function RISC-V: Add SBI HSM suspend related defines RISC-V: KVM: Implement SBI v0.3 SRST extension ...
2022-03-24Merge tag 'flexible-array-transformations-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull flexible-array transformations from Gustavo Silva: "Treewide patch that replaces zero-length arrays with flexible-array members. This has been baking in linux-next for a whole development cycle" * tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: treewide: Replace zero-length arrays with flexible-array members
2022-03-24Merge tag 'prlimit-tasklist_lock-for-v5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull tasklist_lock optimizations from Eric Biederman: "prlimit and getpriority tasklist_lock optimizations The tasklist_lock popped up as a scalability bottleneck on some testing workloads. The readlocks in do_prlimit and set/getpriority are not necessary in all cases. Based on a cycles profile, it looked like ~87% of the time was spent in the kernel, ~42% of which was just trying to get *some* spinlock (queued_spin_lock_slowpath, not necessarily the tasklist_lock). The big offenders (with rough percentages in cycles of the overall trace): - do_wait 11% - setpriority 8% (done previously in commit 7f8ca0edfe07) - kill 8% - do_exit 5% - clone 3% - prlimit64 2% (this patchset) - getrlimit 1% (this patchset) I can't easily test this patchset on the original workload for various reasons. Instead, I used the microbenchmark below to at least verify there was some improvement. This patchset had a 28% speedup (12% from baseline to set/getprio, then another 14% for prlimit). This series used to do the setpriority case, but an almost identical change was merged as commit 7f8ca0edfe07 ("kernel/sys.c: only take tasklist_lock for get/setpriority(PRIO_PGRP)") so that has been dropped from here. One interesting thing is that my libc's getrlimit() was calling prlimit64, so hoisting the read_lock(tasklist_lock) into sys_prlimit64 had no effect - it essentially optimized the older syscalls only. I didn't do that in this patchset, but figured I'd mention it since it was an option from the previous patch's discussion" micobenchmark.c: --------------- int main(int argc, char **argv) { pid_t child; struct rlimit rlim[1]; fork(); fork(); fork(); fork(); fork(); fork(); for (int i = 0; i < 5000; i++) { child = fork(); if (child < 0) exit(1); if (child > 0) { usleep(1000); kill(child, SIGTERM); waitpid(child, NULL, 0); } else { for (;;) { setpriority(PRIO_PROCESS, 0, getpriority(PRIO_PROCESS, 0)); getrlimit(RLIMIT_CPU, rlim); } } } return 0; } Link: https://lore.kernel.org/lkml/20211213220401.1039578-1-brho@google.com/ [v1] Link: https://lore.kernel.org/lkml/20220105212828.197013-1-brho@google.com/ [v2] Link: https://lore.kernel.org/lkml/20220106172041.522167-1-brho@google.com/ [v3] * tag 'prlimit-tasklist_lock-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: prlimit: do not grab the tasklist_lock prlimit: make do_prlimit() static
2022-03-24netfilter: egress: Report interface as outgoingPhil Sutter
Otherwise packets in egress chains seem like they are being received by the interface, not sent out via it. Fixes: 42df6e1d221dd ("netfilter: Introduce egress hook") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Florian Westphal <fw@strlen.de>
2022-03-24io_uring: remove IORING_CQE_F_MSGJens Axboe
This was introduced with the message ring opcode, but isn't strictly required for the request itself. The sender can encode what is needed in user_data, which is passed to the receiver. It's unclear if having a separate flag that essentially says "This CQE did not originate from an SQE on this ring" provides any real utility to applications. While we can always re-introduce a flag to provide this information, we cannot take it away at a later point in time. Remove the flag while we still can, before it's in a released kernel. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-23kexec: make crashk_res, crashk_low_res and crash_notes symbols always visibleJisheng Zhang
Patch series "kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef", v2. Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and increase compile coverage. I only modified x86, arm, arm64 and riscv, other architectures such as sh, powerpc and s390 are better to be kept kexec code as-is so they are not touched. This patch (of 5): Make the forward declarations of crashk_res, crashk_low_res and crash_notes always visible. Code referring to these symbols can then just check for IS_ENABLED(CONFIG_KEXEC_CORE), instead of requiring conditional compilation using an #ifdef, thus preparing to increase compile coverage and simplify the code. Link: https://lkml.kernel.org/r/20211206160514.2000-1-jszhang@kernel.org Link: https://lkml.kernel.org/r/20211206160514.2000-2-jszhang@kernel.org Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Acked-by: Baoquan He <bhe@redhat.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Palmer Dabbelt <palmer@rivosinc.com> Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23init.h: improve __setup and early_param documentationRandy Dunlap
Igor noted in [1] that there are quite a few __setup() handling functions that return incorrect values. Doing this can be harmless, but it can also cause strings to be added to init's argument or environment list, polluting them. Since __setup() handling and return values are not documented, first add documentation for that. Also add more documentation for early_param() handling and return values. For __setup() functions, returning 0 (not handled) has questionable value if it is just a malformed option value, as in rodata=junk since returning 0 would just cause "rodata=junk" to be added to init's environment unnecessarily: Run /sbin/init as init process with arguments: /sbin/init with environment: HOME=/ TERM=linux splash=native rodata=junk Also, there are no recommendations on whether to print a warning when an unknown parameter value is seen. I am not addressing that here. [1] lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Link: https://lkml.kernel.org/r/20220221050852.1147-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru> Cc: Ingo Molnar <mingo@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23bitfield: add explicit inclusions to the exampleAndy Shevchenko
It's not obvious that bitfield.h doesn't guarantee the bits.h inclusion and the example in the former is confusing. Some developers think that it's okay to just include bitfield.h to get it working. Change example to explicitly include necessary headers in order to avoid confusion. Link: https://lkml.kernel.org/r/20220207123341.47533-1-andriy.shevchenko@linux.intel.com Fixes: 3e9b3112ec74 ("add basic register-field manipulation macros") Depends-on: 8bd9cb51daac ("locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a new <linux/bits.h> file") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reported-by: Jan Dąbroś <jsd@semihalf.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23ilog2: force inlining of __ilog2_u32() and __ilog2_u64()Christophe Leroy
Building a kernel with CONFIG_CC_OPTIMISE_FOR_SIZE leads to __ilog2_u32() being duplicated 50 times and __ilog2_u64() 3 times in vmlinux on a tiny powerpc32 config. __ilog2_u32() being 2 instructions it is not worth being kept out of line, so force inlining. Allthough the u64 version is a bit bigger, there is still a small benefit in keeping it inlined. On a 64 bits config there's a real benefit. With this change the size of vmlinux text is reduced by 1 kbytes, which is approx 50% more than the size of the removed functions. Before the patch there is for instance: c00d2a94 <__ilog2_u32>: c00d2a94: 7c 63 00 34 cntlzw r3,r3 c00d2a98: 20 63 00 1f subfic r3,r3,31 c00d2a9c: 4e 80 00 20 blr c00d36d8 <__order_base_2>: c00d36d8: 28 03 00 01 cmplwi r3,1 c00d36dc: 40 81 00 2c ble c00d3708 <__order_base_2+0x30> c00d36e0: 94 21 ff f0 stwu r1,-16(r1) c00d36e4: 7c 08 02 a6 mflr r0 c00d36e8: 38 63 ff ff addi r3,r3,-1 c00d36ec: 90 01 00 14 stw r0,20(r1) c00d36f0: 4b ff f3 a5 bl c00d2a94 <__ilog2_u32> c00d36f4: 80 01 00 14 lwz r0,20(r1) c00d36f8: 38 63 00 01 addi r3,r3,1 c00d36fc: 7c 08 03 a6 mtlr r0 c00d3700: 38 21 00 10 addi r1,r1,16 c00d3704: 4e 80 00 20 blr c00d3708: 38 60 00 00 li r3,0 c00d370c: 4e 80 00 20 blr With the patch it has become: c00d356c <__order_base_2>: c00d356c: 28 03 00 01 cmplwi r3,1 c00d3570: 40 81 00 14 ble c00d3584 <__order_base_2+0x18> c00d3574: 38 63 ff ff addi r3,r3,-1 c00d3578: 7c 63 00 34 cntlzw r3,r3 c00d357c: 20 63 00 20 subfic r3,r3,32 c00d3580: 4e 80 00 20 blr c00d3584: 38 60 00 00 li r3,0 c00d3588: 4e 80 00 20 blr No more need for __order_base_2() to setup a stack frame and save/restore caller address. And the following 'add 1' is merged in the subtract. Another typical use of it: c080ff28 <hugepagesz_setup>: ... c080fff8: 7f c3 f3 78 mr r3,r30 c080fffc: 4b 8f 81 f1 bl c01081ec <__ilog2_u32> c0810000: 38 63 ff f2 addi r3,r3,-14 ... Becomes c080ff1c <hugepagesz_setup>: ... c080ffec: 7f c3 00 34 cntlzw r3,r30 c080fff0: 20 63 00 11 subfic r3,r3,17 ... Here no need to move r30 argument to r3 then substract 14 to result. Just work on r30 and merge the 'sub 14' with the 'sub from 31'. Link: https://lkml.kernel.org/r/803a2ac3d923ebcfd0dd40f5886b05cae7bb0aba.1644243860.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23include: drop pointless __compiler_offsetof indirectionRasmus Villemoes
(1) compiler_types.h is unconditionally included via an -include flag (see scripts/Makefile.lib), and it defines __compiler_offsetof unconditionally. So testing for definedness of __compiler_offsetof is mostly pointless. (2) Every relevant compiler provides __builtin_offsetof (even sparse has had that for 14 years), and if for whatever reason one would end up picking up the poor man's fallback definition (C file compiler with completely custom CFLAGS?), newer clang versions won't treat the result as an Integer Constant Expression, so if used in place where such is required (static initializer or static_assert), one would get errors like t.c:11:16: error: static_assert expression is not an integral constant expression t.c:11:16: note: cast that performs the conversions of a reinterpret_cast is not allowed in a constant expression t.c:4:33: note: expanded from macro 'offsetof' #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) So just define offsetof unconditionally and directly in terms of __builtin_offsetof. Link: https://lkml.kernel.org/r/20220202102147.326672-1-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23Documentation/sparse: add hints about __CHECKER__Bjorn Helgaas
Several attributes depend on __CHECKER__, but previously there was no clue in the tree about when __CHECKER__ might be defined. Add hints at the most common places (__kernel, __user, __iomem, __bitwise) and in the sparse documentation. Link: https://lkml.kernel.org/r/20220310220927.245704-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: "Michael S . Tsirkin" <mst@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23linux/types.h: remove unnecessary __bitwise__Bjorn Helgaas
There are no users of "__bitwise__" except the definition of "__bitwise". Remove __bitwise__ and define __bitwise directly. This is a follow-up to 05de97003c77 ("linux/types.h: enable endian checks for all sparse builds"). [akpm@linux-foundation.org: change the tools/include/linux/types.h definition also] Link: https://lkml.kernel.org/r/20220310220927.245704-2-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23Merge tag 'arm-dt-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/socLinus Torvalds
Pull ARM devicetree updates from Arnd Bergmann: "After a somewhat quiet 5.17 release, the size of the DT changes is a bit larger again. There are nine new SoC that get added, all of them related to existing platforms: - Airoha (formerly Mediatek/EcoNet) EN7523 networking SoC and EVB - Mediatek mt6582 tablet platform with the Prestigio PMT5008 3G tablet - Microchip Lan966 networking SoC and it evaluation board - Qualcomm Snapdragon 625/632 midrange phone SoCs, with the LG Nexus 5X and Fairphone FP3 phones - Renesas RZ/G2LC and RZ/V2L general-purpose embedded SoCs, along with their evaluation boards - Samsung Exynos 850 phone SoC and reference board - Samsung Exynos7885 with the Samsung Galaxy A8 (2018) phone - Tesla FSD (Fully Self-Driving), an automotive SoC loosely derived from the Samsung Exynos family. - TI K3/AM62 SoC and reference board Support for additional functionality in existing dts files is added all over the place: Samsung, Renesas, Mstar, wpcm450, OMAP, AT91, Allwinner, i.MX, Tegra, Aspeed, Oxnas, Qualcomm, Mediatek, and Broadcom. Samsung has a rework for its pinctrl schema that is a bit tricky and requires driver changes to be included here. A few more platforms only have smaller cleanups and DT Schema fixes, this includes SoCFPGA, ux500, ixp4xx, STi, Xilinx Zynq, LG, and Juno. The new machines are really too many to list, but I'll do it anyway: Allwinner: - A20-Marsboard development board Amlogic: - Amediatek X96-AIR (Amlogic S905X3) - CYX A95XF3-AIR (Amlogic S905X3) - Haochuangy H96-Max (Amlogic S905X3) - Amlogic AQ222 (Amlogic S4) - OSMC Vero 4K+ (Amlogic S905D) Arm Juno: - Separate DT depending on SCMI firmware version Aspeed: - Quanta S6Q BMC (AST2600) - ASRock ROMED8HM3 (AST2500) Broadcom: - Raspberry Pi Zero 2 W Marvell MVEBU/Armada: - Ctera C200 V1 NAS (kirkwood) - Ctera C200 V2 NAS (armada-370) Mstar: - DongShanPiOne, a low-end embedded board - Miyoo Mini handheld game console NXP i.MX: - Numerous i.MX8M Mini based boards in even more variations, but none based on other SoCs this time: Protonic PRT8MM, emCON-MX8M Mini, Toradex Verdin, and Gateworks GW7903 Qualcomm: - Google Herobrine R1 Chromebook platform (Snapdragon 7c Gen 3) - SHIFT6mq phone (Snapdragon 845) - Samsung Galaxy Book2 (Snapdragon 850) - Snapdragon 8 Gen 1 Hardware Development Kit TI OMAP: - SanCloud BeagleBone Enhanced WiFi Rockchip: - Pine64 PineNote ereader tablet (rk356x) - Bananapi-R2-Pro (rk356x) STM32: - emtrion emSBS-Argon embedded board (stm32mp157c)" * tag 'arm-dt-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (627 commits) arm64: dts: n5x: drop invalid property and fix edac node name arm64: dts: fsd: Add the MCT support arm64: dts: stingray: Fix spi clock name arm64: dts: ns2: Fix spi clock name ARM: dts: rockchip: Update regulator name for PX3 ARM: dts: rockchip: Add #clock-cells value for rk805 arm64: dts: rockchip: Add #clock-cells value for rk805 arm64: dts: rockchip: Remove vcc13 and vcc14 for rk808 arm64: dts: rockchip: Fix SDIO regulator supply properties on rk3399-firefly ARM: dts: at91: sama7g5: Add NAND support ARM: dts: at91: sama7g5: add eic node ARM: dts: at91: sama7g5: Remove unused properties in i2c nodes ARM: dts: at91: sam9x60ek: modify vdd_1v5 regulator to vdd_1v15 arm64: dts: lg: align pl330 node name with dtschema arm64: dts: lg: add dma-cells to pl330 node arm64: dts: juno: align pl330 node name with dtschema arm64: dts: broadcom: Fix sata nodename arm64: dts: n5x: add sdr edac support arm64: dts: agilex/stratix10: add clock-names to USB DWC2 node dt-bindings: usb: dwc2: add disable-over-current ...
2022-03-23Merge tag 'arm-drivers-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM driver updates from Arnd Bergmann: "There are a few separately maintained driver subsystems that we merge through the SoC tree, notable changes are: - Memory controller updates, mainly for Tegra and Mediatek SoCs, and clarifications for the memory controller DT bindings - SCMI firmware interface updates, in particular a new transport based on OPTEE and support for atomic operations. - Cleanups to the TEE subsystem, refactoring its memory management For SoC specific drivers without a separate subsystem, changes include - Smaller updates and fixes for TI, AT91/SAMA5, Qualcomm and NXP Layerscape SoCs. - Driver support for Microchip SAMA5D29, Tesla FSD, Renesas RZ/G2L, and Qualcomm SM8450. - Better power management on Mediatek MT81xx, NXP i.MX8MQ and older NVIDIA Tegra chips" * tag 'arm-drivers-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (154 commits) ARM: spear: fix typos in comments soc/microchip: fix invalid free in mpfs_sys_controller_delete soc: s4: Add support for power domains controller dt-bindings: power: add Amlogic s4 power domains bindings ARM: at91: add support in soc driver for new SAMA5D29 soc: mediatek: mmsys: add sw0_rst_offset in mmsys driver data dt-bindings: memory: renesas,rpc-if: Document RZ/V2L SoC memory: emif: check the pointer temp in get_device_details() memory: emif: Add check for setup_interrupts dt-bindings: arm: mediatek: mmsys: add support for MT8186 dt-bindings: mediatek: add compatible for MT8186 pwrap soc: mediatek: pwrap: add pwrap driver for MT8186 SoC soc: mediatek: mmsys: add mmsys reset control for MT8186 soc: mediatek: mtk-infracfg: Disable ACP on MT8192 soc: ti: k3-socinfo: Add AM62x JTAG ID soc: mediatek: add MTK mutex support for MT8186 soc: mediatek: mmsys: add mt8186 mmsys routing table soc: mediatek: pm-domains: Add support for mt8186 dt-bindings: power: Add MT8186 power domains soc: mediatek: pm-domains: Add support for mt8195 ...
2022-03-23Merge tag 'arm-soc-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC updates from Arnd Bergmann: "SoC specific code is generally used for older platforms that don't (yet) use device tree to do the same things. - Support is added for i.MXRT10xx, a Cortex-M7 based microcontroller from NXP. At the moment this is still incomplete as other portions are merged through different trees. - Long abandoned support for running NOMMU ARMv4 or ARMv5 platforms gets removed, now the Arm NOMMU platforms are limited to the Cortex-M family of microcontrollers - Two old PXA boards get removed, along with corresponding driver bits. - Continued cleanup of the Intel IXP4xx platforms, removing some remnants of the old board files. - Minor Cleanups and fixes for Orion, PXA, MMP, Mstar, Samsung - CPU idle support for AT91 - A system controller driver for Polarfire" * tag 'arm-soc-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (29 commits) ARM: remove support for NOMMU ARMv4/v5 ARM: PXA: fix up decompressor code soc: microchip: make mpfs_sys_controller_put static ARM: pxa: remove Intel Imote2 and Stargate 2 boards ARM: mmp: Fix failure to remove sram device ARM: mstar: Select ARM_ERRATA_814220 soc: add microchip polarfire soc system controller ARM: at91: Kconfig: select PM_OPP ARM: at91: PM: add cpu idle support for sama7g5 ARM: at91: ddr: fix typo to align with datasheet naming ARM: at91: ddr: align macro definitions ARM: at91: ddr: remove CONFIG_SOC_SAMA7 dependency ARM: ixp4xx: Convert to SPARSE_IRQ and P2V ARM: ixp4xx: Drop all common code ARM: ixp4xx: Drop custom DMA coherency and bouncing ARM: ixp4xx: Remove feature bit accessors net: ixp4xx_hss: Check features using syscon net: ixp4xx_eth: Drop platform data support soc: ixp4xx-npe: Access syscon regs using regmap soc: ixp4xx: Add features from regmap helper ...
2022-03-23Merge tag 'asm-generic-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "There are three sets of updates for 5.18 in the asm-generic tree: - The set_fs()/get_fs() infrastructure gets removed for good. This was already gone from all major architectures, but now we can finally remove it everywhere, which loses some particularly tricky and error-prone code. There is a small merge conflict against a parisc cleanup, the solution is to use their new version. - The nds32 architecture ends its tenure in the Linux kernel. The hardware is still used and the code is in reasonable shape, but the mainline port is not actively maintained any more, as all remaining users are thought to run vendor kernels that would never be updated to a future release. - A series from Masahiro Yamada cleans up some of the uapi header files to pass the compile-time checks" * tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits) nds32: Remove the architecture uaccess: remove CONFIG_SET_FS ia64: remove CONFIG_SET_FS support sh: remove CONFIG_SET_FS support sparc64: remove CONFIG_SET_FS support lib/test_lockup: fix kernel pointer check for separate address spaces uaccess: generalize access_ok() uaccess: fix type mismatch warnings from access_ok() arm64: simplify access_ok() m68k: fix access_ok for coldfire MIPS: use simpler access_ok() MIPS: Handle address errors for accesses above CPU max virtual user address uaccess: add generic __{get,put}_kernel_nofault nios2: drop access_ok() check from __put_user() x86: use more conventional access_ok() definition x86: remove __range_not_ok() sparc64: add __{get,put}_kernel_nofault() nds32: fix access_ok() checks in get/put_user uaccess: fix nios2 and microblaze get_user_8() sparc64: fix building assembly files ...
2022-03-23Merge tag 'sound-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "It's been a fairly calm development cycle. There are a few last-minute ALSA core fixes, most notably for covering PCM ioctl races, but the most of rest are device-specific changes. Below are some highlights: ALSA core: - Fixes for PCM ioctl races that may lead to UAF - Fix for oversized allocations in PCM OSS layer ASoC: - Start of moving SoF to support multiple IPC mechanisms - Use of NHLT ACPI table to reduce the amount of quirking required for Intel systems - Preliminary works forthcoming Intel AVS driver for legacy Intel DSP firmwares - Support for AMD PDM, Atmel PDMC, Awinic AW8738, i.MX cards with TLV320AIC31xx, Intel machines with CS35L41 and ESSX8336, Mediatek MT8181 wideband bluetooth, nVidia Tegra234, Qualcomm SC7280, Renesas RZ/V2L, Texas Instruments TAS585M HD-audio: - Driver re-binding fix for HD-audio - Updates for Intel ADL and Tegra234, various platform quirks for Dell, HP, Lenovo, ASUS, Samsung and Clevo machines USB-audio: - Quirk updates for Scarlett2, RODE, Corsair devices" * tag 'sound-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (486 commits) ALSA: hda/realtek: Add alc256-samsung-headphone fixup ALSA: pci: fix reading of swapped values from pcmreg in AC97 codec ALSA: pcm: Add stream lock during PCM reset ioctl operations ALSA: pcm: Fix races among concurrent prealloc proc writes ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free calls ALSA: pcm: Fix races among concurrent read/write and buffer changes ALSA: pcm: Fix races among concurrent hw_params and hw_free calls ASoC: atmel: mchp-pdmc: print the correct property name MAINTAINERS: Add Shengjiu to maintainer list of sound/soc/fsl ASoC: SOF: Add a new dai_get_clk topology IPC op ASoC: SOF: topology: Add ops for setting up and tearing down pipelines ASoC: SOF: expose sof_route_setup() ASoC: SOF: Add dai_link_fixup PCM op for IPC3 ASoC: SOF: Add trigger PCM op for IPC3 ASoC: SOF: Define hw_params PCM op for IPC3 ASoC: SOF: Introduce IPC3 PCM hw_free op ASoC: SOF: pcm: expose the sof_pcm_setup_connected_widgets() function ASoC: SOF: Introduce IPC-specific PCM ops ASoC: SOF: Add bytes_ext control IPC ops for IPC3 ASoC: SOF: Add bytes_get/put control IPC ops for IPC3 ...
2022-03-23Merge tag 'media/v5.18-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - a major reorg at platform Kconfig/Makefile files, organizing them per vendor. The other media Kconfig/Makefile files also sorted - New sensor drivers: hi847, isl7998x, ov08d10 - New Amphion vpu decoder stateful driver - New Atmel microchip csi2dc driver - tegra-vde driver promoted from staging - atomisp: some fixes for it to work on BYT - imx7-mipi-csis driver promoted from staging and renamed - camss driver got initial support for VFE hardware version Titan 480 - mtk-vcodec has gained support for MT8192 - lots of driver changes, fixes and improvements * tag 'media/v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (417 commits) media: nxp: Restrict VIDEO_IMX_MIPI_CSIS to ARCH_MXC or COMPILE_TEST media: amphion: cleanup media device if register it fail media: amphion: fix some issues to improve robust media: amphion: fix some error related with undefined reference to __divdi3 media: amphion: fix an issue that using pm_runtime_get_sync incorrectly media: vidtv: use vfree() for memory allocated with vzalloc() media: m5mols/m5mols.h: document new reset field media: pixfmt-yuv-planar.rst: fix PIX_FMT labels media: platform: Remove unnecessary print function dev_err() media: amphion: Add missing of_node_put() in vpu_core_parse_dt() media: mtk-vcodec: Add missing of_node_put() in mtk_vdec_hw_prob_done() media: platform: amphion: Fix build error without MAILBOX media: spi: Kconfig: Place SPI drivers on a single menu media: i2c: Kconfig: move camera drivers to the top media: atomisp: fix bad usage at error handling logic media: platform: rename mediatek/mtk-jpeg/ to mediatek/jpeg/ media: media/*/Kconfig: sort entries media: Kconfig: cleanup VIDEO_DEV dependencies media: platform/*/Kconfig: make manufacturer menus more uniform media: platform: Create vendor/{Makefile,Kconfig} files ...
2022-03-23Merge tag 'ata-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata Pull ata updates from Damien Le Moal: "For this cycle, no big change but many small fixes and code cleanup to libata, the ahci driver and various pata drivers. In more details: - Code simplification in pata_platform using platform_get_mem_or_io(), from Lad. - Fix read-only arrays declarations as const in pata_atiixp and pata_pdc202xx_old, from Colin. - Various cleanups and code simplification in libata-scsi, from me. - Remove dead code in libata-acpi, from Sergey. - Skip device scan deboune delay for Marvell 88SE9235 adapters (ahci) to speedup boot, from Paul. - Simplify functions declaration and use for functions always returning 0 in libata-core, from Sergey. - Non-fatal error fixes and in the pata_hpt366 and pata_hpt3x2n drivers, from Sergey. - Various code cleanup in the pata_artop, pata_hpt37x, pata_hpt366, pata_hpt3x2n, pata_samsung_cf and sata_rcar drivers, from Sergey. - Some libata-sff and libata-scsi code cleanup (e.g. change functions to return "bool"), from Sergey. - Renae ahci_board_mobile to board_ahci_low_power to be more descriptive of the feature as that is also used on PC and server AHCI adapters, from Mario. - Cleanup of OF match tables, from Geert. - Simplify the pata_pxa driver initialization using platform_get_irq(), from Minghao" * tag 'ata-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (38 commits) ata: pata_pxa: Use platform_get_irq() to get the interrupt ata: Drop commas after OF match table sentinels ata: ahci: Rename CONFIG_SATA_LPM_MOBILE_POLICY configuration item ata: ahci: Rename `AHCI_HFLAG_IS_MOBILE` ata: ahci: Rename board_ahci_mobile ata: pata_hpt37x: merge transfer mode setting methods ata: libata-sff: use *switch* statement in ata_sff_dev_classify() ata: add/use ata_taskfile::{error|status} fields ata: Kconfig: fix sata gemini compile test condition ata: libata-scsi: use *switch* statements to check SCSI command codes ata: libata-sff: refactor ata_sff_altstatus() ata: libata-sff: refactor ata_sff_set_devctl() ata: libata-sff: make ata_resources_present() return 'bool' ata: pata_hpt3x2n: disable fast interrupts in prereset() method ata: pata_hpt37x: disable fast interrupts in prereset() method ata: pata_hpt366: disable fast interrupts in prereset() method ata: pata_mpc52xx: use GFP_KERNEL ata: sata_rcar: drop unused #define's ata: pata_hpt366: check channel enable bits ata: sata_rcar: make sata_rcar_ata_devchk() return 'bool' ...
2022-03-23Merge tag 'linux-kselftest-kunit-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull KUnit updates from Shuah Khan: - changes to decrease macro layering string, integer, EQ/NE asserts - remove unused macros - several cleanups and fixes - new list tests for list_del_init_careful(), list_is_head() and list_entry_is_head() * tag 'linux-kselftest-kunit-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: list: test: Add a test for list_entry_is_head() list: test: Add a test for list_is_head() list: test: Add test for list_del_init_careful() kunit: cleanup assertion macro internal variables kunit: factor out str constants from binary assertion structs kunit: consolidate KUNIT_INIT_BINARY_ASSERT_STRUCT macros kunit: remove va_format from kunit_assert kunit: tool: drop mostly unused KunitResult.result field kunit: decrease macro layering for EQ/NE asserts kunit: decrease macro layering for integer asserts kunit: reduce layering in string assertion macros kunit: drop unused intermediate macros for ptr inequality checks kunit: make KUNIT_EXPECT_EQ() use KUNIT_EXPECT_EQ_MSG(), etc. kunit: drop unused assert_type from kunit_assert and clean up macros kunit: split out part of kunit_assert into a static const kunit: factor out kunit_base_assert_format() call into kunit_fail() kunit: drop unused kunit* field in kunit_assert kunit: move check if assertion passed into the macros kunit: add example test case showing off all the expect macros
2022-03-23Merge tag 'slab-for-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - A few non-trivial SLUB code cleanups, most notably a refactoring of deactivate_slab(). - A bunch of trivial changes, such as removal of unused parameters, making stuff static, and employing helper functions. * tag 'slab-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm: slub: Delete useless parameter of alloc_slab_page() mm: slab: Delete unused SLAB_DEACTIVATED flag mm/slub: remove forced_order parameter in calculate_sizes mm/slub: refactor deactivate_slab() mm/slub: limit number of node partial slabs only in cache creation mm/slub: use helper macro __ATTR_XX_MODE for SLAB_ATTR(_RO) mm/slab_common: use helper function is_power_of_2() mm/slob: make kmem_cache_boot static
2022-03-23drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not setKajol Jain
The following build failure occurs when CONFIG_PERF_EVENTS is not set as generic pmu functions are not visible in that scenario. |-- s390-randconfig-r044-20220313 | |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_migrate_context | |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_register | `-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_unregister Similar build failure in nds32 architecture: nd_perf.c:(.text+0x21e): undefined reference to `perf_pmu_migrate_context' nd_perf.c:(.text+0x434): undefined reference to `perf_pmu_register' nd_perf.c:(.text+0x57c): undefined reference to `perf_pmu_unregister' Fix this issue by adding check for CONFIG_PERF_EVENTS config option and disabling the nvdimm perf interface incase this config is not set. Also remove function declaration of perf_pmu_migrate_context, perf_pmu_register, perf_pmu_unregister functions from nd.h as these are common pmu functions which are part of perf_event.h and since we are disabling nvdimm perf interface incase CONFIG_PERF_EVENTS option is not set, we not need to declare them in nd.h Also move the platform_device header file addition part from nd.h to nd_perf.c and add stub functions for register_nvdimm_pmu and unregister_nvdimm_pmu functions to handle CONFIG_PERF_EVENTS=n case. Fixes: 0fab1ba6ad6b ("drivers/nvdimm: Add perf interface to expose nvdimm performance stats") (Commit id based on libnvdimm-for-next tree) Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Link: https://lore.kernel.org/all/62317124.YBQFU33+s%2FwdvWGj%25lkp@intel.com/ Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/r/20220323164550.109768-1-kjain@linux.ibm.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-03-23rtc: remove uie_unsupportedAlexandre Belloni
uie_unsupported is not used by any drivers anymore, remove it. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220309162301.61679-29-alexandre.belloni@bootlin.com
2022-03-23rtc: add new RTC_FEATURE_ALARM_WAKEUP_ONLY featureAlexandre Belloni
Some RTCs have an IRQ pin that is not connected to a CPU interrupt but rather directly to a PMIC or power supply. In that case, it is still useful to be able to set alarms but we shouldn't expect interrupts. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220309162301.61679-22-alexandre.belloni@bootlin.com
2022-03-23rtc: ds1685: drop no_irqAlexandre Belloni
No platforms are currently setting no_irq. Anyway, letting platform_get_irq fail is fine as this means that there is no IRQ. In that case, clear RTC_FEATURE_ALARM so the core knows there are no alarms. Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220309162301.61679-2-alexandre.belloni@bootlin.com
2022-03-23clk: sunxi-ng: Add support for the sun6i RTC clocksSamuel Holland
The RTC power domain in sun6i and newer SoCs manages the 16 MHz RC oscillator (called "IOSC" or "osc16M") and the optional 32 kHz crystal oscillator (called "LOSC" or "osc32k"). Starting with the H6, this power domain also handles the 24 MHz DCXO (called variously "HOSC", "dcxo24M", or "osc24M") as well. The H6 also adds a calibration circuit for IOSC. Later SoCs introduce further variations on the design: - H616 adds an additional mux for the 32 kHz fanout source. - R329 adds an additional mux for the RTC timekeeping clock, a clock for the SPI bus between power domains inside the RTC, and removes the IOSC calibration functionality. Take advantage of the CCU framework to handle this increased complexity. This driver is intended to be a drop-in replacement for the existing RTC clock provider. So some runtime adjustment of the clock parents is needed, both to handle hardware differences, and to support the old binding which omitted some of the input clocks. Signed-off-by: Samuel Holland <samuel@sholland.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220203021736.13434-6-samuel@sholland.org
2022-03-23Merge tag 'trace-v5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - New user_events interface. User space can register an event with the kernel describing the format of the event. Then it will receive a byte in a page mapping that it can check against. A privileged task can then enable that event like any other event, which will change the mapped byte to true, telling the user space application to start writing the event to the tracing buffer. - Add new "ftrace_boot_snapshot" kernel command line parameter. When set, the tracing buffer will be saved in the snapshot buffer at boot up when the kernel hands things over to user space. This will keep the traces that happened at boot up available even if user space boot up has tracing as well. - Have TRACE_EVENT_ENUM() also update trace event field type descriptions. Thus if a static array defines its size with an enum, the user space trace event parsers can still know how to parse that array. - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the TRACE_EVENT() macro, but will attach to an existing tracepoint. This will make one tracepoint be able to trace different content and not be stuck at only what the original TRACE_EVENT() macro exports. - Fixes to tracing error logging. - Better saving of cmdlines to PIDs when tracing (use the wakeup events for mapping). * tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits) tracing: Have type enum modifications copy the strings user_events: Add trace event call as root for low permission cases tracing/user_events: Use alloc_pages instead of kzalloc() for register pages tracing: Add snapshot at end of kernel boot up tracing: Have TRACE_DEFINE_ENUM affect trace event types as well tracing: Fix strncpy warning in trace_events_synth.c user_events: Prevent dyn_event delete racing with ioctl add/delete tracing: Add TRACE_CUSTOM_EVENT() macro tracing: Move the defines to create TRACE_EVENTS into their own files tracing: Add sample code for custom trace events tracing: Allow custom events to be added to the tracefs directory tracing: Fix last_cmd_set() string management in histogram code user_events: Fix potential uninitialized pointer while parsing field tracing: Fix allocation of last_cmd in last_cmd_set() user_events: Add documentation file user_events: Add sample code for typical usage user_events: Add self-test for validator boundaries user_events: Add self-test for perf_event integration user_events: Add self-test for dynamic_events integration user_events: Add self-test for ftrace integration ...
2022-03-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in overtime fixes, no conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23net/sched: fix incorrect vlan_push_eth dest fieldLouis Peens
Seems like a potential copy-paste bug slipped in here, the second memcpy should of course be populating src and not dest. Fixes: ab95465cde23 ("net/sched: add vlan push_eth and pop_eth action to the hardware IR") Signed-off-by: Louis Peens <louis.peens@corigine.com> Link: https://lore.kernel.org/r/20220323092506.21639-1-louis.peens@corigine.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-23cacheflush.h: Add forward declaration for struct folioHerbert Xu
The struct folio is not declared in cacheflush.h so we need to provide a forward declaration as otherwise users of this header file may get warnings. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: 522a0032af00 ("Add linux/cacheflush.h") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-23Merge tag 'nand/for-5.18' into mtd/nextMiquel Raynal
Raw NAND core changes: * Rework of_get_nand_bus_width() * Remove of_get_nand_on_flash_bbt() wrapper * Protect access to rawnand devices while in suspend * bindings: Document the wp-gpios property Rax NAND controller driver changes: * atmel: Fix refcount issue in atmel_nand_controller_init * nandsim: - Add NS_PAGE_BYTE_SHIFT macro to replace the repeat pattern - Merge repeat codes in ns_switch_state - Replace overflow check with kzalloc to single kcalloc * rockchip: Fix platform_get_irq.cocci warning * stm32_fmc2: Add NAND Write Protect support * pl353: Set the nand chip node as the flash node * brcmnand: Fix sparse warnings in bcma_nand * omap_elm: Remove redundant variable 'errors' * gpmi: - Support fast edo timings for mx28 - Validate controller clock rate - Fix controller timings setting * brcmnand: - Add BCMA shim - BCMA controller uses command shift of 0 - Allow platform data instantation - Add platform data structure for BCMA - Allow working without interrupts - Move OF operations out of brcmnand_init_cs() - Avoid pdev in brcmnand_init_cs() - Allow SoC to provide I/O operations - Assign soc as early as possible Onenand changes: * Check for error irq Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2022-03-23mfd: db8500-prcmu: Remove unused inline functionYueHaibing
commit b0e846248de5 ("mfd: db8500-prcmu: Remove dead code for a non-existing config") left behind this, remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://lore.kernel.org/r/20220311125518.31064-1-yuehaibing@huawei.com
2022-03-22block: avoid calling blkg_free() in atomic contextMing Lei
blkg_free() can currently be called in atomic context, either spin lock is held, or run in rcu callback. Meantime either request queue's release handler or ->pd_free_fn can sleep. Fix the issue by scheduling a work function for freeing blkcg_gq the instance. [ 148.553894] BUG: sleeping function called from invalid context at block/blk-sysfs.c:767 [ 148.557381] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/13 [ 148.560741] preempt_count: 101, expected: 0 [ 148.562577] RCU nest depth: 0, expected: 0 [ 148.564379] 1 lock held by swapper/13/0: [ 148.566127] #0: ffffffff82615f80 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire+0x0/0x1b [ 148.569640] Preemption disabled at: [ 148.569642] [<ffffffff8123f9c3>] ___slab_alloc+0x554/0x661 [ 148.573559] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Not tainted 5.17.0_up+ #110 [ 148.576834] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014 [ 148.579768] Call Trace: [ 148.580567] <IRQ> [ 148.581262] dump_stack_lvl+0x56/0x7c [ 148.582367] ? ___slab_alloc+0x554/0x661 [ 148.583526] __might_resched+0x1af/0x1c8 [ 148.584678] blk_release_queue+0x24/0x109 [ 148.585861] kobject_cleanup+0xc9/0xfe [ 148.586979] blkg_free+0x46/0x63 [ 148.587962] rcu_do_batch+0x1c5/0x3db [ 148.589057] rcu_core+0x14a/0x184 [ 148.590065] __do_softirq+0x14d/0x2c7 [ 148.591167] __irq_exit_rcu+0x7a/0xd4 [ 148.592264] sysvec_apic_timer_interrupt+0x82/0xa5 [ 148.593649] </IRQ> [ 148.594354] <TASK> [ 148.595058] asm_sysvec_apic_timer_interrupt+0x12/0x20 Cc: Tejun Heo <tj@kernel.org> Fixes: 0a9a25ca7843 ("block: let blkcg_gq grab request queue's refcnt") Reported-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-block/20220322093322.GA27283@lst.de/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220323011308.2010380-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-22Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull filesystem folio updates from Matthew Wilcox: "Primarily this series converts some of the address_space operations to take a folio instead of a page. Notably: - a_ops->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. - a_ops->invalidatepage() becomes ->invalidate_folio() and has a similar type change. - a_ops->launder_page() becomes ->launder_folio() - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request" * tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits) fs: Remove aops ->set_page_dirty fb_defio: Use noop_dirty_folio() fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio fs: Convert __set_page_dirty_buffers to block_dirty_folio nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio() mm: Convert swap_set_page_dirty() to swap_dirty_folio() ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio() btrfs: Convert extent_range_redirty_for_io() to use folios fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio btrfs: Convert from set_page_dirty to dirty_folio fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() fs: Add aops->dirty_folio fs: Remove aops->launder_page orangefs: Convert launder_page to launder_folio nfs: Convert from launder_page to launder_folio fuse: Convert from launder_page to launder_folio ...
2022-03-22Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull folio updates from Matthew Wilcox: - Rewrite how munlock works to massively reduce the contention on i_mmap_rwsem (Hugh Dickins): https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/ - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig): https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/ - Convert GUP to use folios and make pincount available for order-1 pages. (Matthew Wilcox) - Convert a few more truncation functions to use folios (Matthew Wilcox) - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox) - Convert rmap_walk to use folios (Matthew Wilcox) - Convert most of shrink_page_list() to use a folio (Matthew Wilcox) - Add support for creating large folios in readahead (Matthew Wilcox) * tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits) mm/damon: minor cleanup for damon_pa_young selftests/vm/transhuge-stress: Support file-backed PMD folios mm/filemap: Support VM_HUGEPAGE for file mappings mm/readahead: Switch to page_cache_ra_order mm/readahead: Align file mappings for non-DAX mm/readahead: Add large folio readahead mm: Support arbitrary THP sizes mm: Make large folios depend on THP mm: Fix READ_ONLY_THP warning mm/filemap: Allow large folios to be added to the page cache mm: Turn can_split_huge_page() into can_split_folio() mm/vmscan: Convert pageout() to take a folio mm/vmscan: Turn page_check_references() into folio_check_references() mm/vmscan: Account large folios correctly mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios mm/vmscan: Free non-shmem folios without splitting them mm/rmap: Constify the rmap_walk_control argument mm/rmap: Convert rmap_walk() to take a folio mm: Turn page_anon_vma() into folio_anon_vma() mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read() ...
2022-03-22net/mlx5e: Fix build warning, detected write beyond size of fieldSaeed Mahameed
When merged with Linus tree, the cited patch below will cause the following build warning: In function 'fortify_memset_chk', inlined from 'mlx5e_xmit_xdp_frame' at drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c:438:3: include/linux/fortify-string.h:242:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix that by grouping the fields to memeset in struct_group() to avoid the false alarm. Fixes: 9ded70fa1d81 ("net/mlx5e: Don't prefill WQEs in XDP SQ in the multi buffer mode") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20220322172224.31849-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-22Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ...
2022-03-22mm/damon/core: add number of each enum type valuesSeongJae Park
This commit declares the number of legal values for each DAMON enum types to make traversals of such DAMON enum types easy and safe. Link: https://lkml.kernel.org/r/20220228081314.5770-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon/core: allow non-exclusive DAMON start/stopSeongJae Park
Patch series "Introduce DAMON sysfs interface", v3. Introduction ============ DAMON's debugfs-based user interface (DAMON_DBGFS) served very well, so far. However, it unnecessarily depends on debugfs, while DAMON is not aimed to be used for only debugging. Also, the interface receives multiple values via one file. For example, schemes file receives 18 values. As a result, it is inefficient, hard to be used, and difficult to be extended. Especially, keeping backward compatibility of user space tools is getting only challenging. It would be better to implement another reliable and flexible interface and deprecate DAMON_DBGFS in long term. For the reason, this patchset introduces a sysfs-based new user interface of DAMON. The idea of the new interface is, using directory hierarchies and having one dedicated file for each value. For a short example, users can do the virtual address monitoring via the interface as below: # cd /sys/kernel/mm/damon/admin/ # echo 1 > kdamonds/nr_kdamonds # echo 1 > kdamonds/0/contexts/nr_contexts # echo vaddr > kdamonds/0/contexts/0/operations # echo 1 > kdamonds/0/contexts/0/targets/nr_targets # echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target # echo on > kdamonds/0/state A brief representation of the files hierarchy of DAMON sysfs interface is as below. Childs are represented with indentation, directories are having '/' suffix, and files in each directory are separated by comma. /sys/kernel/mm/damon/admin │ kdamonds/nr_kdamonds │ │ 0/state,pid │ │ │ contexts/nr_contexts │ │ │ │ 0/operations │ │ │ │ │ monitoring_attrs/ │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us │ │ │ │ │ │ nr_regions/min,max │ │ │ │ │ targets/nr_targets │ │ │ │ │ │ 0/pid_target │ │ │ │ │ │ │ regions/nr_regions │ │ │ │ │ │ │ │ 0/start,end │ │ │ │ │ │ │ │ ... │ │ │ │ │ │ ... │ │ │ │ │ schemes/nr_schemes │ │ │ │ │ │ 0/action │ │ │ │ │ │ │ access_pattern/ │ │ │ │ │ │ │ │ sz/min,max │ │ │ │ │ │ │ │ nr_accesses/min,max │ │ │ │ │ │ │ │ age/min,max │ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds │ │ │ │ │ │ ... │ │ │ │ ... │ │ ... Detailed usage of the files will be described in the final Documentation patch of this patchset. Main Difference Between DAMON_DBGFS and DAMON_SYSFS --------------------------------------------------- At the moment, DAMON_DBGFS and DAMON_SYSFS provides same features. One important difference between them is their exclusiveness. DAMON_DBGFS works in an exclusive manner, so that no DAMON worker thread (kdamond) in the system can run concurrently and interfere somehow. For the reason, DAMON_DBGFS asks users to construct all monitoring contexts and start them at once. It's not a big problem but makes the operation a little bit complex and unflexible. For more flexible usage, DAMON_SYSFS moves the responsibility of preventing any possible interference to the admins and work in a non-exclusive manner. That is, users can configure and start contexts one by one. Note that DAMON respects both exclusive groups and non-exclusive groups of contexts, in a manner similar to that of reader-writer locks. That is, if any exclusive monitoring contexts (e.g., contexts that started via DAMON_DBGFS) are running, DAMON_SYSFS does not start new contexts, and vice versa. Future Plan of DAMON_DBGFS Deprecation ====================================== Once this patchset is merged, DAMON_DBGFS development will be frozen. That is, we will maintain it to work as is now so that no users will be break. But, it will not be extended to provide any new feature of DAMON. The support will be continued only until next LTS release. After that, we will drop DAMON_DBGFS. User-space Tooling Compatibility -------------------------------- As DAMON_SYSFS provides all features of DAMON_DBGFS, all user space tooling can move to DAMON_SYSFS. As we will continue supporting DAMON_DBGFS until next LTS kernel release, user space tools would have enough time to move to DAMON_SYSFS. The official user space tool, damo[1], is already supporting both DAMON_SYSFS and DAMON_DBGFS. Both correctness tests[2] and performance tests[3] of DAMON using DAMON_SYSFS also passed. [1] https://github.com/awslabs/damo [2] https://github.com/awslabs/damon-tests/tree/master/corr [3] https://github.com/awslabs/damon-tests/tree/master/perf Sequence of Patches =================== First two patches (patches 1-2) make core changes for DAMON_SYSFS. The first one (patch 1) allows non-exclusive DAMON contexts so that DAMON_SYSFS can work in non-exclusive mode, while the second one (patch 2) adds size of DAMON enum types so that DAMON API users can safely iterate the enums. Third patch (patch 3) implements basic sysfs stub for virtual address spaces monitoring. Note that this implements only sysfs files and DAMON is not linked. Fourth patch (patch 4) links the DAMON_SYSFS to DAMON so that users can control DAMON using the sysfs files. Following six patches (patches 5-10) implements other DAMON features that DAMON_DBGFS supports one by one (physical address space monitoring, DAMON-based operation schemes, schemes quotas, schemes prioritization weights, schemes watermarks, and schemes stats). Following patch (patch 11) adds a simple selftest for DAMON_SYSFS, and the final one (patch 12) documents DAMON_SYSFS. This patch (of 13): To avoid interference between DAMON contexts monitoring overlapping memory regions, damon_start() works in an exclusive manner. That is, damon_start() does nothing bug fails if any context that started by another instance of the function is still running. This makes its usage a little bit restrictive. However, admins could aware each DAMON usage and address such interferences on their own in some cases. This commit hence implements non-exclusive mode of the function and allows the callers to select the mode. Note that the exclusive groups and non-exclusive groups of contexts will respect each other in a manner similar to that of reader-writer locks. Therefore, this commit will not cause any behavioral change to the exclusive groups. Link: https://lkml.kernel.org/r/20220228081314.5770-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220228081314.5770-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Xin Hao <xhao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()SeongJae Park
Because DAMON debugfs interface and DAMON-based proactive reclaim are now using monitoring operations via registration mechanism, damon_{p,v}a_{target_valid,set_operations}() functions have no user. This commit clean them up. Link: https://lkml.kernel.org/r/20220215184603.1479-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon: let monitoring operations can be registered and selectedSeongJae Park
In-kernel DAMON user code like DAMON debugfs interface should set 'struct damon_operations' of its 'struct damon_ctx' on its own. Therefore, the client code should depend on all supporting monitoring operations implementations that it could use. For example, DAMON debugfs interface depends on both vaddr and paddr, while some of the users are not always interested in both. To minimize such unnecessary dependencies, this commit makes the monitoring operations can be registered by implementing code and then dynamically selected by the user code without build-time dependency. Link: https://lkml.kernel.org/r/20220215184603.1479-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Xin Hao <xhao@linux.alibaba.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon: rename damon_primitives to damon_operationsSeongJae Park
Patch series "Allow DAMON user code independent of monitoring primitives". In-kernel DAMON user code is required to configure the monitoring context (struct damon_ctx) with proper monitoring primitives (struct damon_primitive). This makes the user code dependent to all supporting monitoring primitives. For example, DAMON debugfs interface depends on both DAMON_VADDR and DAMON_PADDR, though some users have interest in only one use case. As more monitoring primitives are introduced, the problem will be bigger. To minimize such unnecessary dependency, this patchset makes monitoring primitives can be registered by the implemnting code and later dynamically searched and selected by the user code. In addition to that, this patchset renames monitoring primitives to monitoring operations, which is more easy to intuitively understand what it means and how it would be structed. This patch (of 8): DAMON has a set of callback functions called monitoring primitives and let it can be configured with various implementations for easy extension for different address spaces and usages. However, the word 'primitive' is not so explicit. Meanwhile, many other structs resembles similar purpose calls themselves 'operations'. To make the code easier to be understood, this commit renames 'damon_primitives' to 'damon_operations' before it is too late to rename. Link: https://lkml.kernel.org/r/20220215184603.1479-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220215184603.1479-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Xin Hao <xhao@linux.alibaba.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon: remove the target id conceptSeongJae Park
DAMON asks each monitoring target ('struct damon_target') to have one 'unsigned long' integer called 'id', which should be unique among the targets of same monitoring context. Meaning of it is, however, totally up to the monitoring primitives that registered to the monitoring context. For example, the virtual address spaces monitoring primitives treats the id as a 'struct pid' pointer. This makes the code flexible, but ugly, not well-documented, and type-unsafe[1]. Also, identification of each target can be done via its index. For the reason, this commit removes the concept and uses clear type definition. For now, only 'struct pid' pointer is used for the virtual address spaces monitoring. If DAMON is extended in future so that we need to put another identifier field in the struct, we will use a union for such primitives-dependent fields and document which primitives are using which type. [1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/ Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm/damon/core: move damon_set_targets() into dbgfsSeongJae Park
damon_set_targets() function is defined in the core for general use cases, but called from only dbgfs. Also, because the function is for general use cases, dbgfs does additional handling of pid type target id case. To make the situation simpler, this commit moves the function into dbgfs and makes it to do the pid type case handling on its own. Link: https://lkml.kernel.org/r/20211230100723.2238-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22highmem: document kunmap_local()Ira Weiny
Some users of kmap() add an offset to the kmap() address to be used during the mapping. When converting to kmap_local_page() the base address does not need to be stored because any address within the page can be used in kunmap_local(). However, this was not clear from the documentation and cause some questions.[1] Document that any address in the page can be used in kunmap_local() to clarify this for future users. [1] https://lore.kernel.org/lkml/20211213154543.GM3538886@iweiny-DESK2.sc.intel.com/ [ira.weiny@intel.com: updates per Christoph] Link: https://lkml.kernel.org/r/20220124182138.816693-1-ira.weiny@intel.com Link: https://lkml.kernel.org/r/20220124013045.806718-1-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm: uninline copy_overflow()Christophe Leroy
While building a small config with CONFIG_CC_OPTIMISE_FOR_SIZE, I ended up with more than 50 times the following function in vmlinux because GCC doesn't honor the 'inline' keyword: c00243bc <copy_overflow>: c00243bc: 94 21 ff f0 stwu r1,-16(r1) c00243c0: 7c 85 23 78 mr r5,r4 c00243c4: 7c 64 1b 78 mr r4,r3 c00243c8: 3c 60 c0 62 lis r3,-16286 c00243cc: 7c 08 02 a6 mflr r0 c00243d0: 38 63 5e e5 addi r3,r3,24293 c00243d4: 90 01 00 14 stw r0,20(r1) c00243d8: 4b ff 82 45 bl c001c61c <__warn_printk> c00243dc: 0f e0 00 00 twui r0,0 c00243e0: 80 01 00 14 lwz r0,20(r1) c00243e4: 38 21 00 10 addi r1,r1,16 c00243e8: 7c 08 03 a6 mtlr r0 c00243ec: 4e 80 00 20 blr With -Winline, GCC tells: /include/linux/thread_info.h:212:20: warning: inlining failed in call to 'copy_overflow': call is unlikely and code size would grow [-Winline] copy_overflow() is a non conditional warning called by check_copy_size() on an error path. check_copy_size() have to remain inlined in order to benefit from constant folding, but copy_overflow() is not worth inlining. Uninline the warning when CONFIG_BUG is selected. When CONFIG_BUG is not selected, WARN() does nothing so skip it. This reduces the size of vmlinux by almost 4kbytes. Link: https://lkml.kernel.org/r/e1723b9cfa924bcefcd41f69d0025b38e4c9364e.1644819985.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm: remove usercopy_warn()Christophe Leroy
Users of usercopy_warn() were removed by commit 53944f171a89 ("mm: remove HARDENED_USERCOPY_FALLBACK") Remove it. Link: https://lkml.kernel.org/r/5f26643fc70b05f8455b60b99c30c17d635fa640.1644231910.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Stephen Kitt <steve@sk2.org> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22mm: only re-generate demotion targets when a numa node changes its N_CPU stateOscar Salvador
Abhishek reported that after patch [1], hotplug operations are taking roughly double the expected time. [2] The reason behind is that the CPU callbacks that migrate_on_reclaim_init() sets always call set_migration_target_nodes() whenever a CPU is brought up/down. But we only care about numa nodes going from having cpus to become cpuless, and vice versa, as that influences the demotion_target order. We do already have two CPU callbacks (vmstat_cpu_online() and vmstat_cpu_dead()) that check exactly that, so get rid of the CPU callbacks in migrate_on_reclaim_init() and only call set_migration_target_nodes() from vmstat_cpu_{dead,online}() whenever a numa node change its N_CPU state. [1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/ [2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/ [osalvador@suse.de: add feedback from Huang Ying] Link: https://lkml.kernel.org/r/20220314150945.12694-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20220310120749.23077-1-osalvador@suse.de Fixes: 884a6e5d1f93b ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reported-by: Abhishek Goel <huntbag@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Abhishek Goel <huntbag@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/memory: determine and store zone for single-zone memory blocksDavid Hildenbrand
test_pages_in_a_zone() is just another nasty PFN walker that can easily stumble over ZONE_DEVICE memory ranges falling into the same memory block as ordinary system RAM: the memmap of parts of these ranges might possibly be uninitialized. In fact, we observed (on an older kernel) with UBSAN: UBSAN: Undefined behaviour in ./include/linux/mm.h:1133:50 index 7 is out of range for type 'zone [5]' CPU: 121 PID: 35603 Comm: read_all Kdump: loaded Tainted: [...] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.12.2 11/15/2019 Call Trace: dump_stack+0x9a/0xf0 ubsan_epilogue+0x9/0x7a __ubsan_handle_out_of_bounds+0x13a/0x181 test_pages_in_a_zone+0x3c4/0x500 show_valid_zones+0x1fa/0x380 dev_attr_show+0x43/0xb0 sysfs_kf_seq_show+0x1c5/0x440 seq_read+0x49d/0x1190 vfs_read+0xff/0x300 ksys_read+0xb8/0x170 do_syscall_64+0xa5/0x4b0 entry_SYSCALL_64_after_hwframe+0x6a/0xdf RIP: 0033:0x7f01f4439b52 We seem to stumble over a memmap that contains a garbage zone id. While we could try inserting pfn_to_online_page() calls, it will just make memory offlining slower, because we use test_pages_in_a_zone() to make sure we're offlining pages that all belong to the same zone. Let's just get rid of this PFN walker and determine the single zone of a memory block -- if any -- for early memory blocks during boot. For memory onlining, we know the single zone already. Let's avoid any additional memmap scanning and just rely on the zone information available during boot. For memory hot(un)plug, we only really care about memory blocks that: * span a single zone (and, thereby, a single node) * are completely System RAM (IOW, no holes, no ZONE_DEVICE) If one of these conditions is not met, we reject memory offlining. Hotplugged memory blocks (starting out offline), always meet both conditions. There are three scenarios to handle: (1) Memory hot(un)plug A memory block with zone == NULL cannot be offlined, corresponding to our previous test_pages_in_a_zone() check. After successful memory onlining/offlining, we simply set the zone accordingly. * Memory onlining: set the zone we just used for onlining * Memory offlining: set zone = NULL So a hotplugged memory block starts with zone = NULL. Once memory onlining is done, we set the proper zone. (2) Boot memory with !CONFIG_NUMA We know that there is just a single pgdat, so we simply scan all zones of that pgdat for an intersection with our memory block PFN range when adding the memory block. If more than one zone intersects (e.g., DMA and DMA32 on x86 for the first memory block) we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. (3) Boot memory with CONFIG_NUMA At the point in time we create the memory block devices during boot, we don't know yet which nodes *actually* span a memory block. While we could scan all zones of all nodes for intersections, overlapping nodes complicate the situation and scanning all nodes is possibly expensive. But that problem has already been solved by the code that sets the node of a memory block and creates the link in the sysfs -- do_register_memory_block_under_node(). So, we hook into the code that sets the node id for a memory block. If we already have a different node id set for the memory block, we know that multiple nodes *actually* have PFNs falling into our memory block: we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. If there is no node id set, we do the same as (2) for the given node. Note that the call order in driver_init() is: -> memory_dev_init(): create memory block devices -> node_dev_init(): link memory block devices to the node and set the node id So in summary, we detect if there is a single zone responsible for this memory block and we consequently store the zone in that case in the memory block, updating it during memory onlining/offlining. Link: https://lkml.kernel.org/r/20220210184359.235565-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reported-by: Rafael Parra <rparrazo@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Rafael Parra <rparrazo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22drivers/base/node: rename link_mem_sections() to ↵David Hildenbrand
register_memory_block_under_node() Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2. I remember talking to Michal in the past about removing test_pages_in_a_zone(), which we use for: * verifying that a memory block we intend to offline is really only managed by a single zone. We don't support offlining of memory blocks that are managed by multiple zones (e.g., multiple nodes, DMA and DMA32) * exposing that zone to user space via /sys/devices/system/memory/memory*/valid_zones Now that I identified some more cases where test_pages_in_a_zone() might go wrong, and we received an UBSAN report (see patch #3), let's get rid of this PFN walker. So instead of detecting the zone at runtime with test_pages_in_a_zone() by scanning the memmap, let's determine and remember for each memory block if it's managed by a single zone. The stored zone can then be used for the above two cases, avoiding a manual lookup using test_pages_in_a_zone(). This avoids eventually stumbling over uninitialized memmaps in corner cases, especially when ZONE_DEVICE ranges partly fall into memory block (that are responsible for managing System RAM). Handling memory onlining is easy, because we online to exactly one zone. Handling boot memory is more tricky, because we want to avoid scanning all zones of all nodes to detect possible zones that overlap with the physical memory region of interest. Fortunately, we already have code that determines the applicable nodes for a memory block, to create sysfs links -- we'll hook into that. Patch #1 is a simple cleanup I had laying around for a longer time. Patch #2 contains the main logic to remove test_pages_in_a_zone() and further details. [1] https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com [2] https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com This patch (of 2): Let's adjust the stale terminology, making it match unregister_memory_block_under_nodes() and do_register_memory_block_under_node(). We're dealing with memory block devices, which span 1..X memory sections. Link: https://lkml.kernel.org/r/20220210184359.235565-1-david@redhat.com Link: https://lkml.kernel.org/r/20220210184359.235565-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Rafael Parra <rparrazo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>