summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-02io_uring/rsrc: add an empty io_rsrc_node for sparse buffer entriesJens Axboe
Rather than allocate an io_rsrc_node for an empty/sparse buffer entry, add a const entry that can be used for that. This just needs checking for writing the tag, and the put check needs to check for that sparse node rather than NULL for validity. This avoids allocating rsrc nodes for sparse buffer entries. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02io_uring/rsrc: get rid of io_rsrc_node allocation cacheJens Axboe
It's not going to be needed in the fast path going forward, so kill it off. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02io_uring/rsrc: get rid of per-ring io_rsrc_node listJens Axboe
Work in progress, but get rid of the per-ring serialization of resource nodes, like registered buffers and files. Main issue here is that one node can otherwise hold up a bunch of other nodes from getting freed, which is especially a problem for file resource nodes and networked workloads where some descriptors may not see activity in a long time. As an example, instantiate an io_uring ring fd and create a sparse registered file table. Even 2 will do. Then create a socket and register it as fixed file 0, F0. The number of open files in the app is now 5, with 0/1/2 being the usual stdin/out/err, 3 being the ring fd, and 4 being the socket. Register this socket (eg "the listener") in slot 0 of the registered file table. Now add an operation on the socket that uses slot 0. Finally, loop N times, where each loop creates a new socket, registers said socket as a file, then unregisters the socket, and finally closes the socket. This is roughly similar to what a basic accept loop would look like. At the end of this loop, it's not unreasonable to expect that there would still be 5 open files. Each socket created and registered in the loop is also unregistered and closed. But since the listener socket registered first still has references to its resource node due to still being active, each subsequent socket unregistration is stuck behind it for reclaim. Hence 5 + N files are still open at that point, where N is awaiting the final put held up by the listener socket. Rewrite the io_rsrc_node handling to NOT rely on serialization. Struct io_kiocb now gets explicit resource nodes assigned, with each holding a reference to the parent node. A parent node is either of type FILE or BUFFER, which are the two types of nodes that exist. A request can have two nodes assigned, if it's using both registered files and buffers. Since request issue and task_work completion is both under the ring private lock, no atomics are needed to handle these references. It's a simple unlocked inc/dec. As before, the registered buffer or file table each hold a reference as well to the registered nodes. Final put of the node will remove the node and free the underlying resource, eg unmap the buffer or put the file. Outside of removing the stall in resource reclaim described above, it has the following advantages: 1) It's a lot simpler than the previous scheme, and easier to follow. No need to specific quiesce handling anymore. 2) There are no resource node allocations in the fast path, all of that happens at resource registration time. 3) The structs related to resource handling can all get simplified quite a bit, like io_rsrc_node and io_rsrc_data. io_rsrc_put can go away completely. 4) Handling of resource tags is much simpler, and doesn't require persistent storage as it can simply get assigned up front at registration time. Just copy them in one-by-one at registration time and assign to the resource node. The only real downside is that a request is now explicitly limited to pinning 2 resources, one file and one buffer, where before just assigning a resource node to a request would pin all of them. The upside is that it's easier to follow now, as an individual resource is explicitly referenced and assigned to the request. With this in place, the above mentioned example will be using exactly 5 files at the end of the loop, not N. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02Merge tag 'nfsd-6.12-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix two async COPY bugs found during NFS bake-a-thon - Fix an svcrdma memory leak * tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: rpcrdma: Always release the rpcrdma_device's xa_array NFSD: Never decrement pending_async_copies on error NFSD: Initialize struct nfsd4_copy earlier
2024-11-02Merge tag 'xfs-6.12-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs fixes from Carlos Maiolino: - fix a sysbot reported crash on filestreams - Reduce cpu time spent searching for extents in a very fragmented FS - Check for delayed allocations before setting extsize * tag 'xfs-6.12-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: streamline xfs_filestream_pick_ag xfs: fix finding a last resort AG in xfs_filestream_pick_ag xfs: Reduce unnecessary searches when searching for the best extents xfs: Check for delayed allocations before setting extsize
2024-11-01lib/iov_iter: fix bvec iterator setupMing Lei
.bi_size of bvec iterator should be initialized as real max size for walking, and .bi_bvec_done just counts how many bytes need to be skipped in the 1st bvec, so .bi_size isn't related with .bi_bvec_done. This patch fixes bvec iterator initialization, and the inner `size` check isn't needed any more, so revert Eric Dumazet's commit 7bc802acf193 ("iov-iter: do not return more bytes than requested in iov_iter_extract_bvec_pages()"). Cc: Eric Dumazet <edumazet@google.com> Fixes: e4e535bff2bc ("iov_iter: don't require contiguous pages in iov_iter_extract_bvec_pages") Reported-by: syzbot+71abe7ab2b70bca770fd@syzkaller.appspotmail.com Tested-by: syzbot+71abe7ab2b70bca770fd@syzkaller.appspotmail.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-01loop: Simplify discard granularity calcJohn Garry
A bdev discard granularity is always at least SECTOR_SIZE, so don't check for a zero value. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241101092215.422428-1-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-01Merge tag 'linux_kselftest-fixes-6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: - fix syntax error in frequency calculation arithmetic expression in intel_pstate run.sh - add missing cpupower dependency check intel_pstate run.sh - fix idmap_mount_tree_invalid test failure due to incorrect argument - fix watchdog-test run leaving the watchdog timer enabled causing system reboot. With this fix, the test disables the watchdog timer when it gets terminated with SIGTERM, SIGKILL, and SIGQUIT in addition to SIGINT * tag 'linux_kselftest-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/watchdog-test: Fix system accidentally reset after watchdog-test selftests/intel_pstate: check if cpupower is installed selftests/intel_pstate: fix operand expected error selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run
2024-11-01dt-bindings: net: xlnx,axi-ethernet: Correct phy-mode property valueSuraj Gupta
Correct phy-mode property value to 1000base-x. Fixes: cbb1ca6d5f9a ("dt-bindings: net: xlnx,axi-ethernet: convert bindings document to yaml") Signed-off-by: Suraj Gupta <suraj.gupta2@amd.com> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20241028091214.2078726-1-suraj.gupta2@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-01Merge tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linuxLinus Torvalds
Pull rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Avoid build errors with old 'rustc's without LLVM patch version (important since it impacts people that do not even enable Rust) - Update LLVM version for 'HAVE_CFI_ICALL_NORMALIZE_INTEGERS' in 'depends on' condition (the fix was eventually backported rather than land in LLVM 19)" * tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux: cfi: tweak llvm version for HAVE_CFI_ICALL_NORMALIZE_INTEGERS kbuild: rust: avoid errors with old `rustc`s without LLVM patch version
2024-11-01Merge tag 'pci-v6.12-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci fix from Bjorn Helgaas: - Enable device-specific ACS-like functionality even if the device doesn't advertise an ACS capability, which got broken when adding fancy ACS kernel parameter (Jason Gunthorpe) * tag 'pci-v6.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Fix pci_enable_acs() support for the ACS quirks
2024-11-01Merge tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular fixes pull, nothing too out of the ordinary, the mediatek fixes came in a batch that I might have preferred a bit earlier but all seem fine, otherwise regular xe/amdgpu and a few misc ones. xe: - Fix missing HPD interrupt enabling, bringing one PM refactor with it - Workaround LNL GGTT invalidation not being visible to GuC - Avoid getting jobs stuck without a protecting timeout ivpu: - Fix firewall IRQ handling panthor: - Fix firmware initialization wrt page sizes - Fix handling and reporting of dead job groups sched: - Guarantee forward progress via WC_MEM_RECLAIM tests: - Fix memory leak in drm_display_mode_from_cea_vic() amdgpu: - DCN 3.5 fix - Vangogh SMU KASAN fix - SMU 13 profile reporting fix mediatek: - Fix degradation problem of alpha blending - Fix color format MACROs in OVL - Fix get efuse issue for MT8188 DPTX - Fix potential NULL dereference in mtk_crtc_destroy() - Correct dpi power-domains property - Add split subschema property constraints" * tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel: (27 commits) drm/xe: Don't short circuit TDR on jobs not started drm/xe: Add mmio read before GGTT invalidate drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic() drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic() drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic() drm/panthor: Report group as timedout when we fail to properly suspend drm/panthor: Fail job creation when the group is dead drm/panthor: Fix firmware initialization on systems with a page size > 4k accel/ivpu: Fix NOC firewall interrupt handling drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling drm/xe: Remove runtime argument from display s/r functions drm/amdgpu/smu13: fix profile reporting drm/amd/pm: Vangogh: Fix kernel memory out of bounds write Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM drm/tegra: Fix NULL vs IS_ERR() check in probe() dt-bindings: display: mediatek: split: add subschema property constraints dt-bindings: display: mediatek: dpi: correct power-domains property drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy() ...
2024-11-01Merge tag 'cxl-fixes-6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Ira Weiny: "The bulk of these fixes center around an initialization order bug reported by Gregory Price and some additional fall out from the debugging effort. In summary, cxl_acpi and cxl_mem race and previously worked because of a bus_rescan_devices() while testing without modules built in. Unfortunately with modules built in the rescan would fail due to the cxl_port driver being registered late via the build order. Furthermore it was found bus_rescan_devices() did not guarantee a probe barrier which CXL was expecting. Additional fixes to cxl-test and decoder allocation came along as they were found in this debugging effort. The other fixes are pretty minor but one affects trace point data seen by user space. Summary: - Fix crashes when running with cxl-test code - Fix Trace DRAM Event Record field decodes - Fix module/built in initialization order errors - Fix use after free on decoder shutdowns - Fix out of order decoder allocations - Improve cxl-test to better reflect real world systems" * tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/test: Improve init-order fidelity relative to real-world systems cxl/port: Prevent out-of-order decoder allocation cxl/port: Fix use-after-free, permit out-of-order decoder shutdown cxl/acpi: Ensure ports ready at cxl_acpi_probe() return cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices() cxl/port: Fix CXL port initialization order when the subsystem is built-in cxl/events: Fix Trace DRAM Event Record cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
2024-11-01Merge tag 'block-6.12-20241101' of git://git.kernel.dk/linuxLinus Torvalds
Pull block fixes from Jens Axboe: - Fixup for a recent blk_rq_map_user_bvec() patch - NVMe pull request via Keith: - Spec compliant identification fix (Keith) - Module parameter to enable backward compatibility on unusual namespace formats (Keith) - Target double free fix when using keys (Vitaliy) - Passthrough command error handling fix (Keith) * tag 'block-6.12-20241101' of git://git.kernel.dk/linux: nvme: re-fix error-handling for io_uring nvme-passthrough nvmet-auth: assign dh_key to NULL after kfree_sensitive nvme: module parameter to disable pi with offsets block: fix queue limits checks in blk_rq_map_user_bvec for real nvme: enhance cns version checking
2024-11-01Merge tag 'io_uring-6.12-20241101' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring fix from Jens Axboe: - Fix not honoring IOCB_NOWAIT for starting buffered writes in terms of calling sb_start_write(), leading to a deadlock if someone is attempting to freeze the file system with writes in progress, as each side will end up waiting for the other to make progress. * tag 'io_uring-6.12-20241101' of git://git.kernel.dk/linux: io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
2024-11-01Merge tag 'acpi-6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Make the ACPI CPPC library use a raw spinlock for operations carried out in scheduler context via the schedutil governor and the ACPI CPPC cpufreq driver (Pierre Gondois)" * tag 'acpi-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: CPPC: Make rmw_lock a raw_spin_lock
2024-11-01Merge tag 'gpio-fixes-for-v6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix an uninitialized variable in GPIO swnode code - add a missing return value check for devm_mutex_init() - fix an old issue with debugfs output * tag 'gpio-fixes-for-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpiolib: fix debugfs dangling chip separator gpiolib: fix debugfs newline separators gpio: sloppy-logic-analyzer: Check for error code from devm_mutex_init() call gpio: fix uninit-value in swnode_find_gpio
2024-11-02Merge tag 'drm-xe-fixes-2024-10-31' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - Fix missing HPD interrupt enabling, bringing one PM refactor with it (Imre / Maarten) - Workaround LNL GGTT invalidation not being visible to GuC (Matthew Brost) - Avoid getting jobs stuck without a protecting timeout (Matthew Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/tsbftadm7owyizzdaqnqu7u4tqggxgeqeztlfvmj5fryxlfomi@5m5bfv2zvzmw
2024-11-01Merge tag 'riscv-for-linus-6.11-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - Avoid accessing the early boot ACPI tables via unsafe memory attributes, which can result in incorrect ACPI table data appearing. This can cause all sorts of bad behavior. - Avoid compiler-inserted library calls in the VDSO. - GCC+Rust builds have been disabled, to avoid issues related to ISA string mismatched between the GCC and LLVM Rust implementations. - The NX flag is now set in the EFI PE/COFF headers, which is necessary for some distro GRUB versions to boot images. - A fix to avoid leaking DT node reference counts on ACPI systems during cache info parsing. - CPU numbers are now printed as unsigned values during hotplug. - A pair of build fixes for usused macros, which can trigger warnings on some configurations. * tag 'riscv-for-linus-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Remove duplicated GET_RM riscv: Remove unused GENERATING_ASM_OFFSETS riscv: Use '%u' to format the output of 'cpu' riscv: Prevent a bad reference count on CPU nodes riscv: efi: Set NX compat flag in PE/COFF header RISC-V: disallow gcc + rust builds riscv: Do not use fortify in early code RISC-V: ACPI: fix early_ioremap to early_memremap riscv: vdso: Prevent the compiler from inserting calls to memset()
2024-11-01Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The important one is a change to the way in which we handle protection keys around signal delivery so that we're more closely aligned with the x86 behaviour, however there is also a revert of the previous fix to disable software tag-based KASAN with GCC, since a workaround materialised shortly afterwards. I'd love to say we're done with 6.12, but we're aware of some longstanding fpsimd register corruption issues that we're almost at the bottom of resolving. Summary: - Fix handling of POR_EL0 during signal delivery so that pushing the signal context doesn't fail based on the pkey configuration of the interrupted context and align our user-visible behaviour with that of x86. - Fix a bogus pointer being passed to the CPU hotplug code from the Arm SDEI driver. - Re-enable software tag-based KASAN with GCC by using an alternative implementation of '__no_sanitize_address'" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: signal: Improve POR_EL0 handling to avoid uaccess failures firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state() Revert "kasan: Disable Software Tag-Based KASAN with GCC" kasan: Fix Software Tag-Based KASAN with GCC
2024-11-01Merge tag 'vfs-6.12-rc6.iomap' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull iomap fixes from Christian Brauner: "Fixes for iomap to prevent data corruption bugs in the fallocate unshare range implementation of fsdax and a small cleanup to turn iomap_want_unshare_iter() into an inline function" * tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iomap: turn iomap_want_unshare_iter into an inline function fsdax: dax_unshare_iter needs to copy entire blocks fsdax: remove zeroing code from dax_unshare_iter iomap: share iomap_unshare_iter predicate code with fsdax xfs: don't allocate COW extents when unsharing a hole
2024-11-01Merge tag 'vfs-6.12-rc6.fixes' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull filesystem fixes from Christian Brauner: "VFS: - Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set - Add a get_tree_bdev_flags() helper that allows to modify e.g., whether errors are logged into the filesystem context during superblock creation. This is used by erofs to fix a userspace regression where an error is currently logged when its used on a regular file which is an new allowed mode in erofs. netfs: - Fix the sysfs debug path in the documentation. - Fix iov_iter_get_pages*() for folio queues by skipping the page extracation if we're at the end of a folio. afs: - Fix moving subdirectories to different parent directory. autofs: - Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in validate_dev_ioctl(). The actual ioctl number, not the ioctl command needs to be checked for autofs" * tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP autofs: fix thinko in validate_dev_ioctl() iov_iter: Fix iov_iter_get_pages*() for folio_queue afs: Fix missing subdir edit when renamed between parent dirs doc: correcting the debug path for cachefiles erofs: use get_tree_bdev_flags() to avoid misleading messages fs/super.c: introduce get_tree_bdev_flags()
2024-11-01Merge tag 'for-6.12-rc5-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more stability fixes. There's one patch adding export of MIPS cmpxchg helper, used in the error propagation fix. - fix error propagation from split bios to the original btrfs bio - fix merging of adjacent extents (normal operation, defragmentation) - fix potential use after free after freeing btrfs device structures" * tag 'for-6.12-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix defrag not merging contiguous extents due to merged extent maps btrfs: fix extent map merging not happening for adjacent extents btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() btrfs: fix error propagation of split bios MIPS: export __cmpxchg_small()
2024-11-01Merge tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Various syzbot fixes, and the more notable ones: - Fix for pointers in an extent overflowing the max (16) on a filesystem with many devices: we were creating too many cached copies when moving data around. Now, we only create at most one cached copy if there's a promote target set. Caching will be a bit broken for reflinked data until 6.13: I have larger series queued up which significantly improves the plumbing for data options down into the extent (bch_extent_rebalance) to fix this. - Fix for deadlock on -ENOSPC on tiny filesystems Allocation from the partial open_bucket list wasn't correctly accounting partial open_buckets as free: this fixes the main cause of tests timing out in the automated tests" * tag 'bcachefs-2024-10-31' of git://evilpiepirate.org/bcachefs: bcachefs: Fix NULL ptr dereference in btree_node_iter_and_journal_peek bcachefs: fix possible null-ptr-deref in __bch2_ec_stripe_head_get() bcachefs: Fix deadlock on -ENOSPC w.r.t. partial open buckets bcachefs: Don't filter partial list buckets in open_buckets_to_text() bcachefs: Don't keep tons of cached pointers around bcachefs: init freespace inited bits to 0 in bch2_fs_initialize bcachefs: Fix unhandled transaction restart in fallocate bcachefs: Fix UAF in bch2_reconstruct_alloc() bcachefs: fix null-ptr-deref in have_stripes() bcachefs: fix shift oob in alloc_lru_idx_fragmentation bcachefs: Fix invalid shift in validate_sb_layout()
2024-11-01kselftest/arm64: Fix encoding for SVE B16B16 testMark Brown
The test for SVE_B16B16 had a cut'n'paste of a SME instruction, fix it with a relevant SVE instruction. Fixes: 44d10c27bd75 ("kselftest/arm64: Add 2023 DPISA hwcap test coverage") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241028-arm64-b16b16-test-v1-1-59a4a7449bdf@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-01arm64: Expose ID_AA64ISAR1_EL1.XS to sanitised feature consumersMarc Zyngier
Despite KVM now being able to deal with XS-tagged TLBIs, we still don't expose these feature bits to KVM. Plumb in the feature in ID_AA64ISAR1_EL1. Fixes: 0feec7769a63 ("KVM: arm64: nv: Add handling of NXS-flavoured TLBI operations") Signed-off-by: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241031083519.364313-1-maz@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-01arm64/gcs: Fix outdated ptrace documentationMark Brown
The ptrace documentation for GCS was written prior to the implementation of clone3() when we still blocked enabling of GCS via ptrace. This restriction was relaxed as part of implementing clone3() support since we implemented support for the GCS not being managed by the kernel but the documentation still mentions the restriction. Update the documentation to reflect what was merged. We have not yet merged clone3() itself but all the support other than in clone() itself is there. Fixes: 7058bf87cd59 ("arm64/gcs: Document the ABI for Guarded Control Stacks") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241031-arm64-gcs-doc-disable-v1-1-d7f6ded62046@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-01kselftest/arm64: Use ksft_perror() to log MTE failuresMark Brown
The logging in the allocation helpers variously uses ksft_print_msg() with very intermittent logging of errno and perror() (which won't produce KTAP conformant output) when logging the result of API calls that set errno. Standardise on using the ksft_perror() helper in these cases so that more information is available should the tests fail. Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/20241029-arm64-mte-test-logging-v1-1-a128e732e36e@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-01arm64: Return early when break handler is found on linked-listLiao Chang
The search for breakpoint handlers iterate through the entire linked list. Given that all registered hook has a valid fn field, and no registered hooks share the same mask and imm. This commit optimize the efficiency slightly by returning early as a matching handler is found. Signed-off-by: Liao Chang <liaochang1@huawei.com> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241024034120.3814224-1-liaochang1@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-01arm64/mm: Re-organize arch_make_huge_pte()Anshuman Khandual
Core HugeTLB defines a fallback definition for arch_make_huge_pte(), which calls platform provided pte_mkhuge(). But if any platform already provides an override for arch_make_huge_pte(), then it does not need to provide the helper pte_mkhuge(). arm64 override for arch_make_huge_pte() calls pte_mkhuge() internally, thus creating an impression, that both of these callbacks are being used in core HugeTLB and hence required to be defined. This drops off pte_mkhuge() which was never required to begin with as there could not be any section mappings at the PTE level. Re-organize arch_make_huge_pte() based on requested page size and create the entry for the applicable page table level as needed. It also removes a redundancy of clearing PTE_TABLE_BIT bit followed by setting both PTE_TABLE_BIT and PTE_VALID bits (via PTE_TYPE_MASK) in the pte, while creating CONT_PTE_SIZE size entries. Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20241029044529.2624785-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-01Merge tag 'qcom-arm64-fixes-for-6.12-2' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into HEAD More Qualcomm Arm64 DeviceTree fixes for v6.12 Bring a range of PCIe fixes across the X Elite platform, as well as marking the NVMe power supply boot-on to avoid glitching the power supply during boot. The X Elite CRD audio configuration sees a spelling mistake corrected. On SM8450 the PCIe 1 PIPE clock definition is corrected, to fix a regression where this isn't able to acquire it's clocks. * tag 'qcom-arm64-fixes-for-6.12-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: x1e80100: fix PCIe5 interconnect arm64: dts: qcom: x1e80100: fix PCIe4 interconnect arm64: dts: qcom: x1e80100: Fix up BAR spaces arm64: dts: qcom: x1e80100-qcp: fix nvme regulator boot glitch arm64: dts: qcom: x1e80100-microsoft-romulus: fix nvme regulator boot glitch arm64: dts: qcom: x1e80100-yoga-slim7x: fix nvme regulator boot glitch arm64: dts: qcom: x1e80100-vivobook-s15: fix nvme regulator boot glitch arm64: dts: qcom: x1e80100-crd: fix nvme regulator boot glitch arm64: dts: qcom: x1e78100-t14s: fix nvme regulator boot glitch arm64: dts: qcom: x1e80100-crd Rename "Twitter" to "Tweeter" arm64: dts: qcom: x1e80100: Fix PCIe 6a lanes description arm64: dts: qcom: sm8450 fix PIPE clock specification for pcie1 arm64: dts: qcom: x1e80100: Add Broadcast_AND region in LLCC block arm64: dts: qcom: x1e80100: fix PCIe5 PHY clocks arm64: dts: qcom: x1e80100: fix PCIe4 and PCIe6a PHY clocks Link: https://lore.kernel.org/r/20241101143206.738617-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-01Merge tag 'qcom-arm64-fixes-for-6.12' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into HEAD Qualcomm Arm64 DeviceTree fix for v6.12 This reverts the conversion to use the mailbox binding for RPM IPC interrupts, as this broke boot on msm8939. * tag 'qcom-arm64-fixes-for-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: msm8939: revert use of APCS mbox for RPM Link: https://lore.kernel.org/r/20241101142414.737828-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-01arm64: mops: Document requirements for hypervisorsKristina Martsenko
Add a mops.rst document to clarify in more detail what hypervisors need to do to run a Linux guest on a system with FEAT_MOPS. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20241028185721.52852-1-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-01Merge tag 'scmi-fixes-6.12-2' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into HEAD Arm SCMI fixes for v6.12(part 2) Couple of fixes to address slab-use-after-free in scmi_bus_notifier() via scmi_dev->name and possible incorrect clear channel transport operation on A2P channel if some sort of P2A only messages are initiated on A2P channel(occurs when stress tested passing /dev/random to the channel). Apart from this, there are fixes to address missing "arm" prefix in the recently added property max-rx-timeout-ms which was missed in the review but was identified when further additions to the same binding were getting reviewed. * tag 'scmi-fixes-6.12-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_scmi: Use vendor string in max-rx-timeout-ms dt-bindings: firmware: arm,scmi: Add missing vendor string firmware: arm_scmi: Reject clear channel request on A2P firmware: arm_scmi: Fix slab-use-after-free in scmi_bus_notifier() Link: https://lore.kernel.org/r/20241031172734.3109140-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-01Merge tag 'riscv-soc-fixes-for-v6.12-rc6' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux into HEAD RISC-V soc fixes for v6.12-rc6 StarFive: Two minor dts fixes, one setting the correct eth phy delay parameters and one disabling unused nodes that caused warnings at probe time. Firmware: Fix the poll_complete() implementation in the auto-update driver so that it behaves as the framework expects. Misc: Update the maintainer pattern for my dts entry, so that it covers the specific platforms listed , rather than including all riscv platforms with the list platforms excluded. Signed-off-by: Conor Dooley <conor.dooley@microchip.com> * tag 'riscv-soc-fixes-for-v6.12-rc6' of https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux: MAINTAINERS: invert Misc RISC-V SoC Support's pattern riscv: dts: starfive: Update ethernet phy0 delay parameter values for Star64 riscv: dts: starfive: disable unused csi/camss nodes firmware: microchip: auto-update: fix poll_complete() to not report spurious timeout errors Link: https://lore.kernel.org/r/20241031-colossal-cassette-617817c9bec3@spud Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-01regulator: rk808: Add apply_bit for BUCK3 on RK809Mikhail Rudenko
Currently, RK809's BUCK3 regulator is modelled in the driver as a configurable regulator with 0.5-2.4V voltage range. But the voltage setting is not actually applied, because when bit 6 of PMIC_POWER_CONFIG register is set to 0 (default), BUCK3 output voltage is determined by the external feedback resistor. Fix this, by setting bit 6 when voltage selection is set. Existing users which do not specify voltage constraints in their device trees will not be affected by this change, since no voltage setting is applied in those cases, and bit 6 is not enabled. Signed-off-by: Mikhail Rudenko <mike.rudenko@gmail.com> Link: https://patch.msgid.link/20241017-rk809-dcdc3-v1-1-e3c3de92f39c@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-01Merge tag 'v6.12-rockchip-dtsfixes1' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into HEAD A number of DTS correctnes fixes, to bring down the amount of errors reported by dtbscheck. * tag 'v6.12-rockchip-dtsfixes1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (23 commits) arm64: dts: rockchip: Correct GPIO polarity on brcm BT nodes arm64: dts: rockchip: Drop invalid clock-names from es8388 codec nodes ARM: dts: rockchip: Fix the realtek audio codec on rk3036-kylin ARM: dts: rockchip: Fix the spi controller on rk3036 ARM: dts: rockchip: drop grf reference from rk3036 hdmi ARM: dts: rockchip: fix rk3036 acodec node arm64: dts: rockchip: remove orphaned pinctrl-names from pinephone pro arm64: dts: rockchip: remove num-slots property from rk3328-nanopi-r2s-plus arm64: dts: rockchip: Fix LED triggers on rk3308-roc-cc arm64: dts: rockchip: Remove #cooling-cells from fan on Theobroma lion arm64: dts: rockchip: Remove undocumented supports-emmc property arm64: dts: rockchip: Fix bluetooth properties on Rock960 boards arm64: dts: rockchip: Fix bluetooth properties on rk3566 box demo arm64: dts: rockchip: Drop regulator-init-microvolt from two boards arm64: dts: rockchip: fix i2c2 pinctrl-names property on anbernic-rg353p/v arm64: dts: rockchip: Fix reset-gpios property on brcm BT nodes arm64: dts: rockchip: Fix wakeup prop names on PineNote BT node arm64: dts: rockchip: Remove hdmi's 2nd interrupt on rk3328 arm64: dts: rockchip: Designate Turing RK1's system power controller arm64: dts: rockchip: Start cooling maps numbering from zero on ROCK 5B ... Link: https://lore.kernel.org/r/2847150.mvXUDI8C0e@phil Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-01Merge tag 'riscv-sophgo-dt-fixes-for-v6.12-rc1' of ↵Arnd Bergmann
https://github.com/sophgo/linux into HEAD RISC-V Sophgo Devicetree fixes for v6.12-rc1 Just one minor fix to replace deprecated "snps,nr-gpios" property with "ngpios" for snps,dw-apb-gpio-port devices. Signed-off-by: Chen Wang <unicorn_wang@outlook.com> * tag 'riscv-sophgo-dt-fixes-for-v6.12-rc1' of https://github.com/sophgo/linux: riscv: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices Link: https://lore.kernel.org/r/MA0P287MB2822A17623C51A558DB948FCFE482@MA0P287MB2822.INDP287.PROD.OUTLOOK.COM Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-01Merge tag 'imx-fixes-6.12' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into HEAD i.MX fixes for 6.12: - An imx8qm change from Alexander Stein to fix VPU IRQs - An imx8 LVDS subsystem change from Diogo Silva to fix clock-output-names - An imx8ulp change from Haibo Chen to correct flexspi compatible string - An imx8mp-skov board change from Liu Ying to set correct clock rate for media_isp - An imx8mp-phyboard change from Marek Vasut to correct Video PLL1 frequency - An imx8mp change from Peng Fan to correct SDHC IPG clock * tag 'imx-fixes-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: arm64: dts: imx8mp-phyboard-pollux: Set Video PLL1 frequency to 506.8 MHz arm64: dts: imx8mp: correct sdhc ipg clk arm64: dts: imx8mp-skov-revb-mi1010ait-1cp1: Assign "media_isp" clock rate arm64: dts: imx8: Fix lvds0 device tree arm64: dts: imx8ulp: correct the flexspi compatible string arm64: dts: imx8-ss-vpu: Fix imx8qm VPU IRQs Link: https://lore.kernel.org/r/ZxhsnnLudN2kD2Po@dragon Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-11-01fs: optimize acl_permission_check()Linus Torvalds
generic_permission() turned out to be costlier than expected. The reason is that acl_permission_check() performs owner checks that involve costly pointer dereferences. There's already code that skips expensive group checks if the group and other permission bits are the same wrt to the mask that is checked. This logic can be extended to also shortcut permission checking in acl_permission_check(). If the inode doesn't have POSIX ACLs than ownership doesn't matter. If there are no missing UGO permissions the permission check can be shortcut. Acked-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/all/CAHk-=wgXEoAOFRkDg+grxs+p1U+QjWXLixRGmYEfd=vG+OBuFw@mail.gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-01tracing/selftests: Add tracefs mount options testKalesh Singh
Add a selftest to check that the tracefs gid mount option is applied correctly. ./ftracetest test.d/00basic/mount_options.tc Use the new readme string "[gid=<gid>] as a requirement and also update test_ownership.tc requirements to use this. Cc: Eric Sandeen <sandeen@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Ali Zahraee <ahzahraee@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/20241030171928.4168869-4-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-01tracing: Document tracefs gid mount optionKalesh Singh
Commit ee7f3666995d ("tracefs: Have new files inherit the ownership of their parent") and commit 48b27b6b5191 ("tracefs: Set all files to the same group ownership as the mount option") introduced a new gid mount option that allows specifying a group to apply to all entries in tracefs. Document this in the tracing readme. Cc: Eric Sandeen <sandeen@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Ali Zahraee <ahzahraee@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://lore.kernel.org/20241030171928.4168869-3-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-01tracing: Fix tracefs mount optionsKalesh Singh
Commit 78ff64081949 ("vfs: Convert tracefs to use the new mount API") converted tracefs to use the new mount APIs caused mount options (e.g. gid=<gid>) to not take effect. The tracefs superblock can be updated from multiple paths: - on fs_initcall() to init_trace_printk_function_export() - from a work queue to initialize eventfs tracer_init_tracefs_work_func() - fsconfig() syscall to mount or remount of tracefs The tracefs superblock root inode gets created early on in init_trace_printk_function_export(). With the new mount API, tracefs effectively uses get_tree_single() instead of the old API mount_single(). Previously, mount_single() ensured that the options are always applied to the superblock root inode: (1) If the root inode didn't exist, call fill_super() to create it and apply the options. (2) If the root inode exists, call reconfigure_single() which effectively calls tracefs_apply_options() to parse and apply options to the subperblock's fs_info and inode and remount eventfs (if necessary) On the other hand, get_tree_single() effectively calls vfs_get_super() which: (3) If the root inode doesn't exists, calls fill_super() to create it and apply the options. (4) If the root inode already exists, updates the fs_context root with the superblock's root inode. (4) above is always the case for tracefs mounts, since the super block's root inode will already be created by init_trace_printk_function_export(). This means that the mount options get ignored: - Since it isn't applied to the superblock's root inode, it doesn't get inherited by the children. - Since eventfs is initialized from a separate work queue and before call to mount with the options, and it doesn't get remounted for mount. Ensure that the mount options are applied to the super block and eventfs is remounted to respect the mount options. To understand this better, if fstab has the following: tracefs /sys/kernel/tracing tracefs nosuid,nodev,noexec,gid=tracing 0 0 On boot up, permissions look like: # ls -l /sys/kernel/tracing/trace -rw-r----- 1 root root 0 Nov 1 08:37 /sys/kernel/tracing/trace When it should look like: # ls -l /sys/kernel/tracing/trace -rw-r----- 1 root tracing 0 Nov 1 08:37 /sys/kernel/tracing/trace Link: https://lore.kernel.org/r/536e99d3-345c-448b-adee-a21389d7ab4b@redhat.com/ Cc: Eric Sandeen <sandeen@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Ali Zahraee <ahzahraee@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Fixes: 78ff64081949 ("vfs: Convert tracefs to use the new mount API") Link: https://lore.kernel.org/20241030171928.4168869-2-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-01pmdomain: imx93-blk-ctrl: correct remove pathPeng Fan
The check condition should be 'i < bc->onecell_data.num_domains', not 'bc->onecell_data.num_domains' which will make the look never finish and cause kernel panic. Also disable runtime to address "imx93-blk-ctrl 4ac10000.system-controller: Unbalanced pm_runtime_enable!" Fixes: e9aa77d413c9 ("soc: imx: add i.MX93 media blk ctrl driver") Signed-off-by: Peng Fan <peng.fan@nxp.com> Reviewed-by: Stefan Wahren <wahrenst@gmx.net> Cc: stable@vger.kernel.org Message-ID: <20241101101252.1448466-1-peng.fan@oss.nxp.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-10-31mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizesVlastimil Babka
Since commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") a mmap() of anonymous memory without a specific address hint and of at least PMD_SIZE will be aligned to PMD so that it can benefit from a THP backing page. However this change has been shown to regress some workloads significantly. [1] reports regressions in various spec benchmarks, with up to 600% slowdown of the cactusBSSN benchmark on some platforms. The benchmark seems to create many mappings of 4632kB, which would have merged to a large THP-backed area before commit efa7df3e3bb5 and now they are fragmented to multiple areas each aligned to PMD boundary with gaps between. The regression then seems to be caused mainly due to the benchmark's memory access pattern suffering from TLB or cache aliasing due to the aligned boundaries of the individual areas. Another known regression bisected to commit efa7df3e3bb5 is darktable [2] [3] and early testing suggests this patch fixes the regression there as well. To fix the regression but still try to benefit from THP-friendly anonymous mapping alignment, add a condition that the size of the mapping must be a multiple of PMD size instead of at least PMD size. In case of many odd-sized mapping like the cactusBSSN creates, those will stop being aligned and with gaps between, and instead naturally merge again. Link: https://lkml.kernel.org/r/20241024151228.101841-2-vbabka@suse.cz Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries") Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Michael Matz <matz@suse.de> Debugged-by: Gabriel Krisman Bertazi <gabriel@krisman.be> Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229012 [1] Reported-by: Matthias Bodenbinder <matthias@bodenbinder.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219366 [2] Closes: https://lore.kernel.org/all/2050f0d4-57b0-481d-bab8-05e8d48fed0c@leemhuis.info/ [3] Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Yang Shi <yang@os.amperecomputing.com> Cc: Rik van Riel <riel@surriel.com> Cc: Jann Horn <jannh@google.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Petr Tesarik <ptesarik@suse.com> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31mm: shrinker: avoid memleak in alloc_shrinker_infoChen Ridong
A memleak was found as below: unreferenced object 0xffff8881010d2a80 (size 32): comm "mkdir", pid 1559, jiffies 4294932666 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 @............... backtrace (crc 2e7ef6fa): [<ffffffff81372754>] __kmalloc_node_noprof+0x394/0x470 [<ffffffff813024ab>] alloc_shrinker_info+0x7b/0x1a0 [<ffffffff813b526a>] mem_cgroup_css_online+0x11a/0x3b0 [<ffffffff81198dd9>] online_css+0x29/0xa0 [<ffffffff811a243d>] cgroup_apply_control_enable+0x20d/0x360 [<ffffffff811a5728>] cgroup_mkdir+0x168/0x5f0 [<ffffffff8148543e>] kernfs_iop_mkdir+0x5e/0x90 [<ffffffff813dbb24>] vfs_mkdir+0x144/0x220 [<ffffffff813e1c97>] do_mkdirat+0x87/0x130 [<ffffffff813e1de9>] __x64_sys_mkdir+0x49/0x70 [<ffffffff81f8c928>] do_syscall_64+0x68/0x140 [<ffffffff8200012f>] entry_SYSCALL_64_after_hwframe+0x76/0x7e alloc_shrinker_info(), when shrinker_unit_alloc() returns an errer, the info won't be freed. Just fix it. Link: https://lkml.kernel.org/r/20241025060942.1049263-1-chenridong@huaweicloud.com Fixes: 307bececcd12 ("mm: shrinker: add a secondary array for shrinker_info::{map, nr_deferred}") Signed-off-by: Chen Ridong <chenridong@huawei.com> Acked-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Wang Weiyang <wangweiyang2@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31.mailmap: update e-mail address for Eugen HristevEugen Hristev
Update e-mail address. Link: https://lkml.kernel.org/r/20241025085848.483149-1-eugen.hristev@linaro.org Signed-off-by: Eugen Hristev <eugen.hristev@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31vmscan,migrate: fix page count imbalance on node stats when demoting pagesGregory Price
When numa balancing is enabled with demotion, vmscan will call migrate_pages when shrinking LRUs. migrate_pages will decrement the the node's isolated page count, leading to an imbalanced count when invoked from (MG)LRU code. The result is dmesg output like such: $ cat /proc/sys/vm/stat_refresh [77383.088417] vmstat_refresh: nr_isolated_anon -103212 [77383.088417] vmstat_refresh: nr_isolated_file -899642 This negative value may impact compaction and reclaim throttling. The following path produces the decrement: shrink_folio_list demote_folio_list migrate_pages migrate_pages_batch migrate_folio_move migrate_folio_done mod_node_page_state(-ve) <- decrement This path happens for SUCCESSFUL migrations, not failures. Typically callers to migrate_pages are required to handle putback/accounting for failures, but this is already handled in the shrink code. When accounting for migrations, instead do not decrement the count when the migration reason is MR_DEMOTION. As of v6.11, this demotion logic is the only source of MR_DEMOTION. Link: https://lkml.kernel.org/r/20241025141724.17927-1-gourry@gourry.net Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim") Signed-off-by: Gregory Price <gourry@gourry.net> Reviewed-by: Yang Shi <shy828301@gmail.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: "Huang, Ying" <ying.huang@intel.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31mailmap: update Jarkko's email addressesJarkko Sakkinen
Remove my previous work email, and the new one. The previous was never used in the commit log, so there's no good reason to spare it. Link: https://lkml.kernel.org/r/20241025181530.6151-1-jarkko@kernel.org Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Cc: Alex Elder <elder@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Geliang Tang <geliang@kernel.org> Cc: Jiri Kosina <jikos@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: Matt Ranostay <matt@ranostay.sg> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Quentin Monnet <qmo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: - Put the QP netlink dump back in cxgb4, fixes a user visible regression - Don't change the rounding style in mlx5 for user provided rd_atomic values - Resolve a race in bnxt_re around the qp-handle table array * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/bnxt_re: synchronize the qp-handle table array RDMA/bnxt_re: Fix the usage of control path spin locks RDMA/mlx5: Round max_rd_atomic/max_dest_rd_atomic up instead of down RDMA/cxgb4: Dump vendor specific QP details