linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-02-18	PCI: Add restore_dev_resource()	Ilpo Järvinen
	Resource fitting needs to restore the saved dev resources in a few places. Add a restore_dev_resource() helper for that. Link: https://lore.kernel.org/r/20241216175632.4175-19-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Remove incorrect comment from pci_reassign_resource()	Ilpo Järvinen
	Commit a4ac9fea016f ("PCI : Calculate right add_size") removed including min_align into new_size in pci_reassign_resource() which is the correct thing to do. However, it also added a snakeoil comment that the resource would already be aligned with min_align which is incorrect. A resource that is assigned earlier is aligned with the old alignment, NOT with the new requested alignment (min_align) until later deep within the reassignment callchain. Thus, remove the incorrect comment. Link: https://lore.kernel.org/r/20241216175632.4175-18-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com> Cc: Yinghai Lu <yinghai@kernel.org>
2025-02-18	PCI: Consolidate assignment loop next round preparation	Ilpo Järvinen
	pci_assign_unassigned_root_bus_resources() and pci_assign_unassigned_bridge_resources() have a loop that may perform several rounds to assign resources. The code to prepare for the next round is identical. Consolidate the code that prepares for the next assignment round into pci_prepare_next_assign_round(). Link: https://lore.kernel.org/r/20241216175632.4175-17-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Rename retval to ret	Ilpo Järvinen
	Rename 'retval' to 'ret' in pci_assign_unassigned_bridge_resources(). Link: https://lore.kernel.org/r/20241216175632.4175-16-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Use while loop and break instead of gotos	Ilpo Järvinen
	pci_assign_unassigned_root_bus_resources() and pci_assign_unassigned_bridge_resources() contain ad hoc loops using backwards goto and gotos out of the loop. Replace them with while loops and break statements. While reindenting the loop bodies, add braces & remove parenthesis. Link: https://lore.kernel.org/r/20241216175632.4175-15-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Refactor pdev_sort_resources() & __dev_sort_resources()	Ilpo Järvinen
	Reduce level of call nesting by calling pdev_sort_resources() directly and by moving the tests done inside __dev_sort_resources() into pdev_resources_assignable() helper. Link: https://lore.kernel.org/r/20241216175632.4175-14-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Converge return paths in __assign_resources_sorted()	Ilpo Järvinen
	All return paths want to free head list in __assign_resources_sorted(), so add a label and use goto. Link: https://lore.kernel.org/r/20241216175632.4175-13-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Add dev & res local variables to resource assignment funcs	Ilpo Järvinen
	Many PCI resource allocation related functions process struct pci_dev_resource items which hold the struct pci_dev and resource pointers. Reduce the number of lines that need indirection by adding 'dev' and 'res' local variable to hold the pointers. Link: https://lore.kernel.org/r/20241216175632.4175-12-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Add pci_resource_num() helper	Ilpo Järvinen
	A few places in PCI code, mainly in setup-bus.c, need to reverse lookup the index of a resource in pci_dev's resource array. Create pci_resource_num() helper to avoid repeating the pointer arithmetic trick used to calculate the index. Link: https://lore.kernel.org/r/20241216175632.4175-11-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Check resource_size() separately	Ilpo Järvinen
	Instead of chaining logic inside if () condition so that multiple lines are required, make !resource_size() a separate check and use continue. Link: https://lore.kernel.org/r/20241216175632.4175-10-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Add pci_resource_is_iov() to identify IOV resources	Michał Winiarski
	There are multiple places where special handling is required for IOV resources. Extract the identification of IOV resources to pci_resource_is_iov() and drop a few ifdefs. Link: https://lore.kernel.org/r/20241216175632.4175-9-ilpo.jarvinen@linux.intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Use resource_set_{range,size}() helpers	Ilpo Järvinen
	A few sites that could use resource_set_range/size() in setup-bus.c were not picked up earlier due to them no matching the usual pattern. Convert them now. These are more cases similar to 783602c920e9 ("PCI: Use resource_set_{range,size}() helpers"). Link: https://lore.kernel.org/r/20241216175632.4175-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> [bhelgaas: add 783602c920e9 history] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Use SZ_* instead of literals in setup-bus.c	Ilpo Järvinen
	Convert literals in setup-bus.c to SZ_* defines that make the size more human readable. As the code is now self-explanatory, eliminate comments about the size. Link: https://lore.kernel.org/r/20241216175632.4175-7-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Fix old_size lower bound in calculate_iosize() too	Ilpo Järvinen
	Commit 903534fa7d30 ("PCI: Fix resource double counting on remove & rescan") fixed double counting of mem resources because of old_size being applied too early. Fix a similar counting bug on the io resource side. Link: https://lore.kernel.org/r/20241216175632.4175-6-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Allow relaxed bridge window tail sizing for optional resources	Ilpo Järvinen
	Commit 566f1dd52816 ("PCI: Relax bridge window tail sizing rules") relaxed the bridge window requirements for non-optional size (size0) but pbus_size_mem() also handles optional sizes (IOV resources) using size1. This can manifest, e.g., as a failure to resize a BAR back to its original size after it was first shrunk when device has a VF BAR resource because the bridge window (size1) is enlarged beyond what is strictly required to fit the downstream resources. Allow using relaxed bridge window tail sizing rules also with the optional resources (size1) so that the remove/realloc cycle during BAR resize (smaller and back to the original size) does not fail unexpectedly due to increase in bridge window size demand. Also move add_align calculation to more logical place next to size1 assignment as they are strongly related to each other. Link: https://lore.kernel.org/r/20241216175632.4175-5-ilpo.jarvinen@linux.intel.com Fixes: 566f1dd52816 ("PCI: Relax bridge window tail sizing rules") Reported-by: Michał Winiarski <michal.winiarski@intel.com> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Simplify size1 assignment logic	Ilpo Järvinen
	In pbus_size_io() and pbus_size_mem(), a complex ?: operation is performed to set size1. Decompose this so it's easier to read. In the case of pbus_size_mem(), simply initializing size1 to zero ensures the size1 checks work as expected. Link: https://lore.kernel.org/r/20241216175632.4175-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Use min_align, not unrelated add_align, for size0	Ilpo Järvinen
	Commit 566f1dd52816 ("PCI: Relax bridge window tail sizing rules") relaxed bridge window tail alignment rule for the non-optional part (size0, no add_size/add_align). The required alignment given for pbus_upstream_space_available(), however, was add_align which relates only to size1 alignment. As pbus_upstream_space_available() only selects between normal and relaxed tail alignment of the bridge window, the different alignment only makes relaxed tail alignment to be used more often than what was intended, which should be harmless because relaxed tail alignment itself should work in all cases. For consistency, change pbus_upstream_space_available() call to use min_align which is the alignment that is going to be used for the bridge window in the case where size0 sized allocation is attempted. Link: https://lore.kernel.org/r/20241216175632.4175-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Remove add_align overwrite unrelated to size0	Ilpo Järvinen
	Commit 566f1dd52816 ("PCI: Relax bridge window tail sizing rules") relaxed bridge window tail alignment rule for the non-optional part (size0, no add_size/add_align). The change, however, also overwrote add_align, which is only related to case where optional size1 related entry is added into realloc head. Correct this by removing the add_align overwrite. Link: https://lore.kernel.org/r/20241216175632.4175-2-ilpo.jarvinen@linux.intel.com Fixes: 566f1dd52816 ("PCI: Relax bridge window tail sizing rules") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Xiaochun Lee <lixc17@lenovo.com>
2025-02-18	PCI: Use downstream bridges for distributing resources	Kai-Heng Feng
	7180c1d08639 ("PCI: Distribute available resources for root buses, too") breaks BAR assignment on some devices: pci 0006:03:00.0: BAR 0 [mem 0x6300c0000000-0x6300c1ffffff 64bit pref]: assigned pci 0006:03:00.1: BAR 0 [mem 0x6300c2000000-0x6300c3ffffff 64bit pref]: assigned pci 0006:03:00.2: BAR 0 [mem size 0x00800000 64bit pref]: can't assign; no space pci 0006:03:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space pci 0006:03:00.1: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space The apertures of domain 0006 before 7180c1d08639: 6300c0000000-63ffffffffff : PCI Bus 0006:00 6300c0000000-6300c9ffffff : PCI Bus 0006:01 6300c0000000-6300c9ffffff : PCI Bus 0006:02 # 160MB 6300c0000000-6300c8ffffff : PCI Bus 0006:03 # 144MB 6300c0000000-6300c1ffffff : 0006:03:00.0 # 32MB 6300c2000000-6300c3ffffff : 0006:03:00.1 # 32MB 6300c4000000-6300c47fffff : 0006:03:00.2 # 8MB 6300c4800000-6300c67fffff : 0006:03:00.0 # 32MB 6300c6800000-6300c87fffff : 0006:03:00.1 # 32MB 6300c9000000-6300c9bfffff : PCI Bus 0006:04 # 12MB 6300c9000000-6300c9bfffff : PCI Bus 0006:05 # 12MB 6300c9000000-6300c91fffff : PCI Bus 0006:06 # 2MB 6300c9200000-6300c93fffff : PCI Bus 0006:07 # 2MB 6300c9400000-6300c95fffff : PCI Bus 0006:08 # 2MB 6300c9600000-6300c97fffff : PCI Bus 0006:09 # 2MB After 7180c1d08639: 6300c0000000-63ffffffffff : PCI Bus 0006:00 6300c0000000-6300c9ffffff : PCI Bus 0006:01 6300c0000000-6300c9ffffff : PCI Bus 0006:02 # 160MB 6300c0000000-6300c43fffff : PCI Bus 0006:03 # 68MB 6300c0000000-6300c1ffffff : 0006:03:00.0 # 32MB 6300c2000000-6300c3ffffff : 0006:03:00.1 # 32MB --- no space --- : 0006:03:00.2 # 8MB --- no space --- : 0006:03:00.0 # 32MB --- no space --- : 0006:03:00.1 # 32MB 6300c4400000-6300c4dfffff : PCI Bus 0006:04 # 10MB 6300c4400000-6300c4dfffff : PCI Bus 0006:05 # 10MB 6300c4400000-6300c45fffff : PCI Bus 0006:06 # 2MB 6300c4600000-6300c47fffff : PCI Bus 0006:07 # 2MB 6300c4800000-6300c49fffff : PCI Bus 0006:08 # 2MB 6300c4a00000-6300c4bfffff : PCI Bus 0006:09 # 2MB We can see that the window to 0006:03 gets shrunken too much and 0006:04 eats away the window for 0006:03:00.2. The offending commit distributes the upstream bridge's resources multiple times to every downstream bridge, hence makes the aperture smaller than desired because calculation of io_per_b, mmio_per_b and mmio_pref_per_b becomes incorrect. Instead, distribute downstream bridges' own resources to resolve the issue. Link: https://lore.kernel.org/r/20241204022457.51322-1-kaihengf@nvidia.com Fixes: 7180c1d08639 ("PCI: Distribute available resources for root buses, too") Link: https://bugzilla.kernel.org/show_bug.cgi?id=219540 Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com> Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Carol Soto <csoto@nvidia.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Chris Chiu <chris.chiu@canonical.com>
2025-02-14	PCI: Cleanup dev->resource + resno to use pci_resource_n()	Ilpo Järvinen
	Replace pointer arithmetic in finding the correct resource entry with the pci_resource_n() helper. Link: https://lore.kernel.org/r/20250207162301.2842-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-02-02	Linux 6.14-rc1v6.14-rc1	Linus Torvalds

2025-02-02	Merge tag 'turbostat-2025.02.02' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - Fix regression that affinitized forked child in one-shot mode. - Harden one-shot mode against hotplug online/offline - Enable RAPL SysWatt column by default - Add initial PTL, CWF platform support - Harden initial PMT code in response to early use - Enable first built-in PMT counter: CWF c1e residency - Refuse to run on unsupported platforms without --force, to encourage updating to a version that supports the system, and to avoid no-so-useful measurement results * tag 'turbostat-2025.02.02' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (25 commits) tools/power turbostat: version 2025.02.02 tools/power turbostat: Add CPU%c1e BIC for CWF tools/power turbostat: Harden one-shot mode against cpu offline tools/power turbostat: Fix forked child affinity regression tools/power turbostat: Add tcore clock PMT type tools/power turbostat: version 2025.01.14 tools/power turbostat: Allow adding PMT counters directly by sysfs path tools/power turbostat: Allow mapping multiple PMT files with the same GUID tools/power turbostat: Add PMT directory iterator helper tools/power turbostat: Extend PMT identification with a sequence number tools/power turbostat: Return default value for unmapped PMT domains tools/power turbostat: Check for non-zero value when MSR probing tools/power turbostat: Enhance turbostat self-performance visibility tools/power turbostat: Add fixed RAPL PSYS divisor for SPR tools/power turbostat: Fix PMT mmaped file size rounding tools/power turbostat: Remove SysWatt from DISABLED_BY_DEFAULT tools/power turbostat: Add an NMI column tools/power turbostat: add Busy% to "show idle" tools/power turbostat: Introduce --force parameter tools/power turbostat: Improve --help output ...
2025-02-02	Merge tag 'sh-for-v6.14-tag1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux Pull sh updates from John Paul Adrian Glaubitz: "Fixes and improvements for sh: - replace seq_printf() with the more efficient seq_put_decimal_ull_width() to increase performance when stress reading /proc/interrupts (David Wang) - migrate sh to the generic rule for built-in DTB to help avoid race conditions during parallel builds which can occur because Kbuild decends into arch//boot/dts twice (Masahiro Yamada) - replace select with imply in the board Kconfig for enabling hardware with complex dependencies. This addresses warnings which were reported by the kernel test robot (Geert Uytterhoeven)" tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux: sh: boards: Use imply to enable hardware with complex dependencies sh: Migrate to the generic rule for built-in DTB sh: irq: Use seq_put_decimal_ull_width() for decimal values
2025-02-02	tools/power turbostat: version 2025.02.02	Len Brown
	Summary of Changes since 2024.11.30: Fix regression in 2023.11.07 that affinitized forked child in one-shot mode. Harden one-shot mode against hotplug online/offline Enable RAPL SysWatt column by default. Add initial PTL, CWF platform support. Harden initial PMT code in response to early use. Enable first built-in PMT counter: CWF c1e residency Refuse to run on unsupported platforms without --force, to encourage updating to a version that supports the system, and to avoid no-so-useful measurement results. Signed-off-by: Len Brown <len.brown@intel.com>
2025-02-01	Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull misc vfs cleanups from Al Viro: "Two unrelated patches - one is a removal of long-obsolete include in overlayfs (it used to need fs/internal.h, but the extern it wanted has been moved back to include/linux/namei.h) and another introduces convenience helper constructing struct qstr by a NUL-terminated string" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add a string-to-qstr constructor fs/overlayfs/namei.c: get rid of include ../internal.h
2025-02-01	Merge tag 'mips_6.14_1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: "Revert commit breaking sysv ipc for o32 ABI" * tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: Revert "mips: fix shmctl/semctl/msgctl syscall for o32"
2025-02-01	Merge tag 'v6.14-rc-smb3-client-fixes-part2' of ↵	Linus Torvalds
	git://git.samba.org/sfrench/cifs-2.6 Pull more smb client updates from Steve French: - various updates for special file handling: symlink handling, support for creating sockets, cleanups, new mount options (e.g. to allow disabling using reparse points for them, and to allow overriding the way symlinks are saved), and fixes to error paths - fix for kerberos mounts (allow IAKerb) - SMB1 fix for stat and for setting SACL (auditing) - fix an incorrect error code mapping - cleanups" * tag 'v6.14-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: (21 commits) cifs: Fix parsing native symlinks directory/file type cifs: update internal version number cifs: Add support for creating WSL-style symlinks smb3: add support for IAKerb cifs: Fix struct FILE_ALL_INFO cifs: Add support for creating NFS-style symlinks cifs: Add support for creating native Windows sockets cifs: Add mount option -o reparse=none cifs: Add mount option -o symlink= for choosing symlink create type cifs: Fix creating and resolving absolute NT-style symlinks cifs: Simplify reparse point check in cifs_query_path_info() function cifs: Remove symlink member from cifs_open_info_data union cifs: Update description about ACL permissions cifs: Rename struct reparse_posix_data to reparse_nfs_data_buffer and move to common/smb2pdu.h cifs: Remove struct reparse_posix_data from struct cifs_open_info_data cifs: Remove unicode parameter from parse_reparse_point() function cifs: Fix getting and setting SACLs over SMB1 cifs: Remove intermediate object of failed create SFU call cifs: Validate EAs for WSL reparse points cifs: Change translation of STATUS_PRIVILEGE_NOT_HELD to -EPERM ...
2025-02-01	Merge tag 'driver-core-6.14-rc1-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull debugfs fix from Greg KH: "Here is a single debugfs fix from Al to resolve a reported regression in the driver-core tree. It has been reported to fix the issue" * tag 'driver-core-6.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: debugfs: Fix the missing initializations in __debugfs_file_get()
2025-02-01	Merge tag 'mm-hotfixes-stable-2025-02-01-03-56' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "21 hotfixes. 8 are cc:stable and the remainder address post-6.13 issues. 13 are for MM and 8 are for non-MM. All are singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) MAINTAINERS: include linux-mm for xarray maintenance revert "xarray: port tests to kunit" MAINTAINERS: add lib/test_xarray.c mailmap, MAINTAINERS, docs: update Carlos's email address mm/hugetlb: fix hugepage allocation for interleaved memory nodes mm: gup: fix infinite loop within __get_longterm_locked mm, swap: fix reclaim offset calculation error during allocation .mailmap: update email address for Christopher Obbard kfence: skip __GFP_THISNODE allocations on NUMA systems nilfs2: fix possible int overflows in nilfs_fiemap() mm: compaction: use the proper flag to determine watermarks kernel: be more careful about dup_mmap() failures and uprobe registering mm/fake-numa: handle cases with no SRAT info mm: kmemleak: fix upper boundary check for physical address objects mailmap: add an entry for Hamza Mahfooz MAINTAINERS: mailmap: update Yosry Ahmed's email address scripts/gdb: fix aarch64 userspace detection in get_current_task mm/vmscan: accumulate nr_demoted for accurate demotion statistics ocfs2: fix incorrect CPU endianness conversion causing mount failure mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc() ...
2025-02-01	Merge tag 'media/v6.14-2' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fix from Mauro Carvalho Chehab: "A revert for a regression in the uvcvideo driver" * tag 'media/v6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: Revert "media: uvcvideo: Require entities to have a non-zero unique ID"
2025-02-01	MAINTAINERS: include linux-mm for xarray maintenance	Andrew Morton
	MM developers have an interest in the xarray code. Cc: David Gow <davidgow@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	revert "xarray: port tests to kunit"	Andrew Morton
	Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build when compiing the xarray userspace test harness code. Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com Cc: David Gow <davidgow@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Tamir Duberstein <tamird@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	MAINTAINERS: add lib/test_xarray.c	Tamir Duberstein
	Ensure test-only changes are sent to the relevant maintainer. Link: https://lkml.kernel.org/r/20250129-xarray-test-maintainer-v1-1-482e31f30f47@gmail.com Signed-off-by: Tamir Duberstein <tamird@gmail.com> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mailmap, MAINTAINERS, docs: update Carlos's email address	Carlos Bilbao
	Update .mailmap to reflect my new (and final) primary email address, carlos.bilbao@kernel.org. Also update contact information in files Documentation/translations/sp_SP/index.rst and MAINTAINERS. Link: https://lkml.kernel.org/r/20250130012248.1196208-1-carlos.bilbao@kernel.org Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org> Cc: Carlos Bilbao <bilbao@vt.edu> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Mattew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/hugetlb: fix hugepage allocation for interleaved memory nodes	Ritesh Harjani (IBM)
	gather_bootmem_prealloc() assumes the start nid as 0 and size as num_node_state(N_MEMORY). That means in case if memory attached numa nodes are interleaved, then gather_bootmem_prealloc_parallel() will fail to scan few of these nodes. Since memory attached numa nodes can be interleaved in any fashion, hence ensure that the current code checks for all numa node ids (.size = nr_node_ids). Let's still keep max_threads as N_MEMORY, so that it can distributes all nr_node_ids among the these many no. threads. e.g. qemu cmdline ======================== numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20" mem_cmd="-object memory-backend-ram,id=mem1,size=16G" w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): ========================== ~ # cat /proc/meminfo \|grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 0 kB with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2): =========================== ~ # cat /proc/meminfo \|grep -i huge AnonHugePages: 0 kB ShmemHugePages: 0 kB FileHugePages: 0 kB HugePages_Total: 2 HugePages_Free: 2 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 1048576 kB Hugetlb: 2097152 kB Link: https://lkml.kernel.org/r/f8d8dad3a5471d284f54185f65d575a6aaab692b.1736592534.git.ritesh.list@gmail.com Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reported-by: Pavithra Prakash <pavrampu@linux.ibm.com> Suggested-by: Muchun Song <muchun.song@linux.dev> Tested-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Luiz Capitulino <luizcap@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Gang Li <gang.li@linux.dev> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: gup: fix infinite loop within __get_longterm_locked	Zhaoyang Huang
	We can run into an infinite loop in __get_longterm_locked() when collect_longterm_unpinnable_folios() finds only folios that are isolated from the LRU or were never added to the LRU. This can happen when all folios to be pinned are never added to the LRU, for example when vm_ops->fault allocated pages using cma_alloc() and never added them to the LRU. Fix it by simply taking a look at the list in the single caller, to see if anything was added. [zhaoyang.huang@unisoc.com: move definition of local] Link: https://lkml.kernel.org/r/20250122012604.3654667-1-zhaoyang.huang@unisoc.com Link: https://lkml.kernel.org/r/20250121020159.3636477-1-zhaoyang.huang@unisoc.com Fixes: 67e139b02d99 ("mm/gup.c: refactor check_and_migrate_movable_pages()") Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com> Reviewed-by: John Hubbard <jhubbard@nvidia.com> Reviewed-by: David Hildenbrand <david@redhat.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Aijun Sun <aijun.sun@unisoc.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm, swap: fix reclaim offset calculation error during allocation	Kairui Song
	There is a code error that will cause the swap entry allocator to reclaim and check the whole cluster with an unexpected tail offset instead of the part that needs to be reclaimed. This may cause corruption of the swap map, so fix it. Link: https://lkml.kernel.org/r/20250130115131.37777-1-ryncsn@gmail.com Fixes: 3b644773eefd ("mm, swap: reduce contention on device lock") Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Chris Li <chrisl@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	.mailmap: update email address for Christopher Obbard	Christopher Obbard
	Update my email address. Link: https://lkml.kernel.org/r/20250122-wip-obbardc-update-email-v2-1-12bde6b79ad0@linaro.org Signed-off-by: Christopher Obbard <christopher.obbard@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	kfence: skip __GFP_THISNODE allocations on NUMA systems	Marco Elver
	On NUMA systems, __GFP_THISNODE indicates that an allocation _must_ be on a particular node, and failure to allocate on the desired node will result in a failed allocation. Skip __GFP_THISNODE allocations if we are running on a NUMA system, since KFENCE can't guarantee which node its pool pages are allocated on. Link: https://lkml.kernel.org/r/20250124120145.410066-1-elver@google.com Fixes: 236e9f153852 ("kfence: skip all GFP_ZONEMASK allocations") Signed-off-by: Marco Elver <elver@google.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Alexander Potapenko <glider@google.com> Cc: Chistoph Lameter <cl@linux.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	nilfs2: fix possible int overflows in nilfs_fiemap()	Nikita Zhandarovich
	Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result by being prepared to go through potentially maxblocks == INT_MAX blocks, the value in n may experience an overflow caused by left shift of blkbits. While it is extremely unlikely to occur, play it safe and cast right hand expression to wider type to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com Fixes: 622daaff0a89 ("nilfs2: fiemap support") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: compaction: use the proper flag to determine watermarks	yangge
	There are 4 NUMA nodes on my machine, and each NUMA node has 32GB of memory. I have configured 16GB of CMA memory on each NUMA node, and starting a 32GB virtual machine with device passthrough is extremely slow, taking almost an hour. Long term GUP cannot allocate memory from CMA area, so a maximum of 16 GB of no-CMA memory on a NUMA node can be used as virtual machine memory. There is 16GB of free CMA memory on a NUMA node, which is sufficient to pass the order-0 watermark check, causing the __compaction_suitable() function to consistently return true. For costly allocations, if the __compaction_suitable() function always returns true, it causes the __alloc_pages_slowpath() function to fail to exit at the appropriate point. This prevents timely fallback to allocating memory on other nodes, ultimately resulting in excessively long virtual machine startup times. Call trace: __alloc_pages_slowpath if (compact_result == COMPACT_SKIPPED \|\| compact_result == COMPACT_DEFERRED) goto nopage; // should exit __alloc_pages_slowpath() from here We could use the real unmovable allocation context to have __zone_watermark_unusable_free() subtract CMA pages, and thus we won't pass the order-0 check anymore once the non-CMA part is exhausted. There is some risk that in some different scenario the compaction could in fact migrate pages from the exhausted non-CMA part of the zone to the CMA part and succeed, and we'll skip it instead. But only __GFP_NORETRY allocations should be affected in the immediate "goto nopage" when compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY anyway and won't fail without trying to compact-migrate the non-CMA pageblocks into CMA pageblocks first, so it should be fine. After this fix, it only takes a few tens of seconds to start a 32GB virtual machine with device passthrough functionality. Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/ Link: https://lkml.kernel.org/r/1737788037-8439-1-git-send-email-yangge1116@126.com Signed-off-by: yangge <yangge1116@126.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Barry Song <21cnbao@gmail.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	kernel: be more careful about dup_mmap() failures and uprobe registering	Liam R. Howlett
	If a memory allocation fails during dup_mmap(), the maple tree can be left in an unsafe state for other iterators besides the exit path. All the locks are dropped before the exit_mmap() call (in mm/mmap.c), but the incomplete mm_struct can be reached through (at least) the rmap finding the vmas which have a pointer back to the mm_struct. Up to this point, there have been no issues with being able to find an mm_struct that was only partially initialised. Syzbot was able to make the incomplete mm_struct fail with recent forking changes, so it has been proven unsafe to use the mm_struct that hasn't been initialised, as referenced in the link below. Although 8ac662f5da19f ("fork: avoid inappropriate uprobe access to invalid mm") fixed the uprobe access, it does not completely remove the race. This patch sets the MMF_OOM_SKIP to avoid the iteration of the vmas on the oom side (even though this is extremely unlikely to be selected as an oom victim in the race window), and sets MMF_UNSTABLE to avoid other potential users from using a partially initialised mm_struct. When registering vmas for uprobe, skip the vmas in an mm that is marked unstable. Modifying a vma in an unstable mm may cause issues if the mm isn't fully initialised. Link: https://lore.kernel.org/all/6756d273.050a0220.2477f.003d.GAE@google.com/ Link: https://lkml.kernel.org/r/20250127170221.1761366-1-Liam.Howlett@oracle.com Fixes: d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Peng Zhang <zhangpeng.00@bytedance.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/fake-numa: handle cases with no SRAT info	Bruno Faccini
	Handle more gracefully cases where no SRAT information is available, like in VMs with no Numa support, and allow fake-numa configuration to complete successfully in these cases Link: https://lkml.kernel.org/r/20250127171623.1523171-1-bfaccini@nvidia.com Fixes: 63db8170bf34 (“mm/fake-numa: allow later numa node hotplug”) Signed-off-by: Bruno Faccini <bfaccini@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <hyeonggon.yoo@sk.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Len Brown <lenb@kernel.org> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm: kmemleak: fix upper boundary check for physical address objects	Catalin Marinas
	Memblock allocations are registered by kmemleak separately, based on their physical address. During the scanning stage, it checks whether an object is within the min_low_pfn and max_low_pfn boundaries and ignores it otherwise. With the recent addition of __percpu pointer leak detection (commit 6c99d4eb7c5e ("kmemleak: enable tracking for percpu pointers")), kmemleak started reporting leaks in setup_zone_pageset() and setup_per_cpu_pageset(). These were caused by the node_data[0] object (initialised in alloc_node_data()) ending on the PFN_PHYS(max_low_pfn) boundary. The non-strict upper boundary check introduced by commit 84c326299191 ("mm: kmemleak: check physical address when scan") causes the pg_data_t object to be ignored (not scanned) and the __percpu pointers it contains to be reported as leaks. Make the max_low_pfn upper boundary check strict when deciding whether to ignore a physical address object and not scan it. Link: https://lkml.kernel.org/r/20250127184233.2974311-1-catalin.marinas@arm.com Fixes: 84c326299191 ("mm: kmemleak: check physical address when scan") Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Tested-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Cc: Patrick Wang <patrick.wang.shcn@gmail.com> Cc: <stable@vger.kernel.org> [6.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mailmap: add an entry for Hamza Mahfooz	Hamza Mahfooz
	Map my previous work email to my current one. Link: https://lkml.kernel.org/r/20250120205659.139027-1-hamzamahfooz@linux.microsoft.com Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Cc: David S. Miller <davem@davemloft.net> Cc: Hans verkuil <hverkuil@xs4all.nl> Cc: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	MAINTAINERS: mailmap: update Yosry Ahmed's email address	Yosry Ahmed
	Moving to a linux.dev email address. Link: https://lkml.kernel.org/r/20250123231344.817358-1-yosry.ahmed@linux.dev Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	scripts/gdb: fix aarch64 userspace detection in get_current_task	Jan Kiszka
	At least recent gdb releases (seen with 14.2) return SP_EL0 as signed long which lets the right-shift always return 0. Link: https://lkml.kernel.org/r/dcd2fabc-9131-4b48-8419-6444e2d67454@siemens.com Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: Barry Song <baohua@kernel.org> Cc: Kieran Bingham <kbingham@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/vmscan: accumulate nr_demoted for accurate demotion statistics	Li Zhijian
	In shrink_folio_list(), demote_folio_list() can be called 2 times. Currently stat->nr_demoted will only store the last nr_demoted( the later nr_demoted is always zero, the former nr_demoted will get lost), as a result number of demoted pages is not accurate. Accumulate the nr_demoted count across multiple calls to demote_folio_list(), ensuring accurate reporting of demotion statistics. [lizhijian@fujitsu.com: introduce local nr_demoted to fix nr_reclaimed double counting] Link: https://lkml.kernel.org/r/20250111015253.425693-1-lizhijian@fujitsu.com Link: https://lkml.kernel.org/r/20250110122133.423481-1-lizhijian@fujitsu.com Fixes: f77f0c751478 ("mm,memcg: provide per-cgroup counters for NUMA balancing operations") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu> Tested-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	ocfs2: fix incorrect CPU endianness conversion causing mount failure	Heming Zhao
	Commit 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") introduced a regression bug. The blksz_bits value is already converted to CPU endian in the previous code; therefore, the code shouldn't use le32_to_cpu() anymore. Link: https://lkml.kernel.org/r/20250121112204.12834-1-heming.zhao@suse.com Fixes: 23aab037106d ("ocfs2: fix UBSAN warning in ocfs2_verify_volume()") Signed-off-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01	mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc()	Hyeonggon Yoo
	Commit c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()") introduces is_first_zpdesc() function. However, the function is only used when CONFIG_DEBUG_VM=y. When building with LLVM=1 and W=1 option, the following warning is generated: $ make -j12 W=1 LLVM=1 mm/zsmalloc.o mm/zsmalloc.c:455:20: error: function 'is_first_zpdesc' is not needed and will not be emitted [-Werror,-Wunneeded-internal-declaration] 455 \| static inline bool is_first_zpdesc(struct zpdesc *zpdesc) \| ^~~~~~~~~~~~~~~ 1 error generated. Fix the warning by adding __maybe_unused attribute to the function. No functional change intended. Link: https://lkml.kernel.org/r/20250127231631.4363-1-42.hyeyoo@gmail.com Fixes: c1b3bb73d55e ("mm/zsmalloc: use zpdesc in trylock_zspage()/lock_zspage()") Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501240958.4ILzuBrH-lkp@intel.com/ Cc: Alex Shi <alexs@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>