summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
AgeCommit message (Collapse)Author
13 daysMerge tag 'mm-nonmm-stable-2025-05-31-15-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "hung_task: extend blocking task stacktrace dump to semaphore" from Lance Yang enhances the hung task detector. The detector presently dumps the blocking tasks's stack when it is blocked on a mutex. Lance's series extends this to semaphores - "nilfs2: improve sanity checks in dirty state propagation" from Wentao Liang addresses a couple of minor flaws in nilfs2 - "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn fixes a couple of issues in the gdb scripts - "Support kdump with LUKS encryption by reusing LUKS volume keys" from Coiby Xu addresses a usability problem with kdump. When the dump device is LUKS-encrypted, the kdump kernel may not have the keys to the encrypted filesystem. A full writeup of this is in the series [0/N] cover letter - "sysfs: add counters for lockups and stalls" from Max Kellermann adds /sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and /sys/kernel/rcu_stall_count - "fork: Page operation cleanups in the fork code" from Pasha Tatashin implements a number of code cleanups in fork.c - "scripts/gdb/symbols: determine KASLR offset on s390 during early boot" from Ilya Leoshkevich fixes some s390 issues in the gdb scripts * tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits) llist: make llist_add_batch() a static inline delayacct: remove redundant code and adjust indentation squashfs: add optional full compressed block caching crash_dump, nvme: select CONFIGFS_FS as built-in scripts/gdb/symbols: determine KASLR offset on s390 during early boot scripts/gdb/symbols: factor out pagination_off() scripts/gdb/symbols: factor out get_vmlinux() kernel/panic.c: format kernel-doc comments mailmap: update and consolidate Casey Connolly's name and email nilfs2: remove wbc->for_reclaim handling fork: define a local GFP_VMAP_STACK fork: check charging success before zeroing stack fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code fork: clean-up ifdef logic around stack allocation kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count x86/crash: make the page that stores the dm crypt keys inaccessible x86/crash: pass dm crypt keys to kdump kernel Revert "x86/mm: Remove unused __set_memory_prot()" crash_dump: retrieve dm crypt keys in kdump kernel ...
13 daysMerge tag 'mm-stable-2025-05-31-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of creating a pte which addresses the first page in a folio and reduces the amount of plumbing which architecture must implement to provide this. - "Misc folio patches for 6.16" from Matthew Wilcox is a shower of largely unrelated folio infrastructure changes which clean things up and better prepare us for future work. - "memory,x86,acpi: hotplug memory alignment advisement" from Gregory Price adds early-init code to prevent x86 from leaving physical memory unused when physical address regions are not aligned to memory block size. - "mm/compaction: allow more aggressive proactive compaction" from Michal Clapinski provides some tuning of the (sadly, hard-coded (more sadly, not auto-tuned)) thresholds for our invokation of proactive compaction. In a simple test case, the reduction of a guest VM's memory consumption was dramatic. - "Minor cleanups and improvements to swap freeing code" from Kemeng Shi provides some code cleaups and a small efficiency improvement to this part of our swap handling code. - "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin adds the ability for a ptracer to modify syscalls arguments. At this time we can alter only "system call information that are used by strace system call tampering, namely, syscall number, syscall arguments, and syscall return value. This series should have been incorporated into mm.git's "non-MM" branch, but I goofed. - "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl against /proc/pid/pagemap. This permits CRIU to more efficiently get at the info about guard regions. - "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan implements that fix. No runtime effect is expected because validate_page_before_insert() happens to fix up this error. - "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David Hildenbrand basically brings uprobe text poking into the current decade. Remove a bunch of hand-rolled implementation in favor of using more current facilities. - "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman Khandual provides enhancements and generalizations to the pte dumping code. This might be needed when 128-bit Page Table Descriptors are enabled for ARM. - "Always call constructor for kernel page tables" from Kevin Brodsky ensures that the ctor/dtor is always called for kernel pgtables, as it already is for user pgtables. This permits the addition of more functionality such as "insert hooks to protect page tables". This change does result in various architectures performing unnecesary work, but this is fixed up where it is anticipated to occur. - "Rust support for mm_struct, vm_area_struct, and mmap" from Alice Ryhl adds plumbing to permit Rust access to core MM structures. - "fix incorrectly disallowed anonymous VMA merges" from Lorenzo Stoakes takes advantage of some VMA merging opportunities which we've been missing for 15 years. - "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from SeongJae Park optimizes process_madvise()'s TLB flushing. Instead of flushing each address range in the provided iovec, we batch the flushing across all the iovec entries. The syscall's cost was approximately halved with a microbenchmark which was designed to load this particular operation. - "Track node vacancy to reduce worst case allocation counts" from Sidhartha Kumar makes the maple tree smarter about its node preallocation. stress-ng mmap performance increased by single-digit percentages and the amount of unnecessarily preallocated memory was dramaticelly reduced. - "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes a few unnecessary things which Baoquan noted when reading the code. - ""Enhance sysfs handling for memory hotplug in weighted interleave" from Rakie Kim "enhances the weighted interleave policy in the memory management subsystem by improving sysfs handling, fixing memory leaks, and introducing dynamic sysfs updates for memory hotplug support". Fixes things on error paths which we are unlikely to hit. - "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory" from SeongJae Park introduces new DAMOS quota goal metrics which eliminate the manual tuning which is required when utilizing DAMON for memory tiering. - "mm/vmalloc.c: code cleanup and improvements" from Baoquan He provides cleanups and small efficiency improvements which Baoquan found via code inspection. - "vmscan: enforce mems_effective during demotion" from Gregory Price changes reclaim to respect cpuset.mems_effective during demotion when possible. because presently, reclaim explicitly ignores cpuset.mems_effective when demoting, which may cause the cpuset settings to violated. This is useful for isolating workloads on a multi-tenant system from certain classes of memory more consistently. - "Clean up split_huge_pmd_locked() and remove unnecessary folio pointers" from Gavin Guo provides minor cleanups and efficiency gains in in the huge page splitting and migrating code. - "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache for `struct mem_cgroup', yielding improved memory utilization. - "add max arg to swappiness in memory.reclaim and lru_gen" from Zhongkun He adds a new "max" argument to the "swappiness=" argument for memory.reclaim MGLRU's lru_gen. This directs proactive reclaim to reclaim from only anon folios rather than file-backed folios. - "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the first step on the path to permitting the kernel to maintain existing VMs while replacing the host kernel via file-based kexec. At this time only memblock's reserve_mem is preserved. - "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides and uses a smarter way of looping over a pfn range. By skipping ranges of invalid pfns. - "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning when a task is pinned a single NUMA mode. Dramatic performance benefits were seen in some real world cases. - "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank Garg addresses a warning which occurs during memory compaction when using JFS. - "move all VMA allocation, freeing and duplication logic to mm" from Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more appropriate mm/vma.c. - "mm, swap: clean up swap cache mapping helper" from Kairui Song provides code consolidation and cleanups related to the folio_index() function. - "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that. - "memcg: Fix test_memcg_min/low test failures" from Waiman Long addresses some bogus failures which are being reported by the test_memcontrol selftest. - "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo Stoakes commences the deprecation of file_operations.mmap() in favor of the new file_operations.mmap_prepare(). The latter is more restrictive and prevents drivers from messing with things in ways which, amongst other problems, may defeat VMA merging. - "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples the per-cpu memcg charge cache from the objcg's one. This is a step along the way to making memcg and objcg charging NMI-safe, which is a BPF requirement. - "mm/damon: minor fixups and improvements for code, tests, and documents" from SeongJae Park is yet another batch of miscellaneous DAMON changes. Fix and improve minor problems in code, tests and documents. - "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg stats to be irq safe. Another step along the way to making memcg charging and stats updates NMI-safe, a BPF requirement. - "Let unmap_hugepage_range() and several related functions take folio instead of page" from Fan Ni provides folio conversions in the hugetlb code. * tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits) mm: pcp: increase pcp->free_count threshold to trigger free_high mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range() mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page mm/hugetlb: pass folio instead of page to unmap_ref_private() memcg: objcg stock trylock without irq disabling memcg: no stock lock for cpu hot-unplug memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs memcg: make count_memcg_events re-entrant safe against irqs memcg: make mod_memcg_state re-entrant safe against irqs memcg: move preempt disable to callers of memcg_rstat_updated memcg: memcg_rstat_updated re-entrant safe against irqs mm: khugepaged: decouple SHMEM and file folios' collapse selftests/eventfd: correct test name and improve messages alloc_tag: check mem_profiling_support in alloc_tag_init Docs/damon: update titles and brief introductions to explain DAMOS selftests/damon/_damon_sysfs: read tried regions directories in order mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject() mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat() mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs ...
14 daysMerge tag 'gcc-minimum-version-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull compiler version requirement update from Arnd Bergmann: "Require gcc-8 and binutils-2.30 x86 already uses gcc-8 as the minimum version, this changes all other architectures to the same version. gcc-8 is used is Debian 10 and Red Hat Enterprise Linux 8, both of which are still supported, and binutils 2.30 is the oldest corresponding version on those. Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as the system compiler but additionally include toolchains that remain supported. With the new minimum toolchain versions, a number of workarounds for older versions can be dropped, in particular on x86_64 and arm64. Importantly, the updated compiler version allows removing two of the five remaining gcc plugins, as support for sancov and structeak features is already included in modern compiler versions. I tried collecting the known changes that are possible based on the new toolchain version, but expect that more cleanups will be possible. Since this touches multiple architectures, I merged the patches through the asm-generic tree." * tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: Makefile.kcov: apply needed compiler option unconditionally in CFLAGS_KCOV Documentation: update binutils-2.30 version reference gcc-plugins: remove SANCOV gcc plugin Kbuild: remove structleak gcc plugin arm64: drop binutils version checks raid6: skip avx512 checks kbuild: require gcc-8 and binutils-2.30
2025-05-29Merge tag 'platform-drivers-x86-v6.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers updates from Ilpo Järvinen: "The changes are mostly business as usual. Besides pdx86 changes, there are a few power supply changes needed for related pdx86 features, move of oxpec driver from hwmon (oxp-sensors) to pdx86, and one FW version warning to hid-asus. Highlights: - alienware-wmi-wmax: - Add HWMON support - Add ABI and admin-guide documentation - Expose GPIO debug methods through debug FS - Support manual fan control and "custom" thermal profile - amd/hsmp: - Add sysfs files to show HSMP telemetry - Report power readings and limits via hwmon - amd/isp4: Add AMD ISP platform config for OV05C10 - asus-wmi: - Refactor Ally suspend/resume to work better with older FW - hid-asus: check ROG Ally MCU version and warn about old FW versions - dasharo-acpi: - Add driver for Dasharo devices supporting fans and temperatures monitoring - dell-ddv: - Expose the battery health and manufacture date to userspace using power supply extensions - Implement the battery matching algorithm - dell-pc: - Improve error propagation - Use faux device - int3472: - Add delays to avoid GPIO regulator spikes - Add handshake pin support - Make regulator supply name configurable and allow registering more than 1 GPIO regulator - Map mt9m114 powerdown pin to powerenable - intel/pmc: Add separate SSRAM Telemetry driver - intel-uncore-freq: Add attributes to show agent types and die ID - ISST: - Support SST-TF revision 2 (allows more cores per bucket) - Support SST-PP revision 2 (fabric 1 frequencies) - Remove unnecessary SST MSRs restore (the package retains MSRs despite CPU offlining) - mellanox: Add support for SN2201, SN4280, SN5610, and SN5640 - mellanox: mlxbf-pmc: Support additional PMC blocks - oxpec: - Add OneXFly variants - Add support for charge limit, charge thresholds, and turbo LED - Distinguish current X1 variants to avoid unwanted matching to new variants - Follow hwmon conventions - Move from hwmon/oxp-sensors to platform/x86 to match the enlarged scope - power supply: - Add inhibit-charge-awake (needed by oxpec) - Add additional battery health status values ("blown fuse" and "cell imbalance") (needed by dell-ddv) - powerwell-ec: Add driver for Portwell EC supporting GPIO and watchdog - thinkpad-acpi: Support camera shutter switch hotkey - tuxedo: Add virtual LampArray for TUXEDO NB04 devices - tools/power/x86/intel-speed-select: - Support displaying SST-PP revision 2 fields - Skip uncore frequency update on newer generations of CPUs - Miscellaneous cleanups / refactoring / improvements" * tag 'platform-drivers-x86-v6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (112 commits) thermal/drivers/acerhdf: Constify struct thermal_zone_device_ops platform/x86/amd/hsmp: fix building with CONFIG_HWMON=m platform/x86: asus-wmi: fix build without CONFIG_SUSPEND docs: ABI: Fix "aassociated" to "associated" platform/x86: Add AMD ISP platform config for OV05C10 Documentation: admin-guide: pm: Add documentation for die_id platform/x86/intel-uncore-freq: Add attributes to show die_id platform/x86/intel: power-domains: Add interface to get Linux die ID Documentation: admin-guide: pm: Add documentation for agent_types platform/x86/intel-uncore-freq: Add attributes to show agent types platform/x86/tuxedo: Prevent invalid Kconfig state platform/x86: dell-ddv: Expose the battery health to userspace platform/x86: dell-ddv: Expose the battery manufacture date to userspace platform/x86: dell-ddv: Implement the battery matching algorithm power: supply: core: Add additional health status values platform/x86/amd/hsmp: acpi: Add sysfs files to display HSMP telemetry platform/x86/amd/hsmp: Report power via hwmon sensors platform/x86/amd/hsmp: Use a single DRIVER_VERSION for all hsmp modules platform/mellanox: mlxreg-dpu: Fix smatch warnings platform: mellanox: nvsw-sn2200: Fix .items in nvsw_sn2201_busbar_hotplug ...
2025-05-29Merge tag 'ipe-pr-20250527' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe Pull IPE update from Fan Wu: "A single commit from Jasjiv Singh, that adds an errno field to IPE policy load auditing to log failures with error details, not just successes. This improves the security audit trail and helps diagnose policy deployment issues" * tag 'ipe-pr-20250527' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe: ipe: add errno field to IPE policy load auditing
2025-05-28Merge tag 'net-next-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Implement the Device Memory TCP transmit path, allowing zero-copy data transmission on top of TCP from e.g. GPU memory to the wire. - Move all the IPv6 routing tables management outside the RTNL scope, under its own lock and RCU. The route control path is now 3x times faster. - Convert queue related netlink ops to instance lock, reducing again the scope of the RTNL lock. This improves the control plane scalability. - Refactor the software crc32c implementation, removing unneeded abstraction layers and improving significantly the related micro-benchmarks. - Optimize the GRO engine for UDP-tunneled traffic, for a 10% performance improvement in related stream tests. - Cover more per-CPU storage with local nested BH locking; this is a prep work to remove the current per-CPU lock in local_bh_disable() on PREMPT_RT. - Introduce and use nlmsg_payload helper, combining buffer bounds verification with accessing payload carried by netlink messages. Netfilter: - Rewrite the procfs conntrack table implementation, improving considerably the dump performance. A lot of user-space tools still use this interface. - Implement support for wildcard netdevice in netdev basechain and flowtables. - Integrate conntrack information into nft trace infrastructure. - Export set count and backend name to userspace, for better introspection. BPF: - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops programs and can be controlled in similar way to traditional qdiscs using the "tc qdisc" command. - Refactor the UDP socket iterator, addressing long standing issues WRT duplicate hits or missed sockets. Protocols: - Improve TCP receive buffer auto-tuning and increase the default upper bound for the receive buffer; overall this improves the single flow maximum thoughput on 200Gbs link by over 60%. - Add AFS GSSAPI security class to AF_RXRPC; it provides transport security for connections to the AFS fileserver and VL server. - Improve TCP multipath routing, so that the sources address always matches the nexthop device. - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS, and thus preventing DoS caused by passing around problematic FDs. - Retire DCCP socket. DCCP only receives updates for bugs, and major distros disable it by default. Its removal allows for better organisation of TCP fields to reduce the number of cache lines hit in the fast path. - Extend TCP drop-reason support to cover PAWS checks. Driver API: - Reorganize PTP ioctl flag support to require an explicit opt-in for the drivers, avoiding the problem of drivers not rejecting new unsupported flags. - Converted several device drivers to timestamping APIs. - Introduce per-PHY ethtool dump helpers, improving the support for dump operations targeting PHYs. Tests and tooling: - Add support for classic netlink in user space C codegen, so that ynl-c can now read, create and modify links, routes addresses and qdisc layer configuration. - Add ynl sub-types for binary attributes, allowing ynl-c to output known struct instead of raw binary data, clarifying the classic netlink output. - Extend MPTCP selftests to improve the code-coverage. - Add tests for XDP tail adjustment in AF_XDP. New hardware / drivers: - OpenVPN virtual driver: offload OpenVPN data channels processing to the kernel-space, increasing the data transfer throughput WRT the user-space implementation. - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC. - Broadcom asp-v3.0 ethernet driver. - AMD Renoir ethernet device. - ReakTek MT9888 2.5G ethernet PHY driver. - Aeonsemi 10G C45 PHYs driver. Drivers: - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - refactor the steering table handling to significantly reduce the amount of memory used - add support for complex matches in H/W flow steering - improve flow streeing error handling - convert to netdev instance locking - Intel (100G, ice, igb, ixgbe, idpf): - ice: add switchdev support for LLDP traffic over VF - ixgbe: add firmware manipulation and regions devlink support - igb: introduce support for frame transmission premption - igb: adds persistent NAPI configuration - idpf: introduce RDMA support - idpf: add initial PTP support - Meta (fbnic): - extend hardware stats coverage - add devlink dev flash support - Broadcom (bnxt): - add support for RX-side device memory TCP - Wangxun (txgbe): - implement support for udp tunnel offload - complete PTP and SRIOV support for AML 25G/10G devices - Ethernet NICs embedded and virtual: - Google (gve): - add device memory TCP TX support - Amazon (ena): - support persistent per-NAPI config - Airoha: - add H/W support for L2 traffic offload - add per flow stats for flow offloading - RealTek (rtl8211): add support for WoL magic packet - Synopsys (stmmac): - dwmac-socfpga 1000BaseX support - add Loongson-2K3000 support - introduce support for hardware-accelerated VLAN stripping - Broadcom (bcmgenet): - expose more H/W stats - Freescale (enetc, dpaa2-eth): - enetc: add MAC filter, VLAN filter RSS and loopback support - dpaa2-eth: convert to H/W timestamping APIs - vxlan: convert FDB table to rhashtable, for better scalabilty - veth: apply qdisc backpressure on full ring to reduce TX drops - Ethernet switches: - Microchip (kzZ88x3): add ETS scheduler support - Ethernet PHYs: - RealTek (rtl8211): - add support for WoL magic packet - add support for PHY LEDs - CAN: - Adds RZ/G3E CANFD support to the rcar_canfd driver. - Preparatory work for CAN-XL support. - Add self-tests framework with support for CAN physical interfaces. - WiFi: - mac80211: - scan improvements with multi-link operation (MLO) - Qualcomm (ath12k): - enable AHB support for IPQ5332 - add monitor interface support to QCN9274 - add multi-link operation support to WCN7850 - add 802.11d scan offload support to WCN7850 - monitor mode for WCN7850, better 6 GHz regulatory - Qualcomm (ath11k): - restore hibernation support - MediaTek (mt76): - WiFi-7 improvements - implement support for mt7990 - Intel (iwlwifi): - enhanced multi-link single-radio (EMLSR) support on 5 GHz links - rework device configuration - RealTek (rtw88): - improve throughput for RTL8814AU - RealTek (rtw89): - add multi-link operation support - STA/P2P concurrency improvements - support different SAR configs by antenna - Bluetooth: - introduce HCI Driver protocol - btintel_pcie: do not generate coredump for diagnostic events - btusb: add HCI Drv commands for configuring altsetting - btusb: add RTL8851BE device 0x0bda:0xb850 - btusb: add new VID/PID 13d3/3584 for MT7922 - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925 - btnxpuart: implement host-wakeup feature" * tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits) selftests/bpf: Fix bpf selftest build warning selftests: netfilter: Fix skip of wildcard interface test net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames net: openvswitch: Fix the dead loop of MPLS parse calipso: Don't call calipso functions for AF_INET sk. selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem net_sched: hfsc: Address reentrant enqueue adding class to eltree twice octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback octeontx2-pf: QOS: Perform cache sync on send queue teardown net: mana: Add support for Multi Vports on Bare metal net: devmem: ncdevmem: remove unused variable net: devmem: ksft: upgrade rx test to send 1K data net: devmem: ksft: add 5 tuple FS support net: devmem: ksft: add exit_wait to make rx test pass net: devmem: ksft: add ipv4 support net: devmem: preserve sockc_err page_pool: fix ugly page_pool formatting net: devmem: move list_add to net_devmem_bind_dmabuf. selftests: netfilter: nft_queue.sh: include file transfer duration in log message net: phy: mscc: Fix memory leak when using one step timestamping ...
2025-05-28Merge tag 'media/v6.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - v4l2-core fix: V4L2_BUF_TYPE_VIDEO_OVERLAY is capture, not output - New driver: Amlogic C3 ISP - New sensor drivers: ST VD55G1 and VD56G3, OmniVision OV02C10 - amlogic: c3-mipi-csi2: Handle 64-bits division - a fix for 64-bits division at the amlogic c3-mipi-csi2 driver - Changes at atomisp to support mainline mt9m114 driver and remove deprecated GPIO APIs - various cleanups, fixes and enhancements * tag 'media/v6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (314 commits) media: rkvdec: h264: Support High 10 and 4:2:2 profiles media: rkvdec: Add get_image_fmt ops media: rkvdec: Initialize the m2m context before the controls media: rkvdec: h264: Limit minimum profile to constrained baseline media: mediatek: jpeg: support 34bits media: verisilicon: Free post processor buffers on error media: platform: mtk-mdp3: Remove unused mdp_get_plat_device media: amlogic: c3-mipi-csi2: Handle 64-bits division media: uvcvideo: Use dev_err_probe for devm_gpiod_get_optional media: uvcvideo: Fix deferred probing error media: uvcvideo: Rollback non processed entities on error media: uvcvideo: Send control events for partial succeeds media: uvcvideo: Return the number of processed controls media: uvcvideo: Do not turn on the camera for some ioctls media: uvcvideo: Make power management granular media: uvcvideo: Increase/decrease the PM counter per IOCTL media: uvcvideo: Create uvc_pm_(get|put) functions media: uvcvideo: Keep streaming state in the file handle Documentation: media: Add documentation file c3-isp.rst Documentation: media: Add documentation file metafmt-c3-isp.rst ...
2025-05-27Merge tag 'cgroup-for-6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cgroup rstat shared the tracking tree across all controllers with the rationale being that a cgroup which is using one resource is likely to be using other resources at the same time (ie. if something is allocating memory, it's probably consuming CPU cycles). However, this turned out to not scale very well especially with memcg using rstat for internal operations which made memcg stat read and flush patterns substantially different from other controllers. JP Kobryn split the rstat tree per controller. - cgroup BPF support was hooking into cgroup init/exit paths directly. Convert them to use a notifier chain instead so that other usages can be added easily. The two of the patches which implement this are mislabeled as belonging to sched_ext instead of cgroup. Sorry. - Relatively minor cpuset updates - Documentation updates * tag 'cgroup-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (23 commits) sched_ext: Convert cgroup BPF support to use cgroup_lifetime_notifier sched_ext: Introduce cgroup_lifetime_notifier cgroup: Minor reorganization of cgroup_create() cgroup, docs: cpu controller's interaction with various scheduling policies cgroup, docs: convert space indentation to tab indentation cgroup: avoid per-cpu allocation of size zero rstat cpu locks cgroup, docs: be specific about bandwidth control of rt processes cgroup: document the rstat per-cpu initialization cgroup: helper for checking rstat participation of css cgroup: use subsystem-specific rstat locks to avoid contention cgroup: use separate rstat trees for each subsystem cgroup: compare css to cgroup::self in helper for distingushing css cgroup: warn on rstat usage by early init subsystems cgroup/cpuset: drop useless cpumask_empty() in compute_effective_exclusive_cpumask() cgroup/rstat: Improve cgroup_rstat_push_children() documentation cgroup: fix goto ordering in cgroup_init() cgroup: fix pointer check in css_rstat_init() cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUs cgroup/cpuset: Fix obsolete comment in cpuset_css_offline() cgroup/cpuset: Always use cpu_active_mask ...
2025-05-27ipe: add errno field to IPE policy load auditingJasjiv Singh
Users of IPE require a way to identify when and why an operation fails, allowing them to both respond to violations of policy and be notified of potentially malicious actions on their systems with respect to IPE. This patch introduces a new error field to the AUDIT_IPE_POLICY_LOAD event to log policy loading failures. Currently, IPE only logs successful policy loads, but not failures. Tracking failures is crucial to detect malicious attempts and ensure a complete audit trail for security events. The new error field will capture the following error codes: * -ENOKEY: Key used to sign the IPE policy not found in the keyring * -ESTALE: Attempting to update an IPE policy with an older version * -EKEYREJECTED: IPE signature verification failed * -ENOENT: Policy was deleted while updating * -EEXIST: Same name policy already deployed * -ERANGE: Policy version number overflow * -EINVAL: Policy version parsing error * -EPERM: Insufficient permission * -ENOMEM: Out of memory (OOM) * -EBADMSG: Policy is invalid Here are some examples of the updated audit record types: AUDIT_IPE_POLICY_LOAD(1422): audit: AUDIT1422 policy_name="Test_Policy" policy_version=0.0.1 policy_digest=sha256:84EFBA8FA71E62AE0A537FAB962F8A2BD1053964C4299DCA 92BFFF4DB82E86D3 auid=1000 ses=3 lsm=ipe res=1 errno=0 The above record shows a new policy has been successfully loaded into the kernel with the policy name, version, and hash with the errno=0. AUDIT_IPE_POLICY_LOAD(1422) with error: audit: AUDIT1422 policy_name=? policy_version=? policy_digest=? auid=1000 ses=3 lsm=ipe res=0 errno=-74 The above record shows a policy load failure due to an invalid policy (-EBADMSG). By adding this error field, we ensure that all policy load attempts, whether successful or failed, are logged, providing a comprehensive audit trail for IPE policy management. Signed-off-by: Jasjiv Singh <jasjivsingh@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@kernel.org>
2025-05-27Merge tag 'pm-6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "Once again, the changes are dominated by cpufreq updates, but this time the majority of them are cpufreq core changes, mostly related to the introduction of policy locking guards and __free() usage, and fixes related to boost handling. Still, there is also a significant update of the intel_pstate driver making it register an energy model when running on a hybrid platform which is used for enabling energy-aware scheduling (EAS) if the driver operates in the passive mode (and schedutil is used as the cpufreq governor for all CPUs which is the passive mode default). There are some amd-pstate driver updates too, for a good measure, including the "Requested CPU Min frequency" BIOS option support and new online/offline callbacks. In the cpuidle space, the most significant change is the addition of a C1 demotion on/off sysfs knob to intel_idle which should help some users to configure their systems more precisely. There is also the conversion of the PSCI cpuidle driver to a faux device one and there are two small updates of cpuidle governors. Device power management is also modified quite a bit, especially the handling of devices with asynchronous suspend and resume enabled during system transitions. They are now going to be handled more asynchronously during suspend transitions and somewhat less aggressively during resume transitions. Apart from the above, the operating performance points (OPP) library is now going to use mutex locking guards and scope-based cleanup helpers and there is the usual bunch of assorted fixes and code cleanups. Specifics: - Fix potential division-by-zero error in em_compute_costs() (Yaxiong Tian) - Fix typos in energy model documentation and example driver code (Moon Hee Lee, Atul Kumar Pant) - Rearrange the energy model management code and add a new function for adjusting a CPU energy model after adjusting the capacity of the given CPU to it (Rafael Wysocki) - Refactor cpufreq_online(), add and use cpufreq policy locking guards, use __free() in policy reference counting, and clean up core cpufreq code on top of that (Rafael Wysocki) - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh Kumar) - Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay Ugwekar) - Add offline, online and suspend callbacks to the amd-pstate driver, rename and use the existing amd_pstate_epp callbacks in it (Dhananjay Ugwekar) - Add support for the "Requested CPU Min frequency" BIOS option to the amd-pstate driver (Dhananjay Ugwekar) - Reset amd-pstate driver mode after running selftests (Swapnil Sapkal) - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan Chancellor) - Add helper for governor checks to the schedutil cpufreq governor and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki) - Populate the cpu_capacity sysfs entries from the intel_pstate driver after registering asym capacity support (Ricardo Neri) - Add support for enabling Energy-aware scheduling (EAS) to the intel_pstate driver when operating in the passive mode on a hybrid platform (Rafael Wysocki) - Drop redundant cpus_read_lock() from store_local_boost() in the cpufreq core (Seyediman Seyedarab) - Replace sscanf() with kstrtouint() in the cpufreq code and use a symbol instead of a raw number in it (Bowen Yu) - Add support for autonomous CPU performance state selection to the CPPC cpufreq driver (Lifeng Zheng) - OPP: Add dev_pm_opp_set_level() (Praveen Talari) - Introduce scope-based cleanup headers and mutex locking guards in OPP core (Viresh Kumar) - Switch OPP to use kmemdup_array() (Zhang Enpei) - Optimize bucket assignment when next_timer_ns equals KTIME_MAX in the menu cpuidle governor (Zhongqiu Han) - Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla) - Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem Bityutskiy) - Fix typos in two comments in the teo cpuidle governor (Atul Kumar Pant) - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja Kalla) - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki) - Add new devm_ functions for enabling runtime PM and runtime PM reference counting (Bence Csókás) - Remove size arguments from strscpy() calls in the hibernation core code (Thorsten Blum) - Adjust the handling of devices with asynchronous suspend enabled during system suspend and resume to start resuming them immediately after resuming their parents and to start suspending such a device immediately after suspending its first child (Rafael Wysocki) - Adjust messages printed during tasks freezing to avoid using pr_cont() (Andrew Sayers, Paul Menzel) - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan Zhang) - Add missing wakeup source attribute relax_count to sysfs and remove the space character at the end ofi the string produced by pm_show_wakelocks() (Zijun Hu) - Add configurable pm_test delay for hibernation (Zihuan Zhang) - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the cypd4226 device on Tegra boards from suspending prematurely (Jon Hunter) - Unbreak printing PM debug messages during hibernation and clean up some related code (Rafael Wysocki) - Add a systemd service to run cpupower and change cpupower binding's Makefile to use -lcpupower (John B. Wyatt IV, Francesco Poli)" * tag 'pm-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (72 commits) cpufreq: CPPC: Add support for autonomous selection cpufreq: Update sscanf() to kstrtouint() cpufreq: Replace magic number OPP: switch to use kmemdup_array() PM: freezer: Rewrite restarting tasks log to remove stray *done.* PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn() cpufreq: drop redundant cpus_read_lock() from store_local_boost() cpupower: do not install files to /etc/default/ cpupower: do not call systemctl at install time cpupower: do not write DESTDIR to cpupower.service PM: sleep: Introduce pm_sleep_transition_in_progress() cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver() cpufreq: intel_pstate: Document hybrid processor support cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache cpufreq: intel_pstate: EAS support for hybrid platforms PM: EM: Introduce em_adjust_cpu_capacity() PM: EM: Move CPU capacity check to em_adjust_new_capacity() PM: EM: Documentation: Fix typos in example driver code cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas() PM: sleep: Introduce pm_suspend_in_progress() ...
2025-05-27Merge tag 'gpio-updates-for-v6.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio updates from Bartosz Golaszewski: "We have three new drivers, some refactoring in the GPIO core, lots of various changes across many drivers, new configfs interface for the virtual gpio-aggregator module and DT-bindings updates. The treewide conversion of GPIO drivers to using the new value setter callbacks is ongoing with another round of GPIO drivers updated. You will also see these commits coming in from other subsystems as with the relevant changes merged into mainline last cycle, I've started converting GPIO providers located elsewhere than drivers/gpio/. GPIO core: - use more lock guards where applicable - refactor GPIO ACPI code and shrink it in the process by 8% - move GPIO ACPI quirks into a separate file - remove unneeded #ifdef - convert GPIO devres helpers to using devm_add_action() where applicable which shrinks and simplifies the code - refactor GPIO descriptor validation in GPIO consumer interfaces - don't allow setting values on input lines in the GPIO core which will take off the burden from GPIO drivers of checking this down the line - provide gpiod_is_equal() as a way of safely comparing two GPIO descriptors (the only current user is in regulator core) New drivers: - add the GPIO module for the max77759 multifunction device - add the GPIO driver for the VeriSilicon BLZP1600 GPIO controller - add the GPIO driver for the Spacemit K1 SoC Driver improvements: - convert more drivers to using the new GPIO line value setter callbacks - convert more drivers to making the irq_chip immutable as is recommended by the interrupt subsystem - extend build testing coverage by enabling more modules to be built with COMPILE_TEST=y - extend the gpio-aggregator module with a configfs interface that makes the setup easier for user-space than the existing driver-level sysfs attributes and also adds more advanced configuration features (such as referring to aggregated lines by their original names or modifying their names as exposed by the aggregated chip) - add a missing mutex_destroy() in gpio-imx-scu - add an OF polarity quirk for s5m8767 - allow building gpio-vf610 as a loadable module - make gpio-mxc not hardcode its GPIO base number with GPIO SYSFS interface disabled (another small step towards getting rid of the global GPIO numberspace) - add support for level-triggered interrupts to gpio-pca953x - don't double-check the ngpios property in gpio-ds4520 as GPIO core already does it - don't double-check the number of GPIOs in gpio-imx-scu as GPIO core already does it - remove unused callbacks from gpio-max3191x DT bindings: - add device-tree bindings for max77759, spacemit,k1 and blzp1600 (new drivers added this cycle) - document more properties for gpio-vf610 and gpio-tegra186 - document a new pca95xx variant - fix style of examples in several GPIO DT-binding documents Misc: - TODO list updates" * tag 'gpio-updates-for-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: (123 commits) gpio: timberdale: select GPIOLIB_IRQCHIP gpio: lpc18xx: select GPIOLIB_IRQCHIP gpio: grgpio: select GPIOLIB_IRQCHIP gpio: bcm-kona: select GPIOLIB_IRQCHIP dt-bindings: gpio: vf610: add ngpios and gpio-reserved-ranges gpio: davinci: select GPIOLIB_IRQCHIP gpiolib-acpi: Update file references in the Documentation and MAINTAINERS gpiolib: acpi: Move quirks to a separate file gpiolib: acpi: Add acpi_gpio_need_run_edge_events_on_boot() getter gpiolib: acpi: Handle deferred list via new API gpiolib: acpi: Make sure we fill struct acpi_gpio_info gpiolib: acpi: Switch to use enum in acpi_gpio_in_ignore_list() gpiolib: acpi: Use temporary variable for struct acpi_gpio_info gpiolib: remove unneeded #ifdef gpio: mpc8xxx: select GPIOLIB_IRQCHIP gpio: pxa: select GPIOLIB_IRQCHIP gpio: pxa: Make irq_chip immutable gpio: timberdale: Make irq_chip immutable gpio: xgene-sb: Make irq_chip immutable gpio: davinci: Make irq_chip immutable ...
2025-05-27Merge tag 'docs-6.16' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "A moderately busy cycle for documentation this time around: - The most significant change is the replacement of the old kernel-doc script (a monstrous collection of Perl regexes that predates the Git era) with a Python reimplementation. That, too, is a horrifying collection of regexes, but in a much cleaner and more maintainable structure that integrates far better with the Sphinx build system. This change has been in linux-next for the full 6.15 cycle; the small number of problems that turned up have been addressed, seemingly to everybody's satisfaction. The Perl kernel-doc script remains in tree (as scripts/kernel-doc.pl) and can be used with a command-line option if need be. Unless some reason to keep it around materializes, it will probably go away in 6.17. Credit goes to Mauro Carvalho Chehab for doing all this work. - Some RTLA documentation updates - A handful of Chinese translations - The usual collection of typo fixes, general updates, etc" * tag 'docs-6.16' of git://git.lwn.net/linux: (85 commits) Docs: doc-guide: update sphinx.rst Sphinx version number docs: doc-guide: clarify latest theme usage Documentation/scheduler: Fix typo in sched-stats domain field description scripts: kernel-doc: prevent a KeyError when checking output docs: kerneldoc.py: simplify exception handling logic MAINTAINERS: update linux-doc entry to cover new Python scripts docs: align with scripts/syscall.tbl migration Documentation: NTB: Fix typo Documentation: ioctl-number: Update table intro docs: conf.py: drop backward support for old Sphinx versions Docs: driver-api/basics: add kobject_event interfaces Docs: relay: editing cleanups docs: fix "incase" typo in coresight/panic.rst Fix spelling error for 'parallel' docs: admin-guide: fix typos in reporting-issues.rst docs: dmaengine: add explanation for DMA_ASYNC_TX capability Documentation: leds: improve readibility of multicolor doc docs: fix typo in firmware-related section docs: Makefile: Inherit PYTHONPYCACHEPREFIX setting as env variable Documentation: ioctl-number: Update outdated submission info ...
2025-05-26Merge tag 'x86-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: "Boot code changes: - A large series of changes to reorganize the x86 boot code into a better isolated and easier to maintain base of PIC early startup code in arch/x86/boot/startup/, by Ard Biesheuvel. Motivation & background: | Since commit | | c88d71508e36 ("x86/boot/64: Rewrite startup_64() in C") | | dated Jun 6 2017, we have been using C code on the boot path in a way | that is not supported by the toolchain, i.e., to execute non-PIC C | code from a mapping of memory that is different from the one provided | to the linker. It should have been obvious at the time that this was a | bad idea, given the need to sprinkle fixup_pointer() calls left and | right to manipulate global variables (including non-pointer variables) | without crashing. | | This C startup code has been expanding, and in particular, the SEV-SNP | startup code has been expanding over the past couple of years, and | grown many of these warts, where the C code needs to use special | annotations or helpers to access global objects. This tree includes the first phase of this work-in-progress x86 boot code reorganization. Scalability enhancements and micro-optimizations: - Improve code-patching scalability (Eric Dumazet) - Remove MFENCEs for X86_BUG_CLFLUSH_MONITOR (Andrew Cooper) CPU features enumeration updates: - Thorough reorganization and cleanup of CPUID parsing APIs (Ahmed S. Darwish) - Fix, refactor and clean up the cacheinfo code (Ahmed S. Darwish, Thomas Gleixner) - Update CPUID bitfields to x86-cpuid-db v2.3 (Ahmed S. Darwish) Memory management changes: - Allow temporary MMs when IRQs are on (Andy Lutomirski) - Opt-in to IRQs-off activate_mm() (Andy Lutomirski) - Simplify choose_new_asid() and generate better code (Borislav Petkov) - Simplify 32-bit PAE page table handling (Dave Hansen) - Always use dynamic memory layout (Kirill A. Shutemov) - Make SPARSEMEM_VMEMMAP the only memory model (Kirill A. Shutemov) - Make 5-level paging support unconditional (Kirill A. Shutemov) - Stop prefetching current->mm->mmap_lock on page faults (Mateusz Guzik) - Predict valid_user_address() returning true (Mateusz Guzik) - Consolidate initmem_init() (Mike Rapoport) FPU support and vector computing: - Enable Intel APX support (Chang S. Bae) - Reorgnize and clean up the xstate code (Chang S. Bae) - Make task_struct::thread constant size (Ingo Molnar) - Restore fpu_thread_struct_whitelist() to fix CONFIG_HARDENED_USERCOPY=y (Kees Cook) - Simplify the switch_fpu_prepare() + switch_fpu_finish() logic (Oleg Nesterov) - Always preserve non-user xfeatures/flags in __state_perm (Sean Christopherson) Microcode loader changes: - Help users notice when running old Intel microcode (Dave Hansen) - AMD: Do not return error when microcode update is not necessary (Annie Li) - AMD: Clean the cache if update did not load microcode (Boris Ostrovsky) Code patching (alternatives) changes: - Simplify, reorganize and clean up the x86 text-patching code (Ingo Molnar) - Make smp_text_poke_batch_process() subsume smp_text_poke_batch_finish() (Nikolay Borisov) - Refactor the {,un}use_temporary_mm() code (Peter Zijlstra) Debugging support: - Add early IDT and GDT loading to debug relocate_kernel() bugs (David Woodhouse) - Print the reason for the last reset on modern AMD CPUs (Yazen Ghannam) - Add AMD Zen debugging document (Mario Limonciello) - Fix opcode map (!REX2) superscript tags (Masami Hiramatsu) - Stop decoding i64 instructions in x86-64 mode at opcode (Masami Hiramatsu) CPU bugs and bug mitigations: - Remove X86_BUG_MMIO_UNKNOWN (Borislav Petkov) - Fix SRSO reporting on Zen1/2 with SMT disabled (Borislav Petkov) - Restructure and harmonize the various CPU bug mitigation methods (David Kaplan) - Fix spectre_v2 mitigation default on Intel (Pawan Gupta) MSR API: - Large MSR code and API cleanup (Xin Li) - In-kernel MSR API type cleanups and renames (Ingo Molnar) PKEYS: - Simplify PKRU update in signal frame (Chang S. Bae) NMI handling code: - Clean up, refactor and simplify the NMI handling code (Sohil Mehta) - Improve NMI duration console printouts (Sohil Mehta) Paravirt guests interface: - Restrict PARAVIRT_XXL to 64-bit only (Kirill A. Shutemov) SEV support: - Share the sev_secrets_pa value again (Tom Lendacky) x86 platform changes: - Introduce the <asm/amd/> header namespace (Ingo Molnar) - i2c: piix4, x86/platform: Move the SB800 PIIX4 FCH definitions to <asm/amd/fch.h> (Mario Limonciello) Fixes and cleanups: - x86 assembly code cleanups and fixes (Uros Bizjak) - Misc fixes and cleanups (Andi Kleen, Andy Lutomirski, Andy Shevchenko, Ard Biesheuvel, Bagas Sanjaya, Baoquan He, Borislav Petkov, Chang S. Bae, Chao Gao, Dan Williams, Dave Hansen, David Kaplan, David Woodhouse, Eric Biggers, Ingo Molnar, Josh Poimboeuf, Juergen Gross, Malaya Kumar Rout, Mario Limonciello, Nathan Chancellor, Oleg Nesterov, Pawan Gupta, Peter Zijlstra, Shivank Garg, Sohil Mehta, Thomas Gleixner, Uros Bizjak, Xin Li)" * tag 'x86-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (331 commits) x86/bugs: Fix spectre_v2 mitigation default on Intel x86/bugs: Restructure ITS mitigation x86/xen/msr: Fix uninitialized variable 'err' x86/msr: Remove a superfluous inclusion of <asm/asm.h> x86/paravirt: Restrict PARAVIRT_XXL to 64-bit only x86/mm/64: Make 5-level paging support unconditional x86/mm/64: Make SPARSEMEM_VMEMMAP the only memory model x86/mm/64: Always use dynamic memory layout x86/bugs: Fix indentation due to ITS merge x86/cpuid: Rename hypervisor_cpuid_base()/for_each_possible_hypervisor_cpuid_base() to cpuid_base_hypervisor()/for_each_possible_cpuid_base_hypervisor() x86/cpu/intel: Rename CPUID(0x2) descriptors iterator parameter x86/cacheinfo: Rename CPUID(0x2) descriptors iterator parameter x86/cpuid: Rename cpuid_get_leaf_0x2_regs() to cpuid_leaf_0x2() x86/cpuid: Rename have_cpuid_p() to cpuid_feature() x86/cpuid: Set <asm/cpuid/api.h> as the main CPUID header x86/cpuid: Move CPUID(0x2) APIs into <cpuid/api.h> x86/msr: Add rdmsrl_on_cpu() compatibility wrapper x86/mm: Fix kernel-doc descriptions of various pgtable methods x86/asm-offsets: Export certain 'struct cpuinfo_x86' fields for 64-bit asm use too x86/boot: Defer initialization of VM space related global variables ...
2025-05-26Merge tag 'sched-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core & fair scheduler changes: - Tweak wait_task_inactive() to force dequeue sched_delayed tasks (John Stultz) - Adhere to place_entity() constraints (Peter Zijlstra) - Allow decaying util_est when util_avg > CPU capacity (Pierre Gondois) - Fix up wake_up_sync() vs DELAYED_DEQUEUE (Xuewen Yan) Energy management: - Introduce sched_update_asym_prefer_cpu() (K Prateek Nayak) - cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change (K Prateek Nayak) - Align uclamp and util_est and call before freq update (Xuewen Yan) CPU isolation: - Make use of more than one housekeeping CPU (Phil Auld) RT scheduler: - Fix race in push_rt_task() (Harshit Agarwal) - Add kernel cmdline option for rt_group_sched (Michal Koutný) Scheduler topology support: - Improve topology_span_sane speed (Steve Wahl) Scheduler debugging: - Move and extend the sched_process_exit() tracepoint (Andrii Nakryiko) - Add RT_GROUP WARN checks for non-root task_groups (Michal Koutný) - Fix trace_sched_switch(.prev_state) (Peter Zijlstra) - Untangle cond_resched() and live-patching (Peter Zijlstra) Fixes and cleanups: - Misc fixes and cleanups (K Prateek Nayak, Michal Koutný, Peter Zijlstra, Xuewen Yan)" * tag 'sched-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits) sched/uclamp: Align uclamp and util_est and call before freq update sched/util_est: Simplify condition for util_est_{en,de}queue() sched/fair: Fixup wake_up_sync() vs DELAYED_DEQUEUE sched,livepatch: Untangle cond_resched() and live-patching sched/core: Tweak wait_task_inactive() to force dequeue sched_delayed tasks sched/fair: Adhere to place_entity() constraints sched/debug: Print the local group's asym_prefer_cpu cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change sched/topology: Introduce sched_update_asym_prefer_cpu() sched/fair: Use READ_ONCE() to read sg->asym_prefer_cpu sched/isolation: Make use of more than one housekeeping cpu sched/rt: Fix race in push_rt_task sched: Add annotations to RT_GROUP_SCHED fields sched: Add RT_GROUP WARN checks for non-root task_groups sched: Do not construct nor expose RT_GROUP_SCHED structures if disabled sched: Bypass bandwitdh checks with runtime disabled RT_GROUP_SCHED sched: Skip non-root task_groups with disabled RT_GROUP_SCHED sched: Add commadline option for RT_GROUP_SCHED toggling sched: Always initialize rt_rq's task_group sched: Remove unneeed macro wrap ...
2025-05-26Merge tag 'next.2025.05.17a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Joel Fernandes: - Removed swake_up_one_online() workaround - Reverted an incorrect rcuog wake-up fix from offline softirq - Rust RCU Guard methods marked as inline - Updated MAINTAINERS with Joel’s and Zqiang's new email address - Replaced magic constant in rcu_seq_done_exact() with named constant - Added warning mechanism to validate rcu_seq_done_exact() - Switched SRCU polling API to use rcu_seq_done_exact() - Commented on redundant delta check in rcu_seq_done_exact() - Made ->gpwrap tests in rcutorture more frequent - Fixed reuse of ARM64 images in rcutorture - rcutorture improved to check Kconfig and reader conflict handling - Extracted logic from rcu_torture_one_read() for clarity - Updated LWN RCU API documentation links - Enabled --do-rt in torture.sh for CONFIG_PREEMPT_RT - Added tests for SRCU up/down reader primitives - Added comments and delays checks in rcutorture - Deprecated srcu_read_lock_lite() and srcu_read_unlock_lite() via checkpatch - Added --do-normal and --do-no-normal to torture.sh - Added RCU Rust binding tests to torture.sh - Reduced CPU overcommit and removed MAXSMP/CPUMASK_OFFSTACK in TREE01 - Replaced kmalloc() with kcalloc() in rcuscale - Refined listRCU example code for stale data elimination - Fixed hardirq count bug for x86 in cpu_stall_cputime - Added safety checks in rcu/nocb for offloaded rdp access - Other miscellaneous changes * tag 'next.2025.05.17a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (27 commits) rcutorture: Fix issue with re-using old images on ARM64 rcutorture: Remove MAXSMP and CPUMASK_OFFSTACK from TREE01 rcutorture: Reduce TREE01 CPU overcommit torture: Check for "Call trace:" as well as "Call Trace:" rcutorture: Perform more frequent testing of ->gpwrap torture: Add testing of RCU's Rust bindings to torture.sh torture: Add --do-{,no-}normal to torture.sh checkpatch: Deprecate srcu_read_lock_lite() and srcu_read_unlock_lite() rcutorture: Comment invocations of tick_dep_set_task() rcu/nocb: Add Safe checks for access offloaded rdp rcuscale: using kcalloc() to relpace kmalloc() doc/RCU/listRCU: refine example code for eliminating stale data doc: Update LWN RCU API links in whatisRCU.rst Revert "rcu/nocb: Fix rcuog wake-up from offline softirq" rust: sync: rcu: Mark Guard methods as inline rcu/cpu_stall_cputime: fix the hardirq count for x86 architecture rcu: Remove swake_up_one_online() bandaid MAINTAINERS: Update Zqiang's email address rcutorture: Make torture.sh --do-rt use CONFIG_PREEMPT_RT srcu: Use rcu_seq_done_exact() for polling API ...
2025-05-26Merge tag 'xfs-merge-6.16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Carlos Maiolino: - Atomic writes for XFS - Remove experimental warnings for pNFS, scrub and parent pointers * tag 'xfs-merge-6.16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (26 commits) xfs: add inode to zone caching for data placement xfs: free the item in xfs_mru_cache_insert on failure xfs: remove the EXPERIMENTAL warning for pNFS xfs: remove some EXPERIMENTAL warnings xfs: Remove deprecated xfs_bufd sysctl parameters xfs: stop using set_blocksize xfs: allow sysadmins to specify a maximum atomic write limit at mount time xfs: update atomic write limits xfs: add xfs_calc_atomic_write_unit_max() xfs: add xfs_file_dio_write_atomic() xfs: commit CoW-based atomic writes atomically xfs: add large atomic writes checks in xfs_direct_write_iomap_begin() xfs: add xfs_atomic_write_cow_iomap_begin() xfs: refine atomic write size check in xfs_file_write_iter() xfs: refactor xfs_reflink_end_cow_extent() xfs: allow block allocator to take an alignment hint xfs: ignore HW which cannot atomic write a single block xfs: add helpers to compute transaction reservation for finishing intent items xfs: add helpers to compute log item overhead xfs: separate out setting buftarg atomic writes limits ...
2025-05-26Merge branches 'pm-runtime' and 'pm-sleep'Rafael J. Wysocki
Merge updates related to system sleep handling and runtime PM for 6.16-rc1: - Fix denying of auto suspend in pm_suspend_timer_fn() (Charan Teja Kalla). - Move debug runtime PM attributes to runtime_attrs[] (Rafael Wysocki). - Add new devm_ functions for enabling runtime PM and runtime PM reference counting (Bence Csókás). - Remove size arguments from strscpy() calls in the hibernation core code (Thorsten Blum). - Adjust the handling of devices with asynchronous suspend enabled during system suspend and resume to start resuming them immediately after resuming their parents and to start suspending such a device immediately after suspending its first child (Rafael Wysocki). - Adjust messages printed during tasks freezing to avoid using pr_cont() (Andrew Sayers, Paul Menzel). - Clean up unnecessary usage of !! in pm_print_times_init() (Zihuan Zhang). - Add missing wakeup source attribute relax_count to sysfs and remove the space character at the end ofi the string produced by pm_show_wakelocks() (Zijun Hu). - Add configurable pm_test delay for hibernation (Zihuan Zhang). - Disable asynchronous suspend in ucsi_ccg_probe() to prevent the cypd4226 device on Tegra boards from suspending prematurely (Jon Hunter). - Unbreak printing PM debug messages during hibernation and clean up some related code (Rafael Wysocki). * pm-runtime: PM: runtime: fix denying of auto suspend in pm_suspend_timer_fn() PM: sysfs: Move debug runtime PM attributes to runtime_attrs[] PM: runtime: Add new devm functions * pm-sleep: PM: freezer: Rewrite restarting tasks log to remove stray *done.* PM: sleep: Introduce pm_sleep_transition_in_progress() PM: sleep: Introduce pm_suspend_in_progress() PM: sleep: Print PM debug messages during hibernation ucsi_ccg: Disable async suspend in ucsi_ccg_probe() PM: hibernate: add configurable delay for pm_test PM: wakeup: Delete space in the end of string shown by pm_show_wakelocks() PM: wakeup: Add missing wakeup source attribute relax_count PM: sleep: Remove unnecessary !! PM: sleep: Use two lines for "Restarting..." / "done" messages PM: sleep: Make suspend of devices more asynchronous PM: sleep: Suspend async parents after suspending children PM: sleep: Resume children after resuming the parent PM: hibernate: Remove size arguments when calling strscpy()
2025-05-26Merge branch 'pm-cpuidle'Rafael J. Wysocki
Merge cpuidle updates for 6.16-rc1: - Optimize bucket assignment when next_timer_ns equals KTIME_MAX in the menu cpuidle governor (Zhongqiu Han). - Convert the cpuidle PSCI driver to a faux device one (Sudeep Holla). - Add C1 demotion on/off sysfs knob to the intel_idle driver (Artem Bityutskiy). - Fix typos in two comments in the teo cpuidle governor (Atul Kumar Pant). * pm-cpuidle: cpuidle: psci: Avoid initializing faux device if no DT idle states are present Documentation: ABI: testing: document the new cpuidle sysfs file Documentation: admin-guide: pm: Document intel_idle C1 demotion intel_idle: Add C1 demotion on/off sysfs knob cpuidle: psci: Transition to the faux device interface cpuidle: menu: Optimize bucket assignment when next_timer_ns equals KTIME_MAX cpuidle: teo: Fix typos in two comments
2025-05-26Merge tag 'for-6.16/block-20250523' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - ublk updates: - Add support for updating the size of a ublk instance - Zero-copy improvements - Auto-registering of buffers for zero-copy - Series simplifying and improving GET_DATA and request lookup - Series adding quiesce support - Lots of selftests additions - Various cleanups - NVMe updates via Christoph: - add per-node DMA pools and use them for PRP/SGL allocations (Caleb Sander Mateos, Keith Busch) - nvme-fcloop refcounting fixes (Daniel Wagner) - support delayed removal of the multipath node and optionally support the multipath node for private namespaces (Nilay Shroff) - support shared CQs in the PCI endpoint target code (Wilfred Mallawa) - support admin-queue only authentication (Hannes Reinecke) - use the crc32c library instead of the crypto API (Eric Biggers) - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes Reinecke, Leon Romanovsky, Gustavo A. R. Silva) - MD updates via Yu: - Fix that normal IO can be starved by sync IO, found by mkfs on newly created large raid5, with some clean up patches for bdev inflight counters - Clean up brd, getting rid of atomic kmaps and bvec poking - Add loop driver specifically for zoned IO testing - Eliminate blk-rq-qos calls with a static key, if not enabled - Improve hctx locking for when a plug has IO for multiple queues pending - Remove block layer bouncing support, which in turn means we can remove the per-node bounce stat as well - Improve blk-throttle support - Improve delay support for blk-throttle - Improve brd discard support - Unify IO scheduler switching. This should also fix a bunch of lockdep warnings we've been seeing, after enabling lockdep support for queue freezing/unfreezeing - Add support for block write streams via FDP (flexible data placement) on NVMe - Add a bunch of block helpers, facilitating the removal of a bunch of duplicated boilerplate code - Remove obsolete BLK_MQ pci and virtio Kconfig options - Add atomic/untorn write support to blktrace - Various little cleanups and fixes * tag 'for-6.16/block-20250523' of git://git.kernel.dk/linux: (186 commits) selftests: ublk: add test for UBLK_F_QUIESCE ublk: add feature UBLK_F_QUIESCE selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE traceevent/block: Add REQ_ATOMIC flag to block trace events ublk: run auto buf unregisgering in same io_ring_ctx with registering io_uring: add helper io_uring_cmd_ctx_handle() ublk: remove io argument from ublk_auto_buf_reg_fallback() ublk: handle ublk_set_auto_buf_reg() failure correctly in ublk_fetch() selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK selftests: ublk: support UBLK_F_AUTO_BUF_REG ublk: support UBLK_AUTO_BUF_REG_FALLBACK ublk: register buffer to local io_uring with provided buf index via UBLK_F_AUTO_BUF_REG ublk: prepare for supporting to register request buffer automatically ublk: convert to refcount_t selftests: ublk: make IO & device removal test more stressful nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk nvme: introduce multipath_always_on module param nvme-multipath: introduce delayed removal of the multipath head node nvme-pci: derive and better document max segments limits nvme-pci: use struct_size for allocation struct nvme_dev ...
2025-05-26Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge cpufreq updates for 6.16-rc1: - Refactor cpufreq_online(), add and use cpufreq policy locking guards, use __free() in policy reference counting, and clean up core cpufreq code on top of that (Rafael Wysocki). - Fix boost handling on CPU suspend/resume and sysfs updates (Viresh Kumar). - Fix des_perf clamping with max_perf in amd_pstate_update() (Dhananjay Ugwekar). - Add offline, online and suspend callbacks to the amd-pstate driver, rename and use the existing amd_pstate_epp callbacks in it (Dhananjay Ugwekar). - Add support for the "Requested CPU Min frequency" BIOS option to the amd-pstate driver (Dhananjay Ugwekar). - Reset amd-pstate driver mode after running selftests (Swapnil Sapkal). - Add helper for governor checks to the schedutil cpufreq governor and move cpufreq-specific EAS checks to cpufreq (Rafael Wysocki). - Populate the cpu_capacity sysfs entries from the intel_pstate driver after registering asym capacity support (Ricardo Neri). - Add support for enabling Energy-aware scheduling (EAS) to the intel_pstate driver when operating in the passive mode on a hybrid platform (Rafael Wysocki). - Avoid shadowing ret in amd_pstate_ut_check_driver() (Nathan Chancellor). - Drop redundant cpus_read_lock() from store_local_boost() in the cpufreq core (Seyediman Seyedarab). - Replace sscanf() with kstrtouint() in the cpufreq code and use a symbol instead of a raw number in it (Bowen Yu). - Add support for autonomous CPU performance state selection to the CPPC cpufreq driver (Lifeng Zheng). * pm-cpufreq: (31 commits) cpufreq: CPPC: Add support for autonomous selection cpufreq: Update sscanf() to kstrtouint() cpufreq: Replace magic number cpufreq: drop redundant cpus_read_lock() from store_local_boost() cpufreq/amd-pstate: Avoid shadowing ret in amd_pstate_ut_check_driver() cpufreq: intel_pstate: Document hybrid processor support cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache cpufreq: intel_pstate: EAS support for hybrid platforms cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas() cpufreq: intel_pstate: Populate the cpu_capacity sysfs entries arch_topology: Relocate cpu_scale to topology.[h|c] cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq cpufreq/sched: schedutil: Add helper for governor checks amd-pstate-ut: Reset amd-pstate driver mode after running selftests cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS option cpufreq/amd-pstate: Add offline, online and suspend callbacks for amd_pstate_driver cpufreq: Force sync policy boost with global boost on sysfs update cpufreq: Preserve policy's boost state after resume cpufreq: Introduce policy_set_boost() cpufreq: Don't unnecessarily call set_boost() ...
2025-05-26Merge tag 'vfs-6.16-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual selections of misc updates for this cycle. Features: - Use folios for symlinks in the page cache FUSE already uses folios for its symlinks. Mirror that conversion in the generic code and the NFS code. That lets us get rid of a few folio->page->folio conversions in this path, and some of the few remaining users of read_cache_page() / read_mapping_page() - Try and make a few filesystem operations killable on the VFS inode->i_mutex level - Add sysctl vfs_cache_pressure_denom for bulk file operations Some workloads need to preserve more dentries than we currently allow through out sysctl interface A HDFS servers with 12 HDDs per server, on a HDFS datanode startup involves scanning all files and caching their metadata (including dentries and inodes) in memory. Each HDD contains approximately 2 million files, resulting in a total of ~20 million cached dentries after initialization To minimize dentry reclamation, they set vfs_cache_pressure to 1. Despite this configuration, memory pressure conditions can still trigger reclamation of up to 50% of cached dentries, reducing the cache from 20 million to approximately 10 million entries. During the subsequent cache rebuild period, any HDFS datanode restart operation incurs substantial latency penalties until full cache recovery completes To maintain service stability, more dentries need to be preserved during memory reclamation. The current minimum reclaim ratio (1/100 of total dentries) remains too aggressive for such workload. This patch introduces vfs_cache_pressure_denom for more granular cache pressure control The configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000] effectively maintains the full 20 million dentry cache under memory pressure, preventing datanode restart performance degradation - Avoid some jumps in inode_permission() using likely()/unlikely() - Avid a memory access which is most likely a cache miss when descending into devcgroup_inode_permission() - Add fastpath predicts for stat() and fdput() - Anonymous inodes currently don't come with a proper mode causing issues in the kernel when we want to add useful VFS debug assert. Fix that by giving them a proper mode and masking it off when we report it to userspace which relies on them not having any mode - Anonymous inodes currently allow to change inode attributes because the VFS falls back to simple_setattr() if i_op->setattr isn't implemented. This means the ownership and mode for every single user of anon_inode_inode can be changed. Block that as it's either useless or actively harmful. If specific ownership is needed the respective subsystem should allocate anonymous inodes from their own private superblock - Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock - Add proper tests for anonymous inode behavior - Make it easy to detect proper anonymous inodes and to ensure that we can detect them in codepaths such as readahead() Cleanups: - Port pidfs to the new anon_inode_{g,s}etattr() helpers - Try to remove the uselib() system call - Add unlikely branch hint return path for poll - Add unlikely branch hint on return path for core_sys_select - Don't allow signals to interrupt getdents copying for fuse - Provide a size hint to dir_context for during readdir() - Use writeback_iter directly in mpage_writepages - Update compression and mtime descriptions in initramfs documentation - Update main netfs API document - Remove useless plus one in super_cache_scan() - Remove unnecessary NULL-check guards during setns() - Add separate separate {get,put}_cgroup_ns no-op cases Fixes: - Fix typo in root= kernel parameter description - Use KERN_INFO for infof()|info_plog()|infofc() - Correct comments of fs_validate_description() - Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep() - Delete macro fsparam_u32hex() - Remove unused and problematic validate_constant_table() - Fix potential unsigned integer underflow in fs_name() - Make file-nr output the total allocated file handles" * tag 'vfs-6.16-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (43 commits) fs: Pass a folio to page_put_link() nfs: Use a folio in nfs_get_link() fs: Convert __page_get_link() to use a folio fs/read_write: make default_llseek() killable fs/open: make do_truncate() killable fs/open: make chmod_common() and chown_common() killable include/linux/fs.h: add inode_lock_killable() readdir: supply dir_context.count as readdir buffer size hint vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations fuse: don't allow signals to interrupt getdents copying Documentation: fix typo in root= kernel parameter description include/cgroup: separate {get,put}_cgroup_ns no-op case kernel/nsproxy: remove unnecessary guards fs: use writeback_iter directly in mpage_writepages fs: remove useless plus one in super_cache_scan() fs: add S_ANON_INODE fs: remove uselib() system call device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission() fs/fs_parse: Remove unused and problematic validate_constant_table() fs: touch up predicts in inode_permission() ...
2025-05-26Merge tag 'vfs-6.16-rc1.writepage' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull final writepage conversion from Christian Brauner: "This converts vboxfs from ->writepage() to ->writepages(). This was the last user of the ->writepage() method. So remove ->writepage() completely and all references to it" * tag 'vfs-6.16-rc1.writepage' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Remove aops->writepage mm: Remove swap_writepage() and shmem_writepage() ttm: Call shmem_writeout() from ttm_backup_backup_page() i915: Use writeback_iter() shmem: Add shmem_writeout() writeback: Remove writeback_use_writepage() migrate: Remove call to ->writepage vboxsf: Convert to writepages 9p: Add a migrate_folio method
2025-05-22Docs/damon: update titles and brief introductions to explain DAMOSSeongJae Park
DAMON was initially developed only for data access monitoring, and then extended for not only access monitoring but also access-aware system operations (DAMOS). But the documents have old titles and brief introductions for only the monitoring part. Update the titles and the brief introductions to explain DAMOS part together. Link: https://lkml.kernel.org/r/20250513002715.40126-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: David Gow <davidgow@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-22cgroup, docs: cpu controller's interaction with various scheduling policiesShashank Balaji
The cpu controller interface files account for or affect processes differently based on their scheduling policy, and the underlying scheduler used (fair-class vs. BPF scheduler). Document these differences Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-22cgroup, docs: convert space indentation to tab indentationShashank Balaji
The paragraphs on cpu.uclamp.{min,max} are space indented. Convert them to tab indentation to make them uniform with the other paragraphs. Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-21x86/crash: pass dm crypt keys to kdump kernelCoiby Xu
1st kernel will build up the kernel command parameter dmcryptkeys as similar to elfcorehdr to pass the memory address of the stored info of dm crypt key to kdump kernel. Link: https://lkml.kernel.org/r/20250502011246.99238-8-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21crash_dump: reuse saved dm crypt keys for CPU/memory hot-pluggingCoiby Xu
When there are CPU and memory hot un/plugs, the dm crypt keys may need to be reloaded again depending on the solution for crash hotplug support. Currently, there are two solutions. One is to utilizes udev to instruct user space to reload the kdump kernel image and initrd, elfcorehdr and etc again. The other is to only update the elfcorehdr segment introduced in commit 247262756121 ("crash: add generic infrastructure for crash hotplug support"). For the 1st solution, the dm crypt keys need to be reloaded again. The user space can write true to /sys/kernel/config/crash_dm_crypt_key/reuse so the stored keys can be re-used. For the 2nd solution, the dm crypt keys don't need to be reloaded. Currently, only x86 supports the 2nd solution. If the 2nd solution gets extended to all arches, this patch can be dropped. Link: https://lkml.kernel.org/r/20250502011246.99238-5-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21crash_dump: make dm crypt keys persist for the kdump kernelCoiby Xu
A configfs /sys/kernel/config/crash_dm_crypt_keys is provided for user space to make the dm crypt keys persist for the kdump kernel. Take the case of dumping to a LUKS-encrypted target as an example, here is the life cycle of the kdump copies of LUKS volume keys, 1. After the 1st kernel loads the initramfs during boot, systemd uses an user-input passphrase to de-crypt the LUKS volume keys or simply TPM-sealed volume keys and then save the volume keys to specified keyring (using the --link-vk-to-keyring API) and the keys will expire within specified time. 2. A user space tool (kdump initramfs loader like kdump-utils) create key items inside /sys/kernel/config/crash_dm_crypt_keys to inform the 1st kernel which keys are needed. 3. When the kdump initramfs is loaded by the kexec_file_load syscall, the 1st kernel will iterate created key items, save the keys to kdump reserved memory. 4. When the 1st kernel crashes and the kdump initramfs is booted, the kdump initramfs asks the kdump kernel to create a user key using the key stored in kdump reserved memory by writing yes to /sys/kernel/crash_dm_crypt_keys/restore. Then the LUKS encrypted device is unlocked with libcryptsetup's --volume-key-keyring API. 5. The system gets rebooted to the 1st kernel after dumping vmcore to the LUKS encrypted device is finished Eventually the keys have to stay in the kdump reserved memory for the kdump kernel to unlock encrypted volumes. During this process, some measures like letting the keys expire within specified time are desirable to reduce security risk. This patch assumes, 1) there are 128 LUKS devices at maximum to be unlocked thus MAX_KEY_NUM=128. 2) a key description won't exceed 128 bytes thus KEY_DESC_MAX_LEN=128. And here is a demo on how to interact with /sys/kernel/config/crash_dm_crypt_keys, # Add key #1 mkdir /sys/kernel/config/crash_dm_crypt_keys/7d26b7b4-e342-4d2d-b660-7426b0996720 # Add key #1's description echo cryptsetup:7d26b7b4-e342-4d2d-b660-7426b0996720 > /sys/kernel/config/crash_dm_crypt_keys/description # how many keys do we have now? cat /sys/kernel/config/crash_dm_crypt_keys/count 1 # Add key# 2 in the same way # how many keys do we have now? cat /sys/kernel/config/crash_dm_crypt_keys/count 2 # the tree structure of /crash_dm_crypt_keys configfs tree /sys/kernel/config/crash_dm_crypt_keys/ /sys/kernel/config/crash_dm_crypt_keys/ ├── 7d26b7b4-e342-4d2d-b660-7426b0996720 │   └── description ├── count ├── fce2cd38-4d59-4317-8ce2-1fd24d52c46a │   └── description Link: https://lkml.kernel.org/r/20250502011246.99238-3-coxu@redhat.com Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: Jan Pazdziora <jpazdziora@redhat.com> Cc: Liu Pingfan <kernelfans@gmail.com> Cc: Milan Broz <gmazyland@gmail.com> Cc: Ondrej Kozina <okozina@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-21cgroup, docs: be specific about bandwidth control of rt processesShashank Balaji
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19docs: admin-guide: fix typos in reporting-issues.rstShivam Sharma
Correct pin-point to pinpoint, If that the case to If that is the case, and its only slightly modified to it's only slightly modified in Documentation/admin-guide/reporting-issues.rst for proper spelling and grammar. Signed-off-by: Shivam Sharma <10sharmashivam@gmail.com> Acked-by: Thorsten Leemhuis <linux@leemhuis.info> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20250518172658.6983-1-10sharmashivam@gmail.com>
2025-05-16rcutorture: Perform more frequent testing of ->gpwrapJoel Fernandes
Currently, the ->gpwrap is not tested (at all per my testing) due to the requirement of a large delta between a CPU's rdp->gp_seq and its node's rnp->gpseq. This results in no testing of ->gpwrap being set. This patch by default adds 5 minutes of testing with ->gpwrap forced by lowering the delta between rdp->gp_seq and rnp->gp_seq to just 8 GPs. All of this is configurable, including the active time for the setting and a full testing cycle. By default, the first 25 minutes of a test will have the _default_ behavior there is right now (ULONG_MAX / 4) delta. Then for 5 minutes, we switch to a smaller delta causing 1-2 wraps in 5 minutes. I believe this is reasonable since we at least add a little bit of testing for usecases where ->gpwrap is set. [ Apply fix for Dan Carpenter's bug report on init path cleanup. ] [ Apply kernel doc warning fix from Akira Yokosawa. ] Tested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2025-05-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.15-rc7). Conflicts: tools/testing/selftests/drivers/net/hw/ncdevmem.c 97c4e094a4b2 ("tests/ncdevmem: Fix double-free of queue array") 2f1a805f32ba ("selftests: ncdevmem: Implement devmem TCP TX") https://lore.kernel.org/20250514122900.1e77d62d@canb.auug.org.au Adjacent changes: net/core/devmem.c net/core/devmem.h 0afc44d8cdf6 ("net: devmem: fix kernel panic when netlink socket close after module unload") bd61848900bf ("net: devmem: Implement TX path") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-15Documentation: admin-guide: pm: Add documentation for die_idSrinivas Pandruvada
Add documentation to describe die_id attribute. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250508230250.1186619-6-srinivas.pandruvada@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-15Documentation: admin-guide: pm: Add documentation for agent_typesSrinivas Pandruvada
Add documentation to describe agent_types attribute. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250508230250.1186619-3-srinivas.pandruvada@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-05-15vfs: Add sysctl vfs_cache_pressure_denom for bulk file operationsYafang Shao
On our HDFS servers with 12 HDDs per server, a HDFS datanode[0] startup involves scanning all files and caching their metadata (including dentries and inodes) in memory. Each HDD contains approximately 2 million files, resulting in a total of ~20 million cached dentries after initialization. To minimize dentry reclamation, we set vfs_cache_pressure to 1. Despite this configuration, memory pressure conditions can still trigger reclamation of up to 50% of cached dentries, reducing the cache from 20 million to approximately 10 million entries. During the subsequent cache rebuild period, any HDFS datanode restart operation incurs substantial latency penalties until full cache recovery completes. To maintain service stability, we need to preserve more dentries during memory reclamation. The current minimum reclaim ratio (1/100 of total dentries) remains too aggressive for our workload. This patch introduces vfs_cache_pressure_denom for more granular cache pressure control. The configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000] effectively maintains the full 20 million dentry cache under memory pressure, preventing datanode restart performance degradation. Link: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes [0] Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/20250511083624.9305-1-laoar.shao@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-13cpufreq: intel_pstate: Document hybrid processor supportRafael J. Wysocki
Describe the support for hybrid processors in intel_pstate, including the CAS and EAS support, in the admin-guide documentation. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/1935040.CQOukoFCf9@rjwysocki.net
2025-05-13Merge commit 'its-for-linus-20250509-merge' into x86/core, to resolve conflictsIngo Molnar
Conflicts: Documentation/admin-guide/hw-vuln/index.rst arch/x86/include/asm/cpufeatures.h arch/x86/kernel/alternative.c arch/x86/kernel/cpu/bugs.c arch/x86/kernel/cpu/common.c drivers/base/cpu.c include/linux/cpu.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13Merge branch 'x86/microcode' into x86/core, to merge dependent commitsIngo Molnar
Prepare to resolve conflicts with an upstream series of fixes that conflict with pending x86 changes: 6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13Documentation: fix typo in root= kernel parameter descriptionPetr Vaněk
Fixes a typo in the root= parameter description, changing "this a a" to "this is a". Fixes: c0c1a7dcb6f5 ("init: move the nfs/cifs/ram special cases out of name_to_dev_t") Signed-off-by: Petr Vaněk <arkamar@atlas.cz> Link: https://lore.kernel.org/20250512110827.32530-1-arkamar@atlas.cz Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-12Documentation: add documentation for KHOAlexander Graf
With KHO in place, let's add documentation that describes what it is and how to use it. Link: https://lkml.kernel.org/r/20250509074635.3187114-17-changyuanl@google.com Signed-off-by: Alexander Graf <graf@amazon.com> Co-developed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Co-developed-by: Changyuan Lyu <changyuanl@google.com> Signed-off-by: Changyuan Lyu <changyuanl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anthony Yznaga <anthony.yznaga@oracle.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ashish Kalra <ashish.kalra@amd.com> Cc: Ben Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Gowans <jgowans@amazon.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Krzysztof Kozlowski <krzk@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pratyush Yadav <ptyadav@amazon.de> Cc: Rob Herring <robh@kernel.org> Cc: Saravana Kannan <saravanak@google.com> Cc: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Thomas Lendacky <thomas.lendacky@amd.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm: add max swappiness arg to lru_gen for anonymous memory onlyZhongkun He
The MGLRU already supports reclaiming only from anonymous memory via the /sys/kernel/debug/lru_gen interface. Now, memory.reclaim also supports the swappiness=max parameter to enable reclaiming solely from anonymous memory. To unify the semantics of proactive reclaiming from anonymous folios, the max parameter is introduced. [hezhongkun.hzk@bytedance.com: use strcmp instead of strncmp, if swappiness is not set, use the default value] Link: https://lkml.kernel.org/r/20250507071057.3184240-1-hezhongkun.hzk@bytedance.com [akpm@linux-foundation.org: tweak coding style] Link: https://lkml.kernel.org/r/65181f7745d657d664d833c26d8a94cae40538b9.1745225696.git.hezhongkun.hzk@bytedance.com Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12mm: add swappiness=max arg to memory.reclaim for only anon reclaimZhongkun He
Patch series "add max arg to swappiness in memory.reclaim and lru_gen", v4. This patchset adds max arg to swappiness in memory.reclaim and lru_gen for anon only proactive memory reclaim. With commit <68cd9050d871> ("mm: add swappiness= arg to memory.reclaim") we can submit an additional swappiness=<val> argument to memory.reclaim. It is very useful because we can dynamically adjust the reclamation ratio based on the anonymous folios and file folios of each cgroup. For example,when swappiness is set to 0, we only reclaim from file folios. But we can not relciam memory just from anon folios. This patchset introduces a new macro, SWAPPINESS_ANON_ONLY, defined as MAX_SWAPPINESS + 1, represent the max arg semantics. It specifically indicates that reclamation should occur only from anonymous pages. Patch 1 adds swappiness=max arg to memory.reclaim suggested-by: Yosry Ahmed Patch 2 add more comments for cache_trim_mode from Johannes Weiner in [1]. Patch 3 add max arg to lru_gen for proactive memory reclaim in MGLRU. The MGLRU already supports reclaiming exclusively from anonymous pages. This patch formalizes that behavior by introducing a max parameter to represent the corresponding semantics. Patch 4 using SWAPPINESS_ANON_ONLY in MGLRU Using SWAPPINESS_ANON_ONLY instead of MAX_SWAPPINESS + 1 to indicate reclaiming only from anonymous pages makes the code more readable and explicit Here is the previous discussion: https://lore.kernel.org/all/20250314033350.1156370-1-hezhongkun.hzk@bytedance.com/ https://lore.kernel.org/all/20250312094337.2296278-1-hezhongkun.hzk@bytedance.com/ https://lore.kernel.org/all/20250318135330.3358345-1-hezhongkun.hzk@bytedance.com/ This patch (of 4): With commit <68cd9050d871> ("mm: add swappiness= arg to memory.reclaim") we can submit an additional swappiness=<val> argument to memory.reclaim. It is very useful because we can dynamically adjust the reclamation ratio based on the anonymous folios and file folios of each cgroup. For example,when swappiness is set to 0, we only reclaim from file folios. However,we have also encountered a new issue: when swappiness is set to the MAX_SWAPPINESS, it may still only reclaim file folios. So, we hope to add a new arg 'swappiness=max' in memory.reclaim where proactive memory reclaim only reclaims from anonymous folios when swappiness is set to max. The swappiness semantics from a user perspective remain unchanged. For example, something like this: echo "2M swappiness=max" > /sys/fs/cgroup/memory.reclaim will perform reclaim on the rootcg with a swappiness setting of 'max' (a new mode) regardless of the file folios. Users have a more comprehensive view of the application's memory distribution because there are many metrics available. For example, if we find that a certain cgroup has a large number of inactive anon folios, we can reclaim only those and skip file folios, because with the zram/zswap, the IO tradeoff that cache_trim_mode or other file first logic is making doesn't hold - file refaults will cause IO, whereas anon decompression will not. With this patch, the swappiness argument of memory.reclaim has a new mode 'max', means reclaiming just from anonymous folios both in traditional LRU and MGLRU. Link: https://lkml.kernel.org/r/cover.1745225696.git.hezhongkun.hzk@bytedance.com Link: https://lore.kernel.org/all/20250314141833.GA1316033@cmpxchg.org/ [1] Link: https://lkml.kernel.org/r/519e12b9b1f8c31a01e228c8b4b91a2419684f77.1745225696.git.hezhongkun.hzk@bytedance.com Signed-off-by: Zhongkun He <hezhongkun.hzk@bytedance.com> Suggested-by: Yosry Ahmed <yosry.ahmed@linux.dev> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12memcg-introduce-non-blocking-limit-setting-option-v3Shakeel Butt
add more explanation in doc and commit message on O_NONBLOCK side-effects (Johannes) Link: https://lkml.kernel.org/r/20250506232833.3109790-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Greg Thelen <gthelen@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12memcg: introduce non-blocking limit setting optionShakeel Butt
Setting the max and high limits can trigger synchronous reclaim and/or oom-kill if the usage is higher than the given limit. This behavior is fine for newly created cgroups but it can cause issues for the node controller while setting limits for existing cgroups. In our production multi-tenant and overcommitted environment, we are seeing priority inversion when the node controller dynamically adjusts the limits of running jobs of different priorities. Based on the system situation, the node controller may reduce the limits of lower priority jobs and increase the limits of higher priority jobs. However we are seeing node controller getting stuck for long period of time while reclaiming from lower priority jobs while setting their limits and also spends a lot of its own CPU. One of the workaround we are trying is to fork a new process which sets the limit of the lower priority job along with setting an alarm to get itself killed if it get stuck in the reclaim for lower priority job. However we are finding it very unreliable and costly. Either we need a good enough time buffer for the alarm to be delivered after setting limit and potentialy spend a lot of CPU in the reclaim or be unreliable in setting the limit for much shorter but cheaper (less reclaim) alarms. Let's introduce new limit setting option which does not trigger reclaim and/or oom-kill and let the processes in the target cgroup to trigger reclaim and/or throttling and/or oom-kill in their next charge request. This will make the node controller on multi-tenant overcommitted environment much more reliable. Explanation from Johannes on side-effects of O_NONBLOCK limit change: It's usually the allocating tasks inside the group bearing the cost of limit enforcement and reclaim. This allows a (privileged) updater from outside the group to keep that cost in there - instead of having to help, from a context that doesn't necessarily make sense. I suppose the tradeoff with that - and the reason why this was doing sync reclaim in the first place - is that, if the group is idle and not trying to allocate more, it can take indefinitely for the new limit to actually be met. It should be okay in most scenarios in practice. As the capacity is reallocated from group A to B, B will exert pressure on A once it tries to claim it and thereby shrink it down. If A is idle, that shouldn't be hard. If A is running, it's likely to fault/allocate soon-ish and then join the effort. It does leave a (malicious) corner case where A is just busy-hitting its memory to interfere with the clawback. This is comparable to reclaiming memory.low overage from the outside, though, which is an acceptable risk. Users of O_NONBLOCK just need to be aware. Link: https://lkml.kernel.org/r/20250419183545.1982187-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Thelen <gthelen@google.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Yosry Ahmed <yosry.ahmed@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12Update Christoph's Email address and make it consistentChristoph Lameter (Ampere)
Use cl@gentwo.org throughout and remove the old email addresses. Link: https://lkml.kernel.org/r/8b962f57-4d98-cbb0-cd82-b6ba456733e8@gentwo.org Signed-off-by: Christoph Lameter <cl@gentwo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-12Docs/admin-guide/mm/damon/usage: document 'nid' fileSeongJae Park
Add description of 'nid' file, which is optionally used for specific DAMOS quota goal metrics such as node_mem_{used,free}_bp on DAMON usage document. Link: https://lkml.kernel.org/r/20250420194030.75838-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Yunjeong Mun <yunjeong.mun@sk.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11crash: export PAGE_UNACCEPTED_MAPCOUNT_VALUE to vmcoreinfoZhiquan Li
On Intel TDX guest, unaccepted memory is unusable free memory which is not managed by buddy, until it's accepted by guest. Before that, it cannot be accessed by the first kernel as well as the kexec'ed kernel. The kexec'ed kernel will skip these pages and fill in zero data for the reader of vmcore. The dump tool like makedumpfile creates a page descriptor (size 24 bytes) for each non-free page, including zero data page, but it will not create descriptor for free pages. If it is not able to distinguish these unaccepted pages with zero data pages, a certain amount of space will be wasted in proportion (~1/170). In fact, as a special kind of free page the unaccepted pages should be excluded, like the real free pages. Export the page type PAGE_UNACCEPTED_MAPCOUNT_VALUE to vmcoreinfo, so that dump tool can identify whether a page is unaccepted. [zhiquan1.li@intel.com: fix docs: "Title underline too short" warning] Link: https://lore.kernel.org/all/20240809114854.3745464-5-kirill.shutemov@linux.intel.com/ Link: https://lkml.kernel.org/r/20250405060610.860465-1-zhiquan1.li@intel.com Link: https://lore.kernel.org/all/20240809114854.3745464-5-kirill.shutemov@linux.intel.com/ Link: https://lkml.kernel.org/r/20250403030801.758687-1-zhiquan1.li@intel.com Signed-off-by: Zhiquan Li <zhiquan1.li@intel.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Zhiquan Li <zhiquan1.li@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11Documentation: zram: update IDLE pages tracking documentationSergey Senozhatsky
Move IDLE pages tracking into a separate chapter because there are multiple features that use (or depend on) it either in built-in variant ("mark all") or in extended variant (ac-time tracking). In addition, recompression doesn't require memory tracking to be enabled in order to be able to perform idle recompression. Link: https://lkml.kernel.org/r/20250416042833.3858827-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reported-by: Shin Kawamura <kawasin@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regionsAndrei Vagin
Patch series "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions", v2. Introduce the PAGE_IS_GUARD flag in the PAGEMAP_SCAN ioctl to expose information about guard regions. This allows userspace tools, such as CRIU, to detect and handle guard regions. Currently, CRIU utilizes PAGEMAP_SCAN as a more efficient alternative to parsing /proc/pid/pagemap. Without this change, guard regions are incorrectly reported as swap-anon regions, leading CRIU to attempt dumping them and subsequently failing. The series includes updates to the documentation and selftests to reflect the new functionality. This patch (of 3): Introduce the PAGE_IS_GUARD flag in the PAGEMAP_SCAN ioctl to expose information about guard regions. This allows userspace tools, such as CRIU, to detect and handle guard regions. Link: https://lkml.kernel.org/r/20250324065328.107678-1-avagin@google.com Link: https://lkml.kernel.org/r/20250324065328.107678-2-avagin@google.com Signed-off-by: Andrei Vagin <avagin@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11mm/compaction: reduce the difference between low and high watermarksMichal Clapinski
Reduce the diff between low and high watermarks when compaction proactiveness is set to high. This allows users who set the proactiveness really high to have more stable fragmentation score over time. Link: https://lkml.kernel.org/r/20250404111103.1994507-3-mclapinski@google.com Signed-off-by: Michal Clapinski <mclapinski@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>