Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening fixes from Kees Cook:
- randstruct: gcc-plugin: Fix attribute addition with GCC 15
- ubsan: integer-overflow: depend on BROKEN to keep this out of CI
- overflow: Introduce __DEFINE_FLEX for having no initializer
- wifi: iwlwifi: mld: Work around Clang loop unrolling bug
[ Take two after a jump scare due to some repo rewriting by 'b4' - Linus ]
* tag 'hardening-v6.16-rc1-fix1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
randstruct: gcc-plugin: Fix attribute addition
overflow: Introduce __DEFINE_FLEX for having no initializer
ubsan: integer-overflow: depend on BROKEN to keep this out of CI
wifi: iwlwifi: mld: Work around Clang loop unrolling bug
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- "hung_task: extend blocking task stacktrace dump to semaphore" from
Lance Yang enhances the hung task detector.
The detector presently dumps the blocking tasks's stack when it is
blocked on a mutex. Lance's series extends this to semaphores
- "nilfs2: improve sanity checks in dirty state propagation" from
Wentao Liang addresses a couple of minor flaws in nilfs2
- "scripts/gdb: Fixes related to lx_per_cpu()" from Illia Ostapyshyn
fixes a couple of issues in the gdb scripts
- "Support kdump with LUKS encryption by reusing LUKS volume keys" from
Coiby Xu addresses a usability problem with kdump.
When the dump device is LUKS-encrypted, the kdump kernel may not have
the keys to the encrypted filesystem. A full writeup of this is in
the series [0/N] cover letter
- "sysfs: add counters for lockups and stalls" from Max Kellermann adds
/sys/kernel/hardlockup_count and /sys/kernel/hardlockup_count and
/sys/kernel/rcu_stall_count
- "fork: Page operation cleanups in the fork code" from Pasha Tatashin
implements a number of code cleanups in fork.c
- "scripts/gdb/symbols: determine KASLR offset on s390 during early
boot" from Ilya Leoshkevich fixes some s390 issues in the gdb
scripts
* tag 'mm-nonmm-stable-2025-05-31-15-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (67 commits)
llist: make llist_add_batch() a static inline
delayacct: remove redundant code and adjust indentation
squashfs: add optional full compressed block caching
crash_dump, nvme: select CONFIGFS_FS as built-in
scripts/gdb/symbols: determine KASLR offset on s390 during early boot
scripts/gdb/symbols: factor out pagination_off()
scripts/gdb/symbols: factor out get_vmlinux()
kernel/panic.c: format kernel-doc comments
mailmap: update and consolidate Casey Connolly's name and email
nilfs2: remove wbc->for_reclaim handling
fork: define a local GFP_VMAP_STACK
fork: check charging success before zeroing stack
fork: clean-up naming of vm_stack/vm_struct variables in vmap stacks code
fork: clean-up ifdef logic around stack allocation
kernel/rcu/tree_stall: add /sys/kernel/rcu_stall_count
kernel/watchdog: add /sys/kernel/{hard,soft}lockup_count
x86/crash: make the page that stores the dm crypt keys inaccessible
x86/crash: pass dm crypt keys to kdump kernel
Revert "x86/mm: Remove unused __set_memory_prot()"
crash_dump: retrieve dm crypt keys in kdump kernel
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "Add folio_mk_pte()" from Matthew Wilcox simplifies the act of
creating a pte which addresses the first page in a folio and reduces
the amount of plumbing which architecture must implement to provide
this.
- "Misc folio patches for 6.16" from Matthew Wilcox is a shower of
largely unrelated folio infrastructure changes which clean things up
and better prepare us for future work.
- "memory,x86,acpi: hotplug memory alignment advisement" from Gregory
Price adds early-init code to prevent x86 from leaving physical
memory unused when physical address regions are not aligned to memory
block size.
- "mm/compaction: allow more aggressive proactive compaction" from
Michal Clapinski provides some tuning of the (sadly, hard-coded (more
sadly, not auto-tuned)) thresholds for our invokation of proactive
compaction. In a simple test case, the reduction of a guest VM's
memory consumption was dramatic.
- "Minor cleanups and improvements to swap freeing code" from Kemeng
Shi provides some code cleaups and a small efficiency improvement to
this part of our swap handling code.
- "ptrace: introduce PTRACE_SET_SYSCALL_INFO API" from Dmitry Levin
adds the ability for a ptracer to modify syscalls arguments. At this
time we can alter only "system call information that are used by
strace system call tampering, namely, syscall number, syscall
arguments, and syscall return value.
This series should have been incorporated into mm.git's "non-MM"
branch, but I goofed.
- "fs/proc: extend the PAGEMAP_SCAN ioctl to report guard regions" from
Andrei Vagin extends the info returned by the PAGEMAP_SCAN ioctl
against /proc/pid/pagemap. This permits CRIU to more efficiently get
at the info about guard regions.
- "Fix parameter passed to page_mapcount_is_type()" from Gavin Shan
implements that fix. No runtime effect is expected because
validate_page_before_insert() happens to fix up this error.
- "kernel/events/uprobes: uprobe_write_opcode() rewrite" from David
Hildenbrand basically brings uprobe text poking into the current
decade. Remove a bunch of hand-rolled implementation in favor of
using more current facilities.
- "mm/ptdump: Drop assumption that pxd_val() is u64" from Anshuman
Khandual provides enhancements and generalizations to the pte dumping
code. This might be needed when 128-bit Page Table Descriptors are
enabled for ARM.
- "Always call constructor for kernel page tables" from Kevin Brodsky
ensures that the ctor/dtor is always called for kernel pgtables, as
it already is for user pgtables.
This permits the addition of more functionality such as "insert hooks
to protect page tables". This change does result in various
architectures performing unnecesary work, but this is fixed up where
it is anticipated to occur.
- "Rust support for mm_struct, vm_area_struct, and mmap" from Alice
Ryhl adds plumbing to permit Rust access to core MM structures.
- "fix incorrectly disallowed anonymous VMA merges" from Lorenzo
Stoakes takes advantage of some VMA merging opportunities which we've
been missing for 15 years.
- "mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE" from
SeongJae Park optimizes process_madvise()'s TLB flushing.
Instead of flushing each address range in the provided iovec, we
batch the flushing across all the iovec entries. The syscall's cost
was approximately halved with a microbenchmark which was designed to
load this particular operation.
- "Track node vacancy to reduce worst case allocation counts" from
Sidhartha Kumar makes the maple tree smarter about its node
preallocation.
stress-ng mmap performance increased by single-digit percentages and
the amount of unnecessarily preallocated memory was dramaticelly
reduced.
- "mm/gup: Minor fix, cleanup and improvements" from Baoquan He removes
a few unnecessary things which Baoquan noted when reading the code.
- ""Enhance sysfs handling for memory hotplug in weighted interleave"
from Rakie Kim "enhances the weighted interleave policy in the memory
management subsystem by improving sysfs handling, fixing memory
leaks, and introducing dynamic sysfs updates for memory hotplug
support". Fixes things on error paths which we are unlikely to hit.
- "mm/damon: auto-tune DAMOS for NUMA setups including tiered memory"
from SeongJae Park introduces new DAMOS quota goal metrics which
eliminate the manual tuning which is required when utilizing DAMON
for memory tiering.
- "mm/vmalloc.c: code cleanup and improvements" from Baoquan He
provides cleanups and small efficiency improvements which Baoquan
found via code inspection.
- "vmscan: enforce mems_effective during demotion" from Gregory Price
changes reclaim to respect cpuset.mems_effective during demotion when
possible. because presently, reclaim explicitly ignores
cpuset.mems_effective when demoting, which may cause the cpuset
settings to violated.
This is useful for isolating workloads on a multi-tenant system from
certain classes of memory more consistently.
- "Clean up split_huge_pmd_locked() and remove unnecessary folio
pointers" from Gavin Guo provides minor cleanups and efficiency gains
in in the huge page splitting and migrating code.
- "Use kmem_cache for memcg alloc" from Huan Yang creates a slab cache
for `struct mem_cgroup', yielding improved memory utilization.
- "add max arg to swappiness in memory.reclaim and lru_gen" from
Zhongkun He adds a new "max" argument to the "swappiness=" argument
for memory.reclaim MGLRU's lru_gen.
This directs proactive reclaim to reclaim from only anon folios
rather than file-backed folios.
- "kexec: introduce Kexec HandOver (KHO)" from Mike Rapoport is the
first step on the path to permitting the kernel to maintain existing
VMs while replacing the host kernel via file-based kexec. At this
time only memblock's reserve_mem is preserved.
- "mm: Introduce for_each_valid_pfn()" from David Woodhouse provides
and uses a smarter way of looping over a pfn range. By skipping
ranges of invalid pfns.
- "sched/numa: Skip VMA scanning on memory pinned to one NUMA node via
cpuset.mems" from Libo Chen removes a lot of pointless VMA scanning
when a task is pinned a single NUMA mode.
Dramatic performance benefits were seen in some real world cases.
- "JFS: Implement migrate_folio for jfs_metapage_aops" from Shivank
Garg addresses a warning which occurs during memory compaction when
using JFS.
- "move all VMA allocation, freeing and duplication logic to mm" from
Lorenzo Stoakes moves some VMA code from kernel/fork.c into the more
appropriate mm/vma.c.
- "mm, swap: clean up swap cache mapping helper" from Kairui Song
provides code consolidation and cleanups related to the folio_index()
function.
- "mm/gup: Cleanup memfd_pin_folios()" from Vishal Moola does that.
- "memcg: Fix test_memcg_min/low test failures" from Waiman Long
addresses some bogus failures which are being reported by the
test_memcontrol selftest.
- "eliminate mmap() retry merge, add .mmap_prepare hook" from Lorenzo
Stoakes commences the deprecation of file_operations.mmap() in favor
of the new file_operations.mmap_prepare().
The latter is more restrictive and prevents drivers from messing with
things in ways which, amongst other problems, may defeat VMA merging.
- "memcg: decouple memcg and objcg stocks"" from Shakeel Butt decouples
the per-cpu memcg charge cache from the objcg's one.
This is a step along the way to making memcg and objcg charging
NMI-safe, which is a BPF requirement.
- "mm/damon: minor fixups and improvements for code, tests, and
documents" from SeongJae Park is yet another batch of miscellaneous
DAMON changes. Fix and improve minor problems in code, tests and
documents.
- "memcg: make memcg stats irq safe" from Shakeel Butt converts memcg
stats to be irq safe. Another step along the way to making memcg
charging and stats updates NMI-safe, a BPF requirement.
- "Let unmap_hugepage_range() and several related functions take folio
instead of page" from Fan Ni provides folio conversions in the
hugetlb code.
* tag 'mm-stable-2025-05-31-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (285 commits)
mm: pcp: increase pcp->free_count threshold to trigger free_high
mm/hugetlb: convert use of struct page to folio in __unmap_hugepage_range()
mm/hugetlb: refactor __unmap_hugepage_range() to take folio instead of page
mm/hugetlb: refactor unmap_hugepage_range() to take folio instead of page
mm/hugetlb: pass folio instead of page to unmap_ref_private()
memcg: objcg stock trylock without irq disabling
memcg: no stock lock for cpu hot-unplug
memcg: make __mod_memcg_lruvec_state re-entrant safe against irqs
memcg: make count_memcg_events re-entrant safe against irqs
memcg: make mod_memcg_state re-entrant safe against irqs
memcg: move preempt disable to callers of memcg_rstat_updated
memcg: memcg_rstat_updated re-entrant safe against irqs
mm: khugepaged: decouple SHMEM and file folios' collapse
selftests/eventfd: correct test name and improve messages
alloc_tag: check mem_profiling_support in alloc_tag_init
Docs/damon: update titles and brief introductions to explain DAMOS
selftests/damon/_damon_sysfs: read tried regions directories in order
mm/damon/tests/core-kunit: add a test for damos_set_filters_default_reject()
mm/damon/paddr: remove unused variable, folio_list, in damon_pa_stat()
mm/damon/sysfs-schemes: fix wrong comment on damons_sysfs_quota_goal_metric_strs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull compiler version requirement update from Arnd Bergmann:
"Require gcc-8 and binutils-2.30
x86 already uses gcc-8 as the minimum version, this changes all other
architectures to the same version. gcc-8 is used is Debian 10 and Red
Hat Enterprise Linux 8, both of which are still supported, and
binutils 2.30 is the oldest corresponding version on those.
Ubuntu Pro 18.04 and SUSE Linux Enterprise Server 15 both use gcc-7 as
the system compiler but additionally include toolchains that remain
supported.
With the new minimum toolchain versions, a number of workarounds for
older versions can be dropped, in particular on x86_64 and arm64.
Importantly, the updated compiler version allows removing two of the
five remaining gcc plugins, as support for sancov and structeak
features is already included in modern compiler versions.
I tried collecting the known changes that are possible based on the
new toolchain version, but expect that more cleanups will be possible.
Since this touches multiple architectures, I merged the patches
through the asm-generic tree."
* tag 'gcc-minimum-version-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
Makefile.kcov: apply needed compiler option unconditionally in CFLAGS_KCOV
Documentation: update binutils-2.30 version reference
gcc-plugins: remove SANCOV gcc plugin
Kbuild: remove structleak gcc plugin
arm64: drop binutils version checks
raid6: skip avx512 checks
kbuild: require gcc-8 and binutils-2.30
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
"DT Bindings:
- Convert all remaining interrupt-controller bindings to DT schema
- Convert Rockchip CDN-DP and Freescale TCON, M4IF, TigerP, LDB, PPC
PMC, imx-drm, and ftm-quaddec to DT schema
- Add bindings for fsl,vf610-pit, fsl,ls1021a-wdt, sgx,vz89te,
maxim,max30208, ti,lp8864, and fairphone,fp5-sndcard
- Add top-level constraints for renesas,vsp1 and renesas,fcp
- Add missing constraint in amlogic,pinctrl-a4 'group' nodes
- Adjust the allowed properties for dwc3-xilinx, sony,imx219,
pci-iommu, and renesas,dsi
- Add EcoNet vendor prefix
- Fix the reserved-memory.yaml in fsl,qman-fqd
- Drop obsolete numa.txt and cpu-topology.txt which are schemas in
dtschema now
- Drop Renesas RZ/N1S bindings
- Ensure Arm cpu nodes don't allow undocumented properties. Add all
the properties which are in use and undocumented. Drop the Mediatek
cpufreq binding which is not a binding, but just what DT properties
the driver uses.
- Add compatibles for Renesas RZ/G3E and RZ/V2N Mali Bifrost GPU
- Update documentation on defining child nodes with separate schemas
- Add bindings to PSCI MAINTAINERS entry
DT core:
- Add new functions to simplify driver handling of 'memory-region'
properties. Users to be added next cycle.
- Simplify of_dma_set_restricted_buffer() to use
of_for_each_phandle()
- Add missing unlock on error in unittest_data_add()"
* tag 'devicetree-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (87 commits)
dt-bindings: timer: Add fsl,vf610-pit.yaml
dt-bindings: gpu: mali-bifrost: Add compatible for RZ/G3E SoC
ASoC: dt-bindings: qcom,sm8250: Add Fairphone 5 sound card
dt-bindings: arm/cpus: Allow 2 power-domains entries
dt-bindings: usb: dwc3-xilinx: allow dma-coherent
media: dt-bindings: sony,imx219: Allow props from video-interface-devices
dt-bindings: soundwire: qcom: Document v2.1.0 version of IP block
dt-bindings: watchdog: fsl-imx-wdt: add compatible string fsl,ls1021a-wdt
dt-bindings: pinctrl: amlogic,pinctrl-a4: Add missing constraint on allowed 'group' node properties
dt-bindings: display: rockchip: Convert cdn-dp-rockchip.txt to yaml
dt-bindings: display: bridge: renesas,dsi: allow properties from dsi-controller
dt-bindings: trivial-devices: Add VZ89TE to trivial
media: dt-bindings: renesas,vsp1: add top-level constraints
media: dt-bindings: renesas,fcp: add top-level constraints
dt-bindings: trivial-devices: Add Maxim max30208
dt-bindings: soc: fsl,qman-fqd: Fix reserved-memory.yaml reference
dt-bindings: interrupt-controller: Convert ti,omap-intc-irq to DT schema
dt-bindings: interrupt-controller: Convert ti,omap4-wugen-mpu to DT schema
dt-bindings: interrupt-controller: Convert ti,keystone-irq to DT schema
dt-bindings: interrupt-controller: Convert technologic,ts4800-irqc to DT schema
...
|
|
Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.
Quotes are appropriate because (at 6k lines and 100+ commits) it is
much bigger than the rest, which will come later this week and
consists mostly of bugfixes and selftests. s390 changes will also come
in the second batch.
ARM:
- Add large stage-2 mapping (THP) support for non-protected guests
when pKVM is enabled, clawing back some performance.
- Enable nested virtualisation support on systems that support it,
though it is disabled by default.
- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
and protected modes.
- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. While this has no functional
impact, it ensures correctness of emulation (the data is
automatically extracted from the published JSON files), and helps
dealing with the evolution of the architecture.
- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.
- New selftest checking the pKVM ownership transition rules
- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.
- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.
- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.
- Add a new selftest for the SVE host state being corrupted by a
guest.
- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.
- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.
- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.
- and the usual random cleanups.
LoongArch:
- Don't flush tlb if the host supports hardware page table walks.
- Add KVM selftests support.
RISC-V:
- Add vector registers to get-reg-list selftest
- VCPU reset related improvements
- Remove scounteren initialization from VCPU reset
- Support VCPU reset from userspace using set_mpstate() ioctl
x86:
- Initial support for TDX in KVM.
This finally makes it possible to use the TDX module to run
confidential guests on Intel processors. This is quite a large
series, including support for private page tables (managed by the
TDX module and mirrored in KVM for efficiency), forwarding some
TDVMCALLs to userspace, and handling several special VM exits from
the TDX module.
This has been in the works for literally years and it's not really
possible to describe everything here, so I'll defer to the various
merge commits up to and including commit 7bcf7246c42a ('Merge
branch 'kvm-tdx-finish-initial' into HEAD')"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
x86/tdx: mark tdh_vp_enter() as __flatten
Documentation: virt/kvm: remove unreferenced footnote
RISC-V: KVM: lock the correct mp_state during reset
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
RISC-V: KVM: Remove scounteren initialization
KVM: RISC-V: remove unnecessary SBI reset state
...
|
|
Depending on !COMPILE_TEST isn't sufficient to keep this feature out of
CI because we can't stop it from being included in randconfig builds.
This feature is still highly experimental, and is developed in lock-step
with Clang's Overflow Behavior Types[1]. Depend on BROKEN to keep it
from being enabled by anyone not expecting it.
Link: https://discourse.llvm.org/t/rfc-v2-clang-introduce-overflowbehaviortypes-for-wrapping-and-non-wrapping-arithmetic/86507 [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202505281024.f42beaa7-lkp@intel.com
Fixes: 557f8c582a9b ("ubsan: Reintroduce signed overflow sanitizer")
Acked-by: Eric Biggers <ebiggers@kernel.org>
Link: https://lore.kernel.org/r/20250528182616.work.296-kees@kernel.org
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Implement the Device Memory TCP transmit path, allowing zero-copy
data transmission on top of TCP from e.g. GPU memory to the wire.
- Move all the IPv6 routing tables management outside the RTNL scope,
under its own lock and RCU. The route control path is now 3x times
faster.
- Convert queue related netlink ops to instance lock, reducing again
the scope of the RTNL lock. This improves the control plane
scalability.
- Refactor the software crc32c implementation, removing unneeded
abstraction layers and improving significantly the related
micro-benchmarks.
- Optimize the GRO engine for UDP-tunneled traffic, for a 10%
performance improvement in related stream tests.
- Cover more per-CPU storage with local nested BH locking; this is a
prep work to remove the current per-CPU lock in local_bh_disable()
on PREMPT_RT.
- Introduce and use nlmsg_payload helper, combining buffer bounds
verification with accessing payload carried by netlink messages.
Netfilter:
- Rewrite the procfs conntrack table implementation, improving
considerably the dump performance. A lot of user-space tools still
use this interface.
- Implement support for wildcard netdevice in netdev basechain and
flowtables.
- Integrate conntrack information into nft trace infrastructure.
- Export set count and backend name to userspace, for better
introspection.
BPF:
- BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops
programs and can be controlled in similar way to traditional qdiscs
using the "tc qdisc" command.
- Refactor the UDP socket iterator, addressing long standing issues
WRT duplicate hits or missed sockets.
Protocols:
- Improve TCP receive buffer auto-tuning and increase the default
upper bound for the receive buffer; overall this improves the
single flow maximum thoughput on 200Gbs link by over 60%.
- Add AFS GSSAPI security class to AF_RXRPC; it provides transport
security for connections to the AFS fileserver and VL server.
- Improve TCP multipath routing, so that the sources address always
matches the nexthop device.
- Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS,
and thus preventing DoS caused by passing around problematic FDs.
- Retire DCCP socket. DCCP only receives updates for bugs, and major
distros disable it by default. Its removal allows for better
organisation of TCP fields to reduce the number of cache lines hit
in the fast path.
- Extend TCP drop-reason support to cover PAWS checks.
Driver API:
- Reorganize PTP ioctl flag support to require an explicit opt-in for
the drivers, avoiding the problem of drivers not rejecting new
unsupported flags.
- Converted several device drivers to timestamping APIs.
- Introduce per-PHY ethtool dump helpers, improving the support for
dump operations targeting PHYs.
Tests and tooling:
- Add support for classic netlink in user space C codegen, so that
ynl-c can now read, create and modify links, routes addresses and
qdisc layer configuration.
- Add ynl sub-types for binary attributes, allowing ynl-c to output
known struct instead of raw binary data, clarifying the classic
netlink output.
- Extend MPTCP selftests to improve the code-coverage.
- Add tests for XDP tail adjustment in AF_XDP.
New hardware / drivers:
- OpenVPN virtual driver: offload OpenVPN data channels processing to
the kernel-space, increasing the data transfer throughput WRT the
user-space implementation.
- Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC.
- Broadcom asp-v3.0 ethernet driver.
- AMD Renoir ethernet device.
- ReakTek MT9888 2.5G ethernet PHY driver.
- Aeonsemi 10G C45 PHYs driver.
Drivers:
- Ethernet high-speed NICs:
- nVidia/Mellanox (mlx5):
- refactor the steering table handling to significantly
reduce the amount of memory used
- add support for complex matches in H/W flow steering
- improve flow streeing error handling
- convert to netdev instance locking
- Intel (100G, ice, igb, ixgbe, idpf):
- ice: add switchdev support for LLDP traffic over VF
- ixgbe: add firmware manipulation and regions devlink support
- igb: introduce support for frame transmission premption
- igb: adds persistent NAPI configuration
- idpf: introduce RDMA support
- idpf: add initial PTP support
- Meta (fbnic):
- extend hardware stats coverage
- add devlink dev flash support
- Broadcom (bnxt):
- add support for RX-side device memory TCP
- Wangxun (txgbe):
- implement support for udp tunnel offload
- complete PTP and SRIOV support for AML 25G/10G devices
- Ethernet NICs embedded and virtual:
- Google (gve):
- add device memory TCP TX support
- Amazon (ena):
- support persistent per-NAPI config
- Airoha:
- add H/W support for L2 traffic offload
- add per flow stats for flow offloading
- RealTek (rtl8211): add support for WoL magic packet
- Synopsys (stmmac):
- dwmac-socfpga 1000BaseX support
- add Loongson-2K3000 support
- introduce support for hardware-accelerated VLAN stripping
- Broadcom (bcmgenet):
- expose more H/W stats
- Freescale (enetc, dpaa2-eth):
- enetc: add MAC filter, VLAN filter RSS and loopback support
- dpaa2-eth: convert to H/W timestamping APIs
- vxlan: convert FDB table to rhashtable, for better scalabilty
- veth: apply qdisc backpressure on full ring to reduce TX drops
- Ethernet switches:
- Microchip (kzZ88x3): add ETS scheduler support
- Ethernet PHYs:
- RealTek (rtl8211):
- add support for WoL magic packet
- add support for PHY LEDs
- CAN:
- Adds RZ/G3E CANFD support to the rcar_canfd driver.
- Preparatory work for CAN-XL support.
- Add self-tests framework with support for CAN physical interfaces.
- WiFi:
- mac80211:
- scan improvements with multi-link operation (MLO)
- Qualcomm (ath12k):
- enable AHB support for IPQ5332
- add monitor interface support to QCN9274
- add multi-link operation support to WCN7850
- add 802.11d scan offload support to WCN7850
- monitor mode for WCN7850, better 6 GHz regulatory
- Qualcomm (ath11k):
- restore hibernation support
- MediaTek (mt76):
- WiFi-7 improvements
- implement support for mt7990
- Intel (iwlwifi):
- enhanced multi-link single-radio (EMLSR) support on 5 GHz links
- rework device configuration
- RealTek (rtw88):
- improve throughput for RTL8814AU
- RealTek (rtw89):
- add multi-link operation support
- STA/P2P concurrency improvements
- support different SAR configs by antenna
- Bluetooth:
- introduce HCI Driver protocol
- btintel_pcie: do not generate coredump for diagnostic events
- btusb: add HCI Drv commands for configuring altsetting
- btusb: add RTL8851BE device 0x0bda:0xb850
- btusb: add new VID/PID 13d3/3584 for MT7922
- btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925
- btnxpuart: implement host-wakeup feature"
* tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits)
selftests/bpf: Fix bpf selftest build warning
selftests: netfilter: Fix skip of wildcard interface test
net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames
net: openvswitch: Fix the dead loop of MPLS parse
calipso: Don't call calipso functions for AF_INET sk.
selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem
net_sched: hfsc: Address reentrant enqueue adding class to eltree twice
octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback
octeontx2-pf: QOS: Perform cache sync on send queue teardown
net: mana: Add support for Multi Vports on Bare metal
net: devmem: ncdevmem: remove unused variable
net: devmem: ksft: upgrade rx test to send 1K data
net: devmem: ksft: add 5 tuple FS support
net: devmem: ksft: add exit_wait to make rx test pass
net: devmem: ksft: add ipv4 support
net: devmem: preserve sockc_err
page_pool: fix ugly page_pool formatting
net: devmem: move list_add to net_devmem_bind_dmabuf.
selftests: netfilter: nft_queue.sh: include file transfer duration in log message
net: phy: mscc: Fix memory leak when using one step timestamping
...
|
|
Pull drm updates from Dave Airlie:
"As part of building up nova-core/nova-drm pieces we've brought in some
rust abstractions through this tree, aux bus being the main one, with
devres changes also in the driver-core tree. Along with the drm core
abstractions and enough nova-core/nova-drm to use them. This is still
all stub work under construction, to build the nova driver upstream.
The other big NVIDIA related one is nouveau adds support for
Hopper/Blackwell GPUs, this required a new GSP firmware update to
570.144, and a bunch of rework in order to support multiple fw
interfaces.
There is also the introduction of an asahi uapi header file as a
precursor to getting the real driver in later, but to unblock
userspace mesa packages while the driver is trapped behind rust
enablement.
Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe,
and msm being the main ones, and some changes to vsprintf.
new drivers:
- bring in the asahi uapi header standalone
- nova-drm: stub driver
rust dependencies (for nova-core):
- auxiliary
- bus abstractions
- driver registration
- sample driver
- devres changes from driver-core
- revocable changes
core:
- add Apple fourcc modifiers
- add virtio capset definitions
- extend EXPORT_SYNC_FILE for timeline syncobjs
- convert to devm_platform_ioremap_resource
- refactor shmem helper page pinning
- DP powerup/down link helpers
- extended %p4cc in vsprintf.c to support fourcc prints
- change vsprintf %p4cn to %p4chR, remove %p4cn
- Add drm_file_err function
- IN_FORMATS_ASYNC property
- move sitronix from tiny to their own subdir
rust:
- add drm core infrastructure rust abstractions
(device/driver, ioctl, file, gem)
dma-buf:
- adjust sg handling to not cache map on attach
- allow setting dma-device for import
- Add a helper to sort and deduplicate dma_fence arrays
docs:
- updated drm scheduler docs
- fbdev todo update
- fb rendering
- actual brightness
ttm:
- fix delayed destroy resv object
bridge:
- add kunit tests
- convert tc358775 to atomic
- convert drivers to devm_drm_bridge_alloc
- convert rk3066_hdmi to bridge driver
scheduler:
- add kunit tests
panel:
- refcount panels to improve lifetime handling
- Powertip PH128800T004-ZZA01
- NLT NL13676BC25-03F, Tianma TM070JDHG34-00
- Himax HX8279/HX8279-D DDIC
- Visionox G2647FB105
- Sitronix ST7571
- ZOTAC rotation quirk
vkms:
- allow attaching more displays
i915:
- xe3lpd display updates
- vrr refactor
- intel_display struct conversions
- xe2hpd memory type identification
- add link rate/count to i915_display_info
- cleanup VGA plane handling
- refactor HDCP GSC
- fix SLPC wait boosting reference counting
- add 20ms delay to engine reset
- fix fence release on early probe errors
xe:
- SRIOV updates
- BMG PCI ID update
- support separate firmware for each GT
- SVM fix, prelim SVM multi-device work
- export fan speed
- temp disable d3cold on BMG
- backup VRAM in PM notifier instead of suspend/freeze
- update xe_ttm_access_memory to use GPU for non-visible access
- fix guc_info debugfs for VFs
- use copy_from_user instead of __copy_from_user
- append PCIe gen5 limitations to xe_firmware document
amdgpu:
- DSC cleanup
- DC Scaling updates
- Fused I2C-over-AUX updates
- DMUB updates
- Use drm_file_err in amdgpu
- Enforce isolation updates
- Use new dma_fence helpers
- USERQ fixes
- Documentation updates
- SR-IOV updates
- RAS updates
- PSP 12 cleanups
- GC 9.5 updates
- SMU 13.x updates
- VCN / JPEG SR-IOV updates
amdkfd:
- Update error messages for SDMA
- Userptr updates
- XNACK fixes
radeon:
- CIK doorbell cleanup
nouveau:
- add support for NVIDIA r570 GSP firmware
- enable Hopper/Blackwell support
nova-core:
- fix task list
- register definition infrastructure
- move firmware into own rust module
- register auxiliary device for nova-drm
nova-drm:
- initial driver skeleton
msm:
- GPU:
- ACD (adaptive clock distribution) for X1-85
- drop fictional address_space_size
- improve GMU HFI response time out robustness
- fix crash when throttling during boot
- DPU:
- use single CTL path for flushing on DPU 5.x+
- improve SSPP allocation code for better sharing
- Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550
- Added SAR2130P support
- Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660
- DP:
- switch to new audio helpers
- better LTTPR handling
- DSI:
- Added support for SA8775P
- Added SAR2130P support
- HDMI:
- Switched to use new helpers for ACR data
- Fixed old standing issue of HPD not working in some cases
amdxdna:
- add dma-buf support
- allow empty command submits
renesas:
- add dma-buf support
- add zpos, alpha, blend support
panthor:
- fail properly for NO_MMAP bos
- add SET_LABEL ioctl
- debugfs BO dumping support
imagination:
- update DT bindings
- support TI AM68 GPU
hibmc:
- improve interrupt handling and HPD support
virtio:
- add panic handler support
rockchip:
- add RK3588 support
- add DP AUX bus panel support
ivpu:
- add heartbeat based hangcheck
mediatek:
- prepares support for MT8195/99 HDMIv2/DDCv2
anx7625:
- improve HPD
tegra:
- speed up firmware loading
* tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits)
drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr()
drm/xe: Default auto_link_downgrade status to false
drm/xe/guc: Make creation of SLPC debugfs files conditional
drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue()
drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read
drm/i915/ptl: Use everywhere the correct DDI port clock select mask
drm/nouveau/kms: add support for GB20x
drm/dp: add option to disable zero sized address only transactions.
drm/nouveau: add support for GB20x
drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle
drm/nouveau: add support for GB10x
drm/nouveau/gf100-: track chan progress with non-WFI semaphore release
drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA
drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos
drm/nouveau: add support for GH100
drm/nouveau: improve handling of 64-bit BARs
drm/nouveau/gv100-: switch to volta semaphore methods
drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES
drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY
drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
- Update overflow helpers to ease refactoring of on-stack flex array
instances (Gustavo A. R. Silva, Kees Cook)
- lkdtm: Use SLAB_NO_MERGE instead of constructors (Harry Yoo)
- Simplify CONFIG_CC_HAS_COUNTED_BY (Jan Hendrik Farr)
- Disable u64 usercopy KUnit test on 32-bit SPARC (Thomas Weißschuh)
- Add missed designated initializers now exposed by fixed randstruct
(Nathan Chancellor, Kees Cook)
- Document compilers versions for __builtin_dynamic_object_size
- Remove ARM_SSP_PER_TASK GCC plugin
- Fix GCC plugin randstruct, add selftests, and restore COMPILE_TEST
builds
- Kbuild: induce full rebuilds when dependencies change with GCC
plugins, the Clang sanitizer .scl file, or the randstruct seed.
- Kbuild: Switch from -Wvla to -Wvla-larger-than=1
- Correct several __nonstring uses for -Wunterminated-string-initialization
* tag 'hardening-v6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
Revert "hardening: Disable GCC randstruct for COMPILE_TEST"
lib/tests: randstruct: Add deep function pointer layout test
lib/tests: Add randstruct KUnit test
randstruct: gcc-plugin: Remove bogus void member
net: qede: Initialize qede_ll_ops with designated initializer
scsi: qedf: Use designated initializer for struct qed_fcoe_cb_ops
md/bcache: Mark __nonstring look-up table
integer-wrap: Force full rebuild when .scl file changes
randstruct: Force full rebuild when seed changes
gcc-plugins: Force full rebuild when plugins change
kbuild: Switch from -Wvla to -Wvla-larger-than=1
hardening: simplify CONFIG_CC_HAS_COUNTED_BY
overflow: Fix direct struct member initialization in _DEFINE_FLEX()
kunit/overflow: Add tests for STACK_FLEX_ARRAY_SIZE() helper
overflow: Add STACK_FLEX_ARRAY_SIZE() helper
input/joystick: magellan: Mark __nonstring look-up table const
watchdog: exar: Shorten identity name to fit correctly
mod_devicetable: Enlarge the maximum platform_device_id name length
overflow: Clarify expectations for getting DEFINE_FLEX variable sizes
compiler_types: Identify compiler versions for __builtin_dynamic_object_size
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl
Pull sysctl updates from Joel Granados:
- Move kern_table members out of kernel/sysctl.c
Moved a subset (tracing, panic, signal, stack_tracer and sparc) out
of the kern_table array. The goal is for kern_table to only have
sysctl elements. All this increases modularity by placing the
ctl_tables closer to where they are used while reducing the chances
of merge conflicts in kernel/sysctl.c.
- Fixed sysctl unit test panic by relocating it to selftests
* tag 'sysctl-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl:
sysctl: Close test ctl_headers with a for loop
sysctl: call sysctl tests with a for loop
sysctl: Add 0012 to test the u8 range check
sysctl: move u8 register test to lib/test_sysctl.c
sparc: mv sparc sysctls into their own file under arch/sparc/kernel
stack_tracer: move sysctl registration to kernel/trace/trace_stack.c
tracing: Move trace sysctls into trace.c
signal: Move signal ctl tables into signal.c
panic: Move panic ctl tables into panic.c
|
|
The function is small enough that it should be, and it's a (very) hot path
for io_uring. Doing this actually reduces my vmlinux text size for my
standard build/test box.
Before:
axboe@r7625 ~/g/linux (test)> size vmlinux
text data bss dec hex filename
19892174 5938310 2470432 28300916 1afd674 vmlinux
After:
axboe@r7625 ~/g/linux (test)> size vmlinux
text data bss dec hex filename
19891878 5938310 2470436 28300624 1afd550 vmlinux
Link: https://lkml.kernel.org/r/f1d104c6-7ac8-457a-a53d-6bb741421b2f@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"These add support for a new feature, Platform Temperature Control
(PTC), to the Intel int340x thermal driver, add support for the Airoha
EN7581 thermal sensor and the IPQ5018 platform, fix up the ACPI
thermal zones handling, fix other assorted issues and clean up code
Specifics:
- Add Platform Temperature Control (PTC) support to the Intel int340x
thermal driver (Srinivas Pandruvada)
- Make the Hisilicon thermal driver compile by default when ARCH_HISI
is set (Krzysztof Kozlowski)
- Clean up printk() format by using %pC instead of %pCn in the
bcm2835 thermal driver (Luca Ceresoli)
- Fix variable name coding style in the AmLogic thermal driver
(Enrique Isidoro Vazquez Ramos)
- Fix missing debugfs entry removal on failure by using the devm_
variant in the LVTS thermal driver (AngeloGioacchino Del Regno)
- Remove the unused lvts_debugfs_exit() function as the devm_ variant
introduced before takes care of removing the debugfs entry in the
LVTS driver (Arnd Bergmann)
- Add the Airoha EN7581 thermal sensor support along with its DT
bindings (Christian Marangi)
- Add ipq5018 compatible string DT binding, cleanup and add its
suppot to the QCom Tsens thermal driver (Sricharan Ramabadhran,
George Moussalem)
- Fix comments typos in the Airoha driver (Christian Marangi, Colin
Ian King)
- Address a sparse warning by making a local variable static in the
QCom thermal driver (George Moussalem)
- Fix the usage of the _SCP control method in the driver for ACPI
thermal zones (Armin Wolf)"
* tag 'thermal-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
thermal: qcom: ipq5018: make ops_ipq5018 struct static
thermal/drivers/airoha: Fix spelling mistake "calibrarion" -> "calibration"
ACPI: thermal: Execute _SCP before reading trip points
ACPI: OSI: Stop advertising support for "3.0 _SCP Extensions"
thermal/drivers/airoha: Fix spelling mistake
thermal/drivers/qcom/tsens: Add support for IPQ5018 tsens
thermal/drivers/qcom/tsens: Add support for tsens v1 without RPM
thermal/drivers/qcom/tsens: Update conditions to strictly evaluate for IP v2+
dt-bindings: thermal: qcom-tsens: Add ipq5018 compatible
thermal/drivers: Add support for Airoha EN7581 thermal sensor
dt-bindings: thermal: Add support for Airoha EN7581 thermal sensor
thermal/drivers/mediatek/lvts: Remove unused lvts_debugfs_exit
thermal/drivers/mediatek/lvts: Fix debugfs unregister on failure
thermal/drivers/amlogic: Rename Uptat to uptat to follow kernel coding style
vsprintf: remove redundant and unused %pCn format specifier
thermal/drivers/bcm2835: Use %pC instead of %pCn
thermal/drivers/hisi: Do not enable by default during compile testing
thermal: int340x: processor_thermal: Platform temperature control documentation
thermal: intel: int340x: Enable platform temperature control
thermal: intel: int340x: Add platform temperature control interface
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"We've received a lot of activities in this cycle, mostly about leaf
driver codes rather than the core part, but with a good mixture of
code cleanups and new driver additions. Below are some highlights:
ASoC:
- Support for automatically enumerating DAIs from standards
conforming SoundWire SDCA devices; not much used as of this
writing, rather for future implementations
- Conversion of quite a few drivers to newer GPIO APIs
- Continued cleanups and helper usages in allover places
- Support for a wider range of Intel AVS platforms
- Support for AMD ACP 7.x platforms, Cirrus Logic CS35L63 and CS48L32
Everest Semiconductor ES8375 and ES8389, Longsoon-1 AC'97
controllers, nVidia Tegra264, Richtek ALC203 and RT9123 and
Rockchip SAI controllers
HD-audio:
- Lots of cleanups of TAS2781 codec drivers
- A new HD-audio control bound via ACPI for Nvidia
- Support for Tegra264, Intel WCL, usual new codec quirks
USB-audio:
- Fix a race at removal of MIDI device
- Pioneer DJM-V10 support, Scarlett2 driver cleanups
Misc:
- Cleanups of deprecated PCI functions
- Removal of unused / dead function codes"
* tag 'sound-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (364 commits)
firmware: cs_dsp: Fix OOB memory read access in KUnit test
ASoC: codecs: add support for ES8375
ASoC: dt-bindings: Add Everest ES8375 audio CODEC
ALSA: hda: acpi: Make driver's match data const static
ALSA: hda: acpi: Use SYSTEM_SLEEP_PM_OPS()
ALSA: atmel: Replace deprecated strcpy() with strscpy()
ALSA: core: fix up bus match const issues.
ASoC: wm_adsp: Make cirrus_dir const
ASoC: tegra: Tegra264 support in isomgr_bw
ASoC: tegra: AHUB: Add Tegra264 support
ASoC: tegra: ADX: Add Tegra264 support
ASoC: tegra: AMX: Add Tegra264 support
ASoC: tegra: I2S: Add Tegra264 support
ASoC: tegra: Update PLL rate for Tegra264
ASoC: tegra: ASRC: Update ARAM address
ASoC: tegra: ADMAIF: Add Tegra264 support
ASoC: tegra: CIF: Add Tegra264 support
dt-bindings: ASoC: Document Tegra264 APE support
dt-bindings: ASoC: admaif: Add missing properties
ASoC: dt-bindings: audio-graph-card2: reference audio-graph routing property
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull rate-limit updates from Paul McKenney:
"lib/ratelimit: Reduce false-positive and silent misses:
- Reduce open-coded use of ratelimit_state structure fields.
- Convert the ->missed field to atomic_t.
- Count misses that are due to lock contention.
- Eliminate jiffies=0 special case.
- Reduce ___ratelimit() false-positive rate limiting (Petr Mladek).
- Allow zero ->burst to hard-disable rate limiting.
- Optimize away atomic operations when a miss is guaranteed.
- Warn if ->interval or ->burst are negative (Petr Mladek).
- Simplify the resulting code.
A smoke test and stress test have been created, but they are not yet
ready for mainline. With luck, we will offer them for the v6.17 merge
window"
* tag 'ratelimit.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
ratelimit: Drop redundant accesses to burst
ratelimit: Use nolock_ret restructuring to collapse common case code
ratelimit: Use nolock_ret label to collapse lock-failure code
ratelimit: Use nolock_ret label to save a couple of lines of code
ratelimit: Simplify common-case exit path
ratelimit: Warn if ->interval or ->burst are negative
ratelimit: Avoid atomic decrement under lock if already rate-limited
ratelimit: Avoid atomic decrement if already rate-limited
ratelimit: Don't flush misses counter if RATELIMIT_MSG_ON_RELEASE
ratelimit: Force re-initialization when rate-limiting re-enabled
ratelimit: Allow zero ->burst to disable ratelimiting
ratelimit: Reduce ___ratelimit() false-positive rate limiting
ratelimit: Avoid jiffies=0 special case
ratelimit: Count misses due to lock contention
ratelimit: Convert the ->missed field to atomic_t
drm/amd/pm: Avoid open-coded use of ratelimit_state structure's internals
drm/i915: Avoid open-coded use of ratelimit_state structure's ->missed field
random: Avoid open-coded use of ratelimit_state structure's ->missed field
ratelimit: Create functions to handle ratelimit_state internals
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 resource control updates from Borislav Petkov:
"Carve out the resctrl filesystem-related code into fs/resctrl/ so that
multiple architectures can share the fs API for manipulating their
respective hw resource control implementation.
This is the second step in the work towards sharing the resctrl
filesystem interface, the next one being plugging ARM's MPAM into the
aforementioned fs API"
* tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
MAINTAINERS: Add reviewers for fs/resctrl
x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
x86/resctrl: Always initialise rid field in rdt_resources_all[]
x86/resctrl: Relax some asm #includes
x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
x86/resctrl: Squelch whitespace anomalies in resctrl core code
x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h
x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs
x86/resctrl: Move enum resctrl_event_id to resctrl.h
x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl
fs/resctrl: Add boiler plate for external resctrl code
x86/resctrl: Add 'resctrl' to the title of the resctrl documentation
x86/resctrl: Split trace.h
x86/resctrl: Expand the width of domid by replacing mon_data_bits
x86/resctrl: Add end-marker to the resctrl_event_id enum
x86/resctrl: Move is_mba_sc() out of core.c
x86/resctrl: Drop __init/__exit on assorted symbols
x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point
x86/resctrl: Check all domains are offline in resctrl_exit()
x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- Enable qemu_config for riscv32, sparc 64-bit, PowerPC 32-bit BE and
64-bit LE
- Enable CONFIG_SPARC32 to clearly differentiate between sparc 32-bit
and 64-bit configurations
- Enable CONFIG_CPU_BIG_ENDIAN to clearly differentiate between powerpc
LE and BE configurations
- Add feature to list available architectures to kunit tool
- Fixes to bugs and changes to documentation
* tag 'linux_kselftest-kunit-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Fix wrong parameter to kunit_deactivate_static_stub()
kunit: tool: add test counts to JSON output
Documentation: kunit: improve example on testing static functions
kunit: executor: Remove const from kunit_filter_suites() allocation type
kunit: qemu_configs: Disable faulting tests on 32-bit SPARC
kunit: qemu_configs: Add 64-bit SPARC configuration
kunit: qemu_configs: sparc: Explicitly enable CONFIG_SPARC32=y
kunit: qemu_configs: Add PowerPC 32-bit BE and 64-bit LE
kunit: qemu_configs: powerpc: Explicitly enable CONFIG_CPU_BIG_ENDIAN=y
kunit: tool: Implement listing of available architectures
kunit: qemu_configs: Add riscv32 config
kunit: configs: Enable CONFIG_INIT_STACK_ALL_PATTERN in all_tests
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Fix memcpy_sglist to handle partially overlapping SG lists
- Use memcpy_sglist to replace null skcipher
- Rename CRYPTO_TESTS to CRYPTO_BENCHMARK
- Flip CRYPTO_MANAGER_DISABLE_TEST into CRYPTO_SELFTESTS
- Hide CRYPTO_MANAGER
- Add delayed freeing of driver crypto_alg structures
Compression:
- Allocate large buffers on first use instead of initialisation in scomp
- Drop destination linearisation buffer in scomp
- Move scomp stream allocation into acomp
- Add acomp scatter-gather walker
- Remove request chaining
- Add optional async request allocation
Hashing:
- Remove request chaining
- Add optional async request allocation
- Move partial block handling into API
- Add ahash support to hmac
- Fix shash documentation to disallow usage in hard IRQs
Algorithms:
- Remove unnecessary SIMD fallback code on x86 and arm/arm64
- Drop avx10_256 xts(aes)/ctr(aes) on x86
- Improve avx-512 optimisations for xts(aes)
- Move chacha arch implementations into lib/crypto
- Move poly1305 into lib/crypto and drop unused Crypto API algorithm
- Disable powerpc/poly1305 as it has no SIMD fallback
- Move sha256 arch implementations into lib/crypto
- Convert deflate to acomp
- Set block size correctly in cbcmac
Drivers:
- Do not use sg_dma_len before mapping in sun8i-ss
- Fix warm-reboot failure by making shutdown do more work in qat
- Add locking in zynqmp-sha
- Remove cavium/zip
- Add support for PCI device 0x17D8 to ccp
- Add qat_6xxx support in qat
- Add support for RK3576 in rockchip-rng
- Add support for i.MX8QM in caam
Others:
- Fix irq_fpu_usable/kernel_fpu_begin inconsistency during CPU bring-up
- Add new SEV/SNP platform shutdown API in ccp"
* tag 'v6.16-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (382 commits)
x86/fpu: Fix irq_fpu_usable() to return false during CPU onlining
crypto: qat - add missing header inclusion
crypto: api - Redo lookup on EEXIST
Revert "crypto: testmgr - Add hash export format testing"
crypto: marvell/cesa - Do not chain submitted requests
crypto: powerpc/poly1305 - add depends on BROKEN for now
Revert "crypto: powerpc/poly1305 - Add SIMD fallback"
crypto: ccp - Add missing tee info reg for teev2
crypto: ccp - Add missing bootloader info reg for pspv5
crypto: sun8i-ce - move fallback ahash_request to the end of the struct
crypto: octeontx2 - Use dynamic allocated memory region for lmtst
crypto: octeontx2 - Initialize cptlfs device info once
crypto: xts - Only add ecb if it is not already there
crypto: lrw - Only add ecb if it is not already there
crypto: testmgr - Add hash export format testing
crypto: testmgr - Use ahash for generic tfm
crypto: hmac - Add ahash support
crypto: testmgr - Ignore EEXIST on shash allocation
crypto: algapi - Add driver template support to crypto_inst_setname
crypto: shash - Set reqsize in shash_alg
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull CRC updates from Eric Biggers:
"Cleanups for the kernel's CRC (cyclic redundancy check) code:
- Use __ro_after_init where appropriate
- Remove unnecessary static_key on s390
- Rename some source code files
- Rename the crc32 and crc32c crypto API modules
- Use subsys_initcall instead of arch_initcall
- Restore maintainers for crc_kunit.c
- Fold crc16_byte() into crc16.c
- Add some SPDX license identifiers"
* tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crc32: add SPDX license identifier
lib/crc16: unexport crc16_table and crc16_byte()
w1: ds2406: use crc16() instead of crc16_byte() loop
MAINTAINERS: add crc_kunit.c back to CRC LIBRARY
lib/crc: make arch-optimized code use subsys_initcall
crypto: crc32 - remove "generic" from file and module names
x86/crc: drop "glue" from filenames
sparc/crc: drop "glue" from filenames
s390/crc: drop "glue" from filenames
powerpc/crc: rename crc32-vpmsum_core.S to crc-vpmsum-template.S
powerpc/crc: drop "glue" from filenames
arm64/crc: drop "glue" from filenames
arm/crc: drop "glue" from filenames
s390/crc32: Remove no-op module init and exit functions
s390/crc32: Remove have_vxrs static key
lib/crc: make the CPU feature static keys __ro_after_init
|
|
When a module gets unloaded it checks whether any of its tags are still in
use and if so, we keep the memory containing module's allocation tags
alive until all tags are unused. However percpu counters referenced by
the tags are freed by free_module(). This will lead to UAF if the memory
allocated by a module is accessed after module was unloaded.
To fix this we allocate percpu counters for module allocation tags
dynamically and we keep it alive for tags which are still in use after
module unloading. This also removes the requirement of a larger
PERCPU_MODULE_RESERVE when memory allocation profiling is enabled because
percpu memory for counters does not need to be reserved anymore.
Link: https://lkml.kernel.org/r/20250517000739.5930-1-surenb@google.com
Fixes: 0db6f8d7820a ("alloc_tag: load module tags into separate contiguous memory")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: David Wang <00107082@163.com>
Closes: https://lore.kernel.org/all/20250516131246.6244-1-00107082@163.com/
Tested-by: David Wang <00107082@163.com>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If mem_profiling_support is false, for example by
sysctl.vm.mem_profiling=never, alloc_tag_init should skip module tags
allocation, codetag type registration and procfs init.
Link: https://lkml.kernel.org/r/20250513182602.121843-1-cachen@purestorage.com
Signed-off-by: Casey Chen <cachen@purestorage.com>
Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
crc32c_combine() and crc32c_shift() are no longer used (except by the
KUnit test that tests them), and their current implementation is very
slow. Remove them.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://patch.msgid.link/20250519175012.36581-8-ebiggers@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
kunit_deactivate_static_stub() accepts real_fn_addr instead of
replacement_addr. In the case, it always passes NULL to
kunit_deactivate_static_stub().
Fix it.
Link: https://lore.kernel.org/r/20250520082050.2254875-1-tzungbi@kernel.org
Fixes: e047c5eaa763 ("kunit: Expose 'static stub' API to redirect functions")
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
%pC and %pCn print the same string, and commit 900cca294425 ("lib/vsprintf:
add %pC{,n,r} format specifiers for clocks") introducing them does not
clarify any intended difference. It can be assumed %pC is a default for
%pCn as some other specifiers do, but not all are consistent with this
policy. Moreover there is now no other suffix other than 'n', which makes a
default not really useful.
All users in the kernel were using %pC except for one which has been
converted. So now remove %pCn and all the unnecessary extra code and
documentation.
Acked-by: Stephen Boyd <sboyd@kernel.org>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Tested-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Yanteng Si <si.yanteng@linux.dev>
Link: https://lore.kernel.org/r/20250311-vsprintf-pcn-v2-2-0af40fc7dee4@bootlin.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
The function helps to implement cpumask_andnot() APIs.
Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: James Morse <james.morse@arm.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Link: https://lore.kernel.org/20250515165855.31452-3-james.morse@arm.com
|
|
Not all drivers require send_package_data or send_component_table when
updating firmware. Instead of forcing drivers to implement a stub allow
these functions to go undefined.
Signed-off-by: Lee Trager <lee@trager.us>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20250512190109.2475614-2-lee@trager.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
lib/crc32.c and include/linux/crc32.h got missed by the bulk SPDX
conversion because of the nonstandard explanation of the license.
However, crc32.c clearly states that it's licensed under the GNU General
Public License, Version 2. And the comment in crc32.h clearly indicates
that it's meant to have the same license as crc32.c. Therefore, apply
SPDX-License-Identifier: GPL-2.0-only to both files.
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20250514052409.194822-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Now that neither crc16_table nor crc16_byte() is used outside
lib/crc16.c, fold them into lib/crc16.c.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20250513022115.39109-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
Fix the documentation for __xa_cmpxchg to actually describe the
cmpxch-like semantics correctly, based on the version for xa_cmpxchg.
Link: https://lkml.kernel.org/r/20250507051656.3900864-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently the full set of crypto self-tests requires
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y. This is problematic in two ways.
First, developers regularly overlook this option. Second, the
description of the tests as "extra" sometimes gives the impression that
it is not required that all algorithms pass these tests.
Given that the main use case for the crypto self-tests is for
developers, make enabling CONFIG_CRYPTO_SELFTESTS=y just enable the full
set of crypto self-tests by default.
The slow tests can still be disabled by adding the command-line
parameter cryptomgr.noextratests=1, soon to be renamed to
cryptomgr.noslowtests=1. The only known use case for doing this is for
people trying to use the crypto self-tests to satisfy the FIPS 140-3
pre-operational self-testing requirements when the kernel is being
validated as a FIPS 140-3 cryptographic module.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The negative-sense of CRYPTO_MANAGER_DISABLE_TESTS is a longstanding
mistake that regularly causes confusion. Especially bad is that you can
have CRYPTO=n && CRYPTO_MANAGER_DISABLE_TESTS=n, which is ambiguous.
Replace CRYPTO_MANAGER_DISABLE_TESTS with CRYPTO_SELFTESTS which has the
expected behavior.
The tests continue to be disabled by default.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add explicit array bounds to the function prototypes for the parameters
that didn't already get handled by the conversion to use chacha_state:
- chacha_block_*():
Change 'u8 *out' or 'u8 *stream' to u8 out[CHACHA_BLOCK_SIZE].
- hchacha_block_*():
Change 'u32 *out' or 'u32 *stream' to u32 out[HCHACHA_OUT_WORDS].
- chacha_init():
Change 'const u32 *key' to 'const u32 key[CHACHA_KEY_WORDS]'.
Change 'const u8 *iv' to 'const u8 iv[CHACHA_IV_SIZE]'.
No functional changes. This just makes it clear when fixed-size arrays
are expected.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Now that the ChaCha state matrix is strongly-typed, add a helper
function chacha_zeroize_state() which zeroizes it. Then convert all
applicable callers to use it instead of direct memzero_explicit. No
functional changes.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Use struct assignment instead of memcpy() in lib/crypto/chacha.c where
appropriate. No functional change.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The ChaCha state matrix is 16 32-bit words. Currently it is represented
in the code as a raw u32 array, or even just a pointer to u32. This
weak typing is error-prone. Instead, introduce struct chacha_state:
struct chacha_state {
u32 x[16];
};
Convert all ChaCha and HChaCha functions to use struct chacha_state.
No functional changes.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
sprint_OID() was added as part of 2012's commit 4f73175d0375 ("X.509: Add
utility functions to render OIDs as strings") but it hasn't been used.
Remove it.
Note that there's also 'sprint_oid' (lower case) which is used in a lot of
places; that's left as is except for fixing its case in a comment.
Link: https://lkml.kernel.org/r/20250501010502.326472-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Right now test_kmod has hardcoded dependencies on btrfs/xfs. That is not
optimal since you end up needing to select/build them, but it is not
really required since other fs could be selected for the testing. Also,
we can't change the default/driver module used for testing on
initialization.
Thus make it more generic: introduce two module parameters (start_driver
and start_test_fs), which allow to select which modules/fs to use for the
testing on test_kmod initialization. Then it's up to the user to select
which modules/fs to use for testing based on his config. However, keep
test_module as required default.
This way, config/modules becomes selectable as when the testing is done
from selftests (userspace).
While at it, also change trigger_config_run_type, since at module
initialization we already set the defaults at __kmod_config_init and
should not need to do it again in test_kmod_init(), thus we can avoid to
again set test_driver/test_fs.
Link: https://lkml.kernel.org/r/20250418165047.702487-1-herton@redhat.com
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Reviewed-by: Luis Chambelrain <mcgrof@kernel.org>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Petr Pavlu <petr.pavlu@suse.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
sg_next() is a short function called frequently in I/O paths. Define it
in the header file so it can be inlined into its callers.
Link: https://lkml.kernel.org/r/20250416160615.3571958-1-csander@purestorage.com
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Current errseq implementation depends on a very special precondition that
macro MAX_ERRNO must be (2^n - 1).
Eliminate the limitation by
- redefining macro ERRSEQ_SHIFT
- defining a new macro ERRNO_MASK instead of MAX_ERRNO for errno mask.
There is no plan to change the value of MAX_ERRNO, but this makes the
implementation more generic and eliminates the BUILD_BUG_ON().
Link: https://lkml.kernel.org/r/20250407-improve_errseq-v1-1-7b27cbeb8298@quicinc.com
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In some places in the kernel there is a design pattern for sysfs
attributes to use kstrtobool() in store() and str_enabled_disabled() in
show().
This is counterintuitive to interact with because kstrtobool() takes
on/off but str_enabled_disabled() shows enabled/disabled. Some of those
sysfs uses could switch to str_on_off() but for some attributes
enabled/disabled really makes more sense.
Add support for kstrtobool() to accept enabled/disabled.
Link: https://lkml.kernel.org/r/20250321022538.1532445-1-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Replace `sr` with `Sr`. The condition `!tmp1 || rb_is_black(tmp1)`
ensures that `tmp1` (which is `sibling->rb_right`) is either NULL or a
black node. Therefore, the right child of the sibling must be black, and
the example should use `Sr` instead of `sr`.
Link: https://lkml.kernel.org/r/20250403112614.570140-1-johnny1001s000602@gmail.com
Signed-off-by: Chisheng Chen <johnny1001s000602@gmail.com>
Cc: Hsin Chang Yu <zxcvb600870024@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove the dependency on module loading ("m") for the vmalloc test suite,
enabling it to be built directly into the kernel, so both ("=m") and
("=y") are supported.
Motivation:
- Faster debugging/testing of vmalloc code;
- It allows to configure the test via kernel-boot parameters.
Configuration example:
test_vmalloc.nr_threads=64
test_vmalloc.run_test_mask=7
test_vmalloc.sequential_test_order=1
Link: https://lkml.kernel.org/r/20250417161216.88318-2-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Adrian Huang <ahuang12@lenovo.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Cc: Christop Hellwig <hch@infradead.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The test has the initialization step during which threads are created. To
prevent the workers from starting prematurely a write lock was previously
used by the main setup thread, while each worker would block on a read
lock.
Replace this RWSEM based synchronization with a simpler SRCU based
approach. Which does two basic steps:
- Main thread wraps the setup phase in an SRCU read-side critical
section. Pair of srcu_read_lock()/srcu_read_unlock().
- Each worker calls synchronize_srcu() on entry, ensuring it waits for
the initialization phase to be completed.
This patch eliminates the need for down_read()/up_read() and
down_write()/up_write() pairs thus simplifying the logic and improving
clarity.
[urezki@gmail.com: fix compile error with CONFIG_TINY_RCU]
Link: https://lkml.kernel.org/r/20250420142029.103169-1-urezki@gmail.com
Link: https://lkml.kernel.org/r/20250417161216.88318-1-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Adrian Huang <ahuang12@lenovo.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Christop Hellwig <hch@infradead.org>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Move the unlikely case that mas->store_type is invalid to be the last
evaluated case and put liklier cases higher up.
Link: https://lkml.kernel.org/r/20250410191446.2474640-7-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Suggested-by: Liam R. Howlett <liam.howlett@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In order to support rebalancing and spanning stores using less than the
worst case number of nodes, we need to track more than just the vacant
height. Using only vacant height to reduce the worst case maple node
allocation count can lead to a shortcoming of nodes in the following
scenarios.
For rebalancing writes, when a leaf node becomes insufficient, it may be
combined with a sibling into a single node. This means that the parent
node which has entries for this children will lose one entry. If this
parent node was just meeting the minimum entries, losing one entry will
now cause this parent node to be insufficient. This leads to a cascading
operation of rebalancing at different levels and can lead to more node
allocations than simply using vacant height can return.
For spanning writes, a similar situation occurs. At the location at which
a spanning write is detected, the number of ancestor nodes may similarly
need to rebalanced into a smaller number of nodes and the same cascading
situation could occur.
To use less than the full height of the tree for the number of
allocations, we also need to track the height at which a non-leaf node
cannot become insufficient. This means even if a rebalance occurs to a
child of this node, it currently has enough entries that it can lose one
without any further action. This field is stored in the maple write state
as sufficient height. In mas_prealloc_calc() when figuring out how many
nodes to allocate, we check if the vacant node is lower in the tree than a
sufficient node (has a larger value). If it is, we cannot use the vacant
height and must use the difference in the height and sufficient height as
the basis for the number of nodes needed.
An off by one bug was also discovered in mast_overflow() where it is using
>= rather than >. This caused extra iterations of the
mas_spanning_rebalance() loop and lead to unneeded allocations. A test is
also added to check the number of allocations is correct.
Link: https://lkml.kernel.org/r/20250410191446.2474640-6-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This allows support for using the vacant height to calculate the worst
case number of nodes needed for wr_rebalance operation.
mas_spanning_rebalance() was seen to perform unnecessary node allocations.
We can reduce allocations by breaking early during the rebalancing loop
once we realize that we have ascended to a common ancestor.
Link: https://lkml.kernel.org/r/20250410191446.2474640-5-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Suggested-by: Liam Howlett <liam.howlett@oracle.com>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In order to determine the store type for a maple tree operation, a walk of
the tree is done through mas_wr_walk(). This function descends the tree
until a spanning write is detected or we reach a leaf node. While
descending, keep track of the height at which we encounter a node with
available space. This is done by checking if mas->end is less than the
number of slots a given node type can fit.
Now that the height of the vacant node is tracked, we can use the
difference between the height of the tree and the height of the vacant
node to know how many levels we will have to propagate creating new nodes.
Update mas_prealloc_calc() to consider the vacant height and reduce the
number of worst-case allocations.
Rebalancing and spanning stores are not supported and fall back to using
the full height of the tree for allocations.
Update preallocation testing assertions to take into account vacant
height.
Link: https://lkml.kernel.org/r/20250410191446.2474640-4-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For the maple tree, the root node is defined to have a depth of 0 with a
height of 1. Each level down from the node, these values are incremented
by 1. Various code paths define a root with depth 1 which is inconsisent
with the definition. Modify the code to be consistent with this
definition.
In mas_spanning_rebalance(), l_mas.depth was being used to track the
height based on the number of iterations done in the main loop. This
information was then used in mas_put_in_tree() to set the height. Rather
than overload the l_mas.depth field to track height, simply keep track of
height in the local variable new_height and directly pass this to
mas_wmb_replace() which will be passed into mas_put_in_tree(). This
allows up to remove writes to l_mas.depth.
Link: https://lkml.kernel.org/r/20250410191446.2474640-3-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Track node vacancy to reduce worst case allocation counts", v5.
================ overview ========================
Currently, the maple tree preallocates the worst case number of nodes for
given store type by taking into account the whole height of the tree.
This comes from a worst case scenario of every node in the tree being full
and having to propagate node allocation upwards until we reach the root of
the tree. This can be optimized if there are vacancies in nodes that are
at a lower depth than the root node. This series implements tracking the
level at which there is a vacant node so we only need to allocate until
this level is reached, rather than always using the full height of the
tree. The ma_wr_state struct is modified to add a field which keeps track
of the vacant height and is updated during walks of the tree. This value
is then read in mas_prealloc_calc() when we decide how many nodes to
allocate.
For rebalancing and spanning stores, we also need to track the lowest
height at which a node has 1 more entry than the minimum sufficient number
of entries. This is because rebalancing can cause a parent node to become
insufficient which results in further node allocations. In this case, we
need to use the sufficient height as the worst case rather than the vacant
height.
patch 1-2: preparatory patches
patch 3: implement vacant height tracking + update the tests
patch 4: support vacant height tracking for rebalancing writes
patch 5: implement sufficient height tracking
patch 6: reorder switch case statements
================ results =========================
Bpftrace was used to profile the allocation path for requesting new maple
nodes while running stress-ng mmap 120s. The histograms below represent
requests to kmem_cache_alloc_bulk() and show the count argument. This
represnts how many maple nodes the caller is requesting in
kmem_cache_alloc_bulk()
command: stress-ng --mmap 4 --timeout 120
mm-unstable
@bulk_alloc_req:
[3, 4) 4 | |
[4, 5) 54170 |@ |
[5, 6) 0 | |
[6, 7) 893057 |@@@@@@@@@@@@@@@@@@@@ |
[7, 8) 4 | |
[8, 9) 2230287 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10) 55811 |@ |
[10, 11) 77834 |@ |
[11, 12) 0 | |
[12, 13) 1368684 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[13, 14) 0 | |
[14, 15) 0 | |
[15, 16) 367197 |@@@@@@@@ |
@maple_node_total: 46,630,160
@total_vmas: 46184591
mm-unstable + this series
@bulk_alloc_req:
[2, 3) 198 | |
[3, 4) 4 | |
[4, 5) 43 | |
[5, 6) 0 | |
[6, 7) 1069503 |@@@@@@@@@@@@@@@@@@@@@ |
[7, 8) 4 | |
[8, 9) 2597268 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[9, 10) 472191 |@@@@@@@@@ |
[10, 11) 191904 |@@@ |
[11, 12) 0 | |
[12, 13) 247316 |@@@@ |
[13, 14) 0 | |
[14, 15) 0 | |
[15, 16) 98769 |@ |
@maple_node_total: 37,813,856
@total_vmas: 43493287
This represents a ~19% reduction in the number of bulk maple nodes allocated.
For more reproducible results, a historgram of the return value of
mas_prealloc_calc() is displayed while running the maple_tree_tests whcih
have a deterministic store pattern
mas_prealloc_calc() return value mm-unstable
1 : (12068)
3 : (11836)
5 : ***** (271192)
7 : ************************************************** (2329329)
9 : *********** (534186)
10 : (435)
11 : *************** (704306)
13 : ******** (409781)
mas_prealloc_calc() return value mm-unstable + this series
1 : (12070)
3 : ************************************************** (3548777)
5 : ******** (633458)
7 : (65081)
9 : (11224)
10 : (341)
11 : (2973)
13 : (68)
do_mmap latency was also measured for regressions:
command: stress-ng --mmap 4 --timeout 120
mm-unstable:
avg = 7162 nsecs, total: 16101821292 nsecs, count: 2248034
mm-unstable + this series:
avg = 6689 nsecs, total: 15135391764 nsecs, count: 2262726
stress-ng --mmap4 --timeout 120
with vacant_height:
stress-ng: info: [257] 21526312 Maple Tree Read 0.176 M/sec
stress-ng: info: [257] 339979348 Maple Tree Write 2.774 M/sec
without vacant_height:
stress-ng: info: [8228] 20968900 Maple Tree Read 0.171 M/sec
stress-ng: info: [8228] 312214648 Maple Tree Write 2.547 M/sec
This represents an increase of ~3% read throughput and ~9% increase in
write throughput.
This patch (of 6):
In a subsequent patch, mas_prealloc_calc() will need to access fields only
in the ma_wr_state. Convert the function to take in a ma_wr_state and
modify all callers. There is no functional change.
Link: https://lkml.kernel.org/r/20250410191446.2474640-1-sidhartha.kumar@oracle.com
Link: https://lkml.kernel.org/r/20250410191446.2474640-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Change xa_alloc_cyclic() to return 0 even on wrap-around. Do the same for
xa_alloc_cyclic_irq() and xa_alloc_cyclic_bh().
This will prevent any future bug of treating return of 1 as an error:
int ret = xa_alloc_cyclic(...)
if (ret) // currently mishandles ret==1
goto failure;
If there will be someone interested in when wrap-around occurs, there is
still __xa_alloc_cyclic() that behaves as before. For now there is no
such user.
Link: https://lkml.kernel.org/r/20250320102219.8101-1-przemyslaw.kitszel@intel.com
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/netdev/Z9gUd-5t8b5NX2wE@casper.infradead.org
Cc: Andriy Shevchenko <andriy.shevchenko@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|