summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
7 hoursMerge tag 'bitmap-for-6.17' of https://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: - find_random_bit() series (Yury) - GENMASK() consolidation (Vincent) - random cleanups (Shaopeng, Ben, Yury) * tag 'bitmap-for-6.17' of https://github.com/norov/linux: bitfield: Ensure the return values of helper functions are checked test_bits: add tests for __GENMASK() and __GENMASK_ULL() bits: unify the non-asm GENMASK*() bits: split the definition of the asm and non-asm GENMASK*() cpumask: Remove unnecessary cpumask_nth_andnot() watchdog: fix opencoded cpumask_next_wrap() in watchdog_next_cpu() clocksource: Improve randomness in clocksource_verify_choose_cpus() cpumask: introduce cpumask_random() bitmap: generalize node_random()
7 hoursMerge tag 'sched_ext-for-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext updates from Tejun Heo: - Add support for cgroup "cpu.max" interface - Code organization cleanup so that ext_idle.c doesn't depend on the source-file-inclusion build method of sched/ - Drop UP paths in accordance with sched core changes - Documentation and other misc changes * tag 'sched_ext-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Fix scx_bpf_reenqueue_local() reference sched_ext: Drop kfuncs marked for removal in 6.15 sched_ext, rcu: Eject BPF scheduler on RCU CPU stall panic kernel/sched/ext.c: fix typo "occured" -> "occurred" in comments sched_ext: Add support for cgroup bandwidth control interface sched_ext, sched/core: Factor out struct scx_task_group sched_ext: Return NULL in llc_span sched_ext: Always use SMP versions in kernel/sched/ext_idle.h sched_ext: Always use SMP versions in kernel/sched/ext_idle.c sched_ext: Always use SMP versions in kernel/sched/ext.h sched_ext: Always use SMP versions in kernel/sched/ext.c sched_ext: Documentation: Clarify time slice handling in task lifecycle sched_ext: Make scx_locked_rq() inline sched_ext: Make scx_rq_bypassing() inline sched_ext: idle: Make local functions static in ext_idle.c sched_ext: idle: Remove unnecessary ifdef in scx_bpf_cpu_node()
8 hoursMerge tag 'cgroup-for-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Allow css_rstat_updated() in NMI context to enable memory accounting for allocations in NMI context. - /proc/cgroups doesn't contain useful information for cgroup2 and was updated to only show v1 controllers. This unfortunately broke something in the wild. Add an option to bring back the old behavior to ease transition. - selftest updates and other cleanups. * tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Add compatibility option for content of /proc/cgroups selftests/cgroup: fix cpu.max tests cgroup: llist: avoid memory tears for llist_node selftests: cgroup: Fix missing newline in test_zswap_writeback_one selftests: cgroup: Allow longer timeout for kmem_dead_cgroups cleanup memcg: cgroup: call css_rstat_updated irrespective of in_nmi() cgroup: remove per-cpu per-subsystem locks cgroup: make css_rstat_updated nmi safe cgroup: support to enable nmi-safe css_rstat_updated selftests: cgroup: Fix compilation on pre-cgroupns kernels selftests: cgroup: Optionally set up v1 environment selftests: cgroup: Add support for named v1 hierarchies in test_core selftests: cgroup_util: Add helpers for testing named v1 hierarchies Documentation: cgroup: add section explaining controller availability cgroup: Drop sock_cgroup_classid() dummy implementation
8 hoursMerge tag 'wq-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds
Pull workqueue updates from Tejun Heo: - Prepare for defaulting to unbound workqueue. A separate branch was created to ease pulling in from other trees but none of the conversions have landed yet - Memory allocation profiling support added - Misc changes * tag 'wq-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Use atomic_try_cmpxchg_relaxed() in tryinc_node_nr_active() workqueue: Remove unused work_on_cpu_safe workqueue: Add new WQ_PERCPU flag workqueue: Add system_percpu_wq and system_dfl_wq workqueue: Basic memory allocation profiling support workqueue: fix opencoded cpumask_next_and_wrap() in wq_select_unbound_cpu()
9 hoursMerge tag 'mm-stable-2025-07-30-15-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
14 hoursMerge tag 'v6.17-p1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Allow hash drivers without fallbacks (e.g., hardware key) Algorithms: - Add hmac hardware key support (phmac) on s390 - Re-enable sha384 in FIPS mode - Disable sha1 in FIPS mode - Convert zstd to acomp Drivers: - Lower priority of qat skcipher and aead - Convert aspeed to partial block API - Add iMX8QXP support in caam - Add rate limiting support for GEN6 devices in qat - Enable telemetry for GEN6 devices in qat - Implement full backlog mode for hisilicon/sec2" * tag 'v6.17-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: keembay - Use min() to simplify ocs_create_linked_list_from_sg() crypto: hisilicon/hpre - fix dma unmap sequence crypto: qat - make adf_dev_autoreset() static crypto: ccp - reduce stack usage in ccp_run_aes_gcm_cmd crypto: qat - refactor ring-related debug functions crypto: qat - fix seq_file position update in adf_ring_next() crypto: qat - fix DMA direction for compression on GEN2 devices crypto: jitter - replace ARRAY_SIZE definition with header include crypto: engine - remove {prepare,unprepare}_crypt_hardware callbacks crypto: engine - remove request batching support crypto: qat - flush misc workqueue during device shutdown crypto: qat - enable rate limiting feature for GEN6 devices crypto: qat - add compression slice count for rate limiting crypto: qat - add get_svc_slice_cnt() in device data structure crypto: qat - add adf_rl_get_num_svc_aes() in rate limiting crypto: qat - relocate service related functions crypto: qat - consolidate service enums crypto: qat - add decompression service for rate limiting crypto: qat - validate service in rate limiting sysfs api crypto: hisilicon/sec2 - implement full backlog mode for sec ...
15 hourswatchdog: fix opencoded cpumask_next_wrap() in watchdog_next_cpu()Yury Norov [NVIDIA]
The dedicated helper is more verbose and efficient comparing to cpumask_next() followed by cpumask_first(). Signed-off-by: "Yury Norov [NVIDIA]" <yury.norov@gmail.com> Reviewed-by: Douglas Anderson <dianders@chromium.org>
15 hoursclocksource: Improve randomness in clocksource_verify_choose_cpus()Yury Norov [NVIDIA]
The current algorithm of picking a random CPU works OK for dense online cpumask, but if cpumask is non-dense, the distribution of picked CPUs is skewed. For example, on 8-CPU board with CPUs 4-7 offlined, the probability of selecting CPU 0 is 5/8. Accordingly, cpus 1, 2 and 3 are chosen with probability 1/8 each. The proper algorithm should pick each online CPU with probability 1/4. Switch it to cpumask_random(), which has better statistical characteristics. CC: Andrew Morton <akpm@linux-foundation.org> Acked-by: John Stultz <jstultz@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Yury Norov [NVIDIA]" <yury.norov@gmail.com>
28 hoursMerge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...
31 hoursMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
31 hoursMerge tag 'trace-unused-v6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracepoint cleanup from Steven Rostedt: "Remove or hide unused tracepoints Tracepoints take up memory (around 5K per tracepoint) even when they are unused. Changes are being made to detect when a tracepoint is defined but unused and a warning is shown at build. But those changes are not yet ready for inclusion. - Fix some of the unused tracepoints that it detected Some tracepoints were removed and others were hidden by config settings to match the config settings of where they are instantiated. Some tracepoints were moved into architecture specific code as only one architecture used them. - Call the ftrace_test_filter tracepoint in an unreachable if statement The ftrace_test_filter tracepoint which is defined when ftrace selftests are configured and is used to test the filter logic, but the tracepoint is not actually called. It is put into an if statement to not have it get compiled out, but also not warn for not being used" * tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: sched: Hide numa events under CONFIG_NUMA_BALANCING powerpc/thp: tracing: Hide hugepage events under CONFIG_PPC_BOOK3S_64 tracing: Call trace_ftrace_test_filter() for the event tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit binder: Remove unused binder lock events PM: tracing: Hide power_domain_target event under ARCH_OMAP2PLUS PM: tracing: Hide device_pm_callback events under PM_SLEEP PM: tracing: Hide psci_domain_idle events under ARM_PSCI_CPUIDLE PM: cpufreq: powernv/tracing: Move powernv_throttle trace event alarmtimer: Hide alarmtimer_suspend event when RTC_CLASS is not configured tracing, AER: Hide PCIe AER event when PCIEAER is not configured
32 hoursMerge tag 'trace-rv-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verification updates from Steven Rostedt: - Added Linear temporal logic monitors for RT application Real-time applications may have design flaws causing them to have unexpected latency. For example, the applications may raise page faults, or may be blocked trying to take a mutex without priority inheritance. However, while attempting to implement DA monitors for these real-time rules, deterministic automaton is found to be inappropriate as the specification language. The automaton is complicated, hard to understand, and error-prone. For these cases, linear temporal logic is found to be more suitable. The LTL is more concise and intuitive. - Make printk_deferred() public The new monitors needed access to printk_deferred(). Make them visible for the entire kernel. - Add a vpanic() to allow for va_list to be passed to panic. - Add rtapp container monitor. A collection of monitors that check for common problems with real-time applications that cause unexpected latency. - Add page fault tracepoints to risc-v These tracepoints are necessary to for the RV monitor to run on risc-v. - Fix the behaviour of the rv tool with -s and idle tasks. - Allow the rv tool to gracefully terminate with SIGTERM - Adjusts dot2c not to create lines over 100 columns - Properly order nested monitors in the RV Kconfig file - Return the registration error in all DA monitor instead of 0 - Update and add new sched collection monitors Replace tss and sncid monitors with more complete sts: Not only prove that switches occur in scheduling context and scheduling needs interrupt disabled but also that each call to the scheduler disables interrupts to (optionally) switch. New monitor: nrp Preemption requires need resched which is cleared by any switch (includes a non optimal workaround for /nested/ preemptions) New monitor: sssw suspension requires setting the task to sleepable and, after the switch occurs, the task requires a wakeup to come back to runnable New monitor: opid waking and need-resched operations occur with interrupts and preemption disabled or in IRQ without explicitly disabling preemption" * tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (48 commits) rv: Add opid per-cpu monitor rv: Add nrp and sssw per-task monitors rv: Replace tss and sncid monitors with more complete sts sched: Adapt sched tracepoints for RV task model rv: Retry when da monitor detects race conditions rv: Adjust monitor dependencies rv: Use strings in da monitors tracepoints rv: Remove trailing whitespace from tracepoint string rv: Add da_handle_start_run_event_ to per-task monitors rv: Fix wrong type cast in reactors_show() and monitor_reactor_show() rv: Fix wrong type cast in monitors_show() rv: Remove struct rv_monitor::reacting rv: Remove rv_reactor's reference counter rv: Merge struct rv_reactor_def into struct rv_reactor rv: Merge struct rv_monitor_def into struct rv_monitor rv: Remove unused field in struct rv_monitor_def rv: Return init error when registering monitors verification/rvgen: Organise Kconfig entries for nested monitors tools/dot2c: Fix generated files going over 100 column limit tools/rv: Stop gracefully also on SIGTERM ...
32 hoursMerge tag 'trace-ringbuffer-v6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt: - Rewind persistent ring buffer on boot When the persistent ring buffer is being used for live kernel tracing and the system crashes, the tool that is reading the trace may not have recorded the data when the system crashed. Although the persistent ring buffer still has that data, when reading it after a reboot, it will start where it left off. That is, what was read will not be accessible. Instead, on reboot, have the persistent ring buffer restart where the data starts and this will allow the tooling to recover what was lost when the crash occurred. - Remove the ring_buffer_read_prepare_sync() logic Reading the trace file required stopping writing to the ring buffer as the trace file is only an iterator and does not consume what it read. It was originally not safe to read the ring buffer in this mode and required disabling writing. The ring_buffer_read_prepare_sync() logic was used to stop each per_cpu ring buffer, call synchronize_rcu() and then start the iterator. This was used instead of calling synchronize_rcu() for each per_cpu buffer. Today, the iterator has been updated where it is safe to read the trace file while writing to the ring buffer is still occurring. There is no more need to do this synchronization and it is causing large delays on machines with many CPUs. Remove this unneeded synchronization. - Make static string array a constant in show_irq_str() Making the string array into a constant has shown to decrease code text/data size. * tag 'trace-ringbuffer-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: ring-buffer: Make the const read-only 'type' static ring-buffer: Remove ring_buffer_read_prepare_sync() tracing: ring_buffer: Rewind persistent ring buffer on reboot
32 hoursMerge tag 'ftrace-v6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Keep track of when fgraph_ops are registered or not Keep accounting of when fgraph_ops are registered as if a fgraph_ops is registered twice it can mess up the accounting and it will not work as expected later. Trigger a warning if something registers it twice as to catch bugs before they are found by things just not working as expected. - Make DYNAMIC_FTRACE always enabled for architectures that support it As static ftrace (where all functions are always traced) is very expensive and only exists to help architectures support ftrace, do not make it an option. As soon as an architecture supports DYNAMIC_FTRACE make it use it. This simplifies the code. - Remove redundant config HAVE_FTRACE_MCOUNT_RECORD The CONFIG_HAVE_FTRACE_MCOUNT was added to help simplify the DYNAMIC_FTRACE work, but now every architecture that implements DYNAMIC_FTRACE also has HAVE_FTRACE_MCOUNT set too, making it redundant with the HAVE_DYNAMIC_FTRACE. - Make pid_ptr string size match the comment In print_graph_proc() the pid_ptr string is of size 11, but the comment says /* sign + log10(MAX_INT) + '\0' */ which is actually 12. * tag 'ftrace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Remove redundant config HAVE_FTRACE_MCOUNT_RECORD ftrace: Make DYNAMIC_FTRACE always enabled for architectures that support it fgraph: Keep track of when fgraph_ops are registered or not fgraph: Make pid_str size match the comment
32 hoursMerge tag 'probes-v6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: "Stack usage reduction for probe events: - Allocate string buffers from the heap for uprobe, eprobe, kprobe, and fprobe events to avoid stack overflow - Allocate traceprobe_parse_context from the heap to prevent potential stack overflow - Fix a typo in the above commit New features for eprobe and tprobe events: - Add support for arrays in eprobes - Support multiple tprobes on the same tracepoint Improve efficiency: - Register fprobe-events only when it is enabled to reduce overhead - Register tracepoints for tprobe events only when enabled to resolve a lock dependency Code Cleanup: - Add kerneldoc for traceprobe_parse_event_name() and __get_insn_slot() - Sort #include alphabetically in the probes code - Remove the unused 'mod' field from the tprobe-event - Clean up the entry-arg storing code in probe-events Selftest update - Enable fprobe events before checking enable_functions in selftests" * tag 'probes-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: trace_fprobe: Fix typo of the semicolon tracing: Have eprobes handle arrays tracing: probes: Add a kerneldoc for traceprobe_parse_event_name() tracing: uprobe-event: Allocate string buffers from heap tracing: eprobe-event: Allocate string buffers from heap tracing: kprobe-event: Allocate string buffers from heap tracing: fprobe-event: Allocate string buffers from heap tracing: probe: Allocate traceprobe_parse_context from heap tracing: probes: Sort #include alphabetically kprobes: Add missing kerneldoc for __get_insn_slot tracing: tprobe-events: Register tracepoint when enable tprobe event selftests: tracing: Enable fprobe events before checking enable_functions tracing: fprobe-events: Register fprobe-events only when it is enabled tracing: tprobe-events: Support multiple tprobes on the same tracepoint tracing: tprobe-events: Remove mod field from tprobe-event tracing: probe-events: Cleanup entry-arg storing code
32 hoursMerge tag 'probes-fixes-v6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: - Fix a potential infinite recursion in fprobe by using preempt_*_notrace() * tag 'probes-fixes-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: fprobe: Fix infinite recursion using preempt_*_notrace()
37 hoursMerge tag 'rcu.release.v6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Neeraj Upadhyay: "Expedited grace period updates: - Protect against early RCU exp quiescent state reporting during exp grace period initialization - Remove superfluous barrier in task unblock path - Remove the CPU online quiescent state report optimization, which is error prone for certain scenarios - Add warning for unexpected pending requested expedited quiescent state on dying CPU Core: - Robustify rcu_is_cpu_rrupt_from_idle() by using more accurate indicators of the actual context tracking state of a CPU - Handle ->defer_qs_iw_pending field data race - Enable rcu_normal_wake_from_gp by default on systems with <= 16 CPUs - Fix lockup in rcu_read_unlock() due to recursive irq_exit() calls - Refactor expedited handling condition in rcu_read_unlock_special() - Documentation updates for hotplug and GP init scan ordering, separation of rcu_state and rnp's gp_seq states, quiescent state reporting for offline CPUs torture-scripts: - Cleanup and improve scripts : remove superfluous warnings for disabled tests; better handling of kvm.sh --kconfig arg; suppress some confusing diagnostics; tolerate bad kvm.sh args; add new diagnostic for build output; fail allmodconfig testing on warnings - Include RCU_TORTURE_TEST_CHK_RDR_STATE config for KCSAN kernels - Disable default RCU-tasks and clocksource-wdog testing on arm64 - Add EXPERT Kconfig option for arm64 KCSAN runs - Remove SRCU-lite testing rcutorture: - Start torture writer threads creation after reader threads to handle race in SRCU-P scenario - Add SRCU down_read()/up_read() test - Add diagnostics for delayed SRCU up_read(), unmatched up_read(), print number of up/down readers and the number of such readers which migrated to other CPU - Ignore certain unsupported configurations for trivial RCU test - Fix splats in RT kernels due to inaccurate checks for BH-disabled context - Enable checks and logs to capture intentionally exercised unexpected scenarios (too short readers) for BUSTED test - Remove SRCU-lite testing srcu: - Expedite SRCU-fast grace periods - Remove SRCU-lite implementation - Add guards for SRCU-fast readers rcu nocb: - Dump NOCB group leader state on stall detection - Robustify nocb_cb_kthread pointer accesses - Fix delayed execution of hurry callbacks when LAZY_RCU is enabled refscale: - Fix multiplication overflow in "loops" and "nreaders" calculations" * tag 'rcu.release.v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (49 commits) rcu: Document concurrent quiescent state reporting for offline CPUs rcu: Document separation of rcu_state and rnp's gp_seq rcu: Document GP init vs hotplug-scan ordering requirements srcu: Add guards for SRCU-fast readers rcu: Fix delayed execution of hurry callbacks rcu: Refactor expedited handling check in rcu_read_unlock_special() checkpatch: Remove SRCU-lite deprecation srcu: Remove SRCU-lite implementation srcu: Expedite SRCU-fast grace periods rcutorture: Remove support for SRCU-lite rcutorture: Remove SRCU-lite scenarios torture: Remove support for SRCU-lite torture: Make torture.sh --allmodconfig testing fail on warnings torture: Add "ERROR" diagnostic for testing kernel-build output torture: Make torture.sh tolerate runs having bad kvm.sh arguments torture: Add textid.txt file to --do-allmodconfig and --do-rcu-rust runs torture: Extract testid.txt generation to separate script torture: Suppress "find" diagnostics from torture.sh --do-none run torture: Provide EXPERT Kconfig option for arm64 KCSAN torture.sh runs rcu: Fix rcu_read_unlock() deadloop due to IRQ work ...
37 hoursMerge tag 'kcsan-20250728-v6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/melver/linux Pull Kernel Concurrency Sanitizer (KCSAN) update from Marco Elver: - A single fix to silence an uninitialized variable warning * tag 'kcsan-20250728-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/melver/linux: kcsan: test: Initialize dummy variable
37 hoursMerge tag 'kvm-s390-next-6.17-1' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD RCU wakeup fix for KVM s390 guest entry
38 hoursMerge tag 'bpf-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Remove usermode driver (UMD) framework (Thomas Weißschuh) - Introduce Strongly Connected Component (SCC) in the verifier to detect loops and refine register liveness (Eduard Zingerman) - Allow 'void *' cast using bpf_rdonly_cast() and corresponding '__arg_untrusted' for global function parameters (Eduard Zingerman) - Improve precision for BPF_ADD and BPF_SUB operations in the verifier (Harishankar Vishwanathan) - Teach the verifier that constant pointer to a map cannot be NULL (Ihor Solodrai) - Introduce BPF streams for error reporting of various conditions detected by BPF runtime (Kumar Kartikeya Dwivedi) - Teach the verifier to insert runtime speculation barrier (lfence on x86) to mitigate speculative execution instead of rejecting the programs (Luis Gerhorst) - Various improvements for 'veristat' (Mykyta Yatsenko) - For CONFIG_DEBUG_KERNEL config warn on internal verifier errors to improve bug detection by syzbot (Paul Chaignon) - Support BPF private stack on arm64 (Puranjay Mohan) - Introduce bpf_cgroup_read_xattr() kfunc to read xattr of cgroup's node (Song Liu) - Introduce kfuncs for read-only string opreations (Viktor Malik) - Implement show_fdinfo() for bpf_links (Tao Chen) - Reduce verifier's stack consumption (Yonghong Song) - Implement mprog API for cgroup-bpf programs (Yonghong Song) * tag 'bpf-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (192 commits) selftests/bpf: Migrate fexit_noreturns case into tracing_failure test suite selftests/bpf: Add selftest for attaching tracing programs to functions in deny list bpf: Add log for attaching tracing programs to functions in deny list bpf: Show precise rejected function when attaching fexit/fmod_ret to __noreturn functions bpf: Fix various typos in verifier.c comments bpf: Add third round of bounds deduction selftests/bpf: Test invariants on JSLT crossing sign selftests/bpf: Test cross-sign 64bits range refinement selftests/bpf: Update reg_bound range refinement logic bpf: Improve bounds when s64 crosses sign boundary bpf: Simplify bounds refinement from s32 selftests/bpf: Enable private stack tests for arm64 bpf, arm64: JIT support for private stack bpf: Move bpf_jit_get_prog_name() to core.c bpf, arm64: Fix fp initialization for exception boundary umd: Remove usermode driver framework bpf/preload: Don't select USERMODE_DRIVER selftests/bpf: Fix test dynptr/test_dynptr_memset_xdp_chunks failure selftests/bpf: Fix test dynptr/test_dynptr_copy_xdp failure selftests/bpf: Increase xdp data size for arm64 64K page size ...
39 hoursMerge tag 'net-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Wrap datapath globals into net_aligned_data, to avoid false sharing - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container) - Add SO_INQ and SCM_INQ support to AF_UNIX - Add SIOCINQ support to AF_VSOCK - Add TCP_MAXSEG sockopt to MPTCP - Add IPv6 force_forwarding sysctl to enable forwarding per interface - Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB - Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users - Convert TCP send queue handling from tasklet to BH workque - Improve BPF iteration over TCP sockets to see each socket exactly once - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code - Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization - Allow multicast routing to take effect on locally-generated packets - Add output interface argument for End.X in segment routing - MCTP: add support for gateway routing, improve bind() handling - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink - Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed - Support NUD_PERMANENT for proxy neighbor entries - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM - Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup - Support inspecting ref_tracker state via DebugFS - Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links - Allow providing upcall pid for the 'execute' command in openvswitch - Remove DCCP support from Netfilter's conntrack - Disallow multiple packet duplications in the queuing layer - Prevent use of deprecated iptables code on PREEMPT_RT Driver API: - Support RSS and hashing configuration over ethtool Netlink - Add dedicated ethtool callbacks for getting and setting hashing fields - Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc - Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs - Support traffic classes in devlink rate API for bandwidth management - Remove rtnl_lock dependency from UDP tunnel port configuration Device drivers: - Add a new Broadcom driver for 800G Ethernet (bnge) - Add a standalone driver for Microchip ZL3073x DPLL - Remove IBM's NETIUCV device driver - Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access) - Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping - Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583 - Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x - CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info - WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling - WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device - Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS - Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading" * tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
2 daysMerge tag 'sysctl-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl updates from Joel Granados: - Move sysctls out of the kern_table array This is the final move of ctl_tables into their respective subsystems. Only 5 (out of the original 50) will remain in kernel/sysctl.c file; these handle either sysctl or common arch variables. By decentralizing sysctl registrations, subsystem maintainers regain control over their sysctl interfaces, improving maintainability and reducing the likelihood of merge conflicts. - docs: Remove false positives from check-sysctl-docs Stopped falsely identifying sysctls as undocumented or unimplemented in the check-sysctl-docs script. This script can now be used to automatically identify if documentation is missing. * tag 'sysctl-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: (23 commits) docs: Downgrade arm64 & riscv from titles to comment docs: Replace spaces with tabs in check-sysctl-docs docs: Remove colon from ctltable title in vm.rst docs: Add awk section for ucount sysctl entries docs: Use skiplist when checking sysctl admin-guide docs: nixify check-sysctl-docs sysctl: rename kern_table -> sysctl_subsys_table kernel/sys.c: Move overflow{uid,gid} sysctl into kernel/sys.c uevent: mv uevent_helper into kobject_uevent.c sysctl: Removed unused variable sysctl: Nixify sysctl.sh sysctl: Remove superfluous includes from kernel/sysctl.c sysctl: Remove (very) old file changelog sysctl: Move sysctl_panic_on_stackoverflow to kernel/panic.c sysctl: move cad_pid into kernel/pid.c sysctl: Move tainted ctl_table into kernel/panic.c Input: sysrq: mv sysrq into drivers/tty/sysrq.c fork: mv threads-max into kernel/fork.c parisc/power: Move soft-power into power.c mm: move randomize_va_space into memory.c ...
2 daysMerge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "A quick summary: perf support for Branch Record Buffer Extensions (BRBE), typical PMU hardware updates, small additions to MTE for store-only tag checking and exposing non-address bits to signal handlers, HAVE_LIVEPATCH enabled on arm64, VMAP_STACK forced on. There is also a TLBI optimisation on hardware that does not require break-before-make when changing the user PTEs between contiguous and non-contiguous. More details: Perf and PMU updates: - Add support for new (v3) Hisilicon SLLC and DDRC PMUs - Add support for Arm-NI PMU integrations that share interrupts between clock domains within a given instance - Allow SPE to be configured with a lower sample period than the minimum recommendation advertised by PMSIDR_EL1.Interval - Add suppport for Arm's "Branch Record Buffer Extension" (BRBE) - Adjust the perf watchdog period according to cpu frequency changes - Minor driver fixes and cleanups Hardware features: - Support for MTE store-only checking (FEAT_MTE_STORE_ONLY) - Support for reporting the non-address bits during a synchronous MTE tag check fault (FEAT_MTE_TAGGED_FAR) - Optimise the TLBI when folding/unfolding contiguous PTEs on hardware with FEAT_BBM (break-before-make) level 2 and no TLB conflict aborts Software features: - Enable HAVE_LIVEPATCH after implementing arch_stack_walk_reliable() and using the text-poke API for late module relocations - Force VMAP_STACK always on and change arm64_efi_rt_init() to use arch_alloc_vmap_stack() in order to avoid KASAN false positives ACPI: - Improve SPCR handling and messaging on systems lacking an SPCR table Debug: - Simplify the debug exception entry path - Drop redundant DBG_MDSCR_* macros Kselftests: - Cleanups and improvements for SME, SVE and FPSIMD tests Miscellaneous: - Optimise loop to reduce redundant operations in contpte_ptep_get() - Remove ISB when resetting POR_EL0 during signal handling - Mark the kernel as tainted on SEA and SError panic - Remove redundant gcs_free() call" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits) arm64/gcs: task_gcs_el0_enable() should use passed task arm64: Kconfig: Keep selects somewhat alphabetically ordered arm64: signal: Remove ISB when resetting POR_EL0 kselftest/arm64: Handle attempts to disable SM on SME only systems kselftest/arm64: Fix SVE write data generation for SME only systems kselftest/arm64: Test SME on SME only systems in fp-ptrace kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace kselftest/arm64: Allow sve-ptrace to run on SME only systems arm64/mm: Drop redundant addr increment in set_huge_pte_at() kselftest/arm4: Provide local defines for AT_HWCAP3 arm64: Mark kernel as tainted on SAE and SError panic arm64/gcs: Don't call gcs_free() when releasing task_struct drivers/perf: hisi: Support PMUs with no interrupt drivers/perf: hisi: Relax the event number check of v2 PMUs drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver drivers/perf: hisi: Simplify the probe process for each DDRC version perf/arm-ni: Support sharing IRQs within an NI instance perf/arm-ni: Consolidate CPU affinity handling ...
2 daysMerge tag 'locking-core-2025-07-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Locking primitives: - Mark devm_mutex_init() as __must_check and fix drivers that didn't check the return code (Thomas Weißschuh) - Reorganize <linux/local_lock.h> to better expose the internal APIs to local variables (Sebastian Andrzej Siewior) - Remove OWNER_SPINNABLE in rwsem (Jinliang Zheng) - Remove redundant #ifdefs in the mutex code (Ran Xiaokai) Lockdep: - Avoid returning struct in lock_stats() (Arnd Bergmann) - Change `static const` into enum for LOCKF_*_IRQ_* (Arnd Bergmann) - Temporarily use synchronize_rcu_expedited() in lockdep_unregister_key() to speed things up. (Breno Leitao) Rust runtime: - Add #[must_use] to Lock::try_lock() (Jason Devers)" * tag 'locking-core-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Speed up lockdep_unregister_key() with expedited RCU synchronization locking/mutex: Remove redundant #ifdefs locking/lockdep: Change 'static const' variables to enum values locking/lockdep: Avoid struct return in lock_stats() locking/rwsem: Use OWNER_NONSPINNABLE directly instead of OWNER_SPINNABLE rust: sync: Add #[must_use] to Lock::try_lock() locking/mutex: Mark devm_mutex_init() as __must_check leds: lp8860: Check return value of devm_mutex_init() spi: spi-nxp-fspi: Check return value of devm_mutex_init() local_lock: Move this_cpu_ptr() notation from internal to main header
2 daysMerge tag 'sched-core-2025-07-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Core scheduler changes: - Better tracking of maximum lag of tasks in presence of different slices duration, for better handling of lag in the fair scheduler (Vincent Guittot) - Clean up and standardize #if/#else/#endif markers throughout the entire scheduler code base (Ingo Molnar) - Make SMP unconditional: build the SMP scheduler's data structures and logic on UP kernel too, even though they are not used, to simplify the scheduler and remove around 200 #ifdef/[#else]/#endif blocks from the scheduler (Ingo Molnar) - Reorganize cgroup bandwidth control interface handling for better interfacing with sched_ext (Tejun Heo) Balancing: - Bump sd->max_newidle_lb_cost when newidle balance fails (Chris Mason) - Remove sched_domain_topology_level::flags to simplify the code (Prateek Nayak) - Simplify and clean up build_sched_topology() (Li Chen) - Optimize build_sched_topology() on large machines (Li Chen) Real-time scheduling: - Add initial version of proxy execution: a mechanism for mutex-owning tasks to inherit the scheduling context of higher priority waiters. Currently limited to a single runqueue and conditional on CONFIG_EXPERT, and other limitations (John Stultz, Peter Zijlstra, Valentin Schneider) - Deadline scheduler (Juri Lelli): - Fix dl_servers initialization order (Juri Lelli) - Fix DL scheduler's root domain reinitialization logic (Juri Lelli) - Fix accounting bugs after global limits change (Juri Lelli) - Fix scalability regression by implementing less agressive dl_server handling (Peter Zijlstra) PSI: - Improve scalability by optimizing psi_group_change() cpu_clock() usage (Peter Zijlstra) Rust changes: - Make Task, CondVar and PollCondVar methods inline to avoid unnecessary function calls (Kunwu Chan, Panagiotis Foliadis) - Add might_sleep() support for Rust code: Rust's "#[track_caller]" mechanism is used so that Rust's might_sleep() doesn't need to be defined as a macro (Fujita Tomonori) - Introduce file_from_location() (Boqun Feng) Debugging & instrumentation: - Make clangd usable with scheduler source code files again (Peter Zijlstra) - tools: Add root_domains_dump.py which dumps root domains info (Juri Lelli) - tools: Add dl_bw_dump.py for printing bandwidth accounting info (Juri Lelli) Misc cleanups & fixes: - Remove play_idle() (Feng Lee) - Fix check_preemption_disabled() (Sebastian Andrzej Siewior) - Do not call __put_task_struct() on RT if pi_blocked_on is set (Luis Claudio R. Goncalves) - Correct the comment in place_entity() (wang wei)" * tag 'sched-core-2025-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (84 commits) sched/idle: Remove play_idle() sched: Do not call __put_task_struct() on rt if pi_blocked_on is set sched: Start blocked_on chain processing in find_proxy_task() sched: Fix proxy/current (push,pull)ability sched: Add an initial sketch of the find_proxy_task() function sched: Fix runtime accounting w/ split exec & sched contexts sched: Move update_curr_task logic into update_curr_se locking/mutex: Add p->blocked_on wrappers for correctness checks locking/mutex: Rework task_struct::blocked_on sched: Add CONFIG_SCHED_PROXY_EXEC & boot argument to enable/disable sched/topology: Remove sched_domain_topology_level::flags x86/smpboot: avoid SMT domain attach/destroy if SMT is not enabled x86/smpboot: moves x86_topology to static initialize and truncate x86/smpboot: remove redundant CONFIG_SCHED_SMT smpboot: introduce SDTL_INIT() helper to tidy sched topology setup tools/sched: Add dl_bw_dump.py for printing bandwidth accounting info tools/sched: Add root_domains_dump.py which dumps root domains info sched/deadline: Fix accounting after global limits change sched/deadline: Reset extra_bw to max_bw when clearing root domains sched/deadline: Initialize dl_servers after SMP ...
2 daysMerge tag 'x86_bugs_for_v6.17_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CPU mitigation updates from Borislav Petkov: - Untangle the Retbleed from the ITS mitigation on Intel. Allow for ITS to enable stuffing independently from Retbleed, do some cleanups to simplify and streamline the code - Simplify SRSO and make mitigation types selection more versatile depending on the Retbleed mitigation selection. Simplify code some - Add the second part of the attack vector controls which provide a lot friendlier user interface to the speculation mitigations than selecting each one by one as it is now. Instead, the selection of whole attack vectors which are relevant to the system in use can be done and protection against only those vectors is enabled, thus giving back some performance to the users * tag 'x86_bugs_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/bugs: Print enabled attack vectors x86/bugs: Add attack vector controls for TSA x86/pti: Add attack vector controls for PTI x86/bugs: Add attack vector controls for ITS x86/bugs: Add attack vector controls for SRSO x86/bugs: Add attack vector controls for L1TF x86/bugs: Add attack vector controls for spectre_v2 x86/bugs: Add attack vector controls for BHI x86/bugs: Add attack vector controls for spectre_v2_user x86/bugs: Add attack vector controls for retbleed x86/bugs: Add attack vector controls for spectre_v1 x86/bugs: Add attack vector controls for GDS x86/bugs: Add attack vector controls for SRBDS x86/bugs: Add attack vector controls for RFDS x86/bugs: Add attack vector controls for MMIO x86/bugs: Add attack vector controls for TAA x86/bugs: Add attack vector controls for MDS x86/bugs: Define attack vectors relevant for each bug x86/Kconfig: Add arch attack vector support cpu: Define attack vectors ...
2 daysMerge tag 'stop-machine.2025.07.23a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull stop-machine documentation updates from Paul McKenney: - Improve kernel-doc function-header comments - Document preemption and stop_machine() mutual exclusion (Joel Fernandes) * tag 'stop-machine.2025.07.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: smp: Document preemption and stop_machine() mutual exclusion stop_machine: Improve kernel-doc function-header comments
2 daysMerge tag 'core-entry-2025-07-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull generic entry code updates from Thomas Gleixner: - Split the code into syscall and exception/interrupt parts to ease the conversion of ARM[64] to the generic entry infrastructure - Extend syscall user dispatching to support a single intercepted range instead of the default single non-intercepted range. That allows monitoring/analysis of a specific executable range, e.g. a library, and also provides flexibility for sandboxing scenarios - Cleanup and extend the user dispatch selftest * tag 'core-entry-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry: Split generic entry into generic exception and syscall entry selftests: Add tests for PR_SYS_DISPATCH_INCLUSIVE_ON syscall_user_dispatch: Add PR_SYS_DISPATCH_INCLUSIVE_ON selftests: Fix errno checking in syscall_user_dispatch test
2 daysMerge tag 'locking-futex-2025-07-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull futex updates from Thomas Gleixner: - Switch the reference counting to a RCU based per-CPU reference to address a performance bottleneck vs the single instance rcuref variant - Make the futex selftest build on 32-bit architectures which only support 64-bit time_t, e.g. RISCV-32 - Cleanups and improvements in selftests and futex bench * tag 'locking-futex-2025-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/futex: Fix spelling mistake "Succeffuly" -> "Successfully" selftests/futex: Define SYS_futex on 32-bit architectures with 64-bit time_t perf bench futex: Remove support for IMMUTABLE selftests/futex: Remove support for IMMUTABLE futex: Remove support for IMMUTABLE futex: Make futex_private_hash_get() static futex: Use RCU-based per-CPU reference counting instead of rcuref_t selftests/futex: Adapt the private hash test to RCU related changes
2 daysMerge tag 'timers-ptp-2025-07-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping and VDSO updates from Thomas Gleixner: - Introduce support for auxiliary timekeepers PTP clocks can be disconnected from the universal CLOCK_TAI reality for various reasons including regularatory requirements for functional safety redundancy. The kernel so far only supports a single notion of time, which means that all clocks are correlated in frequency and only differ by offset to each other. Access to non-correlated PTP clocks has been available so far only through the file descriptor based "POSIX clock IDs", which are subject to locking and have to go all the way out to the hardware. The access is not only horribly slow, as it has to go all the way out to the NIC/PTP hardware, but that also prevents the kernel to read the time of such clocks e.g. from the network stack, where it is required for TSN networking both on the transmit and receive side unless the hardware provides offloading. The auxiliary clocks provide a mechanism to support arbitrary clocks which are not correlated to the system clock. This is not restricted to the PTP use case on purpose as there is no kernel side association of these clocks to a particular PTP device because that's a pure user space configuration decision. Having them independent allows to utilize them for other purposes and also enables them to be tested without hardware dependencies. To avoid pointless overhead these clocks have to be enabled individualy via a new sysfs interface to reduce the overhead to a single compare in the hotpath if they are enabled at the Kconfig level at all. These clocks utilize the existing timekeeping/NTP infrastructures, which has been made possible over the recent releases by incrementaly converting these infrastructures over from a single static instance to a multi-instance pointer based implementation without any performance regression reported. The auxiliary clocks provide the same "emulation" of a "correct" clock as the existing CLOCK_* variants do with an independent instance of data and provide the same steering mechanism through the existing sys_clock_adjtime() interface, which has been confirmed to work by the chronyd(8) maintainer. That allows to provide lockless kernel internal and VDSO support so that applications and kernel internal functionalities can access these clocks without restrictions and at the same performance as the existing system clocks. - Avoid double notifications in the adjtimex() syscall. Not a big issue, but a trivial to avoid latency source. * tag 'timers-ptp-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits) vdso/gettimeofday: Add support for auxiliary clocks vdso/vsyscall: Update auxiliary clock data in the datapage vdso: Introduce aux_clock_resolution_ns() vdso/gettimeofday: Introduce vdso_get_timestamp() vdso/gettimeofday: Introduce vdso_set_timespec() vdso/gettimeofday: Introduce vdso_clockid_valid() vdso/gettimeofday: Return bool from clock_gettime() helpers vdso/gettimeofday: Return bool from clock_getres() helpers vdso/helpers: Add helpers for seqlocks of single vdso_clock vdso/vsyscall: Split up __arch_update_vsyscall() into __arch_update_vdso_clock() vdso/vsyscall: Introduce a helper to fill clock configurations timekeeping: Remove the temporary CLOCK_AUX workaround timekeeping: Provide ktime_get_clock_ts64() timekeeping: Provide interface to control auxiliary clocks timekeeping: Provide update for auxiliary timekeepers timekeeping: Provide adjtimex() for auxiliary clocks timekeeping: Prepare do_adtimex() for auxiliary clocks timekeeping: Make do_adjtimex() reusable timekeeping: Add auxiliary clock support to __timekeeping_inject_offset() timekeeping: Make timekeeping_inject_offset() reusable ...
2 daysMerge tag 'timers-core-2025-07-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: - Simplify the logic in the timer migration code - Simplify the clocksource code by utilizing the more modern cpumask+*() interfaces * tag 'timers-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource: Use cpumask_next_wrap() in clocksource_watchdog() clocksource: Use cpumask_any_but() in clocksource_verify_choose_cpus() timers/migration: Clean up the loop in tmigr_quick_check()
2 daysMerge tag 'timers-cleanups-2025-07-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A treewide cleanup of struct cycle_counter const annotations. The initial idea of making them const was correct as they were seperate instances. When they got embedded into larger data structures, which are even modified by the callback this got moot. The only reason why this went unnoticed is that the required container_of() casts the const attribute forcefully away. Stop pretending that it is const" * tag 'timers-cleanups-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: time/timecounter: Fix the lie that struct cyclecounter is const
2 daysMerge tag 'irq-drivers-2025-07-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt chip driver updates from Thomas Gleixner: - Add support of forced affinity setting to yet offline CPUs for the MIPS-GIC to ensure that the affinity of per CPU interrupts can be set during the early bringup phase of a secondary CPU in the hotplug code before the CPU is set online and interrupts are enabled - Add support for the MIPS (RISC-V !?!?) P8700 SoC in the ACLINT_SSWI interrupt chip - Make the interrupt routing to RISV-V harts specification compliant so it supports arbitrary hart indices - Add a command line parameter and related handling to disable the generic RISCV IMSIC mechanism on platforms which use a trap-emulated IMSIC. Unfortunatly this is required because there is no mechanism available to discover this programatically. - Enable wakeup sources on the Renesas RZV2H driver - Convert interrupt chip drivers, which use a open coded variant of msi_create_parent_irq_domain() to use the new functionality - Convert interrupt chip drivers, which use the old style two level implementation of MSI support over to the MSI parent mechanism to prepare for removing at least one of the three PCI/MSI backend variants. - The usual cleanups and improvements all over the place * tag 'irq-drivers-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) irqchip/renesas-irqc: Convert to DEFINE_SIMPLE_DEV_PM_OPS() irqchip/renesas-intc-irqpin: Convert to DEFINE_SIMPLE_DEV_PM_OPS() irqchip/riscv-imsic: Add kernel parameter to disable IPIs irqchip/gic-v3: Fix GICD_CTLR register naming irqchip/ls-scfg-msi: Fix NULL dereference in error handling irqchip/ls-scfg-msi: Switch to use msi_create_parent_irq_domain() irqchip/armada-370-xp: Switch to msi_create_parent_irq_domain() irqchip/alpine-msi: Switch to msi_create_parent_irq_domain() irqchip/alpine-msi: Convert to __free irqchip/alpine-msi: Convert to lock guards irqchip/alpine-msi: Clean up whitespace style irqchip/sg2042-msi: Switch to msi_create_parent_irq_domain() irqchip/loongson-pch-msi.c: Switch to msi_create_parent_irq_domain() irqchip/imx-mu-msi: Convert to msi_create_parent_irq_domain() helper irqchip/riscv-imsic: Convert to msi_create_parent_irq_domain() helper irqchip/bcm2712-mip: Switch to msi_create_parent_irq_domain() irqdomain: Add device pointer to irq_domain_info and msi_domain_info irqchip/renesas-rzv2h: Remove unneeded includes irqchip/renesas-rzv2h: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND irqchip/aslint-sswi: Resolve hart index ...
2 daysMerge tag 'smp-core-2025-07-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull smp updates from Thomas Gleixner: "A set of updates for SMP function calls: - Improve locality of smp_call_function_any() by utilizing sched_numa_find_nth_cpu() instead of picking a random CPU - Wait for work completion in smp_call_function_many_cond() only when there was actually work enqueued - Simplify functions by unutlizing the appropriate cpumask_*() interfaces - Trivial cleanups" * tag 'smp-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Wait only if work was enqueued smp: Defer check for local execution in smp_call_function_many_cond() smp: Use cpumask_any_but() in smp_call_function_many_cond() smp: Improve locality in smp_call_function_any() smp: Fix typo in comment for raw_smp_processor_id()
2 daysMerge tag 'irq-core-2025-07-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: - Prevent a interrupt migration related live lock in handle_edge_irq() If the interrupt affinity is moved to a new target CPU and the interrupt is currently handled on the previous target CPU for edge type interrupts the handler might get stuck on the previous target for a long time, which causes both involved CPUs to waste cycles and eventually run into a soft-lockup situation. Solve this by checking whether the interrupt is redirected to a new target CPU and if the interrupt is handled on that new target CPU, busy wait for completion instead of masking it and sending the pending but which would cause the old CPU to re-run the handler and in the worst case repeating this excercise for a long time. This only works on architectures which use single CPU interrupt targets, but that's so far the only ones where this behaviour has been observed. - Add a kunit test for interrupt disable depth counts The nested interrupt disable depth has been an issue in the past especially vs. free_irq(), interrupt shutdown and CPU hotplug and their interactions. The test exercises the combinations of these scenarios and checks for correctness. * tag 'irq-core-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Prevent migration live lock in handle_edge_irq() genirq: Split up irq_pm_check_wakeup() genirq: Move irq_wait_for_poll() to call site genirq: Remove pointless local variable genirq: Add kunit tests for depth counts
2 daysMerge tag 'driver-core-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Remove unneeded debugfs_file_{get,put}() instances - Remove last remnants of debugfs_real_fops() - Allow storing non-const void * in struct debugfs_inode_info::aux sysfs: - Switch back to attribute_group::bin_attrs (treewide) - Switch back to bin_attribute::read()/write() (treewide) - Constify internal references to 'struct bin_attribute' Support cache-ids for device-tree systems: - Add arch hook arch_compact_of_hwid() - Use arch_compact_of_hwid() to compact MPIDR values on arm64 Rust: - Device: - Introduce CoreInternal device context (for bus internal methods) - Provide generic drvdata accessors for bus devices - Provide Driver::unbind() callbacks - Use the infrastructure above for auxiliary, PCI and platform - Implement Device::as_bound() - Rename Device::as_ref() to Device::from_raw() (treewide) - Implement fwnode and device property abstractions - Implement example usage in the Rust platform sample driver - Devres: - Remove the inner reference count (Arc) and use pin-init instead - Replace Devres::new_foreign_owned() with devres::register() - Require T to be Send in Devres<T> - Initialize the data kept inside a Devres last - Provide an accessor for the Devres associated Device - Device ID: - Add support for ACPI device IDs and driver match tables - Split up generic device ID infrastructure - Use generic device ID infrastructure in net::phy - DMA: - Implement the dma::Device trait - Add DMA mask accessors to dma::Device - Implement dma::Device for PCI and platform devices - Use DMA masks from the DMA sample module - I/O: - Implement abstraction for resource regions (struct resource) - Implement resource-based ioremap() abstractions - Provide platform device accessors for I/O (remap) requests - Misc: - Support fallible PinInit types in Revocable - Implement Wrapper<T> for Opaque<T> - Merge pin-init blanket dependencies (for Devres) Misc: - Fix OF node leak in auxiliary_device_create() - Use util macros in device property iterators - Improve kobject sample code - Add device_link_test() for testing device link flags - Fix typo in Documentation/ABI/testing/sysfs-kernel-address_bits - Hint to prefer container_of_const() over container_of()" * tag 'driver-core-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (84 commits) rust: io: fix broken intra-doc links to `platform::Device` rust: io: fix broken intra-doc link to missing `flags` module rust: io: mem: enable IoRequest doc-tests rust: platform: add resource accessors rust: io: mem: add a generic iomem abstraction rust: io: add resource abstraction rust: samples: dma: set DMA mask rust: platform: implement the `dma::Device` trait rust: pci: implement the `dma::Device` trait rust: dma: add DMA addressing capabilities rust: dma: implement `dma::Device` trait rust: net::phy Change module_phy_driver macro to use module_device_table macro rust: net::phy represent DeviceId as transparent wrapper over mdio_device_id rust: device_id: split out index support into a separate trait device: rust: rename Device::as_ref() to Device::from_raw() arm64: cacheinfo: Provide helper to compress MPIDR value into u32 cacheinfo: Add arch hook to compress CPU h/w id into 32 bits for cache-id cacheinfo: Set cache 'id' based on DT data container_of: Document container_of() is not to be used in new code driver core: auxiliary bus: fix OF node leak ...
2 daysring-buffer: Make the const read-only 'type' staticColin Ian King
Don't populate the read-only 'type' on the stack at run time, instead make it static. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20250714160858.1234719-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
3 daysMerge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM IRQ changes for 6.17 - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI. - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand. - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time. - Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c. - Fix a variety of flaws and bugs in the AVIC device posted IRQ code. - Inhibited AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation. - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry. - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs. - Dedup x86's device posted IRQ code, as the vast majority of functionality can be shared verbatime between SVM and VMX. - Harden the device posted IRQ code against bugs and runtime errors. - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n). - Generate GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU. - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs. - Clean up and document/comment the irqfd assignment code. - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique.
3 daystracing: fprobe: Fix infinite recursion using preempt_*_notrace()Masami Hiramatsu (Google)
Since preempt_count_add/del() are tracable functions, it is not allowed to use preempt_disable/enable() in ftrace handlers. Without this fix, probing on `preempt_count_add%return` will cause an infinite recursion of fprobes. To fix this problem, use preempt_disable/enable_notrace() in fprobe_return(). Link: https://lore.kernel.org/all/175374642359.1471729.1054175011228386560.stgit@mhiramat.tok.corp.google.com/ Fixes: 4346ba160409 ("fprobe: Rewrite fprobe on function-graph tracer") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
3 daysMerge tag 'pm-6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "As is tradition, cpufreq is the part with the largest number of updates that include core fixes and cleanups as well as updates of several assorted drivers, but there are also quite a few updates related to system sleep, mostly focused on asynchronous suspend and resume of devices and on making the integration of system suspend and resume with runtime PM easier. Runtime PM is also updated to allow some code duplication in drivers to be eliminated going forward and to work more consistently overall in some cases. Apart from that, there are some driver core updates related to PM domains that should help to address ordering issues with devm_ cleanup routines relying on PM domains, some assorted devfreq updates including core fixes and cleanups, tooling updates, and documentation and MAINTAINERS updates. Specifics: - Fix two initialization ordering issues in the cpufreq core and a governor initialization error path in it, and clean it up (Lifeng Zheng) - Add Granite Rapids support in no-HWP mode to the intel_pstate cpufreq driver (Li RongQing) - Make intel_pstate always use HWP_DESIRED_PERF when operating in the passive mode (Rafael Wysocki) - Allow building the tegra124 cpufreq driver as a module (Aaron Kling) - Do minor cleanups for Rust cpufreq and cpumask APIs and fix MAINTAINERS entry for cpu.rs (Abhinav Ananthu, Ritvik Gupta, Lukas Bulwahn) - Clean up assorted cpufreq drivers (Arnd Bergmann, Dan Carpenter, Krzysztof Kozlowski, Sven Peter, Svyatoslav Ryhel, Lifeng Zheng) - Add the NEED_UPDATE_LIMITS flag to the CPPC cpufreq driver (Prashant Malani) - Fix minimum performance state label error in the amd-pstate driver documentation (Shouye Liu) - Add the CPUFREQ_GOV_STRICT_TARGET flag to the userspace cpufreq governor and explain HW coordination influence on it in the documentation (Shashank Balaji) - Fix opencoded for_each_cpu() in idle_state_valid() in the DT cpuidle driver (Yury Norov) - Remove info about non-existing QoS interfaces from the PM QoS documentation (Ulf Hansson) - Use c_* types via kernel prelude in Rust for OPP (Abhinav Ananthu) - Add HiSilicon uncore frequency scaling driver to devfreq (Jie Zhan) - Allow devfreq drivers to add custom sysfs ABIs (Jie Zhan) - Simplify the sun8i-a33-mbus devfreq driver by using more devm functions (Uwe Kleine-König) - Fix an index typo in trans_stat() in devfreq (Chanwoo Choi) - Check devfreq governor before using governor->name (Lifeng Zheng) - Remove a redundant devfreq_get_freq_range() call from devfreq_add_device() (Lifeng Zheng) - Limit max_freq with scaling_min_freq in devfreq (Lifeng Zheng) - Replace sscanf() with kstrtoul() in set_freq_store() (Lifeng Zheng) - Extend the asynchronous suspend and resume of devices to handle suppliers like parents and consumers like children (Rafael Wysocki) - Make pm_runtime_force_resume() work for drivers that set the DPM_FLAG_SMART_SUSPEND flag and allow PCI drivers and drivers that collaborate with the general ACPI PM domain to set it (Rafael Wysocki) - Add kernel parameter to disable asynchronous suspend/resume of devices (Tudor Ambarus) - Drop redundant might_sleep() calls from some functions in the device suspend/resume core code (Zhongqiu Han) - Fix the handling of monitors connected right before waking up the system from sleep (tuhaowen) - Clean up MAINTAINERS entries for suspend and hibernation (Rafael Wysocki) - Fix error code path in the KEXEC_JUMP flow and drop a redundant pm_restore_gfp_mask() call from it (Rafael Wysocki) - Rearrange suspend/resume error handling in the core device suspend and resume code (Rafael Wysocki) - Fix up white space that does not follow coding style in the hibernation core code (Darshan Rathod) - Document return values of suspend-related API functions in the runtime PM framework (Sakari Ailus) - Mark last busy stamp in multiple autosuspend-related functions in the runtime PM framework and update its documentation (Sakari Ailus) - Take active children into account in pm_runtime_get_if_in_use() for consistency (Rafael Wysocki) - Fix NULL pointer dereference in get_pd_power_uw() in the dtpm_cpu power capping driver (Sivan Zohar-Kotzer) - Add support for the Bartlett Lake platform to the Intel RAPL power capping driver (Qiao Wei) - Add PL4 support for Panther Lake to the intel_rapl_msr power capping driver (Zhang Rui) - Update contact information in the PM ABI docs and maintainer information in the power domains DT binding (Rafael Wysocki) - Update PM header inclusions to follow the IWYU (Include What You Use) principle (Andy Shevchenko) - Add flags to specify power on attach/detach for PM domains, make the driver core detach PM domains in device_unbind_cleanup(), and drop the dev_pm_domain_detach() call from the platform bus type (Claudiu Beznea) - Improve Python binding's Makefile for cpupower (John B. Wyatt IV) - Fix printing of CORE, CPU fields in cpupower-monitor (Gautham Shenoy)" * tag 'pm-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits) cpufreq: CPPC: Mark driver with NEED_UPDATE_LIMITS flag PM: docs: Use my kernel.org address in ABI docs and DT bindings PM: hibernate: Fix up white space that does not follow coding style PM: sleep: Rearrange suspend/resume error handling in the core Documentation: amd-pstate:fix minimum performance state label error PM: runtime: Take active children into account in pm_runtime_get_if_in_use() kexec_core: Drop redundant pm_restore_gfp_mask() call kexec_core: Fix error code path in the KEXEC_JUMP flow PM: sleep: Clean up MAINTAINERS entries for suspend and hibernation drivers: cpufreq: add Tegra114 support rust: cpumask: Replace `MaybeUninit` and `mem::zeroed` with `Opaque` APIs cpufreq: Exit governor when failed to start old governor cpufreq: Move the check of cpufreq_driver->get into cpufreq_verify_current_freq() cpufreq: Init policy->rwsem before it may be possibly used cpufreq: Initialize cpufreq-based frequency-invariance later cpufreq: Remove duplicate check in __cpufreq_offline() cpufreq: Contain scaling_cur_freq.attr in cpufreq_attrs cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode PM / devfreq: Add HiSilicon uncore frequency scaling driver ...
3 daysbpf: Add log for attaching tracing programs to functions in deny listKaFai Wan
Show the rejected function name when attaching tracing programs to functions in deny list. With this change, we know why tracing programs can't attach to functions like __rcu_read_lock() from log. $ ./fentry libbpf: prog '__rcu_read_lock': BPF program load failed: -EINVAL libbpf: prog '__rcu_read_lock': -- BEGIN PROG LOAD LOG -- Attaching tracing programs to function '__rcu_read_lock' is rejected. Suggested-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250724151454.499040-3-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysbpf: Show precise rejected function when attaching fexit/fmod_ret to ↵KaFai Wan
__noreturn functions With this change, we know the precise rejected function name when attaching fexit/fmod_ret to __noreturn functions from log. $ ./fexit libbpf: prog 'fexit': BPF program load failed: -EINVAL libbpf: prog 'fexit': -- BEGIN PROG LOAD LOG -- Attaching fexit/fmod_ret to __noreturn function 'do_exit' is rejected. Suggested-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: KaFai Wan <kafai.wan@linux.dev> Acked-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250724151454.499040-2-kafai.wan@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
3 daysMerge tag 'audit-pr-20250725' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit update from Paul Moore: "A single audit patch that restores logging of an audit event in the module load failure case" * tag 'audit-pr-20250725' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit,module: restore audit logging in load failure case
3 daysMerge tag 'libcrypto-updates-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull crypto library updates from Eric Biggers: "This is the main crypto library pull request for 6.17. The main focus this cycle is on reorganizing the SHA-1 and SHA-2 code, providing high-quality library APIs for SHA-1 and SHA-2 including HMAC support, and establishing conventions for lib/crypto/ going forward: - Migrate the SHA-1 and SHA-512 code (and also SHA-384 which shares most of the SHA-512 code) into lib/crypto/. This includes both the generic and architecture-optimized code. Greatly simplify how the architecture-optimized code is integrated. Add an easy-to-use library API for each SHA variant, including HMAC support. Finally, reimplement the crypto_shash support on top of the library API. - Apply the same reorganization to the SHA-256 code (and also SHA-224 which shares most of the SHA-256 code). This is a somewhat smaller change, due to my earlier work on SHA-256. But this brings in all the same additional improvements that I made for SHA-1 and SHA-512. There are also some smaller changes: - Move the architecture-optimized ChaCha, Poly1305, and BLAKE2s code from arch/$(SRCARCH)/lib/crypto/ to lib/crypto/$(SRCARCH)/. For these algorithms it's just a move, not a full reorganization yet. - Fix the MIPS chacha-core.S to build with the clang assembler. - Fix the Poly1305 functions to work in all contexts. - Fix a performance regression in the x86_64 Poly1305 code. - Clean up the x86_64 SHA-NI optimized SHA-1 assembly code. Note that since the new organization of the SHA code is much simpler, the diffstat of this pull request is negative, despite the addition of new fully-documented library APIs for multiple SHA and HMAC-SHA variants. These APIs will allow further simplifications across the kernel as users start using them instead of the old-school crypto API. (I've already written a lot of such conversion patches, removing over 1000 more lines of code. But most of those will target 6.18 or later)" * tag 'libcrypto-updates-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: (67 commits) lib/crypto: arm64/sha512-ce: Drop compatibility macros for older binutils lib/crypto: x86/sha1-ni: Convert to use rounds macros lib/crypto: x86/sha1-ni: Minor optimizations and cleanup crypto: sha1 - Remove sha1_base.h lib/crypto: x86/sha1: Migrate optimized code into library lib/crypto: sparc/sha1: Migrate optimized code into library lib/crypto: s390/sha1: Migrate optimized code into library lib/crypto: powerpc/sha1: Migrate optimized code into library lib/crypto: mips/sha1: Migrate optimized code into library lib/crypto: arm64/sha1: Migrate optimized code into library lib/crypto: arm/sha1: Migrate optimized code into library crypto: sha1 - Use same state format as legacy drivers crypto: sha1 - Wrap library and add HMAC support lib/crypto: sha1: Add HMAC support lib/crypto: sha1: Add SHA-1 library functions lib/crypto: sha1: Rename sha1_init() to sha1_init_raw() crypto: x86/sha1 - Rename conflicting symbol lib/crypto: sha2: Add hmac_sha*_init_usingrawkey() lib/crypto: arm/poly1305: Remove unneeded empty weak function lib/crypto: x86/poly1305: Fix performance regression on short messages ...
3 daysMerge tag 'hardening-v6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening updates from Kees Cook: - Introduce and start using TRAILING_OVERLAP() helper for fixing embedded flex array instances (Gustavo A. R. Silva) - mux: Convert mux_control_ops to a flex array member in mux_chip (Thorsten Blum) - string: Group str_has_prefix() and strstarts() (Andy Shevchenko) - Remove KCOV instrumentation from __init and __head (Ritesh Harjani, Kees Cook) - Refactor and rename stackleak feature to support Clang - Add KUnit test for seq_buf API - Fix KUnit fortify test under LTO * tag 'hardening-v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (22 commits) sched/task_stack: Add missing const qualifier to end_of_stack() kstack_erase: Support Clang stack depth tracking kstack_erase: Add -mgeneral-regs-only to silence Clang warnings init.h: Disable sanitizer coverage for __init and __head kstack_erase: Disable kstack_erase for all of arm compressed boot code x86: Handle KCOV __init vs inline mismatches arm64: Handle KCOV __init vs inline mismatches s390: Handle KCOV __init vs inline mismatches arm: Handle KCOV __init vs inline mismatches mips: Handle KCOV __init vs inline mismatch powerpc/mm/book3s64: Move kfence and debug_pagealloc related calls to __init section configs/hardening: Enable CONFIG_INIT_ON_FREE_DEFAULT_ON configs/hardening: Enable CONFIG_KSTACK_ERASE stackleak: Split KSTACK_ERASE_CFLAGS from GCC_PLUGINS_CFLAGS stackleak: Rename stackleak_track_stack to __sanitizer_cov_stack_depth stackleak: Rename STACKLEAK to KSTACK_ERASE seq_buf: Introduce KUnit tests string: Group str_has_prefix() and strstarts() kunit/fortify: Add back "volatile" for sizeof() constants acpi: nfit: intel: avoid multiple -Wflex-array-member-not-at-end warnings ...
3 daysMerge tag 'execve-v6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - Introduce regular REGSET note macros arch-wide (Dave Martin) - Remove arbitrary 4K limitation of program header size (Yin Fengwei) - Reorder function qualifiers for copy_clone_args_from_user() (Dishank Jogi) * tag 'execve-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (25 commits) fork: reorder function qualifiers for copy_clone_args_from_user binfmt_elf: remove the 4k limitation of program header size binfmt_elf: Warn on missing or suspicious regset note names xtensa: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names um: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names x86/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names sparc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names sh: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names s390/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names riscv: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names powerpc/ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names parisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names openrisc: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names nios2: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names MIPS: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names m68k: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names LoongArch: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names hexagon: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names csky: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names arm64: ptrace: Use USER_REGSET_NOTE_TYPE() to specify regset note names ...
3 daysMerge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - MD pull request via Yu: - call del_gendisk synchronously (Xiao) - cleanup unused variable (John) - cleanup workqueue flags (Ryo) - fix faulty rdev can't be removed during resync (Qixing) - NVMe pull request via Christoph: - try PCIe function level reset on init failure (Keith Busch) - log TLS handshake failures at error level (Maurizio Lombardi) - pci-epf: do not complete commands twice if nvmet_req_init() fails (Rick Wertenbroek) - misc cleanups (Alok Tiwari) - Removal of the pktcdvd driver This has been more than a decade coming at this point, and some recently revealed breakages that had it causing issues even for cases where it isn't required made me re-pull the trigger on this one. It's known broken and nobody has stepped up to maintain the code - Series for ublk supporting batch commands, enabling the use of multishot where appropriate - Speed up ublk exit handling - Fix for the two-stage elevator fixing which could leak data - Convert NVMe to use the new IOVA based API - Increase default max transfer size to something more reasonable - Series fixing write operations on zoned DM devices - Add tracepoints for zoned block device operations - Prep series working towards improving blk-mq queue management in the presence of isolated CPUs - Don't allow updating of the block size of a loop device that is currently under exclusively ownership/open - Set chunk sectors from stacked device stripe size and use it for the atomic write size limit - Switch to folios in bcache read_super() - Fix for CD-ROM MRW exit flush handling - Various tweaks, fixes, and cleanups * tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits) block: restore two stage elevator switch while running nr_hw_queue update cdrom: Call cdrom_mrw_exit from cdrom_release function sunvdc: Balance device refcount in vdc_port_mpgroup_check nvme-pci: try function level reset on init failure dm: split write BIOs on zone boundaries when zone append is not emulated block: use chunk_sectors when evaluating stacked atomic write limits dm-stripe: limit chunk_sectors to the stripe size md/raid10: set chunk_sectors limit md/raid0: set chunk_sectors limit block: sanitize chunk_sectors for atomic write limits ilog2: add max_pow_of_two_factor() nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails nvme-tcp: log TLS handshake failures at error level docs: nvme: fix grammar in nvme-pci-endpoint-target.rst nvme: fix typo in status code constant for self-test in progress nvmet: remove redundant assignment of error code in nvmet_ns_enable() nvme: fix incorrect variable in io cqes error message nvme: fix multiple spelling and grammar issues in host drivers block: fix blk_zone_append_update_request_bio() kernel-doc md/raid10: fix set but not used variable in sync_request_write() ...
3 daystracing: trace_fprobe: Fix typo of the semicolonMasami Hiramatsu (Google)
Fix a typo that uses ',' instead of ';' for line delimiter. Link: https://lore.kernel.org/linux-trace-kernel/175366879192.487099.5714468217360139639.stgit@mhiramat.tok.corp.google.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
3 daysMerge tag 'vfs-6.17-rc1.bpf' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs bpf updates from Christian Brauner: "These changes allow bpf to read extended attributes from cgroupfs. This is useful in redirecting AF_UNIX socket connections based on cgroup membership of the socket. One use-case is the ability to implement log namespaces in systemd so services and containers are redirected to different journals" * tag 'vfs-6.17-rc1.bpf' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/kernfs: test xattr retrieval selftests/bpf: Add tests for bpf_cgroup_read_xattr bpf: Mark cgroup_subsys_state->cgroup RCU safe bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node kernfs: remove iattr_mutex
3 daysMerge tag 'vfs-6.17-rc1.pidfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pidfs updates from Christian Brauner: - persistent info Persist exit and coredump information independent of whether anyone currently holds a pidfd for the struct pid. The current scheme allocated pidfs dentries on-demand repeatedly. This scheme is reaching it's limits as it makes it impossible to pin information that needs to be available after the task has exited or coredumped and that should not be lost simply because the pidfd got closed temporarily. The next opener should still see the stashed information. This is also a prerequisite for supporting extended attributes on pidfds to allow attaching meta information to them. If someone opens a pidfd for a struct pid a pidfs dentry is allocated and stashed in pid->stashed. Once the last pidfd for the struct pid is closed the pidfs dentry is released and removed from pid->stashed. So if 10 callers create a pidfs dentry for the same struct pid sequentially, i.e., each closing the pidfd before the other creates a new one then a new pidfs dentry is allocated every time. Because multiple tasks acquiring and releasing a pidfd for the same struct pid can race with each another a task may still find a valid pidfs entry from the previous task in pid->stashed and reuse it. Or it might find a dead dentry in there and fail to reuse it and so stashes a new pidfs dentry. Multiple tasks may race to stash a new pidfs dentry but only one will succeed, the other ones will put their dentry. The current scheme aims to ensure that a pidfs dentry for a struct pid can only be created if the task is still alive or if a pidfs dentry already existed before the task was reaped and so exit information has been was stashed in the pidfs inode. That's great except that it's buggy. If a pidfs dentry is stashed in pid->stashed after pidfs_exit() but before __unhash_process() is called we will return a pidfd for a reaped task without exit information being available. The pidfds_pid_valid() check does not guard against this race as it doens't sync at all with pidfs_exit(). The pid_has_task() check might be successful simply because we're before __unhash_process() but after pidfs_exit(). Introduce a new scheme where the lifetime of information associated with a pidfs entry (coredump and exit information) isn't bound to the lifetime of the pidfs inode but the struct pid itself. The first time a pidfs dentry is allocated for a struct pid a struct pidfs_attr will be allocated which will be used to store exit and coredump information. If all pidfs for the pidfs dentry are closed the dentry and inode can be cleaned up but the struct pidfs_attr will stick until the struct pid itself is freed. This will ensure minimal memory usage while persisting relevant information. The new scheme has various advantages. First, it allows to close the race where we end up handing out a pidfd for a reaped task for which no exit information is available. Second, it minimizes memory usage. Third, it allows to remove complex lifetime tracking via dentries when registering a struct pid with pidfs. There's no need to get or put a reference. Instead, the lifetime of exit and coredump information associated with a struct pid is bound to the lifetime of struct pid itself. - extended attributes Now that we have a way to persist information for pidfs dentries we can start supporting extended attributes on pidfds. This will allow userspace to attach meta information to tasks. One natural extension would be to introduce a custom pidfs.* extended attribute space and allow for the inheritance of extended attributes across fork() and exec(). The first simple scheme will allow privileged userspace to set trusted extended attributes on pidfs inodes. - Allow autonomous pidfs file handles Various filesystems such as pidfs and drm support opening file handles without having to require a file descriptor to identify the filesystem. The filesystem are global single instances and can be trivially identified solely on the information encoded in the file handle. This makes it possible to not have to keep or acquire a sentinal file descriptor just to pass it to open_by_handle_at() to identify the filesystem. That's especially useful when such sentinel file descriptor cannot or should not be acquired. For pidfs this means a file handle can function as full replacement for storing a pid in a file. Instead a file handle can be stored and reopened purely based on the file handle. Such autonomous file handles can be opened with or without specifying a a file descriptor. If no proper file descriptor is used the FD_PIDFS_ROOT sentinel must be passed. This allows us to define further special negative fd sentinels in the future. Userspace can trivially test for support by trying to open the file handle with an invalid file descriptor. - Allow pidfds for reaped tasks with SCM_PIDFD messages This is a logical continuation of the earlier work to create pidfds for reaped tasks through the SO_PEERPIDFD socket option merged in 923ea4d4482b ("Merge patch series "net, pidfs: enable handing out pidfds for reaped sk->sk_peer_pid""). - Two minor fixes: * Fold fs_struct->{lock,seq} into a seqlock * Don't bother with path_{get,put}() in unix_open_file() * tag 'vfs-6.17-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (37 commits) don't bother with path_get()/path_put() in unix_open_file() fold fs_struct->{lock,seq} into a seqlock selftests: net: extend SCM_PIDFD test to cover stale pidfds af_unix: enable handing out pidfds for reaped tasks in SCM_PIDFD af_unix: stash pidfs dentry when needed af_unix/scm: fix whitespace errors af_unix: introduce and use scm_replace_pid() helper af_unix: introduce unix_skb_to_scm helper af_unix: rework unix_maybe_add_creds() to allow sleep selftests/pidfd: decode pidfd file handles withou having to specify an fd fhandle, pidfs: support open_by_handle_at() purely based on file handle uapi/fcntl: add FD_PIDFS_ROOT uapi/fcntl: add FD_INVALID fcntl/pidfd: redefine PIDFD_SELF_THREAD_GROUP uapi/fcntl: mark range as reserved fhandle: reflow get_path_anchor() pidfs: add pidfs_root_path() helper fhandle: rename to get_path_anchor() fhandle: hoist copy_from_user() above get_path_from_fd() fhandle: raise FILEID_IS_DIR in handle_type ...