summaryrefslogtreecommitdiff
path: root/include/trace
AgeCommit message (Collapse)Author
2 daysMerge tag 'trace-v6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Deprecate auto-mounting tracefs to /sys/kernel/debug/tracing When tracefs was first introduced back in 2014, the directory /sys/kernel/tracing was added and is the designated location to mount tracefs. To keep backward compatibility, tracefs was auto-mounted in /sys/kernel/debug/tracing as well. All distros now mount tracefs on /sys/kernel/tracing. Having it seen in two different locations has lead to various issues and inconsistencies. The VFS folks have to also maintain debugfs_create_automount() for this single user. It's been over 10 years. Tooling and scripts should start replacing the debugfs location with the tracefs one. The reason tracefs was created in the first place was to allow access to the tracing facilities without the need to configure debugfs into the kernel. Using tracefs should now be more robust. A new config is created: CONFIG_TRACEFS_AUTOMOUNT_DEPRECATED which is default y, so that the kernel is still built with the automount. This config allows those that want to remove the automount from debugfs to do so. When tracefs is accessed from /sys/kernel/debug/tracing, the following printk is triggerd: pr_warn("NOTICE: Automounting of tracing to debugfs is deprecated and will be removed in 2030\n"); This gives users another 5 years to fix their scripts. - Use queue_rcu_work() instead of call_rcu() for freeing event filters The number of filters to be free can be many depending on the number of events within an event system. Freeing them from softirq context can potentially cause undesired latency. Use the RCU workqueue to free them instead. - Remove pointless memory barriers in latency code Memory barriers were added to some of the latency code a long time ago with the idea of "making them visible", but that's not what memory barriers are for. They are to synchronize access between different variables. There was no synchronization here making them pointless. - Remove "__attribute__()" from the type field of event format When LLVM is used to compile the kernel with CONFIG_DEBUG_INFO_BTF=y and PAHOLE_HAS_BTF_TAG=y, some of the format fields get expanded with the following: field:const char * filename; offset:24; size:8; signed:0; Turns into: field:const char __attribute__((btf_type_tag("user"))) * filename; offset:24; size:8; signed:0; This confuses parsers. Add code to strip these tags from the strings. - Add eprobe config option CONFIG_EPROBE_EVENTS Eprobes were added back in 5.15 but were only enabled when another probe was enabled (kprobe, fprobe, uprobe, etc). The eprobes had no config option of their own. Add one as they should be a separate entity. It's default y to keep with the old kernels but still has dependencies on TRACING and HAVE_REGS_AND_STACK_ACCESS_API. - Add eprobe documentation When eprobes were added back in 5.15 no documentation was added to describe them. This needs to be rectified. - Replace open coded cpumask_next_wrap() in move_to_next_cpu() - Have preemptirq_delay_run() use off-stack CPU mask - Remove obsolete comment about pelt_cfs event DECLARE_TRACE() appends "_tp" to trace events now, but the comment above pelt_cfs still mentioned appending it manually. - Remove EVENT_FILE_FL_SOFT_MODE flag The SOFT_MODE flag was required when the soft enabling and disabling of trace events was first introduced. But there was a bug with this approach as it only worked for a single instance. When multiple users required soft disabling and disabling the code was changed to have a ref count. The SOFT_MODE flag is now set iff the ref count is non zero. This is redundant and just reading the ref count is good enough. - Fix typo in comment * tag 'trace-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation: tracing: Add documentation about eprobes tracing: Have eprobes have their own config option tracing: Remove "__attribute__()" from the type field of event format tracing: Deprecate auto-mounting tracefs in debugfs tracing: Fix comment in trace_module_remove_events() tracing: Remove EVENT_FILE_FL_SOFT_MODE flag tracing: Remove pointless memory barriers tracing/sched: Remove obsolete comment on suffixes kernel: trace: preemptirq_delay_test: use offstack cpu mask tracing: Use queue_rcu_work() to free filters tracing: Replace opencoded cpumask_next_wrap() in move_to_next_cpu()
3 daysMerge tag 'cgroup-for-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - Allow css_rstat_updated() in NMI context to enable memory accounting for allocations in NMI context. - /proc/cgroups doesn't contain useful information for cgroup2 and was updated to only show v1 controllers. This unfortunately broke something in the wild. Add an option to bring back the old behavior to ease transition. - selftest updates and other cleanups. * tag 'cgroup-for-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Add compatibility option for content of /proc/cgroups selftests/cgroup: fix cpu.max tests cgroup: llist: avoid memory tears for llist_node selftests: cgroup: Fix missing newline in test_zswap_writeback_one selftests: cgroup: Allow longer timeout for kmem_dead_cgroups cleanup memcg: cgroup: call css_rstat_updated irrespective of in_nmi() cgroup: remove per-cpu per-subsystem locks cgroup: make css_rstat_updated nmi safe cgroup: support to enable nmi-safe css_rstat_updated selftests: cgroup: Fix compilation on pre-cgroupns kernels selftests: cgroup: Optionally set up v1 environment selftests: cgroup: Add support for named v1 hierarchies in test_core selftests: cgroup_util: Add helpers for testing named v1 hierarchies Documentation: cgroup: add section explaining controller availability cgroup: Drop sock_cgroup_classid() dummy implementation
3 daysMerge tag 'mm-stable-2025-07-30-15-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_*() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_* flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
3 daysMerge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull SCSI updates from James Bottomley: "Smaller set of driver updates than usual (ufs, lpfc, mpi3mr). The rest (including the core file changes) are doc updates and some minor bug fixes" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (49 commits) scsi: libiscsi: Initialize iscsi_conn->dd_data only if memory is allocated scsi: scsi_transport_fc: Add comments to describe added 'rport' parameter scsi: bfa: Double-free fix scsi: isci: Fix dma_unmap_sg() nents value scsi: mvsas: Fix dma_unmap_sg() nents value scsi: elx: efct: Fix dma_unmap_sg() nents value scsi: scsi_transport_fc: Change to use per-rport devloss_work_q scsi: ufs: exynos: Fix programming of HCI_UTRL_NEXUS_TYPE scsi: core: Fix kernel doc for scsi_track_queue_full() scsi: ibmvscsi_tgt: Fix dma_unmap_sg() nents value scsi: ibmvscsi_tgt: Fix typo in comment scsi: mpi3mr: Update driver version to 8.14.0.5.50 scsi: mpi3mr: Serialize admin queue BAR writes on 32-bit systems scsi: mpi3mr: Drop unnecessary volatile from __iomem pointers scsi: mpi3mr: Fix race between config read submit and interrupt completion scsi: ufs: ufs-qcom: Enable QUnipro Internal Clock Gating scsi: ufs: core: Add ufshcd_dme_rmw() to modify DME attributes scsi: ufs: ufs-qcom: Update esi_vec_mask for HW major version >= 6 scsi: core: Use scsi_cmd_priv() instead of open-coding it scsi: qla2xxx: Remove firmware URL ...
3 daysMerge tag 'ext4_for_linus_6.17-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major ext4 changes for 6.17: - Better scalability for ext4 block allocation - Fix insufficient credits when writing back large folios Miscellaneous bug fixes, especially when handling exteded attriutes, inline data, and fast commit" * tag 'ext4_for_linus_6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (39 commits) ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr ext4: implement linear-like traversal across order xarrays ext4: refactor choose group to scan group ext4: convert free groups order lists to xarrays ext4: factor out ext4_mb_scan_group() ext4: factor out ext4_mb_might_prefetch() ext4: factor out __ext4_mb_scan_group() ext4: fix largest free orders lists corruption on mb_optimize_scan switch ext4: fix zombie groups in average fragment size lists ext4: merge freed extent with existing extents before insertion ext4: convert sbi->s_mb_free_pending to atomic_t ext4: fix typo in CR_GOAL_LEN_SLOW comment ext4: get rid of some obsolete EXT4_MB_HINT flags ext4: utilize multiple global goals to reduce contention ext4: remove unnecessary s_md_lock on update s_mb_last_group ext4: remove unnecessary s_mb_last_start ext4: separate stream goal hits from s_bal_goals for better tracking ext4: add ext4_try_lock_group() to skip busy groups ext4: initialize superblock fields in the kballoc-test.c kunit tests ext4: refactor the inline directory conversion and new directory codepaths ...
4 daysMerge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...
4 daysMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
4 daysMerge tag 'trace-unused-v6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracepoint cleanup from Steven Rostedt: "Remove or hide unused tracepoints Tracepoints take up memory (around 5K per tracepoint) even when they are unused. Changes are being made to detect when a tracepoint is defined but unused and a warning is shown at build. But those changes are not yet ready for inclusion. - Fix some of the unused tracepoints that it detected Some tracepoints were removed and others were hidden by config settings to match the config settings of where they are instantiated. Some tracepoints were moved into architecture specific code as only one architecture used them. - Call the ftrace_test_filter tracepoint in an unreachable if statement The ftrace_test_filter tracepoint which is defined when ftrace selftests are configured and is used to test the filter logic, but the tracepoint is not actually called. It is put into an if statement to not have it get compiled out, but also not warn for not being used" * tag 'trace-unused-v6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: sched: Hide numa events under CONFIG_NUMA_BALANCING powerpc/thp: tracing: Hide hugepage events under CONFIG_PPC_BOOK3S_64 tracing: Call trace_ftrace_test_filter() for the event tracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exit binder: Remove unused binder lock events PM: tracing: Hide power_domain_target event under ARCH_OMAP2PLUS PM: tracing: Hide device_pm_callback events under PM_SLEEP PM: tracing: Hide psci_domain_idle events under ARM_PSCI_CPUIDLE PM: cpufreq: powernv/tracing: Move powernv_throttle trace event alarmtimer: Hide alarmtimer_suspend event when RTC_CLASS is not configured tracing, AER: Hide PCIe AER event when PCIEAER is not configured
4 daysMerge tag 'trace-rv-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verification updates from Steven Rostedt: - Added Linear temporal logic monitors for RT application Real-time applications may have design flaws causing them to have unexpected latency. For example, the applications may raise page faults, or may be blocked trying to take a mutex without priority inheritance. However, while attempting to implement DA monitors for these real-time rules, deterministic automaton is found to be inappropriate as the specification language. The automaton is complicated, hard to understand, and error-prone. For these cases, linear temporal logic is found to be more suitable. The LTL is more concise and intuitive. - Make printk_deferred() public The new monitors needed access to printk_deferred(). Make them visible for the entire kernel. - Add a vpanic() to allow for va_list to be passed to panic. - Add rtapp container monitor. A collection of monitors that check for common problems with real-time applications that cause unexpected latency. - Add page fault tracepoints to risc-v These tracepoints are necessary to for the RV monitor to run on risc-v. - Fix the behaviour of the rv tool with -s and idle tasks. - Allow the rv tool to gracefully terminate with SIGTERM - Adjusts dot2c not to create lines over 100 columns - Properly order nested monitors in the RV Kconfig file - Return the registration error in all DA monitor instead of 0 - Update and add new sched collection monitors Replace tss and sncid monitors with more complete sts: Not only prove that switches occur in scheduling context and scheduling needs interrupt disabled but also that each call to the scheduler disables interrupts to (optionally) switch. New monitor: nrp Preemption requires need resched which is cleared by any switch (includes a non optimal workaround for /nested/ preemptions) New monitor: sssw suspension requires setting the task to sleepable and, after the switch occurs, the task requires a wakeup to come back to runnable New monitor: opid waking and need-resched operations occur with interrupts and preemption disabled or in IRQ without explicitly disabling preemption" * tag 'trace-rv-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (48 commits) rv: Add opid per-cpu monitor rv: Add nrp and sssw per-task monitors rv: Replace tss and sncid monitors with more complete sts sched: Adapt sched tracepoints for RV task model rv: Retry when da monitor detects race conditions rv: Adjust monitor dependencies rv: Use strings in da monitors tracepoints rv: Remove trailing whitespace from tracepoint string rv: Add da_handle_start_run_event_ to per-task monitors rv: Fix wrong type cast in reactors_show() and monitor_reactor_show() rv: Fix wrong type cast in monitors_show() rv: Remove struct rv_monitor::reacting rv: Remove rv_reactor's reference counter rv: Merge struct rv_reactor_def into struct rv_reactor rv: Merge struct rv_monitor_def into struct rv_monitor rv: Remove unused field in struct rv_monitor_def rv: Return init error when registering monitors verification/rvgen: Organise Kconfig entries for nested monitors tools/dot2c: Fix generated files going over 100 column limit tools/rv: Stop gracefully also on SIGTERM ...
4 daysMerge tag 'net-next-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Wrap datapath globals into net_aligned_data, to avoid false sharing - Preserve MSG_ZEROCOPY in forwarding (e.g. out of a container) - Add SO_INQ and SCM_INQ support to AF_UNIX - Add SIOCINQ support to AF_VSOCK - Add TCP_MAXSEG sockopt to MPTCP - Add IPv6 force_forwarding sysctl to enable forwarding per interface - Make TCP validation of whether packet fully fits in the receive window and the rcv_buf more strict. With increased use of HW aggregation a single "packet" can be multiple 100s of kB - Add MSG_MORE flag to optimize large TCP transmissions via sockmap, improves latency up to 33% for sockmap users - Convert TCP send queue handling from tasklet to BH workque - Improve BPF iteration over TCP sockets to see each socket exactly once - Remove obsolete and unused TCP RFC3517/RFC6675 loss recovery code - Support enabling kernel threads for NAPI processing on per-NAPI instance basis rather than a whole device. Fully stop the kernel NAPI thread when threaded NAPI gets disabled. Previously thread would stick around until ifdown due to tricky synchronization - Allow multicast routing to take effect on locally-generated packets - Add output interface argument for End.X in segment routing - MCTP: add support for gateway routing, improve bind() handling - Don't require rtnl_lock when fetching an IPv6 neighbor over Netlink - Add a new neighbor flag ("extern_valid"), which cedes refresh responsibilities to userspace. This is needed for EVPN multi-homing where a neighbor entry for a multi-homed host needs to be synced across all the VTEPs among which the host is multi-homed - Support NUD_PERMANENT for proxy neighbor entries - Add a new queuing discipline for IETF RFC9332 DualQ Coupled AQM - Add sequence numbers to netconsole messages. Unregister netconsole's console when all net targets are removed. Code refactoring. Add a number of selftests - Align IPSec inbound SA lookup to RFC 4301. Only SPI and protocol should be used for an inbound SA lookup - Support inspecting ref_tracker state via DebugFS - Don't force bonding advertisement frames tx to ~333 ms boundaries. Add broadcast_neighbor option to send ARP/ND on all bonded links - Allow providing upcall pid for the 'execute' command in openvswitch - Remove DCCP support from Netfilter's conntrack - Disallow multiple packet duplications in the queuing layer - Prevent use of deprecated iptables code on PREEMPT_RT Driver API: - Support RSS and hashing configuration over ethtool Netlink - Add dedicated ethtool callbacks for getting and setting hashing fields - Add support for power budget evaluation strategy in PSE / Power-over-Ethernet. Generate Netlink events for overcurrent etc - Support DPLL phase offset monitoring across all device inputs. Support providing clock reference and SYNC over separate DPLL inputs - Support traffic classes in devlink rate API for bandwidth management - Remove rtnl_lock dependency from UDP tunnel port configuration Device drivers: - Add a new Broadcom driver for 800G Ethernet (bnge) - Add a standalone driver for Microchip ZL3073x DPLL - Remove IBM's NETIUCV device driver - Ethernet high-speed NICs: - Broadcom (bnxt): - support zero-copy Tx of DMABUF memory - take page size into account for page pool recycling rings - Intel (100G, ice, idpf): - idpf: XDP and AF_XDP support preparations - idpf: add flow steering - add link_down_events statistic - clean up the TSPLL code - preparations for live VM migration - nVidia/Mellanox: - support zero-copy Rx/Tx interfaces (DMABUF and io_uring) - optimize context memory usage for matchers - expose serial numbers in devlink info - support PCIe congestion metrics - Meta (fbnic): - add 25G, 50G, and 100G link modes to phylink - support dumping FW logs - Marvell/Cavium: - support for CN20K generation of the Octeon chips - Amazon: - add HW clock (without timestamping, just hypervisor time access) - Ethernet virtual: - VirtIO net: - support segmentation of UDP-tunnel-encapsulated packets - Google (gve): - support packet timestamping and clock synchronization - Microsoft vNIC: - add handler for device-originated servicing events - allow dynamic MSI-X vector allocation - support Tx bandwidth clamping - Ethernet NICs consumer, and embedded: - AMD: - amd-xgbe: hardware timestamping and PTP clock support - Broadcom integrated MACs (bcmgenet, bcmasp): - use napi_complete_done() return value to support NAPI polling - add support for re-starting auto-negotiation - Broadcom switches (b53): - support BCM5325 switches - add bcm63xx EPHY power control - Synopsys (stmmac): - lots of code refactoring and cleanups - TI: - icssg-prueth: read firmware-names from device tree - icssg: PRP offload support - Microchip: - lan78xx: convert to PHYLINK for improved PHY and MAC management - ksz: add KSZ8463 switch support - Intel: - support similar queue priority scheme in multi-queue and time-sensitive networking (taprio) - support packet pre-emption in both - RealTek (r8169): - enable EEE at 5Gbps on RTL8126 - Airoha: - add PPPoE offload support - MDIO bus controller for Airoha AN7583 - Ethernet PHYs: - support for the IPQ5018 internal GE PHY - micrel KSZ9477 switch-integrated PHYs: - add MDI/MDI-X control support - add RX error counters - add cable test support - add Signal Quality Indicator (SQI) reporting - dp83tg720: improve reset handling and reduce link recovery time - support bcm54811 (and its MII-Lite interface type) - air_en8811h: support resume/suspend - support PHY counters for QCA807x and QCA808x - support WoL for QCA807x - CAN drivers: - rcar_canfd: support for Transceiver Delay Compensation - kvaser: report FW versions via devlink dev info - WiFi: - extended regulatory info support (6 GHz) - add statistics and beacon monitor for Multi-Link Operation (MLO) - support S1G aggregation, improve S1G support - add Radio Measurement action fields - support per-radio RTS threshold - some work around how FIPS affects wifi, which was wrong (RC4 is used by TKIP, not only WEP) - improvements for unsolicited probe response handling - WiFi drivers: - RealTek (rtw88): - IBSS mode for SDIO devices - RealTek (rtw89): - BT coexistence for MLO/WiFi7 - concurrent station + P2P support - support for USB devices RTL8851BU/RTL8852BU - Intel (iwlwifi): - use embedded PNVM in (to be released) FW images to fix compatibility issues - many cleanups (unused FW APIs, PCIe code, WoWLAN) - some FIPS interoperability - MediaTek (mt76): - firmware recovery improvements - more MLO work - Qualcomm/Atheros (ath12k): - fix scan on multi-radio devices - more EHT/Wi-Fi 7 features - encapsulation/decapsulation offload - Broadcom (brcm80211): - support SDIO 43751 device - Bluetooth: - hci_event: add support for handling LE BIG Sync Lost event - ISO: add socket option to report packet seqnum via CMSG - ISO: support SCM_TIMESTAMPING for ISO TS - Bluetooth drivers: - intel_pcie: support Function Level Reset - nxpuart: add support for 4M baudrate - nxpuart: implement powerup sequence, reset, FW dump, and FW loading" * tag 'net-next-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1742 commits) dpll: zl3073x: Fix build failure selftests: bpf: fix legacy netfilter options ipv6: annotate data-races around rt->fib6_nsiblings ipv6: fix possible infinite loop in fib6_info_uses_dev() ipv6: prevent infinite loop in rt6_nlmsg_size() ipv6: add a retry logic in net6_rt_notify() vrf: Drop existing dst reference in vrf_ip6_input_dst net/sched: taprio: align entry index attr validation with mqprio net: fsl_pq_mdio: use dev_err_probe selftests: rtnetlink.sh: remove esp4_offload after test vsock: remove unnecessary null check in vsock_getname() igb: xsk: solve negative overflow of nb_pkts in zerocopy mode stmmac: xsk: fix negative overflow of budget in zerocopy mode dt-bindings: ieee802154: Convert at86rf230.txt yaml format net: dsa: microchip: Disable PTP function of KSZ8463 net: dsa: microchip: Setup fiber ports for KSZ8463 net: dsa: microchip: Write switch MAC address differently for KSZ8463 net: dsa: microchip: Use different registers for KSZ8463 net: dsa: microchip: Add KSZ8463 switch support to KSZ DSA driver dt-bindings: net: dsa: microchip: Add KSZ8463 switch support ...
5 daysMerge tag 'soc-drivers-6.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "Changes are all over the place, but very little sticks out as noteworthy. There is a new misc driver for the Raspberry Pi 5's RP1 multifunction I/O chip, along with hooking it up to the pinctrl and clk frameworks. The reset controller and memory subsystems have mainly small updates, but there are two new reset drivers for the K230 and VC1800B SoCs, and new memory driver support for Tegra264. The ARM SMCCC and SCMI firmware drivers gain a few more features that should help them be supported across more environments. Similarly, the SoC specific firmware on Tegra and Qualcomm get minor enhancements and chip support. In the drivers/soc/ directory, the ASPEED LPC snoop driver gets an overhaul for code robustness, the Tegra and Qualcomm and NXP drivers grow to support more chips, while the Hisilicon, Mediatek and Renesas drivers see mostly janitorial fixes" * tag 'soc-drivers-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (100 commits) bus: del unnecessary init var soc: fsl: qe: convert set_multiple() to returning an integer pinctrl: rp1: use new GPIO line value setter callbacks soc: hisilicon: kunpeng_hccs: Fix incorrect log information dt-bindings: soc: qcom: qcom,pmic-glink: document Milos compatible dt-bindings: soc: qcom,aoss-qmp: document the Milos Always-On Subsystem side channel dt-bindings: firmware: qcom,scm: document Milos SCM Firmware Interface soc: qcom: socinfo: Add support to retrieve APPSBL build details soc: qcom: pmic_glink: fix OF node leak soc: qcom: spmi-pmic: add more PMIC SUBTYPE IDs soc: qcom: socinfo: Add PM7550 & PMIV0108 PMICs soc: qcom: socinfo: Add SoC IDs for SM7635 family dt-bindings: arm: qcom,ids: Add SoC IDs for SM7635 family firmware: qcom: scm: request the waitqueue irq *after* initializing SCM firmware: qcom: scm: initialize tzmem before marking SCM as available firmware: qcom: scm: take struct device as argument in SHM bridge enable firmware: qcom: scm: remove unused arguments from SHM bridge routines soc: qcom: rpmh-rsc: Add RSC version 4 support memory: tegra: Add Tegra264 MC and EMC support firmware: tegra: bpmp: Fix build failure for tegra264-only config ...
6 daysMerge tag 'kvm-x86-generic-6.17' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM generic changes for 6.17 - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions. - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL.
6 daysMerge tag 'kvm-x86-irqs-6.17' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM IRQ changes for 6.17 - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI. - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand. - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time. - Drop x86's irq_comm.c, and move a pile of IRQ related code into irq.c. - Fix a variety of flaws and bugs in the AVIC device posted IRQ code. - Inhibited AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation. - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry. - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs. - Dedup x86's device posted IRQ code, as the vast majority of functionality can be shared verbatime between SVM and VMX. - Harden the device posted IRQ code against bugs and runtime errors. - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n). - Generate GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU. - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs. - Clean up and document/comment the irqfd assignment code. - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique.
6 daysMerge tag 'for-6.17/block-20250728' of git://git.kernel.dk/linuxLinus Torvalds
Pull block updates from Jens Axboe: - MD pull request via Yu: - call del_gendisk synchronously (Xiao) - cleanup unused variable (John) - cleanup workqueue flags (Ryo) - fix faulty rdev can't be removed during resync (Qixing) - NVMe pull request via Christoph: - try PCIe function level reset on init failure (Keith Busch) - log TLS handshake failures at error level (Maurizio Lombardi) - pci-epf: do not complete commands twice if nvmet_req_init() fails (Rick Wertenbroek) - misc cleanups (Alok Tiwari) - Removal of the pktcdvd driver This has been more than a decade coming at this point, and some recently revealed breakages that had it causing issues even for cases where it isn't required made me re-pull the trigger on this one. It's known broken and nobody has stepped up to maintain the code - Series for ublk supporting batch commands, enabling the use of multishot where appropriate - Speed up ublk exit handling - Fix for the two-stage elevator fixing which could leak data - Convert NVMe to use the new IOVA based API - Increase default max transfer size to something more reasonable - Series fixing write operations on zoned DM devices - Add tracepoints for zoned block device operations - Prep series working towards improving blk-mq queue management in the presence of isolated CPUs - Don't allow updating of the block size of a loop device that is currently under exclusively ownership/open - Set chunk sectors from stacked device stripe size and use it for the atomic write size limit - Switch to folios in bcache read_super() - Fix for CD-ROM MRW exit flush handling - Various tweaks, fixes, and cleanups * tag 'for-6.17/block-20250728' of git://git.kernel.dk/linux: (94 commits) block: restore two stage elevator switch while running nr_hw_queue update cdrom: Call cdrom_mrw_exit from cdrom_release function sunvdc: Balance device refcount in vdc_port_mpgroup_check nvme-pci: try function level reset on init failure dm: split write BIOs on zone boundaries when zone append is not emulated block: use chunk_sectors when evaluating stacked atomic write limits dm-stripe: limit chunk_sectors to the stripe size md/raid10: set chunk_sectors limit md/raid0: set chunk_sectors limit block: sanitize chunk_sectors for atomic write limits ilog2: add max_pow_of_two_factor() nvmet: pci-epf: Do not complete commands twice if nvmet_req_init() fails nvme-tcp: log TLS handshake failures at error level docs: nvme: fix grammar in nvme-pci-endpoint-target.rst nvme: fix typo in status code constant for self-test in progress nvmet: remove redundant assignment of error code in nvmet_ns_enable() nvme: fix incorrect variable in io cqes error message nvme: fix multiple spelling and grammar issues in host drivers block: fix blk_zone_append_update_request_bio() kernel-doc md/raid10: fix set but not used variable in sync_request_write() ...
6 dayssched: Adapt sched tracepoints for RV task modelGabriele Monaco
Add the following tracepoint: * sched_set_need_resched(tsk, cpu, tif) Called when a task is set the need resched [lazy] flag Remove the unused ip parameter from sched_entry and sched_exit and alter sched_entry to have a value of preempt consistent with the one used in sched_switch. Also adapt all monitors using sched_{entry,exit} to avoid breaking build. These tracepoints are useful to describe the Linux task model and are adapted from the patches by Daniel Bristot de Oliveira (https://bristot.me/linux-task-model/). Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nam Cao <namcao@linutronix.de> Cc: Tomas Glozar <tglozar@redhat.com> Cc: Juri Lelli <jlelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250728135022.255578-7-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
6 daysMerge tag 'vfs-6.17-rc1.fallocate' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull fallocate updates from Christian Brauner: "fallocate() currently supports creating preallocated files efficiently. However, on most filesystems fallocate() will preallocate blocks in an unwriten state even if FALLOC_FL_ZERO_RANGE is specified. The extent state must later be converted to a written state when the user writes data into this range, which can trigger numerous metadata changes and journal I/O. This may leads to significant write amplification and performance degradation in synchronous write mode. At the moment, the only method to avoid this is to create an empty file and write zero data into it (for example, using 'dd' with a large block size). However, this method is slow and consumes a considerable amount of disk bandwidth. Now that more and more flash-based storage devices are available it is possible to efficiently write zeros to SSDs using the unmap write zeroes command if the devices do not write physical zeroes to the media. For example, if SCSI SSDs support the UMMAP bit or NVMe SSDs support the DEAC bit[1], the write zeroes command does not write actual data to the device, instead, NVMe converts the zeroed range to a deallocated state, which works fast and consumes almost no disk write bandwidth. This series implements the BLK_FEAT_WRITE_ZEROES_UNMAP feature and BLK_FLAG_WRITE_ZEROES_UNMAP_DISABLED flag for SCSI, NVMe and device-mapper drivers, and add the FALLOC_FL_WRITE_ZEROES and STATX_ATTR_WRITE_ZEROES_UNMAP support for ext4 and raw bdev devices. fallocate() is subsequently extended with the FALLOC_FL_WRITE_ZEROES flag. FALLOC_FL_WRITE_ZEROES zeroes a specified file range in such a way that subsequent writes to that range do not require further changes to the file mapping metadata. This flag is beneficial for subsequent pure overwriting within this range, as it can save on block allocation and, consequently, significant metadata changes" * tag 'vfs-6.17-rc1.fallocate' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ext4: add FALLOC_FL_WRITE_ZEROES support block: add FALLOC_FL_WRITE_ZEROES support block: factor out common part in blkdev_fallocate() fs: introduce FALLOC_FL_WRITE_ZEROES to fallocate dm: clear unmap write zeroes limits when disabling write zeroes scsi: sd: set max_hw_wzeroes_unmap_sectors if device supports SD_ZERO_*_UNMAP nvmet: set WZDS and DRB if device enables unmap write zeroes operation nvme: set max_hw_wzeroes_unmap_sectors if device supports DEAC bit block: introduce max_{hw|user}_wzeroes_unmap_sectors to queue limits
6 daysMerge tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "NFSD is finally able to offer write delegations to clients that open files with O_WRONLY, thanks to patches from Dai Ngo. We're expecting this to accelerate a few interesting corner cases. The cap on the number of operations per NFSv4 COMPOUND has been lifted. Now, clients that send COMPOUNDs containing dozens of operations (for example, a long stream of LOOKUP operations to walk a pathname in a single round trip) will no longer be rejected. This release re-enables the ability for NFSD to perform NFSv4.2 COPY operations asynchronously. This feature has been disabled to mitigate the risk of denial-of-service when too many such requests arrive. Many thanks to the contributors, reviewers, testers, and bug reporters who participated during the v6.17 development cycle" * tag 'nfsd-6.17' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (32 commits) nfsd: Drop dprintk in blocklayout xdr functions sunrpc: make svc_tcp_sendmsg() take a signed sentp pointer sunrpc: rearrange struct svc_rqst for fewer cachelines sunrpc: return better error in svcauth_gss_accept() on alloc failure sunrpc: reset rq_accept_statp when starting a new RPC sunrpc: remove SVC_SYSERR sunrpc: fix handling of unknown auth status codes NFSD: Simplify struct knfsd_fh NFSD: Access a knfsd_fh's fsid by pointer Revert "NFSD: Force all NFSv4.2 COPY requests to be synchronous" NFSD: Avoid multiple -Wflex-array-member-not-at-end warnings NFSD: Use vfs_iocb_iter_write() NFSD: Use vfs_iocb_iter_read() NFSD: Clean up kdoc for nfsd_open_local_fh() NFSD: Clean up kdoc for nfsd_file_put_local() NFSD: Remove definition for trace_nfsd_ctl_maxconn NFSD: Remove definition for trace_nfsd_file_gc_recent NFSD: Remove definitions for unused trace_nfsd_file_lru trace points NFSD: Remove definition for trace_nfsd_file_unhash_and_queue nfsd: Use correct error code when decoding extents ...
8 daysmm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info()Zi Yan
The trace event has not recorded the right data since it was introduced at commit c8b360031218 ("mm: add alloc_contig_migrate_range allocation statistics"). Remove it. Link: https://lkml.kernel.org/r/20250722194649.4135191-1-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202507220742.P3SaKlI6-lkp@intel.com/ Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Brendan Jackman <jackmanb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Martin Liu <liumartin@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Richard Chang <richardycc@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 daystracing: sched: Hide numa events under CONFIG_NUMA_BALANCINGSteven Rostedt
The events sched_move_numa, sched_stick_numa and sched_swap_numa are only called when CONFIG_NUMA_BALANCING is configured. As each event can take up to 5K of memory in text and meta data regardless if they are used or not, they should not be defined when unused. Move the #ifdef CONFIG_NUMA_BALANCING to hide these events as well. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20250612100552.39672cf9@batman.local.home Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
10 daysext4: get rid of some obsolete EXT4_MB_HINT flagsBaokun Li
Since nobody has used these EXT4_MB_HINT flags for ages, let's remove them. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-7-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
10 dayspowerpc/thp: tracing: Hide hugepage events under CONFIG_PPC_BOOK3S_64Steven Rostedt
The events hugepage_set_pmd, hugepage_set_pud, hugepage_update_pmd and hugepage_update_pud are only called when CONFIG_PPC_BOOK3S_64 is defined. As each event can take up to 5K regardless if they are used or not, it's best not to define them when they are not used. Add #ifdef around these events when they are not used. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/20250612101259.0ad43e48@batman.local.home Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
10 daysMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc8). Conflicts: drivers/net/ethernet/microsoft/mana/gdma_main.c 9669ddda18fb ("net: mana: Fix warnings for missing export.h header inclusion") 755391121038 ("net: mana: Allocate MSI-X vectors dynamically") https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au Adjacent changes: drivers/net/ethernet/ti/icssg/icssg_prueth.h 6e86fb73de0f ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG") ffe8a4909176 ("net: ti: icssg-prueth: Read firmware-names from device tree") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
11 daystracing: arm: arm64: Hide trace events ipi_raise, ipi_entry and ipi_exitSteven Rostedt
The ipi tracepoints are mostly generic, but the tracepoints ipi_raise, ipi_entry and ipi_exit are only used by arm and arm64. This means these trace events are wasting memory in all the other architectures that do not use them. Add CONFIG_HAVE_EXTRA_IPI_TRACEPOINTS and have arm and arm64 select it to enable these trace events. The config makes it easy if other architectures decide to trace these as well. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Will Deacon <will@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Nicolas Pitre <nico@fluxnic.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20250722103714.64eba013@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
12 daystcp: trace retransmit failures in tcp_retransmit_skbFan Yu
Background ========== When TCP retransmits a packet due to missing ACKs, the retransmission may fail for various reasons (e.g., packets stuck in driver queues, receiver zero windows, or routing issues). The original tcp_retransmit_skb tracepoint: 'commit e086101b150a ("tcp: add a tracepoint for tcp retransmission")' lacks visibility into these failure causes, making production diagnostics difficult. Solution ======== Adds the retval("err") to the tcp_retransmit_skb tracepoint. Enables users to know why some tcp retransmission failed and users can filter retransmission failures by retval. Compatibility description ========================= This patch extends the tcp_retransmit_skb tracepoint by adding a new "err" field at the end of its existing structure (within TP_STRUCT__entry). The compatibility implications are detailed as follows: 1) Structural compatibility for legacy user-space tools Legacy tools/BPF programs accessing existing fields (by offset or name) can still work without modification or recompilation.The new field is appended to the end, preserving original memory layout. 2) Note: semantic changes The original tracepoint primarily only focused on successfully retransmitted packets. With this patch, the tracepoint now can figure out packets that may terminate early due to specific reasons. For accurate statistics, users should filter using "err" to distinguish outcomes. Before patched: field:const void * skbaddr; offset:8; size:8; signed:0; field:const void * skaddr; offset:16; size:8; signed:0; field:int state; offset:24; size:4; signed:1; field:__u16 sport; offset:28; size:2; signed:0; field:__u16 dport; offset:30; size:2; signed:0; field:__u16 family; offset:32; size:2; signed:0; field:__u8 saddr[4]; offset:34; size:4; signed:0; field:__u8 daddr[4]; offset:38; size:4; signed:0; field:__u8 saddr_v6[16]; offset:42; size:16; signed:0; field:__u8 daddr_v6[16]; offset:58; size:16; signed:0; print fmt: "skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s" After patched: field:const void * skbaddr; offset:8; size:8; signed:0; field:const void * skaddr; offset:16; size:8; signed:0; field:int state; offset:24; size:4; signed:1; field:__u16 sport; offset:28; size:2; signed:0; field:__u16 dport; offset:30; size:2; signed:0; field:__u16 family; offset:32; size:2; signed:0; field:__u8 saddr[4]; offset:34; size:4; signed:0; field:__u8 daddr[4]; offset:38; size:4; signed:0; field:__u8 saddr_v6[16]; offset:42; size:16; signed:0; field:__u8 daddr_v6[16]; offset:58; size:16; signed:0; field:int err; offset:76; size:4; signed:1; print fmt: "skbaddr=%p skaddr=%p family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c state=%s err=%d" Co-developed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Fan Yu <fan.yu9@zte.com.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20250721111607626_BDnIJB0ywk6FghN63bor@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
13 daysbtrfs: tree-log: add and rename extent bits for dirty_log_pages treeDavid Sterba
The dirty_log_pages tree is used for tree logging and marks extents based on log_transid. The bits could be renamed to resemble the LOG1/LOG2 naming used for the BTRFS_FS_LOG1_ERR bits. The DIRTY bit is renamed to LOG1 and NEW to LOG2. Signed-off-by: David Sterba <dsterba@suse.com>
13 daysbtrfs: use refcount_t type for the extent buffer reference counterFilipe Manana
Instead of using a bare atomic, use the refcount_t type, which despite being a structure that contains only an atomic, has an API that checks for underflows and other hazards. This doesn't change the size of the extent_buffer structure. This removes the need to do things like this: WARN_ON(atomic_read(&eb->refs) == 0); if (atomic_dec_and_test(&eb->refs)) { (...) } And do just: if (refcount_dec_and_test(&eb->refs)) { (...) } Since refcount_dec_and_test() already triggers a warning when we decrement a ref count that has a value of 0 (or below zero). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
13 daysPM: tracing: Hide power_domain_target event under ARCH_OMAP2PLUSSteven Rostedt
The power_domain_target event event is only called when CONFIG_OMAP2PLUS is defined. As each event can take up to 5K regardless if they are used or not, it's best not to define them when they are not used. Add #ifdef around these events when they are not used. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/20250612145408.415483176@goodmis.org Acked-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
13 daysPM: tracing: Hide device_pm_callback events under PM_SLEEPSteven Rostedt
The events device_pm_callback_start and device_pm_callback_end events are only called when CONFIG_PM_SLEEP is defined. As each event can take up to 5K regardless if they are used or not, it's best not to define them when they are not used. Add #ifdef around these events when they are not used. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/20250612145408.246703478@goodmis.org Acked-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
13 daysPM: tracing: Hide psci_domain_idle events under ARM_PSCI_CPUIDLESteven Rostedt
The events psci_domain_idle_enter and psci_domain_idle_exit events are only called when CONFIG_ARM_PSCI_CPUIDLE is defined. As each event can take up to 5K (less for DEFINE_EVENT()) regardless if they are used or not, it's best not to define them when they are not used. Add #ifdef around these events when they are not used. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/20250612145408.074769245@goodmis.org Acked-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
13 daysPM: cpufreq: powernv/tracing: Move powernv_throttle trace eventSteven Rostedt
As the trace event powernv_throttle is only used by the powernv code, move it to a separate include file and have that code directly enable it. Trace events can take up around 5K of memory when they are defined regardless if they are used or not. It wastes memory to have them defined in configurations where the tracepoint is not used. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/20250612145407.906308844@goodmis.org Fixes: 0306e481d479a ("cpufreq: powernv/tracing: Add powernv_throttle tracepoint") Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
13 daysalarmtimer: Hide alarmtimer_suspend event when RTC_CLASS is not configuredSteven Rostedt
The trace event alarmtimer_suspend is only called when RTC_CLASS is defined. As every event created can create up to 5K of text and meta data regardless if it is called or not it should not be created and waste memory. Hide the event when CONFIG_RTC_CLASS is not defined. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/20250612095828.6d75dfa3@batman.local.home Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
14 daysMerge tag 'scmi-updates-6.17' of ↵Arnd Bergmann
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers Arm SCMI updates for v6.17 1. A fix is introduced to correct turbo frequency marking for 64-bit devices with sustained frequencies over 4GHz, ensuring accurate turbo frequency identification. 2. Debug capabilities are being improved by introducing in-flight transfer tracking using debug counters, which help diagnose transfer congestion and behavior. Additional tracepoints are added to log in-flight counts at transfer begin and end, offering better runtime insight. The debug counters now support decrement operations using a newly added scmi_dec_count helper, making counter tracking symmetric and more robust. 3. A race condition in suspend-resume logic is being resolved by ensuring SCMI_SYSPOWER_IDLE state is set early during resume, improving suspend reliability under certain conditions. New suspend and resume operations are added to the scmi_bus_type to enable finer power management control for SCMI-based devices. 4. Finally enhancements are also made to avoid registering notifiers for events that a platform does not support, reducing unnecessary overhead by checking for unsupported event types during protocolinitialization. * tag 'scmi-updates-6.17' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_scmi: Convert to SYSTEM_SLEEP_PM_OPS firmware: arm_scmi: Avoid notifier registration for unsupported events firmware: arm_scmi: power_control: Ensure SCMI_SYSPOWER_IDLE is set early during resume firmware: arm_scmi: Add power management operations to SCMI bus include: trace: Add tracepoint support for inflight xfer count firmware: arm_scmi: Track number of inflight SCMI transfers firmware: arm_scmi: Add support for debug counter decrement firmware: arm_scmi: Fix up turbo frequencies selection Link: https://lore.kernel.org/r/20250709122907.1171913-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2025-07-19Merge tag 'vfs-6.16-rc7.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a memory leak in fcntl_dirnotify() - Raise SB_I_NOEXEC on secrement superblock instead of messing with flags on the mount - Add fsdevel and block mailing lists to uio entry. We had a few instances were very questionable stuff was added without either block or the VFS being aware of it - Fix netfs copy-to-cache so that it performs collection with ceph+fscache - Fix netfs race between cache write completion and ALL_QUEUED being set - Verify the inode mode when loading entries from disk in isofs - Avoid state_lock in iomap_set_range_uptodate() - Fix PIDFD_INFO_COREDUMP check in PIDFD_GET_INFO ioctl - Fix the incorrect return value in __cachefiles_write() * tag 'vfs-6.16-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: MAINTAINERS: add block and fsdevel lists to iov_iter netfs: Fix race between cache write completion and ALL_QUEUED being set netfs: Fix copy-to-cache so that it performs collection with ceph+fscache fix a leak in fcntl_dirnotify() iomap: avoid unnecessary ifs_set_range_uptodate() with locks isofs: Verify inode mode when loading from disk cachefiles: Fix the incorrect return value in __cachefiles_write() secretmem: use SB_I_NOEXEC coredump: fix PIDFD_INFO_COREDUMP ioctl check
2025-07-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml 880d43ca9aa4 ("netlink: specs: clean up spaces in brackets") af52020fc599 ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c a44312d58e78 ("net: phy: Don't register LEDs for genphy") f0f2b992d818 ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c 5fde0fcbd760 ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") ea045a0de3b9 ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c ae3264a25a46 ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") a8594c956cc9 ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix notification vs call-release vs recvmsgDavid Howells
When a call is released, rxrpc takes the spinlock and removes it from ->recvmsg_q in an effort to prevent racing recvmsg() invocations from seeing the same call. Now, rxrpc_recvmsg() only takes the spinlock when actually removing a call from the queue; it doesn't, however, take it in the lead up to that when it checks to see if the queue is empty. It *does* hold the socket lock, which prevents a recvmsg/recvmsg race - but this doesn't prevent sendmsg from ending the call because sendmsg() drops the socket lock and relies on the call->user_mutex. Fix this by firstly removing the bit in rxrpc_release_call() that dequeues the released call and, instead, rely on recvmsg() to simply discard released calls (done in a preceding fix). Secondly, rxrpc_notify_socket() is abandoned if the call is already marked as released rather than trying to be clever by setting both pointers in call->recvmsg_link to NULL to trick list_empty(). This isn't perfect and can still race, resulting in a released call on the queue, but recvmsg() will now clean that up. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-4-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-17rxrpc: Fix recv-recv race of completed callDavid Howells
If a call receives an event (such as incoming data), the call gets placed on the socket's queue and a thread in recvmsg can be awakened to go and process it. Once the thread has picked up the call off of the queue, further events will cause it to be requeued, and once the socket lock is dropped (recvmsg uses call->user_mutex to allow the socket to be used in parallel), a second thread can come in and its recvmsg can pop the call off the socket queue again. In such a case, the first thread will be receiving stuff from the call and the second thread will be blocked on call->user_mutex. The first thread can, at this point, process both the event that it picked call for and the event that the second thread picked the call for and may see the call terminate - in which case the call will be "released", decoupling the call from the user call ID assigned to it (RXRPC_USER_CALL_ID in the control message). The first thread will return okay, but then the second thread will wake up holding the user_mutex and, if it sees that the call has been released by the first thread, it will BUG thusly: kernel BUG at net/rxrpc/recvmsg.c:474! Fix this by just dequeuing the call and ignoring it if it is seen to be already released. We can't tell userspace about it anyway as the user call ID has become stale. Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Reported-by: Junvyyang, Tencent Zhuque Lab <zhuque@tencent.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: LePremierHomme <kwqcheii@proton.me> cc: Marc Dionne <marc.dionne@auristor.com> cc: Simon Horman <horms@kernel.org> cc: linux-afs@lists.infradead.org Link: https://patch.msgid.link/20250717074350.3767366-3-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-16block: fix blk_zone_append_update_request_bio() kernel-docJohannes Thumshirn
Stephen reported new 'make htmldocs' warnings introduced by 4cc21a00762b ("block: add tracepoint for blk_zone_update_request_bio"). One is a wrong function name in the tracepoint's kernel-doc and one is a wrong function parameter. Fix these so 'make htmldocs' is warning free again for the block layer tracepoints. Fixes: 4cc21a00762b ("block: add tracepoint for blk_zone_update_request_bio") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250716133631.94898-1-johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-15block: add trace messages to zone write pluggingJohannes Thumshirn
Add tracepoints to zone write plugging plug and unplug events. Examples for these events are: kworker/u10:4-393 [001] d..1. 282.991660: disk_zone_wplug_add_bio: 8,0 zone 16, BIO 8388608 + 128 kworker/0:1H-58 [ [000] d..1. 283.083294: blk_zone_wplug_bio: 8,0 zone 15, BIO 7864320 + 128 Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250715115324.53308-6-johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-15block: add tracepoint for blkdev_zone_mgmtJohannes Thumshirn
Add a tracepoint for blkdev_zone_mgmt to trace zone management commands submitted by higher layers like file systems or user space. An example output for this tracepoint is as follows: mkfs.btrfs-203 [001] ..... 42.877493: blkdev_zone_mgmt: 8,0 ZRS 5242880 + 0 This example output shows a REQ_OP_ZONE_RESET operation submitted by mkfs.btrfs. Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250715115324.53308-5-johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-15block: add tracepoint for blk_zone_update_request_bioJohannes Thumshirn
Add a tracepoint in blk_zone_update_request_bio() to trace the bio sector update on ZONE APPEND completions. An example for this tracepoint is as follows: <idle>-0 [001] d.h1. 381.746444: blk_zone_update_request_bio: 259,5 ZAS 131072 () 1048832 + 256 none,0,0 [swapper/1] Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250715115324.53308-4-johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-15blktrace: add zoned block commands to blk_fill_rwbsJohannes Thumshirn
Add zoned block commands to blk_fill_rwbs: - ZONE APPEND will be decoded as 'ZA' - ZONE RESET will be decoded as 'ZR' - ZONE RESET ALL will be decoded as 'ZRA' - ZONE FINISH will be decoded as 'ZF' - ZONE OPEN will be decoded as 'ZO' - ZONE CLOSE will be decoded as 'ZC' Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20250715115324.53308-2-johannes.thumshirn@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-14sunrpc: remove SVC_SYSERRJeff Layton
Nothing returns this error code. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14sunrpc: new tracepoints around svc thread wakeupsJeff Layton
Convert the svc_wake_up tracepoint into svc_pool_thread_event class. Have it also record the pool id, and add new tracepoints for when the thread is already running and for when there are no idle threads. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-07-14netfs: Fix race between cache write completion and ALL_QUEUED being setDavid Howells
When netfslib is issuing subrequests, the subrequests start processing immediately and may complete before we reach the end of the issuing function. At the end of the issuing function we set NETFS_RREQ_ALL_QUEUED to indicate to the collector that we aren't going to issue any more subreqs and that it can do the final notifications and cleanup. Now, this isn't a problem if the request is synchronous (NETFS_RREQ_OFFLOAD_COLLECTION is unset) as the result collection will be done in-thread and we're guaranteed an opportunity to run the collector. However, if the request is asynchronous, collection is primarily triggered by the termination of subrequests queuing it on a workqueue. Now, a race can occur here if the app thread sets ALL_QUEUED after the last subrequest terminates. This can happen most easily with the copy2cache code (as used by Ceph) where, in the collection routine of a read request, an asynchronous write request is spawned to copy data to the cache. Folios are added to the write request as they're unlocked, but there may be a delay before ALL_QUEUED is set as the write subrequests may complete before we get there. If all the write subreqs have finished by the ALL_QUEUED point, no further events happen and the collection never happens, leaving the request hanging. Fix this by queuing the collector after setting ALL_QUEUED. This is a bit heavy-handed and it may be sufficient to do it only if there are no extant subreqs. Also add a tracepoint to cross-reference both requests in a copy-to-request operation and add a trace to the netfs_rreq tracepoint to indicate the setting of ALL_QUEUED. Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item") Reported-by: Max Kellermann <max.kellermann@ionos.com> Link: https://lore.kernel.org/r/CAKPOu+8z_ijTLHdiCYGU_Uk7yYD=shxyGLwfe-L7AV3DhebS3w@mail.gmail.com/ Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/20250711151005.2956810-3-dhowells@redhat.com Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> cc: Paulo Alcantara <pc@manguebit.org> cc: Viacheslav Dubeyko <slava@dubeyko.com> cc: Alex Markuze <amarkuze@redhat.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: netfs@lists.linux.dev cc: ceph-devel@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-13ext4: enhance tracepoints during the folios writebackZhang Yi
After mpage_map_and_submit_extent() supports restarting handle if credits are insufficient during allocating blocks, it is more likely to exit the current mapping iteration and continue to process the current processing partially mapped folio again. The existing tracepoints are not sufficient to track this situation, so enhance the tracepoints to track the writeback position and the return value before and after submitting the folios. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-13ext4: process folios writeback in bytesZhang Yi
Since ext4 supports large folios, processing writebacks in pages is no longer appropriate, it can be modified to process writebacks in bytes. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250707140814.542883-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-13mm/damon: add trace event for effective size quotaSeongJae Park
Aim-oriented DAMOS quota auto-tuning is an important and recommended feature for DAMOS users. Add a trace event for the observability of the tuned quota and tuning itself. [sj@kernel.org: initialize sidx in damos_trace_esz()] Link: https://lkml.kernel.org/r/20250705172003.52324-1-sj@kernel.org [sj@kernel.org: make damos_esz unconditional trace event] Link: https://lkml.kernel.org/r/20250709182843.35812-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250704221408.38510-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13mm/damon: add trace event for auto-tuned monitoring intervalsSeongJae Park
Patch series "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota". The aim-oriented auto-tuning features for monitoring intervals and DAMOS quota are important and recommended. Add tracepoints for observabilities of those tuned values and the tuning itself. This patch (of 2): Aim-oriented monitoring intervals auto-tuning is an important and recommended feature for DAMON users. Add a trace event for the observability of the tuned intervals and tuning itself. Link: https://lkml.kernel.org/r/20250704221408.38510-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250704221408.38510-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-13mm/page_isolation: remove migratetype parameter from more functionsZi Yan
migratetype is no longer overwritten during pageblock isolation, start_isolate_page_range(), has_unmovable_pages(), and set_migratetype_isolate() no longer need which migratetype to restore during isolation failure. For has_unmoable_pages(), it needs to know if the isolation is for CMA allocation, so adding PB_ISOLATE_MODE_CMA_ALLOC provide the information. At the same time change isolation flags to enum pb_isolate_mode (PB_ISOLATE_MODE_MEM_OFFLINE, PB_ISOLATE_MODE_CMA_ALLOC, PB_ISOLATE_MODE_OTHER). Remove REPORT_FAILURE and check PB_ISOLATE_MODE_MEM_OFFLINE, since only PB_ISOLATE_MODE_MEM_OFFLINE reports isolation failures. alloc_contig_range() no longer needs migratetype. Replace it with a newly defined acr_flags_t to tell if an allocation is for CMA. So does __alloc_contig_migrate_range(). Add ACR_FLAGS_NONE (set to 0) to indicate ordinary allocations. Link: https://lkml.kernel.org/r/20250617021115.2331563-7-ziy@nvidia.com Signed-off-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shuemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Richard Chang <richardycc@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.16-rc6). No conflicts. Adjacent changes: Documentation/devicetree/bindings/net/allwinner,sun8i-a83t-emac.yaml 0a12c435a1d6 ("dt-bindings: net: sun8i-emac: Add A100 EMAC compatible") b3603c0466a8 ("dt-bindings: net: sun8i-emac: Rename A523 EMAC0 to GMAC0") Signed-off-by: Jakub Kicinski <kuba@kernel.org>