linux-arm.git - Russell King's ARM Linux kernel tree

Age	Commit message (Collapse)	Author
2025-07-31	Merge tag 'mm-stable-2025-07-30-15-25' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "As usual, many cleanups. The below blurbiage describes 42 patchsets. 21 of those are partially or fully cleanup work. "cleans up", "cleanup", "maintainability", "rationalizes", etc. I never knew the MM code was so dirty. "mm: ksm: prevent KSM from breaking merging of new VMAs" (Lorenzo Stoakes) addresses an issue with KSM's PR_SET_MEMORY_MERGE mode: newly mapped VMAs were not eligible for merging with existing adjacent VMAs. "mm/damon: introduce DAMON_STAT for simple and practical access monitoring" (SeongJae Park) adds a new kernel module which simplifies the setup and usage of DAMON in production environments. "stop passing a writeback_control to swap/shmem writeout" (Christoph Hellwig) is a cleanup to the writeback code which removes a couple of pointers from struct writeback_control. "drivers/base/node.c: optimization and cleanups" (Donet Tom) contains largely uncorrelated cleanups to the NUMA node setup and management code. "mm: userfaultfd: assorted fixes and cleanups" (Tal Zussman) does some maintenance work on the userfaultfd code. "Readahead tweaks for larger folios" (Ryan Roberts) implements some tuneups for pagecache readahead when it is reading into order>0 folios. "selftests/mm: Tweaks to the cow test" (Mark Brown) provides some cleanups and consistency improvements to the selftests code. "Optimize mremap() for large folios" (Dev Jain) does that. A 37% reduction in execution time was measured in a memset+mremap+munmap microbenchmark. "Remove zero_user()" (Matthew Wilcox) expunges zero_user() in favor of the more modern memzero_page(). "mm/huge_memory: vmf_insert_folio_() and vmf_insert_pfn_pud() fixes" (David Hildenbrand) addresses some warts which David noticed in the huge page code. These were not known to be causing any issues at this time. "mm/damon: use alloc_migrate_target() for DAMOS_MIGRATE_{HOT,COLD" (SeongJae Park) provides some cleanup and consolidation work in DAMON. "use vm_flags_t consistently" (Lorenzo Stoakes) uses vm_flags_t in places where we were inappropriately using other types. "mm/memfd: Reserve hugetlb folios before allocation" (Vivek Kasireddy) increases the reliability of large page allocation in the memfd code. "mm: Remove pXX_devmap page table bit and pfn_t type" (Alistair Popple) removes several now-unneeded PFN_ flags. "mm/damon: decouple sysfs from core" (SeongJae Park) implememnts some cleanup and maintainability work in the DAMON sysfs layer. "madvise cleanup" (Lorenzo Stoakes) does quite a lot of cleanup/maintenance work in the madvise() code. "madvise anon_name cleanups" (Vlastimil Babka) provides additional cleanups on top or Lorenzo's effort. "Implement numa node notifier" (Oscar Salvador) creates a standalone notifier for NUMA node memory state changes. Previously these were lumped under the more general memory on/offline notifier. "Make MIGRATE_ISOLATE a standalone bit" (Zi Yan) cleans up the pageblock isolation code and fixes a potential issue which doesn't seem to cause any problems in practice. "selftests/damon: add python and drgn based DAMON sysfs functionality tests" (SeongJae Park) adds additional drgn- and python-based DAMON selftests which are more comprehensive than the existing selftest suite. "Misc rework on hugetlb faulting path" (Oscar Salvador) fixes a rather obscure deadlock in the hugetlb fault code and follows that fix with a series of cleanups. "cma: factor out allocation logic from __cma_declare_contiguous_nid" (Mike Rapoport) rationalizes and cleans up the highmem-specific code in the CMA allocator. "mm/migration: rework movable_ops page migration (part 1)" (David Hildenbrand) provides cleanups and future-preparedness to the migration code. "mm/damon: add trace events for auto-tuned monitoring intervals and DAMOS quota" (SeongJae Park) adds some tracepoints to some DAMON auto-tuning code. "mm/damon: fix misc bugs in DAMON modules" (SeongJae Park) does that. "mm/damon: misc cleanups" (SeongJae Park) also does what it claims. "mm: folio_pte_batch() improvements" (David Hildenbrand) cleans up the large folio PTE batching code. "mm/damon/vaddr: Allow interleaving in migrate_{hot,cold} actions" (SeongJae Park) facilitates dynamic alteration of DAMON's inter-node allocation policy. "Remove unmap_and_put_page()" (Vishal Moola) provides a couple of page->folio conversions. "mm: per-node proactive reclaim" (Davidlohr Bueso) implements a per-node control of proactive reclaim - beyond the current memcg-based implementation. "mm/damon: remove damon_callback" (SeongJae Park) replaces the damon_callback interface with a more general and powerful damon_call()+damos_walk() interface. "mm/mremap: permit mremap() move of multiple VMAs" (Lorenzo Stoakes) implements a number of mremap cleanups (of course) in preparation for adding new mremap() functionality: newly permit the remapping of multiple VMAs when the user is specifying MREMAP_FIXED. It still excludes some specialized situations where this cannot be performed reliably. "drop hugetlb_free_pgd_range()" (Anthony Yznaga) switches some sparc hugetlb code over to the generic version and removes the thus-unneeded hugetlb_free_pgd_range(). "mm/damon/sysfs: support periodic and automated stats update" (SeongJae Park) augments the present userspace-requested update of DAMON sysfs monitoring files. Automatic update is now provided, along with a tunable to control the update interval. "Some randome fixes and cleanups to swapfile" (Kemeng Shi) does what is claims. "mm: introduce snapshot_page" (Luiz Capitulino and David Hildenbrand) provides (and uses) a means by which debug-style functions can grab a copy of a pageframe and inspect it locklessly without tripping over the races inherent in operating on the live pageframe directly. "use per-vma locks for /proc/pid/maps reads" (Suren Baghdasaryan) addresses the large contention issues which can be triggered by reads from that procfs file. Latencies are reduced by more than half in some situations. The series also introduces several new selftests for the /proc/pid/maps interface. "__folio_split() clean up" (Zi Yan) cleans up __folio_split()! "Optimize mprotect() for large folios" (Dev Jain) provides some quite large (>3x) speedups to mprotect() when dealing with large folios. "selftests/mm: reuse FORCE_READ to replace "asm volatile("" : "+r" (XXX));" and some cleanup" (wang lian) does some cleanup work in the selftests code. "tools/testing: expand mremap testing" (Lorenzo Stoakes) extends the mremap() selftest in several ways, including adding more checking of Lorenzo's recently added "permit mremap() move of multiple VMAs" feature. "selftests/damon/sysfs.py: test all parameters" (SeongJae Park) extends the DAMON sysfs interface selftest so that it tests all possible user-requested parameters. Rather than the present minimal subset" * tag 'mm-stable-2025-07-30-15-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (370 commits) MAINTAINERS: add missing headers to mempory policy & migration section MAINTAINERS: add missing file to cgroup section MAINTAINERS: add MM MISC section, add missing files to MISC and CORE MAINTAINERS: add missing zsmalloc file MAINTAINERS: add missing files to page alloc section MAINTAINERS: add missing shrinker files MAINTAINERS: move memremap.[ch] to hotplug section MAINTAINERS: add missing mm_slot.h file THP section MAINTAINERS: add missing interval_tree.c to memory mapping section MAINTAINERS: add missing percpu-internal.h file to per-cpu section mm/page_alloc: remove trace_mm_alloc_contig_migrate_range_info() selftests/damon: introduce _common.sh to host shared function selftests/damon/sysfs.py: test runtime reduction of DAMON parameters selftests/damon/sysfs.py: test non-default parameters runtime commit selftests/damon/sysfs.py: generalize DAMON context commit assertion selftests/damon/sysfs.py: generalize monitoring attributes commit assertion selftests/damon/sysfs.py: generalize DAMOS schemes commit assertion selftests/damon/sysfs.py: test DAMOS filters commitment selftests/damon/sysfs.py: generalize DAMOS scheme commit assertion selftests/damon/sysfs.py: test DAMOS destinations commitment ...
2025-07-09	mm: update architecture and driver code to use vm_flags_t	Lorenzo Stoakes
	In future we intend to change the vm_flags_t type, so it isn't correct for architecture and driver code to assume it is unsigned long. Correct this assumption across the board. Overall, this patch does not introduce any functional change. Link: https://lkml.kernel.org/r/b6eb1894abc5555ece80bb08af5c022ef780c8bc.1750274467.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64] Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-06-23	KVM: PPC: Book3S HV: Add H_VIRT mapping for tracing exits	Gautam Menghani
	The macro kvm_trace_symbol_exit is used for providing the mappings for the trap vectors and their names. Add mapping for H_VIRT so that trap reason is displayed as string instead of a vector number when using the kvm_guest_exit tracepoint. Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250516121225.276466-1-gautam@linux.ibm.com
2025-06-08	treewide, timers: Rename from_timer() to timer_container_of()	Ingo Molnar
	Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-05-27	Merge tag 'timers-cleanups-2025-05-25' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "Another set of timer API cleanups: - Convert init_timer(), try_to_del_timer_sync() and destroy_timer_on_stack() over to the canonical timer_() namespace convention. There is another large conversion pending, which has not been included because it would have caused a gazillion of merge conflicts in next. The conversion scripts will be run towards the end of the merge window and a pull request sent once all conflict dependencies have been merged" * tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack() treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try() timers: Rename init_timers() as timers_init() timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA timers: Rename __init_timer_on_stack() as __timer_init_on_stack() timers: Rename __init_timer() as __timer_init() timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack() timers: Rename init_timer_key() as timer_init_key()
2025-05-12	KVM: PPC: Book3S HV: Fix IRQ map warnings with XICS on pSeries KVM Guest	Amit Machhiwal
	The commit 9576730d0e6e ("KVM: PPC: select IRQ_BYPASS_MANAGER") enabled IRQ_BYPASS_MANAGER when CONFIG_KVM was set. Subsequently, commit c57875f5f9be ("KVM: PPC: Book3S HV: Enable IRQ bypass") enabled IRQ bypass and added the necessary callbacks to create/remove the mappings between host real IRQ and the guest GSI. The availability of IRQ bypass is determined by the arch-specific function kvm_arch_has_irq_bypass(), which invokes kvmppc_irq_bypass_add_producer_hv(). This function, in turn, calls kvmppc_set_passthru_irq_hv() to create a mapping in the passthrough IRQ map, associating a host IRQ to a guest GSI. However, when a pSeries KVM guest (L2) is booted within an LPAR (L1) with the kernel boot parameter `xive=off`, it defaults to using emulated XICS controller. As an attempt to establish host IRQ to guest GSI mappings via kvmppc_set_passthru_irq() on a PCI device hotplug (passhthrough) operation fail, returning -ENOENT. This failure occurs because only interrupts with EOI operations handled through OPAL calls (verified via is_pnv_opal_msi()) are currently supported. These mapping failures lead to below repeated warnings in the L1 host: [ 509.220349] kvmppc_set_passthru_irq_hv: Could not assign IRQ map for (58,4970) [ 509.220368] kvmppc_set_passthru_irq (irq 58, gsi 4970) fails: -2 [ 509.220376] vfio-pci 0015:01:00.0: irq bypass producer (token 0000000090bc635b) registration fails: -2 ... [ 509.291781] vfio-pci 0015:01:00.0: irq bypass producer (token 000000003822eed8) registration fails: -2 Fix this by restricting IRQ bypass enablement on pSeries systems by making the IRQ bypass callbacks unavailable when running on pSeries platform. Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com> Tested-by: Gautam Menghani <gautam@linux.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250425185641.1611857-1-amachhiw@linux.ibm.com
2025-05-08	timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA	Ingo Molnar
	Move this macro to the canonical TIMER_* namespace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250507175338.672442-7-mingo@kernel.org
2025-04-16	KVM: powerpc: Enable commented out BUILD_BUG_ON() assertion	Thorsten Blum
	The BUILD_BUG_ON() assertion was commented out in commit 38634e676992 ("powerpc/kvm: Remove problematic BUILD_BUG_ON statement") and fixed in commit c0a187e12d48 ("KVM: powerpc: Fix BUILD_BUG_ON condition"), but not enabled. Enable it now that this no longer breaks and remove the comment. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250411084222.6916-1-thorsten.blum@linux.dev
2025-04-16	kvm powerpc/book3s-apiv2: Introduce kvm-hv specific PMU	Vaibhav Jain
	Introduce a new PMU named 'kvm-hv' inside a new module named 'kvm-hv-pmu' to report Book3s kvm-hv specific performance counters. This will expose KVM-HV specific performance attributes to user-space via kernel's PMU infrastructure and would enableusers to monitor active kvm-hv based guests. The patch creates necessary scaffolding to for the new PMU callbacks and introduces the new kernel module name 'kvm-hv-pmu' which is built with CONFIG_KVM_BOOK3S_HV_PMU. The patch doesn't introduce any perf-events yet, which will be introduced in later patches Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250416162740.93143-5-vaibhav@linux.ibm.com
2025-04-16	kvm powerpc/book3s-apiv2: Add kunit tests for Hostwide GSB elements	Vaibhav Jain
	Update 'test-guest-state-buffer.c' to add two new KUNIT test cases for validating correctness of changes to Guest-state-buffer management infrastructure for adding support for Hostwide GSB elements. The newly introduced test test_gs_hostwide_msg() checks if the Hostwide elements can be set and parsed from a Guest-state-buffer. The second kunit test test_gs_hostwide_counters() checks if the Hostwide GSB elements can be send to the L0-PowerVM hypervisor via the H_GUEST_SET_STATE hcall and ensures that the returned guest-state-buffer has all the 5 Hostwide stat counters present. Below is the KATP test report with the newly added KUNIT tests: KTAP version 1 # Subtest: guest_state_buffer_test # module: test_guest_state_buffer 1..7 ok 1 test_creating_buffer ok 2 test_adding_element ok 3 test_gs_bitmap ok 4 test_gs_parsing ok 5 test_gs_msg ok 6 test_gs_hostwide_msg # test_gs_hostwide_counters: Guest Heap Size=0 bytes # test_gs_hostwide_counters: Guest Heap Size Max=10995367936 bytes # test_gs_hostwide_counters: Guest Page-table Size=2178304 bytes # test_gs_hostwide_counters: Guest Page-table Size Max=2147483648 bytes # test_gs_hostwide_counters: Guest Page-table Reclaim Size=0 bytes ok 7 test_gs_hostwide_counters # guest_state_buffer_test: pass:7 fail:0 skip:0 total:7 # Totals: pass:7 fail:0 skip:0 total:7 ok 1 guest_state_buffer_test Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250416162740.93143-4-vaibhav@linux.ibm.com
2025-04-16	kvm powerpc/book3s-apiv2: Add support for Hostwide GSB elements	Vaibhav Jain
	Add support for adding and parsing Hostwide elements to the Guest-state-buffer data structure used in apiv2. These elements are used to share meta-information pertaining to entire L1-Lpar and this meta-information is maintained by L0-PowerVM hypervisor. Example of this include the amount of the page-table memory currently used by L0-PowerVM for hosting the Shadow-Pagetable of all active L2-Guests. More of the are documented in kernel-documentation at [1]. The Hostwide GSB elements are currently only support with H_GUEST_SET_STATE hcall with a special flag namely 'KVMPPC_GS_FLAGS_HOST_WIDE'. The patch introduces new defs for the 5 new Hostwide GSB elements including their GSIDs as well as introduces a new class of GSB elements namely 'KVMPPC_GS_CLASS_HOSTWIDE' to indicate to GSB construction/parsing infrastructure in 'kvm/guest-state-buffer.c'. Also gs_msg_ops_vcpu_get_size(), kvmppc_gsid_type() and kvmppc_gse_{flatten,unflatten}_iden() are updated to appropriately indicate the needed size for these Hostwide GSB elements as well as how to flatten/unflatten their GSIDs so that they can be marked as available in GSB bitmap. [1] Documention/arch/powerpc/kvm-nested.rst Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250416162740.93143-3-vaibhav@linux.ibm.com
2025-04-06	Merge tag 'timers-cleanups-2025-04-06' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "A set of final cleanups for the timer subsystem: - Convert all del_timer[_sync]() instances over to the new timer_delete[_sync]() API and remove the legacy wrappers. Conversion was done with coccinelle plus some manual fixups as coccinelle chokes on scoped_guard(). - The final cleanup of the hrtimer_init() to hrtimer_setup() conversion. This has been delayed to the end of the merge window, so that all patches which have been merged through other trees are in mainline and all new users are catched. Doing this right before rc1 ensures that new code which is merged post rc1 is not introducing new instances of the original functionality" * tag 'timers-cleanups-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tracing/timers: Rename the hrtimer_init event to hrtimer_setup hrtimers: Rename debug_init_on_stack() to debug_setup_on_stack() hrtimers: Rename debug_init() to debug_setup() hrtimers: Rename __hrtimer_init_sleeper() to __hrtimer_setup_sleeper() hrtimers: Remove unnecessary NULL check in hrtimer_start_range_ns() hrtimers: Make callback function pointer private hrtimers: Merge __hrtimer_init() into __hrtimer_setup() hrtimers: Switch to use __htimer_setup() hrtimers: Delete hrtimer_init() treewide: Convert new and leftover hrtimer_init() users treewide: Switch/rename to timer_delete[_sync]()
2025-04-05	treewide: Switch/rename to timer_delete[_sync]()	Thomas Gleixner
	timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-04	irqdomain: Rename irq_get_default_host() to irq_get_default_domain()	Jiri Slaby (SUSE)
	Naming interrupt domains host is confusing at best and the irqdomain code uses both domain and host inconsistently. Therefore rename irq_get_default_host() to irq_get_default_domain(). Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-4-jirislaby@kernel.org
2025-03-27	Merge tag 'powerpc-6.15-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Madhavan Srinivasan: - Remove support for IBM Cell Blades - SMP support for microwatt platform - Support for inline static calls on PPC32 - Enable pmu selftests for power11 platform - Enable hardware trace macro (HTM) hcall support - Support for limited address mode capability - Changes to RMA size from 512 MB to 768 MB to handle fadump - Misc fixes and cleanups Thanks to Abhishek Dubey, Amit Machhiwal, Andreas Schwab, Arnd Bergmann, Athira Rajeev, Avnish Chouhan, Christophe Leroy, Disha Goel, Donet Tom, Gaurav Batra, Gautam Menghani, Hari Bathini, Kajol Jain, Kees Cook, Mahesh Salgaonkar, Michael Ellerman, Paul Mackerras, Ritesh Harjani (IBM), Sathvika Vasireddy, Segher Boessenkool, Sourabh Jain, Vaibhav Jain, and Venkat Rao Bagalkote. * tag 'powerpc-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (61 commits) powerpc/kexec: fix physical address calculation in clear_utlb_entry() crypto: powerpc: Mark ghashp8-ppc.o as an OBJECT_FILES_NON_STANDARD powerpc: Fix 'intra_function_call not a direct call' warning powerpc/perf: Fix ref-counting on the PMU 'vpa_pmu' KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests powerpc/prom_init: Fixup missing #size-cells on PowerBook6,7 powerpc/microwatt: Add SMP support powerpc: Define config option for processors with broadcast TLBIE powerpc/microwatt: Define an idle power-save function powerpc/microwatt: Device-tree updates powerpc/microwatt: Select COMMON_CLK in order to get the clock framework net: toshiba: Remove reference to PPC_IBM_CELL_BLADE net: spider_net: Remove powerpc Cell driver cpufreq: ppc_cbe: Remove powerpc Cell driver genirq: Remove IRQ_EDGE_EOI_HANDLER docs: Remove reference to removed CBE_CPUFREQ_SPU_GOVERNOR powerpc: Remove UDBG_RTAS_CONSOLE powerpc/io: Use standard barrier macros in io.c powerpc/io: Rename _insw_ns() etc. powerpc/io: Use generic raw accessors ...
2025-03-10	powerpc: Fix 'intra_function_call not a direct call' warning	Christophe Leroy
	The following build warning have been reported: arch/powerpc/kvm/book3s_hv_rmhandlers.o: warning: objtool: .text+0xe84: intra_function_call not a direct call arch/powerpc/kernel/switch.o: warning: objtool: .text+0x4: intra_function_call not a direct call This happens due to commit bb7f054f4de2 ("objtool/powerpc: Add support for decoding all types of uncond branches") because that commit decodes 'bl .+4' as a normal instruction because that instruction is used by clang instead of 'bcl 20,31,+.4' for relocatable code. The solution is simply to remove the ANNOTATE_INTRA_FUNCTION_CALL annotation now that the instruction is not seen as a function call anymore. Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/8c4c3fc2-2bd7-4148-af68-2f504d6119e0@linux.ibm.com Fixes: bb7f054f4de2 ("objtool/powerpc: Add support for decoding all types of uncond branches") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Tested-By: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Sathvika Vasireddy <sv@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/88876fb4e412203452e57d1037a1341cf15ccc7b.1741128981.git.christophe.leroy@csgroup.eu
2025-03-07	KVM: PPC: Enable CAP_SPAPR_TCE_VFIO on pSeries KVM guests	Amit Machhiwal
	Currently on book3s-hv, the capability KVM_CAP_SPAPR_TCE_VFIO is only available for KVM Guests running on PowerNV and not for the KVM guests running on pSeries hypervisors. This prevents a pSeries L2 guest from leveraging the in-kernel acceleration for H_PUT_TCE_INDIRECT and H_STUFF_TCE hcalls that results in slow startup times for large memory guests. Support for VFIO on pSeries was restored in commit f431a8cde7f1 ("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries"), making it possible to re-enable this capability on pSeries hosts. This change enables KVM_CAP_SPAPR_TCE_VFIO for nested PAPR guests on pSeries, while maintaining the existing behavior on PowerNV. Booting an L2 guest with 128GB of memory shows an average 11% improvement in startup time. Fixes: f431a8cde7f1 ("powerpc/iommu: Reimplement the iommu_table_group_ops for pSeries") Cc: stable@vger.kernel.org Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20250220070002.1478849-1-amachhiw@linux.ibm.com
2025-02-24	powerpc/vmlinux: Remove etext, edata and end	Christophe Leroy
	etext is not used anymore since commit 843a1ffaf6f2 ("powerpc/mm: use core_kernel_text() helper") edata and end have not been used since the merge of arch/ppc/ and arch/ppc64/ Remove the three and remove macro PROVIDE32. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/d1686d36cdd6b9d681e7ee4dd677c386d43babb1.1736332415.git.christophe.leroy@csgroup.eu
2025-02-18	KVM: PPC: Switch to use hrtimer_setup()	Nam Cao
	hrtimer_setup() takes the callback function pointer as argument and initializes the timer completely. Replace hrtimer_init() and the open coded initialization of hrtimer::function with the new setup mechanism. Patch was created by using Coccinelle. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/a58c4f16b1cfba13f96201fda355553850a97562.1738746821.git.namcao@linutronix.de
2025-01-26	Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly individually changelogged singleton patches. The patch series in this pull are: - "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some cleanup and Rust preparation in the xarray library code - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API - "Convert ocfs2 to use folios" from Matthew Wilcox does that - "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places - "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work - "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work - "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a corrupted image - "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc - "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger - "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code" * tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits) ocfs2: use str_yes_no() and str_no_yes() helper functions include/linux/lz4.h: add some missing macros Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent Xarray: remove repeat check in xas_squash_marks() Xarray: distinguish large entries correctly in xas_split_alloc() Xarray: move forward index correctly in xas_pause() Xarray: do not return sibling entries from xas_find_marked() ipc/util.c: complete the kernel-doc function descriptions gcov: clang: use correct function param names latencytop: use correct kernel-doc format for func params minmax.h: remove some #defines that are only expanded once minmax.h: simplify the variants of clamp() minmax.h: move all the clamp() definitions after the min/max() ones minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() minmax.h: reduce the #define expansion of min(), max() and clamp() minmax.h: update some comments minmax.h: add whitespace around operators and after commas nilfs2: do not update mtime of renamed directory that is not moved nilfs2: handle errors that nilfs_prepare_chunk() may return CREDITS: fix spelling mistake ...
2025-01-12	kernel-wide: add explicity\|\|explicitly to spelling.txt	Shivam Chaudhary
	Correct the spelling dictionary so that future instances will be caught by checkpatch, and fix the instances found. Link: https://lkml.kernel.org/r/20241211154903.47027-1-cvam0000@gmail.com Signed-off-by: Shivam Chaudhary <cvam0000@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Shivam Chaudhary <cvam0000@gmail.com> Cc: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-12	KVM: e500: perform hugepage check after looking up the PFN	Paolo Bonzini
	e500 KVM tries to bypass __kvm_faultin_pfn() in order to map VM_PFNMAP VMAs as huge pages. This is a Bad Idea because VM_PFNMAP VMAs could become noncontiguous as a result of callsto remap_pfn_range(). Instead, use the already existing host PTE lookup to retrieve a valid host-side mapping level after __kvm_faultin_pfn() has returned. Then find the largest size that will satisfy the guest's request while staying within a single host PTE. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-12	KVM: e500: map readonly host pages for read	Paolo Bonzini
	The new __kvm_faultin_pfn() function is upset by the fact that e500 KVM ignores host page permissions - __kvm_faultin requires a "writable" outgoing argument, but e500 KVM is nonchalantly passing NULL. If the host page permissions do not include writability, the shadow TLB entry is forcibly mapped read-only. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-12	KVM: e500: track host-writability of pages	Paolo Bonzini
	Add the possibility of marking a page so that the UW and SW bits are force-cleared. This is stored in the private info so that it persists across multiple calls to kvmppc_e500_setup_stlbe. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-12	KVM: e500: use shadow TLB entry as witness for writability	Paolo Bonzini
	kvmppc_e500_ref_setup is returning whether the guest TLB entry is writable, which is than passed to kvm_release_faultin_page. This makes little sense for two reasons: first, because the function sets up the private data for the page and the return value feels like it has been bolted on the side; second, because what really matters is whether the _shadow_ TLB entry is writable. If it is not writable, the page can be released as non-dirty. Shift from using tlbe_is_writable(gtlbe) to doing the same check on the shadow TLB entry. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-01-12	KVM: e500: always restore irqs	Paolo Bonzini
	If find_linux_pte fails, IRQs will not be restored. This is unlikely to happen in practice since it would have been reported as hanging hosts, but it should of course be fixed anyway. Cc: stable@vger.kernel.org Reported-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-23	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm updates from Paolo Bonzini: "The biggest change here is eliminating the awful idea that KVM had of essentially guessing which pfns are refcounted pages. The reason to do so was that KVM needs to map both non-refcounted pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain refcounted pages. However, the result was security issues in the past, and more recently the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by struct page but is not refcounted. In particular this broke virtio-gpu blob resources (which directly map host graphics buffers into the guest as "vram" for the virtio-gpu device) with the amdgpu driver, because amdgpu allocates non-compound higher order pages and the tail pages could not be mapped into KVM. This requires adjusting all uses of struct page in the per-architecture code, to always work on the pfn whenever possible. The large series that did this, from David Stevens and Sean Christopherson, also cleaned up substantially the set of functions that provided arch code with the pfn for a host virtual addresses. The previous maze of twisty little passages, all different, is replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages) saving almost 200 lines of code. ARM: - Support for stage-1 permission indirection (FEAT_S1PIE) and permission overlays (FEAT_S1POE), including nested virt + the emulated page table walker - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call was introduced in PSCIv1.3 as a mechanism to request hibernation, similar to the S4 state in ACPI - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As part of it, introduce trivial initialization of the host's MPAM context so KVM can use the corresponding traps - PMU support under nested virtualization, honoring the guest hypervisor's trap configuration and event filtering when running a nested guest - Fixes to vgic ITS serialization where stale device/interrupt table entries are not zeroed when the mapping is invalidated by the VM - Avoid emulated MMIO completion if userspace has requested synchronous external abort injection - Various fixes and cleanups affecting pKVM, vCPU initialization, and selftests LoongArch: - Add iocsr and mmio bus simulation in kernel. - Add in-kernel interrupt controller emulation. - Add support for virtualization extensions to the eiointc irqchip. PPC: - Drop lingering and utterly obsolete references to PPC970 KVM, which was removed 10 years ago. - Fix incorrect documentation references to non-existing ioctls RISC-V: - Accelerate KVM RISC-V when running as a guest - Perf support to collect KVM guest statistics from host side s390: - New selftests: more ucontrol selftests and CPU model sanity checks - Support for the gen17 CPU model - List registers supported by KVM_GET/SET_ONE_REG in the documentation x86: - Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve documentation, harden against unexpected changes. Even if the hardware A/D tracking is disabled, it is possible to use the hardware-defined A/D bits to track if a PFN is Accessed and/or Dirty, and that removes a lot of special cases. - Elide TLB flushes when aging secondary PTEs, as has been done in x86's primary MMU for over 10 years. - Recover huge pages in-place in the TDP MMU when dirty page logging is toggled off, instead of zapping them and waiting until the page is re-accessed to create a huge mapping. This reduces vCPU jitter. - Batch TLB flushes when dirty page logging is toggled off. This reduces the time it takes to disable dirty logging by ~3x. - Remove the shrinker that was (poorly) attempting to reclaim shadow page tables in low-memory situations. - Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE. - Advertise CPUIDs for new instructions in Clearwater Forest - Quirk KVM's misguided behavior of initialized certain feature MSRs to their maximum supported feature set, which can result in KVM creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero value results in the vCPU having invalid state if userspace hides PDCM from the guest, which in turn can lead to save/restore failures. - Fix KVM's handling of non-canonical checks for vCPUs that support LA57 to better follow the "architecture", in quotes because the actual behavior is poorly documented. E.g. most MSR writes and descriptor table loads ignore CR4.LA57 and operate purely on whether the CPU supports LA57. - Bypass the register cache when querying CPL from kvm_sched_out(), as filling the cache from IRQ context is generally unsafe; harden the cache accessors to try to prevent similar issues from occuring in the future. The issue that triggered this change was already fixed in 6.12, but was still kinda latent. - Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM over-advertises SPEC_CTRL when trying to support cross-vendor VMs. - Minor cleanups - Switch hugepage recovery thread to use vhost_task. These kthreads can consume significant amounts of CPU time on behalf of a VM or in response to how the VM behaves (for example how it accesses its memory); therefore KVM tried to place the thread in the VM's cgroups and charge the CPU time consumed by that work to the VM's container. However the kthreads did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM instances inside could not complete freezing. Fix this by replacing the kthread with a PF_USER_WORKER thread, via the vhost_task abstraction. Another 100+ lines removed, with generally better behavior too like having these threads properly parented in the process tree. - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't really work; there was really nothing to work around anyway: the broken patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the erratum. - Fix 6.12 regression where CONFIG_KVM will be built as a module even if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'. x86 selftests: - x86 selftests can now use AVX. Documentation: - Use rST internal links - Reorganize the introduction to the API document Generic: - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead of RCU, so that running a vCPU on a different task doesn't encounter long due to having to wait for all CPUs become quiescent. In general both reads and writes are rare, but userspace that supports confidential computing is introducing the use of "helper" vCPUs that may jump from one host processor to another. Those will be very happy to trigger a synchronize_rcu(), and the effect on performance is quite the disaster" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits) KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL \|\| KVM_AMD KVM: x86: add back X86_LOCAL_APIC dependency Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()" KVM: x86: switch hugepage recovery thread to vhost_task KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest Documentation: KVM: fix malformed table irqchip/loongson-eiointc: Add virt extension support LoongArch: KVM: Add irqfd support LoongArch: KVM: Add PCHPIC user mode read and write functions LoongArch: KVM: Add PCHPIC read and write functions LoongArch: KVM: Add PCHPIC device support LoongArch: KVM: Add EIOINTC user mode read and write functions LoongArch: KVM: Add EIOINTC read and write functions LoongArch: KVM: Add EIOINTC device support LoongArch: KVM: Add IPI user mode read and write function LoongArch: KVM: Add IPI read and write function LoongArch: KVM: Add IPI device support LoongArch: KVM: Add iocsr and mmio bus simulation in kernel KVM: arm64: Pass on SVE mapping failures ...
2024-11-23	Merge tag 'powerpc-6.13-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Rework kfence support for the HPT MMU to work on systems with >= 16TB of RAM. - Remove the powerpc "maple" platform, used by the "Yellow Dog Powerstation". - Add support for DYNAMIC_FTRACE_WITH_CALL_OPS, DYNAMIC_FTRACE_WITH_DIRECT_CALLS & BPF Trampolines. - Add support for running KVM nested guests on Power11. - Other small features, cleanups and fixes. Thanks to Amit Machhiwal, Arnd Bergmann, Christophe Leroy, Costa Shulyupin, David Hunter, David Wang, Disha Goel, Gautam Menghani, Geert Uytterhoeven, Hari Bathini, Julia Lawall, Kajol Jain, Keith Packard, Lukas Bulwahn, Madhavan Srinivasan, Markus Elfring, Michal Suchanek, Ming Lei, Mukesh Kumar Chaurasiya, Nathan Chancellor, Naveen N Rao, Nicholas Piggin, Nysal Jan K.A, Paulo Miguel Almeida, Pavithra Prakash, Ritesh Harjani (IBM), Rob Herring (Arm), Sachin P Bappalige, Shen Lichuan, Simon Horman, Sourabh Jain, Thomas Weißschuh, Thorsten Blum, Thorsten Leemhuis, Venkat Rao Bagalkote, Zhang Zekun, and zhang jiao. * tag 'powerpc-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (89 commits) EDAC/powerpc: Remove PPC_MAPLE drivers powerpc/perf: Add per-task/process monitoring to vpa_pmu driver powerpc/kvm: Add vpa latency counters to kvm_vcpu_arch docs: ABI: sysfs-bus-event_source-devices-vpa-pmu: Document sysfs event format entries for vpa_pmu powerpc/perf: Add perf interface to expose vpa counters MAINTAINERS: powerpc: Mark Maddy as "M" powerpc/Makefile: Allow overriding CPP powerpc-km82xx.c: replace of_node_put() with __free ps3: Correct some typos in comments powerpc/kexec: Fix return of uninitialized variable macintosh: Use common error handling code in via_pmu_led_init() powerpc/powermac: Use of_property_match_string() in pmac_has_backlight_type() powerpc: remove dead config options for MPC85xx platform support powerpc/xive: Use cpumask_intersects() selftests/powerpc: Remove the path after initialization. powerpc/xmon: symbol lookup length fixed powerpc/ep8248e: Use %pa to format resource_size_t powerpc/ps3: Reorganize kerneldoc parameter names KVM: PPC: Book3S HV: Fix kmv -> kvm typo powerpc/sstep: make emulate_vsx_load and emulate_vsx_store static ...
2024-11-19	powerpc/perf: Add per-task/process monitoring to vpa_pmu driver	Kajol Jain
	Enhance the vpa_pmu driver with a feature to observe context switch latency event for both per-task (tid) and per-pid (pid) option. Couple of new helper functions are added to hide the abstraction of reading the context switch latency counter from kvm_vcpu_arch struct and these helper functions are defined in the "kvm/book3s_hv.c". "PERF_ATTACH_TASK" flag is used to decide whether to read the counter values from lppaca or kvm_vcpu_arch struct. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Co-developed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241118114114.208964-4-kjain@linux.ibm.com
2024-11-19	powerpc/kvm: Add vpa latency counters to kvm_vcpu_arch	Kajol Jain
	Commit e1f288d2f9c69 ("KVM: PPC: Book3S HV nestedv2: Add support for reading VPA counters for pseries guests") introduced support for new Virtual Process Area(VPA) based software counters. These counters are useful when observing context switch latency of L1 <-> L2. It also added access to counters in lppaca, which is good enough to understand latency details per-cpu level. But to extend and aggregate per-process level(qemu) or per-pid/tid level(vcpu), these counters also needs to be added as part of kvm_vcpu_arch struct. Additional code added to update these new kvm_vcpu_arch variables in do_trace_nested_cs_time function. Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Co-developed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241118114114.208964-3-kjain@linux.ibm.com
2024-11-19	powerpc/perf: Add perf interface to expose vpa counters	Kajol Jain
	To support performance measurement for KVM on PowerVM(KoP) feature, PowerVM hypervisor has added couple of new software counters in Virtual Process Area(VPA) of the partition. Commit e1f288d2f9c69 ("KVM: PPC: Book3S HV nestedv2: Add support for reading VPA counters for pseries guests") have updated the paca fields with corresponding changes. Proposed perf interface is to expose these new software counters for monitoring of context switch latencies and runtime aggregate. Perf interface driver is called "vpa_pmu" and it has dependency on KVM and perf, hence added new config called "VPA_PMU" which depends on "CONFIG_KVM_BOOK3S_64_HV" and "CONFIG_HV_PERF_CTRS". Since, kvm and kvm_host are currently compiled as built-in modules, this perf interface takes the same path and registered as a module. vpa_pmu perf interface needs access to some of the kvm functions and structures like kvmhv_get_l2_counters_status(), hence kvm_book3s_64.h and kvm_ppc.h are included. Below are the events added to monitor KoP: vpa_pmu/l1_to_l2_lat/ vpa_pmu/l2_to_l1_lat/ vpa_pmu/l2_runtime_agg/ and vpa_pmu driver supports only per-cpu monitoring with this patch. Example usage: [command]# perf stat -e vpa_pmu/l1_to_l2_lat/ -a -I 1000 1.001017682 727,200 vpa_pmu/l1_to_l2_lat/ 2.003540491 1,118,824 vpa_pmu/l1_to_l2_lat/ 3.005699458 1,919,726 vpa_pmu/l1_to_l2_lat/ 4.007827011 2,364,630 vpa_pmu/l1_to_l2_lat/ Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Co-developed-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241118114114.208964-1-kjain@linux.ibm.com
2024-11-18	Merge tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull 'struct fd' class updates from Al Viro: "The bulk of struct fd memory safety stuff Making sure that struct fd instances are destroyed in the same scope where they'd been created, getting rid of reassignments and passing them by reference, converting to CLASS(fd{,_pos,_raw}). We are getting very close to having the memory safety of that stuff trivial to verify" * tag 'pull-fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) deal with the last remaing boolean uses of fd_file() css_set_fork(): switch to CLASS(fd_raw, ...) memcg_write_event_control(): switch to CLASS(fd) assorted variants of irqfd setup: convert to CLASS(fd) do_pollfd(): convert to CLASS(fd) convert do_select() convert vfs_dedupe_file_range(). convert cifs_ioctl_copychunk() convert media_request_get_by_fd() convert spu_run(2) switch spufs_calls_{get,put}() to CLASS() use convert cachestat(2) convert do_preadv()/do_pwritev() fdget(), more trivial conversions fdget(), trivial conversions privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() o2hb_region_dev_store(): avoid goto around fdget()/fdput() introduce "fd_pos" class, convert fdget_pos() users to it. fdget_raw() users: switch to CLASS(fd_raw) convert vmsplice() to CLASS(fd) ...
2024-11-14	Merge tag 'loongarch-kvm-6.13' of ↵	Paolo Bonzini
	git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.13 1. Add iocsr and mmio bus simulation in kernel. 2. Add in-kernel interrupt controller emulation. 3. Add virt extension support for eiointc irqchip.
2024-11-14	KVM: PPC: Book3S HV: Fix kmv -> kvm typo	Kajol Jain
	Fix typo in the following kvm function names from: kmvhv_counters_tracepoint_regfunc -> kvmhv_counters_tracepoint_regfunc kmvhv_counters_tracepoint_unregfunc -> kvmhv_counters_tracepoint_unregfunc Fixes: e1f288d2f9c6 ("KVM: PPC: Book3S HV nestedv2: Add support for reading VPA counters for pseries guests") Reported-by: Madhavan Srinivasan <maddy@linux.ibm.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Amit Machhiwal <amachhiw@linux.ibm.com> Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241114085020.1147912-1-kjain@linux.ibm.com
2024-11-14	KVM: PPC: Book3S HV: Avoid returning to nested hypervisor on pending doorbells	Gautam Menghani
	Commit 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes") dropped the use of vcore->dpdes for msgsndp / SMT emulation. Prior to that commit, the below code at L1 level (see [1] for terminology) was responsible for setting vc->dpdes for the respective L2 vCPU: if (!nested) { kvmppc_core_prepare_to_enter(vcpu); if (vcpu->arch.doorbell_request) { vc->dpdes = 1; smp_wmb(); vcpu->arch.doorbell_request = 0; } L1 then sent vc->dpdes to L0 via kvmhv_save_hv_regs(), and while servicing H_ENTER_NESTED at L0, the below condition at L0 level made sure to abort and go back to L1 if vcpu->arch.doorbell_request = 1 so that L1 sets vc->dpdes as per above if condition: } else if (vcpu->arch.pending_exceptions \|\| vcpu->arch.doorbell_request \|\| xive_interrupt_pending(vcpu)) { vcpu->arch.ret = RESUME_HOST; goto out; } This worked fine since vcpu->arch.doorbell_request was used more like a flag and vc->dpdes was used to pass around the doorbell state. But after Commit 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes"), vcpu->arch.doorbell_request is the only variable used to pass around doorbell state. With the plumbing for handling doorbells for nested guests updated to use vcpu->arch.doorbell_request over vc->dpdes, the above "else if" stops doorbells from working correctly as L0 aborts execution of L2 and instead goes back to L1. Remove vcpu->arch.doorbell_request from the above "else if" condition as it is no longer needed for L0 to correctly handle the doorbell status while running L2. [1] Terminology 1. L0 : PowerNV linux running with HV privileges 2. L1 : Pseries KVM guest running on top of L0 2. L2 : Nested KVM guest running on top of L1 Fixes: 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes") Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241109063301.105289-4-gautam@linux.ibm.com
2024-11-14	KVM: PPC: Book3S HV: Stop using vc->dpdes for nested KVM guests	Gautam Menghani
	commit 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes") introduced an optimization to use only vcpu->doorbell_request for SMT emulation for Power9 and above guests, but the code for nested guests still relies on the old way of handling doorbells, due to which an L2 guest (see [1]) cannot be booted with XICS with SMT>1. The command to repro this issue is: // To be run in L1 qemu-system-ppc64 \ -drive file=rhel.qcow2,format=qcow2 \ -m 20G \ -smp 8,cores=1,threads=8 \ -cpu host \ -nographic \ -machine pseries,ic-mode=xics -accel kvm Fix the plumbing to utilize vcpu->doorbell_request instead of vcore->dpdes for nested KVM guests on P9 and above. [1] Terminology 1. L0 : PowerNV linux running with HV privileges 2. L1 : Pseries KVM guest running on top of L0 2. L2 : Nested KVM guest running on top of L1 Fixes: 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes") Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241109063301.105289-3-gautam@linux.ibm.com
2024-11-14	Revert "KVM: PPC: Book3S HV Nested: Stop forwarding all HFUs to L1"	Gautam Menghani
	This reverts commit 7c3ded5735141ff4d049747c9f76672a8b737c49. On PowerNV, when a nested guest tries to use a feature prohibited by HFSCR, the nested hypervisor (L1) should get a H_FAC_UNAVAILABLE trap so that L1 can emulate the feature. But with the change introduced by commit 7c3ded573514 ("KVM: PPC: Book3S HV Nested: Stop forwarding all HFUs to L1") the L1 ends up getting a H_EMUL_ASSIST because of which, the L1 ends up injecting a SIGILL when L2 (nested guest) tries to use doorbells. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241109063301.105289-2-gautam@linux.ibm.com
2024-11-13	Merge branch 'kvm-docs-6.13' into HEAD	Paolo Bonzini
	- Drop obsolete references to PPC970 KVM, which was removed 10 years ago. - Fix incorrect references to non-existing ioctls - List registers supported by KVM_GET/SET_ONE_REG on s390 - Use rST internal links - Reorganize the introduction to the API document
2024-11-08	KVM: powerpc: remove remaining traces of KVM_CAP_PPC_RMA	Paolo Bonzini
	This was only needed for PPC970 support, which is long gone: the implementation was removed in 2014. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241023124507.280382-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-11-06	KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid ↵	Gautam Menghani
	spurious interrupts Running a L2 vCPU (see [1] for terminology) with LPCR_MER bit set and no pending interrupts results in that L2 vCPU getting an infinite flood of spurious interrupts. The 'if check' in kvmhv_run_single_vcpu() sets the LPCR_MER bit if there are pending interrupts. The spurious flood problem can be observed in 2 cases: 1. Crashing the guest while interrupt heavy workload is running a. Start a L2 guest and run an interrupt heavy workload (eg: ipistorm) b. While the workload is running, crash the guest (make sure kdump is configured) c. Any one of the vCPUs of the guest will start getting an infinite flood of spurious interrupts. 2. Running LTP stress tests in multiple guests at the same time a. Start 4 L2 guests. b. Start running LTP stress tests on all 4 guests at same time. c. In some time, any one/more of the vCPUs of any of the guests will start getting an infinite flood of spurious interrupts. The root cause of both the above issues is the same: 1. A NMI is sent to a running vCPU that has LPCR_MER bit set. 2. In the NMI path, all registers are refreshed, i.e, H_GUEST_GET_STATE is called for all the registers. 3. When H_GUEST_GET_STATE is called for LPCR, the vcpu->arch.vcore->lpcr of that vCPU at L1 level gets updated with LPCR_MER set to 1, and this new value is always used whenever that vCPU runs, regardless of whether there was a pending interrupt. 4. Since LPCR_MER is set, the vCPU in L2 always jumps to the external interrupt handler, and this cycle never ends. Fix the spurious flood by masking off the LPCR_MER bit before running a L2 vCPU to ensure that it is not set if there are no pending interrupts. [1] Terminology: 1. L0 : PAPR hypervisor running in HV mode 2. L1 : Linux guest (logical partition) running on top of L0 3. L2 : KVM guest running on top of L1 Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Gautam Menghani <gautam@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
2024-11-05	KVM: PPC: Book3S HV: Add Power11 capability support for Nested PAPR guests	Amit Machhiwal
	The Power11 architected and raw mode support in Linux was merged in commit c2ed087ed35c ("powerpc: Add Power11 architected and raw mode"), and the corresponding support in QEMU is pending in [1], which is currently in its V6. Currently, booting a KVM guest inside a pseries LPAR (Logical Partition) on a kernel without P11 support results the guest boot in a Power10 compatibility mode (i.e., with logical PVR of Power10). However, booting a KVM guest on a kernel with P11 support causes the following boot crash. On a Power11 LPAR, the Power Hypervisor (L0) returns a support for both Power10 and Power11 capabilities through H_GUEST_GET_CAPABILITIES hcall. However, KVM currently supports only Power10 capabilities, resulting in only Power10 capabilities being set as "nested capabilities" via an H_GUEST_SET_CAPABILITIES hcall. In the guest entry path, gs_msg_ops_kvmhv_nestedv2_config_fill_info() is called by kvmhv_nestedv2_flush_vcpu() to fill the GSB (Guest State Buffer) elements. The arch_compat is set to the logical PVR of Power11, followed by an H_GUEST_SET_STATE hcall. This hcall returns H_INVALID_ELEMENT_VALUE as a return code when setting a Power11 logical PVR, as only Power10 capabilities were communicated as supported between PHYP and KVM, utimately resulting in the KVM guest boot crash. KVM: unknown exit, hardware reason ffffffffffffffea NIP 000000007daf97e0 LR 000000007daf1aec CTR 000000007daf1ab4 XER 0000000020040000 CPU#0 MSR 8000000000103000 HID0 0000000000000000 HF 6c002000 iidx 3 didx 3 TB 00000000 00000000 DECR 0 GPR00 8000000000003000 000000007e580e20 000000007db26700 0000000000000000 GPR04 00000000041a0c80 000000007df7f000 0000000000200000 000000007df7f000 GPR08 000000007db6d5d8 000000007e65fa90 000000007db6d5d0 0000000000003000 GPR12 8000000000000001 0000000000000000 0000000000000000 0000000000000000 GPR16 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20 0000000000000000 0000000000000000 0000000000000000 000000007db21a30 GPR24 000000007db65000 0000000000000000 0000000000000000 0000000000000003 GPR28 000000007db6d5e0 000000007db22220 000000007daf27ac 000000007db75000 CR 20000404 [ E - - - - G - G ] RES 000@ffffffffffffffff SRR0 000000007daf97e0 SRR1 8000000000102000 PVR 0000000000820200 VRSAVE 0000000000000000 SPRG0 0000000000000000 SPRG1 000000000000ff20 SPRG2 0000000000000000 SPRG3 0000000000000000 SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000 CFAR 0000000000000000 LPCR 0000000000020400 PTCR 0000000000000000 DAR 0000000000000000 DSISR 0000000000000000 Fix this by adding the Power11 capability support and the required plumbing in place. Note: * Booting a Power11 KVM nested PAPR guest requires [1] in QEMU. [1] https://lore.kernel.org/all/20240731055022.696051-1-adityag@linux.ibm.com/ Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241028101622.741573-1-amachhiw@linux.ibm.com
2024-11-03	fdget(), trivial conversions	Al Viro
	fdget() is the first thing done in scope, all matching fdput() are immediately followed by leaving the scope. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-25	KVM: PPC: Explicitly require struct page memory for Ultravisor sharing	Sean Christopherson
	Explicitly require "struct page" memory when sharing memory between guest and host via an Ultravisor. Given the number of pfn_to_page() calls in the code, it's safe to assume that KVM already requires that the pfn returned by gfn_to_pfn() is backed by struct page, i.e. this is likely a bug fix, not a reduction in KVM capabilities. Switching to gfn_to_page() will eventually allow removing gfn_to_pfn() and kvm_pfn_to_refcounted_page(). Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-79-seanjc@google.com>
2024-10-25	KVM: PPC: Use kvm_vcpu_map() to map guest memory to patch dcbz instructions	Sean Christopherson
	Use kvm_vcpu_map() when patching dcbz in guest memory, as a regular GUP isn't technically sufficient when writing to data in the target pages. As per Documentation/core-api/pin_user_pages.rst: Correct (uses FOLL_PIN calls): pin_user_pages() write to the data within the pages unpin_user_pages() INCORRECT (uses FOLL_GET calls): get_user_pages() write to the data within the pages put_page() As a happy bonus, using kvm_vcpu_{,un}map() takes care of creating a mapping and marking the page dirty. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-75-seanjc@google.com>
2024-10-25	KVM: PPC: Remove extra get_page() to fix page refcount leak	Sean Christopherson
	Don't manually do get_page() when patching dcbz, as gfn_to_page() gifts the caller a reference. I.e. doing get_page() will leak the page due to not putting all references. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-74-seanjc@google.com>
2024-10-25	KVM: PPC: Use kvm_faultin_pfn() to handle page faults on Book3s PR	Sean Christopherson
	Convert Book3S PR to __kvm_faultin_pfn()+kvm_release_faultin_page(), which are new APIs to consolidate arch code and provide consistent behavior across all KVM architectures. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-65-seanjc@google.com>
2024-10-25	KVM: PPC: Book3S: Mark "struct page" pfns dirty/accessed after installing PTE	Sean Christopherson
	Mark pages/folios dirty/accessed after installing a PTE, and more specifically after acquiring mmu_lock and checking for an mmu_notifier invalidation. Marking a page/folio dirty after it has been written back can make some filesystems unhappy (backing KVM guests will such filesystem files is uncommon, and the race is minuscule, hence the lack of complaints). See the link below for details. This will also allow converting Book3S to kvm_release_faultin_page(), which requires that mmu_lock be held (for the aforementioned reason). Link: https://lore.kernel.org/all/cover.1683044162.git.lstoakes@gmail.com Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-64-seanjc@google.com>
2024-10-25	KVM: PPC: Drop unused @kvm_ro param from kvmppc_book3s_instantiate_page()	Sean Christopherson
	Drop @kvm_ro from kvmppc_book3s_instantiate_page() as it is now only written, and never read. No functional change intended. Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-63-seanjc@google.com>
2024-10-25	KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s Radix	Sean Christopherson
	Replace Book3s Radix's homebrewed (read: copy+pasted) fault-in logic with __kvm_faultin_pfn(), which functionally does pretty much the exact same thing. Note, when the code was written, KVM indeed didn't do fast GUP without "!atomic && !async", but that has long since changed (KVM tries fast GUP for all writable mappings). Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-62-seanjc@google.com>
2024-10-25	KVM: PPC: Use __kvm_faultin_pfn() to handle page faults on Book3s HV	Sean Christopherson
	Replace Book3s HV's homebrewed fault-in logic with __kvm_faultin_pfn(), which functionally does pretty much the exact same thing. Note, when the code was written, KVM indeed didn't do fast GUP without "!atomic && !async", but that has long since changed (KVM tries fast GUP for all writable mappings). Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-61-seanjc@google.com>