summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-13cpufreq: intel_pstate: EAS support for hybrid platformsRafael J. Wysocki
Modify intel_pstate to register EM perf domains for CPUs on hybrid platforms without SMT which causes EAS to be enabled on them when schedutil is used as the cpufreq governor (which requires intel_pstate to operate in the passive mode). This change is targeting platforms (for example, Lunar Lake) where the "little" CPUs (E-cores) are always more energy-efficient than the "big" or "performance" CPUs (P-cores) when run at the same HWP performance level, so it is sufficient to tell EAS that E-cores are always preferred (so long as there is enough spare capacity on one of them to run the given task). However, migrating tasks between CPUs of the same type too often is not desirable because it may hurt both performance and energy efficiency due to leaving warm caches behind. For this reason, register a separate perf domain for each CPU and choose the cost values for them so that the cost mostly depends on the CPU type, but there is also a small component of it depending on the performance level (utilization) which helps to balance the load between CPUs of the same type. The cost component related to the CPU type is computed with the help of the observation that the IPC metric value for a given CPU is inversely proportional to its performance-to-frequency scaling factor and the cost of running code on it can be assumed to be roughly proportional to that IPC ratio (in principle, the higher the IPC ratio, the more resources are utilized when running at a given frequency, so the cost should be higher). For all CPUs that are online at the system initialization time, EM perf domains are registered when the driver starts up, after asymmetric capacity support has been enabled. For the CPUs that become online later, EM perf domains are registered after setting the asymmetric capacity for them. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://patch.msgid.link/6057101.MhkbZ0Pkbq@rjwysocki.net
2025-05-13Merge Energy Model management code changes for 6.16Rafael J. Wysocki
2025-05-13PM: EM: Introduce em_adjust_cpu_capacity()Rafael J. Wysocki
Add a function for updating the Energy Model for a CPU after its capacity has changed, which subsequently will be used by the intel_pstate driver. An EM_PERF_DOMAIN_ARTIFICIAL check is added to em_recalc_and_update() to prevent it from calling em_compute_costs() for an "artificial" perf domain with a NULL cb parameter which would cause it to crash. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://patch.msgid.link/3637203.iIbC2pHGDl@rjwysocki.net
2025-05-13PM: EM: Move CPU capacity check to em_adjust_new_capacity()Rafael J. Wysocki
Move the check of the CPU capacity currently stored in the energy model against the arch_scale_cpu_capacity() value to em_adjust_new_capacity() so it will be done regardless of where the latter is called from. This will be useful when a new em_adjust_new_capacity() caller is added subsequently. While at it, move the pd local variable declaration in em_check_capacity_update() into the loop in which it is used. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://patch.msgid.link/7810787.EvYhyI6sBW@rjwysocki.net
2025-05-13PM: EM: Documentation: Fix typos in example driver codeAtul Kumar Pant
Fix the API name to free the allocated table in the example driver code that modifies the EM. Also fix the passing of correct table when updating the cost. Signed-off-by: Atul Kumar Pant <atulpant.linux@gmail.com> Link: https://patch.msgid.link/20250511071141.13237-1-atulpant.linux@gmail.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-13cpufreq: Drop policy locking from cpufreq_policy_is_good_for_eas()Rafael J. Wysocki
Policy locking was added to cpufreq_policy_is_good_for_eas() by commit 4854649b1fb4 ("cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq") to address a theoretical race condition, but it turned out to introduce a circular locking dependency between the policy rwsem and sched_domains_mutex via cpuset_mutex. This leads to a board lockup on OdroidN2 that is based on the ARM64 Amlogic Meson SoC. Drop the policy locking from cpufreq_policy_is_good_for_eas() to address this issue. Fixes: 4854649b1fb4 ("cpufreq/sched: Move cpufreq-specific EAS checks to cpufreq") Closes: https://lore.kernel.org/linux-pm/1bf3df62-0641-459f-99fc-fd511e564b84@samsung.com/ Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/2806514.mvXUDI8C0e@rjwysocki.net
2025-05-09PM: EM: Documentation: fix typo in energy-model.rstMoon Hee Lee
Fixes a grammar issue ("than" -> "then") and changes "re-use" to "reuse" for consistency with modern spelling. Signed-off-by: Moon Hee Lee <moonhee.lee.ca@gmail.com> Link: https://patch.msgid.link/20250506220057.5589-1-moonhee.lee.ca@gmail.com [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-09Merge tag 'amd-pstate-v6.16-2025-05-08' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux Merge amd-pstate content for 6.16 (5/8/25) from Mario Limonciello: - Add support for a new feature on some BIOS that allows setting "lowest CPU minimum frequency". - Fix the amd-pstate-ut unit tests to restore system settings when done. * tag 'amd-pstate-v6.16-2025-05-08' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: amd-pstate-ut: Reset amd-pstate driver mode after running selftests cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS option cpufreq/amd-pstate: Add offline, online and suspend callbacks for amd_pstate_driver cpufreq/amd-pstate: Move max_perf limiting in amd_pstate_update
2025-05-07cpufreq: intel_pstate: Populate the cpu_capacity sysfs entriesRicardo Neri
Intel hybrid processors have CPUs of different capacity. Populate the interface /sys/devices/system/cpu/cpuN/cpu_capacity. This interface uses the per-CPU variable `cpu_scale`. On x86 this variable has no other use besides feeding the sysfs entries. Initialize it when setting CPU capacity for the scheduler and scale-invariant code. Feed it with arch_scale_cpu_capacity() as it gives capacity normalized to the interval [0, SCHED_CAPACITY_SCALE]. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-07arch_topology: Relocate cpu_scale to topology.[h|c]Ricardo Neri
arch_topology.c provides functionality to parse and scale CPU capacity. It also provides a corresponding sysfs interface. Some architectures parse and scale CPU capacity differently as per their own needs. On Intel processors, for instance, it is responsibility of the Intel P-state driver. Relocate the implementation of that interface to a common location in topology.c. Architectures can use the interface and populate it using their own mechanisms. An alternative approach would be to compile arch_topology.c even if not needed only to get this interface. This approach would create duplicated and conflicting functionality and data structures. Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/20250419025504.9760-2-ricardo.neri-calderon@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-05-07cpufreq/sched: Move cpufreq-specific EAS checks to cpufreqRafael J. Wysocki
Doing cpufreq-specific EAS checks that require accessing policy internals directly from sched_is_eas_possible() is a bit unfortunate, so introduce cpufreq_ready_for_eas() in cpufreq, move those checks into that new function and make sched_is_eas_possible() call it. While at it, address a possible race between the EAS governor check and governor change by doing the former under the policy rwsem. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://patch.msgid.link/2317800.iZASKD2KPV@rjwysocki.net
2025-05-07cpufreq/sched: schedutil: Add helper for governor checksRafael J. Wysocki
Add a helper for checking if schedutil is the current governor for a given cpufreq policy and use it in sched_is_eas_possible() to avoid accessing cpufreq policy internals directly from there. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://patch.msgid.link/3365956.44csPzL39Z@rjwysocki.net
2025-05-05amd-pstate-ut: Reset amd-pstate driver mode after running selftestsSwapnil Sapkal
In amd-pstate-ut, one of the basic test is to switch between all possible mode combinations. After running this test the mode of the amd-pstate driver is active mode. Store and reset the mode to its original state. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Swapnil Sapkal <swapnil.sapkal@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20250430064206.7402-1-swapnil.sapkal@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-05-04Merge back cpufreq material for 6.16Rafael J. Wysocki
2025-04-30cpufreq: intel_pstate: Unchecked MSR aceess in legacy modeSrinivas Pandruvada
When turbo mode is unavailable on a Skylake-X system, executing the command: # echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo results in an unchecked MSR access error: WRMSR to 0x199 (attempted to write 0x0000000100001300). This issue was reproduced on an OEM (Original Equipment Manufacturer) system and is not a common problem across all Skylake-X systems. This error occurs because the MSR 0x199 Turbo Engage Bit (bit 32) is set when turbo mode is disabled. The issue arises when intel_pstate fails to detect that turbo mode is disabled. Here intel_pstate relies on MSR_IA32_MISC_ENABLE bit 38 to determine the status of turbo mode. However, on this system, bit 38 is not set even when turbo mode is disabled. According to the Intel Software Developer's Manual (SDM), the BIOS sets this bit during platform initialization to enable or disable opportunistic processor performance operations. Logically, this bit should be set in such cases. However, the SDM also specifies that "OS and applications must use CPUID leaf 06H to detect processors with opportunistic processor performance operations enabled." Therefore, in addition to checking MSR_IA32_MISC_ENABLE bit 38, verify that CPUID.06H:EAX[1] is 0 to accurately determine if turbo mode is disabled. Fixes: 4521e1a0ce17 ("cpufreq: intel_pstate: Reflect current no_turbo state correctly") Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-04-28cpufreq/amd-pstate: Add support for the "Requested CPU Min frequency" BIOS ↵Dhananjay Ugwekar
option Initialize lower frequency limit to the "Requested CPU Min frequency" BIOS option (if it is set) value as part of the driver->init() callback. The BIOS specified value is passed by the PMFW as min_perf in CPPC_REQ MSR. To ensure that we don't mistake a stale min_perf value in CPPC_REQ value as the "Requested CPU Min frequency" during a kexec wakeup, reset the CPPC_REQ.min_perf value back to the BIOS specified one in the offline, exit and suspend callbacks. amd_pstate_target() and amd_pstate_epp_update_limit() which are invoked as part of the resume() and online() callbacks will take care of restoring the CPPC_REQ back to the correct values. Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20250428071623.4309-1-dhananjay.ugwekar@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-04-28cpufreq/amd-pstate: Add offline, online and suspend callbacks for ↵Dhananjay Ugwekar
amd_pstate_driver Rename and use the existing amd_pstate_epp callbacks for amd_pstate driver as well. Remove the debug print in online callback while at it. These callbacks will be needed to support the "Requested CPU Min Frequency" BIOS option. Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com> Link: https://lore.kernel.org/r/20250428062520.4997-2-dhananjay.ugwekar@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2025-04-28PM: EM: Fix potential division-by-zero error in em_compute_costs()Yaxiong Tian
When the device is of a non-CPU type, table[i].performance won't be initialized in the previous em_init_performance(), resulting in division by zero when calculating costs in em_compute_costs(). Since the 'cost' algorithm is only used for EAS energy efficiency calculations and is currently not utilized by other device drivers, we should add the _is_cpu_device(dev) check to prevent this division-by-zero issue. Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division") Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/tencent_7F99ED4767C1AF7889D0D8AD50F34859CE06@qq.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-04-28cpufreq: Fix setting policy limits when frequency tables are usedRafael J. Wysocki
Commit 7491cdf46b5c ("cpufreq: Avoid using inconsistent policy->min and policy->max") overlooked the fact that policy->min and policy->max were accessed directly in cpufreq_frequency_table_target() and in the functions called by it. Consequently, the changes made by that commit led to problems with setting policy limits. Address this by passing the target frequency limits to __resolve_freq() and cpufreq_frequency_table_target() and propagating them to the functions called by the latter. Fixes: 7491cdf46b5c ("cpufreq: Avoid using inconsistent policy->min and policy->max") Cc: 5.16+ <stable@vger.kernel.org> # 5.16+ Closes: https://lore.kernel.org/linux-pm/aAplED3IA_J0eZN0@linaro.org/ Reported-by: Stephan Gerhold <stephan.gerhold@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Stephan Gerhold <stephan.gerhold@linaro.org> Reviewed-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/5896780.DvuYhMxLoT@rjwysocki.net
2025-04-27Linux 6.15-rc4Linus Torvalds
2025-04-26Merge tag 'pci-v6.15-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI fixes from Bjorn Helgaas: - When releasing a start-aligned resource, e.g., a bridge window, save start/end/flags for the next assignment attempt; fixes a v6.15-rc1 regression (Ilpo Järvinen) - Move set_pcie_speed.sh from TEST_PROGS to TEST_FILE; fixes a bwctrl selftest v6.15-rc1 regression (Ilpo Järvinen) - Add Manivannan Sadhasivam as maintainer of native host bridge and endpoint drivers (Manivannan Sadhasivam) - In endpoint test driver, defer IRQ allocation from .probe() until ioctl() to fix a regression on platforms where the Vendor/Device ID match doesn't include driver_data (Niklas Cassel) * tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: misc: pci_endpoint_test: Defer IRQ allocation until ioctl(PCITEST_SET_IRQTYPE) MAINTAINERS: Move Manivannan Sadhasivam as PCI Native host bridge and endpoint maintainer selftests/pcie_bwctrl: Fix test progs list PCI: Restore assigned resources fully after release
2025-04-26Merge tag 'nfsd-6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Revert a v6.15 patch due to a report of SELinux test failures * tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: Revert "sunrpc: clean cache_detail immediately when flush is written frequently"
2025-04-26Merge tag 'x86-urgent-2025-04-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix 32-bit kernel boot crash if passed physical memory with more than 32 address bits - Fix Xen PV crash - Work around build bug in certain limited build environments - Fix CTEST instruction decoding in insn_decoder_test * tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/insn: Fix CTEST instruction decoding x86/boot: Work around broken busybox 'truncate' tool x86/mm: Fix _pgd_alloc() for Xen PV mode x86/e820: Discard high memory that can't be addressed by 32-bit systems
2025-04-26Merge tag 'sched-urgent-2025-04-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix sporadic crashes in dequeue_entities() due to ... bad math. [ Arguably if pick_eevdf()/pick_next_entity() was less trusting of complex math being correct it could have de-escalated a crash into a warning, but that's for a different patch ]" * tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash
2025-04-26Merge tag 'perf-urgent-2025-04-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc perf events fixes from Ingo Molnar: - Use POLLERR for events in error state, instead of the ambiguous POLLHUP error value - Fix non-sampling (counting) events on certain x86 platforms * tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix non-sampling (counting) events on certain x86 platforms perf/core: Change to POLLERR for pinned events with error
2025-04-26Merge tag 'irq-urgent-2025-04-26' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "Fix crashes in the gic-v2m irqchip driver, caused by an incorrect __init annotation" * tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()
2025-04-26Merge tag 'loongarch-fixes-6.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Add a missing Kconfig option, fix some bugs in exception handlers, memory management and KVM" * tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally LoongArch: KVM: Fully clear some CSRs when VM reboot LoongArch: KVM: Fix multiple typos of KVM code LoongArch: Return NULL from huge_pte_offset() for invalid PMD LoongArch: Remove a bogus reference to ZONE_DMA LoongArch: Handle fp, lsx, lasx and lbt assembly symbols LoongArch: Make do_xyz() exception handlers more robust LoongArch: Make regs_irqs_disabled() more clear LoongArch: Select ARCH_USE_MEMTEST
2025-04-26Merge tag 'for-linus' of https://github.com/openrisc/linuxLinus Torvalds
Pull OpenRISC updates from Stafford Horne: - Support for cacheinfo API to expose OpenRISC cache info via sysfs, this also translated to some cleanups to OpenRISC cache flush and invalidate API's - Documentation updates for new mailing list and toolchain binaries * tag 'for-linus' of https://github.com/openrisc/linux: Documentation: openrisc: Update toolchain binaries URL Documentation: openrisc: Update mailing list openrisc: Add cacheinfo support openrisc: Introduce new utility functions to flush and invalidate caches openrisc: Refactor struct cpuinfo_or1k to reduce duplication
2025-04-26Revert "sunrpc: clean cache_detail immediately when flush is written frequently"Chuck Lever
Ondrej reports that certain SELinux tests are failing after commit fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is written frequently"), merged during the v6.15 merge window. Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Fixes: fc2a169c56de ("sunrpc: clean cache_detail immediately when flush is written frequently") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2025-04-26Merge tag 'move-lib-kunit-v6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull kunit fix from Kees Cook: "A single fix for the kunit lib/tests/ relocation: - Ensure prime numbers tests are included in KUnit test runs (Mark Brown)" * tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: lib: Ensure prime numbers tests are included in KUnit test runs
2025-04-26Merge tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Weekly drm fixes, mostly amdgpu, with some exynos cleanups and a couple of minor fixes, seems a bit quiet, but probably some lag from Easter holidays. amdgpu: - P2P DMA fixes - Display reset fixes - DCN 3.5 fixes - ACPI EDID fix - LTTPR fix - mode_valid() fix exynos: - fix spelling error - remove redundant error handling in exynos_drm_vidi.c module - marks struct decon_data as const in the exynos7_drm_decon driver since it is only read - Remove unnecessary checking in exynos_drm_drv.c module meson: - Fix VCLK calculation panel: - jd9365a: Fix reset polarity" * tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel: drm/exynos: Fix spelling mistake "enqueu" -> "enqueue" drm/exynos: exynos7_drm_decon: Consstify struct decon_data drm/exynos: fixed a spelling error drm/exynos/vidi: Remove redundant error handling in vidi_get_modes() drm/exynos: Remove unnecessary checking drm/amd/display: do not copy invalid CRTC timing info drm/amd/display: Default IPS to RCG_IN_ACTIVE_IPS2_IN_OFF drm/amd/display: Use 16ms AUX read interval for LTTPR with old sinks drm/amd/display: Fix ACPI edid parsing on some Lenovo systems drm/amdgpu: Allow P2P access through XGMI drm/amd/display: Enable urgent latency adjustment on DCN35 drm/amd/display: Force full update in gpu reset drm/amd/display: Fix gpu reset in multidisplay config drm/amdgpu: Don't pin VRAM without DMABUF_MOVE_NOTIFY drm/amdgpu: Use allowed_domains for pinning dmabufs drm: panel: jd9365da: fix reset signal polarity in unprepare drm/meson: use unsigned long long / Hz for frequency types Revert "drm/meson: vclk: fix calculation of 59.94 fractional rates"
2025-04-26sched/eevdf: Fix se->slice being set to U64_MAX and resulting crashOmar Sandoval
There is a code path in dequeue_entities() that can set the slice of a sched_entity to U64_MAX, which sometimes results in a crash. The offending case is when dequeue_entities() is called to dequeue a delayed group entity, and then the entity's parent's dequeue is delayed. In that case: 1. In the if (entity_is_task(se)) else block at the beginning of dequeue_entities(), slice is set to cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX. 2. The first for_each_sched_entity() loop dequeues the entity. 3. If the entity was its parent's only child, then the next iteration tries to dequeue the parent. 4. If the parent's dequeue needs to be delayed, then it breaks from the first for_each_sched_entity() loop _without updating slice_. 5. The second for_each_sched_entity() loop sets the parent's ->slice to the saved slice, which is still U64_MAX. This throws off subsequent calculations with potentially catastrophic results. A manifestation we saw in production was: 6. In update_entity_lag(), se->slice is used to calculate limit, which ends up as a huge negative number. 7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit is negative, vlag > limit, so se->vlag is set to the same huge negative number. 8. In place_entity(), se->vlag is scaled, which overflows and results in another huge (positive or negative) number. 9. The adjusted lag is subtracted from se->vruntime, which increases or decreases se->vruntime by a huge number. 10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which incorrectly returns false because the vruntime is so far from the other vruntimes on the queue, causing the (vruntime - cfs_rq->min_vruntime) * load calulation to overflow. 11. Nothing appears to be eligible, so pick_eevdf() returns NULL. 12. pick_next_entity() tries to dereference the return value of pick_eevdf() and crashes. Dumping the cfs_rq states from the core dumps with drgn showed tell-tale huge vruntime ranges and bogus vlag values, and I also traced se->slice being set to U64_MAX on live systems (which was usually "benign" since the rest of the runqueue needed to be in a particular state to crash). Fix it in dequeue_entities() by always setting slice from the first non-empty cfs_rq. Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com
2025-04-26irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()Suzuki K Poulose
With ACPI in place, gicv2m_get_fwnode() is registered with the pci subsystem as pci_msi_get_fwnode_cb(), which may get invoked at runtime during a PCI host bridge probe. But, the call back is wrongly marked as __init, causing it to be freed, while being registered with the PCI subsystem and could trigger: Unable to handle kernel paging request at virtual address ffff8000816c0400 gicv2m_get_fwnode+0x0/0x58 (P) pci_set_bus_msi_domain+0x74/0x88 pci_register_host_bridge+0x194/0x548 This is easily reproducible on a Juno board with ACPI boot. Retain the function for later use. Fixes: 0644b3daca28 ("irqchip/gic-v2m: acpi: Introducing GICv2m ACPI support") Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
2025-04-26LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finallyBibo Mao
In function kvm_pre_enter_guest(), it prepares to enter guest and check whether there are pending signals or events. And it will not enter guest if there are, PMU pass-through preparation for guest should be cancelled and host should own PMU hardware. Cc: stable@vger.kernel.org Fixes: f4e40ea9f78f ("LoongArch: KVM: Add PMU support for guest") Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: KVM: Fully clear some CSRs when VM rebootBibo Mao
Some registers such as LOONGARCH_CSR_ESTAT and LOONGARCH_CSR_GINTC are partly cleared with function _kvm_setcsr(). This comes from the hardware specification, some bits are read only in VM mode, and however they can be written in host mode. So they are partly cleared in VM mode, and can be fully cleared in host mode. These read only bits show pending interrupt or exception status. When VM reset, the read-only bits should be cleared, otherwise vCPU will receive unknown interrupts in boot stage. Here registers LOONGARCH_CSR_ESTAT/LOONGARCH_CSR_GINTC are fully cleared in ioctl KVM_REG_LOONGARCH_VCPU_RESET vCPU reset path. Cc: stable@vger.kernel.org Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: KVM: Fix multiple typos of KVM codeYulong Han
Fix multiple typos inside arch/loongarch/kvm. Cc: stable@vger.kernel.org Reviewed-by: Yuli Wang <wangyuli@uniontech.com> Reviewed-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Yulong Han <wheatfox17@icloud.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: Return NULL from huge_pte_offset() for invalid PMDMing Wang
LoongArch's huge_pte_offset() currently returns a pointer to a PMD slot even if the underlying entry points to invalid_pte_table (indicating no mapping). Callers like smaps_hugetlb_range() fetch this invalid entry value (the address of invalid_pte_table) via this pointer. The generic is_swap_pte() check then incorrectly identifies this address as a swap entry on LoongArch, because it satisfies the "!pte_present() && !pte_none()" conditions. This misinterpretation, combined with a coincidental match by is_migration_entry() on the address bits, leads to kernel crashes in pfn_swap_entry_to_page(). Fix this at the architecture level by modifying huge_pte_offset() to check the PMD entry's content using pmd_none() before returning. If the entry is invalid (i.e., it points to invalid_pte_table), return NULL instead of the pointer to the slot. Cc: stable@vger.kernel.org Acked-by: Peter Xu <peterx@redhat.com> Co-developed-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Hongchen Zhang <zhanghongchen@loongson.cn> Signed-off-by: Ming Wang <wangming01@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: Remove a bogus reference to ZONE_DMAPetr Tesarik
Remove dead code. LoongArch does not have a DMA memory zone (24bit DMA). The architecture does not even define MAX_DMA_PFN. Cc: stable@vger.kernel.org Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Petr Tesarik <ptesarik@suse.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: Handle fp, lsx, lasx and lbt assembly symbolsTiezhu Yang
Like the other relevant symbols, export some fp, lsx, lasx and lbt assembly symbols and put the function declarations in header files rather than source files. While at it, use "asmlinkage" for the other existing C prototypes of assembly functions and also do not use the "extern" keyword with function declarations according to the document coding-style.rst. Cc: stable@vger.kernel.org # 6.6+ Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: Make do_xyz() exception handlers more robustTiezhu Yang
Currently, interrupts need to be disabled before single-step mode is set, it requires that CSR_PRMD_PIE be cleared in save_local_irqflag() which is called by setup_singlestep(), this is reasonable. But in the first kprobe breakpoint exception, if the irq is enabled at the beginning of do_bp(), it will not be disabled at the end of do_bp() due to the CSR_PRMD_PIE has been cleared in save_local_irqflag(). So for this case, it may corrupt exception context when restoring the exception after do_bp() in handle_bp(), this is not reasonable. In order to restore exception safely in handle_bp(), it needs to ensure the irq is disabled at the end of do_bp(), so just add a local variable to record the original interrupt status in the parent context, then use it as the check condition to enable and disable irq in do_bp(). While at it, do the similar thing for other do_xyz() exception handlers to make them more robust. Fixes: 6d4cc40fb5f5 ("LoongArch: Add kprobes support") Suggested-by: Jinyang He <hejinyang@loongson.cn> Suggested-by: Huacai Chen <chenhuacai@loongson.cn> Co-developed-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Tianyang Zhang <zhangtianyang@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: Make regs_irqs_disabled() more clearTiezhu Yang
In the current code, the definition of regs_irqs_disabled() is actually "!(regs->csr_prmd & CSR_CRMD_IE)" because arch_irqs_disabled_flags() is defined as "!(flags & CSR_CRMD_IE)", it looks a little strange. Define regs_irqs_disabled() as !(regs->csr_prmd & CSR_PRMD_PIE) directly to make it more clear, no functional change. While at it, the return value of regs_irqs_disabled() is true or false, so change its type to reflect that and also make it always inline. Fixes: 803b0fc5c3f2 ("LoongArch: Add process management") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-26LoongArch: Select ARCH_USE_MEMTESTYuli Wang
As of commit dce44566192e ("mm/memtest: add ARCH_USE_MEMTEST"), architectures must select ARCH_USE_MEMTESET to enable CONFIG_MEMTEST. Commit 628c3bb40e9a ("LoongArch: Add boot and setup routines") added support for early_memtest but did not select ARCH_USE_MEMTESET. Fixes: 628c3bb40e9a ("LoongArch: Add boot and setup routines") Tested-by: Erpeng Xu <xuerpeng@uniontech.com> Tested-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Yuli Wang <wangyuli@uniontech.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2025-04-25Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Alexei Starovoitov: - Add namespace to BPF internal symbols (Alexei Starovoitov) - Fix possible endless loop in BPF map iteration (Brandon Kammerdiener) - Fix compilation failure for samples/bpf on LoongArch (Haoran Jiang) - Disable a part of sockmap_ktls test (Ihor Solodrai) - Correct typo in __clang_major__ macro (Peilin Ye) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Correct typo in __clang_major__ macro samples/bpf: Fix compilation failure for samples/bpf on LoongArch Fedora bpf: Add namespace to BPF internal symbols selftests/bpf: add test for softlock when modifying hashmap while iterating bpf: fix possible endless loop in BPF map iteration selftests/bpf: Mitigate sockmap_ktls disconnect_after_delete failure
2025-04-25selftests/bpf: Correct typo in __clang_major__ macroPeilin Ye
Make sure that CAN_USE_BPF_ST test (compute_live_registers/store) is enabled when __clang_major__ >= 18. Fixes: 2ea8f6a1cda7 ("selftests/bpf: test cases for compute_live_registers()") Signed-off-by: Peilin Ye <yepeilin@google.com> Link: https://lore.kernel.org/r/20250425213712.1542077-1-yepeilin@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-04-25Merge tag 'ata-6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata fixes from Damien Le Moal: - Fix the incorrect return type of ata_mselect_control_ata_feature() - Several fixes for the control of the Command Duration Limits feature to avoid unnecessary enable and disable actions. Avoiding the unnecessary enable action also avoids unwanted resets of the CDL statistics log page as that is implied for any enable action. - Fix the translation for sensing the control mode page to correctly return the last enable or disable action performed, as defined in SAT-6. This correct mode sense information is used to fix the behavior of the scsi layer to avoid unnecessary mode select command issuing. * tag 'ata-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: scsi: Improve CDL control ata: libata-scsi: Improve CDL control ata: libata-scsi: Fix ata_msense_control_ata_feature() ata: libata-scsi: Fix ata_mselect_control_ata_feature() return type
2025-04-25Merge tag 'vfs-6.15-rc4.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - For some reason we went from zero to three maintainers for HFS/HFS+ in a matter of days. The lesson to learn from this might just be that we need to threaten code removal more often!? - Fix a regression introduced by enabling large folios for lage logical block sizes. This has caused issues for noref migration with large folios due to sleeping while in an atomic context. New sleeping variants of pagecache lookup helpers are introduced. These helpers take the folio lock instead of the mapping's private spinlock. The problematic users are converted to the sleeping variants and serialize against noref migration. Atomic users will bail on seeing the new BH_Migrate flag. This also shrinks the critical region of the mapping's private lock and the new blocking callers reduce contention on the spinlock for bdev mappings. - Fix two bugs in do_move_mount() when with MOVE_MOUNT_BENEATH. The first bug is using a mountpoint that is located on a mount we're not holding a reference to. The second bug is putting the mountpoint after we've called namespace_unlock() as it's no longer guaranteed that it does stay a mountpoint. - Remove a pointless call to vfs_getattr_nosec() in the devtmpfs code just to query i_mode instead of simply querying the inode directly. This also avoids lifetime issues for the dm code by an earlier bugfix this cycle that moved bdev_statx() handling into vfs_getattr_nosec(). - Fix AT_FDCWD handling with getname_maybe_null() in the xattr code. - Fix a performance regression for files when multiple callers issue a close when it's not the last reference. - Remove a duplicate noinline annotation from pipe_clear_nowait(). * tag 'vfs-6.15-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/xattr: Fix handling of AT_FDCWD in setxattrat(2) and getxattrat(2) MAINTAINERS: hfs/hfsplus: add myself as maintainer splice: remove duplicate noinline from pipe_clear_nowait devtmpfs: don't use vfs_getattr_nosec to query i_mode fix a couple of races in MNT_TREE_BENEATH handling by do_move_mount() fs: fall back to file_ref_put() for non-last reference mm/migrate: fix sleep in atomic for large folios and buffer heads fs/ext4: use sleeping version of sb_find_get_block() fs/jbd2: use sleeping version of __find_get_block() fs/ocfs2: use sleeping version of __find_get_block() fs/buffer: use sleeping version of __find_get_block() fs/buffer: introduce sleeping flavors for pagecache lookups MAINTAINERS: add HFS/HFS+ maintainers fs/buffer: split locking for pagecache lookups
2025-04-25Merge tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A small CephFS encryption-related fix and a dead code cleanup" * tag 'ceph-for-6.15-rc4' of https://github.com/ceph/ceph-client: ceph: Fix incorrect flush end position calculation ceph: Remove osd_client deadcode
2025-04-25Merge tag 'cxl-fixes-6.15-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Dave Jiang: "The fixes address global persistent flush (GPF) changes and CXL Features support changes that went in the 6.15 merge window. And also a fix to an issue observed on CXL 1.1 platform during device enumeration. Summary: - Fix using the wrong GPF DVSEC location: - Fix caching of dport GPF DVSEC from the first endpoint - Ensure that the GPF phase timeout is only updated once by first endpoint - Drop is_port parameter for cxl_gpf_get_dvsec() - Fix the devm_* call host device for CXL fwctl setup - Set the out_len in Set Features failure case - Fix RCD initialization by skipping unneeded mem_en check" * tag 'cxl-fixes-6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/core/regs.c: Skip Memory Space Enable check for RCD and RCH Ports cxl/feature: Update out_len in set feature failure case cxl: Fix devm host device for CXL fwctl initialization cxl/pci: Drop the parameter is_port of cxl_gpf_get_dvsec() cxl/pci: Update Port GPF timeout only when the first EP attaching cxl/core: Fix caching dport GPF DVSEC issue
2025-04-26Merge tag 'amd-drm-fixes-6.15-2025-04-23' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.15-2025-04-23: amdgpu: - P2P DMA fixes - Display reset fixes - DCN 3.5 fixes - ACPI EDID fix - LTTPR fix - mode_valid() fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250423183045.2886753-1-alexander.deucher@amd.com
2025-04-26Merge tag 'exynos-drm-fixes-for-v6.15-rc4' of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes Several fixups - fix spelling error - remove redundant error handling in exynos_drm_vidi.c module. - marks struct decon_data as const in the exynos7_drm_decon driver since it is only read. Cleanup - Remove unnecessary checking in exynos_drm_drv.c module Signed-off-by: Dave Airlie <airlied@redhat.com> From: Inki Dae <inki.dae@samsung.com> Link: https://lore.kernel.org/r/20250423143044.46165-1-inki.dae@samsung.com