path: root/drivers
Age | Commit message | Author
2025-01-21 | PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error | Mohamed Khalfella
If dma_chan_tx allocation fails, set dma_chan_rx to NULL after it is freed. Link: https://lore.kernel.org/r/20241227160841.92382-1-khalfella@gmail.com Fixes: 8353813c88ef ("PCI: endpoint: Enable DMA tests for endpoints with DMA capabilities") Signed-off-by: Mohamed Khalfella <khalfella@gmail.com> [kwilczynski: commit log] Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Niklas Cassel <cassel@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
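The pattern behind this fix can be sketched in plain userspace C. This is an illustration only, not the pci-epf-test code: `struct chan`, `chan_request()`, `struct epf_test_sketch`, `init_dma_chans()` and `cleanup_dma_chans()` are hypothetical stand-ins for the dmaengine channel handling. The point is that a freed pointer left in the device state would be freed again by a later cleanup pass.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a DMA channel handle. */
struct chan { int id; };

/* Hypothetical per-device state holding both channels. */
struct epf_test_sketch {
	struct chan *dma_chan_rx;
	struct chan *dma_chan_tx;
};

/* Hypothetical allocator; 'fail' simulates the request failing. */
static struct chan *chan_request(int id, bool fail)
{
	struct chan *c;

	if (fail)
		return NULL;
	c = malloc(sizeof(*c));
	if (c)
		c->id = id;
	return c;
}

static int init_dma_chans(struct epf_test_sketch *t, bool fail_tx)
{
	t->dma_chan_rx = chan_request(0, false);
	if (!t->dma_chan_rx)
		return -1;

	t->dma_chan_tx = chan_request(1, fail_tx);
	if (!t->dma_chan_tx) {
		free(t->dma_chan_rx);
		/* Clear the stale pointer so later cleanup cannot free it twice. */
		t->dma_chan_rx = NULL;
		return -1;
	}
	return 0;
}

/* Cleanup that stays safe after a failed init because pointers are NULL. */
static void cleanup_dma_chans(struct epf_test_sketch *t)
{
	free(t->dma_chan_tx);
	free(t->dma_chan_rx);
	t->dma_chan_tx = NULL;
	t->dma_chan_rx = NULL;
}
```

Calling `cleanup_dma_chans()` after `init_dma_chans()` has failed is then harmless, since `free(NULL)` is a no-op.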
2025-01-21 | regulator: TPS6287X: Use min/max uV to get VRANGE | Jonas Andreasson
Changing the voltage range might ignore the slew rate and cause a current surge. With the current implementation, the driver makes the regulator change the voltage range it uses at run time. According to my communication with Texas Instruments, this is not intended, since Dynamic Voltage Scaling in the hardware is only designed to work within a single voltage range. The current implementation therefore ignores the slew rate defined in devicetree whenever the voltage range changes during use. It always selects the most accurate range that can reach the requested voltage, even though multiple ranges are able to reach it. There are 4 voltage ranges with the following reach: 0b00: 0.4-0.71875V (1.25mV step size) 0b01: 0.4-1.0375V (2.5mV) 0b10: 0.4-1.675V (5mV) 0b11: 0.8-3.3V (10mV) In practice, this means that a change from below to above 0.71875V will use the smallest range (0b00) for the voltages below and the second smallest range (0b01) for the voltages above (up to 1.0375V). I timed how long it takes to go from below 0.71875V to above it. The step was 100mV, which, with the slew rate set to 1250µV/µs, should in theory take 80µs. With the current implementation, it takes 10µs on my hardware. The same test with the slew rate set to 5000µV/µs, which should take 20µs, also takes only 10µs on my hardware. Not only is this out of line with the technical specification for the regulator, it also causes a current surge: the real output current I observed on my hardware (~1.2A) is ~1A higher than what I calculated it to be (~0.2A) using the method described in the technical specification. I also tested transitioning from a bigger to a smaller range, and the results were the same. Instead, let's limit the voltage range to a single one, which is in line with the intended use of the regulator. 
This is done by looking up the minimum and maximum requested voltage specified in devicetree. Signed-off-by: Jonas Andreasson <jonas.andreasson@axis.com> Link: https://patch.msgid.link/20250121-tps-fix-v2-1-50cc4d0f1635@axis.com Signed-off-by: Mark Brown <broonie@kernel.org>
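To make the range selection and the timing figures concrete, here is a sketch that picks one VRANGE covering the whole devicetree [min, max] window and computes the ideal ramp time. The table mirrors the four ranges listed above. `pick_vrange()` and `ramp_time_us()` are hypothetical, and preferring the finest range that covers the window is an assumption, not necessarily what the driver does.

```c
#include <stddef.h>

/* Voltage ranges from the commit message, in microvolts. */
struct vrange { unsigned int min_uV, max_uV, step_uV; };

static const struct vrange tps6287x_ranges[] = {
	{ 400000,  718750,  1250 },	/* 0b00 */
	{ 400000, 1037500,  2500 },	/* 0b01 */
	{ 400000, 1675000,  5000 },	/* 0b10 */
	{ 800000, 3300000, 10000 },	/* 0b11 */
};

/*
 * Hypothetical helper: return the VRANGE index of the first (finest)
 * range that covers the whole [min_uV, max_uV] window, or -1 if no
 * single range does. Staying in one range keeps DVS within its design.
 */
static int pick_vrange(unsigned int min_uV, unsigned int max_uV)
{
	size_t n = sizeof(tps6287x_ranges) / sizeof(tps6287x_ranges[0]);

	for (size_t i = 0; i < n; i++) {
		const struct vrange *r = &tps6287x_ranges[i];

		if (min_uV >= r->min_uV && max_uV <= r->max_uV)
			return (int)i;
	}
	return -1;
}

/* Ideal ramp time in microseconds for a step at a given slew rate. */
static unsigned int ramp_time_us(unsigned int delta_uV,
				 unsigned int slew_uV_per_us)
{
	return delta_uV / slew_uV_per_us;
}
```

For example, a 0.65-0.75V window gets range 0b01 for both ends instead of switching between 0b00 and 0b01. `ramp_time_us(100000, 1250)` gives the 80µs quoted for the 100mV step, and `ramp_time_us(100000, 5000)` gives 20µs.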
2025-01-21 | fbdev: lcdcfb: Use backlight helper | Shixiong Ou
Signed-off-by: Shixiong Ou <oushixiong@kylinos.cn> Signed-off-by: Helge Deller <deller@gmx.de>
2025-01-21 | spi: omap2-mcspi: Correctly handle devm_clk_get_optional() errors | Mark Brown
devm_clk_get_optional() returns NULL for missing clocks and an ERR_PTR() if there is a clock but we fail to get it, but currently we only handle the latter case, and do so as though the clock was missing. If we get an error back, we should handle it as an error, since the clock exists but we failed to get it; if we get NULL, the clock doesn't exist and we should handle that. Fixes: 4c6ac5446d06 ("spi: omap2-mcspi: Fix the IS_ERR() bug for devm_clk_get_optional_enabled()") Reported-by: Lars Pedersen <lapeddk@gmail.com> Link: https://patch.msgid.link/20250117-spi-fix-omap2-optional-v1-1-e77d4ac6db6e@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Tested-by: Lars Pedersen <lapeddk@gmail.com> Signed-off-by: Mark Brown <broonie@kernel.org>
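A userspace sketch of the three-way check the message asks for. ERR_PTR(), PTR_ERR() and IS_ERR() are re-implemented here in simplified form so the fragment compiles outside the kernel. `struct clk`, `probe_clk_rate()` and the fallback rate are hypothetical stand-ins for whatever the driver does when no clock is described.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical clock object. */
struct clk { unsigned long rate; };

/*
 * Handle the three outcomes of an "optional" clock getter:
 *   error pointer -> the clock exists but could not be obtained: fail;
 *   NULL          -> no clock is described: use a fallback rate;
 *   valid pointer -> use the clock's rate.
 */
static long probe_clk_rate(struct clk *clk, unsigned long fallback_hz,
			   unsigned long *rate)
{
	if (IS_ERR(clk))
		return PTR_ERR(clk);
	*rate = clk ? clk->rate : fallback_hz;
	return 0;
}
```

The order matters: the IS_ERR() test must come first, because an error pointer is non-NULL and would otherwise be dereferenced as a valid clock.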
2025-01-21 | dm-crypt: fully initialize clone->bi_iter in crypt_alloc_buffer() | Hou Tao
Both kcryptd_io_read() and kcryptd_crypt_write_convert() invoke crypt_alloc_buffer() to allocate a new bio. Both callers initialize bi_iter.bi_sector for the new bio separately after crypt_alloc_buffer() returns. However, kcryptd_crypt_write_convert() copies the bi_iter of the new bio into ctx.iter_out or ctx.iter_in. Although this doesn't cause any harm now, it is better to fully initialize bi_iter before it is used. Therefore, initialize bi_iter.bi_sector in crypt_alloc_buffer() instead. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-01-21 | dm-crypt: set atomic as false when calling crypt_convert() in kworker | Hou Tao
Both kcryptd_crypt_write_continue() and kcryptd_crypt_read_continue() run in kworker context, where it is OK to call cond_resched(). Therefore, set atomic to false when invoking crypt_convert() in kworker context. Signed-off-by: Hou Tao <houtao1@huawei.com> Reviewed-by: Ignat Korchagin <ignat@cloudflare.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2025-01-21 | Merge branch 'fixes' into 'for-next' | Ilpo Järvinen
Merged the 'fixes' branch into the 'for-next' branch to resolve a conflict in alienware-wmi zone teardown code.
2025-01-21 | drm/client: Handle tiled displays better | Maarten Lankhorst
When testing on my tiled display, initially the tiled display is detected correctly: [90376.523692] xe 0000:67:00.0: [drm:drm_client_firmware_config.isra.0 [drm]] fallback: Not all outputs enabled [90376.523713] xe 0000:67:00.0: [drm:drm_client_firmware_config.isra.0 [drm]] Enabled: 0, detected: 2 ... [90376.523967] xe 0000:67:00.0: [drm:drm_client_modeset_probe [drm]] [CRTC:82:pipe A] desired mode 1920x2160 set (1920,0) [90376.524020] xe 0000:67:00.0: [drm:drm_client_modeset_probe [drm]] [CRTC:134:pipe B] desired mode 1920x2160 set (0,0) But then, when modes have been set: [90379.729525] xe 0000:67:00.0: [drm:drm_client_firmware_config.isra.0 [drm]] [CONNECTOR:287:DP-4] on [CRTC:82:pipe A]: 1920x2160 [90379.729640] xe 0000:67:00.0: [drm:drm_client_firmware_config.isra.0 [drm]] [CONNECTOR:289:DP-5] on [CRTC:134:pipe B]: 1920x2160 ... [90379.730036] xe 0000:67:00.0: [drm:drm_client_modeset_probe [drm]] [CRTC:82:pipe A] desired mode 1920x2160 set (0,0) [90379.730124] xe 0000:67:00.0: [drm:drm_client_modeset_probe [drm]] [CRTC:134:pipe B] desired mode 1920x2160 set (0,0) Call drm_client_get_tile_offsets() in drm_client_firmware_config() as well, to ensure that the offset is set correctly. This has to be done as a separate pass, as the tile order may not be equal to the drm connector order. Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250116142825.3933-2-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Cc: <stable@vger.kernel.org>
2025-01-21 | drm/modeset: Handle tiled displays in pan_display_atomic. | Maarten Lankhorst
Tiled displays have a different x/y offset to begin with. Instead of attempting to remember this, just apply a delta instead. This fixes the first tile being duplicated on other tiles when vt switching. Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20250116142825.3933-1-dev@lankhorst.se Signed-off-by: Maarten Lankhorst <dev@lankhorst.se> Cc: <stable@vger.kernel.org>
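A sketch of the delta approach, assuming two side-by-side tiles like the two 1920x2160 pipes in the drm/client log above. `struct tile_origin` and `pan_tiles()` are hypothetical and do not mirror the drm_fb_helper internals. Writing the absolute pan position into every tile would instead give all tiles the same origin, which matches the duplicated-first-tile symptom described in the message.

```c
#include <stddef.h>

/* Hypothetical per-CRTC scanout origin inside the shared framebuffer. */
struct tile_origin { int x, y; };

/*
 * Pan every tile by the same delta. Each tile keeps its own base offset
 * (for example 1920,0 for the right half), so nothing per-tile has to be
 * remembered: only the change in the requested pan position is applied.
 */
static void pan_tiles(struct tile_origin *tiles, size_t n,
		      int old_x, int old_y, int new_x, int new_y)
{
	int dx = new_x - old_x;
	int dy = new_y - old_y;

	for (size_t i = 0; i < n; i++) {
		tiles[i].x += dx;
		tiles[i].y += dy;
	}
}
```

Panning down by 100 lines from (0,0) moves the left tile to (0,100) and the right tile to (1920,100), so the two halves stay distinct.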
2025-01-21 | pmdomain: airoha: Fix compilation error with Clang-20 and Thumb2 mode | Christian Marangi
The use of R7 in the SMCCC conflicts with the compiler's use of R7 as a frame pointer in Thumb2 mode, which is forcibly enabled by Clang when profiling hooks are inserted via the -pg switch. This is a known issue, and similar drivers work around it with a Makefile ifdef. The same workaround is applied in drivers/firmware/arm_scmi/transports/Makefile and other similar drivers. Suggested-by: Sudeep Holla <sudeep.holla@arm.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501201840.XmpHXpQ4-lkp@intel.com/ Fixes: 82e703dd438b ("pmdomain: airoha: Add Airoha CPU PM Domain support") Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Link: https://lore.kernel.org/r/20250120153817.11807-1-ansuelsmth@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-01-21 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Paolo Abeni
No conflicts and no adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-01-20 | Merge tag 'powerpc-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux | Linus Torvalds
Pull powerpc updates from Madhavan Srinivasan: - Add preempt lazy support - Deprecate cxl and cxl flash driver - Fix a possible IOMMU related OOPS at boot on pSeries - Optimize sched_clock() in ppc32 by replacing mulhdu() by mul_u64_u64_shr() Thanks to Andrew Donnellan, Andy Shevchenko, Ankur Arora, Christophe Leroy, Frederic Barrat, Gaurav Batra, Luis Felipe Hernandez, Michael Ellerman, Nilay Shroff, Ricardo B. Marliere, Ritesh Harjani (IBM), Sebastian Andrzej Siewior, Shrikanth Hegde, Sourabh Jain, Thorsten Blum, and Zhu Jun. * tag 'powerpc-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Fix argument order to timer_sub() powerpc/prom_init: Use IS_ENABLED() powerpc/pseries/iommu: IOMMU incorrectly marks MMIO range in DDW powerpc: Use str_on_off() helper in check_cache_coherency() powerpc: Large user copy aware of full:rt:lazy preemption powerpc: Add preempt lazy support powerpc/book3s64/hugetlb: Fix disabling hugetlb when fadump is active powerpc/vdso: Mark the vDSO code read-only after init powerpc/64: Use get_user() in start_thread() macintosh: declare ctl_table as const selftest/powerpc/ptrace: Cleanup duplicate macro definitions selftest/powerpc/ptrace/ptrace-pkey: Remove duplicate macros selftest/powerpc/ptrace/core-pkey: Remove duplicate macros powerpc/8xx: Drop legacy-of-mm-gpiochip.h header scsi/cxlflash: Deprecate driver cxl: Deprecate driver selftests/powerpc: Fix typo in test-vphn.c powerpc/xmon: Use str_yes_no() helper in dump_one_paca() powerpc/32: Replace mulhdu() by mul_u64_u64_shr()
2025-01-20 | Merge branch 'next' into for-linus | Dmitry Torokhov
Prepare input updates for 6.14 merge window.
2025-01-20 | Input: synaptics - fix crash when enabling pass-through port | Dmitry Torokhov
When enabling a pass-through port, an interrupt might come before the psmouse driver binds to the pass-through port. However, the synaptics sub-driver tries to access the psmouse instance presumably associated with the pass-through port to figure out whether only 1 byte of response or the entire protocol packet needs to be forwarded to the pass-through port, and may crash if the psmouse instance has not been attached to the port yet. Fix the crash by introducing open() and close() methods for the port and checking whether the port is open before trying to access the psmouse instance. Because psmouse calls serio_open() only after attaching the psmouse instance to the serio port instance, this prevents the potential crash. Reported-by: Takashi Iwai <tiwai@suse.de> Fixes: 100e16959c3c ("Input: libps2 - attach ps2dev instances as serio port's drvdata") Link: https://bugzilla.suse.com/show_bug.cgi?id=1219522 Cc: stable@vger.kernel.org Reviewed-by: Takashi Iwai <tiwai@suse.de> Link: https://lore.kernel.org/r/Z4qSHORvPn7EU2j1@google.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
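A minimal sketch of the open/close gating under simplified, hypothetical types. `struct pt_port`, `struct psmouse_sketch`, `pt_open()`, `pt_close()` and `pt_bytes_to_forward()` are not the synaptics code. The 1-byte versus full-packet decision is reduced to a made-up `in_command_mode` flag, and any locking a real driver needs around the check is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the psmouse instance bound to the port. */
struct psmouse_sketch {
	bool in_command_mode;	/* made-up flag: expects single-byte replies */
	int pktsize;		/* protocol packet size */
};

/* Hypothetical pass-through port. */
struct pt_port {
	bool open;			/* set by open(), cleared by close() */
	struct psmouse_sketch *child;	/* attached before open() is called */
};

static int pt_open(struct pt_port *port)
{
	port->open = true;
	return 0;
}

static void pt_close(struct pt_port *port)
{
	port->open = false;
}

/*
 * Interrupt-path helper: how many bytes to forward to the child.
 * Until the port is opened the child may not be attached, so it is
 * never dereferenced; the data is simply dropped (0 bytes).
 */
static int pt_bytes_to_forward(const struct pt_port *port)
{
	if (!port->open)
		return 0;
	return port->child->in_command_mode ? 1 : port->child->pktsize;
}
```

Because the opener attaches the child before calling `pt_open()`, the `open` flag doubles as proof that `child` is valid.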
2025-01-20 | Input: atkbd - map F23 key to support default copilot shortcut | Mark Pearson
Microsoft defined Meta+Shift+F23 as the Copilot shortcut instead of a dedicated keycode, and multiple vendors have their keyboards emit this sequence in response to users pressing a dedicated "Copilot" key. Unfortunately, the default keymap table in atkbd does not map scancode 0x6e (F23), so the key combination does not work even if userspace is ready to handle it. Because this behavior is common between multiple vendors and the scancode is currently unused, map 0x6e to keycode 193 (KEY_F23) so that the key sequence is generated properly. MS documentation for the scan code: https://learn.microsoft.com/en-us/windows/win32/inputdev/about-keyboard-input#scan-codes Confirmed on Lenovo, HP and Dell machines by Canonical. Tested on Lenovo T14s G6 AMD. Signed-off-by: Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20250107034554.25843-1-mpearson-lenovo@squebb.ca Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
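As a toy illustration of what the one-entry change does, the fragment below maps scancode 0x6e to KEY_F23 (193, as defined in include/uapi/linux/input-event-codes.h) and leaves every other slot at KEY_RESERVED. The 512-entry table and `scancode_to_keycode()` are assumptions made for the sketch, not the atkbd source.

```c
#include <stdint.h>

/* Values from include/uapi/linux/input-event-codes.h. */
#define KEY_RESERVED	0
#define KEY_F23		193

#define KEYMAP_SIZE	512	/* assumed table size for this sketch */

/*
 * Toy keymap fragment: only the entry added by the commit is filled in;
 * every other scancode stays KEY_RESERVED, i.e. is not reported.
 */
static const unsigned short keymap[KEYMAP_SIZE] = {
	[0x6e] = KEY_F23,
};

static unsigned short scancode_to_keycode(uint16_t scancode)
{
	return scancode < KEYMAP_SIZE ? keymap[scancode] : KEY_RESERVED;
}
```

With the entry in place, the Copilot key's Meta+Shift+F23 sequence reaches userspace as three ordinary key events instead of losing the F23 part.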
2025-01-20 | Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux | Linus Torvalds
Pull arm64 updates from Will Deacon: "We've got a little less than normal thanks to the holidays in December, but there's the usual summary below. The highlight is probably the 52-bit physical addressing (LPA2) clean-up from Ard. Confidential Computing: - Register a platform device when running in CCA realm mode to enable automatic loading of dependent modules CPU Features: - Update a bunch of system register definitions to pick up new field encodings from the architectural documentation - Add hwcaps and selftests for the new (2024) dpISA extensions Documentation: - Update EL3 (firmware) requirements for booting Linux on modern arm64 designs - Remove stale information about the kernel virtual memory map Miscellaneous: - Minor cleanups and typo fixes Memory management: - Fix vmemmap_check_pmd() to look at the PMD type bits - LPA2 (52-bit physical addressing) cleanups and minor fixes - Adjust physical address space depending upon whether or not LPA2 is enabled Perf and PMUs: - Add port filtering support for NVIDIA's NVLINK-C2C Coresight PMU - Extend AXI filtering support for the DDR PMU on NXP IMX SoCs - Fix Designware PCIe PMU event numbering - Add generic branch events for the Apple M1 CPU PMU - Add support for Marvell Odyssey DDR and LLC-TAD PMUs - Cleanups to the Hisilicon DDRC and Uncore PMU code - Advertise discard mode for the SPE PMU - Add the perf users mailing list to our MAINTAINERS entry" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (64 commits) Documentation: arm64: Remove stale and redundant virtual memory diagrams perf docs: arm_spe: Document new discard mode perf: arm_spe: Add format option for discard mode MAINTAINERS: Add perf list for drivers/perf/ arm64: Remove duplicate included header drivers/perf: apple_m1: Map generic branch events arm64: rsi: Add automatic arm-cca-guest module loading kselftest/arm64: Add 2024 dpISA extensions to hwcap test KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1 arm64/hwcap: Describe 2024 dpISA extensions to userspace arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12 arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented drivers/perf: hisi: Set correct IRQ affinity for PMUs with no association arm64/sme: Move storage of reg_smidr to __cpuinfo_store_cpu() arm64: mm: Test for pmd_sect() in vmemmap_check_pmd() arm64/mm: Replace open encodings with PXD_TABLE_BIT arm64/mm: Rename pte_mkpresent() as pte_mkvalid() arm64/sysreg: Update ID_AA64ISAR2_EL1 to DDI0601 2024-09 arm64/sysreg: Update ID_AA64ZFR0_EL1 to DDI0601 2024-09 arm64/sysreg: Update ID_AA64FPFR0_EL1 to DDI0601 2024-09 ...
2025-01-20 | Merge tag 'm68k-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k | Linus Torvalds
Pull m68k updates from Geert Uytterhoeven: - Use the generic muldi3 libgcc function - Miscellaneous fixes and improvements * tag 'm68k-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: libgcc: Fix lvalue abuse in umul_ppmm() m68k: vga: Fix I/O defines zorro: Constify 'struct bin_attribute' m68k: atari: Use str_on_off() helper in atari_nvram_proc_read() m68k: Use kernel's generic muldi3 libgcc function
2025-01-20 | Merge tag 's390-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux | Linus Torvalds
Pull s390 updates from Alexander Gordeev: - Select config option KASAN_VMALLOC if KASAN is enabled - Select config option VMAP_STACK unconditionally - Implement arch_atomic_inc() / arch_atomic_dec() functions which result in a single instruction if compiled for z196 or newer architectures - Make layering between atomic.h and atomic_ops.h consistent - Comment s390 preempt_count implementation - Remove pre MARCH_HAS_Z196_FEATURES preempt count implementation - GCC uses the number of lines of an inline assembly to calculate the number of instructions and decide on inlining. Therefore remove superfluous new lines from a couple of inline assemblies. - Provide arch_atomic_*_and_test() implementations that allow the compiler to generate slightly better code. - Optimize __preempt_count_dec_and_test() - Remove __bootdata annotations from declarations in header files - Add missing include of <linux/smp.h> in abs_lowcore.h to provide declarations for get_cpu() and put_cpu() used in the code - Fix suboptimal kernel image base when running make kasan.config - Remove huge_pte_none() and huge_pte_none_mostly() as they are identical to the generic variants - Remove unused PAGE_KERNEL_EXEC, SEGMENT_KERNEL_EXEC, and REGION3_KERNEL_EXEC defines - Simplify noexec page protection handling and change the page, segment and region3 protection definitions automatically if the instruction execution-protection facility is not available - Save one instruction and prefer EXRL instruction over EX in string, xor_*(), amode31 and other functions - Create /dev/diag misc device to fetch diagnose specific information from the kernel and provide it to userspace - Retrieve electrical power readings using DIAGNOSE 0x324 ioctl - Make ccw_device_get_ciw() consistent and use array indices instead of pointer arithmetic - s390/qdio: Move memory alloc/pointer arithmetic for slib and sl into one place - The sysfs core now allows instances of 'struct bin_attribute' to be moved into read-only memory. Make use of that in s390 code - Add missing TLB range adjustment in pud_free_tlb() - Improve topology setup by adding early polarization detection - Fix length checks in codepage_convert() function - The generic bitops implementation is nearly identical to the s390 one. Switch to the generic variant and slightly decrease the kernel image size - Provide an optimized arch_test_bit() implementation which makes use of the flag output constraint. This generates slightly better code - Provide memory topology information obtained with DIAGNOSE 0x310 using ioctl. - Various other small improvements, fixes, and cleanups Also, some changes came in through a merge of 'pci-device-recovery' branch: - Add PCI error recovery status mechanism - Simplify and document debug_next_entry() logic - Split private data allocation and freeing out of debug file open() and close() operations - Add debug_dump() function that gets a textual representation of a debug info (e.g. PCI recovery hardware error logs) - Add formatted content of pci_debug_msg_id to the PCI report * tag 's390-6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits) s390/futex: Fix FUTEX_OP_ANDN implementation s390/diag: Add memory topology information via diag310 s390/bitops: Provide optimized arch_test_bit() s390/bitops: Switch to generic bitops s390/ebcdic: Fix length decrement in codepage_convert() s390/ebcdic: Fix length check in codepage_convert() s390/ebcdic: Use exrl instead of ex s390/amode31: Use exrl instead of ex s390/stackleak: Use exrl instead of ex in __stackleak_poison() s390/lib: Use exrl instead of ex in xor functions s390/topology: Improve topology detection s390/tlb: Add missing TLB range adjustment s390/pkey: Constify 'struct bin_attribute' s390/sclp: Constify 'struct bin_attribute' s390/pci: Constify 'struct bin_attribute' s390/ipl: Constify 'struct bin_attribute' s390/crypto/cpacf: Constify 'struct bin_attribute' s390/qdio: Move memory alloc/pointer arithmetic for slib and sl into one place s390/cio: Use array indices instead of pointer arithmetic s390/qdio: Rename feature flag aif_osa to aif_qdio ...
2025-01-20 | Merge tag 'for-6.14/io_uring-20250119' of git://git.kernel.dk/linux | Linus Torvalds
Pull io_uring updates from Jens Axboe: "Not a lot in terms of features this time around, mostly just cleanups and code consolidation: - Support for PI meta data read/write via io_uring, with NVMe and SCSI covered - Cleanup the per-op structure caching, making it consistent across various command types - Consolidate the various user mapped features into a concept called regions, making the various users of that consistent - Various cleanups and fixes" * tag 'for-6.14/io_uring-20250119' of git://git.kernel.dk/linux: (56 commits) io_uring/fdinfo: fix io_uring_show_fdinfo() misuse of ->d_iname io_uring: reuse io_should_terminate_tw() for cmds io_uring: Factor out a function to parse restrictions io_uring/rsrc: require cloned buffers to share accounting contexts io_uring: simplify the SQPOLL thread check when cancelling requests io_uring: expose read/write attribute capability io_uring/rw: don't gate retry on completion context io_uring/rw: handle -EAGAIN retry at IO completion time io_uring/rw: use io_rw_recycle() from cleanup path io_uring/rsrc: simplify the bvec iter count calculation io_uring: ensure io_queue_deferred() is out-of-line io_uring/rw: always clear ->bytes_done on io_async_rw setup io_uring/rw: use NULL for rw->free_iovec assigment io_uring/rw: don't mask in f_iocb_flags io_uring/msg_ring: Drop custom destructor io_uring: Move old async data allocation helper to header io_uring/rw: Allocate async data through helper io_uring/net: Allocate msghdr async data through helper io_uring/uring_cmd: Allocate async data through generic helper io_uring/poll: Allocate apoll with generic alloc_cache helper ...
2025-01-20 | Merge tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux | Linus Torvalds
Pull block updates from Jens Axboe: - NVMe pull requests via Keith: - Target support for PCI-Endpoint transport (Damien) - TCP IO queue spreading fixes (Sagi, Chaitanya) - Target handling for "limited retry" flags (Guixen) - Poll type fix (Yongsoo) - Xarray storage error handling (Keisuke) - Host memory buffer free size fix on error (Francis) - MD pull requests via Song: - Reintroduce md-linear (Yu Kuai) - md-bitmap refactor and fix (Yu Kuai) - Replace kmap_atomic with kmap_local_page (David Reaver) - Quite a few queue freeze and debugfs deadlock fixes. Ming introduced lockdep support for this in the 6.13 kernel, and it has (unsurprisingly) uncovered quite a few issues - Use const attributes for IO schedulers - Remove bio ioprio wrappers - Fixes for stacked device atomic write support - Refactor queue affinity helpers, in preparation for better supporting isolated CPUs - Cleanups of loop O_DIRECT handling - Cleanup of BLK_MQ_F_* flags - Add rotational support for null_blk - Various fixes and cleanups * tag 'for-6.14/block-20250118' of git://git.kernel.dk/linux: (106 commits) block: Don't trim an atomic write block: Add common atomic writes enable flag md/md-linear: Fix a NULL vs IS_ERR() bug in linear_add() block: limit disk max sectors to (LLONG_MAX >> 9) block: Change blk_stack_atomic_writes_limits() unit_min check block: Ensure start sector is aligned for stacking atomic writes blk-mq: Move more error handling into blk_mq_submit_bio() block: Reorder the request allocation code in blk_mq_submit_bio() nvme: fix bogus kzalloc() return check in nvme_init_effects_log() md/md-bitmap: move bitmap_{start, end}write to md upper layer md/raid5: implement pers->bitmap_sector() md: add a new callback pers->bitmap_sector() md/md-bitmap: remove the last parameter for bimtap_ops->endwrite() md/md-bitmap: factor behind write counters out from bitmap_{start/end}write() md: Replace deprecated kmap_atomic() with kmap_local_page() md: reintroduce md-linear partitions: ldm: remove the initial kernel-doc notation blk-cgroup: rwstat: fix kernel-doc warnings in header file blk-cgroup: fix kernel-doc warnings in header file nbd: fix partial sending ...
2025-01-20 | net: phylink: fix regression when binding a PHY | Russell King (Oracle)
Some PHYs don't support clause 45 access, and return -EOPNOTSUPP from phy_modify_mmd(), which causes phylink_bringup_phy() to fail. Prevent this failure by allowing -EOPNOTSUPP to also mean success. Reported-by: Jiawen Wu <jiawenwu@trustnetic.com> Tested-by: Jiawen Wu <jiawenwu@trustnetic.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/E1tZp1a-001V62-DT@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
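The rule "treat -EOPNOTSUPP as success" can be sketched as a small status filter. `mmd_result_to_status()` is hypothetical. Per the message, the commit applies this rule to the phy_modify_mmd() result in the phylink_bringup_phy() path, so that any other error still aborts the bring-up.

```c
#include <errno.h>

/*
 * Hypothetical filter for the return value of a clause 45 register
 * access: a PHY that simply lacks clause 45 support (-EOPNOTSUPP) is
 * not a bring-up failure, but any other negative error still is.
 */
static int mmd_result_to_status(int ret)
{
	if (ret < 0 && ret != -EOPNOTSUPP)
		return ret;
	return 0;
}
```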
2025-01-20 | net: ethernet: ti: am65-cpsw: streamline TX queue creation and cleanup | Roger Quadros
Introduce am65_cpsw_create_txqs() and am65_cpsw_destroy_txqs() and use them. Signed-off-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20250117-am65-cpsw-streamline-v2-3-91a29c97e569@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 | net: ethernet: ti: am65-cpsw: streamline RX queue creation and cleanup | Roger Quadros
Introduce am65_cpsw_create_rxqs() and am65_cpsw_destroy_rxqs() and use them. Signed-off-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20250117-am65-cpsw-streamline-v2-2-91a29c97e569@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 | net: ethernet: ti: am65-cpsw: ensure proper channel cleanup in error path | Roger Quadros
We are missing netif_napi_del() and am65_cpsw_nuss_free_tx/rx_chns() in the error path when am65_cpsw_nuss_init_tx/rx_chns() is used anywhere other than probe(), i.e. in am65_cpsw_nuss_update_tx_rx_chns() and am65_cpsw_nuss_resume(). As reported, in am65_cpsw_nuss_update_tx_rx_chns(), if am65_cpsw_nuss_init_tx_chns() partially fails, then devm_add_action(dev, am65_cpsw_nuss_free_tx_chns,..) is added but the cleanup via am65_cpsw_nuss_free_tx_chns() will not run. The same issue exists for am65_cpsw_nuss_init_tx/rx_chns() failures in am65_cpsw_nuss_resume(). Fixing this with devm would require more instances of devm_add/remove_action, which is clearly more of a distraction than a benefit. So, drop devm_add/remove_action for am65_cpsw_nuss_free_tx/rx_chns() and call am65_cpsw_nuss_free_tx/rx_chns() and netif_napi_del() where required. Reported-by: Siddharth Vadapalli <s-vadapalli@ti.com> Closes: https://lore.kernel.org/all/m4rhkzcr7dlylxr54udyt6lal5s2q4krrvmyay6gzgzhcu4q2c@r34snfumzqxy/ Signed-off-by: Roger Quadros <rogerq@kernel.org> Link: https://patch.msgid.link/20250117-am65-cpsw-streamline-v2-1-91a29c97e569@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
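Explicit cleanup of this kind usually takes the shape of a goto unwind. The sketch below models a resume-style path with hypothetical helpers (`init_tx_chns()`, `init_rx_chns()`, `free_tx_chns()`, `resume_sketch()`) and boolean flags standing in for channel allocation and NAPI registration. It is not the am65-cpsw code.

```c
#include <stdbool.h>

/* Hypothetical device state: flags stand in for real channel/NAPI setup. */
struct nuss_sketch {
	bool tx_chns;	/* TX channels allocated */
	bool tx_napi;	/* TX NAPI registered */
	bool rx_chns;	/* RX channels allocated */
};

static int init_tx_chns(struct nuss_sketch *s, bool fail)
{
	if (fail)
		return -1;
	s->tx_chns = true;
	s->tx_napi = true;
	return 0;
}

static int init_rx_chns(struct nuss_sketch *s, bool fail)
{
	if (fail)
		return -1;
	s->rx_chns = true;
	return 0;
}

static void free_tx_chns(struct nuss_sketch *s)
{
	s->tx_chns = false;
}

/* Resume-style path: undo whatever succeeded before the failure. */
static int resume_sketch(struct nuss_sketch *s, bool fail_tx, bool fail_rx)
{
	int ret;

	ret = init_tx_chns(s, fail_tx);
	if (ret)
		return ret;

	ret = init_rx_chns(s, fail_rx);
	if (ret)
		goto err_tx;

	return 0;

err_tx:
	s->tx_napi = false;	/* stands in for netif_napi_del() */
	free_tx_chns(s);
	return ret;
}
```

Unlike a devm action, the unwind runs on every failing call, not only when the device is finally released.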
2025-01-20 | net: stmmac: Drop redundant skb_mark_for_recycle() for SKB frags | Furong Xu
After commit df542f669307 ("net: stmmac: Switch to zero-copy in non-XDP RX path"), SKBs are always marked for recycle, so it is redundant to mark SKBs more than once when new frags are appended. Signed-off-by: Furong Xu <0x1207@gmail.com> Link: https://patch.msgid.link/20250117062805.192393-1-0x1207@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 | net: mii: Fix the Speed display when the network cable is not connected | Xiangqian Zhang
This was observed on two different models of USB network adapter, using the r8152 and asix drivers: if no network cable is connected, the reported speed is 10Mb/s. The problem reproduces on Linux 3.10, 4.19, 5.4 and 6.12, and also exists on the latest kernel. Both drivers call mii_ethtool_get_link_ksettings(), but the value of cmd->base.speed in this function can only be SPEED_1000, SPEED_100 or SPEED_10. When the network cable is not connected, set cmd->base.speed = SPEED_UNKNOWN. Signed-off-by: Xiangqian Zhang <zhangxiangqian@kylinos.cn> Link: https://patch.msgid.link/20250117094603.4192594-1-zhangxiangqian@kylinos.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
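A reduced model of the speed resolution, assuming the speed is picked from the fastest mode both link partners share, with 10Mb/s as the last resort (which is what an unplugged adapter used to report). `resolve_speed()` and its boolean inputs are hypothetical. The SPEED_* values match include/uapi/linux/ethtool.h.

```c
#include <stdbool.h>

/* Values as in include/uapi/linux/ethtool.h. */
#define SPEED_10	10
#define SPEED_100	100
#define SPEED_1000	1000
#define SPEED_UNKNOWN	-1

/*
 * Hypothetical speed resolution: without a link there is no negotiated
 * speed, so report SPEED_UNKNOWN instead of falling through to 10Mb/s.
 */
static int resolve_speed(bool link_up, bool common_1000, bool common_100)
{
	if (!link_up)
		return SPEED_UNKNOWN;	/* the fix: no cable, no speed */
	if (common_1000)
		return SPEED_1000;
	if (common_100)
		return SPEED_100;
	return SPEED_10;	/* previously also reached with no cable */
}
```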
2025-01-20 | Merge branches 'pm-devfreq' and 'pm-opp' | Rafael J. Wysocki
Merge devfreq and OPP (Operating Performance Points) updates for 6.14: - Clean up the Exynos devfreq driver and devfreq core (Markus Elfring, Jeongjun Park). - Minor cleanups and fixes for OPP (Dan Carpenter, Neil Armstrong, Joe Hattori). - Implement dev_pm_opp_get_bw() (Neil Armstrong). - Expose OPP reference counting helpers for Rust (Viresh Kumar). * pm-devfreq: PM / devfreq: exynos: remove unused function parameter PM / devfreq: event: Call of_node_put() only once in devfreq_event_get_edev_by_phandle() * pm-opp: PM / OPP: Add reference counting helpers for Rust implementation OPP: OF: Fix an OF node leak in _opp_add_static_v2() OPP: fix dev_pm_opp_find_bw_*() when bandwidth table not initialized OPP: add index check to assert to avoid buffer overflow in _read_freq() opp: core: Fix off by one in dev_pm_opp_get_bw() opp: core: implement dev_pm_opp_get_bw
2025-01-20 | eth: bnxt: update header sizing defaults | Jakub Kicinski
300-400B RPC requests are fairly common. With the current default HDS threshold of 256B, bnxt ends up splitting those, lowering PCIe bandwidth efficiency and increasing the number of memory allocations. Increase the HDS threshold to fit 4 buffers in a 4k page. This works out to 640B as the threshold on a typical kernel config. This change increases the performance for a microbenchmark which receives 400B RPCs and sends empty responses by 4.5%. Admittedly this is just a single benchmark, but 256B works out to just 6 (so 2 more) packets per head page, because shinfo size dominates the headers. Now that we use page pool for the header pages I was also tempted to default rx_copybreak to 0, but in synthetic testing the copybreak size doesn't seem to make much difference. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
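The figures in the message are consistent with about 384B of fixed overhead per header buffer. Assuming 64B of headroom and 320B for the aligned struct skb_shared_info (both are assumptions about a typical x86-64 build, not values read from bnxt), the arithmetic works out as follows. `bufs_per_page()` and `thresh_for()` are hypothetical helpers.

```c
/* Assumed sizes for a typical x86-64 config; not taken from any kernel. */
#define PAGE_SZ		4096u
#define HEADROOM	64u	/* assumed headroom per buffer */
#define SHINFO_SZ	320u	/* assumed aligned skb_shared_info size */

/* How many header buffers with a given HDS threshold fit in one page. */
static unsigned int bufs_per_page(unsigned int hds_thresh)
{
	return PAGE_SZ / (HEADROOM + hds_thresh + SHINFO_SZ);
}

/* Largest HDS threshold that still fits n buffers in one page. */
static unsigned int thresh_for(unsigned int n)
{
	return PAGE_SZ / n - HEADROOM - SHINFO_SZ;
}
```

Under these assumptions, `thresh_for(4)` returns 640 and `bufs_per_page(256)` returns 6, matching the figures quoted above: the shared-info overhead is larger than the 256B threshold itself.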
2025-01-20 | eth: bnxt: allocate enough buffer space to meet HDS threshold | Jakub Kicinski
Now that we can configure the HDS threshold separately from rx_copybreak, the HDS threshold may be higher than rx_copybreak. We need to make sure that we have enough space for the headers. Fixes: 6b43673a25c3 ("bnxt_en: add support for hds-thresh ethtool command") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 | net: ethtool: populate the default HDS params in the core | Jakub Kicinski
The core has the current HDS config, so it can pre-populate the values for the drivers. While at it, remove the zero-setting in netdevsim. Zeroes are the default values, since the config is zalloc'ed. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 | eth: bnxt: apply hds_thrs settings correctly | Jakub Kicinski
Use the pending config for hds_thrs. Core will only update the "current" one after we return success. Without this change 2 reconfigs would be required for the setting to reach the device. Fixes: 6b43673a25c3 ("bnxt_en: add support for hds-thresh ethtool command") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
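Why reading the "current" config costs an extra reconfiguration can be shown with a small model. Everything here (`struct hds_cfg`, `struct dev_sketch`, `driver_apply_buggy()`, `driver_apply_fixed()`, `core_set()`) is hypothetical. The only behavior taken from the message is that the core promotes the pending config to current after the driver returns success.

```c
/* Hypothetical model of the HDS config handed between core and driver. */
struct hds_cfg { unsigned int hds_thresh; };

struct dev_sketch {
	struct hds_cfg cur;		/* what the core believes is applied */
	unsigned int hw_thresh;		/* what the device was programmed with */
};

/* Buggy driver: programs the device from the old "current" config. */
static int driver_apply_buggy(struct dev_sketch *d, const struct hds_cfg *pending)
{
	(void)pending;
	d->hw_thresh = d->cur.hds_thresh;
	return 0;
}

/* Fixed driver: programs the device from the pending config. */
static int driver_apply_fixed(struct dev_sketch *d, const struct hds_cfg *pending)
{
	d->hw_thresh = pending->hds_thresh;
	return 0;
}

/* Core: call the driver; only on success does pending become current. */
static int core_set(struct dev_sketch *d, unsigned int thresh,
		    int (*apply)(struct dev_sketch *, const struct hds_cfg *))
{
	struct hds_cfg pending = { thresh };
	int ret = apply(d, &pending);

	if (!ret)
		d->cur = pending;
	return ret;
}
```

With the buggy variant, the first `core_set(d, 640, ...)` still programs the old value, and only a second call gets 640 into the device, which is the "2 reconfigs" symptom.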
2025-01-20 | net: move HDS config from ethtool state | Jakub Kicinski
Separate the HDS config from the ethtool state struct. The HDS config contains just simple parameters, not state. Having it as a separate struct will make it easier to clone / copy and also long term potentially make it per-queue. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250119020518.1962249-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 | Merge branch 'pm-cpufreq' | Rafael J. Wysocki
Merge cpufreq updates for 6.14: - Use str_enable_disable()-like helpers in cpufreq (Krzysztof Kozlowski). - Extend the Apple cpufreq driver to support more SoCs (Hector Martin, Nick Chan). - Add new cpufreq driver for Airoha SoCs (Christian Marangi). - Fix using cpufreq-dt as module (Andreas Kemnade). - Minor fixes for Sparc, SCMI, and Qcom cpufreq drivers (Ethan Carter Edwards, Sibi Sankar, Manivannan Sadhasivam). - Fix the maximum supported frequency computation in the ACPI cpufreq driver to avoid relying on unfounded assumptions (Gautham Shenoy). - Fix an amd-pstate driver regression with preferred core rankings not being used (Mario Limonciello). - Fix a precision issue with frequency calculation in the amd-pstate driver (Naresh Solanki). - Add ftrace event to the amd-pstate driver for active mode (Mario Limonciello). - Set default EPP policy on Ryzen processors in amd-pstate (Mario Limonciello). - Clean up the amd-pstate cpufreq driver and optimize it to increase code reuse (Mario Limonciello, Dhananjay Ugwekar). - Use CPPC to get scaling factors between HWP performance levels and frequency in the intel_pstate driver and make it stop using a built-in scaling factor for the Arrow Lake processor (Rafael Wysocki). - Make intel_pstate initialize epp_policy to CPUFREQ_POLICY_UNKNOWN for consistency with CPU offline (Christian Loehle). - Fix superfluous updates caused by need_freq_update in the schedutil cpufreq governor (Sultan Alsawaf). 
* pm-cpufreq: (40 commits) cpufreq: Use str_enable_disable()-like helpers cpufreq: airoha: Add EN7581 CPUFreq SMCCC driver cpufreq: ACPI: Fix max-frequency computation cpufreq/amd-pstate: Refactor max frequency calculation cpufreq/amd-pstate: Fix prefcore rankings cpufreq: sparc: change kzalloc to kcalloc cpufreq: qcom: Implement clk_ops::determine_rate() for qcom_cpufreq* clocks cpufreq: qcom: Fix qcom_cpufreq_hw_recalc_rate() to query LUT if LMh IRQ is not available cpufreq: apple-soc: Add Apple A7-A8X SoC cpufreq support cpufreq: apple-soc: Set fallback transition latency to APPLE_DVFS_TRANSITION_TIMEOUT cpufreq: apple-soc: Increase cluster switch timeout to 400us cpufreq: apple-soc: Use 32-bit read for status register cpufreq: apple-soc: Allow per-SoC configuration of APPLE_DVFS_CMD_PS1 cpufreq: apple-soc: Drop setting the PS2 field on M2+ dt-bindings: cpufreq: apple,cluster-cpufreq: Add A7-A11, T2 compatibles dt-bindings: cpufreq: Document support for Airoha EN7581 CPUFreq cpufreq: fix using cpufreq-dt as module cpufreq: scmi: Register for limit change notifications cpufreq: schedutil: Fix superfluous updates caused by need_freq_update cpufreq: intel_pstate: Use CPUFREQ_POLICY_UNKNOWN ...
2025-01-20PCI: dwc: Simplify config resource lookupBjorn Helgaas
If platform_get_resource_byname("config") fails, return error immediately and unindent the normal path. No functional change intended. Link: https://lore.kernel.org/r/20250117235119.712043-1-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
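This is the usual "fail early" refactor. Below is a minimal sketch in userspace C. `struct res` and `get_resource_byname()` are hypothetical stand-ins for `struct resource` and `platform_get_resource_byname()`, and `-ENODEV` is an assumed error code. Both versions behave the same. Only the indentation of the normal path changes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a named memory resource. */
struct res {
	const char *name;
	unsigned long start, end;
};

static const struct res *get_resource_byname(const struct res *tbl, size_t n,
					     const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return &tbl[i];
	return NULL;
}

/* Before: the normal path is nested inside the success branch. */
static int setup_nested(const struct res *tbl, size_t n, unsigned long *size)
{
	const struct res *cfg = get_resource_byname(tbl, n, "config");

	if (cfg) {
		*size = cfg->end - cfg->start + 1;
		return 0;
	} else {
		return -ENODEV;
	}
}

/* After: return the error immediately and keep the normal path unindented. */
static int setup_early_return(const struct res *tbl, size_t n,
			      unsigned long *size)
{
	const struct res *cfg = get_resource_byname(tbl, n, "config");

	if (!cfg)
		return -ENODEV;

	*size = cfg->end - cfg->start + 1;
	return 0;
}
```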
2025-01-20PCI: imx6: Clean up comments and whitespaceBjorn Helgaas
For readability, fix typos and comments that needlessly exceed 80 columns. Link: https://lore.kernel.org/r/20250118210727.795559-1-helgaas@kernel.org Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Frank Li <Frank.Li@nxp.com>
2025-01-20Merge branches 'pm-sleep', 'pm-cpuidle' and 'pm-em'Rafael J. Wysocki
Merge updates related to system sleep, a cpuidle update and an Energy Model handling code update for 6.14-rc1: - Allow configuring the system suspend-resume (DPM) watchdog to warn earlier than panic (Douglas Anderson). - Implement devm_device_init_wakeup() helper and introduce a device-managed variant of dev_pm_set_wake_irq() (Joe Hattori, Peng Fan). - Remove direct inclusions of 'pm_wakeup.h' which should be only included via 'device.h' (Wolfram Sang). - Clean up two comments in the core system-wide PM code (Rafael Wysocki, Randy Dunlap). - Add Clearwater Forest processor support to the intel_idle cpuidle driver (Artem Bityutskiy). - Move sched domains rebuild function from the schedutil cpufreq governor to the Energy Model handling code (Rafael Wysocki). * pm-sleep: PM: sleep: wakeirq: Introduce device-managed variant of dev_pm_set_wake_irq() PM: sleep: Allow configuring the DPM watchdog to warn earlier than panic PM: sleep: convert comment from kernel-doc to plain comment PM: wakeup: implement devm_device_init_wakeup() helper PM: sleep: sysfs: don't include 'pm_wakeup.h' directly PM: sleep: autosleep: don't include 'pm_wakeup.h' directly PM: sleep: Update stale comment in device_resume() * pm-cpuidle: intel_idle: add Clearwater Forest SoC support * pm-em: PM: EM: Move sched domains rebuild function from schedutil to EM
2025-01-20Merge tag 'kernel-6.14-rc1.cred' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull cred refcount updates from Christian Brauner: "For the v6.13 cycle we switched overlayfs to a variant of override_creds() that doesn't take an extra reference. To this end the {override,revert}_creds_light() helpers were introduced. This generalizes the idea behind {override,revert}_creds_light() to the {override,revert}_creds() helpers. Afterwards overriding and reverting credentials is reference count free unless the caller explicitly takes a reference. All callers have been appropriately ported" * tag 'kernel-6.14-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (30 commits) cred: fold get_new_cred_many() into get_cred_many() cred: remove unused get_new_cred() nfsd: avoid pointless cred reference count bump cachefiles: avoid pointless cred reference count bump dns_resolver: avoid pointless cred reference count bump trace: avoid pointless cred reference count bump cgroup: avoid pointless cred reference count bump acct: avoid pointless reference count bump io_uring: avoid pointless cred reference count bump smb: avoid pointless cred reference count bump cifs: avoid pointless cred reference count bump cifs: avoid pointless cred reference count bump ovl: avoid pointless cred reference count bump open: avoid pointless cred reference count bump nfsfh: avoid pointless cred reference count bump nfs/nfs4recover: avoid pointless cred reference count bump nfs/nfs4idmap: avoid pointless reference count bump nfs/localio: avoid pointless cred reference count bumps coredump: avoid pointless cred reference count bump binfmt_misc: avoid pointless cred reference count bump ...
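Here is a minimal sketch of the difference between a refcounting override and a reference-count-free one. It assumes a toy `struct cred` with a plain integer `usage` count, and a global `task_cred` that stands in for the task's credentials. These are not the kernel's types or helpers. In the refcount-free variant, the caller must keep its own reference to the new credentials for the whole override window. In this sketch the revert also hands back the overriding pointer (an assumption), so the caller can drop its own reference if it took one.

```c
#include <assert.h>

/* Toy credentials with a plain reference count (hypothetical). */
struct cred {
	int usage;
};

static struct cred *task_cred; /* stands in for the task's current creds */

static struct cred *get_cred(struct cred *c) { c->usage++; return c; }
static void put_cred(struct cred *c) { c->usage--; }

/* Refcounting style: the override pins 'new' with an extra reference. */
static struct cred *override_refcounted(struct cred *new)
{
	struct cred *old = task_cred;

	task_cred = get_cred(new);
	return old;
}

static void revert_refcounted(struct cred *old)
{
	struct cred *override = task_cred;

	task_cred = old;
	put_cred(override);
}

/* Refcount-free style: no get/put; the caller keeps 'new' alive. */
static struct cred *override_light(struct cred *new)
{
	struct cred *old = task_cred;

	task_cred = new;
	return old;
}

static struct cred *revert_light(struct cred *old)
{
	struct cred *override = task_cred;

	task_cred = old;
	return override; /* caller may put_cred() this if it took a reference */
}
```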
2025-01-20Merge tag 'vfs-6.14-rc1.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Support caching symlink lengths in inodes. The size is stored in a new union utilizing the same space as i_devices, thus avoiding growing the struct or taking up any more space. When utilized it dodges strlen() in vfs_readlink(), giving about 1.5% speed up when issuing readlink on /initrd.img on ext4 - Add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag. If a file system supports uncached buffered IO, it may set FOP_DONTCACHE and enable support for RWF_DONTCACHE. If RWF_DONTCACHE is attempted without the file system supporting it, it'll get errored with -EOPNOTSUPP - Enable VBOXGUEST and VBOXSF_FS on ARM64. Now that VirtualBox is able to run as a host on arm64 (e.g. the Apple M3 processors) we can enable VBOXSF_FS (and in turn VBOXGUEST) for this architecture. Tested with various runs of bonnie++ and dbench on an Apple MacBook Pro with the latest Virtualbox 7.1.4 r165100 installed. Cleanups: - Delay sysctl_nr_open check in expand_files() - Use kernel-doc includes in fiemap docbook - Use page->private instead of page->index in watch_queue - Use a consume fence in mnt_idmap() as it's heavily used in link_path_walk() - Replace magic number 7 with ARRAY_SIZE() in fc_log - Sort out a stale comment about races between fd alloc and dup2() - Fix return type of do_mount() from long to int - Various cosmetic cleanups for the lockref code Fixes: - Annotate spinning as unlikely() in __read_seqcount_begin. The annotation already used to be there, but got lost in commit 52ac39e5db51 ("seqlock: seqcount_t: Implement all read APIs as statement expressions") - Fix proc_handler for sysctl_nr_open - Flush delayed work in delayed fput() - Fix grammar and spelling in propagate_umount() - Fix ESP not readable during coredump. In /proc/PID/stat, there is the kstkesp field which is the stack pointer of a thread. While the thread is active, this field reads zero.
But during a coredump, it should have a valid value. However, at the moment, kstkesp is zero even during coredump - Don't wake up the writer if the pipe is still full - Fix unbalanced user_access_end() in select code" * tag 'vfs-6.14-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) gfs2: use lockref_init for qd_lockref erofs: use lockref_init for pcl->lockref dcache: use lockref_init for d_lockref lockref: add a lockref_init helper lockref: drop superfluous externs lockref: use bool for false/true returns lockref: improve the lockref_get_not_zero description lockref: remove lockref_put_not_zero fs: Fix return type of do_mount() from long to int select: Fix unbalanced user_access_end() vbox: Enable VBOXGUEST and VBOXSF_FS on ARM64 pipe_read: don't wake up the writer if the pipe is still full selftests: coredump: Add stackdump test fs/proc: do_task_stat: Fix ESP not readable during coredump fs: add RWF_DONTCACHE iocb and FOP_DONTCACHE file_operations flag fs: sort out a stale comment about races between fd alloc and dup2 fs: Fix grammar and spelling in propagate_umount() fs: fc_log replace magic number 7 with ARRAY_SIZE() fs: use a consume fence in mnt_idmap() file: flush delayed work in delayed fput() ...
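Below is a hedged userspace sketch of the RWF_DONTCACHE behavior described above. It tries `pwritev2()` with the flag and, if the file system rejects it with `EOPNOTSUPP`, retries as a normal buffered write. The fallback `#define` assumes the flag's value in the 6.14 UAPI header (`0x80`); use the system header's definition when one is available. The sketch needs glibc 2.26 or later for `pwritev2()`, and it also falls back when the syscall itself is missing (`ENOSYS`).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>     /* open() flags for callers */
#include <sys/types.h>
#include <sys/uio.h>   /* pwritev2(), struct iovec */
#include <unistd.h>    /* pwrite() */

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080 /* assumed value from the 6.14 UAPI <linux/fs.h> */
#endif

/*
 * Write 'len' bytes at 'off', asking for uncached buffered IO. If the file
 * system (or a kernel that does not know the flag) rejects it with
 * EOPNOTSUPP, or pwritev2() is unavailable (ENOSYS), use a plain pwrite().
 */
static ssize_t write_maybe_uncached(int fd, const void *buf, size_t len,
				    off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t n = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);

	if (n < 0 && (errno == EOPNOTSUPP || errno == ENOSYS))
		n = pwrite(fd, buf, len, off);
	return n;
}
```

Callers see the same result either way. The only difference is whether the kernel drops the written pages from the page cache once writeback completes.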
2025-01-20Merge branches 'acpi-battery', 'acpi-fan' and 'acpi-misc'Rafael J. Wysocki
Merge ACPI battery and fan drivers updates and miscellaneous ACPI changes for 6.14: - Update messages printed by the ACPI battery driver to always refer to driver extensions as "hooks" to avoid confusion with similar functionality in the power supply subsystem in the future (Thomas Weißschuh). - Fix .probe() error path cleanup in the ACPI fan driver to avoid memory leaks (Joe Hattori). - Constify 'struct bin_attribute' in some places in the ACPI subsystem and mark it as __ro_after_init in one place to prevent binary blob attributes from being updated (Thomas Weißschuh) - Add empty stubs for several ACPI-related symbols so that they can be used when CONFIG_ACPI is unset and use them for removing unnecessary conditional compilation from the ipu-bridge driver (Ricardo Ribalda). * acpi-battery: ACPI: battery: Rename extensions to hook in messages * acpi-fan: ACPI: fan: cleanup resources in the error path of .probe() * acpi-misc: media: ipu-bridge: Remove unneeded conditional compilations ACPI: bus: implement acpi_device_hid when !ACPI ACPI: bus: implement for_each_acpi_consumer_dev when !ACPI ACPI: header: implement acpi_device_handle when !ACPI ACPI: bus: implement acpi_get_physical_device_location when !ACPI ACPI: bus: implement for_each_acpi_dev_match when !ACPI ACPI: bus: change the prototype for acpi_get_physical_device_location ACPI: sysfs: Constify 'struct bin_attribute' ACPI: BGRT: Constify 'struct bin_attribute' ACPI: BGRT: Mark bin_attribute as __ro_after_init
2025-01-20Merge branches 'acpi-osl', 'acpi-tables', 'acpi-property', 'acpi-prm' and ↵Rafael J. Wysocki
'acpi-apei' Merge assorted changes in ACPI library code for 6.14: - Use usleep_range() instead of msleep() in acpi_os_sleep() to reduce excessive delays due to timer inaccuracy, mostly affecting system suspend and resume (Rafael Wysocki). - Use str_enabled_disabled() string helpers in the ACPI tables parsing code to make it easier to follow (Sunil V L). - Update device properties parsing on systems using ACPI so that data firmware nodes resulting from _DSD evaluation are treated as available in firmware nodes walks (Sakari Ailus). - Fix missing guid_t declaration in linux/prmt.h (Robert Richter). - Update the GHES handling code to follow the global panic= instead of overriding it by force-rebooting the system after a fatal hw error has been reported (Borislav Petkov). * acpi-osl: ACPI: OSL: Use usleep_range() in acpi_os_sleep() * acpi-tables: ACPI: tables: Use string choice helpers * acpi-property: ACPI: property: Consider data nodes as being available * acpi-prm: ACPI: PRM: Fix missing guid_t declaration in linux/prmt.h * acpi-apei: APEI: GHES: Have GHES honor the panic= setting
2025-01-20iommufd/fault: Use a separate spinlock to protect fault->deliver listNicolin Chen
The fault->mutex serializes the fault read()/write() fops and the iommufd_fault_auto_response_faults(), mainly for fault->response. Also, it was conveniently used to fence the fault->deliver in poll() fop and iommufd_fault_iopf_handler(). However, copy_from/to_user() may sleep if pagefaults are enabled. Thus, they could take a long time to wait for user pages to swap in, blocking iommufd_fault_iopf_handler() and its caller that is typically a shared IRQ handler of an IOMMU driver, resulting in a potential global DoS. Instead of reusing the mutex to protect the fault->deliver list, add a separate spinlock, nested under the mutex, to do the job. iommufd_fault_iopf_handler() would no longer be blocked by copy_from/to_user(). Add a free_list in iommufd_auto_response_faults(), so the spinlock can simply fence a fast list_for_each_entry_safe routine. Provide two deliver list helpers for iommufd_fault_fops_read() to use: - Fetch the first iopf_group out of the fault->deliver list - Restore an iopf_group back to the head of the fault->deliver list Lastly, move the mutex closer to the response in the fault structure, and update its kdoc accordingly. Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object") Link: https://patch.msgid.link/r/20250117192901.79491-1-nicolinc@nvidia.com Cc: stable@vger.kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
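The locking split can be modeled in userspace with a `pthread_mutex_t` for the slow path and a `pthread_spinlock_t` that guards only the deliver list. This is a hypothetical sketch. `struct fault`, `struct group` and the helper names are illustrative, not the iommufd code. The `copy_ok` argument stands in for `copy_to_user()` succeeding or failing. The producer never takes the mutex, so a reader that sleeps while holding the mutex cannot block it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct group {
	int id;
	struct group *next;
};

struct fault {
	pthread_mutex_t mutex;   /* serializes read()/write() and responses */
	pthread_spinlock_t lock; /* protects only the deliver list */
	struct group *head;      /* deliver list, oldest group first */
	struct group *tail;
};

static void fault_init(struct fault *f)
{
	pthread_mutex_init(&f->mutex, NULL);
	pthread_spin_init(&f->lock, PTHREAD_PROCESS_PRIVATE);
	f->head = f->tail = NULL;
}

/* IRQ-side producer: only the spinlock, never the mutex. */
static void deliver_enqueue(struct fault *f, struct group *g)
{
	g->next = NULL;
	pthread_spin_lock(&f->lock);
	if (f->tail)
		f->tail->next = g;
	else
		f->head = g;
	f->tail = g;
	pthread_spin_unlock(&f->lock);
}

/* Helper 1: fetch the first group out of the deliver list (NULL if empty). */
static struct group *deliver_fetch(struct fault *f)
{
	struct group *g;

	pthread_spin_lock(&f->lock);
	g = f->head;
	if (g) {
		f->head = g->next;
		if (!f->head)
			f->tail = NULL;
		g->next = NULL;
	}
	pthread_spin_unlock(&f->lock);
	return g;
}

/* Helper 2: restore a group back to the head of the deliver list. */
static void deliver_restore(struct fault *f, struct group *g)
{
	pthread_spin_lock(&f->lock);
	g->next = f->head;
	f->head = g;
	if (!f->tail)
		f->tail = g;
	pthread_spin_unlock(&f->lock);
}

/* Reader: holds the mutex across the (possibly sleeping) copy. */
static int fault_read_one(struct fault *f, int copy_ok)
{
	struct group *g;
	int id = -1;

	pthread_mutex_lock(&f->mutex);
	g = deliver_fetch(f);
	if (g) {
		if (copy_ok)
			id = g->id;
		else
			deliver_restore(f, g); /* copy failed: keep it for later */
	}
	pthread_mutex_unlock(&f->mutex);
	return id;
}
```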
2025-01-20cpuidle: teo: Skip sleep length computation for low latency constraintsRafael J. Wysocki
If the idle state exit latency constraint is sufficiently low, it is better to avoid the additional latency related to calling tick_nohz_get_sleep_length(). It is also not necessary to compute the sleep length in that case because shallow idle state selection will be forced then regardless of the recent wakeup history. Accordingly, skip the sleep length computation and subsequent checks if the exit latency constraint is low enough. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/6122398.lOV4Wx5bFT@rjwysocki.net
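Here is a hypothetical sketch of the early exit. It assumes states sorted by increasing exit latency and uses an illustrative 15 µs cutoff (`LOW_LATENCY_NS`) for "sufficiently low". That value is an assumption, not taken from the commit. `get_sleep_length_ns()` stands in for `tick_nohz_get_sleep_length()` and only counts how often it is called.

```c
#include <assert.h>
#include <stdint.h>

#define LOW_LATENCY_NS 15000 /* assumption: illustrative "sufficiently low" cutoff */

struct state {
	int64_t exit_latency_ns;
	int64_t target_residency_ns;
};

static int sleep_length_calls;

/* Stand-in for the costly sleep length query; pretends the next timer is 1 ms away. */
static int64_t get_sleep_length_ns(void)
{
	sleep_length_calls++;
	return 1000000;
}

static int select_state(const struct state *s, int n, int64_t latency_req_ns)
{
	int constraint_idx = 0;
	int64_t sleep_ns;
	int idx;

	/* Deepest state whose exit latency fits the constraint. */
	for (int i = 0; i < n; i++)
		if (s[i].exit_latency_ns <= latency_req_ns)
			constraint_idx = i;

	/* Low constraint: only shallow states qualify, so skip the query. */
	if (latency_req_ns <= LOW_LATENCY_NS)
		return constraint_idx;

	/* Otherwise refine using the sleep length. */
	sleep_ns = get_sleep_length_ns();
	for (idx = constraint_idx; idx > 0; idx--)
		if (s[idx].target_residency_ns <= sleep_ns)
			break;
	return idx;
}
```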
2025-01-20cpuidle: teo: Replace time_span_ns with a flagRafael J. Wysocki
After recent updates, the time_span_ns field in struct teo_cpu has become an indicator of whether or not the most recent wakeup has been "genuine" which may as well be indicated by a bool field without calling local_clock(), so update the code accordingly. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/6010475.MhkbZ0Pkbq@rjwysocki.net
2025-01-20cpuidle: teo: Simplify handling of total events countRafael J. Wysocki
Instead of computing the total events count from scratch every time, decay it and add a PULSE value to it in teo_update(). No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/9388883.CDJkKcVGEf@rjwysocki.net
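The update rule can be sketched as follows. `PULSE` and `DECAY_SHIFT` are assumed to mirror the constants in teo.c (1024 and 3), and the bin layout is simplified. The total gets exactly one `PULSE` per event and the same decay as every bin, so it tracks the sum of the bins without re-summing them. With integer rounding the two can differ slightly when hits are spread over several bins. In steady state the total settles at `PULSE << DECAY_SHIFT`.

```c
#include <assert.h>

#define PULSE 1024     /* assumption: mirrors teo.c */
#define DECAY_SHIFT 3  /* assumption: mirrors teo.c */
#define NR_BINS 4

struct counts {
	unsigned int bins[NR_BINS];
	unsigned int total;
};

/* One event: decay every bin and the running total by 1/2^DECAY_SHIFT,
 * then credit the matching bin and the total with PULSE. */
static void update(struct counts *c, int hit_bin)
{
	for (int i = 0; i < NR_BINS; i++)
		c->bins[i] -= c->bins[i] >> DECAY_SHIFT;
	c->bins[hit_bin] += PULSE;

	c->total -= c->total >> DECAY_SHIFT;
	c->total += PULSE;
}

/* The approach being replaced: recompute the sum of the bins every time. */
static unsigned int total_from_scratch(const struct counts *c)
{
	unsigned int sum = 0;

	for (int i = 0; i < NR_BINS; i++)
		sum += c->bins[i];
	return sum;
}
```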
2025-01-20cpuidle: teo: Skip getting the sleep length if wakeups are very frequentRafael J. Wysocki
Commit 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases") attempted to reduce the governor overhead in some cases by making it avoid obtaining the sleep length (the time till the next timer event) which may be costly. Among other things, after the above commit, tick_nohz_get_sleep_length() was not called any more when idle state 0 was to be returned, which turned out to be problematic and the previous behavior in that respect was restored by commit 4b20b07ce72f ("cpuidle: teo: Don't count non-existent intercepts"). However, commit 6da8f9ba5a87 also caused the governor to avoid calling tick_nohz_get_sleep_length() on systems where idle state 0 is a "polling" one (that is, it is not really an idle state, but a loop continuously executed by the CPU) when the target residency of the idle state to be returned was low enough, so there was no practical need to refine the idle state selection in any way. This change was not removed by the other commit, so now on systems where idle state 0 is a "polling" one, tick_nohz_get_sleep_length() is called when idle state 0 is to be returned, but it is not called when a deeper idle state with sufficiently low target residency is to be returned. That is arguably confusing and inconsistent. Moreover, there is no specific reason why the behavior in question should depend on whether or not idle state 0 is a "polling" one. One way to address this would be to make the governor always call tick_nohz_get_sleep_length() to obtain the sleep length, but that would effectively mean reverting commit 6da8f9ba5a87 and restoring the latency issue that was the reason for doing it. This approach is thus not particularly attractive.
To address it differently, notice that if a CPU is woken up very often, this is not likely to be caused by timers in the first place (user space has a default timer slack of 50 us and there are relatively few timers with a deadline shorter than several microseconds in the kernel) and even if it were the case, the potential benefit from using a deep idle state would then be questionable for latency reasons. Therefore, if the majority of CPU wakeups occur within several microseconds, it can be assumed that all wakeups in that range are non-timer and the sleep length need not be determined. Accordingly, introduce a new metric for counting wakeups with the measured idle duration below RESIDENCY_THRESHOLD_NS and modify the idle state selection to skip the tick_nohz_get_sleep_length() invocation if idle state 0 has been selected or the target residency of the candidate idle state is below RESIDENCY_THRESHOLD_NS and the value of the new metric is at least 1/2 of the total event count. Since the above requires the measured idle duration to be determined every time, except for the cases when one of the safety nets has triggered in which the wakeup is counted as a hit in the deepest idle state idle residency range, update the handling of those cases to avoid skipping the idle duration computation when the CPU wakeup is "genuine". Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://patch.msgid.link/3851791.kQq0lBPeGt@rjwysocki.net Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> [ rjw: Renamed a struct field ] [ rjw: Fixed typo in the subject and one in a comment ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
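Below is a hypothetical model of the new metric and the skip condition. The field names, the 15 µs value of `RESIDENCY_THRESHOLD_NS` and the decay constants are assumptions for illustration. This sketch reads the condition as "(state 0 selected or shallow candidate) and short wakeups are at least half of the total", which matches the rationale that short wakeups dominate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RESIDENCY_THRESHOLD_NS 15000 /* assumption: illustrative value */
#define PULSE 1024                   /* assumption */
#define DECAY_SHIFT 3                /* assumption */

struct teo_model {
	unsigned int total;       /* decayed count of all wakeups */
	unsigned int short_idles; /* decayed count of wakeups below the threshold */
};

static void record_wakeup(struct teo_model *m, int64_t measured_idle_ns)
{
	m->total -= m->total >> DECAY_SHIFT;
	m->short_idles -= m->short_idles >> DECAY_SHIFT;
	m->total += PULSE;
	if (measured_idle_ns < RESIDENCY_THRESHOLD_NS)
		m->short_idles += PULSE;
}

/* True if the sleep length query can be skipped for candidate state idx. */
static bool can_skip_sleep_length(const struct teo_model *m, int idx,
				  int64_t candidate_residency_ns)
{
	bool shallow = idx == 0 ||
		       candidate_residency_ns < RESIDENCY_THRESHOLD_NS;

	return shallow && 2 * m->short_idles >= m->total;
}
```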
2025-01-20cpuidle: teo: Simplify counting events used for tick managementRafael J. Wysocki
Replace the tick_hits metric with a new tick_intercepts one that can be used directly when deciding whether or not to stop the scheduler tick and update the governor functional description accordingly. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/1987985.PYKUYFuaPT@rjwysocki.net
2025-01-20cpuidle: teo: Clarify two code commentsRafael J. Wysocki
Rewrite two code comments that are meant to explain the code's behavior but are too concise or not sufficiently clear. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/8472971.T7Z3S40VBb@rjwysocki.net [ rjw: Fixed 2 typos in new comments ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-20cpuidle: teo: Drop local variable prev_intercept_idxRafael J. Wysocki
Local variable prev_intercept_idx in teo_select() is redundant because it cannot be 0 when candidate state index is 0. The prev_intercept_idx value is the index of the deepest enabled idle state, so if it is 0, state 0 is the deepest enabled idle state, in which case it must be the only enabled idle state, but then teo_select() would have returned early before initializing prev_intercept_idx. Thus prev_intercept_idx must be nonzero and the check of it against 0 always passes, so it can be dropped altogether. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/3327997.aeNJFYEL58@rjwysocki.net [ rjw: Fixed typo in the changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-20cpuidle: teo: Combine candidate state index checks against 0Rafael J. Wysocki
There are two candidate state index checks against 0 in teo_select() that need not be separate, so combine them and update comments around them. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/13676346.uLZWGnKmhe@rjwysocki.net
2025-01-20cpuidle: teo: Reorder candidate state index checksRafael J. Wysocki
Since constraint_idx may be 0, the candidate state index may change to 0 after assigning constraint_idx to it, so first check if it is greater than constraint_idx (and update it if so) and then check it against 0. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Tested-by: Aboorva Devarajan <aboorvad@linux.ibm.com> Tested-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/1907276.tdWV9SEqCh@rjwysocki.net
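The following tiny hypothetical sketch (not the teo_select() code) shows why the order matters. If the clamp to `constraint_idx` runs after the check against 0, an index that the clamp turns into 0 slips past the check. `check_then_clamp()` models the order implied to have existed before this change.

```c
#include <assert.h>

/* New order: clamp first, then check against 0. */
static int clamp_then_check(int idx, int constraint_idx, int *is_zero)
{
	if (idx > constraint_idx)
		idx = constraint_idx; /* may turn idx into 0 */
	*is_zero = (idx == 0);
	return idx;
}

/* Old order: the zero check runs first and misses a clamp to 0. */
static int check_then_clamp(int idx, int constraint_idx, int *is_zero)
{
	*is_zero = (idx == 0);
	if (idx > constraint_idx)
		idx = constraint_idx;
	return idx;
}
```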