summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-09-11Merge branches 'pm-cpuidle' and 'pm-powercap'Rafael J. Wysocki
Merge cpuidle updates and power capping updates for 6.12-rc1: - Add Granite Rapids Xeon support to intel_idle (Artem Bityutskiy). - Disable promotion to C1E on Jasper Lake and Elkhart Lake in intel_idle (Kai-Heng Feng). - Use scoped device node handling to fix missing of_node_put() and simplify walking OF children in the riscv-sbi cpuidle driver (Krzysztof Kozlowski). - Remove dead code from cpuidle_enter_state() (Dhruva Gole). - Change an error pointer to NULL to fix error handling in the intel_rapl power capping driver (Dan Carpenter). - Fix off by one in get_rpi() in the intel_rapl power capping driver (Dan Carpenter). - Add support for ArrowLake-U to the intel_rapl power capping driver (Sumeet Pawnikar). - Fix the energy-pkg event for AMD CPUs in the intel_rapl power capping driver (Dhananjay Ugwekar). - Add support for AMD family 1Ah processors to the intel_rapl power capping driver (Dhananjay Ugwekar). * pm-cpuidle: cpuidle: remove dead code from cpuidle_enter_state() cpuidle: riscv-sbi: Simplify with scoped for each OF child loop cpuidle: riscv-sbi: Use scoped device node handling to fix missing of_node_put intel_idle: Disable promotion to C1E on Jasper Lake and Elkhart Lake intel_idle: add Granite Rapids Xeon support * pm-powercap: powercap: intel_rapl: Change an error pointer to NULL powercap: intel_rapl: Fix off by one in get_rpi() powercap: intel_rapl: Add support for ArrowLake-U platform powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs powercap/intel_rapl: Add support for AMD family 1Ah
2024-09-11block: implement async io_uring discard cmdPavel Begunkov
io_uring allows implementing custom file specific asynchronous operations via the fops->uring_cmd callback, a.k.a. IORING_OP_URING_CMD requests or just io_uring commands. Use it to add support for async discards. Normally, it first tries to queue up bios in a non-blocking context, and if that fails, we'd retry from a blocking context by returning -EAGAIN to the core io_uring. We always get the result from bios asynchronously by setting a custom bi_end_io callback, at which point we drag the request into the task context to either reissue or complete it and post a completion to the user. Unlike ioctl(BLKDISCARD) with stronger guarantees against races, we only do a best effort attempt to invalidate page cache, and it can race with any writes and reads and leave page cache stale. It's the same kind of races we allow to direct writes. Also, apart from cases where discarding is not allowed at all, e.g. discards are not supported or the file/device is read only, the user should assume that the sector range on disk is not valid anymore, even when an error was returned to the user. Suggested-by: Conrad Meyer <conradmeyer@meta.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2b5210443e4fa0257934f73dfafcc18a77cd0e09.1726072086.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11block: introduce blk_validate_byte_range()Pavel Begunkov
In preparation to further changes extract a helper function out of blk_ioctl_discard() that validates if we can do IO against the given range of disk byte addresses. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/19a7779323c71e742a2f511e4cf49efcfd68cfd4.1726072086.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11filemap: introduce filemap_invalidate_pagesPavel Begunkov
kiocb_invalidate_pages() is useful for the write path, however not everything is backed by kiocb and we want to reuse the function for bio based discard implementation. Extract and and reuse a new helper called filemap_invalidate_pages(), which takes a argument indicating whether it should be non-blocking and might return -EAGAIN. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f81374b52c92d0dce0f01a279d1eed42b54056aa.1726072086.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11io_uring/cmd: give inline space in request to cmdsPavel Begunkov
Some io_uring commands can use some inline space in io_kiocb. We have 32 bytes in struct io_uring_cmd, expose it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7ca779a61ee5e166e535d70df9c7f07b15d8a0ce.1726072086.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11io_uring/cmd: expose iowq to cmdsPavel Begunkov
When an io_uring request needs blocking context we offload it to the io_uring's thread pool called io-wq. We can get there off ->uring_cmd by returning -EAGAIN, but there is no straightforward way of doing that from an asynchronous callback. Add a helper that would transfer a command to a blocking context. Note, we do an extra hop via task_work before io_queue_iowq(), that's a limitation of io_uring infra we have that can likely be lifted later if that would ever become a problem. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f735f807d7c8ba50c9452c69dfe5d3e9e535037b.1726072086.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11Merge branch 'for-6.12/io_uring' into for-6.12/io_uring-discardJens Axboe
* for-6.12/io_uring: (31 commits) io_uring/io-wq: inherit cpuset of cgroup in io worker io_uring/io-wq: do not allow pinning outside of cpuset io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common() io_uring/rw: treat -EOPNOTSUPP for IOCB_NOWAIT like -EAGAIN io_uring/sqpoll: do not allow pinning outside of cpuset io_uring/eventfd: move refs to refcount_t io_uring: remove unused rsrc_put_fn io_uring: add new line after variable declaration io_uring: add GCOV_PROFILE_URING Kconfig option io_uring/kbuf: add support for incremental buffer consumption io_uring/kbuf: pass in 'len' argument for buffer commit Revert "io_uring: Require zeroed sqe->len on provided-buffers send" io_uring/kbuf: move io_ring_head_to_buf() to kbuf.h io_uring/kbuf: add io_kbuf_commit() helper io_uring/kbuf: shrink nr_iovs/mode in struct buf_sel_arg io_uring: wire up min batch wake timeout io_uring: add support for batch wait timeout io_uring: implement our own schedule timeout handling io_uring: move schedule wait logic into helper io_uring: encapsulate extraneous wait flags into a separate struct ...
2024-09-11Merge branch 'for-6.12/block' into for-6.12/io_uring-discardJens Axboe
* for-6.12/block: (115 commits) block: unpin user pages belonging to a folio at once mm: release number of pages of a folio block: introduce folio awareness and add a bigger size from folio block: Added folio-ized version of bio_add_hw_page() block, bfq: factor out a helper to split bfqq in bfq_init_rq() block, bfq: remove local variable 'bfqq_already_existing' in bfq_init_rq() block, bfq: remove local variable 'split' in bfq_init_rq() block, bfq: remove bfq_log_bfqg() block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator() block, bfq: fix procress reference leakage for bfqq in merge chain block, bfq: fix uaf for accessing waker_bfqq after splitting blk-throttle: support prioritized processing of metadata blk-throttle: remove last_low_overflow_time drbd: Add NULL check for net_conf to prevent dereference in state validation blk-mq: add missing unplug trace event mtip32xx: Remove redundant null pointer checks in mtip_hw_debugfs_init() md: Add new_level sysfs interface zram: Shrink zram_table_entry::flags. zram: Remove ZRAM_LOCK zram: Replace bit spinlocks with a spinlock_t. ...
2024-09-11Merge branch 'pm-cpufreq'Rafael J. Wysocki
Merge cpufreq updates for 6.12-rc1: - Remove LATENCY_MULTIPLIER from cpufreq (Qais Yousef). - Add support for Granite Rapids and Sierra Forest in OOB mode to the intel_pstate cpufreq driver (Srinivas Pandruvada). - Add basic support for CPU capacity scaling on x86 and make the intel_pstate driver set asymmetric CPU capacity on hybrid systems without SMT (Rafael Wysocki). - Add missing MODULE_DESCRIPTION() macros to the powerpc cpufreq driver (Jeff Johnson). - Several OF related cleanups in cpufreq drivers (Rob Herring). - Enable COMPILE_TEST for ARM drivers (Rob Herrring). - Introduce quirks for syscon failures and use socinfo to get revision for TI cpufreq driver (Dhruva Gole, Nishanth Menon). - Minor cleanups in amd-pstate driver (Anastasia Belova, Dhananjay Ugwekar). - Minor cleanups for loongson, cpufreq-dt and powernv cpufreq drivers (Danila Tikhonov, Huacai Chen, and Liu Jing). - Make amd-pstate validate return of any attempt to update EPP limits, which fixes the masking hardware problems (Mario Limonciello). - Move the calculation of the AMD boost numerator outside of amd-pstate, correcting acpi-cpufreq on systems with preferred cores (Mario Limonciello). - Harden preferred core detection in amd-pstate to avoid potential false positives (Mario Limonciello). - Add extra unit test coverage for mode state machine (Mario Limonciello). - Fix an "Uninitialized variables" issue in amd-pstste (Qianqiang Liu). * pm-cpufreq: (35 commits) cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue cpufreq/amd-pstate-ut: Add test case for mode switches cpufreq/amd-pstate: Export symbols for changing modes amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking` cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore` cpufreq: amd-pstate: Optimize amd_pstate_update_limits() cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator() x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator() x86/amd: Move amd_get_highest_perf() out of amd-pstate ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn ACPI: CPPC: Drop check for non zero perf ratio x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator() ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c cpufreq/amd-pstate: Catch failures for amd_pstate_epp_update_limit() cpufreq: ti-cpufreq: Use socinfo to get revision in AM62 family cpufreq: Fix the cacography in powernv-cpufreq.c cpufreq: ti-cpufreq: Introduce quirks to handle syscon fails appropriately cpufreq: loongson3: Use raw_smp_processor_id() in do_service_request() cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value ...
2024-09-11Merge tag 'amd-pstate-v6.12-2024-09-11' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux Merge the second round of amd-pstate changes for 6.12 from Mario Limonciello: "* Move the calculation of the AMD boost numerator outside of amd-pstate, correcting acpi-cpufreq on systems with preferred cores * Harden preferred core detection to avoid potential false positives * Add extra unit test coverage for mode state machine" * tag 'amd-pstate-v6.12-2024-09-11' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/superm1/linux: cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issue cpufreq/amd-pstate-ut: Add test case for mode switches cpufreq/amd-pstate: Export symbols for changing modes amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking` cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore` cpufreq: amd-pstate: Optimize amd_pstate_update_limits() cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator() x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator() x86/amd: Move amd_get_highest_perf() out of amd-pstate ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn ACPI: CPPC: Drop check for non zero perf ratio x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator() ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIB x86/amd: Move amd_get_highest_perf() from amd.c to cppc.c
2024-09-11Merge branch 'bpf: Allow skb dynptr for tp_btf'Martin KaFai Lau
Philo Lu says: ==================== This makes bpf_dynptr_from_skb usable for tp_btf, so that we can easily parse skb in tracepoints. This has been discussed in [0], and Martin suggested to use dynptr (instead of helpers like bpf_skb_load_bytes). For safety, skb dynptr shouldn't be used in fentry/fexit. This is achieved by add KF_TRUSTED_ARGS flag in bpf_dynptr_from_skb defination, because pointers passed by tracepoint are trusted (PTR_TRUSTED) while those of fentry/fexit are not. Another problem raises that NULL pointers could be passed to tracepoint, such as trace_tcp_send_reset, and we need to recognize them. This is done by add a "__nullable" suffix in the func_proto of the tracepoint, discussed in [1]. 2 Test cases are added, one for "__nullable" suffix, and the other for using skb dynptr in tp_btf. changelog v2 -> v3 (Andrii Nakryiko): Patch 1: - Remove prog type check in prog_arg_maybe_null() - Add bpf_put_raw_tracepoint() after get() - Use kallsyms_lookup() instead of sprintf("%ps") Patch 2: Add separate test "tp_btf_nullable", and use full failure msg v1 -> v2: - Add "__nullable" suffix support (Alexei Starovoitov) - Replace "struct __sk_buff*" with "void*" in test (Martin KaFai Lau) [0] https://lore.kernel.org/all/20240205121038.41344-1-lulie@linux.alibaba.com/T/ [1] https://lore.kernel.org/all/20240430121805.104618-1-lulie@linux.alibaba.com/T/ ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11selftests/bpf: Expand skb dynptr selftests for tp_btfPhilo Lu
Add 3 test cases for skb dynptr used in tp_btf: - test_dynptr_skb_tp_btf: use skb dynptr in tp_btf and make sure it is read-only. - skb_invalid_ctx_fentry/skb_invalid_ctx_fexit: bpf_dynptr_from_skb should fail in fentry/fexit. In test_dynptr_skb_tp_btf, to trigger the tracepoint in kfree_skb, test_pkt_access is used for its test_run, as in kfree_skb.c. Because the test process is different from others, a new setup type is defined, i.e., SETUP_SKB_PROG_TP. The result is like: $ ./test_progs -t 'dynptr/test_dynptr_skb_tp_btf' #84/14 dynptr/test_dynptr_skb_tp_btf:OK #84 dynptr:OK #127 kfunc_dynptr_param:OK Summary: 2/1 PASSED, 0 SKIPPED, 0 FAILED $ ./test_progs -t 'dynptr/skb_invalid_ctx_f' #84/85 dynptr/skb_invalid_ctx_fentry:OK #84/86 dynptr/skb_invalid_ctx_fexit:OK #84 dynptr:OK #127 kfunc_dynptr_param:OK Summary: 2/2 PASSED, 0 SKIPPED, 0 FAILED Also fix two coding style nits (change spaces to tabs). Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Link: https://lore.kernel.org/r/20240911033719.91468-6-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11bpf: Allow bpf_dynptr_from_skb() for tp_btfPhilo Lu
Making tp_btf able to use bpf_dynptr_from_skb(), which is useful for skb parsing, especially for non-linear paged skb data. This is achieved by adding KF_TRUSTED_ARGS flag to bpf_dynptr_from_skb and registering it for TRACING progs. With KF_TRUSTED_ARGS, args from fentry/fexit are excluded, so that unsafe progs like fexit/__kfree_skb are not allowed. We also need the skb dynptr to be read-only in tp_btf. Because may_access_direct_pkt_data() returns false by default when checking bpf_dynptr_from_skb, there is no need to add BPF_PROG_TYPE_TRACING to it explicitly. Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240911033719.91468-5-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11tcp: Use skb__nullable in trace_tcp_send_resetPhilo Lu
Replace skb with skb__nullable as the argument name. The suffix tells bpf verifier through btf that the arg could be NULL and should be checked in tp_btf prog. For now, this is the only nullable argument in tcp tracepoints. Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20240911033719.91468-4-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11selftests/bpf: Add test for __nullable suffix in tp_btfPhilo Lu
Add a tracepoint with __nullable suffix in bpf_testmod, and add cases for it: $ ./test_progs -t "tp_btf_nullable" #406/1 tp_btf_nullable/handle_tp_btf_nullable_bare1:OK #406/2 tp_btf_nullable/handle_tp_btf_nullable_bare2:OK #406 tp_btf_nullable:OK Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Link: https://lore.kernel.org/r/20240911033719.91468-3-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11bpf: Support __nullable argument suffix for tp_btfPhilo Lu
Pointers passed to tp_btf were trusted to be valid, but some tracepoints do take NULL pointer as input, such as trace_tcp_send_reset(). Then the invalid memory access cannot be detected by verifier. This patch fix it by add a suffix "__nullable" to the unreliable argument. The suffix is shown in btf, and PTR_MAYBE_NULL will be added to nullable arguments. Then users must check the pointer before use it. A problem here is that we use "btf_trace_##call" to search func_proto. As it is a typedef, argument names as well as the suffix are not recorded. To solve this, I use bpf_raw_event_map to find "__bpf_trace##template" from "btf_trace_##call", and then we can see the suffix. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Link: https://lore.kernel.org/r/20240911033719.91468-2-lulie@linux.alibaba.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-09-11i2c: aspeed: Update the stop sw state when the bus recovery occursTommy Huang
When the i2c bus recovery occurs, driver will send i2c stop command in the scl low condition. In this case the sw state will still keep original situation. Under multi-master usage, i2c bus recovery will be called when i2c transfer timeout occurs. Update the stop command calling with aspeed_i2c_do_stop function to update master_state. Fixes: f327c686d3ba ("i2c: aspeed: added driver for Aspeed I2C") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Tommy Huang <tommy_huang@aspeedtech.com> Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
2024-09-11cpufreq/amd-pstate-ut: Fix an "Uninitialized variables" issueQianqiang Liu
Using uninitialized value "mode2" when calling "amd_pstate_get_mode_string". Set "mode2" to "AMD_PSTATE_DISABLE" by default. Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com> Link: https://lore.kernel.org/r/20240910233923.46470-1-qianqiang.liu@163.com Acked-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11selftests: kselftest: Use strerror() on nolibczhang jiao
Nolibc gained an implementation of strerror() recently. Use it and drop the ifndef. Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com> Acked-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-09-11cpufreq/amd-pstate-ut: Add test case for mode switchesMario Limonciello
There is a state machine in the amd-pstate driver utilized for switches for all modes. To make sure that cleanup and setup works properly for each mode add a unit test case that tries all combinations. Reviewed-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11LoongArch: KVM: Add vm migration support for LBT registersBibo Mao
Every vcpu has separate LBT registers. And there are four scr registers, one flags and ftop register for LBT extension. When VM migrates, VMM needs to get LBT registers for every vcpu. Here macro KVM_REG_LOONGARCH_LBT is added for new vcpu lbt register type, the following macro is added to get/put LBT registers. KVM_REG_LOONGARCH_LBT_SCR0 KVM_REG_LOONGARCH_LBT_SCR1 KVM_REG_LOONGARCH_LBT_SCR2 KVM_REG_LOONGARCH_LBT_SCR3 KVM_REG_LOONGARCH_LBT_EFLAGS KVM_REG_LOONGARCH_LBT_FTOP Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: KVM: Add Binary Translation extension supportBibo Mao
Loongson Binary Translation (LBT) is used to accelerate binary translation, which contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags) and x87 fpu stack pointer (ftop). Like FPU extension, here a lazy enabling method is used for LBT. the LBT context is saved/restored on the vcpu context switch path. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: KVM: Add VM feature detection functionBibo Mao
Loongson SIMD Extension (LSX), Loongson Advanced SIMD Extension (LASX) and Loongson Binary Translation (LBT) features are defined in register CPUCFG2. Two kinds of LSX/LASX/LBT feature detection are added here, one is VCPU feature, and the other is VM feature. VCPU feature dection can only work with VCPU thread itself, and requires VCPU thread is created already. So LSX/LASX/LBT feature detection for VM is added also, it can be done even if VM is not created, and also can be done by any threads besides VCPU threads. Here ioctl command KVM_HAS_DEVICE_ATTR is added for VM, and macro KVM_LOONGARCH_VM_FEAT_CTRL is added to check supported feature. And five sub-features relative with LSX/LASX/LBT are added as following: KVM_LOONGARCH_VM_FEAT_LSX KVM_LOONGARCH_VM_FEAT_LASX KVM_LOONGARCH_VM_FEAT_X86BT KVM_LOONGARCH_VM_FEAT_ARMBT KVM_LOONGARCH_VM_FEAT_MIPSBT Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11LoongArch: Revert qspinlock to test-and-set simple lock on VMBibo Mao
Similar with x86, when VM is detected, revert to a simple test-and-set lock to avoid the horrors of queue preemption. Tested on 3C5000 Dual-way machine with 32 cores and 2 numa nodes, test case is kcbench on kernel mainline 6.10, the detailed command is "kcbench --src /root/src/linux" Performance on host machine kernel compile time performance impact Original 150.29 seconds With patch 150.19 seconds almost no impact Performance on virtual machine: 1. 1 VM with 32 vCPUs and 2 numa node, numa node pinned kernel compile time performance impact Original 170.87 seconds With patch 171.73 seconds almost no impact 2. 2 VMs, each VM with 32 vCPUs and 2 numa node, numa node pinned kernel compile time performance impact Original 2362.04 seconds With patch 354.73 seconds +565% Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-09-11cpufreq/amd-pstate: Export symbols for changing modesMario Limonciello
In order to effectively test all mode switch combinations export everything necessarily for amd-pstate-ut to trigger a mode switch. Reviewed-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11amd-pstate: Add missing documentation for `amd_pstate_prefcore_ranking`Mario Limonciello
`amd_pstate_prefcore_ranking` reflects the dynamic rankings of a CPU core based on platform conditions. Explicitly include it in the documentation. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11cpufreq: amd-pstate: Add documentation for `amd_pstate_hw_prefcore`Mario Limonciello
Explain that the sysfs file represents both preferred core being enabled by the user and supported by the hardware. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11cpufreq: amd-pstate: Optimize amd_pstate_update_limits()Mario Limonciello
Don't take and release the mutex when prefcore isn't present and avoid initialization of variables that will be initially set in the function. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Reviewed-by: Perry Yuan <perry.yuan@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into ↵Mario Limonciello
amd_get_boost_ratio_numerator() The special case in amd_pstate_highest_perf_set() is the value used for calculating the boost numerator. Merge this into amd_get_boost_ratio_numerator() and then use that to calculate boost ratio. This allows dropping more special casing of the highest perf value. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()Mario Limonciello
AMD systems that support preferred cores will use "166" as their numerator for max frequency calculations instead of "255". Add a function for detecting preferred cores by looking at the highest perf value on all cores. If preferred cores are enabled return 166 and if disabled the value in the highest perf register. As the function will be called multiple times, cache the values for the boost numerator and if preferred cores will be enabled in global variables. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11x86/amd: Move amd_get_highest_perf() out of amd-pstateMario Limonciello
amd_pstate_get_highest_perf() is a helper used to get the highest perf value on AMD systems. It's used in amd-pstate as part of preferred core handling, but applicable for acpi-cpufreq as well. Move it out to cppc handling code as amd_get_highest_perf(). Reviewed-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warnMario Limonciello
If the boost ratio isn't calculated properly for the system for any reason this can cause other problems that are non-obvious. Raise all messages to warn instead. Suggested-by: Perry Yuan <Perry.Yuan@amd.com> Reviewed-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11ACPI: CPPC: Drop check for non zero perf ratioMario Limonciello
perf_ratio is a u64 and SCHED_CAPACITY_SCALE is a large number. Shifting by one will never have a zero value. Drop the check. Suggested-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.sheoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11x86/amd: Rename amd_get_highest_perf() to amd_get_boost_ratio_numerator()Mario Limonciello
The function name is ambiguous because it returns an intermediate value for calculating maximum frequency rather than the CPPC 'Highest Perf' register. Rename the function to clarify its use and allow the function to return errors. Adjust the consumer in acpi-cpufreq to catch errors. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11ACPI: CPPC: Adjust return code for inline functions in !CONFIG_ACPI_CPPC_LIBMario Limonciello
Checkpath emits the following warning: ``` WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP ``` Adjust the code accordingly. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11x86/amd: Move amd_get_highest_perf() from amd.c to cppc.cMario Limonciello
To prepare to let amd_get_highest_perf() detect preferred cores it will require CPPC functions. Move amd_get_highest_perf() to cppc.c to prepare for 'preferred core detection' rework. No functional changes intended. Reviewed-by: Perry Yuan <perry.yuan@amd.com> Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
2024-09-11ASoC: meson: axg-card: fix 'use-after-free'Arseniy Krasnov
Buffer 'card->dai_link' is reallocated in 'meson_card_reallocate_links()', so move 'pad' pointer initialization after this function when memory is already reallocated. Kasan bug report: ================================================================== BUG: KASAN: slab-use-after-free in axg_card_add_link+0x76c/0x9bc Read of size 8 at addr ffff000000e8b260 by task modprobe/356 CPU: 0 PID: 356 Comm: modprobe Tainted: G O 6.9.12-sdkernel #1 Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x78/0x90 print_report+0xfc/0x5c0 kasan_report+0xb8/0xfc __asan_load8+0x9c/0xb8 axg_card_add_link+0x76c/0x9bc [snd_soc_meson_axg_sound_card] meson_card_probe+0x344/0x3b8 [snd_soc_meson_card_utils] platform_probe+0x8c/0xf4 really_probe+0x110/0x39c __driver_probe_device+0xb8/0x18c driver_probe_device+0x108/0x1d8 __driver_attach+0xd0/0x25c bus_for_each_dev+0xe0/0x154 driver_attach+0x34/0x44 bus_add_driver+0x134/0x294 driver_register+0xa8/0x1e8 __platform_driver_register+0x44/0x54 axg_card_pdrv_init+0x20/0x1000 [snd_soc_meson_axg_sound_card] do_one_initcall+0xdc/0x25c do_init_module+0x10c/0x334 load_module+0x24c4/0x26cc init_module_from_file+0xd4/0x128 __arm64_sys_finit_module+0x1f4/0x41c invoke_syscall+0x60/0x188 el0_svc_common.constprop.0+0x78/0x13c do_el0_svc+0x30/0x40 el0_svc+0x38/0x78 el0t_64_sync_handler+0x100/0x12c el0t_64_sync+0x190/0x194 Fixes: 7864a79f37b5 ("ASoC: meson: add axg sound card support") Cc: Stable@vger.kernel.org Signed-off-by: Arseniy Krasnov <avkrasnov@salutedevices.com> Reviewed-by: Jerome Brunet <jbrunet@baylibre.com> Link: https://patch.msgid.link/20240911142425.598631-1-avkrasnov@salutedevices.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-11ASoC: dt-bindings: microchip,sama7g5-spdifrx: Add common DAI referenceAndrei Simion
Update the spdifrx yaml file to reference the dai-common.yaml schema, enabling the use of the 'sound-name-prefix' property Signed-off-by: Andrei Simion <andrei.simion@microchip.com> Acked-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20240910082202.45972-1-andrei.simion@microchip.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-11ASoC: dt-bindings: renesas,rsnd: add post-init-providers propertyKuninori Morimoto
At least if rsnd is using DPCM connection on Audio-Graph-Card2, fw_devlink might doesn't have enough information to break the cycle (Same problem might occur with Multi-CPU/Codec or Codec2Codec). In such case, rsnd driver will not be probed. Add post-init-providers support to break the link cycle. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/87wmjkifob.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-11Add support for primary mi2s on SM8250Mark Brown
Merge series from Jens Reidel <adrian@travitia.xyz>: This patch adds support for the primary mi2s interface on devices using SM8250 audio drivers. Tested on SM7150 (xiaomi-davinci). SM7150 sound is close to SM8250 and we intend to use it as a fallback in the future. To: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> To: Liam Girdwood <lgirdwood@gmail.com> To: Mark Brown <broonie@kernel.org> To: Jaroslav Kysela <perex@perex.cz> To: Takashi Iwai <tiwai@suse.com> Cc: alsa-devel@alsa-project.org Cc: linux-arm-msm@vger.kernel.org Cc: linux-sound@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux@mainlining.org Jens Reidel (1): ASoC: qcom: sm8250: enable primary mi2s sound/soc/qcom/sm8250.c | 8 ++++++++ 1 file changed, 8 insertions(+) -- 2.46.0
2024-09-11drm/amd/display: Add all planes on CRTC to state for overlay cursorLeo Li
[Why] DC has a special commit path for native cursor, which use the built-in cursor pipe within DCN planes. This update path does not require all enabled planes to be added to the list of surface updates sent to DC. This is not the case for overlay cursor; it uses the same path as MPO commits. This update path requires all enabled planes to be added to the list of surface updates sent to DC. Otherwise, DC will disable planes not inside the list. [How] If overlay cursor is needed, add all planes on the same CRTC as this cursor to the atomic state. This is already done for non-cursor planes (MPO), just before the added lines. Fixes: 1b04dcca4fb1 ("drm/amd/display: Introduce overlay cursor mode") Closes: https://lore.kernel.org/lkml/f68020a3-c413-482d-beb2-5432d98a1d3e@amd.com Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0c8c5bdd7eaf291b6f727e98506fb68acee3a4cc)
2024-09-11regulator: core: fix the broken behavior of regulator_dev_lookup()Wei Fang
The behavior of regulator_dev_lookup() for non-DT way has been broken since the commit b8c325545714 ("regulator: Move OF-specific regulator lookup code to of_regulator.c"). Before the commit, of_get_regulator() was used to get the regulator, which returns NULL if the regulator is not found. So the regulator will be looked up through regulator_lookup_by_name() if no matching regulator is found in regulator_map_list. However, currently, of_regulator_dev_lookup() is used to instead of of_get_regulator(), but the variable 'r' is set to ERR_PTR(-ENODEV) instead of NULL if the regulator is not found. In this case, if no regulator is found in regulator_map_list, the variable 'r' is still ERR_PTR(-ENODEV), So regulator_dev_lookup() returns the value of 'r' directly instead of continuing to look up the regulator through regulator_lookup_by_name(). Fixes: b8c325545714 ("regulator: Move OF-specific regulator lookup code to of_regulator.c") Signed-off-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20240911120338.526384-1-wei.fang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-11ASoC: atmel: mchp-pdmc: Add snd_soc_dai_driver nameCodrin Ciubotariu
Set snd_soc_dai_driver name to improve controller's display of the DAI name. Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Signed-off-by: Andrei Simion <andrei.simion@microchip.com> Link: https://patch.msgid.link/20240911122909.133399-3-andrei.simion@microchip.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-11ASoC: atmel: mchp-pdmc: Improve maxburst calculation for better performanceCodrin Ciubotariu
Improve the DMA descriptor calculation by dividing the period size by the product of sample size and DMA chunk size, rather than just DMA chunk size. Ensure that all DMA descriptors start from a well-aligned address to improve the reliability and efficiency of DMA operations and avoid potential issues related to misaligned descriptors. [andrei.simion@microchip.com: Adjust the commit title. Reword the commit message. Add MACROS for each DMA size chunk supported by mchp-pdmc. Add DMA_BURST_ALIGNED preprocesor function to check the alignment of the DMA burst.] Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com> Signed-off-by: Andrei Simion <andrei.simion@microchip.com> Link: https://patch.msgid.link/20240911122909.133399-2-andrei.simion@microchip.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-09-11drm/amd/display: Add all planes on CRTC to state for overlay cursorLeo Li
[Why] DC has a special commit path for native cursor, which use the built-in cursor pipe within DCN planes. This update path does not require all enabled planes to be added to the list of surface updates sent to DC. This is not the case for overlay cursor; it uses the same path as MPO commits. This update path requires all enabled planes to be added to the list of surface updates sent to DC. Otherwise, DC will disable planes not inside the list. [How] If overlay cursor is needed, add all planes on the same CRTC as this cursor to the atomic state. This is already done for non-cursor planes (MPO), just before the added lines. Fixes: 1b04dcca4fb1 ("drm/amd/display: Introduce overlay cursor mode") Closes: https://lore.kernel.org/lkml/f68020a3-c413-482d-beb2-5432d98a1d3e@amd.com Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Leo Li <sunpeng.li@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-11bpf, cpumap: Move xdp:xdp_cpumap_kthread tracepoint before rcvDaniel Xu
cpumap takes RX processing out of softirq and onto a separate kthread. Since the kthread needs to be scheduled in order to run (versus softirq which does not), we can theoretically experience extra latency if the system is under load and the scheduler is being unfair to us. Moving the tracepoint to before passing the skb list up the stack allows users to more accurately measure enqueue/dequeue latency introduced by cpumap via xdp:xdp_cpumap_enqueue and xdp:xdp_cpumap_kthread tracepoints. f9419f7bd7a5 ("bpf: cpumap add tracepoints") which added the tracepoints states that the intent behind them was for general observability and for a feedback loop to see if the queues are being overwhelmed. This change does not mess with either of those use cases but rather adds a third one. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://lore.kernel.org/bpf/47615d5b5e302e4bd30220473779e98b492d47cd.1725585718.git.dxu@dxuuu.xyz
2024-09-11ALSA: pcm: Fix breakage of PCM rates used for topologyTakashi Iwai
It turned out that the topology ABI takes the standard PCM rate bits as is, and it means that the recent change of the PCM rate bits would lead to the inconsistent rate values used for topology. This patch reverts the original PCM rate bit definitions while adding the new rates to the extended bits instead. This needed the change of snd_pcm_known_rates, too. And this also required to fix the handling in snd_pcm_hw_limit_rates() that blindly assumed that the list is sorted while it became unsorted now. Fixes: 090624b7dc83 ("ALSA: pcm: add more sample rate definitions") Reported-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Closes: https://lore.kernel.org/1ab3efaa-863c-4dd0-8f81-b50fd9775fad@linux.intel.com Reviewed-by: Jaroslav Kysela <perex@perex.cz> Tested-by: Jerome Brunet <jbrunet@baylibre.com> Tested-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://patch.msgid.link/20240911135756.24434-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-09-11selftests/xsk: Read current MAX_SKB_FRAGS from sysctl knobMaciej Fijalkowski
Currently, xskxceiver assumes that MAX_SKB_FRAGS value is always 17 which is not true - since the introduction of BIG TCP this can now take any value between 17 to 45 via CONFIG_MAX_SKB_FRAGS. Adjust the TOO_MANY_FRAGS test case to read the currently configured MAX_SKB_FRAGS value by reading it from /proc/sys/net/core/max_skb_frags. If running system does not provide that sysctl file then let us try running the test with a default value. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/bpf/20240910124129.289874-1-maciej.fijalkowski@intel.com
2024-09-11io_uring/io-wq: inherit cpuset of cgroup in io workerFelix Moessbauer
The io worker threads are userland threads that just never exit to the userland. By that, they are also assigned to a cgroup (the group of the creating task). When creating a new io worker, this worker should inherit the cpuset of the cgroup. Fixes: da64d6db3bd3 ("io_uring: One wqe per wq") Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com> Link: https://lore.kernel.org/r/20240910171157.166423-3-felix.moessbauer@siemens.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-09-11io_uring/io-wq: do not allow pinning outside of cpusetFelix Moessbauer
The io worker threads are userland threads that just never exit to the userland. By that, they are also assigned to a cgroup (the group of the creating task). When changing the affinity of the io_wq thread via syscall, we must only allow cpumasks within the limits defined by the cpuset controller of the cgroup (if enabled). Fixes: da64d6db3bd3 ("io_uring: One wqe per wq") Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com> Link: https://lore.kernel.org/r/20240910171157.166423-2-felix.moessbauer@siemens.com Signed-off-by: Jens Axboe <axboe@kernel.dk>