summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-27wifi: iwlwifi: mld: Work around Clang loop unrolling bugKees Cook
The nested loop in iwl_mld_send_proto_offload() confuses Clang into thinking there could be a final loop iteration past the end of the "nsc" array (which is only 4 entries). The FORTIFY checking in memcmp() (via ipv6_addr_cmp()) notices this (due to the available bytes in the out-of-bounds position of &nsc[4] being 0), and errors out, failing the build. For some reason (likely due to architectural loop unrolling configurations), this is only exposed on ARM builds currently. Due to Clang's lack of inline tracking[1], the warning is not very helpful: include/linux/fortify-string.h:719:4: error: call to '__read_overflow' declared with 'error' attribute: detected read beyond size of object (1st parameter) 719 | __read_overflow(); | ^ 1 error generated. But this was tracked down to iwl_mld_send_proto_offload()'s ipv6_addr_cmp() call. An upstream Clang bug has been filed[2] to track this. For now fix the build by explicitly bounding the inner loop by "n_nsc", which is what "c" is already limited to. Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/2076 Link: https://github.com/llvm/llvm-project/pull/73552 [1] Link: https://github.com/llvm/llvm-project/issues/136603 [2] Link: https://lore.kernel.org/r/20250421204153.work.935-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-05-27dt-bindings: watchdog: fsl-imx-wdt: add compatible string fsl,ls1021a-wdtFrank Li
Add compatible string fsl,ls1021a-wdt for ls1021a SoC. fsl,ls1021a-wdt allow big-endian property. Signed-off-by: Frank Li <Frank.Li@nxp.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250522194732.493624-1-Frank.Li@nxp.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-05-27dt-bindings: pinctrl: amlogic,pinctrl-a4: Add missing constraint on allowed ↵Rob Herring (Arm)
'group' node properties The "^group-[0-9a-z-]+$" nodes schema doesn't constrain the allowed properties as the referenced common schemas don't have constraints. Add the missing "unevaluatedProperties" constraint. Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250507215852.2748420-1-robh@kernel.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-05-27Merge patch series "dropbehind fixes and cleanups"Christian Brauner
Jens Axboe <axboe@kernel.dk> says: As per the thread here: https://lore.kernel.org/linux-fsdevel/20250525083209.GS2023217@ZenIV/ there was an issue with the dropbehind support, and hence it got reverted (effectively) for the 6.15 kernel release. The problem stems from the fact that the folio can get redirtied and/or scheduled for writeback after the initial dropbehind test, and before we have it locked again for invalidation. Patches 1+2 add a generic helper that both the read and write side can use, and which checks for !dirty && !writeback before going ahead with the invalidation. Patch 3 reverts the FOP_DONTCACHE disable, and patches 4 and 5 do a bit of cleanup work to further unify how the read and write side handling works. This can reasonably be considered a 2 part series, as 1-3 fix the issue and could go to stable, while 4-5 just cleanup the code. * patches from https://lore.kernel.org/20250527133255.452431-1-axboe@kernel.dk: mm/filemap: unify dropbehind flag testing and clearing mm/filemap: unify read/write dropbehind naming Revert "Disable FOP_DONTCACHE for now due to bugs" mm/filemap: use filemap_end_dropbehind() for read invalidation mm/filemap: gate dropbehind invalidate on folio !dirty && !writeback Link: https://lore.kernel.org/20250527133255.452431-1-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27mm/filemap: unify dropbehind flag testing and clearingJens Axboe
The read and write side does this a bit differently, unify it such that the _{read,write} helpers check the bit before locking, and the generic handler is in charge of clearing the bit and invalidating, once under the folio lock. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-6-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27mm/filemap: unify read/write dropbehind namingJens Axboe
The read side is filemap_end_dropbehind_read(), while the write side used folio_ as the prefix rather than filemap_. The read side makes more sense, unify the naming such that the write side follows that. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-5-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27Revert "Disable FOP_DONTCACHE for now due to bugs"Jens Axboe
This reverts commit 478ad02d6844217cc7568619aeb0809d93ade43d. Both the read and write side dirty && writeback races should be resolved now, revert the commit that disabled FOP_DONTCACHE for filesystems. Link: https://lore.kernel.org/linux-fsdevel/20250525083209.GS2023217@ZenIV/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-4-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27mm/filemap: use filemap_end_dropbehind() for read invalidationJens Axboe
Use the filemap_end_dropbehind() helper rather than calling folio_unmap_invalidate() directly, as we need to check if the folio has been redirtied or marked for writeback once the folio lock has been re-acquired. Cc: stable@vger.kernel.org Reported-by: Trond Myklebust <trondmy@hammerspace.com> Fixes: 8026e49bff9b ("mm/filemap: add read support for RWF_DONTCACHE") Link: https://lore.kernel.org/linux-fsdevel/ba8a9805331ce258a622feaca266b163db681a10.camel@hammerspace.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-3-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27mm/filemap: gate dropbehind invalidate on folio !dirty && !writebackJens Axboe
It's possible for the folio to either get marked for writeback or redirtied. Add a helper, filemap_end_dropbehind(), which guards the folio_unmap_invalidate() call behind check for the folio being both non-dirty and not under writeback AFTER the folio lock has been acquired. Use this helper folio_end_dropbehind_write(). Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: fb7d3bc41493 ("mm/filemap: drop streaming/uncached pages when writeback completes") Link: https://lore.kernel.org/linux-fsdevel/20250525083209.GS2023217@ZenIV/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-2-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27Merge tag 'nolibc-20250526-for-6.16-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc Pull nolibc updates from Thomas Weißschuh: - New supported architectures: m68k, SPARC (32 and 64 bit) - Compatibility with kselftest_harness.h - A more robust mechanism to include all of nolibc from each header - Split existing features into new headers to simplify adoption - Compatibility with UBSAN and it is used in the testsuite - Many small new features focussing on usage in kselftests * tag 'nolibc-20250526-for-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (83 commits) selftests: harness: Stop using setjmp()/longjmp() selftests: harness: Add "variant" and "self" to test metadata selftests: harness: Add teardown callback to test metadata selftests: harness: Move teardown conditional into test metadata selftests: harness: Don't set setup_completed for fixtureless tests selftests: harness: Implement test timeouts through pidfd selftests: harness: Remove dependency on libatomic selftests: harness: Remove inline qualifier for wrappers selftests: harness: Mark functions without prototypes static selftests: harness: Ignore unused variant argument warning selftests: harness: Use C89 comment style selftests: harness: Add kselftest harness selftest selftests/nolibc: drop include guards around standard headers tools/nolibc: move NULL and offsetof() to sys/stddef.h tools/nolibc: move uname() and friends to sys/utsname.h tools/nolibc: move makedev() and friends to sys/sysmacros.h tools/nolibc: move getrlimit() and friends to sys/resource.h tools/nolibc: move reboot() to sys/reboot.h tools/nolibc: move prctl() to sys/prctl.h tools/nolibc: move mount() to sys/mount.h ...
2025-05-27Merge tag 'docs-6.16' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "A moderately busy cycle for documentation this time around: - The most significant change is the replacement of the old kernel-doc script (a monstrous collection of Perl regexes that predates the Git era) with a Python reimplementation. That, too, is a horrifying collection of regexes, but in a much cleaner and more maintainable structure that integrates far better with the Sphinx build system. This change has been in linux-next for the full 6.15 cycle; the small number of problems that turned up have been addressed, seemingly to everybody's satisfaction. The Perl kernel-doc script remains in tree (as scripts/kernel-doc.pl) and can be used with a command-line option if need be. Unless some reason to keep it around materializes, it will probably go away in 6.17. Credit goes to Mauro Carvalho Chehab for doing all this work. - Some RTLA documentation updates - A handful of Chinese translations - The usual collection of typo fixes, general updates, etc" * tag 'docs-6.16' of git://git.lwn.net/linux: (85 commits) Docs: doc-guide: update sphinx.rst Sphinx version number docs: doc-guide: clarify latest theme usage Documentation/scheduler: Fix typo in sched-stats domain field description scripts: kernel-doc: prevent a KeyError when checking output docs: kerneldoc.py: simplify exception handling logic MAINTAINERS: update linux-doc entry to cover new Python scripts docs: align with scripts/syscall.tbl migration Documentation: NTB: Fix typo Documentation: ioctl-number: Update table intro docs: conf.py: drop backward support for old Sphinx versions Docs: driver-api/basics: add kobject_event interfaces Docs: relay: editing cleanups docs: fix "incase" typo in coresight/panic.rst Fix spelling error for 'parallel' docs: admin-guide: fix typos in reporting-issues.rst docs: dmaengine: add explanation for DMA_ASYNC_TX capability Documentation: leds: improve readibility of multicolor doc docs: fix typo in firmware-related section docs: Makefile: Inherit PYTHONPYCACHEPREFIX setting as env variable Documentation: ioctl-number: Update outdated submission info ...
2025-05-27Merge tag 'lkmm.2025.05.25a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull lkmm updates from Paul McKenney: "Update LKMM documentation: - Cross-references, typos, broken URLs (Akira Yokosawa) - Clarify SRCU explanation (Uladzislau Rezki)" * tag 'lkmm.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: tools/memory-model/Documentation: Fix SRCU section in explanation.txt tools/memory-model: docs/references: Remove broken link to imgtec.com tools/memory-model: docs/ordering: Fix trivial typos tools/memory-model: docs/simple.txt: Fix trivial typos tools/memory-model: docs/README: Update introduction of locking.txt
2025-05-27Merge branch 'bpf-arm64-support-up-to-12-arguments'Alexei Starovoitov
Alexis Lothoré says: ==================== this is the v2 of the many args series for arm64, being itself a revival of Xu Kuhoai's work to enable larger arguments count for BPF programs on ARM64 ([1]). The discussions in v1 shed some light on some issues around specific cases, for example with functions passing struct on stack with custom packing/alignment attributes: those cases can not be properly detected with the current BTF info. So this new revision aims to separate concerns with a simpler implementation, just accepting additional args on stack if we can make sure about the alignment constraints (and so, refusing attachment to functions passing structs on stacks). I then checked if the specific alignment constraints could be checked with larger scalar types rather than structs, but it appears that this use case is in fact rejected at the verifier level (see a9b59159d338 ("bpf: Do not allow btf_ctx_access with __int128 types")). So in the end the specific alignment corner cases raised in [1] can not really happen in the kernel in its current state. This new revision still brings support for the standard cases as a first step, it will then be possible to iterate on top of it to add the more specific cases like struct passed on stack and larger types. [1] https://lore.kernel.org/all/20230917150752.69612-1-xukuohai@huaweicloud.com/#t Changes in v3: - switch back -EOPNOTSUPP to -ENOTSUPP - fix comment style - group intializations for arg_aux - remove some unneeded round_up - Link to v2: https://lore.kernel.org/r/20250522-many_args_arm64-v2-0-d6afdb9cf819@bootlin.com Changes in v2: - remove alignment computation from btf.c - deduce alignment constraints directly in jit compiler for simple types - deny attachment to functions with "corner-cases" arguments (ie: structs on stack) - remove custom tests, as the corresponding use cases are locked either by the JIT comp or the verifier - drop RFC - Link to v1: https://lore.kernel.org/r/20250411-many_args_arm64-v1-0-0a32fe72339e@bootlin.com ==================== Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://patch.msgid.link/20250527-many_args_arm64-v3-0-3faf7bb8e4a2@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: enable many-args tests for arm64Alexis Lothoré (eBPF Foundation)
Now that support for up to 12 args is enabled for tracing programs on ARM64, enable the existing tests for this feature on this architecture. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20250527-many_args_arm64-v3-2-3faf7bb8e4a2@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpf, arm64: Support up to 12 function argumentsXu Kuohai
Currently ARM64 bpf trampoline supports up to 8 function arguments. According to the statistics from commit 473e3150e30a ("bpf, x86: allow function arguments up to 12 for TRACING"), there are about 200 functions accept 9 to 12 arguments, so adding support for up to 12 function arguments. Due to bpf only supporting function arguments up to 16 bytes, according to AAPCS64, starting from the first argument, each argument is first attempted to be loaded to 1 or 2 smallest registers from x0-x7, if there are no enough registers to hold the entire argument, then all remaining arguments starting from this one are pushed to the stack for passing. There are some non-trivial cases for which it is not possible to correctly read arguments from/write arguments to the stack: for example struct variables may have custom packing/alignment attributes that are invisible in BTF info. Such cases are denied for now to make sure not to read incorrect values. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Co-developed-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20250527-many_args_arm64-v3-1-3faf7bb8e4a2@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge tag 'ratelimit.2025.05.25a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull rate-limit updates from Paul McKenney: "lib/ratelimit: Reduce false-positive and silent misses: - Reduce open-coded use of ratelimit_state structure fields. - Convert the ->missed field to atomic_t. - Count misses that are due to lock contention. - Eliminate jiffies=0 special case. - Reduce ___ratelimit() false-positive rate limiting (Petr Mladek). - Allow zero ->burst to hard-disable rate limiting. - Optimize away atomic operations when a miss is guaranteed. - Warn if ->interval or ->burst are negative (Petr Mladek). - Simplify the resulting code. A smoke test and stress test have been created, but they are not yet ready for mainline. With luck, we will offer them for the v6.17 merge window" * tag 'ratelimit.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: ratelimit: Drop redundant accesses to burst ratelimit: Use nolock_ret restructuring to collapse common case code ratelimit: Use nolock_ret label to collapse lock-failure code ratelimit: Use nolock_ret label to save a couple of lines of code ratelimit: Simplify common-case exit path ratelimit: Warn if ->interval or ->burst are negative ratelimit: Avoid atomic decrement under lock if already rate-limited ratelimit: Avoid atomic decrement if already rate-limited ratelimit: Don't flush misses counter if RATELIMIT_MSG_ON_RELEASE ratelimit: Force re-initialization when rate-limiting re-enabled ratelimit: Allow zero ->burst to disable ratelimiting ratelimit: Reduce ___ratelimit() false-positive rate limiting ratelimit: Avoid jiffies=0 special case ratelimit: Count misses due to lock contention ratelimit: Convert the ->missed field to atomic_t drm/amd/pm: Avoid open-coded use of ratelimit_state structure's internals drm/i915: Avoid open-coded use of ratelimit_state structure's ->missed field random: Avoid open-coded use of ratelimit_state structure's ->missed field ratelimit: Create functions to handle ratelimit_state internals
2025-05-27bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem()Hou Tao
bpf_map_lookup_percpu_elem() helper is also available for sleepable bpf program. When BPF JIT is disabled or under 32-bit host, bpf_map_lookup_percpu_elem() will not be inlined. Using it in a sleepable bpf program will trigger the warning in bpf_map_lookup_percpu_elem(), because the bpf program only holds rcu_read_lock_trace lock. Therefore, add the missed check. Reported-by: syzbot+dce5aae19ae4d6399986@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/000000000000176a130617420310@google.com/ Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20250526062534.1105938-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpf: Avoid __bpf_prog_ret0_warn when jit failsKaFai Wan
syzkaller reported an issue: WARNING: CPU: 3 PID: 217 at kernel/bpf/core.c:2357 __bpf_prog_ret0_warn+0xa/0x20 kernel/bpf/core.c:2357 Modules linked in: CPU: 3 UID: 0 PID: 217 Comm: kworker/u32:6 Not tainted 6.15.0-rc4-syzkaller-00040-g8bac8898fe39 RIP: 0010:__bpf_prog_ret0_warn+0xa/0x20 kernel/bpf/core.c:2357 Call Trace: <TASK> bpf_dispatcher_nop_func include/linux/bpf.h:1316 [inline] __bpf_prog_run include/linux/filter.h:718 [inline] bpf_prog_run include/linux/filter.h:725 [inline] cls_bpf_classify+0x74a/0x1110 net/sched/cls_bpf.c:105 ... When creating bpf program, 'fp->jit_requested' depends on bpf_jit_enable. This issue is triggered because of CONFIG_BPF_JIT_ALWAYS_ON is not set and bpf_jit_enable is set to 1, causing the arch to attempt JIT the prog, but jit failed due to FAULT_INJECTION. As a result, incorrectly treats the program as valid, when the program runs it calls `__bpf_prog_ret0_warn` and triggers the WARN_ON_ONCE(1). Reported-by: syzbot+0903f6d7f285e41cdf10@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/6816e34e.a70a0220.254cdc.002c.GAE@google.com Fixes: fa9dd599b4da ("bpf: get rid of pure_initcall dependency to enable jits") Signed-off-by: KaFai Wan <mannkafai@gmail.com> Link: https://lore.kernel.org/r/20250526133358.2594176-1-mannkafai@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpftool: Add support for custom BTF path in prog load/loadallJiayuan Chen
This patch exposes the btf_custom_path feature to bpftool, allowing users to specify a custom BTF file when loading BPF programs using prog load or prog loadall commands. The argument 'btf_custom_path' in libbpf is used for those kernels that don't have CONFIG_DEBUG_INFO_BTF enabled but still want to perform CO-RE relocations. Suggested-by: Quentin Monnet <qmo@kernel.org> Reviewed-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20250516144708.298652-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add unit tests with __bpf_trap() kfuncYonghong Song
Add some inline-asm tests and C tests where __bpf_trap() or __builtin_trap() is used in the code. The __builtin_trap() test is guarded with llvm21 ([1]) since otherwise the compilation failure will happen. [1] https://github.com/llvm/llvm-project/pull/131731 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205331.1291734-1-yonghong.song@linux.dev Tested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge tag 'x86_sev_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull AMD SEV update from Borislav Petkov: "Add a virtual TPM driver glue which allows a guest kernel to talk to a TPM device emulated by a Secure VM Service Module (SVSM) - a helper module of sorts which runs at a different privilege level in the SEV-SNP VM stack. The intent being that a TPM device is emulated by a trusted entity and not by the untrusted host which is the default assumption in the confidential computing scenarios" * tag 'x86_sev_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Register tpm-svsm platform device tpm: Add SNP SVSM vTPM driver svsm: Add header with SVSM_VTPM_CMD helpers x86/sev: Add SVSM vTPM probe/send_command functions
2025-05-27Merge tag 'x86_mtrr_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull mtrr update from Borislav Petkov: "A single change to verify the presence of fixed MTRR ranges before accessing the respective MSRs" * tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mtrr: Check if fixed-range MTRRs exist in mtrr_save_fixed_ranges()
2025-05-27Merge tag 'edac_updates_for_v6.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - ie31200: Add support for Raptor Lake-S and Alder Lake-S compute dies - Rework how RRL registers per channel tracking is done in order to support newer hardware with different RRL configurations and refactor that code. Add support for Granite Rapids server - i10nm: explicitly set RRL modes to fix any wrong BIOS programming - Properly save and restore Retry Read error Log channel configuration info on Intel drivers - igen6: Handle correctly the case of fused off memory controllers on Arizona Beach and Amston Lake SoCs before adding support for them - the usual set of fixes and cleanups * tag 'edac_updates_for_v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/bluefield: Don't use bluefield_edac_readl() result on error EDAC/i10nm: Fix the bitwise operation between variables of different sizes EDAC/ie31200: Add two Intel SoCs for EDAC support EDAC/{skx_common,i10nm}: Add RRL support for Intel Granite Rapids server EDAC/{skx_common,i10nm}: Refactor show_retry_rd_err_log() EDAC/{skx_common,i10nm}: Refactor enable_retry_rd_err_log() EDAC/{skx_common,i10nm}: Structure the per-channel RRL registers EDAC/i10nm: Explicitly set the modes of the RRL register sets EDAC/{skx_common,i10nm}: Fix the loss of saved RRL for HBM pseudo channel 0 EDAC/skx_common: Fix general protection fault EDAC/igen6: Add Intel Amston Lake SoCs support EDAC/igen6: Add Intel Arizona Beach SoCs support EDAC/igen6: Skip absent memory controllers
2025-05-27bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variableYonghong Song
Marc Suñé (Isovalent, part of Cisco) reported an issue where an uninitialized variable caused generating bpf prog binary code not working as expected. The reproducer is in [1] where the flags “-Wall -Werror” are enabled, but there is no warning as the compiler takes advantage of uninitialized variable to do aggressive optimization. The optimized code looks like below: ; { 0: bf 16 00 00 00 00 00 00 r6 = r1 ; bpf_printk("Start"); 1: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll 0000000000000008: R_BPF_64_64 .rodata 3: b4 02 00 00 06 00 00 00 w2 = 0x6 4: 85 00 00 00 06 00 00 00 call 0x6 ; DEFINE_FUNC_CTX_POINTER(data) 5: 61 61 4c 00 00 00 00 00 w1 = *(u32 *)(r6 + 0x4c) ; bpf_printk("pre ipv6_hdrlen_offset"); 6: 18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll 0000000000000030: R_BPF_64_64 .rodata 8: b4 02 00 00 17 00 00 00 w2 = 0x17 9: 85 00 00 00 06 00 00 00 call 0x6 <END> The verifier will report the following failure: 9: (85) call bpf_trace_printk#6 last insn is not an exit or jmp The above verifier log does not give a clear hint about how to fix the problem and user may take quite some time to figure out that the issue is due to compiler taking advantage of uninitialized variable. In llvm internals, uninitialized variable usage may generate 'unreachable' IR insn and these 'unreachable' IR insns may indicate uninitialized variable impact on code optimization. So far, llvm BPF backend ignores 'unreachable' IR hence the above code is generated. With clang21 patch [2], those 'unreachable' IR insn are converted to func __bpf_trap(). In order to maintain proper control flow graph for bpf progs, [2] also adds an 'exit' insn after bpf_trap() if __bpf_trap() is the last insn in the function. The new code looks like: ; { 0: bf 16 00 00 00 00 00 00 r6 = r1 ; bpf_printk("Start"); 1: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll 0000000000000008: R_BPF_64_64 .rodata 3: b4 02 00 00 06 00 00 00 w2 = 0x6 4: 85 00 00 00 06 00 00 00 call 0x6 ; DEFINE_FUNC_CTX_POINTER(data) 5: 61 61 4c 00 00 00 00 00 w1 = *(u32 *)(r6 + 0x4c) ; bpf_printk("pre ipv6_hdrlen_offset"); 6: 18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll 0000000000000030: R_BPF_64_64 .rodata 8: b4 02 00 00 17 00 00 00 w2 = 0x17 9: 85 00 00 00 06 00 00 00 call 0x6 10: 85 10 00 00 ff ff ff ff call -0x1 0000000000000050: R_BPF_64_32 __bpf_trap 11: 95 00 00 00 00 00 00 00 exit <END> In kernel, a new kfunc __bpf_trap() is added. During insn verification, any hit with __bpf_trap() will result in verification failure. The kernel is able to provide better log message for debugging. With llvm patch [2] and without this patch (no __bpf_trap() kfunc for existing kernel), e.g., for old kernels, the verifier outputs 10: <invalid kfunc call> kfunc '__bpf_trap' is referenced but wasn't resolved Basically, kernel does not support __bpf_trap() kfunc. This still didn't give clear signals about possible reason. With llvm patch [2] and with this patch, the verifier outputs 10: (85) call __bpf_trap#74479 unexpected __bpf_trap() due to uninitialized variable? It gives much better hints for verification failure. [1] https://github.com/msune/clang_bpf/blob/main/Makefile#L3 [2] https://github.com/llvm/llvm-project/pull/131731 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205326.1291640-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge tag 'x86_cache_for_v6.16_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Carve out the resctrl filesystem-related code into fs/resctrl/ so that multiple architectures can share the fs API for manipulating their respective hw resource control implementation. This is the second step in the work towards sharing the resctrl filesystem interface, the next one being plugging ARM's MPAM into the aforementioned fs API" * tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) MAINTAINERS: Add reviewers for fs/resctrl x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl x86/resctrl: Always initialise rid field in rdt_resources_all[] x86/resctrl: Relax some asm #includes x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() x86/resctrl: Squelch whitespace anomalies in resctrl core code x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs x86/resctrl: Move enum resctrl_event_id to resctrl.h x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl fs/resctrl: Add boiler plate for external resctrl code x86/resctrl: Add 'resctrl' to the title of the resctrl documentation x86/resctrl: Split trace.h x86/resctrl: Expand the width of domid by replacing mon_data_bits x86/resctrl: Add end-marker to the resctrl_event_id enum x86/resctrl: Move is_mba_sc() out of core.c x86/resctrl: Drop __init/__exit on assorted symbols x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point x86/resctrl: Check all domains are offline in resctrl_exit() x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" ...
2025-05-27bpf: Remove special_kfunc_set from verifierYonghong Song
Currently, the verifier has both special_kfunc_set and special_kfunc_list. When adding a new kfunc usage to the verifier, it is often confusing about whether special_kfunc_set or special_kfunc_list or both should add that kfunc. For example, some kfuncs, e.g., bpf_dynptr_from_skb, bpf_dynptr_clone, bpf_wq_set_callback_impl, does not need to be in special_kfunc_set. To avoid potential future confusion, special_kfunc_set is deleted and btf_id_set_contains(&special_kfunc_set, ...) is removed. The code is refactored with a new func check_special_kfunc(), which contains all codes covered by original branch meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id) There is no functionality change. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205321.1291431-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27Merge branch 'replace-config_dmabuf_sysfs_stats-with-bpf'Alexei Starovoitov
T.J. Mercier says: ==================== Replace CONFIG_DMABUF_SYSFS_STATS with BPF Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to perform per-buffer accounting with debugfs which is not suitable for production environments. Eventually we discovered the overhead with per-buffer sysfs file creation/removal was significantly impacting allocation and free times, and exacerbated kernfs lock contention. [2] dma_buf_stats_setup() is responsible for 39% of single-page buffer creation duration, or 74% of single-page dma_buf_export() duration when stressing dmabuf allocations and frees. I prototyped a change from per-buffer to per-exporter statistics with a RCU protected list of exporter allocations that accommodates most (but not all) of our use-cases and avoids almost all of the sysfs overhead. While that adds less overhead than per-buffer sysfs, and less even than the maintenance of the dmabuf debugfs_list, it's still *additional* overhead on top of the debugfs_list and doesn't give us per-buffer info. This series uses the existing dmabuf debugfs_list to implement a BPF dmabuf iterator, which adds no overhead to buffer allocation/free and provides per-buffer info. The list has been moved outside of CONFIG_DEBUG_FS scope so that it is always populated. The BPF program loaded by userspace that extracts per-buffer information gets to define its own interface which avoids the lack of ABI stability with debugfs. This will allow us to replace our use of CONFIG_DMABUF_SYSFS_STATS, and the plan is to remove it from the kernel after the next longterm stable release. [1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.com [2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com v1: https://lore.kernel.org/all/20250414225227.3642618-1-tjmercier@google.com v1 -> v2: Make the DMA buffer list independent of CONFIG_DEBUG_FS per Christian König Add CONFIG_DMA_SHARED_BUFFER check to kernel/bpf/Makefile per kernel test robot Use BTF_ID_LIST_SINGLE instead of BTF_ID_LIST_GLOBAL_SINGLE per Song Liu Fixup comment style, mixing code/declarations, and use ASSERT_OK_FD in selftest per Song Liu Add BPF_ITER_RESCHED feature to bpf_dmabuf_reg_info per Alexei Starovoitov Add open-coded iterator and selftest per Alexei Starovoitov Add a second test buffer from the system dmabuf heap to selftests Use the BPF program we'll use in production for selftest per Alexei Starovoitov https://r.android.com/c/platform/system/bpfprogs/+/3616123/2/dmabufIter.c https://r.android.com/c/platform/system/memory/libmeminfo/+/3614259/1/libdmabufinfo/dmabuf_bpf_stats.cpp v2: https://lore.kernel.org/all/20250504224149.1033867-1-tjmercier@google.com v2 -> v3: Rebase onto bpf-next/master Move get_next_dmabuf() into drivers/dma-buf/dma-buf.c, along with the new get_first_dmabuf(). This avoids having to expose the dmabuf list and mutex to the rest of the kernel, and keeps the dmabuf mutex operations near each other in the same file. (Christian König) Add Christian's RB to dma-buf: Rename debugfs symbols Drop RFC: dma-buf: Remove DMA-BUF statistics v3: https://lore.kernel.org/all/20250507001036.2278781-1-tjmercier@google.com v3 -> v4: Fix selftest BPF program comment style (not kdoc) per Alexei Starovoitov Fix dma-buf.c kdoc comment style per Alexei Starovoitov Rename get_first_dmabuf / get_next_dmabuf to dma_buf_iter_begin / dma_buf_iter_next per Christian König Add Christian's RB to bpf: Add dmabuf iterator v4: https://lore.kernel.org/all/20250508182025.2961555-1-tjmercier@google.com v4 -> v5: Add Christian's Acks to all patches Add Song Liu's Acks Move BTF_ID_LIST_SINGLE and DEFINE_BPF_ITER_FUNC closer to usage per Song Liu Fix open-coded iterator comment style per Song Liu Move iterator termination check to its own subtest per Song Liu Rework selftest buffer creation per Song Liu Fix spacing in sanitize_string per BPF CI v5: https://lore.kernel.org/all/20250512174036.266796-1-tjmercier@google.com v5 -> v6: Song Liu: Init test buffer FDs to -1 Zero-init udmabuf_create for future proofing Bail early for iterator fd/FILE creation failure Dereference char ptr to check for NUL in sanitize_string() Move map insertion from create_test_buffers() to test_dmabuf_iter() Add ACK to selftests/bpf: Add test for open coded dmabuf_iter v6: https://lore.kernel.org/all/20250513163601.812317-1-tjmercier@google.com v6 -> v7: Zero uninitialized name bytes following the end of name strings per s390x BPF CI Reorder sanitize_string bounds checks per Song Liu Add Song's Ack to: selftests/bpf: Add test for dmabuf_iter Rebase onto bpf-next/master per BPF CI ==================== Link: https://patch.msgid.link/20250522230429.941193-1-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for open coded dmabuf_iterT.J. Mercier
Use the same test buffers as the traditional iterator and a new BPF map to verify the test buffers can be found with the open coded dmabuf iterator. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-6-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27selftests/bpf: Add test for dmabuf_iterT.J. Mercier
This test creates a udmabuf, and a dmabuf from the system dmabuf heap, and uses a BPF program that prints dmabuf metadata with the new dmabuf_iter to verify they can be found. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-5-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpf: Add open coded dmabuf iteratorT.J. Mercier
This open coded iterator allows for more flexibility when creating BPF programs. It can support output in formats other than text. With an open coded iterator, a single BPF program can traverse multiple kernel data structures (now including dmabufs), allowing for more efficient analysis of kernel data compared to multiple reads from procfs, sysfs, or multiple traditional BPF iterator invocations. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-4-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27bpf: Add dmabuf iteratorT.J. Mercier
The dmabuf iterator traverses the list of all DMA buffers. DMA buffers are refcounted through their associated struct file. A reference is taken on each buffer as the list is iterated to ensure each buffer persists for the duration of the bpf program execution without holding the list mutex. Signed-off-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-3-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27dma-buf: Rename debugfs symbolsT.J. Mercier
Rename the debugfs list and mutex so it's clear they are now usable without the need for CONFIG_DEBUG_FS. The list will always be populated to support the creation of a BPF iterator for dmabufs. Signed-off-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-2-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUsMaxim Levitsky
Use kvm_trylock_all_vcpus instead of a custom implementation when locking all vCPUs of a VM. Compile tested only. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Anup Patel <anup@brainfault.org> Tested-by: Anup Patel <anup@brainfault.org> Message-ID: <20250512180407.659015-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUsMaxim Levitsky
Use kvm_trylock_all_vcpus instead of a custom implementation when locking all vCPUs of a VM, to avoid triggering a lockdep warning, in the case in which the VM is configured to have more than MAX_LOCK_DEPTH vCPUs. This fixes the following false lockdep warning: [ 328.171264] BUG: MAX_LOCK_DEPTH too low! [ 328.175227] turning off the locking correctness validator. [ 328.180726] Please attach the output of /proc/lock_stat to the bug report [ 328.187531] depth: 48 max: 48! [ 328.190678] 48 locks held by qemu-kvm/11664: [ 328.194957] #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0 [ 328.204048] #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.212521] #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.220991] #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.229463] #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.237934] #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.246405] #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementationMaxim Levitsky
Use kvm_lock_all_vcpus instead of sev's own implementation. Because kvm_lock_all_vcpus uses the _nest_lock feature of lockdep, which ignores subclasses, there is no longer a need to use separate subclasses for source and target VMs. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27KVM: add kvm_lock_all_vcpus and kvm_trylock_all_vcpusMaxim Levitsky
In a few cases, usually in the initialization code, KVM locks all vCPUs of a VM to ensure that userspace doesn't do funny things while KVM performs an operation that affects the whole VM. Until now, all these operations were implemented using custom code, and all of them share the same problem: Lockdep can't cope with simultaneous locking of a large number of locks of the same class. However if these locks are taken while another lock is already held, which is luckily the case, it is possible to take advantage of little known _nest_lock feature of lockdep which allows in this case to have an unlimited number of locks of same class to be taken. To implement this, create two functions: kvm_lock_all_vcpus() and kvm_trylock_all_vcpus() Both functions are needed because some code that will be replaced in the subsequent patches, uses mutex_trylock, instead of regular mutex_lock. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27locking/mutex: implement mutex_lock_killable_nest_lockMaxim Levitsky
KVM's SEV intra-host migration code needs to lock all vCPUs of the source and the target VM, before it proceeds with the migration. The number of vCPUs that belong to each VM is not bounded by anything except a self-imposed KVM limit of CONFIG_KVM_MAX_NR_VCPUS vCPUs which is significantly larger than the depth of lockdep's lock stack. Luckily, the locks in both of the cases mentioned above, are held under the 'kvm->lock' of each VM, which means that we can use the little known lockdep feature called a "nest_lock" to support this use case in a cleaner way, compared to the way it's currently done. Implement and expose 'mutex_lock_killable_nest_lock' for this purpose. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27locking/mutex: implement mutex_trylock_nestedMaxim Levitsky
Despite the fact that several lockdep-related checks are skipped when calling trylock* versions of the locking primitives, for example mutex_trylock, each time the mutex is acquired, a held_lock is still placed onto the lockdep stack by __lock_acquire() which is called regardless of whether the trylock* or regular locking API was used. This means that if the caller successfully acquires more than MAX_LOCK_DEPTH locks of the same class, even when using mutex_trylock, lockdep will still complain that the maximum depth of the held lock stack has been reached and disable itself. For example, the following error currently occurs in the ARM version of KVM, once the code tries to lock all vCPUs of a VM configured with more than MAX_LOCK_DEPTH vCPUs, a situation that can easily happen on modern systems, where having more than 48 CPUs is common, and it's also common to run VMs that have vCPU counts approaching that number: [ 328.171264] BUG: MAX_LOCK_DEPTH too low! [ 328.175227] turning off the locking correctness validator. [ 328.180726] Please attach the output of /proc/lock_stat to the bug report [ 328.187531] depth: 48 max: 48! [ 328.190678] 48 locks held by qemu-kvm/11664: [ 328.194957] #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0 [ 328.204048] #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.212521] #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.220991] #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.229463] #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.237934] #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.246405] #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 Luckily, in all instances that require locking all vCPUs, the 'kvm->lock' is taken a priori, and that fact makes it possible to use the little known feature of lockdep, called a 'nest_lock', to avoid this warning and subsequent lockdep self-disablement. The action of 'nested lock' being provided to lockdep's lock_acquire(), causes the lockdep to detect that the top of the held lock stack contains a lock of the same class and then increment its reference counter instead of pushing a new held_lock item onto that stack. See __lock_acquire for more information. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27Merge tag 'kvm-x86-svm-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM SVM changes for 6.16: - Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN. - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM. - Add support for ALLOWED_SEV_FEATURES. - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y. - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits. - Don't account temporary allocations in sev_send_update_data(). - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.
2025-05-27Merge tag 'kvm-x86-vmx-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM VMX changes for 6.16: - Explicitly check MSR load/store list counts to fix a potential overflow on 32-bit kernels. - Flush shadow VMCSes on emergency reboot. - Revert mem_enc_ioctl() back to an optional hook, as it's nullified when SEV or TDX is disabled via Kconfig. - Macrofy the handling of vt_x86_ops to eliminate a pile of boilerplate code needed for TDX, and to optimize CONFIG_KVM_INTEL_TDX=n builds.
2025-05-27Merge tag 'kvm-x86-selftests-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM selftests changes for 6.16: - Add support for SNP to the various SEV selftests. - Add a selftest to verify fastops instructions via forced emulation. - Add MGLRU support to the access tracking perf test.
2025-05-27Merge tag 'kvm-x86-pir-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 posted interrupt changes for 6.16: Refine and optimize KVM's software processing of the PIR, and ultimately share PIR harvesting code between KVM and the kernel's Posted MSI handler
2025-05-27Merge tag 'kvm-x86-mmu-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 MMU changes for 6.16: - Refine and harden handling of spurious faults. - Use kvm_x86_call() instead of open coding static_call().
2025-05-27Merge tag 'kvm-x86-misc-6.16' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 misc changes for 6.16: - Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU IBPB, between SVM and VMX. - Advertise support to userspace for WRMSRNS and PREFETCHI. - Rescan I/O APIC routes after handling EOI that needed to be intercepted due to the old/previous routing, but not the new/current routing. - Add a module param to control and enumerate support for device posted interrupts. - Misc cleanups.
2025-05-27KVM: VMX: use __always_inline for is_td_vcpu and is_tdEdward Adam Davis
is_td() and is_td_vcpu() are used in no-instrumentation sections; use __always_inline instead of inline. vmlinux.o: error: objtool: vmx_handle_nmi+0x47: call to is_td_vcpu.isra.0() leaves .noinstr.text section Fixes: 7172c753c26a ("KVM: VMX: Move common fields of struct vcpu_{vmx,tdx} to a struct") Signed-off-by: Edward Adam Davis <eadavis@qq.com> Message-ID: <tencent_1A767567C83C1137829622362E4A72756F09@qq.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27Merge tag 'timers-core-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer core updates from Thomas Gleixner: "Updates for the time/timer core code: - Rework the initialization of the posix-timer kmem_cache and move the cache pointer into the timer_data structure to prevent false sharing - Switch the alarmtimer code to lock guards - Improve the CPU selection criteria in the per CPU validation of the clocksource watchdog to avoid arbitrary selections (or omissions) on systems with a small number of CPUs - The usual cleanups and improvements" * tag 'timers-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/nohz: Remove unused tick_nohz_full_add_cpus_to() clocksource: Fix the CPUs' choice in the watchdog per CPU verification alarmtimer: Switch spin_{lock,unlock}_irqsave() to guards alarmtimer: Remove dead return value in clock2alarm() time/jiffies: Change register_refined_jiffies() to void __init timers: Remove unused __round_jiffies(_up) posix-timers: Initialize cache early and move pointer into __timer_data
2025-05-27Merge tag 'timers-clocksource-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull clocksource updates from Thomas Gleixner: "Updates for clocksource/clockevent drivers: - The final conversion of text formatted device tree binding to schemas - A new driver fot the System Timer Module on S32G NXP SoCs - A new driver fot the Econet HPT timer - The usual improvements and device tree binding updates" * tag 'timers-clocksource-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) clocksource/drivers/renesas-ostm: Unconditionally enable reprobe support dt-bindings: timer: renesas,ostm: Document RZ/V2N (R9A09G056) support dt-bindings: timer: Convert marvell,armada-370-timer to DT schema dt-bindings: timer: Convert ti,keystone-timer to DT schema dt-bindings: timer: Convert st,spear-timer to DT schema dt-bindings: timer: Convert socionext,milbeaut-timer to DT schema dt-bindings: timer: Convert snps,arc-timer to DT schema dt-bindings: timer: Convert snps,archs-rtc to DT schema dt-bindings: timer: Convert snps,archs-gfrc to DT schema dt-bindings: timer: Convert lsi,zevio-timer to DT schema dt-bindings: timer: Convert jcore,pit to DT schema dt-bindings: timer: Convert img,pistachio-gptimer to DT schema dt-bindings: timer: Convert ezchip,nps400-timer to DT schema dt-bindings: timer: Convert cirrus,clps711x-timer to DT schema dt-bindings: timer: Convert altr,timer-1.0 to DT schema dt-bindings: timer: Add ESWIN EIC7700 CLINT clocksource/drivers: Add EcoNet Timer HPT driver dt-bindings: timer: Add EcoNet EN751221 "HPT" CPU Timer dt-bindings: timer: Convert arm,mps2-timer to DT schema dt-bindings: timer: Add Sophgo SG2044 ACLINT timer ...
2025-05-27Merge tag 'timers-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "Another set of timer API cleanups: - Convert init_timer*(), try_to_del_timer_sync() and destroy_timer_on_stack() over to the canonical timer_*() namespace convention. There is another large conversion pending, which has not been included because it would have caused a gazillion of merge conflicts in next. The conversion scripts will be run towards the end of the merge window and a pull request sent once all conflict dependencies have been merged" * tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack() treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try() timers: Rename init_timers() as timers_init() timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA timers: Rename __init_timer_on_stack() as __timer_init_on_stack() timers: Rename __init_timer() as __timer_init() timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack() timers: Rename init_timer_key() as timer_init_key()
2025-05-27Merge tag 'irq-msi-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull MSI updates from Thomas Gleixner: "Updates for the MSI subsystem (core code and PCI): - Switch the MSI descriptor locking to lock guards - Replace a broken and naive implementation of PCI/MSI-X control word updates in the PCI/TPH driver with a properly serialized variant in the PCI/MSI core code. - Remove the MSI descriptor abuse in the SCCI/UFS/QCOM driver by replacing the direct access to the MSI descriptors with the proper API function calls. People will never understand that APIs exist for a reason... - Provide core infrastructre for the upcoming PCI endpoint library extensions. Currently limited to ARM GICv3+, but in theory extensible to other architectures. - Provide a MSI domain::teardown() callback, which allows drivers to undo the effects of the prepare() callback. - Move the MSI domain::prepare() callback invocation to domain creation time to avoid redundant (and in case of ARM/GIC-V3-ITS confusing) invocations on every allocation. In combination with the new teardown callback this removes some ugly hacks in the GIC-V3-ITS driver, which pretended to work around the short comings of the core code so far. With this update the code is correct by design and implementation. - Make the irqchip MSI library globally available, provide a MSI parent domain creation helper and convert a bunch of (PCI/)MSI drivers over to the modern MSI parent mechanism. This is the first step to get rid of at least one incarnation of the three PCI/MSI management schemes. - The usual small cleanups and improvements" * tag 'irq-msi-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) PCI/MSI: Use bool for MSI enable state tracking PCI: tegra: Convert to MSI parent infrastructure PCI: xgene: Convert to MSI parent infrastructure PCI: apple: Convert to MSI parent infrastructure irqchip/msi-lib: Honour the MSI_FLAG_NO_AFFINITY flag irqchip/mvebu: Convert to msi_create_parent_irq_domain() helper irqchip/gic: Convert to msi_create_parent_irq_domain() helper genirq/msi: Add helper for creating MSI-parent irq domains irqchip: Make irq-msi-lib.h globally available irqchip/gic-v3-its: Use allocation size from the prepare call genirq/msi: Engage the .msi_teardown() callback on domain removal genirq/msi: Move prepare() call to per-device allocation irqchip/gic-v3-its: Implement .msi_teardown() callback genirq/msi: Add .msi_teardown() callback as the reverse of .msi_prepare() irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask dt-bindings: PCI: pci-ep: Add support for iommu-map and msi-map irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable() platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all() genirq/msi: Rename msi_[un]lock_descs() ...
2025-05-27Merge tag 'irq-cleanups-2025-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A set of cleanups for the generic interrupt subsystem: - Consolidate on one set of functions for the interrupt domain code to get rid of pointlessly duplicated code with only marginal different semantics. - Update the documentation accordingly and consolidate the coding style of the irqdomain header" * tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) irqdomain: Consolidate coding style irqdomain: Fix kernel-doc and add it to Documentation Documentation: irqdomain: Update it Documentation: irq-domain.rst: Simple improvements Documentation: irq/concepts: Minor improvements Documentation: irq/concepts: Add commas and reflow irqdomain: Improve kernel-docs of functions irqdomain: Make struct irq_domain_info variables const irqdomain: Use irq_domain_instantiate()'s return value as initializers irqdomain: Drop irq_linear_revmap() pinctrl: keembay: Switch to irq_find_mapping() irqchip/armada-370-xp: Switch to irq_find_mapping() gpu: ipu-v3: Switch to irq_find_mapping() gpio: idt3243x: Switch to irq_find_mapping() sh: Switch to irq_find_mapping() powerpc: Switch to irq_find_mapping() irqdomain: Drop irq_domain_add_*() functions powerpc: Switch irq_domain_add_nomap() to use fwnode thermal: Switch to irq_domain_create_linear() soc: Switch to irq_domain_create_*() ...