linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-05-27	dt-bindings: pinctrl: amlogic,pinctrl-a4: Add missing constraint on allowed ↵	Rob Herring (Arm)
	'group' node properties The "^group-[0-9a-z-]+$" nodes schema doesn't constrain the allowed properties as the referenced common schemas don't have constraints. Add the missing "unevaluatedProperties" constraint. Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20250507215852.2748420-1-robh@kernel.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-05-27	Merge patch series "dropbehind fixes and cleanups"	Christian Brauner
	Jens Axboe <axboe@kernel.dk> says: As per the thread here: https://lore.kernel.org/linux-fsdevel/20250525083209.GS2023217@ZenIV/ there was an issue with the dropbehind support, and hence it got reverted (effectively) for the 6.15 kernel release. The problem stems from the fact that the folio can get redirtied and/or scheduled for writeback after the initial dropbehind test, and before we have it locked again for invalidation. Patches 1+2 add a generic helper that both the read and write side can use, and which checks for !dirty && !writeback before going ahead with the invalidation. Patch 3 reverts the FOP_DONTCACHE disable, and patches 4 and 5 do a bit of cleanup work to further unify how the read and write side handling works. This can reasonably be considered a 2 part series, as 1-3 fix the issue and could go to stable, while 4-5 just cleanup the code. * patches from https://lore.kernel.org/20250527133255.452431-1-axboe@kernel.dk: mm/filemap: unify dropbehind flag testing and clearing mm/filemap: unify read/write dropbehind naming Revert "Disable FOP_DONTCACHE for now due to bugs" mm/filemap: use filemap_end_dropbehind() for read invalidation mm/filemap: gate dropbehind invalidate on folio !dirty && !writeback Link: https://lore.kernel.org/20250527133255.452431-1-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27	mm/filemap: unify dropbehind flag testing and clearing	Jens Axboe
	The read and write side does this a bit differently, unify it such that the _{read,write} helpers check the bit before locking, and the generic handler is in charge of clearing the bit and invalidating, once under the folio lock. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-6-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27	mm/filemap: unify read/write dropbehind naming	Jens Axboe
	The read side is filemap_end_dropbehind_read(), while the write side used folio_ as the prefix rather than filemap_. The read side makes more sense, unify the naming such that the write side follows that. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-5-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27	Revert "Disable FOP_DONTCACHE for now due to bugs"	Jens Axboe
	This reverts commit 478ad02d6844217cc7568619aeb0809d93ade43d. Both the read and write side dirty && writeback races should be resolved now, revert the commit that disabled FOP_DONTCACHE for filesystems. Link: https://lore.kernel.org/linux-fsdevel/20250525083209.GS2023217@ZenIV/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-4-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27	mm/filemap: use filemap_end_dropbehind() for read invalidation	Jens Axboe
	Use the filemap_end_dropbehind() helper rather than calling folio_unmap_invalidate() directly, as we need to check if the folio has been redirtied or marked for writeback once the folio lock has been re-acquired. Cc: stable@vger.kernel.org Reported-by: Trond Myklebust <trondmy@hammerspace.com> Fixes: 8026e49bff9b ("mm/filemap: add read support for RWF_DONTCACHE") Link: https://lore.kernel.org/linux-fsdevel/ba8a9805331ce258a622feaca266b163db681a10.camel@hammerspace.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-3-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27	mm/filemap: gate dropbehind invalidate on folio !dirty && !writeback	Jens Axboe
	It's possible for the folio to either get marked for writeback or redirtied. Add a helper, filemap_end_dropbehind(), which guards the folio_unmap_invalidate() call behind check for the folio being both non-dirty and not under writeback AFTER the folio lock has been acquired. Use this helper folio_end_dropbehind_write(). Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: fb7d3bc41493 ("mm/filemap: drop streaming/uncached pages when writeback completes") Link: https://lore.kernel.org/linux-fsdevel/20250525083209.GS2023217@ZenIV/ Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/20250527133255.452431-2-axboe@kernel.dk Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-27	io_uring/zcrx: fix area release on registration failure	Pavel Begunkov
	On area registration failure there might be no ifq set and it's not safe to access area->ifq in the release path without checking it first. Cc: stable@vger.kernel.org Fixes: f12ecf5e1c5ec ("io_uring/zcrx: fix late dma unmap for a dead dev") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/bc02878678a5fec28bc77d33355cdba735418484.1748365640.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-27	Merge tag 'nolibc-20250526-for-6.16-1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc Pull nolibc updates from Thomas Weißschuh: - New supported architectures: m68k, SPARC (32 and 64 bit) - Compatibility with kselftest_harness.h - A more robust mechanism to include all of nolibc from each header - Split existing features into new headers to simplify adoption - Compatibility with UBSAN and it is used in the testsuite - Many small new features focussing on usage in kselftests * tag 'nolibc-20250526-for-6.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (83 commits) selftests: harness: Stop using setjmp()/longjmp() selftests: harness: Add "variant" and "self" to test metadata selftests: harness: Add teardown callback to test metadata selftests: harness: Move teardown conditional into test metadata selftests: harness: Don't set setup_completed for fixtureless tests selftests: harness: Implement test timeouts through pidfd selftests: harness: Remove dependency on libatomic selftests: harness: Remove inline qualifier for wrappers selftests: harness: Mark functions without prototypes static selftests: harness: Ignore unused variant argument warning selftests: harness: Use C89 comment style selftests: harness: Add kselftest harness selftest selftests/nolibc: drop include guards around standard headers tools/nolibc: move NULL and offsetof() to sys/stddef.h tools/nolibc: move uname() and friends to sys/utsname.h tools/nolibc: move makedev() and friends to sys/sysmacros.h tools/nolibc: move getrlimit() and friends to sys/resource.h tools/nolibc: move reboot() to sys/reboot.h tools/nolibc: move prctl() to sys/prctl.h tools/nolibc: move mount() to sys/mount.h ...
2025-05-27	Merge tag 'docs-6.16' of git://git.lwn.net/linux	Linus Torvalds
	Pull documentation updates from Jonathan Corbet: "A moderately busy cycle for documentation this time around: - The most significant change is the replacement of the old kernel-doc script (a monstrous collection of Perl regexes that predates the Git era) with a Python reimplementation. That, too, is a horrifying collection of regexes, but in a much cleaner and more maintainable structure that integrates far better with the Sphinx build system. This change has been in linux-next for the full 6.15 cycle; the small number of problems that turned up have been addressed, seemingly to everybody's satisfaction. The Perl kernel-doc script remains in tree (as scripts/kernel-doc.pl) and can be used with a command-line option if need be. Unless some reason to keep it around materializes, it will probably go away in 6.17. Credit goes to Mauro Carvalho Chehab for doing all this work. - Some RTLA documentation updates - A handful of Chinese translations - The usual collection of typo fixes, general updates, etc" * tag 'docs-6.16' of git://git.lwn.net/linux: (85 commits) Docs: doc-guide: update sphinx.rst Sphinx version number docs: doc-guide: clarify latest theme usage Documentation/scheduler: Fix typo in sched-stats domain field description scripts: kernel-doc: prevent a KeyError when checking output docs: kerneldoc.py: simplify exception handling logic MAINTAINERS: update linux-doc entry to cover new Python scripts docs: align with scripts/syscall.tbl migration Documentation: NTB: Fix typo Documentation: ioctl-number: Update table intro docs: conf.py: drop backward support for old Sphinx versions Docs: driver-api/basics: add kobject_event interfaces Docs: relay: editing cleanups docs: fix "incase" typo in coresight/panic.rst Fix spelling error for 'parallel' docs: admin-guide: fix typos in reporting-issues.rst docs: dmaengine: add explanation for DMA_ASYNC_TX capability Documentation: leds: improve readibility of multicolor doc docs: fix typo in firmware-related section docs: Makefile: Inherit PYTHONPYCACHEPREFIX setting as env variable Documentation: ioctl-number: Update outdated submission info ...
2025-05-27	Merge tag 'lkmm.2025.05.25a' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull lkmm updates from Paul McKenney: "Update LKMM documentation: - Cross-references, typos, broken URLs (Akira Yokosawa) - Clarify SRCU explanation (Uladzislau Rezki)" * tag 'lkmm.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: tools/memory-model/Documentation: Fix SRCU section in explanation.txt tools/memory-model: docs/references: Remove broken link to imgtec.com tools/memory-model: docs/ordering: Fix trivial typos tools/memory-model: docs/simple.txt: Fix trivial typos tools/memory-model: docs/README: Update introduction of locking.txt
2025-05-27	Documentation: rust: testing: add docs on the new KUnit `#[test]` tests	Miguel Ojeda
	There was no documentation yet on the KUnit-based `#[test]`s. Thus add it now. It includes an explanation about the `assert*!` macros being mapped to KUnit and the support for `-> Result` introduced in these series. Reviewed-by: David Gow <davidgow@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250502215133.1923676-8-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-27	Documentation: rust: rename `#[test]`s to "`rusttest` host tests"	Miguel Ojeda
	Now that `rusttest`s are not really used much, clarify the section of the documentation that describes them. In addition, free the section name for the KUnit-based `#[test]`s that will be added afterwards. To do so, rename the section into `rusttest` host tests. Reviewed-by: David Gow <davidgow@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250502215133.1923676-7-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-27	rust: str: take advantage of the `-> Result` support in KUnit `#[test]`'s	Miguel Ojeda
	Since now we have support for returning `-> Result`s, we can convert some of these tests to use the feature, and serve as a first user for it too. Thus convert them, which allows us to remove some `unwrap()`s. We keep the actual assertions we want to make as explicit ones with `assert*!`s. Reviewed-by: David Gow <davidgow@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250502215133.1923676-6-ojeda@kernel.org [ Split the `CString` simplification into a new commit. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-27	rust: str: simplify KUnit tests `format!` macro	Miguel Ojeda
	Simplify the `format!` macro used in the tests by using `CString::try_from_fmt` and directly `unwrap()`ing. This will allow us to change both `unwrap()`s here in order to showcase the `?` operator support now that the tests are KUnit ones. Reviewed-by: David Gow <davidgow@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> [ Split from the next commit as suggested by Tamir. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-27	rust: str: convert `rusttest` tests into KUnit	Miguel Ojeda
	In general, we should aim to test as much as possible within the actual kernel, and not in the build host. Thus convert these `rusttest` tests into KUnit tests. Reviewed-by: David Gow <davidgow@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250502215133.1923676-5-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-27	rust: add `kunit_tests` to the prelude	Miguel Ojeda
	It is convenient to have certain things in the `kernel` prelude, and means kernel developers will find it even easier to start writing tests. And, anyway, nobody should need to use this identifier for anything else. Thus add it to the prelude. Reviewed-by: David Gow <davidgow@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250502215133.1923676-4-ojeda@kernel.org Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-27	rust: kunit: support checked `-> Result`s in KUnit `#[test]`s	Miguel Ojeda
	Currently, return values of KUnit `#[test]` functions are ignored. Thus introduce support for `-> Result` functions by checking their returned values. At the same time, require that test functions return `()` or `Result<T, E>`, which should avoid mistakes, especially with non-`#[must_use]` types. Other types can be supported in the future if needed. With this, a failing test like: #[test] fn my_test() -> Result { f()?; Ok(()) } will output: [ 3.744214] KTAP version 1 [ 3.744287] # Subtest: my_test_suite [ 3.744378] # speed: normal [ 3.744399] 1..1 [ 3.745817] # my_test: ASSERTION FAILED at rust/kernel/lib.rs:321 [ 3.745817] Expected is_test_result_ok(my_test()) to be true, but is false [ 3.747152] # my_test.speed: normal [ 3.747199] not ok 1 my_test [ 3.747345] not ok 4 my_test_suite Reviewed-by: David Gow <davidgow@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250502215133.1923676-3-ojeda@kernel.org [ Used `::kernel` for paths. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-27	rust: kunit: support KUnit-mapped `assert!` macros in `#[test]`s	Miguel Ojeda
	The KUnit `#[test]` support that landed recently is very basic and does not map the `assert!` macros into KUnit like the doctests do, so they panic at the moment. Thus implement the custom mapping in a similar way to doctests, reusing the infrastructure there. In Rust 1.88.0, the `file()` method in `Span` may be stable [1]. However, it was changed recently (from `SourceFile`), so we need to do something different in previous versions. Thus create a helper for it and use it to get the path. With this, a failing test suite like: #[kunit_tests(my_test_suite)] mod tests { use super::; #[test] fn my_first_test() { assert_eq!(42, 43); } #[test] fn my_second_test() { assert!(42 >= 43); } } will properly map back to KUnit, printing something like: [ 1.924325] KTAP version 1 [ 1.924421] # Subtest: my_test_suite [ 1.924506] # speed: normal [ 1.924525] 1..2 [ 1.926385] # my_first_test: ASSERTION FAILED at rust/kernel/lib.rs:251 [ 1.926385] Expected 42 == 43 to be true, but is false [ 1.928026] # my_first_test.speed: normal [ 1.928075] not ok 1 my_first_test [ 1.928723] # my_second_test: ASSERTION FAILED at rust/kernel/lib.rs:256 [ 1.928723] Expected 42 >= 43 to be true, but is false [ 1.929834] # my_second_test.speed: normal [ 1.929868] not ok 2 my_second_test [ 1.930032] # my_test_suite: pass:0 fail:2 skip:0 total:2 [ 1.930153] # Totals: pass:0 fail:2 skip:0 total Link: https://github.com/rust-lang/rust/pull/140514 [1] Reviewed-by: David Gow <davidgow@google.com> Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250502215133.1923676-2-ojeda@kernel.org [ Required `KUNIT=y` like for doctests. Used the `cfg_attr` from the TODO comment and clarified its comment now that the stabilization is in beta and thus quite likely stable in Rust 1.88.0. Simplified the `new_body` code by introducing a new variable. Added `#[allow(clippy::incompatible_msrv)]`. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-05-27	Merge branch 'bpf-arm64-support-up-to-12-arguments'	Alexei Starovoitov
	Alexis Lothoré says: ==================== this is the v2 of the many args series for arm64, being itself a revival of Xu Kuhoai's work to enable larger arguments count for BPF programs on ARM64 ([1]). The discussions in v1 shed some light on some issues around specific cases, for example with functions passing struct on stack with custom packing/alignment attributes: those cases can not be properly detected with the current BTF info. So this new revision aims to separate concerns with a simpler implementation, just accepting additional args on stack if we can make sure about the alignment constraints (and so, refusing attachment to functions passing structs on stacks). I then checked if the specific alignment constraints could be checked with larger scalar types rather than structs, but it appears that this use case is in fact rejected at the verifier level (see a9b59159d338 ("bpf: Do not allow btf_ctx_access with __int128 types")). So in the end the specific alignment corner cases raised in [1] can not really happen in the kernel in its current state. This new revision still brings support for the standard cases as a first step, it will then be possible to iterate on top of it to add the more specific cases like struct passed on stack and larger types. [1] https://lore.kernel.org/all/20230917150752.69612-1-xukuohai@huaweicloud.com/#t Changes in v3: - switch back -EOPNOTSUPP to -ENOTSUPP - fix comment style - group intializations for arg_aux - remove some unneeded round_up - Link to v2: https://lore.kernel.org/r/20250522-many_args_arm64-v2-0-d6afdb9cf819@bootlin.com Changes in v2: - remove alignment computation from btf.c - deduce alignment constraints directly in jit compiler for simple types - deny attachment to functions with "corner-cases" arguments (ie: structs on stack) - remove custom tests, as the corresponding use cases are locked either by the JIT comp or the verifier - drop RFC - Link to v1: https://lore.kernel.org/r/20250411-many_args_arm64-v1-0-0a32fe72339e@bootlin.com ==================== Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://patch.msgid.link/20250527-many_args_arm64-v3-0-3faf7bb8e4a2@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	selftests/bpf: enable many-args tests for arm64	Alexis Lothoré (eBPF Foundation)
	Now that support for up to 12 args is enabled for tracing programs on ARM64, enable the existing tests for this feature on this architecture. Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20250527-many_args_arm64-v3-2-3faf7bb8e4a2@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	bpf, arm64: Support up to 12 function arguments	Xu Kuohai
	Currently ARM64 bpf trampoline supports up to 8 function arguments. According to the statistics from commit 473e3150e30a ("bpf, x86: allow function arguments up to 12 for TRACING"), there are about 200 functions accept 9 to 12 arguments, so adding support for up to 12 function arguments. Due to bpf only supporting function arguments up to 16 bytes, according to AAPCS64, starting from the first argument, each argument is first attempted to be loaded to 1 or 2 smallest registers from x0-x7, if there are no enough registers to hold the entire argument, then all remaining arguments starting from this one are pushed to the stack for passing. There are some non-trivial cases for which it is not possible to correctly read arguments from/write arguments to the stack: for example struct variables may have custom packing/alignment attributes that are invisible in BTF info. Such cases are denied for now to make sure not to read incorrect values. Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Co-developed-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com> Link: https://lore.kernel.org/r/20250527-many_args_arm64-v3-1-3faf7bb8e4a2@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	Merge tag 'ratelimit.2025.05.25a' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull rate-limit updates from Paul McKenney: "lib/ratelimit: Reduce false-positive and silent misses: - Reduce open-coded use of ratelimit_state structure fields. - Convert the ->missed field to atomic_t. - Count misses that are due to lock contention. - Eliminate jiffies=0 special case. - Reduce ___ratelimit() false-positive rate limiting (Petr Mladek). - Allow zero ->burst to hard-disable rate limiting. - Optimize away atomic operations when a miss is guaranteed. - Warn if ->interval or ->burst are negative (Petr Mladek). - Simplify the resulting code. A smoke test and stress test have been created, but they are not yet ready for mainline. With luck, we will offer them for the v6.17 merge window" * tag 'ratelimit.2025.05.25a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: ratelimit: Drop redundant accesses to burst ratelimit: Use nolock_ret restructuring to collapse common case code ratelimit: Use nolock_ret label to collapse lock-failure code ratelimit: Use nolock_ret label to save a couple of lines of code ratelimit: Simplify common-case exit path ratelimit: Warn if ->interval or ->burst are negative ratelimit: Avoid atomic decrement under lock if already rate-limited ratelimit: Avoid atomic decrement if already rate-limited ratelimit: Don't flush misses counter if RATELIMIT_MSG_ON_RELEASE ratelimit: Force re-initialization when rate-limiting re-enabled ratelimit: Allow zero ->burst to disable ratelimiting ratelimit: Reduce ___ratelimit() false-positive rate limiting ratelimit: Avoid jiffies=0 special case ratelimit: Count misses due to lock contention ratelimit: Convert the ->missed field to atomic_t drm/amd/pm: Avoid open-coded use of ratelimit_state structure's internals drm/i915: Avoid open-coded use of ratelimit_state structure's ->missed field random: Avoid open-coded use of ratelimit_state structure's ->missed field ratelimit: Create functions to handle ratelimit_state internals
2025-05-27	bpf: Check rcu_read_lock_trace_held() in bpf_map_lookup_percpu_elem()	Hou Tao
	bpf_map_lookup_percpu_elem() helper is also available for sleepable bpf program. When BPF JIT is disabled or under 32-bit host, bpf_map_lookup_percpu_elem() will not be inlined. Using it in a sleepable bpf program will trigger the warning in bpf_map_lookup_percpu_elem(), because the bpf program only holds rcu_read_lock_trace lock. Therefore, add the missed check. Reported-by: syzbot+dce5aae19ae4d6399986@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/000000000000176a130617420310@google.com/ Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20250526062534.1105938-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	bpf: Avoid __bpf_prog_ret0_warn when jit fails	KaFai Wan
	syzkaller reported an issue: WARNING: CPU: 3 PID: 217 at kernel/bpf/core.c:2357 __bpf_prog_ret0_warn+0xa/0x20 kernel/bpf/core.c:2357 Modules linked in: CPU: 3 UID: 0 PID: 217 Comm: kworker/u32:6 Not tainted 6.15.0-rc4-syzkaller-00040-g8bac8898fe39 RIP: 0010:__bpf_prog_ret0_warn+0xa/0x20 kernel/bpf/core.c:2357 Call Trace: <TASK> bpf_dispatcher_nop_func include/linux/bpf.h:1316 [inline] __bpf_prog_run include/linux/filter.h:718 [inline] bpf_prog_run include/linux/filter.h:725 [inline] cls_bpf_classify+0x74a/0x1110 net/sched/cls_bpf.c:105 ... When creating bpf program, 'fp->jit_requested' depends on bpf_jit_enable. This issue is triggered because of CONFIG_BPF_JIT_ALWAYS_ON is not set and bpf_jit_enable is set to 1, causing the arch to attempt JIT the prog, but jit failed due to FAULT_INJECTION. As a result, incorrectly treats the program as valid, when the program runs it calls `__bpf_prog_ret0_warn` and triggers the WARN_ON_ONCE(1). Reported-by: syzbot+0903f6d7f285e41cdf10@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/6816e34e.a70a0220.254cdc.002c.GAE@google.com Fixes: fa9dd599b4da ("bpf: get rid of pure_initcall dependency to enable jits") Signed-off-by: KaFai Wan <mannkafai@gmail.com> Link: https://lore.kernel.org/r/20250526133358.2594176-1-mannkafai@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	bpftool: Add support for custom BTF path in prog load/loadall	Jiayuan Chen
	This patch exposes the btf_custom_path feature to bpftool, allowing users to specify a custom BTF file when loading BPF programs using prog load or prog loadall commands. The argument 'btf_custom_path' in libbpf is used for those kernels that don't have CONFIG_DEBUG_INFO_BTF enabled but still want to perform CO-RE relocations. Suggested-by: Quentin Monnet <qmo@kernel.org> Reviewed-by: Quentin Monnet <qmo@kernel.org> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20250516144708.298652-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	selftests/bpf: Add unit tests with __bpf_trap() kfunc	Yonghong Song
	Add some inline-asm tests and C tests where __bpf_trap() or __builtin_trap() is used in the code. The __builtin_trap() test is guarded with llvm21 ([1]) since otherwise the compilation failure will happen. [1] https://github.com/llvm/llvm-project/pull/131731 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205331.1291734-1-yonghong.song@linux.dev Tested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	Merge tag 'x86_sev_for_v6.16_rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull AMD SEV update from Borislav Petkov: "Add a virtual TPM driver glue which allows a guest kernel to talk to a TPM device emulated by a Secure VM Service Module (SVSM) - a helper module of sorts which runs at a different privilege level in the SEV-SNP VM stack. The intent being that a TPM device is emulated by a trusted entity and not by the untrusted host which is the default assumption in the confidential computing scenarios" * tag 'x86_sev_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Register tpm-svsm platform device tpm: Add SNP SVSM vTPM driver svsm: Add header with SVSM_VTPM_CMD helpers x86/sev: Add SVSM vTPM probe/send_command functions
2025-05-27	Merge tag 'x86_mtrr_for_v6.16_rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull mtrr update from Borislav Petkov: "A single change to verify the presence of fixed MTRR ranges before accessing the respective MSRs" * tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mtrr: Check if fixed-range MTRRs exist in mtrr_save_fixed_ranges()
2025-05-27	Merge tag 'edac_updates_for_v6.16' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC updates from Borislav Petkov: - ie31200: Add support for Raptor Lake-S and Alder Lake-S compute dies - Rework how RRL registers per channel tracking is done in order to support newer hardware with different RRL configurations and refactor that code. Add support for Granite Rapids server - i10nm: explicitly set RRL modes to fix any wrong BIOS programming - Properly save and restore Retry Read error Log channel configuration info on Intel drivers - igen6: Handle correctly the case of fused off memory controllers on Arizona Beach and Amston Lake SoCs before adding support for them - the usual set of fixes and cleanups * tag 'edac_updates_for_v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/bluefield: Don't use bluefield_edac_readl() result on error EDAC/i10nm: Fix the bitwise operation between variables of different sizes EDAC/ie31200: Add two Intel SoCs for EDAC support EDAC/{skx_common,i10nm}: Add RRL support for Intel Granite Rapids server EDAC/{skx_common,i10nm}: Refactor show_retry_rd_err_log() EDAC/{skx_common,i10nm}: Refactor enable_retry_rd_err_log() EDAC/{skx_common,i10nm}: Structure the per-channel RRL registers EDAC/i10nm: Explicitly set the modes of the RRL register sets EDAC/{skx_common,i10nm}: Fix the loss of saved RRL for HBM pseudo channel 0 EDAC/skx_common: Fix general protection fault EDAC/igen6: Add Intel Amston Lake SoCs support EDAC/igen6: Add Intel Arizona Beach SoCs support EDAC/igen6: Skip absent memory controllers
2025-05-27	bpf: Warn with __bpf_trap() kfunc maybe due to uninitialized variable	Yonghong Song
	Marc Suñé (Isovalent, part of Cisco) reported an issue where an uninitialized variable caused generating bpf prog binary code not working as expected. The reproducer is in [1] where the flags “-Wall -Werror” are enabled, but there is no warning as the compiler takes advantage of uninitialized variable to do aggressive optimization. The optimized code looks like below: ; { 0: bf 16 00 00 00 00 00 00 r6 = r1 ; bpf_printk("Start"); 1: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll 0000000000000008: R_BPF_64_64 .rodata 3: b4 02 00 00 06 00 00 00 w2 = 0x6 4: 85 00 00 00 06 00 00 00 call 0x6 ; DEFINE_FUNC_CTX_POINTER(data) 5: 61 61 4c 00 00 00 00 00 w1 = (u32 )(r6 + 0x4c) ; bpf_printk("pre ipv6_hdrlen_offset"); 6: 18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll 0000000000000030: R_BPF_64_64 .rodata 8: b4 02 00 00 17 00 00 00 w2 = 0x17 9: 85 00 00 00 06 00 00 00 call 0x6 <END> The verifier will report the following failure: 9: (85) call bpf_trace_printk#6 last insn is not an exit or jmp The above verifier log does not give a clear hint about how to fix the problem and user may take quite some time to figure out that the issue is due to compiler taking advantage of uninitialized variable. In llvm internals, uninitialized variable usage may generate 'unreachable' IR insn and these 'unreachable' IR insns may indicate uninitialized variable impact on code optimization. So far, llvm BPF backend ignores 'unreachable' IR hence the above code is generated. With clang21 patch [2], those 'unreachable' IR insn are converted to func __bpf_trap(). In order to maintain proper control flow graph for bpf progs, [2] also adds an 'exit' insn after bpf_trap() if __bpf_trap() is the last insn in the function. The new code looks like: ; { 0: bf 16 00 00 00 00 00 00 r6 = r1 ; bpf_printk("Start"); 1: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll 0000000000000008: R_BPF_64_64 .rodata 3: b4 02 00 00 06 00 00 00 w2 = 0x6 4: 85 00 00 00 06 00 00 00 call 0x6 ; DEFINE_FUNC_CTX_POINTER(data) 5: 61 61 4c 00 00 00 00 00 w1 = (u32 )(r6 + 0x4c) ; bpf_printk("pre ipv6_hdrlen_offset"); 6: 18 01 00 00 06 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x6 ll 0000000000000030: R_BPF_64_64 .rodata 8: b4 02 00 00 17 00 00 00 w2 = 0x17 9: 85 00 00 00 06 00 00 00 call 0x6 10: 85 10 00 00 ff ff ff ff call -0x1 0000000000000050: R_BPF_64_32 __bpf_trap 11: 95 00 00 00 00 00 00 00 exit <END> In kernel, a new kfunc __bpf_trap() is added. During insn verification, any hit with __bpf_trap() will result in verification failure. The kernel is able to provide better log message for debugging. With llvm patch [2] and without this patch (no __bpf_trap() kfunc for existing kernel), e.g., for old kernels, the verifier outputs 10: <invalid kfunc call> kfunc '__bpf_trap' is referenced but wasn't resolved Basically, kernel does not support __bpf_trap() kfunc. This still didn't give clear signals about possible reason. With llvm patch [2] and with this patch, the verifier outputs 10: (85) call __bpf_trap#74479 unexpected __bpf_trap() due to uninitialized variable? It gives much better hints for verification failure. [1] https://github.com/msune/clang_bpf/blob/main/Makefile#L3 [2] https://github.com/llvm/llvm-project/pull/131731 Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205326.1291640-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	Merge tag 'x86_cache_for_v6.16_rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Carve out the resctrl filesystem-related code into fs/resctrl/ so that multiple architectures can share the fs API for manipulating their respective hw resource control implementation. This is the second step in the work towards sharing the resctrl filesystem interface, the next one being plugging ARM's MPAM into the aforementioned fs API" * tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits) MAINTAINERS: Add reviewers for fs/resctrl x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl x86/resctrl: Always initialise rid field in rdt_resources_all[] x86/resctrl: Relax some asm #includes x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context() x86/resctrl: Squelch whitespace anomalies in resctrl core code x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs x86/resctrl: Move enum resctrl_event_id to resctrl.h x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl fs/resctrl: Add boiler plate for external resctrl code x86/resctrl: Add 'resctrl' to the title of the resctrl documentation x86/resctrl: Split trace.h x86/resctrl: Expand the width of domid by replacing mon_data_bits x86/resctrl: Add end-marker to the resctrl_event_id enum x86/resctrl: Move is_mba_sc() out of core.c x86/resctrl: Drop __init/__exit on assorted symbols x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point x86/resctrl: Check all domains are offline in resctrl_exit() x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_" ...
2025-05-27	bpf: Remove special_kfunc_set from verifier	Yonghong Song
	Currently, the verifier has both special_kfunc_set and special_kfunc_list. When adding a new kfunc usage to the verifier, it is often confusing about whether special_kfunc_set or special_kfunc_list or both should add that kfunc. For example, some kfuncs, e.g., bpf_dynptr_from_skb, bpf_dynptr_clone, bpf_wq_set_callback_impl, does not need to be in special_kfunc_set. To avoid potential future confusion, special_kfunc_set is deleted and btf_id_set_contains(&special_kfunc_set, ...) is removed. The code is refactored with a new func check_special_kfunc(), which contains all codes covered by original branch meta.btf == btf_vmlinux && btf_id_set_contains(&special_kfunc_set, meta.func_id) There is no functionality change. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250523205321.1291431-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	Merge branch 'replace-config_dmabuf_sysfs_stats-with-bpf'	Alexei Starovoitov
	T.J. Mercier says: ==================== Replace CONFIG_DMABUF_SYSFS_STATS with BPF Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to perform per-buffer accounting with debugfs which is not suitable for production environments. Eventually we discovered the overhead with per-buffer sysfs file creation/removal was significantly impacting allocation and free times, and exacerbated kernfs lock contention. [2] dma_buf_stats_setup() is responsible for 39% of single-page buffer creation duration, or 74% of single-page dma_buf_export() duration when stressing dmabuf allocations and frees. I prototyped a change from per-buffer to per-exporter statistics with a RCU protected list of exporter allocations that accommodates most (but not all) of our use-cases and avoids almost all of the sysfs overhead. While that adds less overhead than per-buffer sysfs, and less even than the maintenance of the dmabuf debugfs_list, it's still additional overhead on top of the debugfs_list and doesn't give us per-buffer info. This series uses the existing dmabuf debugfs_list to implement a BPF dmabuf iterator, which adds no overhead to buffer allocation/free and provides per-buffer info. The list has been moved outside of CONFIG_DEBUG_FS scope so that it is always populated. The BPF program loaded by userspace that extracts per-buffer information gets to define its own interface which avoids the lack of ABI stability with debugfs. This will allow us to replace our use of CONFIG_DMABUF_SYSFS_STATS, and the plan is to remove it from the kernel after the next longterm stable release. [1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.com [2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com v1: https://lore.kernel.org/all/20250414225227.3642618-1-tjmercier@google.com v1 -> v2: Make the DMA buffer list independent of CONFIG_DEBUG_FS per Christian König Add CONFIG_DMA_SHARED_BUFFER check to kernel/bpf/Makefile per kernel test robot Use BTF_ID_LIST_SINGLE instead of BTF_ID_LIST_GLOBAL_SINGLE per Song Liu Fixup comment style, mixing code/declarations, and use ASSERT_OK_FD in selftest per Song Liu Add BPF_ITER_RESCHED feature to bpf_dmabuf_reg_info per Alexei Starovoitov Add open-coded iterator and selftest per Alexei Starovoitov Add a second test buffer from the system dmabuf heap to selftests Use the BPF program we'll use in production for selftest per Alexei Starovoitov https://r.android.com/c/platform/system/bpfprogs/+/3616123/2/dmabufIter.c https://r.android.com/c/platform/system/memory/libmeminfo/+/3614259/1/libdmabufinfo/dmabuf_bpf_stats.cpp v2: https://lore.kernel.org/all/20250504224149.1033867-1-tjmercier@google.com v2 -> v3: Rebase onto bpf-next/master Move get_next_dmabuf() into drivers/dma-buf/dma-buf.c, along with the new get_first_dmabuf(). This avoids having to expose the dmabuf list and mutex to the rest of the kernel, and keeps the dmabuf mutex operations near each other in the same file. (Christian König) Add Christian's RB to dma-buf: Rename debugfs symbols Drop RFC: dma-buf: Remove DMA-BUF statistics v3: https://lore.kernel.org/all/20250507001036.2278781-1-tjmercier@google.com v3 -> v4: Fix selftest BPF program comment style (not kdoc) per Alexei Starovoitov Fix dma-buf.c kdoc comment style per Alexei Starovoitov Rename get_first_dmabuf / get_next_dmabuf to dma_buf_iter_begin / dma_buf_iter_next per Christian König Add Christian's RB to bpf: Add dmabuf iterator v4: https://lore.kernel.org/all/20250508182025.2961555-1-tjmercier@google.com v4 -> v5: Add Christian's Acks to all patches Add Song Liu's Acks Move BTF_ID_LIST_SINGLE and DEFINE_BPF_ITER_FUNC closer to usage per Song Liu Fix open-coded iterator comment style per Song Liu Move iterator termination check to its own subtest per Song Liu Rework selftest buffer creation per Song Liu Fix spacing in sanitize_string per BPF CI v5: https://lore.kernel.org/all/20250512174036.266796-1-tjmercier@google.com v5 -> v6: Song Liu: Init test buffer FDs to -1 Zero-init udmabuf_create for future proofing Bail early for iterator fd/FILE creation failure Dereference char ptr to check for NUL in sanitize_string() Move map insertion from create_test_buffers() to test_dmabuf_iter() Add ACK to selftests/bpf: Add test for open coded dmabuf_iter v6: https://lore.kernel.org/all/20250513163601.812317-1-tjmercier@google.com v6 -> v7: Zero uninitialized name bytes following the end of name strings per s390x BPF CI Reorder sanitize_string bounds checks per Song Liu Add Song's Ack to: selftests/bpf: Add test for dmabuf_iter Rebase onto bpf-next/master per BPF CI ==================== Link: https://patch.msgid.link/20250522230429.941193-1-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	selftests/bpf: Add test for open coded dmabuf_iter	T.J. Mercier
	Use the same test buffers as the traditional iterator and a new BPF map to verify the test buffers can be found with the open coded dmabuf iterator. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-6-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	selftests/bpf: Add test for dmabuf_iter	T.J. Mercier
	This test creates a udmabuf, and a dmabuf from the system dmabuf heap, and uses a BPF program that prints dmabuf metadata with the new dmabuf_iter to verify they can be found. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-5-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	bpf: Add open coded dmabuf iterator	T.J. Mercier
	This open coded iterator allows for more flexibility when creating BPF programs. It can support output in formats other than text. With an open coded iterator, a single BPF program can traverse multiple kernel data structures (now including dmabufs), allowing for more efficient analysis of kernel data compared to multiple reads from procfs, sysfs, or multiple traditional BPF iterator invocations. Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-4-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	bpf: Add dmabuf iterator	T.J. Mercier
	The dmabuf iterator traverses the list of all DMA buffers. DMA buffers are refcounted through their associated struct file. A reference is taken on each buffer as the list is iterated to ensure each buffer persists for the duration of the bpf program execution without holding the list mutex. Signed-off-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-3-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	dma-buf: Rename debugfs symbols	T.J. Mercier
	Rename the debugfs list and mutex so it's clear they are now usable without the need for CONFIG_DEBUG_FS. The list will always be populated to support the creation of a BPF iterator for dmabufs. Signed-off-by: T.J. Mercier <tjmercier@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250522230429.941193-2-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-05-27	ASoC: codecs: wcd93xx: Few regulator supplies fixes	Mark Brown
	Merge series from Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>: Fix cleanup paths in wcd9335 and wcd937x codec drivers.
2025-05-27	RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs	Maxim Levitsky
	Use kvm_trylock_all_vcpus instead of a custom implementation when locking all vCPUs of a VM. Compile tested only. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Anup Patel <anup@brainfault.org> Tested-by: Anup Patel <anup@brainfault.org> Message-ID: <20250512180407.659015-7-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27	KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs	Maxim Levitsky
	Use kvm_trylock_all_vcpus instead of a custom implementation when locking all vCPUs of a VM, to avoid triggering a lockdep warning, in the case in which the VM is configured to have more than MAX_LOCK_DEPTH vCPUs. This fixes the following false lockdep warning: [ 328.171264] BUG: MAX_LOCK_DEPTH too low! [ 328.175227] turning off the locking correctness validator. [ 328.180726] Please attach the output of /proc/lock_stat to the bug report [ 328.187531] depth: 48 max: 48! [ 328.190678] 48 locks held by qemu-kvm/11664: [ 328.194957] #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0 [ 328.204048] #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.212521] #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.220991] #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.229463] #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.237934] #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.246405] #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-6-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27	x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation	Maxim Levitsky
	Use kvm_lock_all_vcpus instead of sev's own implementation. Because kvm_lock_all_vcpus uses the _nest_lock feature of lockdep, which ignores subclasses, there is no longer a need to use separate subclasses for source and target VMs. No functional change intended. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-5-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27	KVM: add kvm_lock_all_vcpus and kvm_trylock_all_vcpus	Maxim Levitsky
	In a few cases, usually in the initialization code, KVM locks all vCPUs of a VM to ensure that userspace doesn't do funny things while KVM performs an operation that affects the whole VM. Until now, all these operations were implemented using custom code, and all of them share the same problem: Lockdep can't cope with simultaneous locking of a large number of locks of the same class. However if these locks are taken while another lock is already held, which is luckily the case, it is possible to take advantage of little known _nest_lock feature of lockdep which allows in this case to have an unlimited number of locks of same class to be taken. To implement this, create two functions: kvm_lock_all_vcpus() and kvm_trylock_all_vcpus() Both functions are needed because some code that will be replaced in the subsequent patches, uses mutex_trylock, instead of regular mutex_lock. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27	locking/mutex: implement mutex_lock_killable_nest_lock	Maxim Levitsky
	KVM's SEV intra-host migration code needs to lock all vCPUs of the source and the target VM, before it proceeds with the migration. The number of vCPUs that belong to each VM is not bounded by anything except a self-imposed KVM limit of CONFIG_KVM_MAX_NR_VCPUS vCPUs which is significantly larger than the depth of lockdep's lock stack. Luckily, the locks in both of the cases mentioned above, are held under the 'kvm->lock' of each VM, which means that we can use the little known lockdep feature called a "nest_lock" to support this use case in a cleaner way, compared to the way it's currently done. Implement and expose 'mutex_lock_killable_nest_lock' for this purpose. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27	locking/mutex: implement mutex_trylock_nested	Maxim Levitsky
	Despite the fact that several lockdep-related checks are skipped when calling trylock* versions of the locking primitives, for example mutex_trylock, each time the mutex is acquired, a held_lock is still placed onto the lockdep stack by __lock_acquire() which is called regardless of whether the trylock* or regular locking API was used. This means that if the caller successfully acquires more than MAX_LOCK_DEPTH locks of the same class, even when using mutex_trylock, lockdep will still complain that the maximum depth of the held lock stack has been reached and disable itself. For example, the following error currently occurs in the ARM version of KVM, once the code tries to lock all vCPUs of a VM configured with more than MAX_LOCK_DEPTH vCPUs, a situation that can easily happen on modern systems, where having more than 48 CPUs is common, and it's also common to run VMs that have vCPU counts approaching that number: [ 328.171264] BUG: MAX_LOCK_DEPTH too low! [ 328.175227] turning off the locking correctness validator. [ 328.180726] Please attach the output of /proc/lock_stat to the bug report [ 328.187531] depth: 48 max: 48! [ 328.190678] 48 locks held by qemu-kvm/11664: [ 328.194957] #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0 [ 328.204048] #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.212521] #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.220991] #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.229463] #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.237934] #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 [ 328.246405] #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0 Luckily, in all instances that require locking all vCPUs, the 'kvm->lock' is taken a priori, and that fact makes it possible to use the little known feature of lockdep, called a 'nest_lock', to avoid this warning and subsequent lockdep self-disablement. The action of 'nested lock' being provided to lockdep's lock_acquire(), causes the lockdep to detect that the top of the held lock stack contains a lock of the same class and then increment its reference counter instead of pushing a new held_lock item onto that stack. See __lock_acquire for more information. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Message-ID: <20250512180407.659015-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-27	Merge tag 'kvm-x86-svm-6.16' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM SVM changes for 6.16: - Wait for target vCPU to acknowledge KVM_REQ_UPDATE_PROTECTED_GUEST_STATE to fix a race between AP destroy and VMRUN. - Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for the VM. - Add support for ALLOWED_SEV_FEATURES. - Add #VMGEXIT to the set of handlers special cased for CONFIG_RETPOLINE=y. - Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing features that utilize those bits. - Don't account temporary allocations in sev_send_update_data(). - Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock Threshold.
2025-05-27	Merge tag 'kvm-x86-vmx-6.16' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM VMX changes for 6.16: - Explicitly check MSR load/store list counts to fix a potential overflow on 32-bit kernels. - Flush shadow VMCSes on emergency reboot. - Revert mem_enc_ioctl() back to an optional hook, as it's nullified when SEV or TDX is disabled via Kconfig. - Macrofy the handling of vt_x86_ops to eliminate a pile of boilerplate code needed for TDX, and to optimize CONFIG_KVM_INTEL_TDX=n builds.
2025-05-27	Merge tag 'kvm-x86-selftests-6.16' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM selftests changes for 6.16: - Add support for SNP to the various SEV selftests. - Add a selftest to verify fastops instructions via forced emulation. - Add MGLRU support to the access tracking perf test.
2025-05-27	Merge tag 'kvm-x86-pir-6.16' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 posted interrupt changes for 6.16: Refine and optimize KVM's software processing of the PIR, and ultimately share PIR harvesting code between KVM and the kernel's Posted MSI handler