summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-03-31Merge branch 'selftests-drv-net-replace-the-rpath-helper-with-path-objects'Jakub Kicinski
Jakub Kicinski says: ==================== selftests: drv-net: replace the rpath helper with Path objects Trying to change the env.rpath() helper during the development cycle was causing a lot of conflicts between net and net-next. Let's get it converted now that the trees are converged. v2: https://lore.kernel.org/20250306171158.1836674-1-kuba@kernel.org ==================== Link: https://patch.msgid.link/20250327222315.1098596-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31selftests: net: use Path helpers in pingJakub Kicinski
Now that net and net-next have converged we can use the Path helpers in the ping test without conflicts. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250327222315.1098596-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31selftests: net: use the dummy bpf from net/libJakub Kicinski
Commit 29b036be1b0b ("selftests: drv-net: test XDP, HDS auto and the ioctl path") added an sample XDP_PASS prog in net/lib, so that we can reuse it in various sub-directories. Delete the old sample and use the one from the lib in existing tests. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250327222315.1098596-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31selftests: drv-net: replace the rpath helper with Path objectsJakub Kicinski
The single letter + "path" helpers do not have many fans (see Link). Use a Path object with a better name. test_dir is the replacement for rpath(), net_lib_dir is a new path of the $ksft/net/lib directory. The Path() class overloads the "/" operator and can be cast to string automatically, so to get a path to a file tests can do: path = env.test_dir / "binary" Link: https://lore.kernel.org/CA+FuTSemTNVZ5MxXkq8T9P=DYm=nSXcJnL7CJBPZNAT_9UFisQ@mail.gmail.com Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250327222315.1098596-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-31net: lapbether: use netdev_lockdep_set_classes() helperEric Dumazet
drivers/net/wan/lapbether.c uses stacked devices. Like similar drivers, it must use netdev_lockdep_set_classes() to avoid LOCKDEP splats. This is similar to commit 9bfc9d65a1dc ("hamradio: use netdev_lockdep_set_classes() helper") Fixes: 7e4d784f5810 ("net: hold netdev instance lock during rtnetlink operations") Reported-by: syzbot+377b71db585c9c705f8e@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/67cd611c.050a0220.14db68.0073.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250327144439.2463509-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-01rtc: max31335: Add driver support for max31331PavithraUdayakumar-adi
MAX31331 is an ultra-low-power, I2C Real-Time Clock RTC. Signed-off-by: PavithraUdayakumar-adi <pavithra.u@analog.com> Link: https://lore.kernel.org/r/20250217-add_support_max31331_fix_8-v1-2-16ebcfc02336@analog.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-04-01dt-bindings: rtc: max31335: Add max31331 supportPavithraUdayakumar-adi
Added DT compatible string for MAX31331. Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: PavithraUdayakumar-adi <pavithra.u@analog.com> Link: https://lore.kernel.org/r/20250217-add_support_max31331_fix_8-v1-1-16ebcfc02336@analog.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-03-31bcachefs: Fix null ptr deref in bch2_write_endio()Kent Overstreet
This was previously hard to hit since it requires racing with device removal, but splitting up io_ref uncovered it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-31bcachefs: Fix field spanning write warningKent Overstreet
Struct with embedded VLA... memcpy: detected field-spanning write (size 8) of single field "&gc->r.e" at fs/bcachefs/ec.c:465 (size 3) WARNING: CPU: 1 PID: 936 at fs/bcachefs/ec.c:465 bch2_trigger_stripe+0x706/0x730 Modules linked in: CPU: 1 UID: 0 PID: 936 Comm: mount.bcachefs Not tainted 6.14.0-rc6-ktest-00236-gefb0b5c62dbc #55 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:bch2_trigger_stripe+0x706/0x730 Code: b4 00 01 b9 03 00 00 00 48 89 fb 48 c7 c7 33 54 da 81 48 89 d6 49 89 d6 48 c7 c2 c3 36 db 81 e8 60 54 c5 ff 48 89 df 4c 89 f2 <0f> 0b e9 5c fd ff ff e8 fe 5e 4e 00 bf 10 00 00 00 48 c7 c6 ff ff RSP: 0018:ffff88817081f680 EFLAGS: 00010246 RAX: f8fe7dd1c56b5600 RBX: ffff888101265368 RCX: 0000000000000027 RDX: 0000000000000008 RSI: 00000000fffbffff RDI: ffff888101265368 RBP: 0000000000000000 R08: 000000000003ffff R09: ffff88817f1fe000 R10: 00000000000bfffd R11: 0000000000000004 R12: ffff8881012652c0 R13: 0000000000000000 R14: 0000000000000008 R15: ffff88817081f6c9 FS: 00007fc428bc7c80(0000) GS:ffff888179280000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd3ee4a038 CR3: 000000010a9bc000 CR4: 0000000000750eb0 PKRU: 55555554 Call Trace: <TASK> ? __warn+0xce/0x1b0 ? bch2_trigger_stripe+0x706/0x730 ? report_bug+0x11b/0x1a0 ? bch2_trigger_stripe+0x706/0x730 ? handle_bug+0x5e/0x90 ? exc_invalid_op+0x1a/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? bch2_trigger_stripe+0x706/0x730 bch2_gc_mark_key+0x2cf/0x430 bch2_check_allocations+0x1a64/0x1ed0 ? vsnprintf+0x1ad/0x420 ? bch2_check_allocations+0x191f/0x1ed0 bch2_run_recovery_passes+0x13b/0x2b0 bch2_fs_recovery+0x9b7/0x1290 ? __bch2_print+0xb2/0xf0 ? bch2_printbuf_exit+0x1e/0x30 ? print_mount_opts+0x153/0x180 bch2_fs_start+0x274/0x3b0 bch2_fs_get_tree+0x516/0x6e0 vfs_get_tree+0x21/0xa0 do_new_mount+0x153/0x350 __x64_sys_mount+0x16c/0x1f0 do_syscall_64+0x6c/0x140 ? arch_exit_to_user_mode_prepare+0x9/0x40 entry_SYSCALL_64_after_hwframe+0x4b/0x53 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-31bcachefs: Fix striping behaviourKent Overstreet
For striping across devices, we maintain "clocks", and we advance them by the inverse of "how much free space this device has left", so that we round robin biased in favor of devices with more free space. This code was originally trying to do EWMA-ish stuff when originally written, ~10 years ago, and was never properly cleaned up when it was realized that an EWMA is not the right approach here. That left a bug, when we rescale to keep all the clocks in the correct range and prevent overflow. It was assumed that we'd always be allocated from the device with the smallest clock hand, but that's actually not correct: with the target options, allocations will be first tried from a subset of devices, and then the entire filesystem if that fails. Thus, the rescale from the first allocation - allocating from a subset of devices - can pick the wrong rescale value and cause the rest of the clocks to go to 0, losing information. This resuls in incorrect striping behaviour when the desired number of replicas doesn't fit on the foreground target. Link: https://www.reddit.com/r/bcachefs/comments/1jn3t26/replica_allocation_not_evenly_distributed_among/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-31x86: don't re-generate cpufeaturemasks.h so eagerlyLinus Torvalds
It turns out the code to generate the x86 cpufeaturemasks.h header was way too aggressive, and would re-generate it whenever the timestamp on the kernel config file changed. Now, the regular 'make *config' tools are fairly careful to not rewrite the kernel config file unless the contents change, but other usecases aren't that careful. Michael Kelley reports that 'make-kpkg' ends up doing "make syncconfig" multiple times in prepping to build, and will modify the config file in the process (and then modify it back, but by then the timestamps have changed). Jakub Kicinski reports that the netdev CI does something similar in how it generates the config file in multiple steps. In both cases, the config file timestamp updates then cause the cpufeaturemasks.h file to be regenerated, and that in turn then causes lots of unnecessary rebuilds due to all the normal dependencies. Fix it by using our 'filechk' infrastructure in the Makefile to generate the header file. That will only write a new version of the file if the contents of the file have actually changed. Fixes: 841326332bcb ("x86/cpufeatures: Generate the <asm/cpufeaturemasks.h> header based on build config") Reported-by: Michael Kelley <mhklinux@outlook.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/all/SN6PR02MB415756D1829740F6E8AC11D1D4D82@SN6PR02MB4157.namprd02.prod.outlook.com/ Link: https://lore.kernel.org/all/20250328162311.08134fa6@kernel.org/ Cc: Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-31Merge tag 'trace-ringbuffer-v6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt: - Restructure the persistent memory to have a "scratch" area Instead of hard coding the KASLR offset in the persistent memory by the ring buffer, push that work up to the callers of the persistent memory as they are the ones that need this information. The offsets and such is not important to the ring buffer logic and it should not be part of that. A scratch pad is now created when the caller allocates a ring buffer from persistent memory by stating how much memory it needs to save. - Allow where modules are loaded to be saved in the new scratch pad Save the addresses of modules when they are loaded into the persistent memory scratch pad. - A new module_for_each_mod() helper function was created With the acknowledgement of the module maintainers a new module helper function was created to iterate over all the currently loaded modules. This has a callback to be called for each module. This is needed for when tracing is started in the persistent buffer and the currently loaded modules need to be saved in the scratch area. - Expose the last boot information where the kernel and modules were loaded The last_boot_info file is updated to print out the addresses of where the kernel "_text" location was loaded from a previous boot, as well as where the modules are loaded. If the buffer is recording the current boot, it only prints "# Current" so that it does not expose the KASLR offset of the currently running kernel. - Allow the persistent ring buffer to be released (freed) To have this in production environments, where the kernel command line can not be changed easily, the ring buffer needs to be freed when it is not going to be used. The memory for the buffer will always be allocated at boot up, but if the system isn't going to enable tracing, the memory needs to be freed. Allow it to be freed and added back to the kernel memory pool. - Allow stack traces to print the function names in the persistent buffer Now that the modules are saved in the persistent ring buffer, if the same modules are loaded, the printing of the function names will examine the saved modules. If the module is found in the scratch area and is also loaded, then it will do the offset shift and use kallsyms to display the function name. If the address is not found, it simply displays the address from the previous boot in hex. * tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Use _text and the kernel offset in last_boot_info tracing: Show last module text symbols in the stacktrace ring-buffer: Remove the unused variable bmeta tracing: Skip update_last_data() if cleared and remove active check for save_mod() tracing: Initialize scratch_size to zero to prevent UB tracing: Fix a compilation error without CONFIG_MODULES tracing: Freeable reserved ring buffer mm/memblock: Add reserved memory release function tracing: Update modules to persistent instances when loaded tracing: Show module names and addresses of last boot tracing: Have persistent trace instances save module addresses module: Add module_for_each_mod() function tracing: Have persistent trace instances save KASLR offset ring-buffer: Add ring_buffer_meta_scratch() ring-buffer: Add buffer meta data for persistent ring buffer ring-buffer: Use kaslr address instead of text delta ring-buffer: Fix bytes_dropped calculation issue
2025-03-31io_uring/net: avoid import_ubuf for regvec sendPavel Begunkov
With registered buffers we set up iterators in helpers like io_import_fixed(), and there is no need for a import_ubuf() before that. It was fine as we used real pointers for offset calculation, but that's not the case anymore since introduction of ublk kernel buffers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9b2de1a50844f848f62c8de609b494971033a6b9.1743437358.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31io_uring/rsrc: check size when importing reg bufferPavel Begunkov
We're relying on callers to verify the IO size, do it inside of io_import_fixed() instead. It's safer, easier to deal with, and more consistent as now it's done close to the iter init site. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f9c2c75ec4d356a0c61289073f68d98e8a9db190.1743446271.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31Merge tag 'trace-latency-v6.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing documentation fix from Steven Rostedt: "Documentation fix for runtime verifier The runtime verifier documents that were created were not referenced in the indices, which caused warning when building the documentation tree. Those documents are now added to the rv indices" * tag 'trace-latency-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation/rv: Add sched pages to the indices
2025-03-31Merge tag 'perf-tools-for-v6.15-2025-03-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "perf record: - Introduce latency profiling using scheduler information. The latency profiling is to show impacts on wall-time rather than cpu-time. By tracking context switches, it can weight samples and find which part of the code contributed more to the execution latency. The value (period) of the sample is weighted by dividing it by the number of parallel execution at the moment. The parallelism is tracked in perf report with sched-switch records. This will reduce the portion that are run in parallel and in turn increase the portion of serial executions. For now, it's limited to profile processes, IOW system-wide profiling is not supported. You can add --latency option to enable this. $ perf record --latency -- make -C tools/perf I've run the above command for perf build which adds -j option to make with the number of CPUs in the system internally. Normally it'd show something like below: $ perf report -F overhead,comm ... # # Overhead Command # ........ ............... # 78.97% cc1 6.54% python3 4.21% shellcheck 3.28% ld 1.80% as 1.37% cc1plus 0.80% sh 0.62% clang 0.56% gcc 0.44% perl 0.39% make ... The cc1 takes around 80% of the overhead as it's the actual compiler. However it runs in parallel so its contribution to latency may be less than that. Now, perf report will show both overhead and latency (if --latency was given at record time) like below: $ perf report -s comm ... # # Overhead Latency Command # ........ ........ ............... # 78.97% 48.66% cc1 6.54% 25.68% python3 4.21% 0.39% shellcheck 3.28% 13.70% ld 1.80% 2.56% as 1.37% 3.08% cc1plus 0.80% 0.98% sh 0.62% 0.61% clang 0.56% 0.33% gcc 0.44% 1.71% perl 0.39% 0.83% make ... You can see latency of cc1 goes down to around 50% and python3 and ld contribute a lot more than their overhead. You can use --latency option in perf report to get the same result but ordered by latency. $ perf report --latency -s comm perf report: - As a side effect of the latency profiling work, it adds a new output field 'latency' and a sort key 'parallelism'. The below is a result from my system with 64 CPUs. The build was well-parallelized but contained some serial portions. $ perf report -s parallelism ... # # Overhead Latency Parallelism # ........ ........ ........... # 16.95% 1.54% 62 13.38% 1.24% 61 12.50% 70.47% 1 11.81% 1.06% 63 7.59% 0.71% 60 4.33% 12.20% 2 3.41% 0.33% 59 2.05% 0.18% 64 1.75% 1.09% 9 1.64% 1.85% 5 ... - Support Feodra mini-debuginfo which is a LZMA compressed symbol table inside ".gnu_debugdata" ELF section. perf annotate: - Add --code-with-type option to enable data-type profiling with the usual annotate output. Instead of focusing on data structure, it shows code annotation together with data type it accesses in case the instruction refers to a memory location (and it was able to resolve the target data type). Currently it only works with --stdio. $ perf annotate --stdio --code-with-type ... Percent | Source code & Disassembly of vmlinux for cpu/mem-loads,ldlat=30/pp (18 samples, percent: local period) ---------------------------------------------------------------------------------------------------------------------- : 0 0xffffffff81050610 <__fdget>: 0.00 : ffffffff81050610: callq 0xffffffff81c01b80 <__fentry__> # data-type: (stack operation) 0.00 : ffffffff81050615: pushq %rbp # data-type: (stack operation) 0.00 : ffffffff81050616: movq %rsp, %rbp 0.00 : ffffffff81050619: pushq %r15 # data-type: (stack operation) 0.00 : ffffffff8105061b: pushq %r14 # data-type: (stack operation) 0.00 : ffffffff8105061d: pushq %rbx # data-type: (stack operation) 0.00 : ffffffff8105061e: subq $0x10, %rsp 0.00 : ffffffff81050622: movl %edi, %ebx 0.00 : ffffffff81050624: movq %gs:0x7efc4814(%rip), %rax # 0x14e40 <current_task> # data-type: struct task_struct* +0 0.00 : ffffffff8105062c: movq 0x8d0(%rax), %r14 # data-type: struct task_struct +0x8d0 (files) 0.00 : ffffffff81050633: movl (%r14), %eax # data-type: struct files_struct +0 (count.counter) 0.00 : ffffffff81050636: cmpl $0x1, %eax 0.00 : ffffffff81050639: je 0xffffffff810506a9 <__fdget+0x99> 0.00 : ffffffff8105063b: movq 0x20(%r14), %rcx # data-type: struct files_struct +0x20 (fdt) 0.00 : ffffffff8105063f: movl (%rcx), %eax # data-type: struct fdtable +0 (max_fds) 0.00 : ffffffff81050641: cmpl %ebx, %eax 0.00 : ffffffff81050643: jbe 0xffffffff810506ef <__fdget+0xdf> 0.00 : ffffffff81050649: movl %ebx, %r15d 5.56 : ffffffff8105064c: movq 0x8(%rcx), %rdx # data-type: struct fdtable +0x8 (fd) ... The "# data-type:" part was added with this change. The first few entries are not very interesting. But later you can it accesses a couple of fields in the task_struct, files_struct and fdtable. perf trace: - Support syscall tracing for different ABI. For example it can trace system calls for 32-bit applications on 64-bit kernel transparently. - Add --summary-mode=total option to show global syscall summary. The default is 'thread' to show per-thread syscall summary. Python support: - Add more interfaces to 'perf' module to parse events, and config, enable or disable the event list properly so that it can implement basic functionalities purely in Python. There is an example code for these new interfaces in python/tracepoint.py. - Add mypy and pylint support to enable build time checking. Fix some code based on the findings from these tools. Internals: - Introduce io_dir__readdir() API to make directory traveral (usually for proc or sysfs) efficient with less memory footprint. JSON vendor events: - Add events and metrics for ARM Neoverse N3 and V3 - Update events and metrics on various Intel CPUs - Add/update events for a number of SiFive processors" * tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (229 commits) perf bpf-filter: Fix a parsing error with comma perf report: Fix a memory leak for perf_env on AMD perf trace: Fix wrong size to bpf_map__update_elem call perf tools: annotate asm_pure_loop.S perf python: Fix setup.py mypy errors perf test: Address attr.py mypy error perf build: Add pylint build tests perf build: Add mypy build tests perf build: Rename TEST_LOGS to SHELL_TEST_LOGS tools/build: Don't pass test log files to linker perf bench sched pipe: fix enforced blocking reads in worker_thread perf tools: Fix is_compat_mode build break in ppc64 perf build: filter all combinations of -flto for libperl perf vendor events arm64 AmpereOneX: Fix frontend_bound calculation perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by errata perf trace: Fix evlist memory leak perf trace: Fix BTF memory leak perf trace: Make syscall table stable perf syscalltbl: Mask off ABI type for MIPS system calls perf build: Remove Makefile.syscalls ...
2025-03-31ACPI: video: Handle fetching EDID as ACPI_TYPE_PACKAGEGergo Koteles
The _DDC method should return a buffer, or an integer in case of an error. But some Lenovo laptops incorrectly return EDID as buffer in ACPI package. Calling _DDC generates this ACPI Warning: ACPI Warning: \_SB.PCI0.GP17.VGA.LCD._DDC: Return type mismatch - \ found Package, expected Integer/Buffer (20240827/nspredef-254) Use the first element of the package to get the EDID buffer. The DSDT: Name (AUOP, Package (0x01) { Buffer (0x80) { ... } }) ... Method (_DDC, 1, NotSerialized) // _DDC: Display Data Current { If ((PAID == AUID)) { Return (AUOP) /* \_SB_.PCI0.GP17.VGA_.LCD_.AUOP */ } ElseIf ((PAID == IVID)) { Return (IVOP) /* \_SB_.PCI0.GP17.VGA_.LCD_.IVOP */ } ElseIf ((PAID == BOID)) { Return (BOEP) /* \_SB_.PCI0.GP17.VGA_.LCD_.BOEP */ } ElseIf ((PAID == SAID)) { Return (SUNG) /* \_SB_.PCI0.GP17.VGA_.LCD_.SUNG */ } Return (Zero) } Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device Cc: stable@vger.kernel.org Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4085 Signed-off-by: Gergo Koteles <soyer@irl.hu> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/61c3df83ab73aba0bc7a941a443cd7faf4cf7fb0.1743195250.git.soyer@irl.hu Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-31io_uring: cleanup {g,s]etsockopt sqe readingPavel Begunkov
Add a local variable for the sqe pointer to avoid repetition. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8dbac0f9acda2d3842534eeb7ce10d9276b021ae.1743357108.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31io_uring: hide caches sqes from driversPavel Begunkov
There is now an io_uring private part of cmd async_data, move saved sqe into it. Drivers are accessing it via struct io_uring_cmd::cmd. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ecbe078dd57acefdbc4366d083327086c0879378.1743357121.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31io_uring: make zcrx depend on CONFIG_IO_URINGPavel Begunkov
Reflect in the kconfig that zcrx requires io_uring compiled. Fixes: 6f377873cb239 ("io_uring/zcrx: add interface queue and refill queue") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8047135a344e79dbd04ee36a7a69cc260aabc2ca.1743356260.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31io_uring: add req flag invariant build assertionPavel Begunkov
We're caching some of file related request flags in a tricky way, put a build check to make sure flags don't get reshuffled. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9877577b83c25dd78224a8274f799187e7ec7639.1743407551.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31Documentation: ublk: remove dead footnoteJens Axboe
A previous commit removed the use of this footnote, delete it. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 3fdf2ec7da1c ("Documentation: ublk: Drop Stefan Hajnoczi's message footnote") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-31fuse: remove unneeded atomic set in uring creationJoanne Koong
When the ring is allocated, it is kzalloc-ed. ring->queue_refs will already be initialized to 0 by default. It does not need to be atomically set to 0. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31fuse: fix uring race condition for null dereference of fcJoanne Koong
There is a race condition leading to a kernel crash from a null dereference when attemping to access fc->lock in fuse_uring_create_queue(). fc may be NULL in the case where another thread is creating the uring in fuse_uring_create() and has set fc->ring but has not yet set ring->fc when fuse_uring_create_queue() reads ring->fc. There is another race condition as well where in fuse_uring_register(), ring->nr_queues may still be 0 and not yet set to the new value when we compare qid against it. This fix sets fc->ring only after ring->fc and ring->nr_queues have been set, which guarantees now that ring->fc is a proper pointer when any queues are created and ring->nr_queues reflects the right number of queues if ring is not NULL. We must use smp_store_release() and smp_load_acquire() semantics to ensure the ordering will remain correct where fc->ring is assigned only after ring->fc and ring->nr_queues have been assigned. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Fixes: 24fe962c86f5 ("fuse: {io-uring} Handle SQEs - register commands") Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31fuse: Increase FUSE_NAME_MAX to PATH_MAXBernd Schubert
Our file system has a translation capability for S3-to-posix. The current value of 1kiB is enough to cover S3 keys, but does not allow encoding of %xx escape characters. The limit is increased to (PATH_MAX - 1), as we need 3 x 1024 and that is close to PATH_MAX (4kB) already. -1 is used as the terminating null is not included in the length calculation. Testing large file names was hard with libfuse/example file systems, so I created a new memfs that does not have a 255 file name length limitation. https://github.com/libfuse/libfuse/pull/1077 The connection is initialized with FUSE_NAME_LOW_MAX, which is set to the previous value of FUSE_NAME_MAX of 1024. With FUSE_MIN_READ_BUFFER of 8192 that is enough for two file names + fuse headers. When FUSE_INIT reply sets max_pages to a value > 1 we know that fuse daemon supports request buffers of at least 2 pages (+ header) and can therefore hold 2 x PATH_MAX file names - operations like rename or link that need two file names are no issue then. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31fuse: Allocate only namelen buf memory in fuse_notify_Bernd Schubert
fuse_notify_inval_entry and fuse_notify_delete were using fixed allocations of FUSE_NAME_MAX to hold the file name. Often that large buffers are not needed as file names might be smaller, so this uses the actual file name size to do the allocation. Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31fuse: add default_request_timeout and max_request_timeout sysctlsJoanne Koong
Introduce two new sysctls, "default_request_timeout" and "max_request_timeout". These control how long (in seconds) a server can take to reply to a request. If the server does not reply by the timeout, then the connection will be aborted. The upper bound on these sysctl values is 65535. "default_request_timeout" sets the default timeout if no timeout is specified by the fuse server on mount. 0 (default) indicates no default timeout should be enforced. If the server did specify a timeout, then default_request_timeout will be ignored. "max_request_timeout" sets the max amount of time the server may take to reply to a request. 0 (default) indicates no maximum timeout. If max_request_timeout is set and the fuse server attempts to set a timeout greater than max_request_timeout, the system will use max_request_timeout as the timeout. Similarly, if default_request_timeout is greater than max_request_timeout, the system will use max_request_timeout as the timeout. If the server does not request a timeout and default_request_timeout is set to 0 but max_request_timeout is set, then the timeout will be max_request_timeout. Please note that these timeouts are not 100% precise. The request may take roughly an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the set max timeout due to how it's internally implemented. $ sysctl -a | grep fuse.default_request_timeout fs.fuse.default_request_timeout = 0 $ echo 65536 | sudo tee /proc/sys/fs/fuse/default_request_timeout tee: /proc/sys/fs/fuse/default_request_timeout: Invalid argument $ echo 65535 | sudo tee /proc/sys/fs/fuse/default_request_timeout 65535 $ sysctl -a | grep fuse.default_request_timeout fs.fuse.default_request_timeout = 65535 $ echo 0 | sudo tee /proc/sys/fs/fuse/default_request_timeout 0 $ sysctl -a | grep fuse.default_request_timeout fs.fuse.default_request_timeout = 0 [Luis Henriques: Limit the timeout to the range [FUSE_TIMEOUT_TIMER_FREQ, fuse_max_req_timeout]] Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Luis Henriques <luis@igalia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31fuse: add kernel-enforced timeout option for requestsJoanne Koong
There are situations where fuse servers can become unresponsive or stuck, for example if the server is deadlocked. Currently, there's no good way to detect if a server is stuck and needs to be killed manually. This commit adds an option for enforcing a timeout (in seconds) for requests where if the timeout elapses without the server responding to the request, the connection will be automatically aborted. Please note that these timeouts are not 100% precise. For example, the request may take roughly an extra FUSE_TIMEOUT_TIMER_FREQ seconds beyond the requested timeout due to internal implementation, in order to mitigate overhead. [SzM: Bump the API version number] Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31fuse: optmize missing FUSE_LINK supportMiklos Szeredi
If filesystem doesn't support FUSE_LINK (i.e. returns -ENOSYS), then remember this and next time return immediately, without incurring the overhead of a round trip to the server. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31fuse: Return EPERM rather than ENOSYS from link()Matt Johnston
link() is documented to return EPERM when a filesystem doesn't support the operation, return that instead. Link: https://github.com/libfuse/libfuse/issues/925 Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31fuse: removed unused function fuse_uring_create() from headerLuis Henriques
Function fuse_uring_create() is used only from dev_uring.c and does not need to be exposed in the header file. Furthermore, it has the wrong signature. While there, also remove the 'struct fuse_ring' forward declaration. Signed-off-by: Luis Henriques <luis@igalia.com> Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31fuse: {io-uring} Fix a possible req cancellation raceBernd Schubert
task-A (application) might be in request_wait_answer and try to remove the request when it has FR_PENDING set. task-B (a fuse-server io-uring task) might handle this request with FUSE_IO_URING_CMD_COMMIT_AND_FETCH, when fetching the next request and accessed the req from the pending list in fuse_uring_ent_assign_req(). That code path was not protected by fiq->lock and so might race with task-A. For scaling reasons we better don't use fiq->lock, but add a handler to remove canceled requests from the queue. This also removes usage of fiq->lock from fuse_uring_add_req_to_ring_ent() altogether, as it was there just to protect against this race and incomplete. Also added is a comment why FR_PENDING is not cleared. Fixes: c090c8abae4b ("fuse: Add io-uring sqe commit and fetch support") Cc: <stable@vger.kernel.org> # v6.14 Reported-by: Joanne Koong <joannelkoong@gmail.com> Closes: https://lore.kernel.org/all/CAJnrk1ZgHNb78dz-yfNTpxmW7wtT88A=m-zF0ZoLXKLUHRjNTw@mail.gmail.com/ Signed-off-by: Bernd Schubert <bschubert@ddn.com> Reviewed-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2025-03-31net: phy: broadcom: Correct BCM5221 PHY model detectionJim Liu
Correct detect condition is applied to the entire 5221 family of PHYs. Fixes: 3abbd0699b67 ("net: phy: broadcom: add support for BCM5221 phy") Signed-off-by: Jim Liu <jim.t90615@gmail.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-03-31ACPI: processor: idle: Return an error if both P_LVL{2,3} idle states are ↵Giovanni Gherdovich
invalid Prior to commit 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state"), the acpi_idle driver wouldn't load on systems without a valid C-State at least as deep as C2. The behavior was desirable for guests on hypervisors such as VMWare ESXi, which by default don't have the _CST ACPI method, and set the C2 and C3 latencies to 101 and 1001 microseconds respectively via the FADT, to signify they're unsupported. Since the above change though, these virtualized deployments end up loading acpi_idle, and thus entering the default C1 C-State set by acpi_processor_get_power_info_default(); this is undesirable for a system that's communicating to the OS it doesn't want C-States (missing _CST, and invalid C2/C3 in FADT). Make acpi_processor_get_power_info_fadt() return -ENODEV in that case, so that acpi_processor_get_cstate_info() exits early and doesn't set pr->flags.power = 1. Fixes: 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Reviewed-by: Zhang Rui <rui.zhang@intel.com> Link: https://patch.msgid.link/20250328143040.9348-1-ggherdovich@suse.cz [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-03-31i3c: Add NULL pointer check in i3c_master_queue_ibi()Manjunatha Venkatesh
The I3C master driver may receive an IBI from a target device that has not been probed yet. In such cases, the master calls `i3c_master_queue_ibi()` to queue an IBI work task, leading to "Unable to handle kernel read from unreadable memory" and resulting in a kernel panic. Typical IBI handling flow: 1. The I3C master scans target devices and probes their respective drivers. 2. The target device driver calls `i3c_device_request_ibi()` to enable IBI and assigns `dev->ibi = ibi`. 3. The I3C master receives an IBI from the target device and calls `i3c_master_queue_ibi()` to queue the target device driver’s IBI handler task. However, since target device events are asynchronous to the I3C probe sequence, step 3 may occur before step 2, causing `dev->ibi` to be `NULL`, leading to a kernel panic. Add a NULL pointer check in `i3c_master_queue_ibi()` to prevent accessing an uninitialized `dev->ibi`, ensuring stability. Fixes: 3a379bbcea0af ("i3c: Add core I3C infrastructure") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/Z9gjGYudiYyl3bSe@lizhi-Precision-Tower-5810/ Signed-off-by: Manjunatha Venkatesh <manjunatha.venkatesh@nxp.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250326123047.2797946-1-manjunatha.venkatesh@nxp.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-03-31i3c: master: Drop duplicate check before calling OF APIsAndy Shevchenko
OF APIs are usually NULL-aware and returns an error in case when device node is not present or supported. We already have a check for the returned value, no need to check for the parameter. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20250321193044.457649-1-andriy.shevchenko@linux.intel.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2025-03-31scripts: generate_rust_analyzer: fix pin-init name in kernel depsAndrei Lalaev
Because of different crate names ("pin-init" and "pin_init") passed to "append_crate" and "append_crate_with_generated", the script fails with "KeyError: 'pin-init'". To overcome the issue, pass the same name to both functions. Signed-off-by: Andrei Lalaev <andrei.lalaev@anton-paar.com> Link: https://lore.kernel.org/r/AM9PR03MB7074692E5D24C288D2BBC801C8AD2@AM9PR03MB7074.eurprd03.prod.outlook.com Fixes: 4e82c87058f4 ("Merge tag 'rust-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux") [ Made author match the Signed-off-by one. Added newline. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-03-30bcachefs: fix bch2_write_point_to_text() unitsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30Merge tag 'rust-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull Rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Extract the 'pin-init' API from the 'kernel' crate and make it into a standalone crate. In order to do this, the contents are rearranged so that they can easily be kept in sync with the version maintained out-of-tree that other projects have started to use too (or plan to, like QEMU). This will reduce the maintenance burden for Benno, who will now have his own sub-tree, and will simplify future expected changes like the move to use 'syn' to simplify the implementation. - Add '#[test]'-like support based on KUnit. We already had doctests support based on KUnit, which takes the examples in our Rust documentation and runs them under KUnit. Now, we are adding the beginning of the support for "normal" tests, similar to those the '#[test]' tests in userspace Rust. For instance: #[kunit_tests(my_suite)] mod tests { #[test] fn my_test() { assert_eq!(1 + 1, 2); } } Unlike with doctests, the 'assert*!'s do not map to the KUnit assertion APIs yet. - Check Rust signatures at compile time for functions called from C by name. In particular, introduce a new '#[export]' macro that can be placed in the Rust function definition. It will ensure that the function declaration on the C side matches the signature on the Rust function: #[export] pub unsafe extern "C" fn my_function(a: u8, b: i32) -> usize { // ... } The macro essentially forces the compiler to compare the types of the actual Rust function and the 'bindgen'-processed C signature. These cases are rare so far. In the future, we may consider introducing another tool, 'cbindgen', to generate C headers automatically. Even then, having these functions explicitly marked may be a good idea anyway. - Enable the 'raw_ref_op' Rust feature: it is already stable, and allows us to use the new '&raw' syntax, avoiding a couple macros. After everyone has migrated, we will disallow the macros. - Pass the correct target to 'bindgen' on Usermode Linux. - Fix 'rusttest' build in macOS. 'kernel' crate: - New 'hrtimer' module: add support for setting up intrusive timers without allocating when starting the timer. Add support for 'Pin<Box<_>>', 'Arc<_>', 'Pin<&_>' and 'Pin<&mut _>' as pointer types for use with timer callbacks. Add support for setting clock source and timer mode. - New 'dma' module: add a simple DMA coherent allocator abstraction and a test sample driver. - 'list' module: make the linked list 'Cursor' point between elements, rather than at an element, which is more convenient to us and allows for cursors to empty lists; and document it with examples of how to perform common operations with the provided methods. - 'str' module: implement a few traits for 'BStr' as well as the 'strip_prefix()' method. - 'sync' module: add 'Arc::as_ptr'. - 'alloc' module: add 'Box::into_pin'. - 'error' module: extend the 'Result' documentation, including a few examples on different ways of handling errors, a warning about using methods that may panic, and links to external documentation. 'macros' crate: - 'module' macro: add the 'authors' key to support multiple authors. The original key will be kept until everyone has migrated. Documentation: - Add error handling sections. MAINTAINERS: - Add Danilo Krummrich as reviewer of the Rust "subsystem". - Add 'RUST [PIN-INIT]' entry with Benno Lossin as maintainer. It has its own sub-tree. - Add sub-tree for 'RUST [ALLOC]'. - Add 'DMA MAPPING HELPERS DEVICE DRIVER API [RUST]' entry with Abdiel Janulgue as primary maintainer. It will go through the sub-tree of the 'RUST [ALLOC]' entry. - Add 'HIGH-RESOLUTION TIMERS [RUST]' entry with Andreas Hindborg as maintainer. It has its own sub-tree. And a few other cleanups and improvements" * tag 'rust-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (71 commits) rust: dma: add `Send` implementation for `CoherentAllocation` rust: macros: fix `make rusttest` build on macOS rust: block: refactor to use `&raw mut` rust: enable `raw_ref_op` feature rust: uaccess: name the correct function rust: rbtree: fix comments referring to Box instead of KBox rust: hrtimer: add maintainer entry rust: hrtimer: add clocksource selection through `ClockId` rust: hrtimer: add `HrTimerMode` rust: hrtimer: implement `HrTimerPointer` for `Pin<Box<T>>` rust: alloc: add `Box::into_pin` rust: hrtimer: implement `UnsafeHrTimerPointer` for `Pin<&mut T>` rust: hrtimer: implement `UnsafeHrTimerPointer` for `Pin<&T>` rust: hrtimer: add `hrtimer::ScopedHrTimerPointer` rust: hrtimer: add `UnsafeHrTimerPointer` rust: hrtimer: allow timer restart from timer handler rust: str: implement `strip_prefix` for `BStr` rust: str: implement `AsRef<BStr>` for `[u8]` and `BStr` rust: str: implement `Index` for `BStr` rust: str: implement `PartialEq` for `BStr` ...
2025-03-30Merge tag 'modules-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull modules updates from Petr Pavlu: - Use RCU instead of RCU-sched The mix of rcu_read_lock(), rcu_read_lock_sched() and preempt_disable() in the module code and its users has been replaced with just rcu_read_lock() - The rest of changes are smaller fixes and updates * tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (32 commits) MAINTAINERS: Update the MODULE SUPPORT section module: Remove unnecessary size argument when calling strscpy() module: Replace deprecated strncpy() with strscpy() params: Annotate struct module_param_attrs with __counted_by() bug: Use RCU instead RCU-sched to protect module_bug_list. static_call: Use RCU in all users of __module_text_address(). kprobes: Use RCU in all users of __module_text_address(). bpf: Use RCU in all users of __module_text_address(). jump_label: Use RCU in all users of __module_text_address(). jump_label: Use RCU in all users of __module_address(). x86: Use RCU in all users of __module_address(). cfi: Use RCU while invoking __module_address(). powerpc/ftrace: Use RCU in all users of __module_text_address(). LoongArch: ftrace: Use RCU in all users of __module_text_address(). LoongArch/orc: Use RCU in all users of __module_address(). arm64: module: Use RCU in all users of __module_text_address(). ARM: module: Use RCU in all users of __module_text_address(). module: Use RCU in all users of __module_text_address(). module: Use RCU in all users of __module_address(). module: Use RCU in search_module_extables(). ...
2025-03-30Merge tag 'x86-urgent-2025-03-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes and updates from Ingo Molnar: - Fix a large number of x86 Kconfig dependency and help text accuracy bugs/problems, by Mateusz Jończyk and David Heideberg - Fix a VM_PAT interaction with fork() crash. This also touches core kernel code - Fix an ORC unwinder bug for interrupt entries - Fixes and cleanups - Fix an AMD microcode loader bug that can promote verification failures into success - Add early-printk support for MMIO based UARTs on an x86 board that had no other serial debugging facility and also experienced early boot crashes * tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Fix __apply_microcode_amd()'s return value x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range() x86/fpu: Update the outdated comment above fpstate_init_user() x86/early_printk: Add support for MMIO-based UARTs x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1 x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text x86/Kconfig: Correct X86_X2APIC help text x86/speculation: Remove the extra #ifdef around CALL_NOSPEC x86/Kconfig: Document release year of glibc 2.3.3 x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32 x86/Kconfig: Document CONFIG_PCI_MMCONFIG x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE x86/Kconfig: Enable X86_X2APIC by default and improve help text
2025-03-30bcachefs: Log original key being moved in data updatesKent Overstreet
There's something going on with the data move path; log the original key being moved for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30bcachefs: BCH_JSET_ENTRY_log_bkeyKent Overstreet
Add a journal entry type for logging - but logging a bkey, not a string; to be used for data move path debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30Merge tag 'locking-urgent-2025-03-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc locking fixes and updates from Ingo Molnar: - Fix a locking self-test FAIL on PREEMPT_RT kernels - Fix nr_unused_locks accounting bug - Simplify the split-lock debugging feature's fast-path * tag 'locking-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class() lockdep: Fix wait context check on softirq for PREEMPT_RT x86/split_lock: Simplify reenabling
2025-03-30Merge tag 'bpf_try_alloc_pages' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf try_alloc_pages() support from Alexei Starovoitov: "The pull includes work from Sebastian, Vlastimil and myself with a lot of help from Michal and Shakeel. This is a first step towards making kmalloc reentrant to get rid of slab wrappers: bpf_mem_alloc, kretprobe's objpool, etc. These patches make page allocator safe from any context. Vlastimil kicked off this effort at LSFMM 2024: https://lwn.net/Articles/974138/ and we continued at LSFMM 2025: https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@mail.gmail.com/ Why: SLAB wrappers bind memory to a particular subsystem making it unavailable to the rest of the kernel. Some BPF maps in production consume Gbytes of preallocated memory. Top 5 in Meta: 1.5G, 1.2G, 1.1G, 300M, 200M. Once we have kmalloc that works in any context BPF map preallocation won't be necessary. How: Synchronous kmalloc/page alloc stack has multiple stages going from fast to slow: cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages -> rmqueue_pcplist -> __rmqueue, where rmqueue_pcplist was already relying on trylock. This set changes rmqueue_bulk/rmqueue_buddy to attempt a trylock and return ENOMEM if alloc_flags & ALLOC_TRYLOCK. It then wraps this functionality into try_alloc_pages() helper. We make sure that the logic is sane in PREEMPT_RT. End result: try_alloc_pages()/free_pages_nolock() are safe to call from any context. try_kmalloc() for any context with similar trylock approach will follow. It will use try_alloc_pages() when slab needs a new page. Though such try_kmalloc/page_alloc() is an opportunistic allocator, this design ensures that the probability of successful allocation of small objects (up to one page in size) is high. Even before we have try_kmalloc(), we already use try_alloc_pages() in BPF arena implementation and it's going to be used more extensively in BPF" * tag 'bpf_try_alloc_pages' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: mm: Fix the flipped condition in gfpflags_allow_spinning() bpf: Use try_alloc_pages() to allocate pages for bpf needs. mm, bpf: Use memcg in try_alloc_pages(). memcg: Use trylock to access memcg stock_lock. mm, bpf: Introduce free_pages_nolock() mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation locking/local_lock: Introduce localtry_lock_t
2025-03-30bcachefs: Reorder error messages that include journal debugKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30bcachefs: Don't use designated initializers for disk_accounting_posKent Overstreet
Not all compilers fully initialize these - they're not guaranteed to because of the union shenanigans. Fixes: https://github.com/koverstreet/bcachefs/issues/844 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30bcachefs: Silence errors after emergency shutdownKent Overstreet
We don't care about errors from asynchronous ops that were because we did an emergency shutdown; silence them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30bcachefs: fix units in rebalance_statusKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-03-30bcachefs: bch2_ioctl_subvolume_destroy() fixesKent Overstreet
bch2_evict_subvolume_inodes() was getting stuck - due to incorrectly pruning the dcache. Also, fix missing permissions checks. Reported-by: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>