summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-13rust: error: import `kernel`'s `LayoutError` instead of `core`'sJimmy Ostler
Import the internal (`kernel::alloc`) version of `LayoutError` instead of the `core::alloc` one. In particular, this results in switching the type in the existing `From<LayoutError> for Error` implementation. Acked-by: Danilo Krummrich <dakr@kernel.org> Signed-off-by: Jimmy Ostler <jtostler1@gmail.com> Link: https://lore.kernel.org/r/fe58a02189e8804a9eabdd01cb1927d4c491d79c.1734674670.git.jtostler1@gmail.com [ Reworded commit. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-01-13rust: str: replace unwraps with question mark operatorsDaniel Sedlak
Simplify the error handling by replacing unwraps with the question mark operator. Furthermore, unwraps can convey a wrong impression that unwrapping is fine in general, thus this patch removes this unwrapping. Suggested-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/rust-for-linux/CANiq72nsK1D4NuQ1U7NqMWoYjXkqQSj4QuUEL98OmFbq022Z9A@mail.gmail.com/ Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Daniel Sedlak <daniel@sedlak.dev> Link: https://lore.kernel.org/r/20241123095033.41240-5-daniel@sedlak.dev [ Slightly reworded commit. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-01-13rust: page: remove unnecessary helper function from doctestDaniel Sedlak
Doctests in `page.rs` contained a helper function `dox` which acted as a wrapper for using the `?` operator. However, this is not needed because doctests are implicitly wrapped in function see [1]. Link: https://doc.rust-lang.org/rustdoc/write-documentation/documentation-tests.html#using--in-doc-tests [1] Suggested-by: Dirk Behme <dirk.behme@de.bosch.com> Suggested-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/rust-for-linux/459782fe-afca-4fe6-8ffb-ba7c7886de0a@de.bosch.com/ Reviewed-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Tamir Duberstein <tamird@gmail.com> Signed-off-by: Daniel Sedlak <daniel@sedlak.dev> Link: https://lore.kernel.org/r/20241123095033.41240-4-daniel@sedlak.dev [ Fixed typo in SoB. Slightly reworded commit. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-01-13rust: rbtree: remove unwrap in assertsDaniel Sedlak
Remove `unwrap` in asserts and replace it with `Option::Some` matching. By doing it this way, the examples are more descriptive, so it disambiguates the return type of the `get(...)` and `next(...)`, because the `unwrap(...)` can also be called on `Result`. Signed-off-by: Daniel Sedlak <daniel@sedlak.dev> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20241123095033.41240-3-daniel@sedlak.dev [ Reworded title slightly. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-01-13rust: init: replace unwraps with question mark operatorsDaniel Sedlak
Use `?` operator in the doctests. Since it is in the examples, using unwraps can convey a wrong impression that unwrapping is fine in general, thus this patch removes this unwrapping. Suggested-by: Miguel Ojeda <ojeda@kernel.org> Link: https://lore.kernel.org/rust-for-linux/CANiq72nsK1D4NuQ1U7NqMWoYjXkqQSj4QuUEL98OmFbq022Z9A@mail.gmail.com/ Signed-off-by: Daniel Sedlak <daniel@sedlak.dev> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20241123095033.41240-2-daniel@sedlak.dev [ Reworded commit slightly. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-01-13io_uring: simplify the SQPOLL thread check when cancelling requestsBui Quang Minh
In io_uring_try_cancel_requests, we check whether sq_data->thread == current to determine if the function is called by the SQPOLL thread to do iopoll when IORING_SETUP_SQPOLL is set. This check can race with the SQPOLL thread termination. io_uring_cancel_generic is used in 2 places: io_uring_cancel_generic and io_ring_exit_work. In io_uring_cancel_generic, we have the information whether the current is SQPOLL thread already. And the SQPOLL thread never reaches io_ring_exit_work. So to avoid the racy check, this commit adds a boolean flag to io_uring_try_cancel_requests to determine if the caller is SQPOLL thread. Reported-by: syzbot+3c750be01dab672c513d@syzkaller.appspotmail.com Reported-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20250113160331.44057-1-minhquangbui99@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-13ring-buffer: Make reading page consistent with the code logicJeongjun Park
In the loop of __rb_map_vma(), the 's' variable is calculated from the same logic that nr_pages is and they both come from nr_subbufs. But the relationship is not obvious and there's a WARN_ON_ONCE() around the 's' variable to make sure it never becomes equal to nr_subbufs within the loop. If that happens, then the code is buggy and needs to be fixed. The 'page' variable is calculated from cpu_buffer->subbuf_ids[s] which is an array of 'nr_subbufs' entries. If the code becomes buggy and 's' becomes equal to or greater than 'nr_subbufs' then this will be an out of bounds hit before the WARN_ON() is triggered and the code exiting safely. Make the 'page' initialization consistent with the code logic and assign it after the out of bounds check. Link: https://lore.kernel.org/20250110162612.13983-1-aha310510@gmail.com Signed-off-by: Jeongjun Park <aha310510@gmail.com> [ sdr: rewrote change log ] Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-13btrfs: add the missing error handling inside get_canonical_dev_pathQu Wenruo
Inside function get_canonical_dev_path(), we call d_path() to get the final device path. But d_path() can return error, and in that case the next strscpy() call will trigger an invalid memory access. Add back the missing error handling for d_path(). Reported-by: Boris Burkov <boris@bur.io> Fixes: 7e06de7c83a7 ("btrfs: canonicalize the device path before adding it") Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13ring-buffer: Check for empty ring-buffer with rb_num_of_entries()Vincent Donnefort
Currently there are two ways of identifying an empty ring-buffer. One relying on the current status of the commit / reader page (rb_per_cpu_empty()) and the other on the write and read counters (rb_num_of_entries() used in rb_get_reader_page()). with rb_num_of_entries(). This intends to ease later introduction of ring-buffer writers which are out of the kernel control and with whom, the only information available is through the meta-page counters. Link: https://lore.kernel.org/20250108114536.627715-2-vdonnefort@google.com Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-13ACPI: video: Fix random crashes due to bad kfree()Chris Bainbridge
Commit c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP") added function dm_helpers_probe_acpi_edid(), which fetches the EDID from the BIOS by calling acpi_video_get_edid(). acpi_video_get_edid() returns a pointer to the EDID, but this pointer does not originate from kmalloc() - it is actually the internal "pointer" field from an acpi_buffer struct (which did come from kmalloc()). dm_helpers_probe_acpi_edid() then attempts to kfree() the EDID pointer, resulting in memory corruption which leads to random, intermittent crashes (e.g. 4% of boots will fail with some Oops). Fix this by allocating a new array (which can be safely freed) for the EDID data, and correctly freeing the acpi_buffer pointer. The only other caller of acpi_video_get_edid() is nouveau_acpi_edid(): remove the extraneous kmemdup() here as the EDID data is now copied in acpi_video_device_EDID(). Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Fixes: c6a837088bed ("drm/amd/display: Fetch the EDID from _DDC if available for eDP") Reviewed-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reported-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Closes: https://lore.kernel.org/amd-gfx/20250110175252.GBZ4FedNKqmBRaY4T3@fat_crate.local/T/#m324a23eb4c4c32fa7e89e31f8ba96c781e496fb1 Link: https://patch.msgid.link/Z4K_oQL7eA9Owkbs@debian.local [ rjw: Changed function description comment into a kerneldoc one ] [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-13btrfs: add io_uring interface for encoded writesMark Harmstone
Add an io_uring interface for encoded writes, with the same parameters as the BTRFS_IOC_ENCODED_WRITE ioctl. As with the encoded reads code, there's a test program for this at https://github.com/maharmstone/io_uring-encoded, and I'll get this worked into an fstest. How io_uring works is that it initially calls btrfs_uring_cmd with the IO_URING_F_NONBLOCK flag set, and if we return -EAGAIN it tries again in a kthread with the flag cleared. Ideally we'd honour this and call try_lock etc., but there's still a lot of work to be done to create non-blocking versions of all the functions in our write path. Instead, just validate the input in btrfs_uring_encoded_write() on the first pass and return -EAGAIN, with a view to properly optimizing the optimistic path later on. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13bpf: Use ftrace_get_symaddr() for kprobe_multi probesMasami Hiramatsu (Google)
Add ftrace_get_entry_ip() which is only for ftrace based probes, and use it for kprobe multi probes because they are based on fprobe which uses ftrace instead of kprobes. Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Florent Revest <revest@chromium.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: bpf <bpf@vger.kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/173566081414.878879.10631096557346094362.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-01-13bcachefs: bcachefs_metadata_version_directory_sizeHongbo Li
This adds another metadata version for accounting directory size. For the new version of the filesystem, when new subdirectory items are created or deleted, the parent directory's size will change accordingly. For the old version of the existed file system, running fsck will automatically upgrade the metadata version, and it will do the check and recalculationg of the directory size. Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-13bcachefs: make directory i_size meaningfulHongbo Li
The isize of directory is 0 in bcachefs if the directory is empty. With more child dirents created, its size ought to change. Many other filesystems changed as that (ie. xfs and btrfs). And many of them changed as the size of child dirent name. Although the directory size may not seem to convey much, we can still give it some meaning. The formula of dentry size as follow: occupied_size = 40 + ALIGN(9 + namelen, 8) Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2025-01-13cpuidle: teo: Update documentation after previous changesRafael J. Wysocki
After previous changes, the description of the teo governor in the documentation comment does not match the code any more, so update it as appropriate. Fixes: 449914398083 ("cpuidle: teo: Remove recent intercepts metric") Fixes: 2662342079f5 ("cpuidle: teo: Gather statistics regarding whether or not to stop the tick") Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/6120335.lOV4Wx5bFT@rjwysocki.net [ rjw: Corrected 3 typos found by Christian ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-13cpuidle: menu: Update documentation after previous changesRafael J. Wysocki
After commit 38f83090f515 ("cpuidle: menu: Remove iowait influence") and other previous changes, the description of the menu governor in the documentation does not match the code any more, so update it as appropriate. Fixes: 38f83090f515 ("cpuidle: menu: Remove iowait influence") Fixes: 5484e31bbbff ("cpuidle: menu: Skip tick_nohz_get_sleep_length() call in some cases") Reported-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/12589281.O9o76ZdvQC@rjwysocki.net
2025-01-13drm/amd/display: Disable replay and psr while VRR is enabledTom Chung
[Why] Replay and PSR will cause some video corruption while VRR is enabled. [How] 1. Disable the Replay and PSR while VRR is enabled. 2. Change the amdgpu_dm_crtc_vrr_active() parameter to const. Because the function will only read data from dm_crtc_state. Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d7879340e987b3056b8ae39db255b6c19c170a0d) Cc: stable@vger.kernel.org
2025-01-13drm/amd/display: Fix PSR-SU not support but still call the amdgpu_dm_psr_enableTom Chung
[Why] The enum DC_PSR_VERSION_SU_1 of psr_version is 1 and DC_PSR_VERSION_UNSUPPORTED is 0xFFFFFFFF. The original code may has chance trigger the amdgpu_dm_psr_enable() while psr version is set to DC_PSR_VERSION_UNSUPPORTED. [How] Modify the condition to psr->psr_version == DC_PSR_VERSION_SU_1 Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f765e7ce0417f8dc38479b4b495047c397c16902) Cc: stable@vger.kernel.org
2025-01-13ice: Add correct PHY lane assignmentKarol Kolacinski
Driver always naively assumes, that for PTP purposes, PHY lane to configure is corresponding to PF ID. This is not true for some port configurations, e.g.: - 2x50G per quad, where lanes used are 0 and 2 on each quad, but PF IDs are 0 and 1 - 100G per quad on 2 quads, where lanes used are 0 and 4, but PF IDs are 0 and 1 Use correct PHY lane assignment by getting and parsing port options. This is read from the NVM by the FW and provided to the driver with the indication of active port split. Remove ice_is_muxed_topo(), which is no longer needed. Fixes: 4409ea1726cb ("ice: Adjust PTP init for 2x50G E825C devices") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Reviewed-by: Arkadiusz Kubalewski <Arkadiusz.kubalewski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-13ice: Fix ETH56G FC-FEC Rx offset valueKarol Kolacinski
Fix ETH56G FC-FEC incorrect Rx offset value by changing it from -255.96 to -469.26 ns. Those values are derived from HW spec and reflect internal delays. Hex value is a fixed point representation in Q23.9 format. Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-13ice: Fix quad registers read on E825Karol Kolacinski
Quad registers are read/written incorrectly. E825 devices always use quad 0 address and differentiate between the PHYs by changing SBQ destination device (phy_0 or phy_0_peer). Add helpers for reading/writing PTP registers shared per quad and use correct quad address and SBQ destination device based on port. Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-13ice: Fix E825 initializationKarol Kolacinski
Current implementation checks revision of all PHYs on all PFs, which is incorrect and may result in initialization failure. Check only the revision of the current PHY. Fixes: 7cab44f1c35f ("ice: Introduce ETH56G PHY model for E825C products") Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2025-01-13docs: submitting-patches: clarify that signers may use their discretion on tagsMiguel Ojeda
Tags are really appreciated by maintainers in general, since it means someone is willing to put their name on a commit, be it as a reviewer, tester, etc. However, signers (i.e. submitters carrying tags from previous versions and maintainers applying patches) may need to take or drop tags, on a case-by-case basis, for different reasons. Yet this is not explicitly spelled out in the documentation, thus there may be instances [1] where contributors may feel unwelcome. Thus, to clarify, state this clearly. Link: https://lore.kernel.org/rust-for-linux/CAEg-Je-h4NitWb2ErFGCOqt0KQfXuyKWLhpnNHCdRzZdxi018Q@mail.gmail.com/ [1] Suggested-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250112152946.761150-4-ojeda@kernel.org
2025-01-13docs: submitting-patches: clarify difference between Acked-by and Reviewed-byMiguel Ojeda
Newcomers to the kernel need to learn the different tags that are used in commit messages and when to apply them. Acked-by is sometimes misunderstood, since the documentation did not really clarify (up to the previous commit) when it should be used, especially compared to Reviewed-by. The previous commit already clarified who the usual providers of Acked-by tags are, with examples. Thus provide a clarification paragraph for the comparison with Reviewed-by, and give a couple examples reusing the cases given above, in the previous commit. Acked-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250112152946.761150-3-ojeda@kernel.org
2025-01-13docs: submitting-patches: clarify Acked-by and introduce "# Suffix"Miguel Ojeda
Acked-by is typically used by maintainers. However, sometimes it is useful to be able to accept the tag from other stakeholders that may not have done a deep technical review or may not be kernel developers. For instance: - People with domain knowledge, such as the original author of the code being modified. - Userspace-side reviewers for a kernel uAPI patch, like in DRM -- see Documentation/gpu/drm-uapi.rst: > The userspace-side reviewer should also provide an Acked-by on the > kernel uAPI patch indicating that they believe the proposed uAPI > is sound and sufficiently documented and validated for userspace's > consumption. - Key users of a feature, such as in [1]. Thus clarify that Acked-by may be used by other stakeholders (but most commonly by maintainers). Since, in these cases, it may be confusing why an Acked-by is/was provided, allow and suggest to provide a "# Suffix" explaining it. The "# Suffix" for Acked-by is already being used to clarify what part of the patch a maintainer is acknowledging, thus also mention "# Suffix" in the relevant paragraph. Link: https://lore.kernel.org/rust-for-linux/CANiq72m4fea15Z0fFZauz8N2madkBJ0G7Dc094OwoajnXmROOA@mail.gmail.com/ [1] Acked-by: Shuah Khan <skhan@linuxfoundation.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org> Reviewed-by: Neal Gompa <neal@gompa.dev> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250112152946.761150-2-ojeda@kernel.org
2025-01-13Documentation: bug-hunting.rst: remove odd contact informationGreg Kroah-Hartman
At the bottom of the bug-hunting.rst file there is a "signature" which doesn't seem to make much sense. It seems to predate git, and perhaps was from an earlier bug report that got copied into the document, but now makes no sense so remove it. Cc: greg@wind.rmcc.com Cc: Jonathan Corbet <corbet@lwn.net> Cc: Alex Shi <alexs@kernel.org> Cc: Yanteng Si <si.yanteng@linux.dev> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/2025011005-resistant-uncork-9814@gregkh
2025-01-13docs/zh_CN: Add sak index Chinese translationzhangwei
Translate lwn/Documentation/security/sak.rst into Chinese Update the translation through commit 4d3beaa06d35 ("docs: security: move some books to it and update") Reviewed-by: Yanteng Si <si.yanteng@linux.dev> Reviewed-by: Alex Shi <alexs@kernel.org> Signed-off-by: zhangwei <zhangwei@cqsoftware.com.cn> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250110100405.2225-1-zhangwei@cqsoftware.com.cn
2025-01-13Merge tag 'md-6.14-20250113' of ↵Jens Axboe
https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux into for-6.14/block Pull MD updates from Song: "1. Reintroduce md-linear, by Yu Kuai. 2. md-bitmap refactor and fix, by Yu Kuai. 3. Replace kmap_atomic with kmap_local_page, by David Reaver." * tag 'md-6.14-20250113' of https://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux: md/md-bitmap: move bitmap_{start, end}write to md upper layer md/raid5: implement pers->bitmap_sector() md: add a new callback pers->bitmap_sector() md/md-bitmap: remove the last parameter for bimtap_ops->endwrite() md/md-bitmap: factor behind write counters out from bitmap_{start/end}write() md: Replace deprecated kmap_atomic() with kmap_local_page() md: reintroduce md-linear
2025-01-13nvme: fix bogus kzalloc() return check in nvme_init_effects_log()Jens Axboe
nvme_init_effects_log() returns failure when kzalloc() is successful, which is obviously wrong and causes failures to boot. Correct the check. Fixes: d4a95adeabc6 ("nvme: Add error path for xa_store in nvme_init_effects") Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-13Merge tag 'mm-hotfixes-stable-2025-01-13-00-03' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "18 hotfixes. 11 are cc:stable. 13 are MM and 5 are non-MM. All patches are singletons - please see the relevant changelogs for details" * tag 'mm-hotfixes-stable-2025-01-13-00-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: fs/proc: fix softlockup in __read_vmcore (part 2) mm: fix assertion in folio_end_read() mm: vmscan : pgdemote vmstat is not getting updated when MGLRU is enabled. vmstat: disable vmstat_work on vmstat_cpu_down_prep() zram: fix potential UAF of zram table selftests/mm: set allocated memory to non-zero content in cow test mm: clear uffd-wp PTE/PMD state on mremap() module: fix writing of livepatch relocations in ROX text mm: zswap: properly synchronize freeing resources during CPU hotunplug Revert "mm: zswap: fix race between [de]compression and CPU hotunplug" hugetlb: fix NULL pointer dereference in trace_hugetlbfs_alloc_inode mm: fix div by zero in bdi_ratio_from_pages x86/execmem: fix ROX cache usage in Xen PV guests filemap: avoid truncating 64-bit offset to 32 bits tools: fix atomic_set() definition to set the value correctly mm/mempolicy: count MPOL_WEIGHTED_INTERLEAVE to "interleave_hit" scripts/decode_stacktrace.sh: fix decoding of lines with an additional info mm/kmemleak: fix percpu memory leak detection failure
2025-01-13Merge branch 'md-6.14-bitmap' into md-6.14Song Liu
Move bitmap_{start, end}write calls to md layer. These changes help address hangs in bitmap_startwrite([1],[2]). [1] https://lore.kernel.org/all/CAJpMwyjmHQLvm6zg1cmQErttNNQPDAAXPKM3xgTjMhbfts986Q@mail.gmail.com/ [2] https://lore.kernel.org/all/ADF7D720-5764-4AF3-B68E-1845988737AA@flyingcircus.io/ * md-6.14-bitmap: md/md-bitmap: move bitmap_{start, end}write to md upper layer md/raid5: implement pers->bitmap_sector() md: add a new callback pers->bitmap_sector() md/md-bitmap: remove the last parameter for bimtap_ops->endwrite() md/md-bitmap: factor behind write counters out from bitmap_{start/end}write()
2025-01-13md/md-bitmap: move bitmap_{start, end}write to md upper layerYu Kuai
There are two BUG reports that raid5 will hang at bitmap_startwrite([1],[2]), root cause is that bitmap start write and end write is unbalanced, it's not quite clear where, and while reviewing raid5 code, it's found that bitmap operations can be optimized. For example, for a 4 disks raid5, with chunksize=8k, if user issue a IO (0 + 48k) to the array: ┌────────────────────────────────────────────────────────────┐ │chunk 0 │ │ ┌────────────┬─────────────┬─────────────┬────────────┼ │ sh0 │A0: 0 + 4k │A1: 8k + 4k │A2: 16k + 4k │A3: P │ │ ┼────────────┼─────────────┼─────────────┼────────────┼ │ sh1 │B0: 4k + 4k │B1: 12k + 4k │B2: 20k + 4k │B3: P │ ┼──────┴────────────┴─────────────┴─────────────┴────────────┼ │chunk 1 │ │ ┌────────────┬─────────────┬─────────────┬────────────┤ │ sh2 │C0: 24k + 4k│C1: 32k + 4k │C2: P │C3: 40k + 4k│ │ ┼────────────┼─────────────┼─────────────┼────────────┼ │ sh3 │D0: 28k + 4k│D1: 36k + 4k │D2: P │D3: 44k + 4k│ └──────┴────────────┴─────────────┴─────────────┴────────────┘ Before this patch, 4 stripe head will be used, and each sh will attach bio for 3 disks, and each attached bio will trigger bitmap_startwrite() once, which means total 12 times. - 3 times (0 + 4k), for (A0, A1 and A2) - 3 times (4 + 4k), for (B0, B1 and B2) - 3 times (8 + 4k), for (C0, C1 and C3) - 3 times (12 + 4k), for (D0, D1 and D3) After this patch, md upper layer will calculate that IO range (0 + 48k) is corresponding to the bitmap (0 + 16k), and call bitmap_startwrite() just once. Noted that this patch will align bitmap ranges to the chunks, for example, if user issue a IO (0 + 4k) to array: - Before this patch, 1 time (0 + 4k), for A0; - After this patch, 1 time (0 + 8k) for chunk 0; Usually, one bitmap bit will represent more than one disk chunk, and this doesn't have any difference. And even if user really created a array that one chunk contain multiple bits, the overhead is that more data will be recovered after power failure. Also remove STRIPE_BITMAP_PENDING since it's not used anymore. [1] https://lore.kernel.org/all/CAJpMwyjmHQLvm6zg1cmQErttNNQPDAAXPKM3xgTjMhbfts986Q@mail.gmail.com/ [2] https://lore.kernel.org/all/ADF7D720-5764-4AF3-B68E-1845988737AA@flyingcircus.io/ Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250109015145.158868-6-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2025-01-13md/raid5: implement pers->bitmap_sector()Yu Kuai
Bitmap is used for the whole array for raid1/raid10, hence IO for the array can be used directly for bitmap. However, bitmap is used for underlying disks for raid5, hence IO for the array can't be used directly for bitmap. Implement pers->bitmap_sector() for raid5 to convert IO ranges from the array to the underlying disks. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250109015145.158868-5-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2025-01-13md: add a new callback pers->bitmap_sector()Yu Kuai
This callback will be used in raid5 to convert io ranges from array to bitmap. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/r/20250109015145.158868-4-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2025-01-13md/md-bitmap: remove the last parameter for bimtap_ops->endwrite()Yu Kuai
For the case that IO failed for one rdev, the bit will be mark as NEEDED in following cases: 1) If badblocks is set and rdev is not faulty; 2) If rdev is faulty; Case 1) is useless because synchronize data to badblocks make no sense. Case 2) can be replaced with mddev->degraded. Also remove R1BIO_Degraded, R10BIO_Degraded and STRIPE_DEGRADED since case 2) no longer use them. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Link: https://lore.kernel.org/r/20250109015145.158868-3-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2025-01-13md/md-bitmap: factor behind write counters out from bitmap_{start/end}write()Yu Kuai
behind_write is only used in raid1, prepare to refactor bitmap_{start/end}write(), there are no functional changes. Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Xiao Ni <xni@redhat.com> Link: https://lore.kernel.org/r/20250109015145.158868-2-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2025-01-13Documentation: arm64: Remove stale and redundant virtual memory diagramsWill Deacon
The arm64 'memory.rst' file tries to document the virtual memory map and the translation procedure for a couple of kernel configurations. Unfortunately, the virtual memory map changes relatively frequently and we support considerably more configurations than we did when the docs were introduced (e.g. we now have support for 16KiB pages and 52-bit addressing). Furthermore, the Arm ARM is the definitive resource for the translation procedure and so there's little point in duplicating part of that information in the kernel documentation. Rather than continue trying (and failing) to maintain these diagrams, let's rip them out. The kernel page-table can be dumped using CONFIG_PTDUMP_DEBUGFS if necesssary. Link: https://lore.kernel.org/r/20250102065554.1533781-1-sangmoon.kim@samsung.com Reported-by: Sangmoon Kim <sangmoon.kim@samsung.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-01-13vsnprintf: fix the number base for non-numeric formatsLinus Torvalds
Commit 8d4826cc8a8a ("vsnprintf: collapse the number format state into one single state") changed the format specification decoding to be a bit more straightforward but in the process ended up also resetting the number base to zero for formats that aren't clearly numerical. Now, the number base obviously doesn't matter for something like '%s', so this wasn't all that obvious. But some of our specialized pointer extension formatting (ie, things like "print out IPv6 address") did up depending on the default base-10 setting, and when they then tried to print out numbers in "base zero", things didn't work out so well. Most pointer formatting (including things like the default raw hex value conversion) didn't have this issue, because they used helpers that explicitly set the base. Reported-and-tested-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202501131352.e226f995-lkp@intel.com Fixes: 8d4826cc8a8a ("vsnprintf: collapse the number format state into one single state") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-01-13md: Replace deprecated kmap_atomic() with kmap_local_page()David Reaver
kmap_atomic() is deprecated and should be replaced with kmap_local_page() [1][2]. kmap_local_page() is faster in kernels with HIGHMEM enabled, can take page faults, and allows preemption. According to [2], this is safe as long as the code between kmap_atomic() and kunmap_atomic() does not implicitly depend on disabling page faults or preemption. It appears to me that none of the call sites in this patch depend on disabling page faults or preemption; they are all mapping a page to simply extract some information from it or print some debug info. [1] https://lwn.net/Articles/836144/ [2] https://docs.kernel.org/mm/highmem.html#temporary-virtual-mappings Signed-off-by: David Reaver <me@davidreaver.com> Link: https://lore.kernel.org/r/20250108192131.46843-1-me@davidreaver.com Signed-off-by: Song Liu <song@kernel.org>
2025-01-13md: reintroduce md-linearYu Kuai
THe md-linear is removed by commit 849d18e27be9 ("md: Remove deprecated CONFIG_MD_LINEAR") because it has been marked as deprecated for a long time. However, md-linear is used widely for underlying disks with different size, sadly we didn't know this until now, and it's true useful to create partitions and assemble multiple raid and then append one to the other. People have to use dm-linear in this case now, however, they will prefer to minimize the number of involved modules. Fixes: 849d18e27be9 ("md: Remove deprecated CONFIG_MD_LINEAR") Cc: stable@vger.kernel.org Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Coly Li <colyli@kernel.org> Acked-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20250102112841.1227111-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu <song@kernel.org>
2025-01-13select: Fix unbalanced user_access_end()Christophe Leroy
While working on implementing user access validation on powerpc I got the following warnings on a pmac32_defconfig build: CC fs/select.o fs/select.o: warning: objtool: sys_pselect6+0x1bc: redundant UACCESS disable fs/select.o: warning: objtool: sys_pselect6_time32+0x1bc: redundant UACCESS disable On powerpc/32s, user_read_access_begin/end() are no-ops, but the failure path has a user_access_end() instead of user_read_access_end() which means an access end without any prior access begin. Replace that user_access_end() by user_read_access_end(). Fixes: 7e71609f64ec ("pselect6() and friends: take handling the combined 6th/7th args into helper") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/a7139e28d767a13e667ee3c79599a8047222ef36.1736751221.git.christophe.leroy@csgroup.eu Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-01-13btrfs: remove the unused locked_folio parameter from ↵Qu Wenruo
btrfs_cleanup_ordered_extents() The function btrfs_cleanup_ordered_extents() is only called in error handling path, and the last caller with a @locked_folio parameter was removed to fix a bug in the btrfs_run_delalloc_range() error handling. There is no need to pass @locked_folio parameter anymore. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13btrfs: add extra error messages for delalloc range related errorsQu Wenruo
All the error handling bugs I hit so far are all -ENOSPC from either: - cow_file_range() - run_delalloc_nocow() - submit_uncompressed_range() Previously when those functions failed, there was no error message at all, making the debugging much harder. So here we introduce extra error messages for: - cow_file_range() - run_delalloc_nocow() - submit_uncompressed_range() - writepage_delalloc() when btrfs_run_delalloc_range() failed - extent_writepage() when extent_writepage_io() failed One example of the new debug error messages is the following one: run fstests generic/750 at 2024-12-08 12:41:41 BTRFS: device fsid 461b25f5-e240-4543-8deb-e7c2bd01a6d3 devid 1 transid 8 /dev/mapper/test-scratch1 (253:4) scanned by mount (2436600) BTRFS info (device dm-4): first mount of filesystem 461b25f5-e240-4543-8deb-e7c2bd01a6d3 BTRFS info (device dm-4): using crc32c (crc32c-arm64) checksum algorithm BTRFS info (device dm-4): forcing free space tree for sector size 4096 with page size 65536 BTRFS info (device dm-4): using free-space-tree BTRFS warning (device dm-4): read-write for sector size 4096 with page size 65536 is experimental BTRFS info (device dm-4): checking UUID tree BTRFS error (device dm-4): cow_file_range failed, root=363 inode=412 start=503808 len=98304: -28 BTRFS error (device dm-4): run_delalloc_nocow failed, root=363 inode=412 start=503808 len=98304: -28 BTRFS error (device dm-4): failed to run delalloc range, root=363 ino=412 folio=458752 submit_bitmap=11-15 start=503808 len=98304: -28 Which shows an error from cow_file_range() which is called inside a nocow write attempt, along with the extra bitmap from writepage_delalloc(). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13btrfs: subpage: dump the involved bitmap when ASSERT() failedQu Wenruo
For btrfs_folio_assert_not_dirty() and btrfs_folio_set_lock(), we call bitmap_test_range_all_zero() to ensure the involved range has no dirty/lock bit already set. However with my recent enhanced delalloc range error handling, I was hitting the ASSERT() inside btrfs_folio_set_lock(), and it turns out that some error handling path is not properly updating the folio flags. So add some extra dumping for the ASSERTs to dump the involved bitmap to help debug. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13btrfs: subpage: fix the bitmap dump of the locked flagsQu Wenruo
We're dumping the locked bitmap into the @checked_bitmap variable, printing incorrect values during debug. Thankfully even during my development I haven't hit a case where I need to dump the locked bitmap. But for the sake of consistency, fix it by dupping the locked bitmap into @locked_bitmap variable for output. Fixes: 75258f20fb70 ("btrfs: subpage: dump extra subpage bitmaps for debug") Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13btrfs: do proper folio cleanup when run_delalloc_nocow() failedQu Wenruo
[BUG] With CONFIG_DEBUG_VM set, test case generic/476 has some chance to crash with the following VM_BUG_ON_FOLIO(): BTRFS error (device dm-3): cow_file_range failed, start 1146880 end 1253375 len 106496 ret -28 BTRFS error (device dm-3): run_delalloc_nocow failed, start 1146880 end 1253375 len 106496 ret -28 page: refcount:4 mapcount:0 mapping:00000000592787cc index:0x12 pfn:0x10664 aops:btrfs_aops [btrfs] ino:101 dentry name(?):"f1774" flags: 0x2fffff80004028(uptodate|lru|private|node=0|zone=2|lastcpupid=0xfffff) page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio)) ------------[ cut here ]------------ kernel BUG at mm/page-writeback.c:2992! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 2 UID: 0 PID: 3943513 Comm: kworker/u24:15 Tainted: G OE 6.12.0-rc7-custom+ #87 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : folio_clear_dirty_for_io+0x128/0x258 lr : folio_clear_dirty_for_io+0x128/0x258 Call trace: folio_clear_dirty_for_io+0x128/0x258 btrfs_folio_clamp_clear_dirty+0x80/0xd0 [btrfs] __process_folios_contig+0x154/0x268 [btrfs] extent_clear_unlock_delalloc+0x5c/0x80 [btrfs] run_delalloc_nocow+0x5f8/0x760 [btrfs] btrfs_run_delalloc_range+0xa8/0x220 [btrfs] writepage_delalloc+0x230/0x4c8 [btrfs] extent_writepage+0xb8/0x358 [btrfs] extent_write_cache_pages+0x21c/0x4e8 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x178/0x3a8 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: 910a8021 a90363f7 a9046bf9 94012379 (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] The first two lines of extra debug messages show the problem is caused by the error handling of run_delalloc_nocow(). E.g. we have the following dirtied range (4K blocksize 4K page size): 0 16K 32K |//////////////////////////////////////| | Pre-allocated | And the range [0, 16K) has a preallocated extent. - Enter run_delalloc_nocow() for range [0, 16K) Which found range [0, 16K) is preallocated, can do the proper NOCOW write. - Enter fallback_to_fow() for range [16K, 32K) Since the range [16K, 32K) is not backed by preallocated extent, we have to go COW. - cow_file_range() failed for range [16K, 32K) So cow_file_range() will do the clean up by clearing folio dirty, unlock the folios. Now the folios in range [16K, 32K) is unlocked. - Enter extent_clear_unlock_delalloc() from run_delalloc_nocow() Which is called with PAGE_START_WRITEBACK to start page writeback. But folios can only be marked writeback when it's properly locked, thus this triggered the VM_BUG_ON_FOLIO(). Furthermore there is another hidden but common bug that run_delalloc_nocow() is not clearing the folio dirty flags in its error handling path. This is the common bug shared between run_delalloc_nocow() and cow_file_range(). [FIX] - Clear folio dirty for range [@start, @cur_offset) Introduce a helper, cleanup_dirty_folios(), which will find and lock the folio in the range, clear the dirty flag and start/end the writeback, with the extra handling for the @locked_folio. - Introduce a helper to clear folio dirty, start and end writeback - Introduce a helper to record the last failed COW range end This is to trace which range we should skip, to avoid double unlocking. - Skip the failed COW range for the error handling CC: stable@vger.kernel.org Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13btrfs: do proper folio cleanup when cow_file_range() failedQu Wenruo
[BUG] When testing with COW fixup marked as BUG_ON() (this is involved with the new pin_user_pages*() change, which should not result new out-of-band dirty pages), I hit a crash triggered by the BUG_ON() from hitting COW fixup path. This BUG_ON() happens just after a failed btrfs_run_delalloc_range(): BTRFS error (device dm-2): failed to run delalloc range, root 348 ino 405 folio 65536 submit_bitmap 6-15 start 90112 len 106496: -28 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1444! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 0 UID: 0 PID: 434621 Comm: kworker/u24:8 Tainted: G OE 6.12.0-rc7-custom+ #86 Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : extent_writepage_io+0x2d4/0x308 [btrfs] lr : extent_writepage_io+0x2d4/0x308 [btrfs] Call trace: extent_writepage_io+0x2d4/0x308 [btrfs] extent_writepage+0x218/0x330 [btrfs] extent_write_cache_pages+0x1d4/0x4b0 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x180/0x3b0 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: aa1403e1 9402f3ef aa1403e0 9402f36f (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] That failure is mostly from cow_file_range(), where we can hit -ENOSPC. Although the -ENOSPC is already a bug related to our space reservation code, let's just focus on the error handling. For example, we have the following dirty range [0, 64K) of an inode, with 4K sector size and 4K page size: 0 16K 32K 48K 64K |///////////////////////////////////////| |#######################################| Where |///| means page are still dirty, and |###| means the extent io tree has EXTENT_DELALLOC flag. - Enter extent_writepage() for page 0 - Enter btrfs_run_delalloc_range() for range [0, 64K) - Enter cow_file_range() for range [0, 64K) - Function btrfs_reserve_extent() only reserved one 16K extent So we created extent map and ordered extent for range [0, 16K) 0 16K 32K 48K 64K |////////|//////////////////////////////| |<- OE ->|##############################| And range [0, 16K) has its delalloc flag cleared. But since we haven't yet submit any bio, involved 4 pages are still dirty. - Function btrfs_reserve_extent() returns with -ENOSPC Now we have to run error cleanup, which will clear all EXTENT_DELALLOC* flags and clear the dirty flags for the remaining ranges: 0 16K 32K 48K 64K |////////| | | | | Note that range [0, 16K) still has its pages dirty. - Some time later, writeback is triggered again for the range [0, 16K) since the page range still has dirty flags. - btrfs_run_delalloc_range() will do nothing because there is no EXTENT_DELALLOC flag. - extent_writepage_io() finds page 0 has no ordered flag Which falls into the COW fixup path, triggering the BUG_ON(). Unfortunately this error handling bug dates back to the introduction of btrfs. Thankfully with the abuse of COW fixup, at least it won't crash the kernel. [FIX] Instead of immediately unlocking the extent and folios, we keep the extent and folios locked until either erroring out or the whole delalloc range finished. When the whole delalloc range finished without error, we just unlock the whole range with PAGE_SET_ORDERED (and PAGE_UNLOCK for !keep_locked cases), with EXTENT_DELALLOC and EXTENT_LOCKED cleared. And the involved folios will be properly submitted, with their dirty flags cleared during submission. For the error path, it will be a little more complex: - The range with ordered extent allocated (range (1)) We only clear the EXTENT_DELALLOC and EXTENT_LOCKED, as the remaining flags are cleaned up by btrfs_mark_ordered_io_finished()->btrfs_finish_one_ordered(). For folios we finish the IO (clear dirty, start writeback and immediately finish the writeback) and unlock the folios. - The range with reserved extent but no ordered extent (range(2)) - The range we never touched (range(3)) For both range (2) and range(3) the behavior is not changed. Now even if cow_file_range() failed halfway with some successfully reserved extents/ordered extents, we will keep all folios clean, so there will be no future writeback triggered on them. CC: stable@vger.kernel.org Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-13nouveau/fence: handle cross device fences properlyDave Airlie
The fence sync logic doesn't handle a fence sync across devices as it tries to write to a channel offset from one device into the fence bo from a different device, which won't work so well. This patch fixes that to avoid using the sync path in the case where the fences come from different nouveau drm devices. This works fine on a single device as the fence bo is shared across the devices, and mapped into each channels vma space, the channel offsets are therefore okay to pass between sides, so one channel can sync on the seqnos from the other by using the offset into it's vma. Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> [ Fix compilation issue; remove version log from commit messsage. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20250109005553.623947-1-airlied@gmail.com
2025-01-13partitions: ldm: remove the initial kernel-doc notationRandy Dunlap
Remove the file's first comment describing what the file is. This comment is not in kernel-doc format so it causes a kernel-doc warning. ldm.h:13: warning: expecting prototype for ldm(). Prototype was for _FS_PT_LDM_H_() instead Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Richard Russon (FlatCap) <ldm@flatcap.org> Cc: linux-ntfs-dev@lists.sourceforge.net Cc: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20250111062758.910458-1-rdunlap@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-13blk-cgroup: rwstat: fix kernel-doc warnings in header fileRandy Dunlap
Correct the function parameters to eliminate kernel-doc warnings: blk-cgroup-rwstat.h:63: warning: Function parameter or struct member 'opf' not described in 'blkg_rwstat_add' blk-cgroup-rwstat.h:63: warning: Excess function parameter 'op' description in 'blkg_rwstat_add' blk-cgroup-rwstat.h:91: warning: Function parameter or struct member 'result' not described in 'blkg_rwstat_read' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: cgroups@vger.kernel.org Link: https://lore.kernel.org/r/20250111062748.910442-1-rdunlap@infradead.org Signed-off-by: Jens Axboe <axboe@kernel.dk>