linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-11-11	block: Rework bio_split() return value	John Garry
	Instead of returning an inconclusive value of NULL for an error in calling bio_split(), return a ERR_PTR() always. Also remove the BUG_ON() calls, and WARN_ON_ONCE() instead. Indeed, since almost all callers don't check the return code from bio_split(), we'll crash anyway (for those failures). Fix up the only user which checks bio_split() return code today (directly or indirectly), blk_crypto_fallback_split_bio_if_needed(). The md/bcache code does check the return code in cached_dev_cache_miss() -> bio_next_split() -> bio_split(), but only to see if there was a split, so there would be no change in behaviour here (when returning a ERR_PTR()). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: John Garry <john.g.garry@oracle.com> Link: https://lore.kernel.org/r/20241111112150.3756529-2-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11	arm64: dts: rockchip: use less broad pinctrl for pcie3x1 on Radxa E25	FUKAUMI Naoki
	To avoid conflict with sdmmc_det, change pci3x1 pinctrl-0 name. Only the reset-pin is actually needed. Signed-off-by: FUKAUMI Naoki <naoki@radxa.com> Link: https://lore.kernel.org/r/20240918073236.648-1-naoki@radxa.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: add Radxa ROCK 5C	FUKAUMI Naoki
	Radxa ROCK 5C is a 8K computer for everything[1] using the Rockchip RK3588S2 chip: - Rockchip RK3588S2 - Quad A76 and Quad A55 CPU - 6 TOPS NPU - up to 32GB LPDDR4x RAM - eMMC / SPI flash connector - Micro SD Card slot - Gigabit ethernet port (supports PoE with add-on PoE HAT) - WiFi6 / BT5.4 - 1x USB 3.0 Type-A HOST port - 1x USB 3.0 Type-A OTG port - 2x USB 2.0 Type-A HOST port - 1x USB Type-C 5V power port [1] https://radxa.com/products/rock5/5c Signed-off-by: FUKAUMI Naoki <naoki@radxa.com> Link: https://lore.kernel.org/r/20241021090548.1052-2-naoki@radxa.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	dt-bindings: arm: rockchip: add Radxa ROCK 5C	FUKAUMI Naoki
	Add devicetree binding for the Radxa ROCK 5C. Radxa ROCK 5C is a 8K computer for everything[1] using the Rockchip RK3588S2 chip. [1] https://radxa.com/products/rock5/5c Signed-off-by: FUKAUMI Naoki <naoki@radxa.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20241021090548.1052-1-naoki@radxa.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: orangepi-5-plus: Enable GPU	Chen-Yu Tsai
	Enable the Mali GPU in the Orange Pi 5 Plus. Signed-off-by: Chen-Yu Tsai <wens@csie.org> Link: https://lore.kernel.org/r/20241025175409.886260-1-wens@kernel.org Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: enable USB3 on NanoPC-T6	Rick Wertenbroek
	Enable the USB3 port on FriendlyELEC NanoPC-T6. Signed-off-by: Rick Wertenbroek <rick.wertenbroek@gmail.com> Link: https://lore.kernel.org/r/20241106130314.1289055-1-rick.wertenbroek@gmail.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: adapt regulator nodenames to preferred form	Johan Jonker
	The preferred nodename for fixed-regulators has changed to pattern: '^regulator(-[0-9]+v[0-9]+\|-[0-9a-z-]+)?$' Fix all Rockchip DT regulator nodenames. Signed-off-by: Johan Jonker <jbx6244@gmail.com> Link: https://lore.kernel.org/r/0ae40493-93e9-40cd-9ca9-990ae064f21a@gmail.com [adapted rebased on top of a number of other changes and included neu6a-wifi + wolfvision-pf5-io-expander overlays] Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: Enable HDMI display for rk3588 Cool Pi GenBook	Andy Yan
	Enable hdmi display output on Cool Pi GenBook. Signed-off-by: Andy Yan <andyshrk@163.com> Link: https://lore.kernel.org/r/20241028123503.384866-4-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: Enable HDMI display for rk3588 Cool Pi 4B	Andy Yan
	Enable the micro HDMI on Cool Pi 4B. Signed-off-by: Andy Yan <andyshrk@163.com> Link: https://lore.kernel.org/r/20241028123503.384866-3-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: Enable HDMI0 for rk3588 Cool Pi CM5 EVB	Andy Yan
	As the hdmi-qp controller recently get merged, we can enable the HDMI0 display on this board now. Signed-off-by: Andy Yan <andyshrk@163.com> Link: https://lore.kernel.org/r/20241028123503.384866-2-andyshrk@163.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: Enable HDMI on NanoPi R6C/R6S	Jonas Karlman
	Add the necessary DT changes to enable HDMI on NanoPi R6C/R6S. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Link: https://lore.kernel.org/r/20241107212913.1322666-3-jonas@kwiboo.se Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: Enable GPU on NanoPi R6C/R6S	Jonas Karlman
	Add the necessary DT changes to enable GPU on NanoPi R6C/R6S. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Link: https://lore.kernel.org/r/20241107212913.1322666-2-jonas@kwiboo.se Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: Enable HDMI on Hardkernel ODROID-M2	Jonas Karlman
	Add the necessary DT changes to enable HDMI on Hardkernel ODROID-M2. Signed-off-by: Jonas Karlman <jonas@kwiboo.se> Link: https://lore.kernel.org/r/20241107211345.1318046-1-jonas@kwiboo.se Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	arm64: dts: rockchip: Remove non-removable flag from sdmmc on rk3576-sige5	Detlev Casanova
	The sdmmc node represents a removable SD card host. Make sure it is considered removable so that SD cards are detected when inserted. Signed-off-by: Detlev Casanova <detlev.casanova@collabora.com> Link: https://lore.kernel.org/r/20241108213357.268002-1-detlev.casanova@collabora.com Signed-off-by: Heiko Stuebner <heiko@sntech.de>
2024-11-11	ublk: fix ublk_ch_mmap() for 64K page size	Ming Lei
	In ublk_ch_mmap(), queue id is calculated in the following way: (vma->vm_pgoff << PAGE_SHIFT) / `max_cmd_buf_size` 'max_cmd_buf_size' is equal to `UBLK_MAX_QUEUE_DEPTH * sizeof(struct ublksrv_io_desc)` and UBLK_MAX_QUEUE_DEPTH is 4096 and part of UAPI, so 'max_cmd_buf_size' is always page aligned in 4K page size kernel. However, it isn't true in 64K page size kernel. Fixes the issue by always rounding up 'max_cmd_buf_size' with PAGE_SIZE. Cc: stable@vger.kernel.org Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20241111110718.1394001-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11	io_uring/uring_cmd: fix buffer index retrieval	Ming Lei
	Add back buffer index retrieval for IORING_URING_CMD_FIXED. Reported-by: Guangwu Zhang <guazhang@redhat.com> Cc: Jeff Moyer <jmoyer@redhat.com> Fixes: b54a14041ee6 ("io_uring/rsrc: add io_rsrc_node_lookup() helper") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Reviewed-by: Anuj Gupta <anuj20.g@samsung.com> Tested-by: Guangwu Zhang <guazhang@redhat.com> Link: https://lore.kernel.org/r/20241111101318.1387557-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11	intel_idle: add Granite Rapids Xeon D support	Artem Bityutskiy
	Add Granite Rapids Xeon D C-states support: C1, C1E, C6, and C6P. The C-states are basically the same as in Granite Rapids Xeon SP/AP, but characteristics (latency, target residency) are a bit different. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Link: https://patch.msgid.link/20241107115608.52233-1-artem.bityutskiy@linux.intel.com [ rjw: Changelog edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-11	ASoc: SOF: ipc4-pcm: fix uninit-value in sof_ipc4_pcm_dai_link_fixup_rate	Suraj Sonawane
	Fix an issue detected by the Smatch tool: sound/soc/sof/ipc4-pcm.c: sof_ipc4_pcm_dai_link_fixup_rate() error: uninitialized symbol 'be_rate'. The warning highlights a case where `be_rate` could remain uninitialized if `num_input_formats` is zero, which would cause undefined behavior when setting `rate->min` and `rate->max` based on `be_rate`. To address this issue, a `WARN_ON_ONCE(!num_input_formats)` check was added to ensure `num_input_formats` is greater than zero. If this condition fails, the function returns `-EINVAL`, preventing the use of an uninitialized `be_rate`. This change improves the robustness of the function by catching an invalid state early and providing better feedback during development. Signed-off-by: Suraj Sonawane <surajsonawane0215@gmail.com> Link: https://patch.msgid.link/20241107063609.11627-1-surajsonawane0215@gmail.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-11	ASoC: dt-bindings: stm32: add missing port property	Olivier Moysan
	Add missing port property in STM32 SPDIFRX binding. This will prevent potential warning: Unevaluated properties are not allowed ('port' was unexpected) Signed-off-by: Olivier Moysan <olivier.moysan@foss.st.com> Link: https://patch.msgid.link/20241105135942.526624-1-olivier.moysan@foss.st.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-11	ASoC: add symmetric_ prefix for dai->rate/channels/sample_bits	Kuninori Morimoto
	snd_soc_dai has rate/channels/sample_bits parameter, but it is only valid if symmetry is being enforced by symmetric_xxx flag on driver. It is very difficult to know about it from current naming, and easy to misunderstand it. add symmetric_ prefix for it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/87zfmd8bnf.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-11	Merge back ACPI processor driver changes for 6.13	Rafael J. Wysocki

2024-11-11	Merge back thermal control material for 6.13	Rafael J. Wysocki

2024-11-11	cpufreq: intel_pstate: Rearrange locking in hybrid_init_cpu_capacity_scaling()	Rafael J. Wysocki
	Notice that hybrid_init_cpu_capacity_scaling() only needs to hold hybrid_capacity_lock around __hybrid_init_cpu_capacity_scaling() calls, so introduce a "locked" wrapper around the latter and call it from the former. This allows to drop a local variable and a label that are not needed any more. Also, rename __hybrid_init_cpu_capacity_scaling() to __hybrid_refresh_cpu_capacity_scaling() for consistency. Interestingly enough, this fixes a locking issue introduced by commit 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems") that put an arch_enable_hybrid_capacity_scale() call under hybrid_capacity_lock, which was a mistake because the latter is acquired in CPU hotplug paths and so it cannot be held around cpus_read_lock() calls. Link: https://lore.kernel.org/linux-pm/SJ1PR11MB6129EDBF22F8A90FC3A3EDC8B9582@SJ1PR11MB6129.namprd11.prod.outlook.com/ Fixes: 929ebc93ccaa ("cpufreq: intel_pstate: Set asymmetric CPU capacity on hybrid systems") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-by: "Borah, Chaitanya Kumar" <chaitanya.kumar.borah@intel.com> Link: https://patch.msgid.link/12554508.O9o76ZdvQC@rjwysocki.net [ rjw: Changelog update ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-11	Merge back cpufreq material for 6.13	Rafael J. Wysocki

2024-11-11	Merge patch series "fscache/cachefiles: Some bugfixes"	Christian Brauner
	Zizhi Wo <wozizhi@huawei.com> says: This patchset mainly includes 5 fix patches about fscache/cachefiles. The first patch fixes an issue with the incorrect return length, and the fourth patch addresses a null pointer dereference issue with file. * patches from https://lore.kernel.org/r/20241107110649.3980193-1-wozizhi@huawei.com: netfs/fscache: Add a memory barrier for FSCACHE_VOLUME_CREATING cachefiles: Fix NULL pointer dereference in object->file cachefiles: Clean up in cachefiles_commit_tmpfile() cachefiles: Fix missing pos updates in cachefiles_ondemand_fd_write_iter() cachefiles: Fix incorrect length return value in cachefiles_ondemand_fd_write_iter() Link: https://lore.kernel.org/r/20241107110649.3980193-1-wozizhi@huawei.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-11	netfs/fscache: Add a memory barrier for FSCACHE_VOLUME_CREATING	Zizhi Wo
	In fscache_create_volume(), there is a missing memory barrier between the bit-clearing operation and the wake-up operation. This may cause a situation where, after a wake-up, the bit-clearing operation hasn't been detected yet, leading to an indefinite wait. The triggering process is as follows: [cookie1] [cookie2] [volume_work] fscache_perform_lookup fscache_create_volume fscache_perform_lookup fscache_create_volume fscache_create_volume_work cachefiles_acquire_volume clear_and_wake_up_bit test_and_set_bit test_and_set_bit goto maybe_wait goto no_wait In the above process, cookie1 and cookie2 has the same volume. When cookie1 enters the -no_wait- process, it will clear the bit and wake up the waiting process. If a barrier is missing, it may cause cookie2 to remain in the -wait- process indefinitely. In commit 3288666c7256 ("fscache: Use clear_and_wake_up_bit() in fscache_create_volume_work()"), barriers were added to similar operations in fscache_create_volume_work(), but fscache_create_volume() was missed. By combining the clear and wake operations into clear_and_wake_up_bit() to fix this issue. Fixes: bfa22da3ed65 ("fscache: Provide and use cache methods to lookup/create/free a volume") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Link: https://lore.kernel.org/r/20241107110649.3980193-6-wozizhi@huawei.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-11	cachefiles: Fix NULL pointer dereference in object->file	Zizhi Wo
	At present, the object->file has the NULL pointer dereference problem in ondemand-mode. The root cause is that the allocated fd and object->file lifetime are inconsistent, and the user-space invocation to anon_fd uses object->file. Following is the process that triggers the issue: [write fd] [umount] cachefiles_ondemand_fd_write_iter fscache_cookie_state_machine cachefiles_withdraw_cookie if (!file) return -ENOBUFS cachefiles_clean_up_object cachefiles_unmark_inode_in_use fput(object->file) object->file = NULL // file NULL pointer dereference! __cachefiles_write(..., file, ...) Fix this issue by add an additional reference count to the object->file before write/llseek, and decrement after it finished. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Link: https://lore.kernel.org/r/20241107110649.3980193-5-wozizhi@huawei.com Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-11	cachefiles: Clean up in cachefiles_commit_tmpfile()	Zizhi Wo
	Currently, cachefiles_commit_tmpfile() will only be called if object->flags is set to CACHEFILES_OBJECT_USING_TMPFILE. Only cachefiles_create_file() and cachefiles_invalidate_cookie() set this flag. Both of these functions replace object->file with the new tmpfile, and both are called by fscache_cookie_state_machine(), so there are no concurrency issues. So the equation "d_backing_inode(dentry) == file_inode(object->file)" in cachefiles_commit_tmpfile() will never hold true according to the above conditions. This patch removes this part of the redundant code and does not involve any other logical changes. Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Link: https://lore.kernel.org/r/20241107110649.3980193-4-wozizhi@huawei.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-11	cachefiles: Fix missing pos updates in cachefiles_ondemand_fd_write_iter()	Zizhi Wo
	In the erofs on-demand loading scenario, read and write operations are usually delivered through "off" and "len" contained in read req in user mode. Naturally, pwrite is used to specify a specific offset to complete write operations. However, if the write(not pwrite) syscall is called multiple times in the read-ahead scenario, we need to manually update ki_pos after each write operation to update file->f_pos. This step is currently missing from the cachefiles_ondemand_fd_write_iter function, added to address this issue. Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Link: https://lore.kernel.org/r/20241107110649.3980193-3-wozizhi@huawei.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-11	cachefiles: Fix incorrect length return value in ↵	Zizhi Wo
	cachefiles_ondemand_fd_write_iter() cachefiles_ondemand_fd_write_iter() function first aligns "pos" and "len" to block boundaries. When calling __cachefiles_write(), the aligned "pos" is passed in, but "len" is the original unaligned value(iter->count). Additionally, the returned length of the write operation is the modified "len" aligned by block size, which is unreasonable. The alignment of "pos" and "len" is intended only to check whether the cache has enough space. But the modified len should not be used as the return value of cachefiles_ondemand_fd_write_iter() because the length we passed to __cachefiles_write() is the previous "len". Doing so would result in a mismatch in the data written on-demand. For example, if the length of the user state passed in is not aligned to the block size (the preread scene/DIO writes only need 512 alignment/Fault injection), the length of the write will differ from the actual length of the return. To solve this issue, since the __cachefiles_prepare_write() modifies the size of "len", we pass "aligned_len" to __cachefiles_prepare_write() to calculate the free blocks and use the original "len" as the return value of cachefiles_ondemand_fd_write_iter(). Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Zizhi Wo <wozizhi@huawei.com> Link: https://lore.kernel.org/r/20241107110649.3980193-2-wozizhi@huawei.com Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-11	iomap: drop an obsolete comment in iomap_dio_bio_iter	Christoph Hellwig
	No more zone append special casing in iomap for quite a while. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241111121340.1390540-1-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-11	btrfs: send: check for read-only send root under critical section	Filipe Manana
	We're checking if the send root is read-only without being under the protection of the root's root_item_lock spinlock, which is what protects the root's flags when clearing the read-only flag, done at btrfs_ioctl_subvol_setflags(). Furthermore, it should be done in the same critical section that increments the root's send_in_progress counter, as btrfs_ioctl_subvol_setflags() clears the read-only flag in the same critical section that checks the counter's value. So fix this by moving the read-only check under the critical section delimited by the root's root_item_lock which also increments the root's send_in_progress counter. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: send: check for dead send root under critical section	Filipe Manana
	We're checking if the send root is dead without the protection of the root's root_item_lock spinlock, which is what protects the root's flags. The inverse, setting the dead flag on a root, is done under the protection of that lock, at btrfs_delete_subvolume(). Also checking and updating the root's send_in_progress counter is supposed to be done in the same critical section as checking for or setting the root dead flag, so that these operations are done atomically as a single step (which is correctly done by btrfs_delete_subvolume()). So fix this by checking if the send root is dead in the same critical section that updates the send_in_progress counter, which is protected by the root's root_item_lock spinlock. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: remove check for NULL fs_info at btrfs_folio_end_lock_bitmap()	Filipe Manana
	Smatch complains about possibly dereferencing a NULL fs_info at btrfs_folio_end_lock_bitmap(): fs/btrfs/subpage.c:332 btrfs_folio_end_lock_bitmap() warn: variable dereferenced before check 'fs_info' (see line 326) because we access fs_info to set the 'start_bit' variable before doing the check for a NULL fs_info. However fs_info is never NULL, since in the only caller of btrfs_folio_end_lock_bitmap() is extent_writepage(), where we have an inode which always as a non-NULL fs_info. So remove the check for a NULL fs_info at btrfs_folio_end_lock_bitmap(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl()	Filipe Manana
	Smatch complains about calling PTR_ERR() against a NULL pointer: fs/btrfs/super.c:2272 btrfs_control_ioctl() warn: passing zero to 'PTR_ERR' Fix this by calling PTR_ERR() against the device pointer only if it contains an error. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: fix a typo in btrfs_use_zone_append	Christoph Hellwig
	REQ_OP_ZONE_APPNED -> REQ_OP_ZONE_APPEND. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: avoid superfluous calls to free_extent_map() in btrfs_encoded_read()	Mark Harmstone
	Change the control flow of btrfs_encoded_read() so that it doesn't call free_extent_map() when we know that this has already been done. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Mark Harmstone <maharmstone@fb.com> Suggested-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: simplify logic to decrement snapshot counter at btrfs_mksnapshot()	Filipe Manana
	There's no point in having a 'snapshot_force_cow' variable to track if we need to decrement the root->snapshot_force_cow counter, as we never jump to the 'out' label after incrementing the counter. Simplify this by removing the variable and always decrementing the counter before the 'out' label, right after the call to btrfs_mksubvol(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: remove hole from struct btrfs_delayed_node	Filipe Manana
	On x86_64 and a release kernel, there's a 4 bytes hole in the structure after the ref count field: struct btrfs_delayed_node { u64 inode_id; /* 0 8 / u64 bytes_reserved; / 8 8 / struct btrfs_root root; /* 16 8 / struct list_head n_list; / 24 16 / struct list_head p_list; / 40 16 / struct rb_root_cached ins_root; / 56 16 / / --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- / struct rb_root_cached del_root; / 72 16 / struct mutex mutex; / 88 32 / struct btrfs_inode_item inode_item; / 120 160 / / --- cacheline 4 boundary (256 bytes) was 24 bytes ago --- / refcount_t refs; / 280 4 / / XXX 4 bytes hole, try to pack / u64 index_cnt; / 288 8 / long unsigned int flags; / 296 8 / int count; / 304 4 / u32 curr_index_batch_size; / 308 4 / u32 index_item_leaves; / 312 4 / / size: 320, cachelines: 5, members: 15 / / sum members: 312, holes: 1, sum holes: 4 / / padding: 4 / }; Move the 'count' field, which is 4 bytes long, to just below the ref count field, so we eliminate the hole and reduce the structure size from 320 bytes down to 312 bytes: struct btrfs_delayed_node { u64 inode_id; / 0 8 / u64 bytes_reserved; / 8 8 / struct btrfs_root root; /* 16 8 / struct list_head n_list; / 24 16 / struct list_head p_list; / 40 16 / struct rb_root_cached ins_root; / 56 16 / / --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- / struct rb_root_cached del_root; / 72 16 / struct mutex mutex; / 88 32 / struct btrfs_inode_item inode_item; / 120 160 / / --- cacheline 4 boundary (256 bytes) was 24 bytes ago --- / refcount_t refs; / 280 4 / int count; / 284 4 / u64 index_cnt; / 288 8 / long unsigned int flags; / 296 8 / u32 curr_index_batch_size; / 304 4 / u32 index_item_leaves; / 308 4 / / size: 312, cachelines: 5, members: 15 / / last cacheline: 56 bytes */ }; This now allows to have 13 delayed nodes per 4K page instead of 12. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: update stale comment for struct btrfs_delayed_ref_node::add_list	Filipe Manana
	The comment refers to a list in the respective delayed ref head that no longer exists (ref_list), it was replaced with a rbtree (ref_tree) in commit 0e0adbcfdc90 ("btrfs: track refs in a rb_tree instead of a list"). So update the stale comment to refer to the rbtree instead of the old list. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: add new ioctl to wait for cleaned subvolumes	David Sterba
	Add a new unprivileged ioctl that will let the command 'btrfs subvolume sync' work without the (privileged) SEARCH_TREE ioctl. There are several modes of operation, where the most common ones are to wait on a specific subvolume or all currently queued for cleaning. This is utilized e.g. in backup applications that delete subvolumes and wait until they're cleaned to check for remaining space. The other modes are for flexibility, e.g. for monitoring or checkpoints in the queue of deleted subvolumes, again without the need to use SEARCH_TREE. Notes: - waiting is interruptible, the timeout is set to 1 second and is not configurable - repeated calls to the ioctl see a different state, so this is inherently racy when using e.g. the count or peek next/last Use cases: - a subvolume A was deleted, wait for cleaning (WAIT_FOR_ONE) - a bunch of subvolumes were deleted, wait for all (WAIT_FOR_QUEUED or PEEK_LAST + WAIT_FOR_ONE) - count how many are queued (not blocking), for monitoring purposes - report progress (PEEK_NEXT), may miss some if cleaning is quick - own waiting in user space (PEEK_LAST until it's 0) Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: simplify range tracking in cow_file_range()	Haisu Wang
	Simplify tracking of the range processed by using cur_alloc_size only to store the reserved part that may fail to the allocated extent. Remove the ram_size as well since it is always equal to cur_alloc_size in the context. Advance the start in normal path until extent allocation succeeds and keep the start unchanged in the error handling path. Passed the fstest generic/475 test for a hundred times with quota enabled. And a modified generic/475 test by removing the sleep time for a hundred times. About one tenth of the tests do enter the error handling path due to fail to reserve extent. Suggested-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Haisu Wang <haisuwang@tencent.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: remove conditional path allocation in btrfs_read_locked_inode()	Leo Martins
	Remove conditional path allocation from btrfs_read_locked_inode(). Add an ASSERT(path) to indicate it should never be called with a NULL path. Call btrfs_read_locked_inode() directly from btrfs_iget(). This causes code duplication between btrfs_iget() and btrfs_iget_path(), but I think this is justifiable as it removes the need for conditionally allocating the path inside of btrfs_read_locked_inode(). This makes the code easier to reason about and makes it clear who has the responsibility of allocating and freeing the path. Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: push cleanup into btrfs_read_locked_inode()	Leo Martins
	Move btrfs_add_inode_to_root() so it can be called from btrfs_read_locked_inode(), no changes were made to the function. Move cleanup code from btrfs_iget_path() to btrfs_read_locked_inode. This improves readability and improves a leaky abstraction. Previously btrfs_iget_path() had to handle a positive error case as a result of a call to btrfs_search_slot(), but it makes more sense to handle this closer to the source of the call. Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	io_uring/cmd: let cmds to know about dying task	Pavel Begunkov
	When the taks that submitted a request is dying, a task work for that request might get run by a kernel thread or even worse by a half dismantled task. We can't just cancel the task work without running the callback as the cmd might need to do some clean up, so pass a flag instead. If set, it's not safe to access any task resources and the callback is expected to cancel the cmd ASAP. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: add struct io_btrfs_cmd as type for io_uring_cmd_to_pdu()	Mark Harmstone
	Add struct io_btrfs_cmd as a wrapper type for io_uring_cmd_to_pdu(), rather than using a raw pointer. Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl)	Mark Harmstone
	Add an io_uring command for encoded reads, using the same interface as the existing BTRFS_IOC_ENCODED_READ ioctl. btrfs_uring_encoded_read() is an io_uring version of btrfs_ioctl_encoded_read(), which validates the user input and calls btrfs_encoded_read() to read the appropriate metadata. If we determine that we need to read an extent from disk, we call btrfs_encoded_read_regular_fill_pages() through btrfs_uring_read_extent() to prepare the bio. The existing btrfs_encoded_read_regular_fill_pages() is changed so that if it is passed a valid uring_ctx, rather than waking up any waiting threads it calls btrfs_uring_read_extent_endio(). This in turn copies the read data back to userspace, and calls io_uring_cmd_done() to complete the io_uring command. Because we're potentially doing a non-blocking read, btrfs_uring_read_extent() doesn't clean up after itself if it returns -EIOCBQUEUED. Instead, it allocates a priv struct, populates the fields there that we will need to unlock the inode and free our allocations, and defers this to the btrfs_uring_read_finished() that gets called when the bio completes. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages()	Mark Harmstone
	Change btrfs_encoded_read_regular_fill_pages() so that the priv struct is allocated rather than stored on the stack, in preparation for adding an asynchronous mode to the function. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is set	Mark Harmstone
	Change btrfs_encoded_read() so that it returns -EAGAIN rather than sleeps if IOCB_NOWAIT is set in iocb->ki_flags. The conditions that require sleeping are: inode lock, writeback, extent lock, ordered range. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-11	btrfs: change btrfs_encoded_read() so that reading of extent is done by caller	Mark Harmstone
	Change the behaviour of btrfs_encoded_read() so that if it needs to read an extent from disk, it leaves the extent and inode locked and returns -EIOCBQUEUED. The caller is then responsible for doing the I/O via btrfs_encoded_read_regular() and unlocking the extent and inode. Signed-off-by: Mark Harmstone <maharmstone@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>