linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2025-07-25	netfilter: nf_tables: Remove unused nft_reduce_is_readonly()	Yue Haibing
	Since commit 9e539c5b6d9c ("netfilter: nf_tables: disable expression reduction infra") this is unused. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: x_tables: Remove unused functions xt_{in\|out}name()	Yue Haibing
	Since commit 2173c519d5e9 ("audit: normalize NETFILTER_PKT") these are unused, so can be removed. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: load nf_log_syslog on enabling nf_conntrack_log_invalid	Lance Yang
	When no logger is registered, nf_conntrack_log_invalid fails to log invalid packets, leaving users unaware of actual invalid traffic. Improve this by loading nf_log_syslog, similar to how 'iptables -I FORWARD 1 -m conntrack --ctstate INVALID -j LOG' triggers it. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Zi Li <zi.li@linux.dev> Signed-off-by: Lance Yang <lance.yang@linux.dev> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	netfilter: conntrack: table full detailed log	lvxiafei
	Add the netns field in the "nf_conntrack: table full, dropping packet" log to help locate the specific netns when the table is full. Signed-off-by: lvxiafei <lvxiafei@sensetime.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-25	Merge patch series "can: kvaser_usb: Simplify identification of physical CAN ↵	Marc Kleine-Budde
	interfaces" Jimmy Assarsson <extja@kvaser.com> says: This patch series simplifies the process of identifying which network interface (can0..canX) corresponds to which physical CAN channel on Kvaser USB based CAN interfaces. Note that this patch series is based on [1] "can: kvaser_pciefd: Simplify identification of physical CAN interfaces" Changes in v3: - Fix GCC compiler array warning (-Warray-bounds) - Fix transient Sparse warning - Add tag Reviewed-by Vincent Mailhol Changes in v2: - New patch with devlink documentation - New patch assigning netdev.dev_port - Formatting and refactoring [1] https://lore.kernel.org/linux-can/20250725123230.8-1-extja@kvaser.com Link: https://patch.msgid.link/20250725123452.41-1-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	Documentation: devlink: add devlink documentation for the kvaser_usb driver	Jimmy Assarsson
	List the version information reported by the kvaser_usb driver through devlink. Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-12-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Add devlink port support	Jimmy Assarsson
	Register each CAN channel of the device as an devlink physical port. This makes it easier to get device information for a given network interface (i.e. can2). Example output: $ devlink dev usb/1-1.3:1.0 $ devlink port usb/1-1.3:1.0/0: type eth netdev can0 flavour physical port 0 splittable false usb/1-1.3:1.0/1: type eth netdev can1 flavour physical port 1 splittable false $ devlink port show can1 usb/1-1.3:1.0/1: type eth netdev can1 flavour physical port 0 splittable false $ devlink dev info usb/1-1.3:1.0: driver kvaser_usb serial_number 1020 versions: fixed: board.rev 1 board.id 7330130009653 running: fw 3.22.527 $ ethtool -i can1 driver: kvaser_usb version: 6.12.10-arch1-1 firmware-version: 3.22.527 expansion-rom-version: bus-info: 1-1.3:1.0 supports-statistics: no supports-test: no supports-eeprom-access: no supports-register-dump: no supports-priv-flags: no Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-11-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Expose device information via devlink info_get()	Jimmy Assarsson
	Expose device information via devlink info_get(): * Serial number * Firmware version * Hardware revision * EAN (product number) Example output: $ devlink dev usb/1-1.2:1.0 $ devlink dev info usb/1-1.2:1.0: driver kvaser_usb serial_number 1020 versions: fixed: board.rev 1 board.id 7330130009653 running: fw 3.22.527 Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-10-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Add devlink support	Jimmy Assarsson
	Add devlink support at device level. Example output: $ devlink dev usb/1-1.3:1.0 $ devlink dev info usb/1-1.3:1.0: driver kvaser_usb Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-9-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Store additional device information	Jimmy Assarsson
	Store additional device information; EAN (product number), serial_number and hardware revision. Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-8-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Store the different firmware version components in a struct	Jimmy Assarsson
	Store firmware version in kvaser_usb_fw_version struct, specifying the different components of the version number. And drop debug prinout of firmware version, since later patches will expose it via the devlink interface. Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-7-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Move comment regarding max_tx_urbs	Jimmy Assarsson
	Move comment regarding max_tx_urbs, to where the struct member is declared. Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-6-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Add intermediate variables	Jimmy Assarsson
	Add intermediate variables, for readability and to simplify future patches. Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-5-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Assign netdev.dev_port based on device channel index	Jimmy Assarsson
	Assign netdev.dev_port based on the device channel index, to indicate the port number of the network device. While this driver already uses netdev.dev_id for that purpose, dev_port is more appropriate. However, retain dev_id to avoid potential regressions. Fixes: 3e66d0138c05 ("can: populate netdev::dev_id for udev discrimination") Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-4-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Add support for ethtool set_phys_id()	Jimmy Assarsson
	Add support for ethtool set_phys_id(), to physically locate devices by flashing a LED on the device. Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-3-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_usb: Add support to control CAN LEDs on device	Jimmy Assarsson
	Add support to turn on/off CAN LEDs on device. Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123452.41-2-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	Merge patch series "can: kvaser_pciefd: Simplify identification of physical ↵	Marc Kleine-Budde
	CAN interfaces" Jimmy Assarsson <extja@kvaser.com> says: This patch series simplifies the process of identifying which network interface (can0..canX) corresponds to which physical CAN channel on Kvaser PCIe based CAN interfaces. Changes in v4: - Fix transient Sparse warning - Add tag Reviewed-by Vincent Mailhol Changes in v3: - Fixed typo; kvaser_pcied -> kvaser_pciefd in documentation patch Changes in v2: - Replace use of netdev.dev_id with netdev.dev_port - Formatting and refactoring - New patch with devlink documentation Link: https://patch.msgid.link/20250725123230.8-1-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	Documentation: devlink: add devlink documentation for the kvaser_pciefd driver	Jimmy Assarsson
	List the version information reported by the kvaser_pciefd driver through devlink. Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-11-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_pciefd: Add devlink port support	Jimmy Assarsson
	Register each CAN channel of the device as an devlink physical port. This makes it easier to get device information for a given network interface (i.e. can2). Example output: $ devlink dev pci/0000:07:00.0 pci/0000:08:00.0 pci/0000:09:00.0 $ devlink port pci/0000:07:00.0/0: type eth netdev can0 flavour physical port 0 splittable false pci/0000:07:00.0/1: type eth netdev can1 flavour physical port 1 splittable false pci/0000:07:00.0/2: type eth netdev can2 flavour physical port 2 splittable false pci/0000:07:00.0/3: type eth netdev can3 flavour physical port 3 splittable false pci/0000:08:00.0/0: type eth netdev can4 flavour physical port 0 splittable false pci/0000:08:00.0/1: type eth netdev can5 flavour physical port 1 splittable false pci/0000:09:00.0/0: type eth netdev can6 flavour physical port 0 splittable false pci/0000:09:00.0/1: type eth netdev can7 flavour physical port 1 splittable false pci/0000:09:00.0/2: type eth netdev can8 flavour physical port 2 splittable false pci/0000:09:00.0/3: type eth netdev can9 flavour physical port 3 splittable false $ devlink port show can2 pci/0000:07:00.0/2: type eth netdev can2 flavour physical port 2 splittable false $ devlink dev info pci/0000:07:00.0: driver kvaser_pciefd versions: running: fw 1.3.75 pci/0000:08:00.0: driver kvaser_pciefd versions: running: fw 2.4.29 pci/0000:09:00.0: driver kvaser_pciefd versions: running: fw 1.3.72 $ sudo ethtool -i can2 driver: kvaser_pciefd version: 6.8.0-40-generic firmware-version: 1.3.75 expansion-rom-version: bus-info: 0000:07:00.0 supports-statistics: no supports-test: no supports-eeprom-access: no supports-register-dump: no supports-priv-flags: no Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-10-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_pciefd: Expose device firmware version via devlink info_get()	Jimmy Assarsson
	Expose device firmware version via devlink info_get(). Example output: $ devlink dev pci/0000:07:00.0 pci/0000:08:00.0 pci/0000:09:00.0 $ devlink dev info pci/0000:07:00.0: driver kvaser_pciefd versions: running: fw 1.3.75 pci/0000:08:00.0: driver kvaser_pciefd versions: running: fw 2.4.29 pci/0000:09:00.0: driver kvaser_pciefd versions: running: fw 1.3.72 Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-9-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_pciefd: Add devlink support	Jimmy Assarsson
	Add devlink support at device level. Example output: $ devlink dev pci/0000:07:00.0 pci/0000:08:00.0 pci/0000:09:00.0 $ devlink dev info pci/0000:07:00.0: driver kvaser_pciefd pci/0000:08:00.0: driver kvaser_pciefd pci/0000:09:00.0: driver kvaser_pciefd Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-8-extja@kvaser.com [mkl: kvaser_pciefd_remove(): fix use-after-free] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_pciefd: Split driver into C-file and header-file.	Jimmy Assarsson
	Split driver into C-file and header-file, to simplify future patches. Move common definitions and declarations to a header file. Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-7-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_pciefd: Store device channel index	Jimmy Assarsson
	Store device channel index in netdev.dev_port. Fixes: 26ad340e582d ("can: kvaser_pciefd: Add driver for Kvaser PCIEcan devices") Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-6-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_pciefd: Store the different firmware version components in a struct	Jimmy Assarsson
	Store firmware version in kvaser_pciefd_fw_version struct, specifying the different components of the version number. And drop debug prinout of firmware version, since later patches will expose it via the devlink interface. Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-5-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_pciefd: Add intermediate variable for device struct in probe()	Jimmy Assarsson
	Add intermediate variable, for readability and to simplify future patches. Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-4-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_pciefd: Add support for ethtool set_phys_id()	Jimmy Assarsson
	Add support for ethtool set_phys_id(), to physically locate devices by flashing a LED on the device. Reviewed-by: Axel Forsman <axfo@kvaser.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-3-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	can: kvaser_pciefd: Add support to control CAN LEDs on device	Jimmy Assarsson
	Add support to turn on/off CAN LEDs on device. Turn off all CAN LEDs in probe, since they are default on after a reset or power on. Reviewed-by: Axel Forsman <axfo@kvaser.com> Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr> Signed-off-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250725123230.8-2-extja@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-07-25	Merge tag 'block-6.16-20250725' of git://git.kernel.dk/linux	Linus Torvalds
	Pull block fix from Jens Axboe: "Just a single fix for regression in this release, where a module reference could be leaked" * tag 'block-6.16-20250725' of git://git.kernel.dk/linux: block: fix module reference leak in mq-deadline I/O scheduler
2025-07-25	Merge tag 'vfs-6.16-rc8.fixes' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "Two last-minute fixes for this cycle: - Set afs vllist to NULL if addr parsing fails - Add a missing check for reaching the end of the string in afs" * tag 'vfs-6.16-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Set vllist to NULL if addr parsing fails afs: Fix check for NULL terminator
2025-07-25	Merge tag 'bcachefs-2025-07-24' of git://evilpiepirate.org/bcachefs	Linus Torvalds
	Pull bcachefs fixes from Kent Overstreet: "User reported fixes: - Fix btree node scan on encrypted filesystems by not using btree node header fields encrypted - Fix a race in btree write buffer flush; this caused EROs primarily during fsck for some people" * tag 'bcachefs-2025-07-24' of git://evilpiepirate.org/bcachefs: bcachefs: Add missing snapshots_seen_add_inorder() bcachefs: Fix write buffer flushing from open journal entry bcachefs: btree_node_scan: don't re-read before initializing found_btree_node
2025-07-25	ARM: 9450/1: Fix allowing linker DCE with binutils < 2.36	Nathan Chancellor
	Commit e7607f7d6d81 ("ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE") accidentally broke the binutils version restriction that was added in commit 0d437918fb64 ("ARM: 9414/1: Fix build issue with LD_DEAD_CODE_DATA_ELIMINATION"), reintroducing the segmentation fault addressed by that workaround. Restore the binutils version dependency by using CONFIG_LD_CAN_USE_KEEP_IN_OVERLAY as an additional condition to ensure that CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION is only enabled with binutils >= 2.36 and ld.lld >= 21.0.0. Closes: https://lore.kernel.org/6739da7d-e555-407a-b5cb-e5681da71056@landley.net/ Closes: https://lore.kernel.org/CAFERDQ0zPoya5ZQfpbeuKVZEo_fKsonLf6tJbp32QnSGAtbi+Q@mail.gmail.com/ Cc: stable@vger.kernel.org Fixes: e7607f7d6d81 ("ARM: 9443/1: Require linker to support KEEP within OVERLAY for DCE") Reported-by: Rob Landley <rob@landley.net> Tested-by: Rob Landley <rob@landley.net> Reported-by: Martin Wetterwald <martin@wetterwald.eu> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-07-25	ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS	Nathan Chancellor
	After commit d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target"), which updated as-instr to use the 'assembler-with-cpp' language option, the Kbuild version of as-instr always fails internally for arch/arm with <command-line>: fatal error: asm/unified.h: No such file or directory compilation terminated. because '-include' flags are now taken into account by the compiler driver and as-instr does not have '$(LINUXINCLUDE)', so unified.h is not found. This went unnoticed at the time of the Kbuild change because the last use of as-instr in Kbuild that arch/arm could reach was removed in 5.7 by commit 541ad0150ca4 ("arm: Remove 32bit KVM host support") but a stable backport of the Kbuild change to before that point exposed this potential issue if one were to be reintroduced. Follow the general pattern of '-include' paths throughout the tree and make unified.h absolute using '$(srctree)' to ensure KBUILD_AFLAGS can be used independently. Closes: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg@mail.gmail.com/ Cc: stable@vger.kernel.org Fixes: d5c8d6e0fa61 ("kbuild: Update assembler calls to use proper flags and language target") Reported-by: KernelCI bot <bot@kernelci.org> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
2025-07-25	ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr	Theodore Ts'o
	A syzbot fuzzed image triggered a BUG_ON in ext4_update_inline_data() when an inode had the INLINE_DATA_FL flag set but was missing the system.data extended attribute. Since this can happen due to a maiciouly fuzzed file system, we shouldn't BUG, but rather, report it as a corrupted file system. Add similar replacements of BUG_ON with EXT4_ERROR_INODE() ii ext4_create_inline_data() and ext4_inline_data_truncate(). Reported-by: syzbot+544248a761451c0df72f@syzkaller.appspotmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: implement linear-like traversal across order xarrays	Baokun Li
	Although we now perform ordered traversal within an xarray, this is currently limited to a single xarray. However, we have multiple such xarrays, which prevents us from guaranteeing a linear-like traversal where all groups on the right are visited before all groups on the left. For example, suppose we have 128 block groups, with a target group of 64, a target length corresponding to an order of 1, and available free groups of 16 (order 1) and group 65 (order 8): For linear traversal, when no suitable free block is found in group 64, it will search in the next block group until group 127, then start searching from 0 up to block group 63. It ensures continuous forward traversal, which is consistent with the unidirectional rotation behavior of HDD platters. Additionally, the block group lock contention during freeing block is unavoidable. The goal increasing from 0 to 64 indicates that previously scanned groups (which had no suitable free space and are likely to free blocks later) and skipped groups (which are currently in use) have newly freed some used blocks. If we allocate blocks in these groups, the probability of competing with other processes increases. For non-linear traversal, we first traverse all groups in order_1. If only group 16 has free space in this list, we first traverse [63, 128), then traverse [0, 64) to find the available group 16, and then allocate blocks in group 16. Therefore, it cannot guarantee continuous traversal in one direction, thus increasing the probability of contention. So refactor ext4_mb_scan_groups_xarray() to ext4_mb_scan_groups_xa_range() to only traverse a fixed range of groups, and move the logic for handling wrap around to the caller. The caller first iterates through all xarrays in the range [start, ngroups) and then through the range [0, start). This approach simulates a linear scan, which reduces contention between freeing blocks and allocating blocks. Assume we have the following groups, where "\|" denotes the xarray traversal start position: order_1_groups: AB \| CD order_2_groups: EF \| GH Traversal order: Before: C > D > A > B > G > H > E > F After: C > D > G > H > A > B > E > F Performance test data follows: \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 19555 \| 20049 (+2.5%) \| 315636 \| 316724 (-0.3%) \| \|mb_optimize_scan=1 \| 15496 \| 19342 (+24.8%) \| 323569 \| 328324 (+1.4%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 53192 \| 52125 (-2.0%) \| 212678 \| 215136 (+1.1%) \| \|mb_optimize_scan=1 \| 37636 \| 50331 (+33.7%) \| 214189 \| 209431 (-2.2%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-18-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: refactor choose group to scan group	Baokun Li
	This commit converts the `choose group` logic to `scan group` using previously prepared helper functions. This allows us to leverage xarrays for ordered non-linear traversal, thereby mitigating the "bouncing" issue inherent in the `choose group` mechanism. This also decouples linear and non-linear traversals, leading to cleaner and more readable code. Key changes: * ext4_mb_choose_next_group() is refactored to ext4_mb_scan_groups(). * Replaced ext4_mb_good_group() with ext4_mb_scan_group() in non-linear traversals, and related functions now return error codes instead of group info. * Added ext4_mb_scan_groups_linear() for performing linear scans starting from a specific group for a set number of times. * Linear scans now execute up to sbi->s_mb_max_linear_groups times, so ac_groups_linear_remaining is removed as it's no longer used. * ac->ac_criteria is now used directly instead of passing cr around. Also, ac->ac_criteria is incremented directly after groups scan fails for the corresponding criteria. * Since we're now directly scanning groups instead of finding a good group then scanning, the following variables and flags are no longer needed, s_bal_cX_groups_considered is sufficient. s_bal_p2_aligned_bad_suggestions s_bal_goal_fast_bad_suggestions s_bal_best_avail_bad_suggestions EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-17-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: convert free groups order lists to xarrays	Baokun Li
	While traversing the list, holding a spin_lock prevents load_buddy, making direct use of ext4_try_lock_group impossible. This can lead to a bouncing scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group() fails, forcing the list traversal to repeatedly restart from grp_A. In contrast, linear traversal directly uses ext4_try_lock_group(), avoiding this bouncing. Therefore, we need a lockless, ordered traversal to achieve linear-like efficiency. Therefore, this commit converts both average fragment size lists and largest free order lists into ordered xarrays. In an xarray, the index represents the block group number and the value holds the block group information; a non-empty value indicates the block group's presence. While insertion and deletion complexity remain O(1), lookup complexity changes from O(1) to O(nlogn), which may slightly reduce single-threaded performance. Additionally, xarray insertions might fail, potentially due to memory allocation issues. However, since we have linear traversal as a fallback, this isn't a major problem. Therefore, we've only added a warning message for insertion failures here. A helper function ext4_mb_find_good_group_xarray() is added to find good groups in the specified xarray starting at the specified position start, and when it reaches ngroups-1, it wraps around to 0 and then to start-1. This ensures an ordered traversal within the xarray. Performance test results are as follows: Single-process operations on an empty disk show negligible impact, while multi-process workloads demonstrate a noticeable performance gain. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 20097 \| 19555 (-2.6%) \| 316141 \| 315636 (-0.2%) \| \|mb_optimize_scan=1 \| 13318 \| 15496 (+16.3%) \| 325273 \| 323569 (-0.5%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 53603 \| 53192 (-0.7%) \| 214243 \| 212678 (-0.7%) \| \|mb_optimize_scan=1 \| 20887 \| 37636 (+80.1%) \| 213632 \| 214189 (+0.2%) \| [ Applied spelling fixes per discussion on the ext4-list see thread referened in the Link tag. --tytso] Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-16-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: factor out ext4_mb_scan_group()	Baokun Li
	Extract ext4_mb_scan_group() to make the code clearer and to prepare for the later conversion of 'choose group' to 'scan groups'. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-15-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: factor out ext4_mb_might_prefetch()	Baokun Li
	Extract ext4_mb_might_prefetch() to make the code clearer and to prepare for the later conversion of 'choose group' to 'scan groups'. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-14-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: factor out __ext4_mb_scan_group()	Baokun Li
	Extract __ext4_mb_scan_group() to make the code clearer and to prepare for the later conversion of 'choose group' to 'scan groups'. No functional changes. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-13-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: fix largest free orders lists corruption on mb_optimize_scan switch	Baokun Li
	The grp->bb_largest_free_order is updated regardless of whether mb_optimize_scan is enabled. This can lead to inconsistencies between grp->bb_largest_free_order and the actual s_mb_largest_free_orders list index when mb_optimize_scan is repeatedly enabled and disabled via remount. For example, if mb_optimize_scan is initially enabled, largest free order is 3, and the group is in s_mb_largest_free_orders[3]. Then, mb_optimize_scan is disabled via remount, block allocations occur, updating largest free order to 2. Finally, mb_optimize_scan is re-enabled via remount, more block allocations update largest free order to 1. At this point, the group would be removed from s_mb_largest_free_orders[3] under the protection of s_mb_largest_free_orders_locks[2]. This lock mismatch can lead to list corruption. To fix this, whenever grp->bb_largest_free_order changes, we now always attempt to remove the group from its old order list. However, we only insert the group into the new order list if `mb_optimize_scan` is enabled. This approach helps prevent lock inconsistencies and ensures the data in the order lists remains reliable. Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") CC: stable@vger.kernel.org Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-12-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: fix zombie groups in average fragment size lists	Baokun Li
	Groups with no free blocks shouldn't be in any average fragment size list. However, when all blocks in a group are allocated(i.e., bb_fragments or bb_free is 0), we currently skip updating the average fragment size, which means the group isn't removed from its previous s_mb_avg_fragment_size[old] list. This created "zombie" groups that were always skipped during traversal as they couldn't satisfy any block allocation requests, negatively impacting traversal efficiency. Therefore, when a group becomes completely full, bb_avg_fragment_size_order is now set to -1. If the old order was not -1, a removal operation is performed; if the new order is not -1, an insertion is performed. Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning") CC: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-11-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: merge freed extent with existing extents before insertion	Baokun Li
	Attempt to merge ext4_free_data with already inserted free extents prior to adding new ones. This strategy drastically cuts down the number of times locks are held. For example, if prev, new, and next extents are all mergeable, the existing code (before this patch) requires acquiring the s_md_lock three times: prev merge into new and free prev // hold lock next merge into new and free next // hold lock insert new // hold lock After the patch, it only needs to be acquired once: new merge into next and free new // no lock next merge into prev and free next // hold lock Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 20043 \| 20097 (+0.2%) \| 314331 \| 316141 (+0.5%) \| \|mb_optimize_scan=1 \| 7290 \| 13318 (+87.4%) \| 324226 \| 325273 (+0.3%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 54999 \| 53603 (-2.5%) \| 214380 \| 214243 (-0.06%)\| \|mb_optimize_scan=1 \| 13497 \| 20887 (+54.6%) \| 216276 \| 213632 (-1.2%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-10-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: convert sbi->s_mb_free_pending to atomic_t	Baokun Li
	Previously, s_md_lock was used to protect s_mb_free_pending during modifications, while smp_mb() ensured fresh reads, so s_md_lock just guarantees the atomicity of s_mb_free_pending. Thus we optimized it by converting s_mb_free_pending into an atomic variable, thereby eliminating s_md_lock and minimizing lock contention. This also prepares for future lockless merging of free extents. Following this modification, s_md_lock is exclusively responsible for managing insertions and deletions within s_freed_data_list, along with operations involving list_splice. Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 19628 \| 20043 (+2.1%) \| 320885 \| 314331 (-2.0%) \| \|mb_optimize_scan=1 \| 7129 \| 7290 (+2.2%) \| 321275 \| 324226 (+0.9%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 53760 \| 54999 (+2.3%) \| 213145 \| 214380 (+0.5%) \| \|mb_optimize_scan=1 \| 12716 \| 13497 (+6.1%) \| 215262 \| 216276 (+0.4%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-9-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: fix typo in CR_GOAL_LEN_SLOW comment	Baokun Li
	Remove the superfluous "find_". Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-8-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: get rid of some obsolete EXT4_MB_HINT flags	Baokun Li
	Since nobody has used these EXT4_MB_HINT flags for ages, let's remove them. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-7-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: utilize multiple global goals to reduce contention	Baokun Li
	When allocating data blocks, if the first try (goal allocation) fails and stream allocation is on, it tries a global goal starting from the last group we used (s_mb_last_group). This helps cluster large files together to reduce free space fragmentation, and the data block contiguity also accelerates write-back to disk. However, when multiple processes allocate blocks, having just one global goal means they all fight over the same group. This drastically lowers the chances of extents merging and leads to much worse file fragmentation. To mitigate this multi-process contention, we now employ multiple global goals, with the number of goals being the minimum between the number of possible CPUs and one-quarter of the filesystem's total block group count. To ensure a consistent goal for each inode, we select the corresponding goal by taking the inode number modulo the total number of goals. Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 9636 \| 19628 (+103%) \| 337597 \| 320885 (-4.9%) \| \|mb_optimize_scan=1 \| 4834 \| 7129 (+47.4%) \| 341440 \| 321275 (-5.9%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 22341 \| 53760 (+140%) \| 219707 \| 213145 (-2.9%) \| \|mb_optimize_scan=1 \| 9177 \| 12716 (+38.5%) \| 215732 \| 215262 (+0.2%) \| Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-6-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: remove unnecessary s_md_lock on update s_mb_last_group	Baokun Li
	After we optimized the block group lock, we found another lock contention issue when running will-it-scale/fallocate2 with multiple processes. The fallocate's block allocation and the truncate's block release were fighting over the s_md_lock. The problem is, this lock protects totally different things in those two processes: the list of freed data blocks (s_freed_data_list) when releasing, and where to start looking for new blocks (mb_last_group) when allocating. Now we only need to track s_mb_last_group and no longer need to track s_mb_last_start, so we don't need the s_md_lock lock to ensure that the two are consistent. Since s_mb_last_group is merely a hint and doesn't require strong synchronization, READ_ONCE/WRITE_ONCE is sufficient. Besides, the s_mb_last_group data type only requires ext4_group_t (i.e., unsigned int), rendering unsigned long superfluous. Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| P1 \| \|Memory: 512GB \|------------------------\|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 4821 \| 9636 (+99.8%) \| 314065 \| 337597 (+7.4%) \| \|mb_optimize_scan=1 \| 4784 \| 4834 (+1.04%) \| 316344 \| 341440 (+7.9%) \| \|CPU: AMD 9654 * 2 \| P96 \| P1 \| \|Memory: 1536GB \|------------------------\|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| base \| patched \| \|-------------------\|-------\|----------------\|--------\|----------------\| \|mb_optimize_scan=0 \| 15371 \| 22341 (+45.3%) \| 205851 \| 219707 (+6.7%) \| \|mb_optimize_scan=1 \| 6101 \| 9177 (+50.4%) \| 207373 \| 215732 (+4.0%) \| Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: remove unnecessary s_mb_last_start	Baokun Li
	Since stream allocation does not use ac->ac_f_ex.fe_start, it is set to -1 by default, so the no longer needed sbi->s_mb_last_start is removed. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: separate stream goal hits from s_bal_goals for better tracking	Baokun Li
	In ext4_mb_regular_allocator(), after the call to ext4_mb_find_by_goal() fails to achieve the inode goal, allocation continues with the stream allocation global goal. Currently, hits for both are combined in sbi->s_bal_goals, hindering accurate optimization. This commit separates global goal hits into sbi->s_bal_stream_goals. Since stream allocation doesn't use ac->ac_g_ex.fe_start, set fe_start to -1. This prevents stream allocations from being counted in s_bal_goals. Also clear EXT4_MB_HINT_TRY_GOAL to avoid calling ext4_mb_find_by_goal again. After adding `stream_goal_hits`, `/proc/fs/ext4/sdx/mb_stats` will show: mballoc: reqs: 840347 success: 750992 groups_scanned: 1230506 cr_p2_aligned_stats: hits: 21531 groups_considered: 411664 extents_scanned: 21531 useless_loops: 0 bad_suggestions: 6 cr_goal_fast_stats: hits: 111222 groups_considered: 1806728 extents_scanned: 467908 useless_loops: 0 bad_suggestions: 13 cr_best_avail_stats: hits: 36267 groups_considered: 1817631 extents_scanned: 156143 useless_loops: 0 bad_suggestions: 204 cr_goal_slow_stats: hits: 106396 groups_considered: 5671710 extents_scanned: 22540056 useless_loops: 123747 cr_any_free_stats: hits: 138071 groups_considered: 724692 extents_scanned: 23615593 useless_loops: 585 extents_scanned: 46804261 goal_hits: 1307 stream_goal_hits: 236317 len_goal_hits: 155549 2^n_hits: 21531 breaks: 225096 lost: 35062 buddies_generated: 40/40 buddies_time_used: 48004 preallocated: 5962467 discarded: 4847560 Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-07-25	ext4: add ext4_try_lock_group() to skip busy groups	Baokun Li
	When ext4 allocates blocks, we used to just go through the block groups one by one to find a good one. But when there are tons of block groups (like hundreds of thousands or even millions) and not many have free space (meaning they're mostly full), it takes a really long time to check them all, and performance gets bad. So, we added the "mb_optimize_scan" mount option (which is on by default now). It keeps track of some group lists, so when we need a free block, we can just grab a likely group from the right list. This saves time and makes block allocation much faster. But when multiple processes or containers are doing similar things, like constantly allocating 8k blocks, they all try to use the same block group in the same list. Even just two processes doing this can cut the IOPS in half. For example, one container might do 300,000 IOPS, but if you run two at the same time, the total is only 150,000. Since we can already look at block groups in a non-linear way, the first and last groups in the same list are basically the same for finding a block right now. Therefore, add an ext4_try_lock_group() helper function to skip the current group when it is locked by another process, thereby avoiding contention with other processes. This helps ext4 make better use of having multiple block groups. Also, to make sure we don't skip all the groups that have free space when allocating blocks, we won't try to skip busy groups anymore when ac_criteria is CR_ANY_FREE. Performance test data follows: Test: Running will-it-scale/fallocate2 on CPU-bound containers. Observation: Average fallocate operations per container per second. \|CPU: Kunpeng 920 \| P80 \| \|Memory: 512GB \|-------------------------\| \|960GB SSD (0.5GB/s)\| base \| patched \| \|-------------------\|-------\|-----------------\| \|mb_optimize_scan=0 \| 2667 \| 4821 (+80.7%) \| \|mb_optimize_scan=1 \| 2643 \| 4784 (+81.0%) \| \|CPU: AMD 9654 * 2 \| P96 \| \|Memory: 1536GB \|-------------------------\| \|960GB SSD (1GB/s) \| base \| patched \| \|-------------------\|-------\|-----------------\| \|mb_optimize_scan=0 \| 3450 \| 15371 (+345%) \| \|mb_optimize_scan=1 \| 3209 \| 6101 (+90.0%) \| Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250714130327.1830534-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>