summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-20io_uring/cmd: axe duplicate io_uring_cmd_import_fixed_vec() declarationCaleb Sander Mateos
io_uring_cmd_import_fixed_vec() is declared in both include/linux/io_uring/cmd.h and io_uring/uring_cmd.h. The declarations are identical (if redundant) for CONFIG_IO_URING=y. But if CONFIG_IO_URING=N, include/linux/io_uring/cmd.h declares the function as static inline while io_uring/uring_cmd.h declares it as extern. This causes linker errors if the declaration in io_uring/uring_cmd.h is used. Remove the declaration in io_uring/uring_cmd.h to avoid linker errors and prevent the declarations getting out of sync. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: ef4902752972 ("io_uring/cmd: introduce io_uring_cmd_import_fixed_vec") Link: https://lore.kernel.org/r/20250520193337.1374509-1-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20selftests/sched_ext: Add test for scx_bpf_select_cpu_and() via test_runAndrea Righi
Update the allowed_cpus selftest to include a check to validate the behavior of scx_bpf_select_cpu_and() when invoked via a BPF test_run call. Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-20sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked contextAndrea Righi
Allow scx_bpf_select_cpu_and() to be used from an unlocked context, in addition to ops.enqueue() or ops.select_cpu(). This enables schedulers, including user-space ones, to implement a consistent idle CPU selection policy and helps reduce code duplication. Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-20sched_ext: idle: Validate locking correctness in scx_bpf_select_cpu_and()Andrea Righi
Validate locking correctness when accessing p->nr_cpus_allowed and p->cpus_ptr inside scx_bpf_select_cpu_and(): if the rq lock is held, access is safe; otherwise, require that p->pi_lock is held. This allows to catch potential unsafe calls to scx_bpf_select_cpu_and(). Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-20sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.cAndrea Righi
Relocate the scx_kf_allowed_if_unlocked(), so it can be used from other source files (e.g., ext_idle.c). No functional change. Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-20sched_ext, docs: add labelShashank Balaji
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-20selftests: seccomp: Fix "performace" to "performance"Sumanth Gavini
Fix misspelling reported by codespell Signed-off-by: Sumanth Gavini <sumanth.gavini@yahoo.com> Link: https://lore.kernel.org/r/20250517011725.1149510-1-sumanth.gavini@yahoo.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-05-21Merge tag 'nova-next-v6.16-2025-05-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/nova into drm-next Nova changes for v6.16 auxiliary: - bus abstractions - implementation for driver registration - add sample driver drm: - implement __drm_dev_alloc() - DRM core infrastructure Rust abstractions - device, driver and registration - DRM IOCTL - DRM File - GEM object - IntoGEMObject rework - generically implement AlwaysRefCounted through IntoGEMObject - refactor unsound from_gem_obj() into as_ref() - refactor into_gem_obj() into as_raw() driver-core: - merge topic/device-context-2025-04-17 from driver-core tree - implement Devres::access() - fix: doctest build under `!CONFIG_PCI` - accessor for Device::parent() - fix: conditionally expect `dead_code` for `parent()` - impl TryFrom<&Device> bus devices (PCI, platform) nova-core: - remove completed Vec extentions from task list - register auxiliary device for nova-drm - derive useful traits for Chipset - add missing GA100 chipset - take &Device<Bound> in Gpu::new() - infrastructure to generate register definitions - fix register layout of NV_PMC_BOOT_0 - move Firmware into own (Rust) module - fix: select AUXILIARY_BUS nova-drm: - initial driver skeleton (depends on drm and auxiliary bus abstractions) - fix: select AUXILIARY_BUS Rust (dependencies): - implement Opaque::zeroed() - implement Revocable::try_access_with() - implement Revocable::access() From: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/aCxAf3RqQAXLDhAj@cassiopeiae
2025-05-20Merge patch series "can: kvaser_pciefd: Fix ISR race conditions"Marc Kleine-Budde
Axel Forsman <axfo@kvaser.com> says: This patch series fixes a couple of race conditions in the kvaser_pciefd driver surfaced by enabling MSI interrupts and the new Kvaser PCIe 8xCAN. Changes since version 2: * Rebase onto linux-can/main to resolve del_timer()/timer_delete() merge conflict. * Reword 2nd commit message slightly. Changes since version 1: * Change type of srb_cmd_reg from "__le32 __iomem *" to "void __iomem *". * Maintain TX FIFO count in driver instead of querying HW. * Stop queue at end of .start_xmit() if full. Link: https://patch.msgid.link/20250520114332.8961-1-axfo@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-20can: kvaser_pciefd: Continue parsing DMA buf after dropped RXAxel Forsman
Going bus-off on a channel doing RX could result in dropped packets. As netif_running() gets cleared before the channel abort procedure, the handling of any last RDATA packets would see netif_rx() return non-zero to signal a dropped packet. kvaser_pciefd_read_buffer() dealt with this "error" by breaking out of processing the remaining DMA RX buffer. Only return an error from kvaser_pciefd_read_buffer() due to packet corruption, otherwise handle it internally. Cc: stable@vger.kernel.org Signed-off-by: Axel Forsman <axfo@kvaser.com> Tested-by: Jimmy Assarsson <extja@kvaser.com> Reviewed-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250520114332.8961-4-axfo@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-20can: kvaser_pciefd: Fix echo_skb raceAxel Forsman
The functions kvaser_pciefd_start_xmit() and kvaser_pciefd_handle_ack_packet() raced to stop/wake TX queues and get/put echo skbs, as kvaser_pciefd_can->echo_lock was only ever taken when transmitting and KCAN_TX_NR_PACKETS_CURRENT gets decremented prior to handling of ACKs. E.g., this caused the following error: can_put_echo_skb: BUG! echo_skb 5 is occupied! Instead, use the synchronization helpers in netdev_queues.h. As those piggyback on BQL barriers, start updating in-flight packets and bytes counts as well. Cc: stable@vger.kernel.org Signed-off-by: Axel Forsman <axfo@kvaser.com> Tested-by: Jimmy Assarsson <extja@kvaser.com> Reviewed-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250520114332.8961-3-axfo@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-20can: kvaser_pciefd: Force IRQ edge in case of nested IRQAxel Forsman
Avoid the driver missing IRQs by temporarily masking IRQs in the ISR to enforce an edge even if a different IRQ is signalled before handled IRQs are cleared. Fixes: 48f827d4f48f ("can: kvaser_pciefd: Move reset of DMA RX buffers to the end of the ISR") Cc: stable@vger.kernel.org Signed-off-by: Axel Forsman <axfo@kvaser.com> Tested-by: Jimmy Assarsson <extja@kvaser.com> Reviewed-by: Jimmy Assarsson <extja@kvaser.com> Link: https://patch.msgid.link/20250520114332.8961-2-axfo@kvaser.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-05-20ext4: Add a WARN_ON_ONCE for querying LAST_IN_LEAF insteadRitesh Harjani (IBM)
We added the documentation in ext4_map_blocks() for usage of EXT4_GET_BLOCKS_QUERY_LAST_IN_LEAF flag. But It's better to add a WARN_ON_ONCE in case if anyone tries using this flag with CREATE to avoid a random issue later. Since depth can change with CREATE and it needs to be re-calculated before using it in there. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/ee6e82a224c50b432df9ce1ce3333c50182d8473.1747677758.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Simplify flags in ext4_map_query_blocks()Ritesh Harjani (IBM)
Now that we have EXT4_EX_QUERY_FILTER mask, let's use that to simplify the filtering of flags for passing to ext4_ext_map_blocks() in ext4_map_query_blocks() function. This allows us to kill the query_flags local variable which is not needed anymore. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/4ae735e83e6f43341e53e2d289e59156a8360134.1747677758.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Rename and document EXT4_EX_FILTER to EXT4_EX_QUERY_FILTERRitesh Harjani (IBM)
Rename EXT4_EX_FILTER to EXT4_EX_QUERY_FILTER to better describe its purpose as a filter mask used specifically in ext4_map_query_blocks(). Add a comment explaining that this macro is used to filter flags needed when querying the on-disk extent tree. We will later use EXT4_EX_QUERY_FILTER mask to add another EXT4_GET_BLOCKS_QUERY needed to lookup in on-disk extent tree. Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/51f05d0ba286372eb8693af95bd4b10194b53141.1747677758.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Simplify last in leaf check in ext4_map_query_blocksRitesh Harjani (IBM)
This simplifies the check for last in leaf in ext4_map_query_blocks() and fixes this cocci warning. cocci warnings: (new ones prefixed by >>) >> fs/ext4/inode.c:573:49-51: WARNING !A || A && B is equivalent to !A || B Fixes: 5bb12b1837c0 ("ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202505191524.auftmOwK-lkp@intel.com/ Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/5fd5c806218c83f603c578c95997cf7f6da29d74.1747677758.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Unwritten to written conversion requires EXT4_EX_NOCACHERitesh Harjani (IBM)
This fixes the atomic write patch series after it was rebased on top of extent status cache cleanup series i.e. 'commit 402e38e6b71f57 ("ext4: prevent stale extent cache entries caused by concurrent I/O writeback")' After the above series, EXT4_GET_BLOCKS_IO_CONVERT_EXT flag which has EXT4_GET_BLOCKS_IO_SUBMIT flag set, requires that the io submit context of any kind should pass EXT4_EX_NOCACHE to avoid caching unncecessary extents in the extent status cache. This patch fixes that by adding the EXT4_EX_NOCACHE flag in ext4_convert_unwritten_extents_atomic() for unwritten to written conversion calls to ext4_map_blocks(). Fixes: b86629c2b299 ("ext4: Add multi-fsblock atomic write support with bigalloc") Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/ea0ad9378ff6d31d73f4e53f87548e3a20817689.1747677758.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACKMing Lei
Add test for covering UBLK_AUTO_BUF_REG_FALLBACK: - pass '--auto_zc_fallback' to null target, which requires both F_AUTO_BUF_REG and F_SUPPORT_ZERO_COPY for handling UBLK_AUTO_BUF_REG_FALLBACK - add ->buf_index() method for returning invalid buffer index to trigger UBLK_AUTO_BUF_REG_FALLBACK - add generic_09 for running the test - add --auto_zc_fallback test in stress_03/stress_04/stress_05 Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20selftests: ublk: support UBLK_F_AUTO_BUF_REGMing Lei
Enable UBLK_F_AUTO_BUF_REG support for ublk utility by argument `--auto_zc`, meantime support this feature in null, loop and stripe target code. Add function test generic_08 for covering basic UBLK_F_AUTO_BUF_REG feature. Also cover UBLK_F_AUTO_BUF_REG in stress_03, stress_04 and stress_05 test too. 'fio/t/io_uring -p0 /dev/ublkb0' shows that F_AUTO_BUF_REG can improve IOPS by 50% compared with F_SUPPORT_ZERO_COPY in my test VM. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20ublk: support UBLK_AUTO_BUF_REG_FALLBACKMing Lei
For UBLK_F_AUTO_BUF_REG, buffer is registered to uring_cmd context automatically with the provided buffer index. User may provide one wrong buffer index, or the specified buffer is registered by application already. Add UBLK_AUTO_BUF_REG_FALLBACK for supporting to auto buffer registering fallback by completing the uring_cmd and telling ublk server the register failure via UBLK_AUTO_BUF_REG_FALLBACK, then ublk server still can register the buffer from userspace. So we can provide reliable way for supporting auto buffer register. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20ublk: register buffer to local io_uring with provided buf index via ↵Ming Lei
UBLK_F_AUTO_BUF_REG Add UBLK_F_AUTO_BUF_REG for supporting to register buffer automatically to local io_uring context with provided buffer index. Add UAPI structure `struct ublk_auto_buf_reg` for holding user parameter to register request buffer automatically, one 'flags' field is defined, and there is still 32bit available for future extension, such as, adding one io_ring FD field for registering buffer to external io_uring. `struct ublk_auto_buf_reg` is populated from ublk uring_cmd's sqe->addr, and all existing ublk commands are data-less, so it is just fine to reuse sqe->addr for this purpose. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20ublk: prepare for supporting to register request buffer automaticallyMing Lei
UBLK_F_SUPPORT_ZERO_COPY requires ublk server to issue explicit buffer register/unregister uring_cmd for each IO, this way is not only inefficient, but also introduce dependency between buffer consumer and buffer register/ unregister uring_cmd, please see tools/testing/selftests/ublk/stripe.c in which backing file IO has to be issued one by one by IOSQE_IO_LINK. Prepare for adding feature UBLK_F_AUTO_BUF_REG for addressing the existing zero copy limitation: - register request buffer automatically to ublk uring_cmd's io_uring context before delivering io command to ublk server - unregister request buffer automatically from the ublk uring_cmd's io_uring context when completing the request - io_uring will unregister the buffer automatically when uring is exiting, so we needn't worry about accident exit For using this feature, ublk server has to create one sparse buffer table Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20ublk: convert to refcount_tMing Lei
Convert to refcount_t and prepare for supporting to register bvec buffer automatically, which needs to initialize reference counter as 2, and kref doesn't provide this interface, so convert to refcount_t. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20selftests: ublk: make IO & device removal test more stressfulMing Lei
__run_io_and_remove() is used in several stress tests for running heavy IO vs. removing device meantime. However, sequential `readwrite` is taken in the fio script, which isn't correct, we should take random IO for saturating ublk device. Also turns out '--num_jobs=4' isn't stressful enough, so change it to '--num_jobs=$(nproc)'. Finally we don't cover single queue test in `test_stress_02.sh`, so add single queue test which can trigger request tag recycling easier. With above change the issue in #1 can be reproduced reliably in stress_02.sh. Link:https://lore.kernel.org/linux-block/mruqwpf4tqenkbtgezv5oxwq7ngyq24jzeyqy4ixzvivatbbxv@4oh2wzz4e6qn/ #1 Cc: Jared Holzman <jholzman@nvidia.com> Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250519031620.245749-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20Merge tag 'nvme-6.16-2025-05-20' of git://git.infradead.org/nvme into ↵Jens Axboe
for-6.16/block Pull NVMe updates from Christoph: "nvme updates for Linux 6.16 - add per-node DMA pools and use them for PRP/SGL allocations (Caleb Sander Mateos, Keith Busch) - nvme-fcloop refcounting fixes (Daniel Wagner) - support delayed removal of the multipath node and optionally support the multipath node for private namespaces (Nilay Shroff) - support shared CQs in the PCI endpoint target code (Wilfred Mallawa) - support admin-queue only authentication (Hannes Reinecke) - use the crc32c library instead of the crypto API (Eric Biggers) - misc cleanups (Christoph Hellwig, Marcelo Moreira, Hannes Reinecke, Leon Romanovsky, Gustavo A. R. Silva)" * tag 'nvme-6.16-2025-05-20' of git://git.infradead.org/nvme: (42 commits) nvme: rename nvme_mpath_shutdown_disk to nvme_mpath_remove_disk nvme: introduce multipath_always_on module param nvme-multipath: introduce delayed removal of the multipath head node nvme-pci: derive and better document max segments limits nvme-pci: use struct_size for allocation struct nvme_dev nvme-pci: add a symolic name for the small pool size nvme-pci: use a better encoding for small prp pool allocations nvme-pci: rename the descriptor pools nvme-pci: remove struct nvme_descriptor nvme-pci: store aborted state in flags variable nvme-pci: don't try to use SGLs for metadata on the admin queue nvme-pci: make PRP list DMA pools per-NUMA-node nvme-pci: factor out a nvme_init_hctx_common() helper dmapool: add NUMA affinity support nvme-fc: do not reference lsrsp after failure nvmet-fcloop: don't wait for lport cleanup nvmet-fcloop: add missing fcloop_callback_host_done nvmet-fc: take tgtport refs for portentry nvmet-fc: free pending reqs on tgtport unregister nvmet-fcloop: drop response if targetport is gone ...
2025-05-20spi: dt-bindings: Add rk3528-spi compatibleChukun Pan
This adds a compatible string for the SPI controller on RK3528. Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20250520100102.1226725-2-amadeus@jmu.edu.cn Signed-off-by: Mark Brown <broonie@kernel.org>
2025-05-20Merge tag 'for-linus-6.15-ofs2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs fix from Mike Marshall: "Fix for orangefs page writeout counting" * tag 'for-linus-6.15-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: adjust counting code to recover from 665575cf
2025-05-20wifi: ath9k_htc: Abort software beacon handling if disabledToke Høiland-Jørgensen
A malicious USB device can send a WMI_SWBA_EVENTID event from an ath9k_htc-managed device before beaconing has been enabled. This causes a device-by-zero error in the driver, leading to either a crash or an out of bounds read. Prevent this by aborting the handling in ath9k_htc_swba() if beacons are not enabled. Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/r/88967.1743099372@localhost Fixes: 832f6a18fc2a ("ath9k_htc: Add beacon slots") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20250402112217.58533-1-toke@toke.dk Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-05-20loop: don't require ->write_iter for writable files in loop_configureChristoph Hellwig
Block devices can be opened read-write even if they can't be written to for historic reasons. Remove the check requiring file->f_op->write_iter when the block devices was opened in loop_configure. The call to loop_check_backing_file just below ensures the ->write_iter is present for backing files opened for writing, which is the only check that is actually needed. Fixes: f5c84eff634b ("loop: Add sanity check for read/write_iter") Reported-by: Christian Hesse <mail@eworm.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250520135420.1177312-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-05-20orangefs: adjust counting code to recover from 665575cfMike Marshall
A late commit to 6.14-rc7! broke orangefs. 665575cf seems like a good change, but maybe should have been introduced during the merge window. This patch adjusts the counting code associated with writing out pages so that orangefs works in a 665575cf world. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2025-05-20wifi: ath12k: remove redundant regulatory rules intersection logic in hostAishwarya R
Whenever there is a change in the country code settings from the user, driver does an intersection of the regulatory rules for this new country with the original regulatory rules which were reported during initialization time. There is also similar logic running in firmware with a difference that the intersection in firmware is only done when the country code is configuration during boot up time (BDF/OTP). Firmware logic does not kick in when no country code is configured during device bring up time as the device is always expected to have the country code configured properly in the deployment. There is a debug/test use case that requires absolute regulatory rules to be used for a user configured country code when the device is not configured with a particular country code during boot up time. To support the above test use case, remove the redundant regulatory rules intersection logic in the host driver. Depend on the intersection logic in firmware when the device comes up with pre-configured country code. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00209-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Signed-off-by: Aishwarya R <quic_aisr@quicinc.com> Signed-off-by: Rajat Soni <quic_rajson@quicinc.com> Link: https://patch.msgid.link/20250505034351.1365914-1-quic_rajson@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-05-20wifi: ath12k: Send MCS15 support to firmware during peer assocMohan Kumar G
As per IEEE 802.11be-2024 - 9.4.2.321, EHT operation element contains MCS15 Disable subfield as the sixth bit, which is set when MCS15 support is not enabled. During association, firmware will use this MCS15 flag to enable or disable the reception of PPDU with EHT-MCS15 capability. Send MCS15 support to firmware through WMI command during peer assoc. Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.3.1-00173-QCAHKSWPL_SILICONZ-1 Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3 Co-developed-by: Dhanavandhana Kannan <quic_dhanavan1@quicinc.com> Signed-off-by: Dhanavandhana Kannan <quic_dhanavan1@quicinc.com> Signed-off-by: Mohan Kumar G <quic_mkumarg@quicinc.com> Link: https://patch.msgid.link/20250505153536.3275145-1-quic_mkumarg@quicinc.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
2025-05-20ext4: only dirty folios when data journaling regular filesBrian Foster
fstest generic/388 occasionally reproduces a crash that looks as follows: BUG: kernel NULL pointer dereference, address: 0000000000000000 ... Call Trace: <TASK> ext4_block_zero_page_range+0x30c/0x380 [ext4] ext4_truncate+0x436/0x440 [ext4] ext4_process_orphan+0x5d/0x110 [ext4] ext4_orphan_cleanup+0x124/0x4f0 [ext4] ext4_fill_super+0x262d/0x3110 [ext4] get_tree_bdev_flags+0x132/0x1d0 vfs_get_tree+0x26/0xd0 vfs_cmd_create+0x59/0xe0 __do_sys_fsconfig+0x4ed/0x6b0 do_syscall_64+0x82/0x170 ... This occurs when processing a symlink inode from the orphan list. The partial block zeroing code in the truncate path calls ext4_dirty_journalled_data() -> folio_mark_dirty(). The latter calls mapping->a_ops->dirty_folio(), but symlink inodes are not assigned an a_ops vector in ext4, hence the crash. To avoid this problem, update the ext4_dirty_journalled_data() helper to only mark the folio dirty on regular files (for which a_ops is assigned). This also matches the journaling logic in the ext4_symlink() creation path, where ext4_handle_dirty_metadata() is called directly. Fixes: d84c9ebdac1e ("ext4: Mark pages with journalled data dirty") Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://patch.msgid.link/20250516173800.175577-1-bfoster@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
2025-05-20ext4: Add atomic block write documentationRitesh Harjani (IBM)
Add an initial documentation around atomic writes support in ext4. Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/d3893b9f5ad70317abae72046e81e4c180af91bf.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Enable support for ext4 multi-fsblock atomic write using bigallocRitesh Harjani (IBM)
Last couple of patches added the needed support for multi-fsblock atomic writes using bigalloc. This patch ensures that filesystem advertizes the needed atomic write unit min and max values for enabling multi-fsblock atomic write support with bigalloc. Acked-by: Darrick J. Wong <djwong@kernel.org> Co-developed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/5e45d7ed24499024b9079436ba6698dae5298e29.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Add multi-fsblock atomic write support with bigallocRitesh Harjani (IBM)
EXT4 supports bigalloc feature which allows the FS to work in size of clusters (group of blocks) rather than individual blocks. This patch adds atomic write support for bigalloc so that systems with bs = ps can also create FS using - mkfs.ext4 -F -O bigalloc -b 4096 -C 16384 <dev> With bigalloc ext4 can support multi-fsblock atomic writes. We will have to adjust ext4's atomic write unit max value to cluster size. This can then support atomic write of size anywhere between [blocksize, clustersize]. This patch adds the required changes to enable multi-fsblock atomic write support using bigalloc in the next patch. In this patch for block allocation: we first query the underlying region of the requested range by calling ext4_map_blocks() call. Here are the various cases which we then handle depending upon the underlying mapping type: 1. If the underlying region for the entire requested range is a mapped extent, then we don't call ext4_map_blocks() to allocate anything. We don't need to even start the jbd2 txn in this case. 2. For an append write case, we create a mapped extent. 3. If the underlying region is entirely a hole, then we create an unwritten extent for the requested range. 4. If the underlying region is a large unwritten extent, then we split the extent into 2 unwritten extent of required size. 5. If the underlying region has any type of mixed mapping, then we call ext4_map_blocks() in a loop to zero out the unwritten and the hole regions within the requested range. This then provide a single mapped extent type mapping for the requested range. Note: We invoke ext4_map_blocks() in a loop with the EXT4_GET_BLOCKS_ZERO flag only when the underlying extent mapping of the requested range is not entirely a hole, an unwritten extent, or a fully mapped extent. That is, if the underlying region contains a mix of hole(s), unwritten extent(s), and mapped extent(s), we use this loop to ensure that all the short mappings are zeroed out. This guarantees that the entire requested range becomes a single, uniformly mapped extent. It is ok to do so because we know this is being done on a bigalloc enabled filesystem where the block bitmap represents the entire cluster unit. Note having a single contiguous underlying region of type mapped, unwrittn or hole is not a problem. But the reason to avoid writing on top of mixed mapping region is because, atomic writes requires all or nothing should get written for the userspace pwritev2 request. So if at any point in time during the write if a crash or a sudden poweroff occurs, the region undergoing atomic write should read either complete old data or complete new data. But it should never have a mix of both old and new data. So, we first convert any mixed mapping region to a single contiguous mapped extent before any data gets written to it. This is because normally FS will only convert unwritten extents to written at the end of the write in ->end_io() call. And if we allow the writes over a mixed mapping and if a sudden power off happens in between, we will end up reading mix of new data (over mapped extents) and old data (over unwritten extents), because unwritten to written conversion never went through. So to avoid this and to avoid writes getting torned due to mixed mapping, we first allocate a single contiguous block mapping and then do the write. Acked-by: Darrick J. Wong <djwong@kernel.org> Co-developed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/c4965ac3407cbc773f0bc954d0966d9696f5038a.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Add support for EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKSRitesh Harjani (IBM)
There can be a case where there are contiguous extents on the adjacent leaf nodes of on-disk extent trees. So when someone tries to write to this contiguous range, ext4_map_blocks() call will split by returning 1 extent at a time if this is not already cached in extent_status tree cache (where if these extents when cached can get merged since they are contiguous). This is fine for a normal write however in case of atomic writes, it can't afford to break the write into two. Now this is also something that will only happen in the slow write case where we call ext4_map_blocks() for each of these extents spread across different leaf nodes. However, there is no guarantee that these extent status cache cannot be reclaimed before the last call to ext4_map_blocks() in ext4_map_blocks_atomic_write_slow(). Hence this patch adds support of EXT4_GET_BLOCKS_QUERY_LEAF_BLOCKS. This flag checks if the requested range can be fully found in extent status cache and return. If not, it looks up in on-disk extent tree via ext4_map_query_blocks(). If the found extent is the last entry in the leaf node, then it goes and queries the next lblk to see if there is an adjacent contiguous extent in the adjacent leaf node of the on-disk extent tree. Even though there can be a case where there are multiple adjacent extent entries spread across multiple leaf nodes. But we only read an adjacent leaf block i.e. in total of 2 extent entries spread across 2 leaf nodes. The reason for this is that we are mostly only going to support atomic writes with upto 64KB or maybe max upto 1MB of atomic write support. Acked-by: Darrick J. Wong <djwong@kernel.org> Co-developed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/6bb563e661f5fbd80e266a9e6ce6e29178f555f6.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Make ext4_meta_trans_blocks() non-static for later useRitesh Harjani (IBM)
Let's make ext4_meta_trans_blocks() non-static for use in later functions during ->end_io conversion for atomic writes. We will need this function to estimate journal credits for a special case. Instead of adding another wrapper around it, let's make this non-static. Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/23ce80d4286f792831ce99d13558182ee228fedb.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Check if inode uses extents in ext4_inode_can_atomic_write()Ritesh Harjani (IBM)
EXT4 only supports doing atomic write on inodes which uses extents, so add a check in ext4_inode_can_atomic_write() which gets called during open. Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/86bb502c979398a736ab371d8f35f6866a477f6c.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: Document an edge case for overwritesRitesh Harjani (IBM)
ext4_iomap_overwrite_begin() clears the flag for IOMAP_WRITE before calling ext4_iomap_begin(). Document this above ext4_map_blocks() call as it is easy to miss it when focusing on write paths alone. Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/fd50ba05440042dff77d555e463a620a79f8d0e9.1747337952.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20jbd2: remove journal_t argument from jbd2_superblock_csum()Eric Biggers
Since jbd2_superblock_csum() no longer uses its journal_t argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-5-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20jbd2: remove journal_t argument from jbd2_chksum()Eric Biggers
Since jbd2_chksum() no longer uses its journal_t argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: remove sb argument from ext4_superblock_csum()Eric Biggers
Since ext4_superblock_csum() no longer uses its sb argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-3-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: remove sbi argument from ext4_chksum()Eric Biggers
Since ext4_chksum() no longer uses its sbi argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-2-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: enable large folio for regular fileZhang Yi
Besides fsverity, fscrypt, and the data=journal mode, ext4 now supports large folios for regular files. Enable this feature by default. However, since we cannot change the folio order limitation of mappings on active inodes, setting the journal=data mode via ioctl on an active inode will not take immediate effect in non-delalloc mode. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: make online defragmentation support large foliosZhang Yi
move_extent_per_page() currently assumes that each folio is the size of PAGE_SIZE and only copies data for one page. ext4_move_extents() should call move_extent_per_page() for each page. To support larger folios, simply modify the calculations for the block start and end offsets within the folio based on the provided range of 'data_offset_in_page' and 'block_len_in_page'. This function will continue to handle PAGE_SIZE of data at a time and will not convert this function to manage an entire folio. Additionally, we use the source folio to copy data, so it doesn't matter if the source and dest folios are different in size. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-8-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: make the writeback path support large foliosZhang Yi
In mpage_map_and_submit_buffers(), the 'lblk' is now aligned to PAGE_SIZE. Convert it to be aligned to folio size. Additionally, modify the wbc->nr_to_write update to reduce the number of pages in a single folio, ensuring that the entire writeback path can support large folios. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-7-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: correct the journal credits calculations of allocating blocksZhang Yi
The journal credits calculation in ext4_ext_index_trans_blocks() is currently inadequate. It only multiplies the depth of the extents tree and doesn't account for the blocks that may be required for adding the leaf extents themselves. After enabling large folios, we can easily run out of handle credits, triggering a warning in jbd2_journal_dirty_metadata() on filesystems with a 1KB block size. This occurs because we may need more extents when iterating through each large folio in ext4_do_writepages()->mpage_map_and_submit_extent(). Therefore, we should modify ext4_ext_index_trans_blocks() to include a count of the leaf extents in the worst case as well. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-6-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large folioZhang Yi
jbd2_journal_blocks_per_page() returns the number of blocks in a single page. Rename it to jbd2_journal_blocks_per_folio() and make it returns the number of blocks in the largest folio, preparing for the calculation of journal credits blocks when allocating blocks within a large folio in the writeback path. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4: make __ext4_block_zero_page_range() support large folioZhang Yi
The partial block zero range helper __ext4_block_zero_page_range() currently only supports folios of PAGE_SIZE in size. The calculations for the start block and the offset within a folio for the given range are incorrect. Modify the implementation to use offset_in_folio() instead of directly masking PAGE_SIZE - 1, which will be able to support for large folios. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>