summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-09wifi: iwlwifi: cfg: remove DCCM offsets from new devicesJohannes Berg
This is only used with old-style debug dump, which isn't supported on newer devices, so remove the data. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-14-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: remove eeprom_size from new devicesJohannes Berg
Since 22000 series, the data is read by the firmware and the driver doesn't need to know, remove the useless setting. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-13-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: rename struct iwl_base_paramsJohannes Berg
These are (going to be) base MAC parameters that are identical even for different platforms with the same MAC, so rename the structure accordingly, calling it iwl_family_base_params. Also rename the pointer to it so the dereferencing is a bit shorter. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-12-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: remove rf_id fieldJohannes Berg
This field is always set for >= 9000 series, but then we already check that, so it's not needed. Remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-11-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: remove dbgc_supported fieldJohannes Berg
This field is unused, remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-10-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: rename cfg_trans_params to mac_cfgJohannes Berg
Since 9000 series devices, the devices are split into MAC and CRF parts. Currently, "struct iwl_cfg" reflects some MAC and some RF parameters, but we want to clean this up and move the MAC data to what's now "struct iwl_cfg_trans_params". As the first step, to reflect the intent, rename this structure. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-9-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: pass trans to iwl_parse_nvm_mcc_info()Johannes Berg
There's no need to pass various different pointers when the transport is already established, so just pass that instead. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-8-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: remove nvm_hw_section_num from new devicesJohannes Berg
This value hasn't been used since unified firmware in 22000 series, so there's no need to set the value for that or newer devices. Remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-7-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: handle cc firmware dynamicallyJohannes Berg
Instead of using fw_name_pre, handle the cc firmware file name specially in iwl_drv_get_fwname_pre() for the cc MAC type. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-6-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: build ax210 family FW names dynamicallyJohannes Berg
Add support for the TY MAC (discrete device) and GF4 RF to the list of MAC/RF types, and use that to remove fw_name_pre for the ax210 family devices. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-5-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: remove 'cdb' valueJohannes Berg
This is never used, so remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-4-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: remove QuZ/JF special casesJohannes Berg
Since JF RF always uses b0 step and QuZ MAC always uses a0 step for firmware, there's no need for these configs that just force the steps to those values. Remove them. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-3-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: build 9000 series FW filenames dynamicallyJohannes Berg
This is more maintainable than the fw_name_pre prefix, and requires fewer duplicate structs as well. Since only b0 FW exists, override the MAC/RF steps for these devices. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250508121306.1277801-2-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: add JF1/JF2 RF for dynamic FW buildingJohannes Berg
This is needed, otherwise we don't know what to do and will not find the correct firmware. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-16-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: pcie: remove 0x2726 devicesJohannes Berg
These are test chips, not available to users, and not even listed in the PCI IDs table (so the driver won't bind them). There's no reason to list specific devices with them in the driver. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-15-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: inline HT paramsJohannes Berg
With just a handful of values in two bytes, the params are smaller than the pointer to them. Inline them and save some space. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-14-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: cfg: remove 6 GHz from ht40_bandsJohannes Berg
Since there's no HT on 6 GHz, only HE, the HT capabilities are never initialized, and so the ht40_bands value is never checked for the 6 GHz band. Remove the misleading value. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-13-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: mld: call thermal exit without wiphy lock heldBenjamin Berg
The driver must not hold the wiphy mutex when unregistering the thermal devices. Do not hold the lock for the call to iwl_mld_thermal_exit and only do a lock/unlock to cancel the ct_kill_exit_wk work. The problem is that iwl_mld_tzone_get_temp needs to take the wiphy lock while the thermal code is holding its own locks already. When unregistering the device, the reverse would happen as the driver was calling thermal_cooling_device_unregister with the wiphy mutex already held. It is not likely to trigger this deadlock as it can only happen if the thermal code is polling the temperature while the driver is being unloaded. However, lockdep reported it so fix it. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-12-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: mld: avoid init-after-queueMiri Korenblit
rx_omi::finished_work is initialized when the containing link is. If the worker was queued and then an error happened, we will get to iwl_mld_init_link from the reconfig and initialize the work after it was queued. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-11-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: mld: use a radio/system specific power budgetBenjamin Berg
Different hardware has a different maximum power consumption and the BIOS can also define a power limit for the device. Add code to select an appropriate maximum power budget for the device and configure that instead of using a hardcoded table. This removes the old table. It does not work with the variable upper limit and the there should be no consumer that requires these exact step values. This considerably increases the power budget for some devices and can prevent throttling in high traffic situations. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-10-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: mvm: use a radio/system specific power budgetBenjamin Berg
Different hardware has a different maximum power consumption and the BIOS can also define a power limit for the device. Add code to select an appropriate maximum power budget for the device and configure that instead of using a hardcoded table. This removes the old table. It does not work with the variable upper limit and the there should be no consumer that requires these exact step values. This considerably increases the power budget for some devices and can prevent throttling in high traffic situations. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-9-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: fix thermal code compilation with -Werror=cast-qualBenjamin Berg
The compare_temps function in both mvm and mld dropped the const qualifier in a cast in a way that makes -Werror=cast-qual unhappy. Add the const to the cast to fix this. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-8-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: pcie: remove iwl_trans_pcie_gen2_send_hcmdMiri Korenblit
This function is not implemented nor called. Remove its declaration. Link: https://patch.msgid.link/20250506194102.3407967-7-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: Add support for a new version for link config commandYedidya Benshimol
Add support for a new version of link configuration command which includes NPCA and high priority TX traffic support for wifi8. Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-6-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: Add a new version for mac config commandYedidya Benshimol
Add a new version of mac configuration command which includes UHR support indication. Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-5-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: Add a new version for sta config commandYedidya Benshimol
Add a new version of sta configuration command which includes these wifi8 features: 1. LDPC X2 CW size support indication 2. Indication if ICF frame is needed instead of RTS 3. support for MIC padding delays for protected control frames Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-4-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: add range response version 10 supportAvraham Stern
Range response version 10 removes the rx and tx rates fields. These fields aren't used by the driver anyway, so no change is needed to support it. Signed-off-by: Avraham Stern <avraham.stern@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-3-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09wifi: iwlwifi: mld: remove one more error in unallocated BAIDMiri Korenblit
Since the FW is the one to assign an ID to a BA, it can happen that the FW sends a bar_frame_release_notif before the driver had the chance to allocate the BAID. Convert the IWL_FW_CHECK into a regular debug print. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20250506194102.3407967-2-miriam.rachel.korenblit@intel.com Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
2025-05-09Merge branch 'net_sched-gso_skb-flushing'David S. Miller
Cong Wang says: ==================== net_sched: Fix gso_skb flushing during qdisc change This patchset contains a bug fix and its test cases, please check each patch description for more details. To keep the bug fix minimum, I intentionally limit the code changes to the cases reported here. --- v2: added a missing qlen-- fixed the new boolean parameter for two qdiscs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-09selftests/tc-testing: Add qdisc limit trimming testsCong Wang
Added new test cases for FQ, FQ_CODEL, FQ_PIE, and HHF qdiscs to verify queue trimming behavior when the qdisc limit is dynamically reduced. Each test injects packets, reduces the qdisc limit, and checks that the new limit is enforced. This is still best effort since timing qdisc backlog is not easy. Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-09net_sched: Flush gso_skb list too during ->change()Cong Wang
Previously, when reducing a qdisc's limit via the ->change() operation, only the main skb queue was trimmed, potentially leaving packets in the gso_skb list. This could result in NULL pointer dereference when we only check sch->limit against sch->q.qlen. This patch introduces a new helper, qdisc_dequeue_internal(), which ensures both the gso_skb list and the main queue are properly flushed when trimming excess packets. All relevant qdiscs (codel, fq, fq_codel, fq_pie, hhf, pie) are updated to use this helper in their ->change() routines. Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM") Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM") Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler") Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Fixes: d4b36210c2e6 ("net: pkt_sched: PIE AQM scheme") Reported-by: Will <willsroot@protonmail.com> Reported-by: Savy <savy@syst3mfailure.io> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2025-05-09perf/amlogic: Replace smp_processor_id() with raw_smp_processor_id() in ↵Anand Moon
meson_ddr_pmu_create() The Amlogic DDR PMU driver meson_ddr_pmu_create() function incorrectly uses smp_processor_id(), which assumes disabled preemption. This leads to kernel warnings during module loading because meson_ddr_pmu_create() can be called in a preemptible context. Following kernel warning and stack trace: [ 31.745138] [ T2289] BUG: using smp_processor_id() in preemptible [00000000] code: (udev-worker)/2289 [ 31.745154] [ T2289] caller is debug_smp_processor_id+0x28/0x38 [ 31.745172] [ T2289] CPU: 4 UID: 0 PID: 2289 Comm: (udev-worker) Tainted: GW 6.14.0-0-MANJARO-ARM #1 59519addcbca6ba8de735e151fd7b9e97aac7ff0 [ 31.745181] [ T2289] Tainted: [W]=WARN [ 31.745183] [ T2289] Hardware name: Hardkernel ODROID-N2Plus (DT) [ 31.745188] [ T2289] Call trace: [ 31.745191] [ T2289] show_stack+0x28/0x40 (C) [ 31.745199] [ T2289] dump_stack_lvl+0x4c/0x198 [ 31.745205] [ T2289] dump_stack+0x20/0x50 [ 31.745209] [ T2289] check_preemption_disabled+0xec/0xf0 [ 31.745213] [ T2289] debug_smp_processor_id+0x28/0x38 [ 31.745216] [ T2289] meson_ddr_pmu_create+0x200/0x560 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd] [ 31.745237] [ T2289] g12_ddr_pmu_probe+0x20/0x38 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd] [ 31.745246] [ T2289] platform_probe+0x98/0xe0 [ 31.745254] [ T2289] really_probe+0x144/0x3f8 [ 31.745258] [ T2289] __driver_probe_device+0xb8/0x180 [ 31.745261] [ T2289] driver_probe_device+0x54/0x268 [ 31.745264] [ T2289] __driver_attach+0x11c/0x288 [ 31.745267] [ T2289] bus_for_each_dev+0xfc/0x160 [ 31.745274] [ T2289] driver_attach+0x34/0x50 [ 31.745277] [ T2289] bus_add_driver+0x160/0x2b0 [ 31.745281] [ T2289] driver_register+0x78/0x120 [ 31.745285] [ T2289] __platform_driver_register+0x30/0x48 [ 31.745288] [ T2289] init_module+0x30/0xfe0 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd] [ 31.745298] [ T2289] do_one_initcall+0x11c/0x438 [ 31.745303] [ T2289] do_init_module+0x68/0x228 [ 31.745311] [ T2289] load_module+0x118c/0x13a8 [ 31.745315] [ T2289] __arm64_sys_finit_module+0x274/0x390 [ 31.745320] [ T2289] invoke_syscall+0x74/0x108 [ 31.745326] [ T2289] el0_svc_common+0x90/0xf8 [ 31.745330] [ T2289] do_el0_svc+0x2c/0x48 [ 31.745333] [ T2289] el0_svc+0x60/0x150 [ 31.745337] [ T2289] el0t_64_sync_handler+0x80/0x118 [ 31.745341] [ T2289] el0t_64_sync+0x1b8/0x1c0 Changes replaces smp_processor_id() with raw_smp_processor_id() to ensure safe CPU ID retrieval in preemptible contexts. Cc: Jiucheng Xu <jiucheng.xu@amlogic.com> Fixes: 2016e2113d35 ("perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver") Signed-off-by: Anand Moon <linux.amoon@gmail.com> Link: https://lore.kernel.org/r/20250407063206.5211-1-linux.amoon@gmail.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-09perf/arm-cmn: Fix REQ2/SNP2 mixupRobin Murphy
Somehow the encodings for REQ2/SNP2 channels in XP events got mixed up... Unmix them. CC: stable@vger.kernel.org Fixes: 23760a014417 ("perf/arm-cmn: Add CMN-700 support") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/087023e9737ac93d7ec7a841da904758c254cb01.1746717400.git.robin.murphy@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2025-05-09Merge patch series "Minor namespace code simplication"Christian Brauner
Joel Savitz <jsavitz@redhat.com> says: The two patches are independent of each other. The first patch removes unnecssary NULL guards from free_nsproxy() and create_new_namespaces() in line with other usage of the put_*_ns() call sites. The second patch slightly reduces the size of the kernel when CONFIG_CGROUPS is not selected. * patches from https://lore.kernel.org/20250508184930.183040-1-jsavitz@redhat.com: include/cgroup: separate {get,put}_cgroup_ns no-op case kernel/nsproxy: remove unnecessary guards Link: https://lore.kernel.org/20250508184930.183040-1-jsavitz@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09include/cgroup: separate {get,put}_cgroup_ns no-op caseJoel Savitz
When CONFIG_CGROUPS is not selected, {get,put}_cgroup_ns become no-ops and therefore it is not necessary to compile in the code for changing the reference count. When CONFIG_CGROUP is selected, there is no valid case where either of {get,put}_cgroup_ns() will be called with a NULL argument. Signed-off-by: Joel Savitz <jsavitz@redhat.com> Link: https://lore.kernel.org/20250508184930.183040-3-jsavitz@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09kernel/nsproxy: remove unnecessary guardsJoel Savitz
In free_nsproxy() and the error path of create_new_namesapces() the put_*_ns() calls are guarded by unnecessary NULL checks. put_pid_ns(), put_ipc_ns(), put_uts_ns(), and put_time_ns() will never receive a NULL argument unless their namespace type is disabled, and in this case all four become no-ops at compile time anyway. put_mnt_ns() will never receive a null argument at any time. This unguarded usage is in line with other call sites of put_*_ns(). Signed-off-by: Joel Savitz <jsavitz@redhat.com> Link: https://lore.kernel.org/20250508184930.183040-2-jsavitz@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09f2fs: fix freezing filesystem during resizeChristian Brauner
Using FREEZE_HOLDER_USERSPACE has two consequences: (1) If userspace freezes the filesystem after mnt_drop_write_file() but before freeze_super() was called filesystem resizing will fail because the freeze isn't marked as nestable. (2) If the kernel has successfully frozen the filesystem via FREEZE_HOLDER_USERSPACE userspace can simply undo it by using the FITHAW ioctl. Fix both issues by using FREEZE_HOLDER_KERNEL. It will nest with FREEZE_HOLDER_USERSPACE and cannot be undone by userspace. And it is the correct thing to do because the kernel temporarily freezes the filesystem. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09Merge patch series "power: wire-up filesystem freeze/thaw with suspend/resume"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Now all the pieces are in place to actually allow the power subsystem to freeze/thaw filesystems during suspend/resume. Filesystems are only frozen and thawed if the power subsystem does actually own the freeze. Othwerwise it risks thawing filesystems it didn't own. This could be done differently be e.g., keeping the filesystems that were actually frozen on a list and then unfreezing them from that list. This is disgustingly unclean though and reeks of an ugly hack. If the filesystem is already frozen by the time we've frozen all userspace processes we don't care to freeze it again. That's userspace's job once the process resumes. We only actually freeze filesystems if we absolutely have to and we ignore other failures to freeze. We could bubble up errors and fail suspend/resume if the error isn't EBUSY (aka it's already frozen) but I don't think that this is worth it. Filesystem freezing during suspend/resume is best-effort. If the user has 500 ext4 filesystems mounted and 4 fail to freeze for whatever reason then we simply skip them. What we have now is already a big improvement and let's see how we fare with it before making our lives even harder (and uglier) than we have to. * patches from https://lore.kernel.org/r/20250402-work-freeze-v2-0-6719a97b52ac@kernel.org: kernfs: add warning about implementing freeze/thaw power: freeze filesystems during suspend/resume Link: https://lore.kernel.org/r/20250402-work-freeze-v2-0-6719a97b52ac@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09Merge patch series "efivarfs: support freeze/thaw"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Allow efivarfs to partake to resync variable state during system hibernation and suspend. Add freeze/thaw support. This is a pretty straightforward implementation. We simply add regular freeze/thaw support for both userspace and the kernel. This works without any big issues and congrats afaict efivars is the first pseudofilesystem that adds support for filesystem freezing and thawing. The simplicity comes from the fact that we simply always resync variable state after efivarfs has been frozen. It doesn't matter whether that's because of suspend, userspace initiated freeze or hibernation. Efivars is simple enough that it doesn't matter that we walk all dentries. There are no directories and there aren't insane amounts of entries and both freeze/thaw are already heavy-handed operations. If userspace initiated a freeze/thaw cycle they would need CAP_SYS_ADMIN in the initial user namespace (as that's where efivarfs is mounted) so it can't be triggered by random userspace. IOW, we really really don't care. * patches from https://lore.kernel.org/r/20250331-work-freeze-v1-0-6dfbe8253b9f@kernel.org: efivarfs: support freeze/thaw libfs: export find_next_child() Link: https://lore.kernel.org/r/20250331-work-freeze-v1-0-6dfbe8253b9f@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09kernfs: add warning about implementing freeze/thawChristian Brauner
Sysfs is built on top of kernfs and sysfs provides the power management infrastructure to support suspend/hibernate by writing to various files in /sys/power/. As filesystems may be automatically frozen during suspend/hibernate implementing freeze/thaw support for kernfs generically will cause deadlocks as the suspending/hibernation initiating task will hold a VFS lock that it will then wait upon to be released. If freeze/thaw for kernfs is needed talk to the VFS. Link: https://lore.kernel.org/r/20250402-work-freeze-v2-4-6719a97b52ac@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09efivarfs: support freeze/thawChristian Brauner
Allow efivarfs to partake to resync variable state during system hibernation and suspend. Add freeze/thaw support. This is a pretty straightforward implementation. We simply add regular freeze/thaw support for both userspace and the kernel. This works without any big issues and congrats afaict efivars is the first pseudofilesystem that adds support for filesystem freezing and thawing. The simplicity comes from the fact that we simply always resync variable state after efivarfs has been frozen. It doesn't matter whether that's because of suspend, userspace initiated freeze or hibernation. Efivars is simple enough that it doesn't matter that we walk all dentries. There are no directories and there aren't insane amounts of entries and both freeze/thaw are already heavy-handed operations. We really really don't need to care. Link: https://lore.kernel.org/r/20250331-work-freeze-v1-2-6dfbe8253b9f@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09power: freeze filesystems during suspend/resumeChristian Brauner
Now all the pieces are in place to actually allow the power subsystem to freeze/thaw filesystems during suspend/resume. Filesystems are only frozen and thawed if the power subsystem does actually own the freeze. We could bubble up errors and fail suspend/resume if the error isn't EBUSY (aka it's already frozen) but I don't think that this is worth it. Filesystem freezing during suspend/resume is best-effort. If the user has 500 ext4 filesystems mounted and 4 fail to freeze for whatever reason then we simply skip them. What we have now is already a big improvement and let's see how we fare with it before making our lives even harder (and uglier) than we have to. We add a new sysctl know /sys/power/freeze_filesystems that will allow userspace to freeze filesystems during suspend/hibernate. For now it defaults to off. The thaw logic doesn't require checking whether freezing is enabled because the power subsystem exclusively owns frozen filesystems for the duration of suspend/hibernate and is able to skip filesystems it doesn't need to freeze. Also it is technically possible that filesystem filesystem_freeze_enabled is true and power freezes the filesystems but before freezing all processes another process disables filesystem_freeze_enabled. If power were to place the filesystems_thaw() call under filesystems_freeze_enabled it would fail to thaw the fileystems it frozw. The exclusive holder mechanism makes it possible to iterate through the list without any concern making sure that no filesystems are left frozen. Link: https://lore.kernel.org/r/20250402-work-freeze-v2-3-6719a97b52ac@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09libfs: export find_next_child()Christian Brauner
Export find_next_child() so it can be used by efivarfs. Keep it internal for now. There's no reason to advertise this kernel-wide. Link: https://lore.kernel.org/r/20250331-work-freeze-v1-1-6dfbe8253b9f@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09Merge patch series "Extend freeze support to suspend and hibernate"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Add the necessary infrastructure changes to support freezing for suspend and hibernate. This should all that's needed to wire up power. * patches from https://lore.kernel.org/r/20250329-work-freeze-v2-0-a47af37ecc3d@kernel.org: super: add filesystem freezing helpers for suspend and hibernate gfs2: pass through holder from the VFS for freeze/thaw super: use common iterator (Part 2) super: use a common iterator (Part 1) super: skip dying superblocks early super: simplify user_get_super() super: remove pointless s_root checks Link: https://lore.kernel.org/r/20250329-work-freeze-v2-0-a47af37ecc3d@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09super: add filesystem freezing helpers for suspend and hibernateChristian Brauner
Allow the power subsystem to support filesystem freeze for suspend and hibernate. For some kernel subsystems it is paramount that they are guaranteed that they are the owner of the freeze to avoid any risk of deadlocks. This is the case for the power subsystem. Enable it to recognize whether it did actually freeze the filesystem. If userspace has 10 filesystems and suspend/hibernate manges to freeze 5 and then fails on the 6th for whatever odd reason (current or future) then power needs to undo the freeze of the first 5 filesystems. It can't just walk the list again because while it's unlikely that a new filesystem got added in the meantime it still cannot tell which filesystems the power subsystem actually managed to get a freeze reference count on that needs to be dropped during thaw. There's various ways out of this ugliness. For example, record the filesystems the power subsystem managed to freeze on a temporary list in the callbacks and then walk that list backwards during thaw to undo the freezing or make sure that the power subsystem just actually exclusively freezes things it can freeze and marking such filesystems as being owned by power for the duration of the suspend or resume cycle. I opted for the latter as that seemed the clean thing to do even if it means more code changes. If hibernation races with filesystem freezing (e.g. DM reconfiguration), then hibernation need not freeze a filesystem because it's already frozen but userspace may thaw the filesystem before hibernation actually happens. If the race happens the other way around, DM reconfiguration may unexpectedly fail with EBUSY. So allow FREEZE_EXCL to nest with other holders. An exclusive freezer cannot be undone by any of the other concurrent freezers. Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09fs: use writeback_iter directly in mpage_writepagesChristoph Hellwig
Stop using write_cache_pages and use writeback_iter directly. This removes an indirect call per written folio and makes the code easier to follow. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/20250507062124.3933305-1-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09Merge patch series "iomap: misc buffered write path cleanups and prep"Christian Brauner
Brian Foster <bfoster@redhat.com> says: Here's a bit more fallout and prep. work associated with the folio batch prototype posted a while back [1]. Work on that is still pending so it isn't included here, but based on the iter advance cleanups most of these seemed worthwhile as standalone cleanups. Mainly this just cleans up some of the helpers and pushes some pos/len trimming further down in the write begin path. The fbatch thing is still in prototype stage, but for context the intent here is that it can mostly now just bolt onto the folio lookup path because we can advance the range that is skipped and return the next folio along with the folio subrange for the caller to process. [1] https://lore.kernel.org/linux-fsdevel/20241213150528.1003662-1-bfoster@redhat.com/ * patches from https://lore.kernel.org/20250506134118.911396-1-bfoster@redhat.com: iomap: rework iomap_write_begin() to return folio offset and length iomap: push non-large folio check into get folio path iomap: helper to trim pos/bytes to within folio iomap: drop pos param from __iomap_[get|put]_folio() iomap: drop unnecessary pos param from iomap_write_[begin|end] iomap: resample iter->pos after iomap_write_begin() calls Link: https://lore.kernel.org/20250506134118.911396-1-bfoster@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09iomap: rework iomap_write_begin() to return folio offset and lengthBrian Foster
iomap_write_begin() returns a folio based on current pos and remaining length in the iter, and each caller then trims the pos/length to the given folio. Clean this up a bit and let iomap_write_begin() return the trimmed range along with the folio. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/20250506134118.911396-7-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09iomap: push non-large folio check into get folio pathBrian Foster
The len param to __iomap_get_folio() is primarily a folio allocation hint. iomap_write_begin() already trims its local len variable based on the provided folio, so move the large folio support check closer to folio lookup. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/20250506134118.911396-6-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-05-09iomap: helper to trim pos/bytes to within folioBrian Foster
Several buffered write based iteration callbacks duplicate logic to trim the current pos and length to within the current folio. Factor this into a helper to make it easier to relocate closer to folio lookup. Signed-off-by: Brian Foster <bfoster@redhat.com> Link: https://lore.kernel.org/20250506134118.911396-5-bfoster@redhat.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>