Age | Commit message (Collapse) | Author |
|
This is only used with old-style debug dump, which isn't
supported on newer devices, so remove the data.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-14-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Since 22000 series, the data is read by the firmware and the
driver doesn't need to know, remove the useless setting.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-13-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
These are (going to be) base MAC parameters that are identical
even for different platforms with the same MAC, so rename the
structure accordingly, calling it iwl_family_base_params.
Also rename the pointer to it so the dereferencing is a bit
shorter.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-12-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
This field is always set for >= 9000 series, but then we
already check that, so it's not needed. Remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-11-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
This field is unused, remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-10-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Since 9000 series devices, the devices are split into MAC and
CRF parts. Currently, "struct iwl_cfg" reflects some MAC and
some RF parameters, but we want to clean this up and move the
MAC data to what's now "struct iwl_cfg_trans_params". As the
first step, to reflect the intent, rename this structure.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-9-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
There's no need to pass various different pointers when
the transport is already established, so just pass that
instead.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-8-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
This value hasn't been used since unified firmware in 22000
series, so there's no need to set the value for that or
newer devices. Remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-7-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Instead of using fw_name_pre, handle the cc firmware file
name specially in iwl_drv_get_fwname_pre() for the cc MAC
type.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-6-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Add support for the TY MAC (discrete device) and GF4 RF to
the list of MAC/RF types, and use that to remove fw_name_pre
for the ax210 family devices.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-5-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
This is never used, so remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-4-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Since JF RF always uses b0 step and QuZ MAC always uses
a0 step for firmware, there's no need for these configs
that just force the steps to those values. Remove them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-3-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
This is more maintainable than the fw_name_pre prefix, and
requires fewer duplicate structs as well. Since only b0 FW
exists, override the MAC/RF steps for these devices.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250508121306.1277801-2-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
This is needed, otherwise we don't know what to do and will
not find the correct firmware.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-16-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
These are test chips, not available to users, and not even listed
in the PCI IDs table (so the driver won't bind them). There's no
reason to list specific devices with them in the driver.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-15-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
With just a handful of values in two bytes, the params are
smaller than the pointer to them. Inline them and save some
space.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-14-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Since there's no HT on 6 GHz, only HE, the HT capabilities
are never initialized, and so the ht40_bands value is never
checked for the 6 GHz band. Remove the misleading value.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-13-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
The driver must not hold the wiphy mutex when unregistering the thermal
devices. Do not hold the lock for the call to iwl_mld_thermal_exit and
only do a lock/unlock to cancel the ct_kill_exit_wk work.
The problem is that iwl_mld_tzone_get_temp needs to take the wiphy lock
while the thermal code is holding its own locks already. When
unregistering the device, the reverse would happen as the driver was
calling thermal_cooling_device_unregister with the wiphy mutex already
held.
It is not likely to trigger this deadlock as it can only happen if the
thermal code is polling the temperature while the driver is being
unloaded. However, lockdep reported it so fix it.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-12-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
rx_omi::finished_work is initialized when the containing link is.
If the worker was queued and then an error happened, we will get to
iwl_mld_init_link from the reconfig and initialize the work after it was
queued.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-11-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Different hardware has a different maximum power consumption and the
BIOS can also define a power limit for the device. Add code to select
an appropriate maximum power budget for the device and configure that
instead of using a hardcoded table.
This removes the old table. It does not work with the variable upper
limit and the there should be no consumer that requires these exact step
values.
This considerably increases the power budget for some devices and can
prevent throttling in high traffic situations.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-10-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Different hardware has a different maximum power consumption and the
BIOS can also define a power limit for the device. Add code to select
an appropriate maximum power budget for the device and configure that
instead of using a hardcoded table.
This removes the old table. It does not work with the variable upper
limit and the there should be no consumer that requires these exact step
values.
This considerably increases the power budget for some devices and can
prevent throttling in high traffic situations.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-9-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
The compare_temps function in both mvm and mld dropped the const
qualifier in a cast in a way that makes -Werror=cast-qual unhappy. Add
the const to the cast to fix this.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-8-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
This function is not implemented nor called. Remove its declaration.
Link: https://patch.msgid.link/20250506194102.3407967-7-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Add support for a new version of link configuration command
which includes NPCA and high priority TX traffic support for wifi8.
Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-6-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Add a new version of mac configuration command
which includes UHR support indication.
Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-5-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Add a new version of sta configuration command
which includes these wifi8 features:
1. LDPC X2 CW size support indication
2. Indication if ICF frame is needed instead of RTS
3. support for MIC padding delays for protected control frames
Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-4-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Range response version 10 removes the rx and tx rates fields.
These fields aren't used by the driver anyway, so no change is
needed to support it.
Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-3-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Since the FW is the one to assign an ID to a BA, it can happen that
the FW sends a bar_frame_release_notif before the driver had the chance to
allocate the BAID.
Convert the IWL_FW_CHECK into a regular debug print.
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Link: https://patch.msgid.link/20250506194102.3407967-2-miriam.rachel.korenblit@intel.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
|
|
Cong Wang says:
====================
net_sched: Fix gso_skb flushing during qdisc change
This patchset contains a bug fix and its test cases, please check each
patch description for more details. To keep the bug fix minimum, I
intentionally limit the code changes to the cases reported here.
---
v2: added a missing qlen--
fixed the new boolean parameter for two qdiscs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added new test cases for FQ, FQ_CODEL, FQ_PIE, and HHF qdiscs to verify queue
trimming behavior when the qdisc limit is dynamically reduced.
Each test injects packets, reduces the qdisc limit, and checks that the new
limit is enforced. This is still best effort since timing qdisc backlog
is not easy.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously, when reducing a qdisc's limit via the ->change() operation, only
the main skb queue was trimmed, potentially leaving packets in the gso_skb
list. This could result in NULL pointer dereference when we only check
sch->limit against sch->q.qlen.
This patch introduces a new helper, qdisc_dequeue_internal(), which ensures
both the gso_skb list and the main queue are properly flushed when trimming
excess packets. All relevant qdiscs (codel, fq, fq_codel, fq_pie, hhf, pie)
are updated to use this helper in their ->change() routines.
Fixes: 76e3cc126bb2 ("codel: Controlled Delay AQM")
Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler")
Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Fixes: d4b36210c2e6 ("net: pkt_sched: PIE AQM scheme")
Reported-by: Will <willsroot@protonmail.com>
Reported-by: Savy <savy@syst3mfailure.io>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
meson_ddr_pmu_create()
The Amlogic DDR PMU driver meson_ddr_pmu_create() function incorrectly uses
smp_processor_id(), which assumes disabled preemption. This leads to kernel
warnings during module loading because meson_ddr_pmu_create() can be called
in a preemptible context.
Following kernel warning and stack trace:
[ 31.745138] [ T2289] BUG: using smp_processor_id() in preemptible [00000000] code: (udev-worker)/2289
[ 31.745154] [ T2289] caller is debug_smp_processor_id+0x28/0x38
[ 31.745172] [ T2289] CPU: 4 UID: 0 PID: 2289 Comm: (udev-worker) Tainted: GW 6.14.0-0-MANJARO-ARM #1 59519addcbca6ba8de735e151fd7b9e97aac7ff0
[ 31.745181] [ T2289] Tainted: [W]=WARN
[ 31.745183] [ T2289] Hardware name: Hardkernel ODROID-N2Plus (DT)
[ 31.745188] [ T2289] Call trace:
[ 31.745191] [ T2289] show_stack+0x28/0x40 (C)
[ 31.745199] [ T2289] dump_stack_lvl+0x4c/0x198
[ 31.745205] [ T2289] dump_stack+0x20/0x50
[ 31.745209] [ T2289] check_preemption_disabled+0xec/0xf0
[ 31.745213] [ T2289] debug_smp_processor_id+0x28/0x38
[ 31.745216] [ T2289] meson_ddr_pmu_create+0x200/0x560 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd]
[ 31.745237] [ T2289] g12_ddr_pmu_probe+0x20/0x38 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd]
[ 31.745246] [ T2289] platform_probe+0x98/0xe0
[ 31.745254] [ T2289] really_probe+0x144/0x3f8
[ 31.745258] [ T2289] __driver_probe_device+0xb8/0x180
[ 31.745261] [ T2289] driver_probe_device+0x54/0x268
[ 31.745264] [ T2289] __driver_attach+0x11c/0x288
[ 31.745267] [ T2289] bus_for_each_dev+0xfc/0x160
[ 31.745274] [ T2289] driver_attach+0x34/0x50
[ 31.745277] [ T2289] bus_add_driver+0x160/0x2b0
[ 31.745281] [ T2289] driver_register+0x78/0x120
[ 31.745285] [ T2289] __platform_driver_register+0x30/0x48
[ 31.745288] [ T2289] init_module+0x30/0xfe0 [meson_ddr_pmu_g12 8095101c49676ad138d9961e3eddaee10acca7bd]
[ 31.745298] [ T2289] do_one_initcall+0x11c/0x438
[ 31.745303] [ T2289] do_init_module+0x68/0x228
[ 31.745311] [ T2289] load_module+0x118c/0x13a8
[ 31.745315] [ T2289] __arm64_sys_finit_module+0x274/0x390
[ 31.745320] [ T2289] invoke_syscall+0x74/0x108
[ 31.745326] [ T2289] el0_svc_common+0x90/0xf8
[ 31.745330] [ T2289] do_el0_svc+0x2c/0x48
[ 31.745333] [ T2289] el0_svc+0x60/0x150
[ 31.745337] [ T2289] el0t_64_sync_handler+0x80/0x118
[ 31.745341] [ T2289] el0t_64_sync+0x1b8/0x1c0
Changes replaces smp_processor_id() with raw_smp_processor_id() to
ensure safe CPU ID retrieval in preemptible contexts.
Cc: Jiucheng Xu <jiucheng.xu@amlogic.com>
Fixes: 2016e2113d35 ("perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver")
Signed-off-by: Anand Moon <linux.amoon@gmail.com>
Link: https://lore.kernel.org/r/20250407063206.5211-1-linux.amoon@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Somehow the encodings for REQ2/SNP2 channels in XP events
got mixed up... Unmix them.
CC: stable@vger.kernel.org
Fixes: 23760a014417 ("perf/arm-cmn: Add CMN-700 support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/087023e9737ac93d7ec7a841da904758c254cb01.1746717400.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Joel Savitz <jsavitz@redhat.com> says:
The two patches are independent of each other. The first patch removes
unnecssary NULL guards from free_nsproxy() and create_new_namespaces()
in line with other usage of the put_*_ns() call sites. The second patch
slightly reduces the size of the kernel when CONFIG_CGROUPS is not
selected.
* patches from https://lore.kernel.org/20250508184930.183040-1-jsavitz@redhat.com:
include/cgroup: separate {get,put}_cgroup_ns no-op case
kernel/nsproxy: remove unnecessary guards
Link: https://lore.kernel.org/20250508184930.183040-1-jsavitz@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When CONFIG_CGROUPS is not selected, {get,put}_cgroup_ns become no-ops
and therefore it is not necessary to compile in the code for changing
the reference count.
When CONFIG_CGROUP is selected, there is no valid case where
either of {get,put}_cgroup_ns() will be called with a NULL argument.
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Link: https://lore.kernel.org/20250508184930.183040-3-jsavitz@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In free_nsproxy() and the error path of create_new_namesapces() the
put_*_ns() calls are guarded by unnecessary NULL checks.
put_pid_ns(), put_ipc_ns(), put_uts_ns(), and put_time_ns() will never
receive a NULL argument unless their namespace type is disabled, and in
this case all four become no-ops at compile time anyway. put_mnt_ns()
will never receive a null argument at any time.
This unguarded usage is in line with other call sites of put_*_ns().
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Link: https://lore.kernel.org/20250508184930.183040-2-jsavitz@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Using FREEZE_HOLDER_USERSPACE has two consequences:
(1) If userspace freezes the filesystem after mnt_drop_write_file() but
before freeze_super() was called filesystem resizing will fail
because the freeze isn't marked as nestable.
(2) If the kernel has successfully frozen the filesystem via
FREEZE_HOLDER_USERSPACE userspace can simply undo it by using the
FITHAW ioctl.
Fix both issues by using FREEZE_HOLDER_KERNEL. It will nest with
FREEZE_HOLDER_USERSPACE and cannot be undone by userspace.
And it is the correct thing to do because the kernel temporarily freezes
the filesystem.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Christian Brauner <brauner@kernel.org> says:
Now all the pieces are in place to actually allow the power subsystem to
freeze/thaw filesystems during suspend/resume. Filesystems are only
frozen and thawed if the power subsystem does actually own the freeze.
Othwerwise it risks thawing filesystems it didn't own. This could be
done differently be e.g., keeping the filesystems that were actually
frozen on a list and then unfreezing them from that list. This is
disgustingly unclean though and reeks of an ugly hack.
If the filesystem is already frozen by the time we've frozen all
userspace processes we don't care to freeze it again. That's userspace's
job once the process resumes. We only actually freeze filesystems if we
absolutely have to and we ignore other failures to freeze.
We could bubble up errors and fail suspend/resume if the error isn't
EBUSY (aka it's already frozen) but I don't think that this is worth it.
Filesystem freezing during suspend/resume is best-effort. If the user
has 500 ext4 filesystems mounted and 4 fail to freeze for whatever
reason then we simply skip them.
What we have now is already a big improvement and let's see how we fare
with it before making our lives even harder (and uglier) than we have
to.
* patches from https://lore.kernel.org/r/20250402-work-freeze-v2-0-6719a97b52ac@kernel.org:
kernfs: add warning about implementing freeze/thaw
power: freeze filesystems during suspend/resume
Link: https://lore.kernel.org/r/20250402-work-freeze-v2-0-6719a97b52ac@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Christian Brauner <brauner@kernel.org> says:
Allow efivarfs to partake to resync variable state during system
hibernation and suspend. Add freeze/thaw support.
This is a pretty straightforward implementation. We simply add regular
freeze/thaw support for both userspace and the kernel. This works
without any big issues and congrats afaict efivars is the first
pseudofilesystem that adds support for filesystem freezing and thawing.
The simplicity comes from the fact that we simply always resync variable
state after efivarfs has been frozen. It doesn't matter whether that's
because of suspend, userspace initiated freeze or hibernation. Efivars
is simple enough that it doesn't matter that we walk all dentries. There
are no directories and there aren't insane amounts of entries and both
freeze/thaw are already heavy-handed operations. If userspace initiated
a freeze/thaw cycle they would need CAP_SYS_ADMIN in the initial user
namespace (as that's where efivarfs is mounted) so it can't be triggered
by random userspace. IOW, we really really don't care.
* patches from https://lore.kernel.org/r/20250331-work-freeze-v1-0-6dfbe8253b9f@kernel.org:
efivarfs: support freeze/thaw
libfs: export find_next_child()
Link: https://lore.kernel.org/r/20250331-work-freeze-v1-0-6dfbe8253b9f@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Sysfs is built on top of kernfs and sysfs provides the power management
infrastructure to support suspend/hibernate by writing to various files
in /sys/power/. As filesystems may be automatically frozen during
suspend/hibernate implementing freeze/thaw support for kernfs
generically will cause deadlocks as the suspending/hibernation
initiating task will hold a VFS lock that it will then wait upon to be
released. If freeze/thaw for kernfs is needed talk to the VFS.
Link: https://lore.kernel.org/r/20250402-work-freeze-v2-4-6719a97b52ac@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow efivarfs to partake to resync variable state during system
hibernation and suspend. Add freeze/thaw support.
This is a pretty straightforward implementation. We simply add regular
freeze/thaw support for both userspace and the kernel. This works
without any big issues and congrats afaict efivars is the first
pseudofilesystem that adds support for filesystem freezing and thawing.
The simplicity comes from the fact that we simply always resync variable
state after efivarfs has been frozen. It doesn't matter whether that's
because of suspend, userspace initiated freeze or hibernation. Efivars
is simple enough that it doesn't matter that we walk all dentries. There
are no directories and there aren't insane amounts of entries and both
freeze/thaw are already heavy-handed operations. We really really don't
need to care.
Link: https://lore.kernel.org/r/20250331-work-freeze-v1-2-6dfbe8253b9f@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Now all the pieces are in place to actually allow the power subsystem
to freeze/thaw filesystems during suspend/resume. Filesystems are only
frozen and thawed if the power subsystem does actually own the freeze.
We could bubble up errors and fail suspend/resume if the error isn't
EBUSY (aka it's already frozen) but I don't think that this is worth it.
Filesystem freezing during suspend/resume is best-effort. If the user
has 500 ext4 filesystems mounted and 4 fail to freeze for whatever
reason then we simply skip them.
What we have now is already a big improvement and let's see how we fare
with it before making our lives even harder (and uglier) than we have
to.
We add a new sysctl know /sys/power/freeze_filesystems that will allow
userspace to freeze filesystems during suspend/hibernate. For now it
defaults to off. The thaw logic doesn't require checking whether
freezing is enabled because the power subsystem exclusively owns frozen
filesystems for the duration of suspend/hibernate and is able to skip
filesystems it doesn't need to freeze.
Also it is technically possible that filesystem
filesystem_freeze_enabled is true and power freezes the filesystems but
before freezing all processes another process disables
filesystem_freeze_enabled. If power were to place the filesystems_thaw()
call under filesystems_freeze_enabled it would fail to thaw the
fileystems it frozw. The exclusive holder mechanism makes it possible to
iterate through the list without any concern making sure that no
filesystems are left frozen.
Link: https://lore.kernel.org/r/20250402-work-freeze-v2-3-6719a97b52ac@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Export find_next_child() so it can be used by efivarfs.
Keep it internal for now. There's no reason to advertise this
kernel-wide.
Link: https://lore.kernel.org/r/20250331-work-freeze-v1-1-6dfbe8253b9f@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Christian Brauner <brauner@kernel.org> says:
Add the necessary infrastructure changes to support freezing for suspend
and hibernate. This should all that's needed to wire up power.
* patches from https://lore.kernel.org/r/20250329-work-freeze-v2-0-a47af37ecc3d@kernel.org:
super: add filesystem freezing helpers for suspend and hibernate
gfs2: pass through holder from the VFS for freeze/thaw
super: use common iterator (Part 2)
super: use a common iterator (Part 1)
super: skip dying superblocks early
super: simplify user_get_super()
super: remove pointless s_root checks
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-0-a47af37ecc3d@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Allow the power subsystem to support filesystem freeze for
suspend and hibernate.
For some kernel subsystems it is paramount that they are guaranteed that
they are the owner of the freeze to avoid any risk of deadlocks. This is
the case for the power subsystem. Enable it to recognize whether it did
actually freeze the filesystem.
If userspace has 10 filesystems and suspend/hibernate manges to freeze 5
and then fails on the 6th for whatever odd reason (current or future)
then power needs to undo the freeze of the first 5 filesystems. It can't
just walk the list again because while it's unlikely that a new
filesystem got added in the meantime it still cannot tell which
filesystems the power subsystem actually managed to get a freeze
reference count on that needs to be dropped during thaw.
There's various ways out of this ugliness. For example, record the
filesystems the power subsystem managed to freeze on a temporary list in
the callbacks and then walk that list backwards during thaw to undo the
freezing or make sure that the power subsystem just actually exclusively
freezes things it can freeze and marking such filesystems as being owned
by power for the duration of the suspend or resume cycle. I opted for
the latter as that seemed the clean thing to do even if it means more
code changes.
If hibernation races with filesystem freezing (e.g. DM reconfiguration),
then hibernation need not freeze a filesystem because it's already
frozen but userspace may thaw the filesystem before hibernation actually
happens.
If the race happens the other way around, DM reconfiguration may
unexpectedly fail with EBUSY.
So allow FREEZE_EXCL to nest with other holders. An exclusive freezer
cannot be undone by any of the other concurrent freezers.
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Stop using write_cache_pages and use writeback_iter directly. This
removes an indirect call per written folio and makes the code easier
to follow.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/20250507062124.3933305-1-hch@lst.de
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Brian Foster <bfoster@redhat.com> says:
Here's a bit more fallout and prep. work associated with the folio batch
prototype posted a while back [1]. Work on that is still pending so it
isn't included here, but based on the iter advance cleanups most of
these seemed worthwhile as standalone cleanups. Mainly this just cleans
up some of the helpers and pushes some pos/len trimming further down in
the write begin path.
The fbatch thing is still in prototype stage, but for context the intent
here is that it can mostly now just bolt onto the folio lookup path
because we can advance the range that is skipped and return the next
folio along with the folio subrange for the caller to process.
[1] https://lore.kernel.org/linux-fsdevel/20241213150528.1003662-1-bfoster@redhat.com/
* patches from https://lore.kernel.org/20250506134118.911396-1-bfoster@redhat.com:
iomap: rework iomap_write_begin() to return folio offset and length
iomap: push non-large folio check into get folio path
iomap: helper to trim pos/bytes to within folio
iomap: drop pos param from __iomap_[get|put]_folio()
iomap: drop unnecessary pos param from iomap_write_[begin|end]
iomap: resample iter->pos after iomap_write_begin() calls
Link: https://lore.kernel.org/20250506134118.911396-1-bfoster@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
iomap_write_begin() returns a folio based on current pos and
remaining length in the iter, and each caller then trims the
pos/length to the given folio. Clean this up a bit and let
iomap_write_begin() return the trimmed range along with the folio.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/20250506134118.911396-7-bfoster@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The len param to __iomap_get_folio() is primarily a folio allocation
hint. iomap_write_begin() already trims its local len variable based
on the provided folio, so move the large folio support check closer
to folio lookup.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/20250506134118.911396-6-bfoster@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Several buffered write based iteration callbacks duplicate logic to
trim the current pos and length to within the current folio. Factor
this into a helper to make it easier to relocate closer to folio
lookup.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Link: https://lore.kernel.org/20250506134118.911396-5-bfoster@redhat.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|