Age | Commit message (Collapse) | Author |
|
The double `if' is duplicated in the comment, remove one.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Link: https://msgid.link/20220811115958.8423-1-wangborong@cdjrlc.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20241229164246.9d8c224e9d4c.Iaacfbd1e9432f31d5d7d037ad925aadbb0d5c4d6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Delete a duplicate statement from this function implementation.
Signed-off-by: Minjie Du <duminjie@vivo.com>
Link: https://msgid.link/20230705114934.16523-1-duminjie@vivo.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20241229164246.b1b0dadc2e9e.Ie57cbe8039b9f388632141447ac910b6fcc3d0c0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Use IS_ERR_OR_NULL() instead of open-coding it
to simplify the code.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241229164246.d3423626d981.I3b4cc7f19d1bfecdb2e6a4eba8da1c7a41461115@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Fix spelling typo in iwl-context-info.h comment.
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: shitao <shitao@kylinos.cn>
Link: https://msgid.link/20231212093424.3104329-1-shitao@kylinos.cn
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241229164246.c79c132f811b.Ie07a0007b96359b3552878e23c4b9efeb07bba8d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Remove the duplicate "the".
Signed-off-by: Deming Wang <wangdeming@inspur.com>
Link: https://msgid.link/20240318054853.2352-1-wangdeming@inspur.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20241229164246.7b385f337e46.Iae60151e718f344098058b0e4fa6f6c1e43cb414@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
iwl_ssid_exist() seems to check if a given ssid/ssid_len already
exists in a given array ssid_list.
Correctly compare the ssid to the SSID of each array element
(with a matching SSID length) to better remove duplicates.
Signed-off-by: Bjoern A. Zeeb <bz@FreeBSD.org>
Sponsored by: The FreeBSD Foundation
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Daniel Gabay <daniel.gabay@intel.com>
Link: https://patch.msgid.link/20241229164246.4471cd3d8dba.Iab8409b22bf6f01d05571ecef1e97dd3c8b1cc75@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The changes ensure that there is a space between the `u8` type and the
`*` character as preferred by the guidelines.
This change is purely stylistic and do not affect the functionality
of the code.
Signed-off-by: Juan José Arboleda <soyjuanarbol@gmail.com>
Link: https://msgid.link/10b6d4945675cada713e819f7bd6782a66a1c0d2.1724103043.git.soyjuanarbol@gmail.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20241229164246.f09a200be4f8.Ia564ae1c59136bd3c2864ccfb3a244b3257dcd5f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Fix typo 'adderss' to 'address'.
Signed-off-by: Gan Jie <ganjie182@gmail.com>
Link: https://msgid.link/20241101143052.1531-1-ganjie182@gmail.com
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20241229164246.ad8978ee5673.I388e314a4be8333192b3994f43efa5dbd3ac715d@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Smatch reported that we dereference the data pointer to calculate the
expected length before we check it's not NULL. While this is true (and
hence needs to be fixed), this will never happen because the data
pointer comes from struct iwl_rx_packet object which has the following
layout:
struct iwl_rx_packet {
__le32 len_n_flags;
struct iwl_cmd_header hdr;
u8 data[];
} __packed;
So, if the pointer to iwl_rx_packet is valid, data will be valid as
well.
Remove the NULL pointer check on 'data' to avoid confusing smatch.
Also remove the check from similar functions in the same flow that were
cargo cult copy-pasted.
Fixes: 4635e6eaa0fe ("wifi: iwlwifi: mvm: support new versions of the wowlan APIs")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202411210812.0eLaonw3-lkp@intel.com/
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241229164246.c8ce6e041e4b.I4dc19289e3f3807386768c846e08be3ea322cd15@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This should be comparing the AP STA, not the deflink firmware STA
ID. Correct the implementation so that statistics can be requested
for the AP, but not for other stations that may end up with the
firmware STA ID matching 0 in the deflink, or so.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241229164246.08b05aca37cf.Iba1a6a637a758691f710dc4f3f03bd1d960fb087@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Remove unused fields from the transport API.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241229164246.1d04ce18a0ec.Ibfac364163b55b52196d30ff2b43945c5aa804a9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When handling TX_CMD notification, for mgmt frames tid is equal
to IWL_MAX_TID_COUNT, so with the current logic we'll count
that as MPDU, fix that.
Fixes: ec0d43d26f2c ("wifi: iwlwifi: mvm: Activate EMLSR based on traffic volume")
Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241229164246.80b119bb5d08.I31b1e8ba25cce15819225e5ac80332e4eaa20c13@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The API isn't really MVM specific, it's just the firmware
API. Remove the "MVM_" from the name here as well, as we've
already done in many other places.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241229164246.66e17791c392.I6998e263973c26c1e22b4f470b974a519011b29a@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When iterating over the links of a vif, we need to make sure that the
pointer is valid (in other words - that the link exists) before
dereferncing it.
Use for_each_vif_active_link that also does the check.
Fixes: 2b7ee1a10a72 ("wifi: iwlwiif: mvm: handle the new BT notif")
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20241229164246.31d41f7d3eab.I7fb7036a0b187c1636b01970207259cb2327952c@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Start supporting API version 96 for new devices.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.4028b66f4563.I5d5caf4bffeabcab72a69c2b31445e7bee4a94b6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
My recent restart related work has made this race more likely
to happen and we've now noticed it, but it seems that it was
always possible. The race is that the add stream work can be
scheduled just before a restart is scheduled and then execute
before the restart, accessing the device while it's doing the
restart and not accessible.
To fix this, check if the device is restarting and abort the
work in that case. Reschedule it after the restart as well.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.9c30af039b4d.I1a32936776f8ba5e83dda0a68ffc2722d9d37950@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This debugfs hook really belongs to the firmware handling code and then
we can use it across different op_modes.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.d31f5994c6a6.Ibe3bc7a25e2bbf7a575287e19db58833bb3e6b9e@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
By convention the newest version of a command/notification structure is
named with out the _ver_# suffix. Apply to stored_beacon_notif.
Signed-off-by: Yedidya Benshimol <yedidya.ben.shimol@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.e2140aa3c65b.Ie851bdda6df02dcc352bf765a3ec6bdac45c65a2@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
A caller of iwl_acpi_get_dsm_object must free the returned object.
iwl_acpi_get_dsm_integer returns immediately without freeing
it if the expected size is more than 8 bytes. Fix that.
Note that with the current code this will never happen, since the caller
of iwl_acpi_get_dsm_integer already checks that the expected size if
either 1 or 4 bytes, so it can't exceed 8 bytes.
While at it, print the DSM value instead of the return value, as this
was the intention in the first place.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.bf61eaab99f8.Ibdc5df02f885208c222456d42c889c43b7e3b2f7@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Usually each struct that represent an API needs to have a comment
specifying all the versions of the API that this struct corresponds to.
iwl_tx_cmd_gen3 was long supporting also version 10. Say that.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20241228223206.83d681dc9cf7.I355270fb20b23978d9402cb70caf52a0108b8cd4@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Adding channel_load_not_by_us in the mvm phy context.
Signed-off-by: Somashekhar(Som) <somashekhar.puttagangaiah@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.7c7f3ebebadf.Ifac005cf1e3b02cba0861eb19bfd8099957faad9@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In blank OTP, we get the CRF type from a peripheral register,
support it for PE CRF
Signed-off-by: Somashekhar(Som) <somashekhar.puttagangaiah@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.a8899d585a6e.I9d9b223c75d5370811220291c62c364967c0acc3@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Enter EMLSR only when two bands are different.
EMLSR should be allowed when one of the link is LB.
Signed-off-by: Somashekhar(Som) <somashekhar.puttagangaiah@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.ec659168eeb7.I403f61f0e827c14cf2b245f48e1736559f17c476@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Refactor some parts of the image loading to be able to
extend the code for external FSEQ image loading more
easily.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.224ac6599bbe.Iadc1974d633eec09797522f7d3fa543ea18bd7f6@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
These are not mvm specific.
Signed-off-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.1b235ec5354e.If99a38b1f0d7e42ea4ee3907e6c395846c4aa9b0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The register 0x000 is now really boot control, and some
of the old bit names were (even for old hardware) not
reflecting the names on the hardware side; rename them
in the driver to align the naming.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.6f25be160619.I3ffc9601e99dc414a9ae54a0d90c9d20c0253da5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This is really where it belongs.
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241228223206.98bdc5e62828.Iee7a8365dd63ebf580d324f90e1e04466d8ef5d5@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This is needed also for more opmodes, and is really not opmode dependent.
Make it a iwlwifi util.
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Link: https://patch.msgid.link/20241228223206.a36373eefbf2.Ib1f305b78508c98934f6000720d6455c88a860cb@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Since iwl_fw_error_collect() is now always called with the sync
argument set to true, to collect data synchronously, remove the
argument from it entirely.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.08f515513e88.I780a557743ca7f029f46a1cc75d0799542e39d83@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In order to later add the ability to do deeper resets of the
device when it crashes, first restructure the firmware error
handling. Instead of having just a single nic_error() method
that handles all, split it:
- nic_error() just handles and prints the error itself,
- dump_error() synchronously creates an error dump, and
- sw_reset() will be called to request doing a SW reset.
This changes the architecture so that the transport is now
responsible for deciding how to do the reset, and therefore
the handling of reprobe if error occurs during reconfig
moves there, which necessitates adding a method there that
notifies the transport that the recovery was completed.
Actually introducing the model under which deeper resets can
be done will be in future patches.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.6d4f741ae907.I96a9243e7877808ed6d1bff6967c15d6c24882f0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When some channel context manipulations fail, the device
is going to be restarted to try to recover. Make this go
through a real FW restart via an NMI so the transport is
aware of it and can later handle escalation, and to make
it easier to restructure the code later.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.96b732029d20.I2e729f402db58a76cea620b6f62a02da49a10b48@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Except for some special handling in DVM, error dump and some
message behaviour, cmd_queue_full and nic_error are equivalent
now. Unify by giving a special error type, so DVM can continue
to differentiate.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.0222183504aa.Ie29cef75fbd91b64a43619bc36bd5b29c5b9f957@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Approximately three years ago, in commit ddb6b76b6f96
("iwlwifi: yoyo: support TLV-based firmware reset"), the code
was (likely erroneously) changed to no longer treat error
interrupts as firmware errors. As a result, this meant that
the fw_restart counter was only applied in case of command
queue being stuck, which never seems to happen. Also, there's
no longer any way to set the mvm->fw_restart to a value that
doesn't match exactly the module parameter behaviour.
Instead of trying to fix this, simply remove the logic that
limits the number of restarts, it's clearly unused.
However, restore the logic that restart isn't unconditional,
by checking the module parameter.
Since the "fw_error" argument to iwl_mvm_nic_restart() is now
always true (except in the "never happens" case of CMD queue
stuck), just remove it too and treat command queue stuck the
same way as everything else.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.b0489daf323c.I0cd3233b2214c5f06e059f746041b19d08647e40@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Now that the retry loop only happens when timeouts occur
and firmware errors are different, we no longer need the
STARTING state with all the infrastructure for it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.c55d73436521.I08e9f6a71d56f86872bca4d4e3048faa113a7120@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
We had reverted the retry loop removal because of an issue
with PNVM loading, but that issue manifests as timeouts.
Since the retries aren't needed in other cases, only do
them when there were timeouts while starting, not other
errors.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.98201c79f66d.I5d7e12b219d533c6a77741ec5863984d35711f48@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
We had reverted the retry loop removal because of an issue
with PNVM loading, but that issue manifests as timeouts.
Since the retry loops aren't needed in other cases, only
do them when there were timeouts while loading, not other
errors.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.a21bf40b0fd3.I70166e460906d6d183359889d7543b9c587b7182@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In order to restrict the retry loops for timeouts, first
pass the error code up using ERR_PTR(). This of course
requires all existing functions to be updated accordingly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.3fe5031d5784.I7307996c91dac69619ff9c616b8a077423fac19f@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
These comments have kernel-doc markup and were meant to
be handled as such, add the right /** marker to them.
Add missing entries where needed.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.c5c04b641479.I702b8122d307a0d9d09df038cda10be063f7f2d7@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
For certain platforms, it may necessary to use the STEP in URM
(ultra reliable mode.) Read the necessary flags from the BIOS
(ACPI or UEFI) and indicate the chosen mode to the firmware in
the context info. Whether or not URM really was configured is
already read back later, to adjust capabilities accordingly.
Signed-off-by: Somashekhar(Som) <somashekhar.puttagangaiah@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.b30024905de3.If3c578af2c15f8005bbe71499bc4091348ed7bb0@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
This variable exists for the "common" (WiFi/BT) GUID, not the
WiFi-only GUID. Fix that by passing the GUID to the function.
A short-cut for the wifi-only version remains so not all code
must be updated.
However, rename the GUID defines to be clearer.
Fixes: 09b4c35d73a5 ("wifi: iwlwifi: mvm: Support STEP equalizer settings from BIOS.")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Daniel Gabay <daniel.gabay@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20241227095718.89a5ad921b6d.Idae95a70ff69d2ba1b610e8eced826961ce7de98@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
[BUG]
When running btrfs with block size (4K) smaller than page size (64K,
aarch64), there is a very high chance to crash the kernel at
generic/750, with the following messages:
(before the call traces, there are 3 extra debug messages added)
BTRFS warning (device dm-3): read-write for sector size 4096 with page size 65536 is experimental
BTRFS info (device dm-3): checking UUID tree
hrtimer: interrupt took 5451385 ns
BTRFS error (device dm-3): cow_file_range failed, root=4957 inode=257 start=1605632 len=69632: -28
BTRFS error (device dm-3): run_delalloc_nocow failed, root=4957 inode=257 start=1605632 len=69632: -28
BTRFS error (device dm-3): failed to run delalloc range, root=4957 ino=257 folio=1572864 submit_bitmap=8-15 start=1605632 len=69632: -28
------------[ cut here ]------------
WARNING: CPU: 2 PID: 3020984 at ordered-data.c:360 can_finish_ordered_extent+0x370/0x3b8 [btrfs]
CPU: 2 UID: 0 PID: 3020984 Comm: kworker/u24:1 Tainted: G OE 6.13.0-rc1-custom+ #89
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
pc : can_finish_ordered_extent+0x370/0x3b8 [btrfs]
lr : can_finish_ordered_extent+0x1ec/0x3b8 [btrfs]
Call trace:
can_finish_ordered_extent+0x370/0x3b8 [btrfs] (P)
can_finish_ordered_extent+0x1ec/0x3b8 [btrfs] (L)
btrfs_mark_ordered_io_finished+0x130/0x2b8 [btrfs]
extent_writepage+0x10c/0x3b8 [btrfs]
extent_write_cache_pages+0x21c/0x4e8 [btrfs]
btrfs_writepages+0x94/0x160 [btrfs]
do_writepages+0x74/0x190
filemap_fdatawrite_wbc+0x74/0xa0
start_delalloc_inodes+0x17c/0x3b0 [btrfs]
btrfs_start_delalloc_roots+0x17c/0x288 [btrfs]
shrink_delalloc+0x11c/0x280 [btrfs]
flush_space+0x288/0x328 [btrfs]
btrfs_async_reclaim_data_space+0x180/0x228 [btrfs]
process_one_work+0x228/0x680
worker_thread+0x1bc/0x360
kthread+0x100/0x118
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1605632 OE len=16384 to_dec=16384 left=0
BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1622016 OE len=12288 to_dec=12288 left=0
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
BTRFS critical (device dm-3): bad ordered extent accounting, root=4957 ino=257 OE offset=1634304 OE len=8192 to_dec=4096 left=0
CPU: 1 UID: 0 PID: 3286940 Comm: kworker/u24:3 Tainted: G W OE 6.13.0-rc1-custom+ #89
Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022
Workqueue: btrfs_work_helper [btrfs] (btrfs-endio-write)
pstate: 404000c5 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : process_one_work+0x110/0x680
lr : worker_thread+0x1bc/0x360
Call trace:
process_one_work+0x110/0x680 (P)
worker_thread+0x1bc/0x360 (L)
worker_thread+0x1bc/0x360
kthread+0x100/0x118
ret_from_fork+0x10/0x20
Code: f84086a1 f9000fe1 53041c21 b9003361 (f9400661)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 2-3
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: 0x275bb9540000 from 0xffff800080000000
PHYS_OFFSET: 0xffff8fbba0000000
CPU features: 0x100,00000070,00801250,8201720b
[CAUSE]
The above warning is triggered immediately after the delalloc range
failure, this happens in the following sequence:
- Range [1568K, 1636K) is dirty
1536K 1568K 1600K 1636K 1664K
| |/////////|////////| |
Where 1536K, 1600K and 1664K are page boundaries (64K page size)
- Enter extent_writepage() for page 1536K
- Enter run_delalloc_nocow() with locked page 1536K and range
[1568K, 1636K)
This is due to the inode having preallocated extents.
- Enter cow_file_range() with locked page 1536K and range
[1568K, 1636K)
- btrfs_reserve_extent() only reserved two extents
The main loop of cow_file_range() only reserved two data extents,
Now we have:
1536K 1568K 1600K 1636K 1664K
| |<-->|<--->|/|///////| |
1584K 1596K
Range [1568K, 1596K) has an ordered extent reserved.
- btrfs_reserve_extent() failed inside cow_file_range() for file offset
1596K
This is already a bug in our space reservation code, but for now let's
focus on the error handling path.
Now cow_file_range() returned -ENOSPC.
- btrfs_run_delalloc_range() do error cleanup <<< ROOT CAUSE
Call btrfs_cleanup_ordered_extents() with locked folio 1536K and range
[1568K, 1636K)
Function btrfs_cleanup_ordered_extents() normally needs to skip the
ranges inside the folio, as it will normally be cleaned up by
extent_writepage().
Such split error handling is already problematic in the first place.
What's worse is the folio range skipping itself, which is not taking
subpage cases into consideration at all, it will only skip the range
if the page start >= the range start.
In our case, the page start < the range start, since for subpage cases
we can have delalloc ranges inside the folio but not covering the
folio.
So it doesn't skip the page range at all.
This means all the ordered extents, both [1568K, 1584K) and
[1584K, 1596K) will be marked as IOERR.
And these two ordered extents have no more pending ios, they are marked
finished, and *QUEUED* to be deleted from the io tree.
- extent_writepage() do error cleanup
Call btrfs_mark_ordered_io_finished() for the range [1536K, 1600K).
Although ranges [1568K, 1584K) and [1584K, 1596K) are finished, the
deletion from io tree is async, it may or may not happen at this
time.
If the ranges have not yet been removed, we will do double cleaning on
those ranges, triggering the above ordered extent warnings.
In theory there are other bugs, like the cleanup in extent_writepage()
can cause double accounting on ranges that are submitted asynchronously
(compression for example).
But that's much harder to trigger because normally we do not mix regular
and compression delalloc ranges.
[FIX]
The folio range split is already buggy and not subpage compatible, it
was introduced a long time ago where subpage support was not even considered.
So instead of splitting the ordered extents cleanup into the folio range
and out of folio range, do all the cleanup inside writepage_delalloc().
- Pass @NULL as locked_folio for btrfs_cleanup_ordered_extents() in
btrfs_run_delalloc_range()
- Skip the btrfs_cleanup_ordered_extents() if writepage_delalloc()
failed
So all ordered extents are only cleaned up by
btrfs_run_delalloc_range().
- Handle the ranges that already have ordered extents allocated
If part of the folio already has ordered extent allocated, and
btrfs_run_delalloc_range() failed, we also need to cleanup that range.
Now we have a concentrated error handling for ordered extents during
btrfs_run_delalloc_range().
Fixes: d1051d6ebf8e ("btrfs: Fix error handling in btrfs_cleanup_ordered_extents")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
for-6.14/block
Pull NVMe updates from Keith:
"nvme updates for Linux 6.14
- Target support for PCI-Endpoint transport (Damien)
- TCP IO queue spreading fixes (Sagi, Chaitanya)
- Target handling for "limited retry" flags (Guixen)
- Poll type fix (Yongsoo)
- Xarray storage error handling (Keisuke)
- Host memory buffer free size fix on error (Francis)"
* tag 'nvme-6.14-2025-01-12' of git://git.infradead.org/nvme: (25 commits)
nvme-pci: use correct size to free the hmb buffer
nvme: Add error path for xa_store in nvme_init_effects
nvme-pci: fix comment typo
Documentation: Document the NVMe PCI endpoint target driver
nvmet: New NVMe PCI endpoint function target driver
nvmet: Implement arbitration feature support
nvmet: Implement interrupt config feature support
nvmet: Implement interrupt coalescing feature support
nvmet: Implement host identifier set feature support
nvmet: Introduce get/set_feature controller operations
nvmet: Do not require SGL for PCI target controller commands
nvmet: Add support for I/O queue management admin commands
nvmet: Introduce nvmet_sq_create() and nvmet_cq_create()
nvmet: Introduce nvmet_req_transfer_len()
nvmet: Improve nvmet_alloc_ctrl() interface and implementation
nvme: Add PCI transport type
nvmet: Add drvdata field to struct nvmet_ctrl
nvmet: Introduce nvmet_get_cmd_effects_admin()
nvmet: Export nvmet_update_cc() and nvmet_cc_xxx() helpers
nvmet: Add vendor_id and subsys_vendor_id subsystem attributes
...
|
|
Rename the macro so it's obvious what it means.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
lock_delalloc_folios()
Same pattern in both functions, we really only use index, start_index is
redundant.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There are only 2 WAIT_* values left for wait parameter, we can encode
this to the function name if the waiting functionality is split.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Last use was in the readahead code that got removed by f26c9238602856
("btrfs: remove reada infrastructure").
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Another conversion to folio API, use the folio locking directly instead
of back and forth page <-> folio conversions.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The check function pattern is supposed to return true/false, currently
there's only one error code.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Use the folio API, remove page_folio/folio_page conversions.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Continue page to folio updates, sync what the function does with it's
name.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|