Age | Commit message (Collapse) | Author |
|
Remove dev_err() messages after platform_get_irq*() failures.
platform_get_irq() already prints an error.
Generated by: scripts/coccinelle/api/platform_get_irq.cocci
Signed-off-by: Yihao Han <hanyihao@vivo.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220303123431.3170-1-hanyihao@vivo.com
|
|
Several WMI queries leverage hp_wmi_read_int function to read their
data. hp_wmi_read_int function was corrected in a previous patch.
Now, this function invokes hp_wmi_perform_query with input parameter
of size zero and the output buffer of size 4.
WMI commands calling hp_wmi_perform_query with input buffer size value
of zero are listed below.
HPWMI_DISPLAY_QUERY
HPWMI_HDDTEMP_QUERY
HPWMI_ALS_QUERY
HPWMI_HARDWARE_QUERY
HPWMI_WIRELESS_QUERY
HPWMI_BIOS_QUERY
HPWMI_FEATURE_QUERY
HPWMI_HOTKEY_QUERY
HPWMI_FEATURE2_QUERY
HPWMI_WIRELESS2_QUERY
HPWMI_POSTCODEERROR_QUERY
HPWMI_THERMAL_PROFILE_QUERY
HPWMI_FAN_SPEED_MAX_GET_QUERY
Invoking those WMI commands with an input buffer size greater
than zero will cause error 0x05 to be returned.
All WMI commands executed by the driver were reviewed and changes
were made to ensure the expected input and output buffer size match
the WMI specification.
Changes were validated on a HP ZBook Workstation notebook,
HP EliteBook x360, and HP EliteBook 850 G8. Additional
validation was included in the test process to ensure no other
commands were incorrectly handled.
Signed-off-by: Jorge Lopez <jorge.lopez2@hp.com>
Link: https://lore.kernel.org/r/20220310210853.28367-4-jorge.lopez2@hp.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The purpose of this patch is to introduce a fix and removal of the
current hack when determining tablet mode status.
Determining the tablet mode status requires reading Byte 0 bit 2 as
reported by HPWMI_HARDWARE_QUERY. The investigation identified the
failure was rooted in two areas: HPWMI_HARDWARE_QUERY failure (0x05)
and reading Byte 0, bit 2 only to determine the table mode status.
HPWMI_HARDWARE_QUERY WMI failure also rendered the dock state value
invalid.
The latest changes use SMBIOS Type 3 (chassis type) and WMI Command
0x40 (device_mode_status) information to determine if the device is
in tablet mode or not.
hp_wmi_hw_state function was split into two functions;
hp_wmi_get_dock_state and hp_wmi_get_tablet_mode. The new functions
separate how dock_state and tablet_mode is handled in a cleaner
manner.
All changes were validated on a HP ZBook Workstation notebook,
HP EliteBook x360, and HP EliteBook 850 G8.
Signed-off-by: Jorge Lopez <jorge.lopez2@hp.com>
Link: https://lore.kernel.org/r/20220310210853.28367-3-jorge.lopez2@hp.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
The purpose of this patch is to introduce a fix to hp_wmi_read_int()
and eliminate failure error (0x05). Several WMI queries leverage
hp_wmi_read_int() to read their data and were failing with error 0x05.
HPWMI_DISPLAY_QUERY
HPWMI_HDDTEMP_QUERY
HPWMI_ALS_QUERY
HPWMI_HARDWARE_QUERY
HPWMI_WIRELESS_QUERY
HPWMI_POSTCODEERROR_QUERY
The failure occurs because hp_wmi_read_int() calls
hp_wmi_perform_query() with input parameter of size greater than zero.
Invoking those WMI commands with an input buffer size greater than
zero causes the command to be rejected and error 0x05 be returned.
All changes were validated on a HP ZBook Workstation notebook,
HP EliteBook x360, and HP EliteBook 850 G8.
Signed-off-by: Jorge Lopez <jorge.lopez2@hp.com>
Link: https://lore.kernel.org/r/20220310210853.28367-2-jorge.lopez2@hp.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
Test that errors occur if key protection disallows access, including
tests for storage and fetch protection override. Perform tests for both
logical vcpu and absolute vm ioctls.
Also extend the existing tests to the vm ioctl.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/20220308125841.3271721-6-scgl@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
Do not just test the actual copy, but also that success is indicated
when using the check only flag.
Add copy test with storage key checking enabled, including tests for
storage and fetch protection override.
These test cover both logical vcpu ioctls as well as absolute vm ioctls.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/20220308125841.3271721-5-scgl@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
The stages synchronize guest and host execution.
This helps the reader and constraits the execution of the test -- if the
observed staging differs from the expected the test fails.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/20220308125841.3271721-4-scgl@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
In order to achieve good test coverage we need to be able to invoke the
MEM_OP ioctl with all possible parametrizations.
However, for a given test, we want to be concise and not specify a long
list of default values for parameters not relevant for the test, so the
readers attention is not needlessly diverted.
Add a macro that enables this and convert the existing test to use it.
The macro emulates named arguments and hides some of the ioctl's
redundancy, e.g. sets the key flag if an access key is specified.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/20220308125841.3271721-3-scgl@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
Split success case/copy test from error test, making them independent.
This means they do not share state and are easier to understand.
Also, new test can be added in the same manner without affecting the old
ones. In order to make that simpler, introduce functionality for the
setup of commonly used variables.
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/20220308125841.3271721-2-scgl@linux.ibm.com
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
When handling the SCK instruction, the kvm lock is taken, even though
the vcpu lock is already being held. The normal locking order is kvm
lock first and then vcpu lock. This is can (and in some circumstances
does) lead to deadlocks.
The function kvm_s390_set_tod_clock is called both by the SCK handler
and by some IOCTLs to set the clock. The IOCTLs will not hold the vcpu
lock, so they can safely take the kvm lock. The SCK handler holds the
vcpu lock, but will also somehow need to acquire the kvm lock without
relinquishing the vcpu lock.
The solution is to factor out the code to set the clock, and provide
two wrappers. One is called like the original function and does the
locking, the other is called kvm_s390_try_set_tod_clock and uses
trylock to try to acquire the kvm lock. This new wrapper is then used
in the SCK handler. If locking fails, -EAGAIN is returned, which is
eventually propagated to userspace, thus also freeing the vcpu lock and
allowing for forward progress.
This is not the most efficient or elegant way to solve this issue, but
the SCK instruction is deprecated and its performance is not critical.
The goal of this patch is just to provide a simple but correct way to
fix the bug.
Fixes: 6a3f95a6b04c ("KVM: s390: Intercept SCK instruction")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Link: https://lore.kernel.org/r/20220301143340.111129-1-imbrenda@linux.ibm.com
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
Fix "no previous prototype" W=1 warning by making the function
vchiq_platform_get_arm_state() static.
While at it, realign the function declaration in one line and reposition
the asterisk symbol to fulfill the 'foo *bar' syntax.
Signed-off-by: Gaston Gonzalez <gascoar@gmail.com>
Link: https://lore.kernel.org/r/216ad30d674b80e0051ecc233ac26ddb1d3e0e75.1646255044.git.gascoar@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix the cpuintc and fixedregulator dtc warnings:
Warning (unit_address_vs_reg): /cpuintc@0: node has a unit name, but no reg property
Warning (unit_address_vs_reg): /fixedregulator@0: node has a unit name, but no reg property
Warning (unit_address_vs_reg): /fixedregulator@1: node has a unit name, but no reg property
Warning (unique_unit_address): /cpuintc@0: duplicate unit-address (also used in node /fixedregulator@0)
Remove the unnecessary status = "okay" property from the xhci node.
Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20220312091832.6269-1-arinc.unal@arinc9.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix the GB-PC2 devicetree. Refer to the schematics of the device for more
information.
GB-PC2 devicetree fixes:
- Include mt7621.dtsi instead of gbpc1.dts. Add the missing definitions.
- Remove gpio-leds node as the system LED is not wired to anywhere on
the board and the power LED is directly wired to GND.
- Remove uart3 pin group from gpio-pinmux node as it's not used as GPIO.
- Use reg 7 for the external phy to be on par with
Documentation/devicetree/bindings/net/dsa/mt7530.txt.
- Use the status value "okay".
Link: https://github.com/ngiger/GnuBee_Docs/blob/master/GB-PCx/Documents/GB-PC2_V1.1_schematic.pdf
Reviewed-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20220311090320.3068-2-arinc.unal@arinc9.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Fix LED and pinctrl definitions on the GB-PC1 devicetree. Refer to the
schematics of the device for more information.
LED fixes:
- Change GPIO6 LED label from system to power as GPIO6 is connected to
PLED.
- Add default-on default-trigger to power LED.
- Change GPIO8 LED label from status to system as GPIO8 is connected to
SYS_LED.
- Add disk-activity default-trigger to system LED.
- Switch to the color:function naming scheme.
- Remove lan1 and lan2 LEDs as they don't exist.
Pinctrl fixes:
- Claim state_default node under pinctrl node.
- Change pinctrl0 node name to state-default.
- Change gpio node name to gpio-pinmux to respect
Documentation/devicetree/bindings/pinctrl/ralink,rt2880-pinmux.yaml.
- Sort pin groups alphabetically.
Misc fixes:
- Fix formatting.
- Use the status value "okay".
- Define hexadecimal addresses in lower case.
- Make hexadecimal addresses for memory easier to read.
Link: https://github.com/ngiger/GnuBee_Docs/blob/master/GB-PCx/Documents/GB-PC1_V1.0_Schematic.pdf
Tested-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
Link: https://lore.kernel.org/r/20220311090320.3068-1-arinc.unal@arinc9.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Various spelling mistakes in comments.
Detected with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220314115354.144023-8-Julia.Lawall@inria.fr
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In arch/mips/include/asm/prom.h, it should use "!CONFIG_USE_OF"
after #else and #endif.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
There exists many same definitions of device_tree_init() for various
platforms, add a weak function in arch/mips/kernel/prom.c to clean
up the related code.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
In device_tree_init(), unflatten_and_copy_device_tree() checks
initial_boot_params, so remove the redundant check.
drivers/of/fdt.c
void __init unflatten_and_copy_device_tree(void)
{
int size;
void *dt;
if (!initial_boot_params) {
pr_warn("No valid device tree found, continuing without\n");
return;
}
...
}
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
We need the staging fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
pgd page is freed by generic implementation pgd_free() since commit
f9cb654cb550 ("asm-generic: pgalloc: provide generic pgd_free()"),
however, there are scenarios that the system uses more than one page as
the pgd table, in such cases the generic implementation pgd_free() won't
be applicable anymore. For example, when PAGE_SIZE_4KB is enabled and
MIPS_VA_BITS_48 is not enabled in a 64bit system, the macro "PGD_ORDER"
will be set as "1", which will cause allocating two pages as the pgd
table. Well, at the same time, the generic implementation pgd_free()
just free one pgd page, which will result in the memory leak.
The memory leak can be easily detected by executing shell command:
"while true; do ls > /dev/null; grep MemFree /proc/meminfo; done"
Fixes: f9cb654cb550 ("asm-generic: pgalloc: provide generic pgd_free()")
Signed-off-by: Yaliang Wang <Yaliang.Wang@windriver.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
__setup() handlers should return 1 to obsolete_checksetup() in
init/main.c to indicate that the boot option has been handled.
A return of 0 causes the boot option/value to be listed as an Unknown
kernel parameter and added to init's (limited) argument or environment
strings. Also, error return codes don't mean anything to
obsolete_checksetup() -- only non-zero (usually 1) or zero.
So return 1 from setup_kmac().
Fixes: 9e21c7e40b7e ("MIPS: RB532: Replace parse_mac_addr() with mac_pton().")
Fixes: 73b4390fb234 ("[MIPS] Routerboard 532: Support for base system")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
From: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Phil Sutter <n0-1@freewrt.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Daniel Walter <dwalter@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
Unfortunately, Clang did not have support for "sp" as a global register
definition, and was crashing after the addition of current_stack_pointer.
This has been fixed in Clang 14, but earlier Clang versions need to
avoid this code, so add a versioned test and revert back to the
open-coded asm instances. Fixes Clang build error:
fatal error: error in backend: Invalid register name global variable
Fixes: 200ed341b864 ("mips: Implement "current_stack_pointer"")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/lkml/YikTQRql+il3HbrK@dev-arch.thelio-3990X
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Yanteng Si <siyanteng01@gmail.com>
Cc: linux-mips@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
As done with other image addresses in other architectures, use an
explicit flexible array instead of "address of char", which can trip
bounds checking done by the compiler. Found when building with
-Warray-bounds:
In file included from ./include/linux/byteorder/little_endian.h:5,
from ./arch/mips/include/uapi/asm/byteorder.h:15,
from ./arch/mips/include/asm/bitops.h:21,
from ./include/linux/bitops.h:33,
from ./include/linux/kernel.h:22,
from arch/mips/boot/compressed/decompress.c:13:
arch/mips/boot/compressed/decompress.c: In function 'decompress_kernel':
./include/asm-generic/unaligned.h:14:8: warning: array subscript -1 is outside array bounds of 'unsigned char[1]' [-Warray-bounds]
14 | __pptr->x; \
| ~~~~~~^~~
./include/uapi/linux/byteorder/little_endian.h:35:51: note: in definition of macro '__le32_to_cpu'
35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
| ^
./include/asm-generic/unaligned.h:32:21: note: in expansion of macro '__get_unaligned_t'
32 | return le32_to_cpu(__get_unaligned_t(__le32, p));
| ^~~~~~~~~~~~~~~~~
arch/mips/boot/compressed/decompress.c:29:37: note: while referencing '__image_end'
29 | extern unsigned char __image_begin, __image_end;
| ^~~~~~~~~~~
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-mips@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
The of_find_compatible_node() function returns a node pointer with
refcount incremented, We should use of_node_put() on it when done
Add the missing of_node_put() to release the refcount.
Fixes: 2121aa3e2312 ("mips: cdmm: Add mti,mips-cdmm dtb node support")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Acked-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
It's counter-intuitive (and wrong) to put the block group _before_ the
final usage in submit_eb_page. Fix it by re-ordering the call to
btrfs_put_block_group after its final reference. Also fix a minor typo
in 'implies'
Fixes: be1a1d7a5d24 ("btrfs: zoned: finish fully written block group")
CC: stable@vger.kernel.org # 5.16+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Syzbot reported a possible use-after-free in printing information
in device_list_add.
Very similar with the bug fixed by commit 0697d9a61099 ("btrfs: don't
access possibly stale fs_info data for printing duplicate device"),
but this time the use occurs in btrfs_info_in_rcu.
Call Trace:
kasan_report.cold+0x83/0xdf mm/kasan/report.c:459
btrfs_printk+0x395/0x425 fs/btrfs/super.c:244
device_list_add.cold+0xd7/0x2ed fs/btrfs/volumes.c:957
btrfs_scan_one_device+0x4c7/0x5c0 fs/btrfs/volumes.c:1387
btrfs_control_ioctl+0x12a/0x2d0 fs/btrfs/super.c:2409
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl fs/ioctl.c:860 [inline]
__x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Fix this by modifying device->fs_info to NULL too.
Reported-and-tested-by: syzbot+82650a4e0ed38f218363@syzkaller.appspotmail.com
CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In a previous patch ("btrfs: extend locking to all space_info members
accesses") the locking for the space_info members was extended in
btrfs_preempt_reclaim_metadata_space because not all the member
accesses that needed locks were actually locked (bytes_pinned et al).
It was then suggested to also add a call to lockdep_assert_held to
need_preemptive_reclaim. This function also works with space_info
members. As of now, it has only two call sites which both hold the lock.
Suggested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
There is a bug report that a bitflip in the transid part of an extent
buffer makes btrfs to reject certain tree blocks:
BTRFS error (device dm-0): parent transid verify failed on 1382301696 wanted 262166 found 22
[CAUSE]
Note the failed transid check, hex(262166) = 0x40016, while
hex(22) = 0x16.
It's an obvious bitflip.
Furthermore, the reporter also confirmed the bitflip is from the
hardware, so it's a real hardware caused bitflip, and such problem can
not be detected by the existing tree-checker framework.
As tree-checker can only verify the content inside one tree block, while
generation of a tree block can only be verified against its parent.
So such problem remain undetected.
[FIX]
Although tree-checker can not verify it at write-time, we still have a
quick (but not the most accurate) way to catch such obvious corruption.
Function csum_one_extent_buffer() is called before we submit metadata
write.
Thus it means, all the extent buffer passed in should be dirty tree
blocks, and should be newer than last committed transaction.
Using that we can catch the above bitflip.
Although it's not a perfect solution, as if the corrupted generation is
higher than the correct value, we have no way to catch it at all.
Reported-by: Christoph Anton Mitterer <calestyo@scientia.org>
Link: https://lore.kernel.org/linux-btrfs/2dfcbc130c55cc6fd067b93752e90bd2b079baca.camel@scientia.org/
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu@sus,ree.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There is one oddball error handling of btrfs_read_buffer():
ret = btrfs_read_buffer(tmp, gen, parent_level - 1, &first_key);
if (!ret) {
*eb_ret = tmp;
return 0;
}
free_extent_buffer(tmp);
btrfs_release_path(p);
return -EIO;
While all other call sites check the error first. Unify the behavior.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We had an error handling pattern for read_tree_block() like this:
eb = read_tree_block();
if (IS_ERR(eb)) {
/*
* Handling error here
* Normally ended up with return or goto out.
*/
} else if (!extent_buffer_uptodate(eb)) {
/*
* Different error handling here
* Normally also ended up with return or goto out;
*/
}
This is fine, but if we want to add extra check for each
read_tree_block(), the existing if-else-if is not that expandable and
will take reader some seconds to figure out there is no extra branch.
Here we change it to a more common way, without the extra else:
eb = read_tree_block();
if (IS_ERR(eb)) {
/*
* Handling error here
*/
return eb or goto out;
}
if (!extent_buffer_uptodate(eb)) {
/*
* Different error handling here
*/
return eb or goto out;
}
This also removes some oddball call sites which uses some creative way
to check error.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
__btrfs_free_extent() does all of the hard work of updating the extent
ref items, and then at the end if we dropped the extent completely it
does the cleanup accounting work. We're going to only want to do that
work for metadata with extent tree v2, so extract this bit into its own
helper.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This is a remnant of the work I did for qgroups a long time ago to only
run for a block when we had dropped the last ref. We haven't done that
for years, but the code remains. Drop this remnant.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We duplicate this logic for both data and metadata, at this point we've
already done our type specific extent root operations, this is just
doing the accounting and removing the space from the free space tree.
Extract this common logic out into a helper.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Switch this to an ASSERT() and return the error in the normal case.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
During log replay there is this pattern of running delayed items after
every inode unlink. To avoid repeating this several times, move the
logic into an helper function and use it instead of calling
btrfs_unlink_inode() followed by btrfs_run_delayed_items().
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
bytes_pinned is always accessed under space_info->lock, except in
btrfs_preempt_reclaim_metadata_space, however the other members are
accessed under that lock. The reserved member of the rsv's are also
partially accessed under a lock and partially not. Move all these
accesses into the same lock to ensure consistency.
This could potentially race and lead to a flush instead of a commit but
it's not a big problem as it's only for preemptive flush.
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Niels Dossche <niels.dossche@ugent.be>
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There is a hung_task issue with running generic/068 on an SMR
device. The hang occurs while a process is trying to thaw the
filesystem. The process is trying to take sb->s_umount to thaw the
FS. The lock is held by fsstress, which calls btrfs_sync_fs() and is
waiting for an ordered extent to finish. However, as the FS is frozen,
the ordered extents never finish.
Having an ordered extent while the FS is frozen is the root cause of
the hang. The ordered extent is initiated from btrfs_relocate_chunk()
which is called from btrfs_reclaim_bgs_work().
This commit adds sb_*_write() around btrfs_relocate_chunk() call
site. For the usual "btrfs balance" command, we already call it with
mnt_want_file() in btrfs_ioctl_balance().
Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.13+
Link: https://github.com/naota/linux/issues/56
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently we disallow reflink and dedupe if the two files aren't on the
same vfsmount. However we really only need to disallow it if they're
not on the same super block. It is very common for btrfs to have a main
subvolume that is mounted and then different subvolumes mounted at
different locations. It's allowed to reflink between these volumes, but
the vfsmount check disallows this. Instead fix dedupe to check for the
same superblock, and simply remove the vfsmount check for reflink as it
already does the superblock check.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The sb check is already done in do_clone_file_range, and the mnt check
(which will hopefully go away in a subsequent patch) is done in
ioctl_file_clone(). Remove the check in our code and put an ASSERT() to
make sure it doesn't change underneath us.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We don't need a root here, we just need the btrfs_fs_info, we can just
get the specific roots we need from fs_info.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We're passing a root around here, but we only really need the fs_info,
so fix up btrfs_clean_one_deleted_snapshot() to take an fs_info instead,
and then fix up all the callers appropriately.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When a filesystem goes read-only due to an error, multiple errors tend
to be reported, some of which are knock-on failures. Logging fs_states,
in btrfs_handle_fs_error() and btrfs_printk() helps distinguish the
first error from subsequent messages which may only exist due to an
error state.
Under the new format, most initial errors will look like:
`BTRFS: error (device loop0) in ...`
while subsequent errors will begin with:
`error (device loop0: state E) in ...`
An initial transaction abort error will look like
`error (device loop0: state A) in ...`
and subsequent messages will contain
`(device loop0: state EA) in ...`
In addition to the error states we can also print other states that are
temporary, like remounting, device replace, or indicate a global state
that may affect functionality.
Now implemented:
E - filesystem error detected
A - transaction aborted
L - log tree errors
M - remounting in progress
R - device replace in progress
C - data checksums not verified (mounted with ignoredatacsums)
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Smatch complains about a possible dereference of a pointer that was not
initialized:
CC [M] fs/btrfs/reflink.o
CHECK fs/btrfs/reflink.c
fs/btrfs/reflink.c:533 btrfs_clone() error: potentially dereferencing uninitialized 'trans'.
This is because we are not dealing with the case where the type of a file
extent has an unexpected value (not regular, not prealloc and not inline),
in which case the transaction handle pointer is not initialized.
Such unexpected type should be impossible, except in case of some memory
corruption caused either by bad hardware or some software bug causing
something like a buffer overrun.
So ASSERT that if the extent type is neither regular nor prealloc, then
it must be inline. Bail out with -EUCLEAN and a warning in case it is
not. This silences smatch.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When reflinking an inline extent, we assert that its file offset is 0 and
that its uncompressed length is not greater than the sector size. We then
return an error if one of those conditions is not satisfied. However we
use a return statement, which results in returning from btrfs_clone()
without freeing the path and buffer that were allocated before, as well as
not clearing the flag BTRFS_INODE_NO_DELALLOC_FLUSH for the destination
inode.
Fix that by jumping to the 'out' label instead, and also add a WARN_ON()
for each condition so that in case assertions are disabled, we get to
known which of the unexpected conditions triggered the error.
Fixes: a61e1e0df9f321 ("Btrfs: simplify inline extent handling when doing reflinks")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When an inode has a last_reflink_trans matching the current transaction,
we have to take special care when logging its checksums in order to
avoid getting checksum items with overlapping ranges in a log tree,
which could result in missing checksums after log replay (more on that
in the changelogs of commit 40e046acbd2f36 ("Btrfs: fix missing data
checksums after replaying a log tree") and commit e289f03ea79bbc ("btrfs:
fix corrupt log due to concurrent fsync of inodes with shared extents")).
We also need to make sure a full fsync will copy all old file extent
items it finds in modified leaves, because they might have been copied
from some other inode.
However once we fsync an inode, we don't need to keep paying the price of
that extra special care in future fsyncs done in the same transaction,
unless the inode is used for another reflink operation or the full sync
flag is set on it (truncate, failure to allocate extent maps for holes,
and other exceptional and infrequent cases).
So after we fsync an inode reset its last_unlink_trans to zero. In case
another reflink happens, we continue to update the last_reflink_trans of
the inode, just as before. Also set last_reflink_trans to the generation
of the last transaction that modified the inode whenever we need to set
the full sync flag on the inode, just like when we need to load an inode
from disk after eviction.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Doing a full fsync may require processing many leaves of metadata, which
can take some time and result in a task monopolizing a cpu for too long.
So add a cond_resched() after processing a leaf when doing a full fsync,
while not holding any locks on any tree (a subvolume or a log tree).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When doing a full fsync, at copy_items(), we iterate over all extents and
then collect their checksums into a list. After copying all the extents to
the log tree, we then log all the previously collected checksums.
Before the previous patch in the series (subject "btrfs: stop copying old
file extents when doing a full fsync"), we had to do it this way, because
while we were iterating over the items in the leaf of the subvolume tree,
we were holding a write lock on a leaf of the log tree, so logging the
checksums for an extent right after we collected them could result in a
deadlock, in case the checksum items ended up in the same leaf.
However after the previous patch in the series we now do a first iteration
over all the items in the leaf of the subvolume tree before locking a path
in the log tree, so we can now log the checksums right after we have
obtained them. This avoids holding in memory all checksums for all extents
in the leaf while copying items from the source leaf to the log tree. The
amount of memory used to hold all checksums of the extents in a leaf can
be significant. For example if a leaf has 200 file extent items referring
to 1M extents, using the default crc32c checksums, would result in using
over 200K of memory (not accounting for the extra overhead of struct
btrfs_ordered_sum), with smaller or less extents it would be less, but
it could be much more with more extents per leaf and/or much larger
extents.
So change copy_items() to log the checksums for an extent after looking
them up, and then free their memory, as they are no longer necessary.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When logging an inode in full sync mode, we go over every leaf that was
modified in the current transaction and has items associated to our inode,
and then copy all those items into the log tree. This includes copying
file extent items that were created and added to the inode in past
transactions, which is useless and only makes use more leaf space in the
log tree.
It's common to have a file with many file extent items spanning many
leaves where only a few file extent items are new and need to be logged,
and in such case we log all the file extent items we find in the modified
leaves.
So change the full sync behaviour to skip over file extent items that are
not needed. Those are the ones that match the following criteria:
1) Have a generation older than the current transaction and the inode
was not a target of a reflink operation, as that can copy file extent
items from a past generation from some other inode into our inode, so
we have to log them;
2) Start at an offset within i_size - we must log anything at or beyond
i_size, otherwise we would lose prealloc extents after log replay.
The following script exercises a scenario where this happens, and it's
somehow close enough to what happened often on a SQL Server workload which
I had to debug sometime ago to fix an issue where a pattern of writes to
prealloc extents and fsync resulted in fsync failing with -EIO (that was
commit ea7036de0d36c4 ("btrfs: fix fsync failure and transaction abort
after writes to prealloc extents")). In that particular case, we had large
files that had random writes and were often truncated, which made the
next fsync be a full sync.
$ cat test.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
MKFS_OPTIONS="-O no-holes -R free-space-tree"
MOUNT_OPTIONS="-o ssd"
FILE_SIZE=$((1 * 1024 * 1024 * 1024)) # 1G
# FILE_SIZE=$((2 * 1024 * 1024 * 1024)) # 2G
# FILE_SIZE=$((512 * 1024 * 1024)) # 512M
mkfs.btrfs -f $MKFS_OPTIONS $DEV
mount $MOUNT_OPTIONS $DEV $MNT
# Create a file with many extents. Use direct IO to make it faster
# to create the file - using buffered IO we would have to fsync
# after each write (terribly slow).
echo "Creating file with $((FILE_SIZE / 4096)) extents of 4K each..."
xfs_io -f -d -c "pwrite -b 4K 0 $FILE_SIZE" $MNT/foobar
# Commit the transaction, so every extent after this is from an
# old generation.
sync
# Now rewrite only a few extents, which are all far spread apart from
# each other (e.g. 1G / 32M = 32 extents).
# After this only a few extents have a new generation, while all other
# ones have an old generation.
echo "Rewriting $((FILE_SIZE / (32 * 1024 * 1024))) extents..."
for ((i = 0; i < $FILE_SIZE; i += $((32 * 1024 * 1024)))); do
xfs_io -c "pwrite $i 4K" $MNT/foobar >/dev/null
done
# Fsync, the inode logged in full sync mode since it was never fsynced
# before.
echo "Fsyncing file..."
xfs_io -c "fsync" $MNT/foobar
umount $MNT
And the following bpftrace program was running when executing the test
script:
$ cat bpf-script.sh
#!/usr/bin/bpftrace
k:btrfs_log_inode
{
@start_log_inode[tid] = nsecs;
}
kr:btrfs_log_inode
/@start_log_inode[tid]/
{
@log_inode_dur[tid] = (nsecs - @start_log_inode[tid]) / 1000;
delete(@start_log_inode[tid]);
}
k:btrfs_sync_log
{
@start_sync_log[tid] = nsecs;
}
kr:btrfs_sync_log
/@start_sync_log[tid]/
{
$sync_log_dur = (nsecs - @start_sync_log[tid]) / 1000;
printf("btrfs_log_inode() took %llu us\n", @log_inode_dur[tid]);
printf("btrfs_sync_log() took %llu us\n", $sync_log_dur);
delete(@start_sync_log[tid]);
delete(@log_inode_dur[tid]);
exit();
}
With 512M test file, before this patch:
btrfs_log_inode() took 15218 us
btrfs_sync_log() took 1328 us
Log tree has 17 leaves and 1 node, its total size is 294912 bytes.
With 512M test file, after this patch:
btrfs_log_inode() took 14760 us
btrfs_sync_log() took 588 us
Log tree has a single leaf, its total size is 16K.
With 1G test file, before this patch:
btrfs_log_inode() took 27301 us
btrfs_sync_log() took 1767 us
Log tree has 33 leaves and 1 node, its total size is 557056 bytes.
With 1G test file, after this patch:
btrfs_log_inode() took 26166 us
btrfs_sync_log() took 593 us
Log tree has a single leaf, its total size is 16K
With 2G test file, before this patch:
btrfs_log_inode() took 50892 us
btrfs_sync_log() took 3127 us
Log tree has 65 leaves and 1 node, its total size is 1081344 bytes.
With 2G test file, after this patch:
btrfs_log_inode() took 50126 us
btrfs_sync_log() took 586 us
Log tree has a single leaf, its total size is 16K.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The submit helper will always run bio_endio() on the bio if it fails to
submit, so cleaning up the bio just leads to a variety of use-after-free
and NULL pointer dereference bugs because we race with the endio
function that is cleaning up the bio. Instead just return BLK_STS_OK as
the repair function has to continue to process the rest of the pages,
and the endio for the repair bio will do the appropriate cleanup for the
page that it was given.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we fail to submit a bio for whatever reason, we may not have setup a
mirror_num for that bio. This means we shouldn't try to do the repair
workflow, if we do we'll hit an BUG_ON(!failrec->this_mirror) in
clean_io_failure. Instead simply skip the repair workflow if we have no
mirror set, and add an assert to btrfs_check_repairable() to make it
easier to catch what is happening in the future.
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|