Age | Commit message (Collapse) | Author |
|
DI(Disable Interrupt) and EI(Enable Interrupt) instructions is required by
MIPSR2/MIPSR6, however, it appears to be buggy on some processors such as
Loongson-3A1000. Thus we make it as a config option to allow disable it at
compile time with CPU_MIPSR2 selected.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: chenhc@lemote.com
Cc: paul.burton@mips.com
Cc: linux-kernel@vger.kernel.org
|
|
Old code in the kernel uses 1-byte and 0-byte arrays to indicate the
presence of a "variable length array":
struct something {
int length;
u8 data[1];
};
struct something *instance;
instance = kmalloc(sizeof(*instance) + size, GFP_KERNEL);
instance->length = size;
memcpy(instance->data, source, size);
There is also 0-byte arrays. Both cases pose confusion for things like
sizeof(), CONFIG_FORTIFY_SOURCE, etc.[1] Instead, the preferred mechanism
to declare variable-length types such as the one above is a flexible array
member[2] which need to be the last member of a structure and empty-sized:
struct something {
int stuff;
u8 data[];
};
Also, by making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertenly introduced[3] to the codebase from now on.
Lastly, make use of the struct_size() helper to safely calculate the
allocation size for instances of struct n_hdlc_buf and avoid any potential
type mistakes[4][5].
[1] https://github.com/KSPP/linux/issues/21
[2] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")
[4] https://lore.kernel.org/lkml/60e14fb7-8596-e21c-f4be-546ce39e7bdb@embeddedor.com/
[5] commit 553d66cb1e86 ("iommu/vt-d: Use struct_size() helper")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Jiri Slaby <jslaby@suse.cz>
Link: https://lore.kernel.org/r/20200121172138.GA3162@embeddedor
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There is a spelling mistake in a dev_err message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@vger.kernel.org
Cc: kernel-janitors@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
|
|
Somehow these LL/SC usages are not taken care of, breaking Loongson
builds. Add the SYNCs appropriately.
Signed-off-by: Wang Xuerui <git@xen0n.name>
Signed-off-by: Paul Burton <paulburton@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: <stable@vger.kernel.org> # v5.5+
|
|
Instead of using the legacy GPIO API and keeping track on
polarity inversion semantics in the driver, switch to use
GPIO descriptors for this driver and change all consumers
in the process.
This makes it possible to retire platform data completely:
the only remaining platform data member was "wakeup" which
was intended to make the vbus interrupt wakeup capable,
but was not set by any users and thus remained unused. VBUS
was not waking any devices up. Leave a comment about it so
later developers using the platform can consider setting it
to always enabled so plugging in USB wakes up the platform.
Cc: Daniel Mack <daniel@zonque.org>
Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Felipe Balbi <balbi@kernel.org>
Acked-by: Sylwester Nawrocki <snawrocki@kernel.org>
Acked-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Link: https://lore.kernel.org/r/20200123155013.93249-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There is a spelling mistake in a deb_dbg message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20200123010344.2834618-1-colin.king@canonical.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Remove 'fs_func struct' and change indirect calls to direct calls.
The following issues are described in exfat's TODO.
> Create helper function for exfat_set_entry_time () and
> exfat_set_entry_type () because it's sort of ugly to be calling the same functionn directly and other code calling through the fs_func struc ponters ...
The fs_func struct was used for switching the helper functions of fat16/fat32/exfat.
Now, it has lost the role of switching, just making the code less readable.
Signed-off-by: Tetsuhiro Kohada <Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp>
Link: https://lore.kernel.org/r/20200123102445.123033-1-Kohada.Tetsuhiro@dc.MitsubishiElectric.co.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In wilc_wlan_handle_txq(), mutex unlock was called without acquiring
it. Also error code for full VMM condition was incorrect as discussed in
[1]. Now used a proper code to indicate VMM is full, for which transfer
to VMM is required again. 'wilc_wlan_handle_txq()' should be called
again if the VMM space was full earlier or otherwise based on
'txq_event' signal.
1. https://lore.kernel.org/driverdev-devel/20191113183322.a54mh2w6dulklgsd@kili.mountain/
Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
Link: https://lore.kernel.org/r/20200123182129.4053-2-ajay.kathat@microchip.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Some of the HIF layer API's return zero for failure and non-zero for
success condition. Now, modified the functions to return zero for success
and non-zero for failure as its recommended approach suggested in [1].
1. https://lore.kernel.org/driverdev-devel/20191113183322.a54mh2w6dulklgsd@kili.mountain/
Signed-off-by: Ajay Singh <ajay.kathat@microchip.com>
Link: https://lore.kernel.org/r/20200123182129.4053-1-ajay.kathat@microchip.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to
unsafe_put_user()") I changed filldir to not do individual __put_user()
accesses, but instead use unsafe_put_user() surrounded by the proper
user_access_begin/end() pair.
That make them enormously faster on modern x86, where the STAC/CLAC
games make individual user accesses fairly heavy-weight.
However, the user_access_begin() range was not really the exact right
one, since filldir() has the unfortunate problem that it needs to not
only fill out the new directory entry, it also needs to fix up the
previous one to contain the proper file offset.
It's unfortunate, but the "d_off" field in "struct dirent" is _not_ the
file offset of the directory entry itself - it's the offset of the next
one. So we end up backfilling the offset in the previous entry as we
walk along.
But since x86 didn't really care about the exact range, and used to be
the only architecture that did anything fancy in user_access_begin() to
begin with, the filldir[64]() changes did something lazy, and even
commented on it:
/*
* Note! This range-checks 'previous' (which may be NULL).
* The real range was checked in getdents
*/
if (!user_access_begin(dirent, sizeof(*dirent)))
goto efault;
and it all worked fine.
But now 32-bit ppc is starting to also implement user_access_begin(),
and the fact that we faked the range to only be the (possibly not even
valid) previous directory entry becomes a problem, because ppc32 will
actually be using the range that is passed in for more than just "check
that it's user space".
This is a complete rewrite of Christophe's original patch.
By saving off the record length of the previous entry instead of a
pointer to it in the filldir data structures, we can simplify the range
check and the writing of the previous entry d_off field. No need for
any conditionals in the user accesses themselves, although we retain the
conditional EINTR checking for the "was this the first directory entry"
signal handling latency logic.
Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
Link: https://lore.kernel.org/lkml/a02d3426f93f7eb04960a4d9140902d278cab0bb.1579697910.git.christophe.leroy@c-s.fr/
Link: https://lore.kernel.org/lkml/408c90c4068b00ea8f1c41cca45b84ec23d4946b.1579783936.git.christophe.leroy@c-s.fr/
Reported-and-tested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 8a23eb804ca4 ("Make filldir[64]() verify the directory entry
filename is valid") added some minimal validity checks on the directory
entries passed to filldir[64](). But they really were pretty minimal.
This fleshes out at least the name length check: we used to disallow
zero-length names, but really, negative lengths or oevr-long names
aren't ok either. Both could happen if there is some filesystem
corruption going on.
Now, most filesystems tend to use just an "unsigned char" or similar for
the length of a directory entry name, so even with a corrupt filesystem
you should never see anything odd like that. But since we then use the
name length to create the directory entry record length, let's make sure
it actually is half-way sensible.
Note how POSIX states that the size of a path component is limited by
NAME_MAX, but we actually use PATH_MAX for the check here. That's
because while NAME_MAX is generally the correct maximum name length
(it's 255, for the same old "name length is usually just a byte on
disk"), there's nothing in the VFS layer that really cares.
So the real limitation at a VFS layer is the total pathname length you
can pass as a filename: PATH_MAX.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With "[PATCHv3] w1: omap-hdq: Simplify driver with PM runtime autosuspend"
we can read the droid4 battery information over 1-wire with this patch
with something like:
# modprobe omap_hdq
# hd /sys/bus/w1/devices/89-*/89-*/nvmem
...
Unfortunately the format of the battery data seems to be Motorola specific
and is currently unusable for battery charger unless somebody figures out
what it means.
Note that currently keeping omap_hdq module loaded will cause extra power
consumption as it seems to scan devices periodically.
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Pavel Machek <pavel@ucw.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
We added coulomb counter calibration support With commit 0cb90f071f73
("power: supply: cpcap-battery: Add basic coulomb counter calibrate
support"), but we also need to configure the related interrupt.
Without the interrupt calibration happens based on a timeout after two
seconds, with the interrupt the calibration just gets done a bit faster.
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Pavel Machek <pavel@ucw.cz>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
This seems to be similar to what we have for am335x. The following can be
tested via sysfs with the to ensure the SGX module gets enabled and disabled
properly:
# echo on > /sys/bus/platform/devices/5600fe00.target-module/power/control
# rwmem 0x5600fe00 # revision register
0x5600fe00 = 0x40000000
# echo auto > /sys/bus/platform/devices/5600fe00.target-module/power/control
# rwmem 0x5000fe00
Bus error
Note that this patch depends on the PRM rstctrl driver that has
been recently posted. If the child device driver(s) need to prevent
rstctrl reset on PM runtime suspend, the drivers need to increase
the usecount for the shared rstctrl reset that can be mapped also
for the child device(s) or accessed via dev->parent.
Cc: Adam Ford <aford173@gmail.com>
Cc: Filip Matijević <filip.matijevic.pz@gmail.com>
Cc: "H. Nikolaus Schaller" <hns@goldelico.com>
Cc: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Cc: moaz korena <moaz@korena.xyz>
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Paweł Chmiel <pawel.mikolaj.chmiel@gmail.com>
Cc: Philipp Rossak <embed3d@gmail.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
I've tested that the interconnect target module enables and idles
just fine when probed with ti-sysc with PM runtime control via sys:
# echo on > $(find /sys -name control | grep \/5600)
# rwmem 0x5600fe00 # OCP Revision
0x5600fe00 = 0x40000000
# echo auto > $(find /sys -name control | grep \/5600)
Cc: "H. Nikolaus Schaller" <hns@goldelico.com>
Cc: Robert Nelson <robertcnelson@gmail.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
The following can be tested via sysfs with the following to ensure the SGX
module gets enabled and disabled properly:
# echo on > /sys/bus/platform/devices/5600fe00.target-module/power/control
# rwmem 0x5600fe00 # revision register
0x5600fe00 = 0x40000000
# echo auto > /sys/bus/platform/devices/5600fe00.target-module/power/control
# rwmem 0x5000fe00
Bus error
Note that this patch depends on the PRM rstctrl driver that has
been recently posted. If the child device driver(s) need to prevent
rstctrl reset on PM runtime suspend, the drivers need to increase
the usecount for the shared rstctrl reset that can be mapped also
for the child device(s) or accessed via dev->parent.
Cc: Adam Ford <aford173@gmail.com>
Cc: Filip Matijević <filip.matijevic.pz@gmail.com>
Cc: "H. Nikolaus Schaller" <hns@goldelico.com>
Cc: Ivaylo Dimitrov <ivo.g.dimitrov.75@gmail.com>
Cc: moaz korena <moaz@korena.xyz>
Cc: Merlijn Wajer <merlijn@wizzup.org>
Cc: Paweł Chmiel <pawel.mikolaj.chmiel@gmail.com>
Cc: Philipp Rossak <embed3d@gmail.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
|
|
Add VPE node as a child of l4 interconnect in order for it to probe
using ti-sysc.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Add clkctrl nodes for VPE module.
Note that because of the current dts node name dependency for mapping to
clock domain, we must still use "vpe-clkctrl@" naming instead of generic
"clock@" naming for the node. And because of this, it's probably best to
apply the dts node addition together along with the other clock changes.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Add VPFE device nodes entries.
Add OmniVision OV2659 sensor device nodes and linkage.
Since Rev1.2a on this board the sensor source clock (xvclk) has a
dedicated 12Mhz oscillator instead of using clkout1.
Add 'audio_mstrclk' fixed clock object to represent it.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Add VPFE device nodes entries.
Add OmniVision OV2659 sensor device nodes and linkage.
The sensor clock (xvclk) is sourced from clkout1.
Add clock entries to properly select clkout1 and set its parent
clock to sys_clkin_ck.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
clkout1 clock node and its generation tree was missing. Add this based
on the data on TRM and PRCM functional spec.
commit 664ae1ab2536 ("ARM: dts: am43xx: add clkctrl nodes") effectively
reverted this commit 8010f13a40d3 ("ARM: dts: am43xx: add support for
clkout1 clock") which is needed for the ov2659 camera sensor clock
definition hence it is being re-applied here.
Note that because of the current dts node name dependency for mapping to
clock domain, we must still use "clkout1-*ck" naming instead of generic
"clock@" naming for the node. And because of this, it's probably best to
apply the dts node addition together along with the other clock changes.
Fixes: 664ae1ab2536 ("ARM: dts: am43xx: add clkctrl nodes")
Signed-off-by: Tero Kristo <t-kristo@ti.com>
Tested-by: Benoit Parrot <bparrot@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Currently f2fs stats are only available from /d/f2fs/status. This patch
adds some of the f2fs stats to sysfs so that they are accessible even
when debugfs is not mounted.
The following sysfs nodes are added:
-/sys/fs/f2fs/<disk>/free_segments
-/sys/fs/f2fs/<disk>/cp_foreground_calls
-/sys/fs/f2fs/<disk>/cp_background_calls
-/sys/fs/f2fs/<disk>/gc_foreground_calls
-/sys/fs/f2fs/<disk>/gc_background_calls
-/sys/fs/f2fs/<disk>/moved_blocks_foreground
-/sys/fs/f2fs/<disk>/moved_blocks_background
-/sys/fs/f2fs/<disk>/avg_vblocks
Signed-off-by: Hridya Valsaraju <hridya@google.com>
[Jaegeuk Kim: allow STAT_FS without DEBUG_FS]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Since btrfs was migrated to use the generic VFS helpers for clone and
deduplication, it stopped allowing for the last block of a file to be
deduplicated when the source file size is not sector size aligned (when
eof is somewhere in the middle of the last block). There are two reasons
for that:
1) The generic code always rounds down, to a multiple of the block size,
the range's length for deduplications. This means we end up never
deduplicating the last block when the eof is not block size aligned,
even for the safe case where the destination range's end offset matches
the destination file's size. That rounding down operation is done at
generic_remap_check_len();
2) Because of that, the btrfs specific code does not expect anymore any
non-aligned range length's for deduplication and therefore does not
work if such nona-aligned length is given.
This patch addresses that second part, and it depends on a patch that
fixes generic_remap_check_len(), in the VFS, which was submitted ealier
and has the following subject:
"fs: allow deduplication of eof block into the end of the destination file"
These two patches address reports from users that started seeing lower
deduplication rates due to the last block never being deduplicated when
the file size is not aligned to the filesystem's block size.
Link: https://lore.kernel.org/linux-btrfs/2019-1576167349.500456@svIo.N5dq.dFFD/
CC: stable@vger.kernel.org # 5.1+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We always round down, to a multiple of the filesystem's block size, the
length to deduplicate at generic_remap_check_len(). However this is only
needed if an attempt to deduplicate the last block into the middle of the
destination file is requested, since that leads into a corruption if the
length of the source file is not block size aligned. When an attempt to
deduplicate the last block into the end of the destination file is
requested, we should allow it because it is safe to do it - there's no
stale data exposure and we are prepared to compare the data ranges for
a length not aligned to the block (or page) size - in fact we even do
the data compare before adjusting the deduplication length.
After btrfs was updated to use the generic helpers from VFS (by commit
34a28e3d77535e ("Btrfs: use generic_remap_file_range_prep() for cloning
and deduplication")) we started to have user reports of deduplication
not reflinking the last block anymore, and whence users getting lower
deduplication scores. The main use case is deduplication of entire
files that have a size not aligned to the block size of the filesystem.
We already allow cloning the last block to the end (and beyond) of the
destination file, so allow for deduplication as well.
Link: https://lore.kernel.org/linux-btrfs/2019-1576167349.500456@svIo.N5dq.dFFD/
CC: stable@vger.kernel.org # 5.1+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This patch fixes the following sparse errors:
kernel/module.c:3623:9: error: incompatible types in comparison expression
kernel/module.c:4060:41: error: incompatible types in comparison expression
kernel/module.c:4203:28: error: incompatible types in comparison expression
kernel/module.c:4225:41: error: incompatible types in comparison expression
Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
|
|
Should work properly with the latest sbios on 5.5 and newer
kernels.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add device nodes for CSI2 camera board OV5640.
Add the CAL port nodes with the necessary linkage to the ov5640 nodes.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Add the required dtsi node to support the Camera
Adaptation Layer (CAL) for the DRA76 family of devices.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Add device nodes for CSI2 camera board OV5640.
Add the CAL port nodes with the necessary linkage to the ov5640 nodes.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
This patch adds the required dtsi node to support the Camera
Adaptation Layer (CAL) for the DRA72 family of devices.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Add CAM nodes as a child of l4 interconnect in order for it to probe
using ti-sysc.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Both CAL and VIP rely on this clock domain. But CAL DPHY require
LVDSRX_96M_GFCLK to be active. When this domain is set to HWSUP the
LVDSRX_96M_GFCLK is on;y active when VIP1 clock is also active. If only
CAL on DRA72x (which uses the VIP2 clkctrl) probes the CAM domain is
enabled but the LVDSRX_96M_GFCLK is left gated. Since LVDSRX_96M_GFCLK
is sourcing the input clock to the DPHY then actual frame capture cannot
start as the phy are inactive.
So we either have to also enabled VIP1 even if we don't intend on using
it or we need to set the CAM domain to use SWSUP only.
This patch implements the latter.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Add clkctrl nodes for CAM domain.
Note that because of the current dts node name dependency for mapping to
clock domain, we must still use "cam-clkctrl@" naming instead of generic
"clock@" naming for the node. And because of this, it's probably best to
apply the dts node addition together along with the other clock changes.
Signed-off-by: Benoit Parrot <bparrot@ti.com>
Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Extents are cached in read_extent_tree_block(); as a result, extents
are not cached for inodes with depth == 0 when we try to find the
extent using ext4_find_extent(). The result of the lookup is cached
in ext4_map_blocks() but is only a subset of the extent on disk. As a
result, the contents of extents status cache can get very badly
fragmented for certain workloads, such as a random 4k read workload.
File size of /mnt/test is 33554432 (8192 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 8191: 40960.. 49151: 8192: last,eof
$ perf record -e 'ext4:ext4_es_*' /root/bin/fio --name=t --direct=0 --rw=randread --bs=4k --filesize=32M --size=32M --filename=/mnt/test
$ perf script | grep ext4_es_insert_extent | head -n 10
fio 131 [000] 13.975421: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [494/1) mapped 41454 status W
fio 131 [000] 13.975939: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6064/1) mapped 47024 status W
fio 131 [000] 13.976467: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6907/1) mapped 47867 status W
fio 131 [000] 13.976937: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3850/1) mapped 44810 status W
fio 131 [000] 13.977440: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3292/1) mapped 44252 status W
fio 131 [000] 13.977931: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6882/1) mapped 47842 status W
fio 131 [000] 13.978376: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3117/1) mapped 44077 status W
fio 131 [000] 13.978957: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [2896/1) mapped 43856 status W
fio 131 [000] 13.979474: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [7479/1) mapped 48439 status W
Fix this by caching the extents for inodes with depth == 0 in
ext4_find_extent().
[ Renamed ext4_es_cache_extents() to ext4_cache_extents() since this
newly added function is not in extents_cache.c, and to avoid
potential visual confusion with ext4_es_cache_extent(). -TYT ]
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
Link: https://lore.kernel.org/r/20191106122502.19986-1-dmonakhov@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
omap-for-v5.6/ti-sysc-drop-pdata
|
|
We can now probe devices with ti-sysc interconnect driver and dts
data. Let's drop the related platform data and custom ti,hwmods
dts property.
As we're just dropping data, and the early platform data init
is based on the custom ti,hwmods property, we want to drop both
the platform data and ti,hwmods property in a single patch.
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
We can now probe devices with ti-sysc interconnect driver and dts
data. Let's drop the related platform data and custom ti,hwmods
dts property.
As we're just dropping data, and the early platform data init
is based on the custom ti,hwmods property, we want to drop both
the platform data and ti,hwmods property in a single patch.
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
We can now probe devices with ti-sysc interconnect driver and dts
data. Let's drop the related platform data and custom ti,hwmods
dts property.
As we're just dropping data, and the early platform data init
is based on the custom ti,hwmods property, we want to drop both
the platform data and ti,hwmods property in a single patch.
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
We can now probe devices with device tree only configuration using
ti-sysc interconnect target module driver. Let's configure the
module, but keep the legacy "ti,hwmods" peroperty to avoid new boot
time warnings. The legacy property will be removed in later patches
together with the legacy platform data.
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
We can now probe devices with device tree only configuration using
ti-sysc interconnect target module driver. Let's configure the
module, but keep the legacy "ti,hwmods" peroperty to avoid new boot
time warnings. The legacy property will be removed in later patches
together with the legacy platform data.
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
|
|
Sometimes when running generic/475 we would trip the
WARN_ON(cache->reserved) check when free'ing the block groups on umount.
This is because sometimes we don't commit the transaction because of IO
errors and thus do not cleanup the tree logs until at umount time.
These blocks are still reserved until they are cleaned up, but they
aren't cleaned up until _after_ we do the free block groups work. Fix
this by moving the free after free'ing the fs roots, that way all of the
tree logs are cleaned up and we have a properly cleaned fs. A bunch of
loops of generic/475 confirmed this fixes the problem.
CC: stable@vger.kernel.org # 4.9+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Current code doesn't correctly handle the situation which arises when
a file system that has METADATA_UUID_INCOMPAT flag set and has its FSID
changed to the one in metadata uuid. This causes the incompat flag to
disappear.
In case of a power failure we could end up in a situation where part of
the disks in a multi-disk filesystem are correctly reverted to
METADATA_UUID_INCOMPAT flag unset state, while others have
METADATA_UUID_INCOMPAT set and CHANGING_FSID_V2_IN_PROGRESS.
This patch corrects the behavior required to handle the case where a
disk of the second type is scanned first, creating the necessary
btrfs_fs_devices. Subsequently, when a disk which has already completed
the transition is scanned it should overwrite the data in
btrfs_fs_devices.
Reported-by: Su Yue <Damenly_Su@gmx.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There is one more cases which isn't handled by the original metadata
uuid work. Namely, when a filesystem has METADATA_UUID incompat bit and
the user decides to change the FSID to the original one e.g. have
metadata_uuid and fsid match. In case of power failure while this
operation is in progress we could end up in a situation where some of
the disks have the incompat bit removed and the other half have both
METADATA_UUID_INCOMPAT and FSID_CHANGING_IN_PROGRESS flags.
This patch handles the case where a disk that has successfully changed
its FSID such that it equals METADATA_UUID is scanned first.
Subsequently when a disk with both
METADATA_UUID_INCOMPAT/FSID_CHANGING_IN_PROGRESS flags is scanned
find_fsid_changed won't be able to find an appropriate btrfs_fs_devices.
This is done by extending find_fsid_changed to correctly find
btrfs_fs_devices whose metadata_uuid/fsid are the same and they match
the metadata_uuid of the currently scanned device.
Fixes: cc5de4e70256 ("btrfs: Handle final split-brain possibility during fsid change")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reported-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
find_fsid became rather hairy with the introduction of metadata uuid
changing feature. Alleviate this by factoring out the metadata uuid
specific code in a dedicated function which deals with finding
correct fsid for a device with changed uuid.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Since find_fsid_inprogress should also handle the case in which an fs
didn't change its FSID make it call find_fsid directly. This makes the
code in device_list_add simpler by eliminating a conditional call of
find_fsid. No functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Recently fsstress (from fstests) sporadically started to trigger an
infinite loop during fsync operations. This turned out to be because
support for the rename exchange and whiteout operations was added to
fsstress in fstests. These operations, unlike any others in fsstress,
cause file names to be reused, whence triggering this issue. However
it's not necessary to use rename exchange and rename whiteout operations
trigger this issue, simple rename operations and file creations are
enough to trigger the issue.
The issue boils down to when we are logging inodes that conflict (that
had the name of any inode we need to log during the fsync operation), we
keep logging them even if they were already logged before, and after
that we check if there's any other inode that conflicts with them and
then add it again to the list of inodes to log. Skipping already logged
inodes fixes the issue.
Consider the following example:
$ mkfs.btrfs -f /dev/sdb
$ mount /dev/sdb /mnt
$ mkdir /mnt/testdir # inode 257
$ touch /mnt/testdir/zz # inode 258
$ ln /mnt/testdir/zz /mnt/testdir/zz_link
$ touch /mnt/testdir/a # inode 259
$ sync
# The following 3 renames achieve the same result as a rename exchange
# operation (<rename_exchange> /mnt/testdir/zz_link to /mnt/testdir/a).
$ mv /mnt/testdir/a /mnt/testdir/a/tmp
$ mv /mnt/testdir/zz_link /mnt/testdir/a
$ mv /mnt/testdir/a/tmp /mnt/testdir/zz_link
# The following rename and file creation give the same result as a
# rename whiteout operation (<rename_whiteout> zz to a2).
$ mv /mnt/testdir/zz /mnt/testdir/a2
$ touch /mnt/testdir/zz # inode 260
$ xfs_io -c fsync /mnt/testdir/zz
--> results in the infinite loop
The following steps happen:
1) When logging inode 260, we find that its reference named "zz" was
used by inode 258 in the previous transaction (through the commit
root), so inode 258 is added to the list of conflicting indoes that
need to be logged;
2) After logging inode 258, we find that its reference named "a" was
used by inode 259 in the previous transaction, and therefore we add
inode 259 to the list of conflicting inodes to be logged;
3) After logging inode 259, we find that its reference named "zz_link"
was used by inode 258 in the previous transaction - we add inode 258
to the list of conflicting inodes to log, again - we had already
logged it before at step 3. After logging it again, we find again
that inode 259 conflicts with him, and we add again 259 to the list,
etc - we end up repeating all the previous steps.
So fix this by skipping logging of conflicting inodes that were already
logged.
Fixes: 6b5fc433a7ad67 ("Btrfs: fix fsync after succession of renames of different files")
CC: stable@vger.kernel.org # 5.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we abort a transaction we have the following sequence
if (!trans->dirty && list_empty(&trans->new_bgs))
return;
WRITE_ONCE(trans->transaction->aborted, err);
The idea being if we didn't modify anything with our trans handle then
we don't really need to abort the whole transaction, maybe the other
trans handles are fine and we can carry on.
However in the case of create_snapshot we add a pending_snapshot object
to our transaction and then commit the transaction. We don't actually
modify anything. sync() behaves the same way, attach to an existing
transaction and commit it. This means that if we have an IO error in
the right places we could abort the committing transaction with our
trans->dirty being not set and thus not set transaction->aborted.
This is a problem because in the create_snapshot() case we depend on
pending->error being set to something, or btrfs_commit_transaction
returning an error.
If we are not the trans handle that gets to commit the transaction, and
we're waiting on the commit to happen we get our return value from
cur_trans->aborted. If this was not set to anything because sync() hit
an error in the transaction commit before it could modify anything then
cur_trans->aborted would be 0. Thus we'd return 0 from
btrfs_commit_transaction() in create_snapshot.
This is a problem because we then try to do things with
pending_snapshot->snap, which will be NULL because we didn't create the
snapshot, and then we'll get a NULL pointer dereference like the
following
"BUG: kernel NULL pointer dereference, address: 00000000000001f0"
RIP: 0010:btrfs_orphan_cleanup+0x2d/0x330
Call Trace:
? btrfs_mksubvol.isra.31+0x3f2/0x510
btrfs_mksubvol.isra.31+0x4bc/0x510
? __sb_start_write+0xfa/0x200
? mnt_want_write_file+0x24/0x50
btrfs_ioctl_snap_create_transid+0x16c/0x1a0
btrfs_ioctl_snap_create_v2+0x11e/0x1a0
btrfs_ioctl+0x1534/0x2c10
? free_debug_processing+0x262/0x2a3
do_vfs_ioctl+0xa6/0x6b0
? do_sys_open+0x188/0x220
? syscall_trace_enter+0x1f8/0x330
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x4a/0x1b0
In order to fix this we need to make sure anybody who calls
commit_transaction has trans->dirty set so that they properly set the
trans->transaction->aborted value properly so any waiters know bad
things happened.
This was found while I was running generic/475 with my modified
fsstress, it reproduced within a few runs. I ran with this patch all
night and didn't see the problem again.
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
If we fsync on a subvolume and create a log root for that volume, and
then later delete that subvolume we'll never clean up its log root. Fix
this by making switch_commit_roots free the log for any dropped roots we
encounter. The extra churn is because we need a btrfs_trans_handle, not
the btrfs_transaction.
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
New sysfs attributes that track the filesystem status of devices, stored
in the per-filesystem directory in /sys/fs/btrfs/FSID/devinfo . There's
a directory for each device, with name corresponding to the numerical
device id.
in_fs_metadata - device is in the list of fs metadata
missing - device is missing (no device node or block device)
replace_target - device is target of replace
writeable - writes from fs are allowed
These attributes reflect the state of the device::dev_state and created
at mount time.
Sample output:
$ pwd
/sys/fs/btrfs/6e1961f1-5918-4ecc-a22f-948897b409f7/devinfo/1/
$ ls
in_fs_metadata missing replace_target writeable
$ cat missing
0
The output from these attributes are 0 or 1. 0 indicates unset and 1
indicates set. These attributes are readonly.
It is observed that the device delete thread and sysfs read thread will
not race because the delete thread calls sysfs kobject_put() which in
turn waits for existing sysfs read to complete.
Note for device replace devid swap:
During the replace the target device temporarily assumes devid 0 before
assigning the devid of the soruce device.
In btrfs_dev_replace_finishing() we remove source sysfs devid using the
function btrfs_sysfs_remove_devices_attr(), so after that call
kobject_rename() to update the devid in the sysfs. This adds and calls
btrfs_sysfs_update_devid() helper function to update the device id.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
|