summaryrefslogtreecommitdiff
path: root/fs/btrfs/zstd.c
AgeCommit message (Collapse)Author
2019-05-28btrfs: correct zstd workspace manager lock to use spin_lock_bh()Dennis Zhou
The btrfs zstd workspace manager uses a background timer to reclaim not recently used workspaces. I used spin_lock() from this context which should have been caught with lockdep, but was not. This deadlock was reported in bugzilla. The fix is to switch the zstd wsm lock to use spin_lock_bh() from the softirq context. This happened quite relibably on ppc64, unlike on other architectures. [ 313.402874] ================================ [ 313.402875] WARNING: inconsistent lock state [ 313.402879] 5.1.0-rc7 #1 Not tainted [ 313.402880] -------------------------------- [ 313.402882] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 313.402885] swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 313.402888] 0000000080d1120c (&(&wsm.lock)->rlock){+.?.}, at: .zstd_reclaim_timer_fn+0x40/0x230 [ 313.402895] {SOFTIRQ-ON-W} state was registered at: [ 313.402899] .lock_acquire+0xd0/0x240 [ 313.402903] ._raw_spin_lock+0x34/0x60 [ 313.402906] .zstd_get_workspace+0xd0/0x360 [ 313.402908] .end_compressed_bio_read+0x3b8/0x540 [ 313.402911] .bio_endio+0x174/0x2c0 [ 313.402914] .end_workqueue_fn+0x4c/0x70 [ 313.402917] .normal_work_helper+0x138/0x7e0 [ 313.402920] .process_one_work+0x324/0x790 [ 313.402922] .worker_thread+0x68/0x570 [ 313.402925] .kthread+0x19c/0x1b0 [ 313.402928] .ret_from_kernel_thread+0x58/0x78 [ 313.402930] irq event stamp: 2629216 [ 313.402933] hardirqs last enabled at (2629216): [<c0000000009da738>] ._raw_spin_unlock_irq+0x38/0x60 [ 313.402936] hardirqs last disabled at (2629215): [<c0000000009da4c4>] ._raw_spin_lock_irq+0x24/0x70 [ 313.402939] softirqs last enabled at (2629212): [<c0000000000af9fc>] .irq_enter+0x8c/0xd0 [ 313.402942] softirqs last disabled at (2629213): [<c0000000000afb58>] .irq_exit+0x118/0x170 [ 313.402944] other info that might help us debug this: [ 313.402945] Possible unsafe locking scenario: [ 313.402947] CPU0 [ 313.402948] ---- [ 313.402949] lock(&(&wsm.lock)->rlock); [ 313.402951] <Interrupt> [ 313.402952] lock(&(&wsm.lock)->rlock); [ 313.402954] *** DEADLOCK *** [ 313.402957] 1 lock held by swapper/5/0: [ 313.402958] #0: 000000004b612042 ((&wsm.timer)){+.-.}, at: .call_timer_fn+0x0/0x3c0 [ 313.402963] stack backtrace: [ 313.402967] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.1.0-rc7 #1 [ 313.402968] Call Trace: [ 313.402972] [c0000007fa262e70] [c0000000009b3294] .dump_stack+0xe0/0x15c (unreliable) [ 313.402975] [c0000007fa262f10] [c000000000125548] .print_usage_bug+0x348/0x390 [ 313.402978] [c0000007fa262fd0] [c000000000125cb4] .mark_lock+0x724/0x930 [ 313.402981] [c0000007fa263080] [c000000000126c20] .__lock_acquire+0xc90/0x16a0 [ 313.402984] [c0000007fa2631b0] [c000000000128040] .lock_acquire+0xd0/0x240 [ 313.402987] [c0000007fa263280] [c0000000009da2b4] ._raw_spin_lock+0x34/0x60 [ 313.402990] [c0000007fa263300] [c00000000054b0b0] .zstd_reclaim_timer_fn+0x40/0x230 [ 313.402993] [c0000007fa2633d0] [c000000000158b38] .call_timer_fn+0xc8/0x3c0 [ 313.402996] [c0000007fa2634a0] [c000000000158f74] .expire_timers+0x144/0x260 [ 313.402999] [c0000007fa263550] [c000000000159178] .run_timer_softirq+0xe8/0x230 [ 313.403002] [c0000007fa263680] [c0000000009db288] .__do_softirq+0x188/0x5d4 [ 313.403004] [c0000007fa263790] [c0000000000afb58] .irq_exit+0x118/0x170 [ 313.403008] [c0000007fa263800] [c000000000028d88] .timer_interrupt+0x158/0x430 [ 313.403012] [c0000007fa2638b0] [c0000000000091d4] decrementer_common+0x134/0x140 [ 313.403017] --- interrupt: 901 at replay_interrupt_return+0x0/0x4 LR = .arch_local_irq_restore.part.0+0x68/0x80 [ 313.403020] [c0000007fa263bb0] [c00000000001a3ac] .arch_local_irq_restore.part.0+0x2c/0x80 (unreliable) [ 313.403024] [c0000007fa263c30] [c0000000007bbbcc] .cpuidle_enter_state+0xec/0x670 [ 313.403027] [c0000007fa263d00] [c0000000000f5130] .call_cpuidle+0x40/0x90 [ 313.403031] [c0000007fa263d70] [c0000000000f554c] .do_idle+0x2dc/0x3a0 [ 313.403034] [c0000007fa263e30] [c0000000000f59ac] .cpu_startup_entry+0x2c/0x30 [ 313.403037] [c0000007fa263ea0] [c000000000045674] .start_secondary+0x644/0x650 [ 313.403041] [c0000007fa263f90] [c00000000000ad5c] start_secondary_prolog+0x10/0x14 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203517 Fixes: 3f93aef535c8 ("btrfs: add zstd compression level support") CC: stable@vger.kernel.org # 5.1+ Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-04-29btrfs: zstd: remove indirect calls for local functionsDennis Zhou
While calling functions inside zstd, we don't need to use the indirection provided by the workspace_manager. Forward declarations are added to maintain the function order of btrfs_compress_op. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-27btrfs: zstd: ensure reclaim timer is properly cleaned upDennis Zhou
The timer function, zstd_reclaim_timer_fn(), reschedules itself under certain conditions. When cleaning up, take the lock and remove all workspaces. This prevents the timer from rearming itself. Lastly, switch to del_timer_sync() to ensure that the timer function can't trigger as we're unloading. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25btrfs: add zstd compression level supportDennis Zhou
Zstd compression requires different amounts of memory for each level of compression. The prior patches implemented indirection to allow for each compression type to manage their workspaces independently. This patch uses this indirection to implement compression level support for zstd. To manage the additional memory require, each compression level has its own queue of workspaces. A global LRU is used to help with reclaim. Reclaim is done via a timer which provides a mechanism to decrease memory utilization by keeping only workspaces around that are sized appropriately. Forward progress is guaranteed by a preallocated max workspace hidden from the LRU. When getting a workspace, it uses a bitmap to identify the levels that are populated and scans up. If it finds a workspace that is greater than it, it uses it, but does not update the last_used time and the corresponding place in the LRU. If we hit memory pressure, we sleep on the max level workspace. We continue to rescan in case we can use a smaller workspace, but eventually should be able to obtain the max level workspace or allocate one again should memory pressure subside. The memory requirement for decompression is the same as level 1, and therefore can use any of available workspace. The number of workspaces is bound by an upper limit of the workqueue's limit which currently is 2 (percpu limit). The reclaim timer is used to free inactive/improperly sized workspaces and is set to 307s to avoid colliding with transaction commit (every 30s). Repeating the experiment from v2 [1], the Silesia corpus was copied to a btrfs filesystem 10 times and then read back after dropping the caches. The btrfs filesystem was on an SSD. Level Ratio Compression (MB/s) Decompression (MB/s) Memory (KB) 1 2.658 438.47 910.51 780 2 2.744 364.86 886.55 1004 3 2.801 336.33 828.41 1260 4 2.858 286.71 886.55 1260 5 2.916 212.77 556.84 1388 6 2.363 119.82 990.85 1516 7 3.000 154.06 849.30 1516 8 3.011 159.54 875.03 1772 9 3.025 100.51 940.15 1772 10 3.033 118.97 616.26 1772 11 3.036 94.19 802.11 1772 12 3.037 73.45 931.49 1772 13 3.041 55.17 835.26 2284 14 3.087 44.70 716.78 2547 15 3.126 37.30 878.84 2547 [1] https://lore.kernel.org/linux-btrfs/20181031181108.289340-1-terrelln@fb.com/ Cc: Nick Terrell <terrelln@fb.com> Cc: Omar Sandoval <osandov@osandov.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25btrfs: make zstd memory requirements monotonicDennis Zhou
It is possible based on the level configurations that a higher level workspace uses less memory than a lower level workspace. In order to reuse workspaces, this must be made a monotonic relationship. This precomputes the required memory for each level and enforces the monotonicity between level and memory required. This is also done in upstream zstd in [1]. [1] https://github.com/facebook/zstd/commit/a68b76afefec6876f8e8a538155109a5aeac0143 Cc: Nick Terrell <terrelln@fb.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25btrfs: zstd use the passed through level instead of defaultDennis Zhou
Zstd currently only supports the default level of compression. This patch switches to using the level passed in for btrfs zstd configuration. Zstd workspaces now keep track of the requested level as this can differ from the size of the workspace. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25btrfs: change set_level() to bound the level passed inDennis Zhou
Currently, the only user of set_level() is zlib which sets an internal workspace parameter. As level is now plumbed into get_workspace(), this can be handled there rather than separately. This repurposes set_level() to bound the level passed in so it can be used when setting the mounts compression level and as well as verifying the level before getting a workspace. The other benefit is this divides the meaning of compress(0) and get_workspace(0). The former means we want to use the default compression level of the compression type. The latter means we can use any workspace available. Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25btrfs: plumb level through the compression interfaceDennis Zhou
Zlib compression supports multiple levels, but doesn't require changing in how a workspace itself is created and managed. Zstd introduces a different memory requirement such that higher levels of compression require more memory. This requires changes in how the alloc()/get() methods work for zstd. This pach plumbs compression level through the interface as a parameter in preparation for zstd compression levels. This gives the compression types opportunity to create/manage based on the compression level. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-02-25btrfs: move to function pointers for get/put workspacesDennis Zhou
The previous patch added generic helpers for get_workspace() and put_workspace(). Now, we can migrate ownership of the workspace_manager to be in the compression type code as the compression code itself doesn't care beyond being able to get a workspace. The init/cleanup and get/put methods are abstracted so each compression algorithm can decide how they want to manage their workspaces. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Dennis Zhou <dennis@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12btrfs: replace GPL boilerplate by SPDX -- sourcesDavid Sterba
Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Signed-off-by: David Sterba <dsterba@suse.com>
2018-01-22btrfs: move some zstd work data from stack to workspaceDavid Sterba
* ZSTD_inBuffer in_buf * ZSTD_outBuffer out_buf are used in all functions to pass the compression parameters and the local variables consume some space. We can move them to the workspace and reduce the stack consumption: zstd.c:zstd_decompress -24 (136 -> 112) zstd.c:zstd_decompress_bio -24 (144 -> 120) zstd.c:zstd_compress_pages -24 (264 -> 240) Signed-off-by: David Sterba <dsterba@suse.com> Reviewed-by: Nick Terrell <terrelln@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2017-11-01btrfs: allow to set compression level for zlibDavid Sterba
Preliminary support for setting compression level for zlib, the following works: $ mount -o compess=zlib # default $ mount -o compess=zlib0 # same $ mount -o compess=zlib9 # level 9, slower sync, less data $ mount -o compess=zlib1 # level 1, faster sync, more data $ mount -o remount,compress=zlib3 # level set by remount The compress-force works the same as compress'. The level is visible in the same format in /proc/mounts. Level set via file property does not work yet. Required patch: "btrfs: prepare for extensions in compression options" Signed-off-by: David Sterba <dsterba@suse.com>
2017-08-15btrfs: Add zstd supportNick Terrell
Add zstd compression and decompression support to BtrFS. zstd at its fastest level compresses almost as well as zlib, while offering much faster compression and decompression, approaching lzo speeds. I benchmarked btrfs with zstd compression against no compression, lzo compression, and zlib compression. I benchmarked two scenarios. Copying a set of files to btrfs, and then reading the files. Copying a tarball to btrfs, extracting it to btrfs, and then reading the extracted files. After every operation, I call `sync` and include the sync time. Between every pair of operations I unmount and remount the filesystem to avoid caching. The benchmark files can be found in the upstream zstd source repository under `contrib/linux-kernel/{btrfs-benchmark.sh,btrfs-extract-benchmark.sh}` [1] [2]. I ran the benchmarks on a Ubuntu 14.04 VM with 2 cores and 4 GiB of RAM. The VM is running on a MacBook Pro with a 3.1 GHz Intel Core i7 processor, 16 GB of RAM, and a SSD. The first compression benchmark is copying 10 copies of the unzipped Silesia corpus [3] into a BtrFS filesystem mounted with `-o compress-force=Method`. The decompression benchmark times how long it takes to `tar` all 10 copies into `/dev/null`. The compression ratio is measured by comparing the output of `df` and `du`. See the benchmark file [1] for details. I benchmarked multiple zstd compression levels, although the patch uses zstd level 1. | Method | Ratio | Compression MB/s | Decompression speed | |---------|-------|------------------|---------------------| | None | 0.99 | 504 | 686 | | lzo | 1.66 | 398 | 442 | | zlib | 2.58 | 65 | 241 | | zstd 1 | 2.57 | 260 | 383 | | zstd 3 | 2.71 | 174 | 408 | | zstd 6 | 2.87 | 70 | 398 | | zstd 9 | 2.92 | 43 | 406 | | zstd 12 | 2.93 | 21 | 408 | | zstd 15 | 3.01 | 11 | 354 | The next benchmark first copies `linux-4.11.6.tar` [4] to btrfs. Then it measures the compression ratio, extracts the tar, and deletes the tar. Then it measures the compression ratio again, and `tar`s the extracted files into `/dev/null`. See the benchmark file [2] for details. | Method | Tar Ratio | Extract Ratio | Copy (s) | Extract (s)| Read (s) | |--------|-----------|---------------|----------|------------|----------| | None | 0.97 | 0.78 | 0.981 | 5.501 | 8.807 | | lzo | 2.06 | 1.38 | 1.631 | 8.458 | 8.585 | | zlib | 3.40 | 1.86 | 7.750 | 21.544 | 11.744 | | zstd 1 | 3.57 | 1.85 | 2.579 | 11.479 | 9.389 | [1] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-benchmark.sh [2] https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/btrfs-extract-benchmark.sh [3] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia [4] https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.11.6.tar.xz zstd source repository: https://github.com/facebook/zstd Signed-off-by: Nick Terrell <terrelln@fb.com> Signed-off-by: Chris Mason <clm@fb.com>