summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-12-08btrfs: introduce mount option rescue=allJosef Bacik
Now that we have the building blocks for some better recovery options with corrupted file systems, add a rescue=all option to enable all of the relevant rescue options. This will allow distros to simply default to rescue=all for the "oh dear lord the world's on fire" recovery without needing to know all the different options that we have and may add in the future. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: introduce mount option rescue=ignoredatacsumsJosef Bacik
There are cases where you can end up with bad data csums because of misbehaving applications. This happens when an application modifies a buffer in-flight when doing an O_DIRECT write. In order to recover the file we need a way to turn off data checksums so you can copy the file off, and then you can delete the file and restore it properly later. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: introduce mount option rescue=ignorebadrootsJosef Bacik
In the face of extent root corruption, or any other core fs wide root corruption we will fail to mount the file system. This makes recovery kind of a pain, because you need to fall back to userspace tools to scrape off data. Instead provide a mechanism to gracefully handle bad roots, so we can at least mount read-only and possibly recover data from the file system. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: show rescue=usebackuproot in /proc/mountsJosef Bacik
The standalone option usebackuproot was intended as one-time use and it was not necessary to keep it in the option list. Now that we're going to have more rescue options, it's desirable to keep them intact as it could be confusing why the option disappears. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> [ remove the btrfs_clear_opt part from open_ctree ] Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: add a helper to print out rescue= optionsJosef Bacik
We're going to have a lot of rescue options, add a helper to collapse the /proc/mounts output to rescue=option1:option2:option3 format. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: sysfs: export supported rescue= mount optionsJosef Bacik
We're going to be adding a variety of different rescue options, we should advertise which ones we support to make user spaces life easier in the future. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: push the NODATASUM check into btrfs_lookup_bio_sumsJosef Bacik
When we move to being able to handle NULL csum_roots it'll be cleaner to just check in btrfs_lookup_bio_sums instead of at all of the caller locations, so push the NODATASUM check into it as well so it's unified. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: unify the ro checking for mount optionsJosef Bacik
We're going to be adding more options that require RDONLY, so add a helper to do the check and error out if we don't have RDONLY set. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: do not start readahead for csum tree when scrubbing non-data block groupsFilipe Manana
When scrubbing a stripe of a block group we always start readahead for the checksums btree and wait for it to complete, however when the blockgroup is not a data block group (or a mixed block group) it is a waste of time to do it, since there are no checksums for metadata extents in that btree. So skip that when the block group does not have the data flag set, saving some time doing memory allocations, queueing a job in the readahead work queue, waiting for it to complete and potentially avoiding some IO as well (when csum tree extents are not in memory already). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: assert we are holding the reada_lock when releasing a readahead zoneFilipe Manana
When we drop the last reference of a zone, we end up releasing it through the callback reada_zone_release(), which deletes the zone from a device's reada_zones radix tree. This tree is protected by the global readahead lock at fs_info->reada_lock. Currently all places that are sure that they are dropping the last reference on a zone, are calling kref_put() in a critical section delimited by this lock, while all other places that are sure they are not dropping the last reference, do not bother calling kref_put() while holding that lock. When working on the previous fix for hangs and use-after-frees in the readahead code, my initial attempts were different and I actually ended up having reada_zone_release() called when not holding the lock, which resulted in weird and unexpected problems. So just add an assertion there to detect such problem more quickly and make the dependency more obvious. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: set EXTENT_NORESERVE bits side btrfs_dirty_pages()Goldwyn Rodrigues
Set the extent bits EXTENT_NORESERVE inside btrfs_dirty_pages() as opposed to calling set_extent_bits again later. Fold check for written length within the function. Note: EXTENT_NORESERVE is set before unlocking extents. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: use round_down while calculating start position in btrfs_dirty_pages()Goldwyn Rodrigues
round_down looks prettier than the bit mask operations. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: use iosize while reading compressed pagesGoldwyn Rodrigues
While using compression, a submitted bio is mapped with a compressed bio which performs the read from disk, decompresses and returns uncompressed data to original bio. The original bio must reflect the uncompressed size (iosize) of the I/O to be performed, or else the page just gets the decompressed I/O length of data (disk_io_size). The compressed bio checks the extent map and gets the correct length while performing the I/O from disk. This came up in subpage work when only compressed length of the original bio was filled in the page. This worked correctly for pagesize == sectorsize because both compressed and uncompressed data are at pagesize boundaries, and would end up filling the requested page. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: calculate num_pages, reserve_bytes once in btrfs_buffered_writeGoldwyn Rodrigues
write_bytes can change in btrfs_check_nocow_lock(). Calculate variables such as num_pages and reserve_bytes once we are sure of the value of write_bytes so there is no need to re-calculate. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: calculate more accurate remaining time to sleep in transaction_kthreadNikolay Borisov
If transaction_kthread is woken up before btrfs_fs_info::commit_interval seconds have elapsed it will sleep for a fixed period of 5 seconds. This is not a problem per-se but is not accurate. Instead the code should sleep for an interval which guarantees on next wakeup commit_interval would have passed. Since time tracking is not precise subtract 1 second from delta to ensure the delay we end up waiting will be longer than than the wake up period. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: record delta directly in transaction_kthreadNikolay Borisov
Rename 'now' to 'delta' and store there the delta between transaction start time and current time. This is in preparation for optimising the sleep logic in the next patch. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08btrfs: remove redundant time check in transaction kthread loopNikolay Borisov
The value obtained from ktime_get_seconds() is guaranteed to be monotonically increasing since it's taken from CLOCK_MONOTONIC. As transaction_kthread obtains a reference to the currently running transaction under holding btrfs_fs_info::trans_lock it's guaranteed to: a) see an initialized 'cur', whose start_time is guaranteed to be smaller than 'now' or b) not obtain a 'cur' and simply go to sleep. Given this remove the unnecessary check, if it sees now < cur->start_time this would imply there are far greater problems on the machine. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-08erofs: simplify try_to_claim_pcluster()Gao Xiang
simplify try_to_claim_pcluster() by directly using cmpxchg() here (the retry loop caused more overhead.) Also, move the chain loop detection in and rename it to z_erofs_try_to_claim_pcluster(). Link: https://lore.kernel.org/r/20201208095834.3133565-3-hsiangkao@redhat.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2020-12-08erofs: insert to managed cache after adding to pclGao Xiang
Previously, it could be some concern to call add_to_page_cache_lru() with page->mapping == Z_EROFS_MAPPING_STAGING (!= NULL). In contrast, page->private is used instead now, so partially revert commit 5ddcee1f3a1c ("erofs: get rid of __stagingpage_alloc helper") with some adaption for simplicity. Link: https://lore.kernel.org/r/20201208095834.3133565-2-hsiangkao@redhat.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2020-12-08erofs: get rid of magical Z_EROFS_MAPPING_STAGINGGao Xiang
Previously, we played around with magical page->mapping for short-lived temporary pages since we need to identify different types of pages in the same pcluster but both invalidated and short-lived temporary pages can have page->mapping == NULL. It was considered as safe because that temporary pages are all non-LRU / non-movable pages. This patch tends to use specific page->private to identify short-lived pages instead so it won't rely on page->mapping anymore. Details are described in "compress.h" as well. Link: https://lore.kernel.org/r/20201208095834.3133565-1-hsiangkao@redhat.com Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2020-12-08erofs: remove a void EROFS_VERSION macro set in MakefileVladimir Zapolskiy
Since commit 4f761fa253b4 ("erofs: rename errln/infoln/debugln to erofs_{err, info, dbg}") the defined macro EROFS_VERSION has no affect, therefore removing it from the Makefile is a non-functional change. Link: https://lore.kernel.org/r/20201030122839.25431-1-vladimir@tuxera.com Reviewed-by: Gao Xiang <hsiangkao@redhat.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Vladimir Zapolskiy <vladimir@tuxera.com> Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
2020-12-07f2fs: introduce max_io_bytes, a sysfs entry, to limit bio sizeJaegeuk Kim
This patch adds max_io_bytes to limit bio size when f2fs tries to merge consecutive IOs. This can give a testing point to split out bios and check end_io handles those bios correctly. This is used to capture a recent bug on the decompression and fsverity flow. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-07f2fs: don't allow any writes on readonly mountJaegeuk Kim
generic_make_request: Trying to write to read-only block-device dm-5 (partno 0) WARNING: CPU: 7 PID: 546 at block/blk-core.c:2190 generic_make_request_checks+0x664/0x690 pc : generic_make_request_checks+0x664/0x690 lr : generic_make_request_checks+0x664/0x690 Call trace: generic_make_request_checks+0x664/0x690 generic_make_request+0xf0/0x3a4 submit_bio+0x80/0x250 __submit_merged_bio+0x368/0x4e0 __submit_merged_write_cond.llvm.12294350193007536502+0xe0/0x3e8 f2fs_wait_on_page_writeback+0x84/0x128 f2fs_convert_inline_page+0x35c/0x6f8 f2fs_convert_inline_inode+0xe0/0x2e0 f2fs_file_mmap+0x48/0x9c mmap_region+0x41c/0x74c do_mmap+0x40c/0x4fc vm_mmap_pgoff+0xb8/0x114 vm_mmap+0x34/0x48 elf_map+0x68/0x108 load_elf_binary+0x538/0xb70 search_binary_handler+0xac/0x1dc exec_binprm+0x50/0x15c __do_execve_file+0x620/0x740 __arm64_sys_execve+0x54/0x68 el0_svc_common+0x9c/0x168 el0_svc_handler+0x60/0x6c el0_svc+0x8/0xc Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-12-07io_uring: fix mis-seting personality's credsPavel Begunkov
After io_identity_cow() copies an work.identity it wants to copy creds to the new just allocated id, not the old one. Otherwise it's akin to req->work.identity->creds = req->work.identity->creds. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07btrfs: use helpers to convert from seconds to jiffies in transaction_kthreadNikolay Borisov
The kernel provides easy to understand helpers to convert from human understandable units to the kernel-friendly 'jiffies'. So let's use those to make the code easier to understand. No functional changes. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-07btrfs: sysfs: export filesystem generationAnand Jain
Matching with the information that's available from the ioctl FS_INFO, add generation to the per-filesystem directory /sys/fs/btrfs/UUID/generation, which could be used by scripts. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2020-12-06coredump: fix core_pattern parse errorMenglong Dong
'format_corename()' will splite 'core_pattern' on spaces when it is in pipe mode, and take helper_argv[0] as the path to usermode executable. It works fine in most cases. However, if there is a space between '|' and '/file/path', such as '| /usr/lib/systemd/systemd-coredump %P %u %g', then helper_argv[0] will be parsed as '', and users will get a 'Core dump to | disabled'. It is not friendly to users, as the pattern above was valid previously. Fix this by ignoring the spaces between '|' and '/file/path'. Fixes: 315c69261dd3 ("coredump: split pipe command whitespace before expanding template") Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Paul Wise <pabs3@bonedaddy.net> Cc: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398] Cc: Neil Horman <nhorman@tuxdriver.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/5fb62870.1c69fb81.8ef5d.af76@mx.google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-05Merge tag 'io_uring-5.10-2020-12-05' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fix from Jens Axboe: "Just a small fix this time, for an issue with 32-bit compat apps and buffer selection with recvmsg" * tag 'io_uring-5.10-2020-12-05' of git://git.kernel.dk/linux-block: io_uring: fix recvmsg setup with compat buf-select
2020-12-05Merge tag '5.10-rc6-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Three smb3 fixes (two for stable) fixing - a null pointer issue in a DFS error path - a problem with excessive padding when mounted with "idsfromsid" causing owner fields to get corrupted - a more recent problem with compounded reparse point query found in testing to the Linux kernel server" * tag '5.10-rc6-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: refactor create_sd_buf() and and avoid corrupting the buffer cifs: add NULL check for ses->tcon_ipc smb3: set COMPOUND_FID to FileID field of subsequent compound request
2020-12-04net: Remove the err argument from sock_from_fileFlorent Revest
Currently, the sock_from_file prototype takes an "err" pointer that is either not set or set to -ENOTSOCK IFF the returned socket is NULL. This makes the error redundant and it is ignored by a few callers. This patch simplifies the API by letting callers deduce the error based on whether the returned socket is NULL or not. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Florent Revest <revest@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20201204113609.1850150-1-revest@google.com
2020-12-04Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf-next 2020-12-03 The main changes are: 1) Support BTF in kernel modules, from Andrii. 2) Introduce preferred busy-polling, from Björn. 3) bpf_ima_inode_hash() and bpf_bprm_opts_set() helpers, from KP Singh. 4) Memcg-based memory accounting for bpf objects, from Roman. 5) Allow bpf_{s,g}etsockopt from cgroup bind{4,6} hooks, from Stanislav. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (118 commits) selftests/bpf: Fix invalid use of strncat in test_sockmap libbpf: Use memcpy instead of strncpy to please GCC selftests/bpf: Add fentry/fexit/fmod_ret selftest for kernel module selftests/bpf: Add tp_btf CO-RE reloc test for modules libbpf: Support attachment of BPF tracing programs to kernel modules libbpf: Factor out low-level BPF program loading helper bpf: Allow to specify kernel module BTFs when attaching BPF programs bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier selftests/bpf: Add CO-RE relocs selftest relying on kernel module BTF selftests/bpf: Add support for marking sub-tests as skipped selftests/bpf: Add bpf_testmod kernel module for testing libbpf: Add kernel module BTF support for CO-RE relocations libbpf: Refactor CO-RE relocs to not assume a single BTF object libbpf: Add internal helper to load BTF data by FD bpf: Keep module's btf_data_size intact after load bpf: Fix bpf_put_raw_tracepoint()'s use of __module_address() selftests/bpf: Add Userspace tests for TCP_WINDOW_CLAMP bpf: Adds support for setting window clamp samples/bpf: Fix spelling mistake "recieving" -> "receiving" bpf: Fix cold build of test_progs-no_alu32 ... ==================== Link: https://lore.kernel.org/r/20201204021936.85653-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-04fs, close_range: add flag CLOSE_RANGE_CLOEXECGiuseppe Scrivano
When the flag CLOSE_RANGE_CLOEXEC is set, close_range doesn't immediately close the files but it sets the close-on-exec bit. It is useful for e.g. container runtimes that usually install a seccomp profile "as late as possible" before execv'ing the container process itself. The container runtime could either do: 1 2 - install_seccomp_profile(); - close_range(MIN_FD, MAX_INT, 0); - close_range(MIN_FD, MAX_INT, 0); - install_seccomp_profile(); - execve(...); - execve(...); Both alternative have some disadvantages. In the first variant the seccomp_profile cannot block the close_range syscall, as well as opendir/read/close/... for the fallback on older kernels. In the second variant, close_range() can be used only on the fds that are not going to be needed by the runtime anymore, and it must be potentially called multiple times to account for the different ranges that must be closed. Using close_range(..., ..., CLOSE_RANGE_CLOEXEC) solves these issues. The runtime is able to use the existing open fds, the seccomp profile can block close_range() and the syscalls used for its fallback. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Link: https://lore.kernel.org/r/20201118104746.873084-2-gscrivan@redhat.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-12-03cifs: refactor create_sd_buf() and and avoid corrupting the bufferRonnie Sahlberg
When mounting with "idsfromsid" mount option, Azure corrupted the owner SIDs due to excessive padding caused by placing the owner fields at the end of the security descriptor on create. Placing owners at the front of the security descriptor (rather than the end) is also safer, as the number of ACEs (that follow it) are variable. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Suggested-by: Rohith Surabattula <rohiths@microsoft.com> CC: Stable <stable@vger.kernel.org> # v5.8 Signed-off-by: Steve French <stfrench@microsoft.com>
2020-12-03cifs: add NULL check for ses->tcon_ipcAurelien Aptel
In some scenarios (DFS and BAD_NETWORK_NAME) set_root_set() can be called with a NULL ses->tcon_ipc. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-12-03smb3: set COMPOUND_FID to FileID field of subsequent compound requestNamjae Jeon
For an operation compounded with an SMB2 CREATE request, client must set COMPOUND_FID(0xFFFFFFFFFFFFFFFF) to FileID field of smb2 ioctl. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Fixes: 2e4564b31b645 ("smb3: add support stat of WSL reparse points for special file types") Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2020-12-03Merge tag '9p-for-5.10-rc7' of git://github.com/martinetd/linuxLinus Torvalds
Pull 9p fixes from Dominique Martinet: "Restore splice functionality for 9p" * tag '9p-for-5.10-rc7' of git://github.com/martinetd/linux: fs: 9p: add generic splice_write file operation fs: 9p: add generic splice_read file operations
2020-12-03gfs2: in signal_our_withdraw wait for unfreeze of _this_ fs onlyBob Peterson
Function signal_our_withdraw needs to work on file systems that have been partially frozen. To do this, it called flush_workqueue(gfs2_freeze_wq). This this wrong because it waits for *ALL* file systems to be unfrozen, not just the one we're withdrawing from. It should only wait for the targetted file system to be unfrozen. Otherwise it would wait until ALL file systems are thawed before signaling the withdraw. This patch changes signal_our_withdraw so it calls flush_work() for the target file system's freeze work (only) to be completed. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-03gfs2: Remove sb_start_write from gfs2_statfs_syncBob Peterson
Before this patch, function gfs2_statfs_sync called sb_start_write and sb_end_write. This is completely unnecessary because, aside from grabbing glocks, gfs2_statfs_sync does all its updates to statfs with a transaction: gfs2_trans_begin and _end. And transactions always do sb_start_intwrite in gfs2_trans_begin and sb_end_intwrite in gfs2_trans_end. This patch simply removes the call to sb_start_write. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2020-12-03ext4: simplify the code of mb_find_order_for_blockChunguang Xu
The code of mb_find_order_for_block is a bit obscure, but we can simplify it with mb_find_buddy(), make the code more concise. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604764698-4269-3-git-send-email-brookxu@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-03ext4: remove redundant mb_regenerate_buddy()Chunguang Xu
After this patch (163a203), if an abnormal bitmap is detected, we will mark the group as corrupt, and we will not use this group in the future. Therefore, it should be meaningless to regenerate the buddy bitmap of this group, It might be better to delete it. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604764698-4269-2-git-send-email-brookxu@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-03inotify: convert to handle_inode_event() interfaceAmir Goldstein
Convert inotify to use the simple handle_inode_event() interface to get rid of the code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback, which also happen to be buggy code. The bug will be fixed in the generic helper. Link: https://lore.kernel.org/r/20201202120713.702387-3-amir73il@gmail.com CC: stable@vger.kernel.org Fixes: b9a1b9772509 ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-12-03ext4: use ASSERT() to replace J_ASSERT()Chunguang Xu
There are currently multiple forms of assertion, such as J_ASSERT(). J_ASEERT() is provided for the jbd module, which is a public module. Maybe we should use custom ASSERT() like other file systems, such as xfs, which would be better. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/1604764698-4269-1-git-send-email-brookxu@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-03ext4: print quota journalling mode on (re-)mountRoman Anufriev
Right now, it is hard to understand which quota journalling type is enabled: you need to be quite familiar with kernel code and trace it or really understand what different combinations of fs flags/mount options lead to. This patch adds printing of current quota jounalling mode on each mount/remount, thus making it easier to check it at a glance/in autotests. The semantics is similar to ext4 data journalling modes: * journalled - quota configured, journalling will be enabled * writeback - quota configured, journalling won't be enabled * none - quota isn't configured * disabled - kernel compiled without CONFIG_QUOTA feature Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1603336860-16153-2-git-send-email-dotdot@yandex-team.ru Signed-off-by: Roman Anufriev <dotdot@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-03ext4: add helpers for checking whether quota can be enabled/is journalledRoman Anufriev
Right now, there are several places, where we check whether fs is capable of enabling quota or if quota is journalled with quite long and non-self-descriptive condition statements. This patch wraps these statements into helpers for better readability and easier usage. Link: https://lore.kernel.org/r/1603336860-16153-1-git-send-email-dotdot@yandex-team.ru Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Roman Anufriev <dotdot@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-03ext4: remove redundant assignment of variable exColin Ian King
Variable ex is assigned a variable that is not being read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20201021132326.148052-1-colin.king@canonical.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-03ext4: remove the null check of bio_vec pageXianting Tian
bv_page can't be NULL in a valid bio_vec, so we can remove the NULL check, as we did in other places when calling bio_for_each_segment_all() to go through all bio_vec of a bio. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Xianting Tian <tian.xianting@h3c.com> Link: https://lore.kernel.org/r/20201020082201.34257-1-tian.xianting@h3c.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-03ext4: remove redundant operation that set bh to NULLKaixu Xia
The out_fail branch path don't release the bh and the second bh is valid only in the for statement, so we don't need to set them to NULL. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/1603194069-17557-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-12-03fsnotify: generalize handle_inode_event()Amir Goldstein
The handle_inode_event() interface was added as (quoting comment): "a simple variant of handle_event() for groups that only have inode marks and don't have ignore mask". In other words, all backends except fanotify. The inotify backend also falls under this category, but because it required extra arguments it was left out of the initial pass of backends conversion to the simple interface. This results in code duplication between the generic helper fsnotify_handle_event() and the inotify_handle_event() callback which also happen to be buggy code. Generalize the handle_inode_event() arguments and add the check for FS_EXCL_UNLINK flag to the generic helper, so inotify backend could be converted to use the simple interface. Link: https://lore.kernel.org/r/20201202120713.702387-2-amir73il@gmail.com CC: stable@vger.kernel.org Fixes: b9a1b9772509 ("fsnotify: create method handle_inode_event() in fsnotify_operations") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2020-12-03openat2: reject RESOLVE_BENEATH|RESOLVE_IN_ROOTAleksa Sarai
This was an oversight in the original implementation, as it makes no sense to specify both scoping flags to the same openat2(2) invocation (before this patch, the result of such an invocation was equivalent to RESOLVE_IN_ROOT being ignored). This is a userspace-visible ABI change, but the only user of openat2(2) at the moment is LXC which doesn't specify both flags and so no userspace programs will break as a result. Fixes: fddb5d430ad9 ("open: introduce openat2(2) syscall") Signed-off-by: Aleksa Sarai <cyphar@cyphar.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: <stable@vger.kernel.org> # v5.6+ Link: https://lore.kernel.org/r/20201027235044.5240-2-cyphar@cyphar.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-12-03f2fs: avoid race condition for shrinker countJaegeuk Kim
Light reported sometimes shinker gets nat_cnt < dirty_nat_cnt resulting in wrong do_shinker work. Let's avoid to return insane overflowed value by adding single tracking value. Reported-by: Light Hsieh <Light.Hsieh@mediatek.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>