summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2020-04-01signal: Extend exec_id to 64bitsEric W. Biederman
Replace the 32bit exec_id with a 64bit exec_id to make it impossible to wrap the exec_id counter. With care an attacker can cause exec_id wrap and send arbitrary signals to a newly exec'd parent. This bypasses the signal sending checks if the parent changes their credentials during exec. The severity of this problem can been seen that in my limited testing of a 32bit exec_id it can take as little as 19s to exec 65536 times. Which means that it can take as little as 14 days to wrap a 32bit exec_id. Adam Zabrocki has succeeded wrapping the self_exe_id in 7 days. Even my slower timing is in the uptime of a typical server. Which means self_exec_id is simply a speed bump today, and if exec gets noticably faster self_exec_id won't even be a speed bump. Extending self_exec_id to 64bits introduces a problem on 32bit architectures where reading self_exec_id is no longer atomic and can take two read instructions. Which means that is is possible to hit a window where the read value of exec_id does not match the written value. So with very lucky timing after this change this still remains expoiltable. I have updated the update of exec_id on exec to use WRITE_ONCE and the read of exec_id in do_notify_parent to use READ_ONCE to make it clear that there is no locking between these two locations. Link: https://lore.kernel.org/kernel-hardening/20200324215049.GA3710@pi3.com.pl Fixes: 2.3.23pre2 Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2020-04-01NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()Trond Myklebust
When we detach a subrequest from the list, we must also release the reference it holds to the parent. Fixes: 5b2b5187fa85 ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-04-01io_uring: add missing finish_wait() in io_sq_thread()Hillf Danton
Add it to pair with prepare_to_wait() in an attempt to avoid anything weird in the field. Fixes: b41e98524e42 ("io_uring: add per-task callback handler") Reported-by: syzbot+0c3370f235b74b3cfd97@syzkaller.appspotmail.com Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Fix the iwlwifi regression, from Johannes Berg. 2) Support BSS coloring and 802.11 encapsulation offloading in hardware, from John Crispin. 3) Fix some potential Spectre issues in qtnfmac, from Sergey Matyukevich. 4) Add TTL decrement action to openvswitch, from Matteo Croce. 5) Allow paralleization through flow_action setup by not taking the RTNL mutex, from Vlad Buslov. 6) A lot of zero-length array to flexible-array conversions, from Gustavo A. R. Silva. 7) Align XDP statistics names across several drivers for consistency, from Lorenzo Bianconi. 8) Add various pieces of infrastructure for offloading conntrack, and make use of it in mlx5 driver, from Paul Blakey. 9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki. 10) Lots of parallelization improvements during configuration changes in mlxsw driver, from Ido Schimmel. 11) Add support to devlink for generic packet traps, which report packets dropped during ACL processing. And use them in mlxsw driver. From Jiri Pirko. 12) Support bcmgenet on ACPI, from Jeremy Linton. 13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei Starovoitov, and your's truly. 14) Support XDP meta-data in virtio_net, from Yuya Kusakabe. 15) Fix sysfs permissions when network devices change namespaces, from Christian Brauner. 16) Add a flags element to ethtool_ops so that drivers can more simply indicate which coalescing parameters they actually support, and therefore the generic layer can validate the user's ethtool request. Use this in all drivers, from Jakub Kicinski. 17) Offload FIFO qdisc in mlxsw, from Petr Machata. 18) Support UDP sockets in sockmap, from Lorenz Bauer. 19) Fix stretch ACK bugs in several TCP congestion control modules, from Pengcheng Yang. 20) Support virtual functiosn in octeontx2 driver, from Tomasz Duszynski. 21) Add region operations for devlink and use it in ice driver to dump NVM contents, from Jacob Keller. 22) Add support for hw offload of MACSEC, from Antoine Tenart. 23) Add support for BPF programs that can be attached to LSM hooks, from KP Singh. 24) Support for multiple paths, path managers, and counters in MPTCP. From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti, and others. 25) More progress on adding the netlink interface to ethtool, from Michal Kubecek" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits) net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline cxgb4/chcr: nic-tls stats in ethtool net: dsa: fix oops while probing Marvell DSA switches net/bpfilter: remove superfluous testing message net: macb: Fix handling of fixed-link node net: dsa: ksz: Select KSZ protocol tag netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write net: stmmac: add EHL 2.5Gbps PCI info and PCI ID net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID net: stmmac: create dwmac-intel.c to contain all Intel platform net: dsa: bcm_sf2: Support specifying VLAN tag egress rule net: dsa: bcm_sf2: Add support for matching VLAN TCI net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT net: dsa: bcm_sf2: Disable learning for ASP port net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge net: dsa: b53: Prevent tagged VLAN on port 7 for 7278 net: dsa: b53: Restore VLAN entries upon (re)configuration net: dsa: bcm_sf2: Fix overflow checks hv_netvsc: Remove unnecessary round_up for recv_completion_cnt ...
2020-03-31Merge tag 'selinux-pr-20200330' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux updates from Paul Moore: "We've got twenty SELinux patches for the v5.7 merge window, the highlights are below: - Deprecate setting /sys/fs/selinux/checkreqprot to 1. This flag was originally created to deal with legacy userspace and the READ_IMPLIES_EXEC personality flag. We changed the default from 1 to 0 back in Linux v4.4 and now we are taking the next step of deprecating it, at some point in the future we will take the final step of rejecting 1. - Allow kernfs symlinks to inherit the SELinux label of the parent directory. In order to preserve backwards compatibility this is protected by the genfs_seclabel_symlinks SELinux policy capability. - Optimize how we store filename transitions in the kernel, resulting in some significant improvements to policy load times. - Do a better job calculating our internal hash table sizes which resulted in additional policy load improvements and likely general SELinux performance improvements as well. - Remove the unused initial SIDs (labels) and improve how we handle initial SIDs. - Enable per-file labeling for the bpf filesystem. - Ensure that we properly label NFS v4.2 filesystems to avoid a temporary unlabeled condition. - Add some missing XFS quota command types to the SELinux quota access controls. - Fix a problem where we were not updating the seq_file position index correctly in selinuxfs. - We consolidate some duplicated code into helper functions. - A number of list to array conversions. - Update Stephen Smalley's email address in MAINTAINERS" * tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: clean up indentation issue with assignment statement NFS: Ensure security label is set for root inode MAINTAINERS: Update my email address selinux: avtab_init() and cond_policydb_init() return void selinux: clean up error path in policydb_init() selinux: remove unused initial SIDs and improve handling selinux: reduce the use of hard-coded hash sizes selinux: Add xfs quota command types selinux: optimize storage of filename transitions selinux: factor out loop body from filename_trans_read() security: selinux: allow per-file labeling for bpffs selinux: generalize evaluate_cond_node() selinux: convert cond_expr to array selinux: convert cond_av_list to array selinux: convert cond_list to array selinux: sel_avc_get_stat_idx should increase position index selinux: allow kernfs symlinks to inherit parent directory context selinux: simplify evaluate_cond_node() Documentation,selinux: deprecate setting checkreqprot to 1 selinux: move status variables out of selinux_ss
2020-03-31Merge tag '5.7-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs updates from Steve French: "First part of cifs/smb3 changes for merge window (others are still being tested). Various RDMA (smbdirect) fixes, addition of SMB3.1.1 POSIX support in readdir, 3 fixes for stable, and a fix for flock. Summary: New feature: - SMB3.1.1 POSIX support in readdir Fixes: - various RDMA (smbdirect) fixes - fix for flock - fallocate fix - some improved mount warnings - two timestamp related fixes - reconnect fix - three fixes for stable" * tag '5.7-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (28 commits) cifs: update internal module version number cifs: Allocate encryption header through kmalloc cifs: smbd: Check and extend sender credits in interrupt context cifs: smbd: Calculate the correct maximum packet size for segmented SMBDirect send/receive smb3: use SMB2_SIGNATURE_SIZE define CIFS: Fix bug which the return value by asynchronous read is error CIFS: check new file size when extending file by fallocate SMB3: Minor cleanup of protocol definitions SMB3: Additional compression structures SMB3: Add new compression flags cifs: smb2pdu.h: Replace zero-length array with flexible-array member cifs: clear PF_MEMALLOC before exiting demultiplex thread cifs: cifspdu.h: Replace zero-length array with flexible-array member CIFS: Warn less noisily on default mount fs/cifs: fix gcc warning in sid_to_id cifs: allow unlock flock and OFD lock across fork cifs: do d_move in rename cifs: add SMB2_open() arg to return POSIX data cifs: plumb smb2 POSIX dir enumeration cifs: add smb2 POSIX info level ...
2020-03-31Merge tag 'gfs2-for-5.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Bob Peterson: "We've got a lot of patches (39) for this merge window. Most of these patches are related to corruption that occurs when journals are replayed. For example: 1. A node fails while writing to the file system. 2. Other nodes use the metadata that was once used by the failed node. 3. When the node returns to the cluster, its journal is replayed, but the older metadata blocks overwrite the changes from step 2. Summary: - Fixed the recovery sequence to prevent corruption during journal replay. - Many bug fixes found during recovery testing. - New improved file system withdraw sequence. - Fixed how resource group buffers are managed. - Fixed how metadata revokes are tracked and written. - Improve processing of IO errors hit by daemons like logd and quotad. - Improved error checking in metadata writes. - Fixed how qadata quota data structures are managed" * tag 'gfs2-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (39 commits) gfs2: Fix oversight in gfs2_ail1_flush gfs2: change from write to read lock for sd_log_flush_lock in journal replay gfs2: instrumentation wrt ail1 stuck gfs2: don't lock sd_log_flush_lock in try_rgrp_unlink gfs2: Remove unnecessary gfs2_qa_{get,put} pairs gfs2: Split gfs2_rsqa_delete into gfs2_rs_delete and gfs2_qa_put gfs2: Change inode qa_data to allow multiple users gfs2: eliminate gfs2_rsqa_alloc in favor of gfs2_qa_alloc gfs2: Switch to list_{first,last}_entry gfs2: Clean up inode initialization and teardown gfs2: Additional information when gfs2_ail1_flush withdraws gfs2: leaf_dealloc needs to allocate one more revoke gfs2: allow journal replay to hold sd_log_flush_lock gfs2: don't allow releasepage to free bd still used for revokes gfs2: flesh out delayed withdraw for gfs2_log_flush gfs2: Do proper error checking for go_sync family of glops functions gfs2: Don't demote a glock until its revokes are written gfs2: drain the ail2 list after io errors gfs2: Withdraw in gfs2_ail1_flush if write_cache_pages fails gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty ...
2020-03-31Merge tag 'for-5.7-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "A number of core changes that make things work better in general, code is simpler and cleaner. Core changes: - per-inode file extent tree, for in memory tracking of contiguous extent ranges to make sure i_size adjustments are accurate - tree root structures are protected by reference counts, replacing SRCU that did not cover some cases - leak detector for tree root structures - per-transaction pinned extent tracking - buffer heads are replaced by bios for super block access - speedup of extent back reference resolution, on an example test scenario the runtime of send went down from a hour to minutes - factor out locking scheme used for subvolume writer and NOCOW exclusion, abstracted as DREW lock, double reader-writer exclusion (allow either readers or writers) - cleanup and abstract extent allocation policies, preparation for zoned device support - make reflink/clone_range work on inline extents - add more cancellation point for relocation, improves long response from 'balance cancel' - add page migration callback for data pages - switch to guid for uuids, with additional cleanups of the interface - make ranged full fsyncs more efficient - removal of obsolete ioctl flag BTRFS_SUBVOL_CREATE_ASYNC - remove b-tree readahead from delayed refs paths, avoiding seek and read unnecessary blocks Features: - v2 of ioctl to delete subvolumes, allowing to delete by id and more future extensions Fixes: - fix qgroup rescan worker that could block umount - fix crash during unmount due to race with delayed inode workers - fix dellaloc flushing logic that could create unnecessary chunks under heavy load - fix missing file extent item for hole after ranged fsync - several fixes in relocation error handling Other: - more documentation of relocation, device replace, space reservations - many random cleanups" * tag 'for-5.7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (210 commits) btrfs: fix missing semaphore unlock in btrfs_sync_file btrfs: use nofs allocations for running delayed items btrfs: sysfs: Use scnprintf() instead of snprintf() btrfs: do not resolve backrefs for roots that are being deleted btrfs: track reloc roots based on their commit root bytenr btrfs: restart relocate_tree_blocks properly btrfs: reloc: reorder reservation before root selection btrfs: do not readahead in build_backref_tree btrfs: do not use readahead for running delayed refs btrfs: Remove async_transid from btrfs_mksubvol/create_subvol/create_snapshot btrfs: Remove transid argument from btrfs_ioctl_snap_create_transid btrfs: Remove BTRFS_SUBVOL_CREATE_ASYNC support btrfs: kill the subvol_srcu btrfs: make btrfs_cleanup_fs_roots use the radix tree lock btrfs: don't take an extra root ref at allocation time btrfs: hold a ref on the root on the dead roots list btrfs: make inodes hold a ref on their roots btrfs: move the root freeing stuff into btrfs_put_root btrfs: move ino_cache_inode dropping out of btrfs_free_fs_root btrfs: make the extent buffer leak check per fs info ...
2020-03-31Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt updates from Eric Biggers: "Add an ioctl FS_IOC_GET_ENCRYPTION_NONCE which retrieves a file's encryption nonce. This makes it easier to write automated tests which verify that fscrypt is doing the encryption correctly" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: ubifs: wire up FS_IOC_GET_ENCRYPTION_NONCE f2fs: wire up FS_IOC_GET_ENCRYPTION_NONCE ext4: wire up FS_IOC_GET_ENCRYPTION_NONCE fscrypt: add FS_IOC_GET_ENCRYPTION_NONCE ioctl
2020-03-31xfs: remove redundant variable assignment in xfs_symlink()Kaixu Xia
The variables 'udqp' and 'gdqp' have been initialized, so remove redundant variable assignment in xfs_symlink(). Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Dave Chinner <dchinner@redhat.com Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-31xfs: ratelimit inode flush on buffered write ENOSPCDarrick J. Wong
A customer reported rcu stalls and softlockup warnings on a computer with many CPU cores and many many more IO threads trying to write to a filesystem that is totally out of space. Subsequent analysis pointed to the many many IO threads calling xfs_flush_inodes -> sync_inodes_sb, which causes a lot of wb_writeback_work to be queued. The writeback worker spends so much time trying to wake the many many threads waiting for writeback completion that it trips the softlockup detector, and (in this case) the system automatically reboots. In addition, they complain that the lengthy xfs_flush_inodes scan traps all of those threads in uninterruptible sleep, which hampers their ability to kill the program or do anything else to escape the situation. If there's thousands of threads trying to write to files on a full filesystem, each of those threads will start separate copies of the inode flush scan. This is kind of pointless since we only need one scan, so rate limit the inode flush. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-31io_uring: refactor file register/unregister/update handlingXiaoguang Wang
While diving into io_uring fileset register/unregister/update codes, we found one bug in the fileset update handling. io_uring fileset update use a percpu_ref variable to check whether we can put the previously registered file, only when the refcnt of the perfcpu_ref variable reaches zero, can we safely put these files. But this doesn't work so well. If applications always issue requests continually, this perfcpu_ref will never have an chance to reach zero, and it'll always be in atomic mode, also will defeat the gains introduced by fileset register/unresiger/update feature, which are used to reduce the atomic operation overhead of fput/fget. To fix this issue, while applications do IORING_REGISTER_FILES or IORING_REGISTER_FILES_UPDATE operations, we allocate a new percpu_ref and kill the old percpu_ref, new requests will use the new percpu_ref. Once all previous old requests complete, old percpu_refs will be dropped and registered files will be put safely. Link: https://lore.kernel.org/io-uring/5a8dac33-4ca2-4847-b091-f7dcd3ad0ff3@linux.alibaba.com/T/#t Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-03-30f2fs: compress: add .{init,destroy}_decompress_ctx callbackChao Yu
Add below two callback interfaces in struct f2fs_compress_ops: int (*init_decompress_ctx)(struct decompress_io_ctx *dic); void (*destroy_decompress_ctx)(struct decompress_io_ctx *dic); Which will be used by zstd compress algorithm later. In addition, this patch adds callback function pointer check, so that specified algorithm can avoid defining unneeded functions. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: compress: fix to call missing destroy_compress_ctx()Chao Yu
Otherwise, it will cause memory leak. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: change default compression algorithmChao Yu
Use LZ4 as default compression algorithm, as compared to LZO, it shows almost the same compression ratio and much better decompression speed. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: clean up {cic,dic}.ref handlingChao Yu
{cic,dic}.ref should be initialized to number of compressed pages, let's initialize it directly rather than doing w/ f2fs_set_compressed_page(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: fix to use f2fs_readpage_limit() in f2fs_read_multi_pages()Chao Yu
Multipage read flow should consider fsverity, so it needs to use f2fs_readpage_limit() instead of i_size_read() to check EOF condition. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: xattr.h: Make stub helpers inlineYueHaibing
Fix gcc warnings: In file included from fs/f2fs/dir.c:15:0: fs/f2fs/xattr.h:157:13: warning: 'f2fs_destroy_xattr_caches' defined but not used [-Wunused-function] static void f2fs_destroy_xattr_caches(struct f2fs_sb_info *sbi) { } ^~~~~~~~~~~~~~~~~~~~~~~~~ fs/f2fs/xattr.h:156:12: warning: 'f2fs_init_xattr_caches' defined but not used [-Wunused-function] static int f2fs_init_xattr_caches(struct f2fs_sb_info *sbi) { return 0; } Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: a999150f4fe3 ("f2fs: use kmem_cache pool during inline xattr lookups") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: fix to avoid double unlockChao Yu
On image that has verity and compression feature, if compressed pages and non-compressed pages are mixed in one bio, we may double unlock non-compressed page in below flow: - f2fs_post_read_work - f2fs_decompress_work - f2fs_decompress_bio - __read_end_io - unlock_page - fsverity_enqueue_verify_work - f2fs_verity_work - f2fs_verify_bio - unlock_page So it should skip handling non-compressed page in f2fs_decompress_work() if verity is on. Besides, add missing dec_page_count() in f2fs_verify_bio(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: fix potential .flags overflow on 32bit architectureChao Yu
f2fs_inode_info.flags is unsigned long variable, it has 32 bits in 32bit architecture, since we introduced FI_MMAP_FILE flag when we support data compression, we may access memory cross the border of .flags field, corrupting .i_sem field, result in below deadlock. To fix this issue, let's expand .flags as an array to grab enough space to store new flags. Call Trace: __schedule+0x8d0/0x13fc ? mark_held_locks+0xac/0x100 schedule+0xcc/0x260 rwsem_down_write_slowpath+0x3ab/0x65d down_write+0xc7/0xe0 f2fs_drop_nlink+0x3d/0x600 [f2fs] f2fs_delete_inline_entry+0x300/0x440 [f2fs] f2fs_delete_entry+0x3a1/0x7f0 [f2fs] f2fs_unlink+0x500/0x790 [f2fs] vfs_unlink+0x211/0x490 do_unlinkat+0x483/0x520 sys_unlink+0x4a/0x70 do_fast_syscall_32+0x12b/0x683 entry_SYSENTER_32+0xaa/0x102 Fixes: 4c8ff7095bef ("f2fs: support data compression") Tested-by: Ondrej Jirman <megous@megous.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: fix NULL pointer dereference in f2fs_verity_work()Chao Yu
If both compression and fsverity feature is on, generic/572 will report below NULL pointer dereference bug. BUG: kernel NULL pointer dereference, address: 0000000000000018 RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs] #PF: supervisor read access in kernel mode Workqueue: fsverity_read_queue f2fs_verity_work [f2fs] RIP: 0010:f2fs_verity_work+0x60/0x90 [f2fs] Call Trace: process_one_work+0x16c/0x3f0 worker_thread+0x4c/0x440 ? rescuer_thread+0x350/0x350 kthread+0xf8/0x130 ? kthread_unpark+0x70/0x70 ret_from_fork+0x35/0x40 There are two issue in f2fs_verity_work(): - it needs to traverse and verify all pages in bio. - if pages in bio belong to non-compressed cluster, accessing decompress IO context stored in page private will cause NULL pointer dereference. Fix them. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: fix to clear PG_error if fsverity failedChao Yu
In f2fs_decompress_end_io(), we should clear PG_error flag before page unlock, otherwise reread will fail due to the flag as described in commit fb7d70db305a ("f2fs: clear PageError on the read path"). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: don't call fscrypt_get_encryption_info() explicitly in f2fs_tmpfile()Chao Yu
In f2fs_tmpfile(), parent inode's encryption info is only used when inheriting encryption context to its child inode, however, we have already called fscrypt_get_encryption_info() in fscrypt_inherit_context() to get the encryption info, so just removing unneeded one in f2fs_tmpfile(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: don't trigger data flush in foreground operationChao Yu
Data flush can generate heavy IO and cause long latency during flush, so it's not appropriate to trigger it in foreground operation. And also, we may face below potential deadlock during data flush: - f2fs_write_multi_pages - f2fs_write_raw_pages - f2fs_write_single_data_page - f2fs_balance_fs - f2fs_balance_fs_bg - f2fs_sync_dirty_inodes - filemap_fdatawrite -- stuck on flush same cluster Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: fix NULL pointer dereference in f2fs_write_begin()Chao Yu
BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:f2fs_write_begin+0x823/0xb90 [f2fs] Call Trace: f2fs_quota_write+0x139/0x1d0 [f2fs] write_blk+0x36/0x80 [quota_tree] get_free_dqblk+0x42/0xa0 [quota_tree] do_insert_tree+0x235/0x4a0 [quota_tree] do_insert_tree+0x26e/0x4a0 [quota_tree] do_insert_tree+0x26e/0x4a0 [quota_tree] do_insert_tree+0x26e/0x4a0 [quota_tree] qtree_write_dquot+0x70/0x190 [quota_tree] v2_write_dquot+0x43/0x90 [quota_v2] dquot_acquire+0x77/0x100 f2fs_dquot_acquire+0x2f/0x60 [f2fs] dqget+0x310/0x450 dquot_transfer+0x7e/0x120 f2fs_setattr+0x11a/0x4a0 [f2fs] notify_change+0x349/0x480 chown_common+0x168/0x1c0 do_fchownat+0xbc/0xf0 __x64_sys_fchownat+0x20/0x30 do_syscall_64+0x5f/0x220 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Passing fsdata parameter to .write_{begin,end} in f2fs_quota_write(), so that if quota file is compressed one, we can avoid above NULL pointer dereference when updating quota content. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: clean up f2fs_may_encrypt()Chao Yu
Merge below two conditions into f2fs_may_encrypt() for cleanup - IS_ENCRYPTED() - DUMMY_ENCRYPTION_ENABLED() Check IS_ENCRYPTED(inode) condition in f2fs_init_inode_metadata() is enough since we have already set encrypt flag in f2fs_new_inode(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: fix to avoid potential deadlockChao Yu
We should always check F2FS_I(inode)->cp_task condition in prior to other conditions in __should_serialize_io() to avoid deadloop described in commit 040d2bb318d1 ("f2fs: fix to avoid deadloop if data_flush is on"), however we break this rule when we support compression, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: don't change inode status under page lockChao Yu
In order to shrink page lock coverage. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: fix potential deadlock on compressed quota fileChao Yu
generic/232 reports below deadlock: fsstress D 0 96980 96969 0x00084000 Call Trace: schedule+0x4a/0xb0 io_schedule+0x12/0x40 __lock_page+0x127/0x1d0 pagecache_get_page+0x1d8/0x250 prepare_compress_overwrite+0xe0/0x490 [f2fs] f2fs_prepare_compress_overwrite+0x5d/0x80 [f2fs] f2fs_write_begin+0x833/0xb90 [f2fs] f2fs_quota_write+0x145/0x1e0 [f2fs] write_blk+0x36/0x80 [quota_tree] do_insert_tree+0x2ac/0x4a0 [quota_tree] do_insert_tree+0x26e/0x4a0 [quota_tree] qtree_write_dquot+0x70/0x190 [quota_tree] v2_write_dquot+0x43/0x90 [quota_v2] dquot_acquire+0x77/0x100 f2fs_dquot_acquire+0x2f/0x60 [f2fs] dqget+0x310/0x450 dquot_transfer+0xb2/0x120 f2fs_setattr+0x11a/0x4a0 [f2fs] notify_change+0x349/0x480 chown_common+0x168/0x1c0 do_fchownat+0xbc/0xf0 __x64_sys_lchown+0x21/0x30 do_syscall_64+0x5f/0x220 entry_SYSCALL_64_after_hwframe+0x44/0xa9 task PC stack pid father kworker/u256:0 D 0 103444 2 0x80084000 Workqueue: writeback wb_workfn (flush-251:1) Call Trace: schedule+0x4a/0xb0 schedule_timeout+0x15e/0x2f0 io_schedule_timeout+0x19/0x40 congestion_wait+0x7e/0x120 f2fs_write_multi_pages+0x12a/0x840 [f2fs] f2fs_write_cache_pages+0x48f/0x790 [f2fs] f2fs_write_data_pages+0x2db/0x330 [f2fs] do_writepages+0x1a/0x60 __writeback_single_inode+0x3d/0x340 writeback_sb_inodes+0x225/0x4a0 wb_writeback+0xf7/0x320 wb_workfn+0xba/0x470 process_one_work+0x16c/0x3f0 worker_thread+0x4c/0x440 kthread+0xf8/0x130 ret_from_fork+0x35/0x40 fsstress D 0 5277 5266 0x00084000 Call Trace: schedule+0x4a/0xb0 rwsem_down_write_slowpath+0x29d/0x540 block_operations+0x105/0x360 [f2fs] f2fs_write_checkpoint+0x101/0x1010 [f2fs] f2fs_sync_fs+0xa8/0x130 [f2fs] f2fs_do_sync_file+0x1ad/0x890 [f2fs] do_fsync+0x38/0x60 __x64_sys_fdatasync+0x13/0x20 do_syscall_64+0x5f/0x220 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The root cause is there is potential deadlock between quota data update and writeback. Kworker Thread B Thread C - f2fs_write_cache_pages - lock whole cluster --- A - f2fs_write_multi_pages - f2fs_write_raw_pages - f2fs_write_single_data_page - f2fs_do_write_data_page - f2fs_setattr - f2fs_lock_op --- B - f2fs_write_checkpoint - block_operations - f2fs_lock_all --- B - dquot_transfer - f2fs_quota_write - f2fs_prepare_compress_overwrite - pagecache_get_page --- A - f2fs_trylock_op failed --- B - congestion_wait - goto rewrite To fix this issue, during quota file writeback, just redirty all pages left in cluster rather holding pages' lock in cluster and looping retrying lock cp_rwsem. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: delete DIO read lockDongDongJu
This lock can be a contention with multi 4k random read IO with single inode. example) fio --output=test --name=test --numjobs=60 --filename=/media/samsung960pro/file_test --rw=randread --bs=4k --direct=1 --time_based --runtime=7 --ioengine=libaio --iodepth=256 --group_reporting --size=10G With this commit, it remove that possible lock contention. Signed-off-by: Dongjoo Seo <commisori28@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30f2fs: don't mark compressed inode dirty during f2fs_iget()Chao Yu
- f2fs_iget - do_read_inode - set_inode_flag(, FI_COMPRESSED_FILE) - __mark_inode_dirty_flag(, true) It's unnecessary, so let's just mark compressed inode dirty while compressed inode conversion. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-03-30NFS: Ensure security label is set for root inodeScott Mayhew
When using NFSv4.2, the security label for the root inode should be set via a call to nfs_setsecurity() during the mount process, otherwise the inode will appear as unlabeled for up to acdirmin seconds. Currently the label for the root inode is allocated, retrieved, and freed entirely witin nfs4_proc_get_root(). Add a field for the label to the nfs_fattr struct, and allocate & free the label in nfs_get_root(), where we also add a call to nfs_setsecurity(). Note that for the call to nfs_setsecurity() to succeed, it's necessary to also move the logic calling security_sb_{set,clone}_security() from nfs_get_tree_common() down into nfs_get_root()... otherwise the SBLABEL_MNT flag will not be set in the super_block's security flags and nfs_setsecurity() will silently fail. Reported-by: Richard Haines <richard_c_haines@btinternet.com> Signed-off-by: Scott Mayhew <smayhew@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Tested-by: Stephen Smalley <sds@tycho.nsa.gov> [PM: fixed 80-char line width problems] Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-03-30Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - Continued user-access cleanups in the futex code. - percpu-rwsem rewrite that uses its own waitqueue and atomic_t instead of an embedded rwsem. This addresses a couple of weaknesses, but the primary motivation was complications on the -rt kernel. - Introduce raw lock nesting detection on lockdep (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal lock differences. This too originates from -rt. - Reuse lockdep zapped chain_hlocks entries, to conserve RAM footprint on distro-ish kernels running into the "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep chain-entries pool. - Misc cleanups, smaller fixes and enhancements - see the changelog for details" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t Documentation/locking/locktypes: Minor copy editor fixes Documentation/locking/locktypes: Further clarifications and wordsmithing m68knommu: Remove mm.h include from uaccess_no.h x86: get rid of user_atomic_cmpxchg_inatomic() generic arch_futex_atomic_op_inuser() doesn't need access_ok() x86: don't reload after cmpxchg in unsafe_atomic_op2() loop x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end() objtool: whitelist __sanitizer_cov_trace_switch() [parisc, s390, sparc64] no need for access_ok() in futex handling sh: no need of access_ok() in arch_futex_atomic_op_inuser() futex: arch_futex_atomic_op_inuser() calling conventions change completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() lockdep: Add posixtimer context tracing bits lockdep: Annotate irq_work lockdep: Add hrtimer context tracing bits lockdep: Introduce wait-type checks completion: Use simple wait queues sched/swait: Prepare usage in completions ...
2020-03-30Merge branch 'efi-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The EFI changes in this cycle are much larger than usual, for two (positive) reasons: - The GRUB project is showing signs of life again, resulting in the introduction of the generic Linux/UEFI boot protocol, instead of x86 specific hacks which are increasingly difficult to maintain. There's hope that all future extensions will now go through that boot protocol. - Preparatory work for RISC-V EFI support. The main changes are: - Boot time GDT handling changes - Simplify handling of EFI properties table on arm64 - Generic EFI stub cleanups, to improve command line handling, file I/O, memory allocation, etc. - Introduce a generic initrd loading method based on calling back into the firmware, instead of relying on the x86 EFI handover protocol or device tree. - Introduce a mixed mode boot method that does not rely on the x86 EFI handover protocol either, and could potentially be adopted by other architectures (if another one ever surfaces where one execution mode is a superset of another) - Clean up the contents of 'struct efi', and move out everything that doesn't need to be stored there. - Incorporate support for UEFI spec v2.8A changes that permit firmware implementations to return EFI_UNSUPPORTED from UEFI runtime services at OS runtime, and expose a mask of which ones are supported or unsupported via a configuration table. - Partial fix for the lack of by-VA cache maintenance in the decompressor on 32-bit ARM. - Changes to load device firmware from EFI boot service memory regions - Various documentation updates and minor code cleanups and fixes" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) efi/libstub/arm: Fix spurious message that an initrd was loaded efi/libstub/arm64: Avoid image_base value from efi_loaded_image partitions/efi: Fix partition name parsing in GUID partition entry efi/x86: Fix cast of image argument efi/libstub/x86: Use ULONG_MAX as upper bound for all allocations efi: Fix a mistype in comments mentioning efivar_entry_iter_begin() efi/libstub: Avoid linking libstub/lib-ksyms.o into vmlinux efi/x86: Preserve %ebx correctly in efi_set_virtual_address_map() efi/x86: Ignore the memory attributes table on i386 efi/x86: Don't relocate the kernel unless necessary efi/x86: Remove extra headroom for setup block efi/x86: Add kernel preferred address to PE header efi/x86: Decompress at start of PE image load address x86/boot/compressed/32: Save the output address instead of recalculating it efi/libstub/x86: Deal with exit() boot service returning x86/boot: Use unsigned comparison for addresses efi/x86: Avoid using code32_start efi/x86: Make efi32_pe_entry() more readable efi/x86: Respect 32-bit ABI in efi32_pe_entry() efi/x86: Annotate the LOADED_IMAGE_PROTOCOL_GUID with SYM_DATA ...
2020-03-30Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The main changes in this cycle were: - Make kfree_rcu() use kfree_bulk() for added performance - RCU updates - Callback-overload handling updates - Tasks-RCU KCSAN and sparse updates - Locking torture test and RCU torture test updates - Documentation updates - Miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) rcu: Make rcu_barrier() account for offline no-CBs CPUs rcu: Mark rcu_state.gp_seq to detect concurrent writes Documentation/memory-barriers: Fix typos doc: Add rcutorture scripting to torture.txt doc/RCU/rcu: Use https instead of http if possible doc/RCU/rcu: Use absolute paths for non-rst files doc/RCU/rcu: Use ':ref:' for links to other docs doc/RCU/listRCU: Update example function name doc/RCU/listRCU: Fix typos in a example code snippets doc/RCU/Design: Remove remaining HTML tags in ReST files doc: Add some more RCU list patterns in the kernel rcutorture: Set KCSAN Kconfig options to detect more data races rcutorture: Manually clean up after rcu_barrier() failure rcutorture: Make rcu_torture_barrier_cbs() post from corresponding CPU rcuperf: Measure memory footprint during kfree_rcu() test rcutorture: Annotation lockless accesses to rcu_torture_current rcutorture: Add READ_ONCE() to rcu_torture_count and rcu_torture_batch rcutorture: Fix stray access to rcu_fwd_cb_nodelay rcutorture: Fix rcu_torture_one_read()/rcu_torture_writer() data race rcutorture: Make kvm-find-errors.sh abort on bad directory ...
2020-03-30ubifs: Fix out-of-bounds memory access caused by abnormal value of node_lenLiu Song
In “ubifs_check_node”, when the value of "node_len" is abnormal, the code will goto label of "out_len" for execution. Then, in the following "ubifs_dump_node", if inode type is "UBIFS_DATA_NODE", in "print_hex_dump", an out-of-bounds access may occur due to the wrong "ch->len". Therefore, when the value of "node_len" is abnormal, data length should to be adjusted to a reasonable safe range. At this time, structured data is not credible, so dump the corrupted data directly for analysis. Signed-off-by: Liu Song <liu.song11@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at>
2020-03-30ubifs: ubifs_add_orphan: Fix a memory leak bugZhihao Cheng
Memory leak occurs when files with extended attributes are added to orphan list. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Fixes: 988bec41318f3fa897e2f8 ("ubifs: orphan: Handle xattrs like files") Signed-off-by: Richard Weinberger <richard@nod.at>
2020-03-30ubifs: ubifs_jnl_write_inode: Fix a memory leak bugZhihao Cheng
When inodes with extended attributes are evicted, xent is not freed in one exit branch. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Fixes: 9ca2d732644484488db3112 ("ubifs: Limit number of xattrs per inode") Signed-off-by: Richard Weinberger <richard@nod.at>
2020-03-30Merge tag 'driver-core-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the "big" set of driver core changes for 5.7-rc1. Nothing huge in here, just lots of little firmware core changes and use of new apis, a libfs fix, a debugfs api change, and some driver core deferred probe rework. All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (44 commits) Revert "driver core: Set fw_devlink to "permissive" behavior by default" driver core: Set fw_devlink to "permissive" behavior by default driver core: Replace open-coded list_last_entry() driver core: Read atomic counter once in driver_probe_done() libfs: fix infoleak in simple_attr_read() driver core: Add device links from fwnode only for the primary device platform/x86: touchscreen_dmi: Add info for the Chuwi Vi8 Plus tablet platform/x86: touchscreen_dmi: Add EFI embedded firmware info support Input: icn8505 - Switch to firmware_request_platform for retreiving the fw Input: silead - Switch to firmware_request_platform for retreiving the fw selftests: firmware: Add firmware_request_platform tests test_firmware: add support for firmware_request_platform firmware: Add new platform fallback mechanism and firmware_request_platform() Revert "drivers: base: power: wakeup.c: Use built-in RCU list checking" drivers: base: power: wakeup.c: Use built-in RCU list checking component: allow missing unbind callback debugfs: remove return value of debugfs_create_file_size() debugfs: Check module state before warning in {full/open}_proxy_open() firmware: fix a double abort case with fw_load_sysfs_fallback arch_topology: Fix putting invalid cpu clk ...
2020-03-30ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans()Richard Weinberger
Orphans are allowed to point to deleted inodes. So -ENOENT is not a fatal error. Reported-by: Кочетков Максим <fido_max@inbox.ru> Reported-and-tested-by: "Christian Berger" <Christian.Berger@de.bosch.com> Tested-by: Karl Olsen <karl@micro-technic.com> Tested-by: Jef Driesen <jef.driesen@niko.eu> Fixes: ee1438ce5dc4 ("ubifs: Check link count of inodes when killing orphans.") Signed-off-by: Richard Weinberger <richard@nod.at>
2020-03-30Merge tag 'pstore-v5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: "These mostly some minor cleanups and a bug fix for an ftrace corner case: - Improve failure paths (chenqiwu) - Fix ftrace position index (Vasily Averin) - Use proper flexible-array member (Gustavo A. R. Silva)" * tag 'pstore-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ram: Replace zero-length array with flexible-array member pstore: pstore_ftrace_seq_next should increase position index pstore/ram: remove unnecessary ramoops_unregister_dummy() pstore/platform: fix potential mem leak if pstore_init_fs failed
2020-03-30Merge tag 'erofs-for-5.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "Updates with a XArray adaptation, several fixes for shrinker and corrupted images are ready for this cycle. All commits have been stress tested with no noticeable smoke out and have been in linux-next as well. Summary: - Convert radix tree usage to XArray - Fix shrink scan count on multiple filesystem instances - Better handling for specific corrupted images - Update my email address in MAINTAINERS" * tag 'erofs-for-5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: MAINTAINERS: erofs: update my email address erofs: handle corrupted images whose decompressed size less than it'd be erofs: use LZ4_decompress_safe() for full decoding erofs: correct the remaining shrink objects erofs: convert workstn to XArray
2020-03-30Merge tag 'docs-5.7' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "This has been a busy cycle for documentation work. Highlights include: - Lots of RST conversion work by Mauro, Daniel ALmeida, and others. Maybe someday we'll get to the end of this stuff...maybe... - Some organizational work to bring some order to the core-api manual. - Various new docs and additions to the existing documentation. - Typo fixes, warning fixes, ..." * tag 'docs-5.7' of git://git.lwn.net/linux: (123 commits) Documentation: x86: exception-tables: document CONFIG_BUILDTIME_TABLE_SORT MAINTAINERS: adjust to filesystem doc ReST conversion docs: deprecated.rst: Add BUG()-family doc: zh_CN: add translation for virtiofs doc: zh_CN: index files in filesystems subdirectory docs: locking: Drop :c:func: throughout docs: locking: Add 'need' to hardirq section docs: conf.py: avoid thousands of duplicate label warning on Sphinx docs: prevent warnings due to autosectionlabel docs: fix reference to core-api/namespaces.rst docs: fix pointers to io-mapping.rst and io_ordering.rst files Documentation: Better document the softlockup_panic sysctl docs: hw-vuln: tsx_async_abort.rst: get rid of an unused ref docs: perf: imx-ddr.rst: get rid of a warning docs: filesystems: fuse.rst: supress a Sphinx warning docs: translations: it: avoid duplicate refs at programming-language.rst docs: driver.rst: supress two ReSt warnings docs: trace: events.rst: convert some new stuff to ReST format Documentation: Add io_ordering.rst to driver-api manual Documentation: Add io-mapping.rst to driver-api manual ...
2020-03-30Merge tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Here are the io_uring changes for this merge window. Light on new features this time around (just splice + buffer selection), lots of cleanups, fixes, and improvements to existing support. In particular, this contains: - Cleanup fixed file update handling for stack fallback (Hillf) - Re-work of how pollable async IO is handled, we no longer require thread offload to handle that. Instead we rely using poll to drive this, with task_work execution. - In conjunction with the above, allow expendable buffer selection, so that poll+recv (for example) no longer has to be a split operation. - Make sure we honor RLIMIT_FSIZE for buffered writes - Add support for splice (Pavel) - Linked work inheritance fixes and optimizations (Pavel) - Async work fixes and cleanups (Pavel) - Improve io-wq locking (Pavel) - Hashed link write improvements (Pavel) - SETUP_IOPOLL|SETUP_SQPOLL improvements (Xiaoguang)" * tag 'for-5.7/io_uring-2020-03-29' of git://git.kernel.dk/linux-block: (54 commits) io_uring: cleanup io_alloc_async_ctx() io_uring: fix missing 'return' in comment io-wq: handle hashed writes in chains io-uring: drop 'free_pfile' in struct io_file_put io-uring: drop completion when removing file io_uring: Fix ->data corruption on re-enqueue io-wq: close cancel gap for hashed linked work io_uring: make spdxcheck.py happy io_uring: honor original task RLIMIT_FSIZE io-wq: hash dependent work io-wq: split hashing and enqueueing io-wq: don't resched if there is no work io-wq: remove duplicated cancel code io_uring: fix truncated async read/readv and write/writev retry io_uring: dual license io_uring.h uapi header io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled io_uring: Fix unused function warnings io_uring: add end-of-bits marker and build time verify it io_uring: provide means of removing buffers io_uring: add IOSQE_BUFFER_SELECT support for IORING_OP_RECVMSG ...
2020-03-30Merge tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: - Online capacity resizing (Balbir) - Number of hardware queue change fixes (Bart) - null_blk fault injection addition (Bart) - Cleanup of queue allocation, unifying the node/no-node API (Christoph) - Cleanup of genhd, moving code to where it makes sense (Christoph) - Cleanup of the partition handling code (Christoph) - disk stat fixes/improvements (Konstantin) - BFQ improvements (Paolo) - Various fixes and improvements * tag 'for-5.7/block-2020-03-29' of git://git.kernel.dk/linux-block: (72 commits) block: return NULL in blk_alloc_queue() on error block: move bio_map_* to blk-map.c Revert "blkdev: check for valid request queue before issuing flush" block: simplify queue allocation bcache: pass the make_request methods to blk_queue_make_request null_blk: use blk_mq_init_queue_data block: add a blk_mq_init_queue_data helper block: move the ->devnode callback to struct block_device_operations block: move the part_stat* helpers from genhd.h to a new header block: move block layer internals out of include/linux/genhd.h block: move guard_bio_eod to bio.c block: unexport get_gendisk block: unexport disk_map_sector_rcu block: unexport disk_get_part block: mark part_in_flight and part_in_flight_rw static block: mark block_depr static block: factor out requeue handling from dispatch code block/diskstats: replace time_in_queue with sum of request times block/diskstats: accumulate all per-cpu counters in one pass block/diskstats: more accurate approximation of io_ticks for slow disks ...
2020-03-30gfs2: Fix oversight in gfs2_ail1_flushBob Peterson
Ordinarily, function gfs2_ail1_start_one issues a write request for one item on the ail1 list, then returns -EBUSY. This makes the caller, gfs2_ail1_flush, loop around and start another. However, it was not clearing the -EBUSY return code each time through the loop. So on rare occasions, like when the wbc runs out of nr_to_write, it remained set to -EBUSY, which triggered an error and withdraw. This patch sets the return code to 0 each time through the restart loop so this won't happen anymore. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2020-03-30ceph: fix snapshot directory timestampsLuis Henriques
The .snap directory timestamps are kept at 0 (1970-01-01 00:00), which isn't consistent with what the fuse client does. This patch makes the behaviour consistent, by setting these timestamps (atime, btime, ctime, mtime) to those of the parent directory. Cc: Marc Roos <M.Roos@f1-outsourcing.eu> Signed-off-by: Luis Henriques <lhenriques@suse.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: wait for async creating inode before requesting new max sizeYan, Zheng
ceph_check_caps() can't request new max size for async creating inode. This may make ceph_get_caps() loop busily until getting reply of the async create. Also, wait for async creating reply before calling ceph_renew_caps(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: don't skip updating wanted caps when cap is staleYan, Zheng
1. try_get_cap_refs() fails to get caps and finds that mds_wanted does not include what it wants. It returns -ESTALE. 2. ceph_get_caps() calls ceph_renew_caps(). ceph_renew_caps() finds that inode has cap, so it calls ceph_check_caps(). 3. ceph_check_caps() finds that issued caps (without checking if it's stale) already includes caps wanted by open file, so it skips updating wanted caps. Above events can cause an infinite loop inside ceph_get_caps(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-03-30ceph: request new max size only when there is auth capYan, Zheng
When there is no auth cap, check_max_size() can't do anything and may cause an infinite loop inside ceph_get_caps(). Signed-off-by: "Yan, Zheng" <zyan@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>