summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2019-08-01timerfd: Prepare for PREEMPT_RTAnna-Maria Gleixner
Use the hrtimer_cancel_wait_running() synchronization mechanism to prevent priority inversion and live locks on PREEMPT_RT. [ tglx: Split out of combo patch ] Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190730223828.600085866@linutronix.de
2019-07-31Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull mount_capable() fix from Al Viro. * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Unbreak mount_capable()
2019-07-31docs: fix a couple of new broken referencesMauro Carvalho Chehab
Those are due to recent changes. Most of the issues can be automatically fixed with: $ ./scripts/documentation-file-ref-check --fix The only exception was the sound binding with required manual work. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-31docs: fs: convert porting to ReSTMauro Carvalho Chehab
This file has its own proper style, except that, after a while, the coding style gets violated and whitespaces are placed on different ways. As Sphinx and ReST are very sentitive to whitespace differences, I had to opt if each entry after required/mandatory/... fields should start with zero spaces or with a tab. I opted to start them all from the zero position, in order to avoid needing to break lines with more than 80 columns, with would make harder for review. Most of the other changes at porting.rst were made to use an unified notation with works nice as a text file while also produce a good html output after being parsed. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-31docs: fs: convert docs without extension to ReSTMauro Carvalho Chehab
There are 3 remaining files without an extension inside the fs docs dir. Manually convert them to ReST. In the case of the nfs/exporting.rst file, as the nfs docs aren't ported yet, I opted to convert and add a :orphan: there, with should be removed when it gets added into a nfs-specific part of the fs documentation. Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2019-07-31gfs2: Inode dirtying fixAndreas Gruenbacher
With the recent iomap write page reclaim deadlock fix, it turns out that the GLF_DIRTY flag isn't always set when it needs to be anymore: previously, this happened as a side effect of always adding the inode buffer head to the current transaction with gfs2_trans_add_meta, but this isn't happening consistently anymore. Fix by removing an additional unnecessary gfs2_trans_add_meta call and by setting the GLF_DIRTY flag in gfs2_iomap_end. (The GLF_DIRTY flag causes inode_go_sync to flush the transaction log when syncing out the glock of that inode. When the flag isn't set, inode_go_sync will skip inodes, including ones with an i_state of I_DIRTY_PAGES, which will lead to cluster incoherency.) In addition, in gfs2_iomap_page_done, if the metadata has changed, mark the inode as I_DIRTY_DATASYNC to have the inode added to the current transaction: we don't expect metadata to change here, but let's err on the safe side. Fixes: d0a22a4b03b8 ("gfs2: Fix iomap write page reclaim deadlock"); Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-31udf: prevent allocation beyond UDF partitionSteve Magnani
The UDF bitmap allocation code assumes that a recorded Unallocated Space Bitmap is compliant with ECMA-167 4/13, which requires that pad bytes between the end of the bitmap and the end of a logical block are all zero. When a recorded bitmap does not comply with this requirement, for example one padded with FF to the block boundary instead of 00, the allocator may "allocate" blocks that are outside the UDF partition extent. This can result in UDF volume descriptors being overwritten by file data or by partition-level descriptors, and in extreme cases, even in scribbling on a subsequent disk partition. Add a check that the block selected by the allocator actually resides within the UDF partition extent. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Link: https://lore.kernel.org/r/1564341552-129750-1-git-send-email-steve@digidescorp.com Signed-off-by: Jan Kara <jack@suse.cz>
2019-07-31Unbreak mount_capable()Al Viro
In "consolidate the capability checks in sget_{fc,userns}())" the wrong argument had been passed to mount_capable() by sget_fc(). That mistake had been further obscured later, when switching mount_capable() to fs_context has moved the calculation of bogus argument from sget_fc() to mount_capable() itself. It should've been fc->user_ns all along. Screwed-up-by: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Christian Brauner <christian@brauner.io> Tested-by: Christian Brauner <christian@brauner.io> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-31io_uring: fix KASAN use after free in io_sq_wq_submit_workJackie Liu
[root@localhost ~]# ./liburing/test/link QEMU Standard PC report that: [ 29.379892] CPU: 0 PID: 84 Comm: kworker/u2:2 Not tainted 5.3.0-rc2-00051-g4010b622f1d2-dirty #86 [ 29.379902] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 29.379913] Workqueue: io_ring-wq io_sq_wq_submit_work [ 29.379929] Call Trace: [ 29.379953] dump_stack+0xa9/0x10e [ 29.379970] ? io_sq_wq_submit_work+0xbf4/0xe90 [ 29.379986] print_address_description.cold.6+0x9/0x317 [ 29.379999] ? io_sq_wq_submit_work+0xbf4/0xe90 [ 29.380010] ? io_sq_wq_submit_work+0xbf4/0xe90 [ 29.380026] __kasan_report.cold.7+0x1a/0x34 [ 29.380044] ? io_sq_wq_submit_work+0xbf4/0xe90 [ 29.380061] kasan_report+0xe/0x12 [ 29.380076] io_sq_wq_submit_work+0xbf4/0xe90 [ 29.380104] ? io_sq_thread+0xaf0/0xaf0 [ 29.380152] process_one_work+0xb59/0x19e0 [ 29.380184] ? pwq_dec_nr_in_flight+0x2c0/0x2c0 [ 29.380221] worker_thread+0x8c/0xf40 [ 29.380248] ? __kthread_parkme+0xab/0x110 [ 29.380265] ? process_one_work+0x19e0/0x19e0 [ 29.380278] kthread+0x30b/0x3d0 [ 29.380292] ? kthread_create_on_node+0xe0/0xe0 [ 29.380311] ret_from_fork+0x3a/0x50 [ 29.380635] Allocated by task 209: [ 29.381255] save_stack+0x19/0x80 [ 29.381268] __kasan_kmalloc.constprop.6+0xc1/0xd0 [ 29.381279] kmem_cache_alloc+0xc0/0x240 [ 29.381289] io_submit_sqe+0x11bc/0x1c70 [ 29.381300] io_ring_submit+0x174/0x3c0 [ 29.381311] __x64_sys_io_uring_enter+0x601/0x780 [ 29.381322] do_syscall_64+0x9f/0x4d0 [ 29.381336] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 29.381633] Freed by task 84: [ 29.382186] save_stack+0x19/0x80 [ 29.382198] __kasan_slab_free+0x11d/0x160 [ 29.382210] kmem_cache_free+0x8c/0x2f0 [ 29.382220] io_put_req+0x22/0x30 [ 29.382230] io_sq_wq_submit_work+0x28b/0xe90 [ 29.382241] process_one_work+0xb59/0x19e0 [ 29.382251] worker_thread+0x8c/0xf40 [ 29.382262] kthread+0x30b/0x3d0 [ 29.382272] ret_from_fork+0x3a/0x50 [ 29.382569] The buggy address belongs to the object at ffff888067172140 which belongs to the cache io_kiocb of size 224 [ 29.384692] The buggy address is located 120 bytes inside of 224-byte region [ffff888067172140, ffff888067172220) [ 29.386723] The buggy address belongs to the page: [ 29.387575] page:ffffea00019c5c80 refcount:1 mapcount:0 mapping:ffff88806ace5180 index:0x0 [ 29.387587] flags: 0x100000000000200(slab) [ 29.387603] raw: 0100000000000200 dead000000000100 dead000000000122 ffff88806ace5180 [ 29.387617] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 [ 29.387624] page dumped because: kasan: bad access detected [ 29.387920] Memory state around the buggy address: [ 29.388771] ffff888067172080: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc [ 29.390062] ffff888067172100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb [ 29.391325] >ffff888067172180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 29.392578] ^ [ 29.393480] ffff888067172200: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc [ 29.394744] ffff888067172280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 29.396003] ================================================================== [ 29.397260] Disabling lock debugging due to kernel taint io_sq_wq_submit_work free and read req again. Cc: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Cc: linux-block@vger.kernel.org Cc: stable@vger.kernel.org Fixes: f7b76ac9d17e ("io_uring: fix counter inc/dec mismatch in async_list") Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-31quota: fix condition for resetting time limit in do_set_dqblk()Chengguang Xu
We reset time limit when current usage is smaller or equal to soft limit in other place, so follow this rule in do_set_dqblk(). Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn> Link: https://lore.kernel.org/r/20190724053216.19392-1-cgxu519@zoho.com.cn Signed-off-by: Jan Kara <jack@suse.cz>
2019-07-31ext2: code cleanup for ext2_free_blocks()Chengguang Xu
Call ext2_data_block_valid() for block range validity. Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn> Link: https://lore.kernel.org/r/20190723112155.20329-2-cgxu519@zoho.com.cn Signed-off-by: Jan Kara <jack@suse.cz>
2019-07-31ext2: fix block range in ext2_data_block_valid()Chengguang Xu
For block validity we should check the block range from start_block to start_block + count - 1, so fix the range in ext2_data_block_valid() and also modify the count argument properly in calling place. Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn> Link: https://lore.kernel.org/r/20190723112155.20329-1-cgxu519@zoho.com.cn Signed-off-by: Jan Kara <jack@suse.cz>
2019-07-31udf: support 2048-byte spacing of VRS descriptors on 4K mediaSteven J. Magnani
Some UDF creators (specifically Microsoft, but perhaps others) mishandle the ECMA-167 corner case that requires descriptors within a Volume Recognition Sequence to be placed at 4096-byte intervals on media where the block size is 4K. Instead, the descriptors are placed at the 2048- byte interval mandated for media with smaller blocks. This nonconformity currently prevents Linux from recognizing the filesystem as UDF. Modify the driver to tolerate a misformatted VRS on 4K media. [JK: Simplified descriptor checking] Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Tested-by: Steven J. Magnani <steve@digidescorp.com> Link: https://lore.kernel.org/r/20190711133852.16887-2-steve@digidescorp.com Signed-off-by: Jan Kara <jack@suse.cz>
2019-07-31udf: refactor VRS descriptor identificationSteven J. Magnani
Extract code that parses a Volume Recognition Sequence descriptor (component), in preparation for calling it twice against different locations in a block. Signed-off-by: Steven J. Magnani <steve@digidescorp.com> Link: https://lore.kernel.org/r/20190711133852.16887-1-steve@digidescorp.com Signed-off-by: Jan Kara <jack@suse.cz>
2019-07-30Merge branch 'dax-fix-5.3-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fix from Dan Williams: "Fix a botched manual patch update that got dropped between testing and application" * 'dax-fix-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix missed wakeup in put_unlocked_entry()
2019-07-30compat_ioctl: pppoe: fix PPPOEIOCSFWD handlingArnd Bergmann
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in linux-2.5.69 along with hundreds of other commands, but was always broken sincen only the structure is compatible, but the command number is not, due to the size being sizeof(size_t), or at first sizeof(sizeof((struct sockaddr_pppox)), which is different on 64-bit architectures. Guillaume Nault adds: And the implementation was broken until 2016 (see 29e73269aa4d ("pppoe: fix reference counting in PPPoE proxy")), and nobody ever noticed. I should probably have removed this ioctl entirely instead of fixing it. Clearly, it has never been used. Fix it by adding a compat_ioctl handler for all pppoe variants that translates the command number and then calls the regular ioctl function. All other ioctl commands handled by pppoe are compatible between 32-bit and 64-bit, and require compat_ptr() conversion. This should apply to all stable kernels. Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-30Merge tag 'f2fs-for-5.4-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "This set of patches adjust to follow recent setflags changes and fix two regressions" * tag 'f2fs-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: use EINVAL for superblock with invalid magic f2fs: fix to read source block before invalidating it f2fs: remove redundant check from f2fs_setflags_common() f2fs: use generic checking function for FS_IOC_FSSETXATTR f2fs: use generic checking and prep function for FS_IOC_SETFLAGS
2019-07-30loop: Fix mount(2) failure due to race with LOOP_SET_FDJan Kara
Commit 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener") made LOOP_SET_FD ioctl acquire exclusive block device reference while it updates loop device binding. However this can make perfectly valid mount(2) fail with EBUSY due to racing LOOP_SET_FD holding temporarily the exclusive bdev reference in cases like this: for i in {a..z}{a..z}; do dd if=/dev/zero of=$i.image bs=1k count=0 seek=1024 mkfs.ext2 $i.image mkdir mnt$i done echo "Run" for i in {a..z}{a..z}; do mount -o loop -t ext2 $i.image mnt$i & done Fix the problem by not getting full exclusive bdev reference in LOOP_SET_FD but instead just mark the bdev as being claimed while we update the binding information. This just blocks new exclusive openers instead of failing them with EBUSY thus fixing the problem. Fixes: 33ec3e53e7b1 ("loop: Don't change loop device under exclusive opener") Cc: stable@vger.kernel.org Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-30xfs: Fix possible null-pointer dereferences in ↵Jia-Ju Bai
xchk_da_btree_block_check_sibling() In xchk_da_btree_block_check_sibling(), there is an if statement on line 274 to check whether ds->state->altpath.blk[level].bp is NULL: if (ds->state->altpath.blk[level].bp) When ds->state->altpath.blk[level].bp is NULL, it is used on line 281: xfs_trans_brelse(..., ds->state->altpath.blk[level].bp); struct xfs_buf_log_item *bip = bp->b_log_item; ASSERT(bp->b_transp == tp); Thus, possible null-pointer dereferences may occur. To fix these bugs, ds->state->altpath.blk[level].bp is checked before being used. These bugs are found by a static analysis tool STCheck written by us. Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-07-30Btrfs: fix deadlock between fiemap and transaction commitsFilipe Manana
The fiemap handler locks a file range that can have unflushed delalloc, and after locking the range, it tries to attach to a running transaction. If the running transaction started its commit, that is, it is in state TRANS_STATE_COMMIT_START, and either the filesystem was mounted with the flushoncommit option or the transaction is creating a snapshot for the subvolume that contains the file that fiemap is operating on, we end up deadlocking. This happens because fiemap is blocked on the transaction, waiting for it to complete, and the transaction is waiting for the flushed dealloc to complete, which requires locking the file range that the fiemap task already locked. The following stack traces serve as an example of when this deadlock happens: (...) [404571.515510] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs] [404571.515956] Call Trace: [404571.516360] ? __schedule+0x3ae/0x7b0 [404571.516730] schedule+0x3a/0xb0 [404571.517104] lock_extent_bits+0x1ec/0x2a0 [btrfs] [404571.517465] ? remove_wait_queue+0x60/0x60 [404571.517832] btrfs_finish_ordered_io+0x292/0x800 [btrfs] [404571.518202] normal_work_helper+0xea/0x530 [btrfs] [404571.518566] process_one_work+0x21e/0x5c0 [404571.518990] worker_thread+0x4f/0x3b0 [404571.519413] ? process_one_work+0x5c0/0x5c0 [404571.519829] kthread+0x103/0x140 [404571.520191] ? kthread_create_worker_on_cpu+0x70/0x70 [404571.520565] ret_from_fork+0x3a/0x50 [404571.520915] kworker/u8:6 D 0 31651 2 0x80004000 [404571.521290] Workqueue: btrfs-flush_delalloc btrfs_flush_delalloc_helper [btrfs] (...) [404571.537000] fsstress D 0 13117 13115 0x00004000 [404571.537263] Call Trace: [404571.537524] ? __schedule+0x3ae/0x7b0 [404571.537788] schedule+0x3a/0xb0 [404571.538066] wait_current_trans+0xc8/0x100 [btrfs] [404571.538349] ? remove_wait_queue+0x60/0x60 [404571.538680] start_transaction+0x33c/0x500 [btrfs] [404571.539076] btrfs_check_shared+0xa3/0x1f0 [btrfs] [404571.539513] ? extent_fiemap+0x2ce/0x650 [btrfs] [404571.539866] extent_fiemap+0x2ce/0x650 [btrfs] [404571.540170] do_vfs_ioctl+0x526/0x6f0 [404571.540436] ksys_ioctl+0x70/0x80 [404571.540734] __x64_sys_ioctl+0x16/0x20 [404571.540997] do_syscall_64+0x60/0x1d0 [404571.541279] entry_SYSCALL_64_after_hwframe+0x49/0xbe (...) [404571.543729] btrfs D 0 14210 14208 0x00004000 [404571.544023] Call Trace: [404571.544275] ? __schedule+0x3ae/0x7b0 [404571.544526] ? wait_for_completion+0x112/0x1a0 [404571.544795] schedule+0x3a/0xb0 [404571.545064] schedule_timeout+0x1ff/0x390 [404571.545351] ? lock_acquire+0xa6/0x190 [404571.545638] ? wait_for_completion+0x49/0x1a0 [404571.545890] ? wait_for_completion+0x112/0x1a0 [404571.546228] wait_for_completion+0x131/0x1a0 [404571.546503] ? wake_up_q+0x70/0x70 [404571.546775] btrfs_wait_ordered_extents+0x27c/0x400 [btrfs] [404571.547159] btrfs_commit_transaction+0x3b0/0xae0 [btrfs] [404571.547449] ? btrfs_mksubvol+0x4a4/0x640 [btrfs] [404571.547703] ? remove_wait_queue+0x60/0x60 [404571.547969] btrfs_mksubvol+0x605/0x640 [btrfs] [404571.548226] ? __sb_start_write+0xd4/0x1c0 [404571.548512] ? mnt_want_write_file+0x24/0x50 [404571.548789] btrfs_ioctl_snap_create_transid+0x169/0x1a0 [btrfs] [404571.549048] btrfs_ioctl_snap_create_v2+0x11d/0x170 [btrfs] [404571.549307] btrfs_ioctl+0x133f/0x3150 [btrfs] [404571.549549] ? mem_cgroup_charge_statistics+0x4c/0xd0 [404571.549792] ? mem_cgroup_commit_charge+0x84/0x4b0 [404571.550064] ? __handle_mm_fault+0xe3e/0x11f0 [404571.550306] ? do_raw_spin_unlock+0x49/0xc0 [404571.550608] ? _raw_spin_unlock+0x24/0x30 [404571.550976] ? __handle_mm_fault+0xedf/0x11f0 [404571.551319] ? do_vfs_ioctl+0xa2/0x6f0 [404571.551659] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [404571.552087] do_vfs_ioctl+0xa2/0x6f0 [404571.552355] ksys_ioctl+0x70/0x80 [404571.552621] __x64_sys_ioctl+0x16/0x20 [404571.552864] do_syscall_64+0x60/0x1d0 [404571.553104] entry_SYSCALL_64_after_hwframe+0x49/0xbe (...) If we were joining the transaction instead of attaching to it, we would not risk a deadlock because a join only blocks if the transaction is in a state greater then or equals to TRANS_STATE_COMMIT_DOING, and the delalloc flush performed by a transaction is done before it reaches that state, when it is in the state TRANS_STATE_COMMIT_START. However a transaction join is intended for use cases where we do modify the filesystem, and fiemap only needs to peek at delayed references from the current transaction in order to determine if extents are shared, and, besides that, when there is no current transaction or when it blocks to wait for a current committing transaction to complete, it creates a new transaction without reserving any space. Such unnecessary transactions, besides doing unnecessary IO, can cause transaction aborts (-ENOSPC) and unnecessary rotation of the precious backup roots. So fix this by adding a new transaction join variant, named join_nostart, which behaves like the regular join, but it does not create a transaction when none currently exists or after waiting for a committing transaction to complete. Fixes: 03628cdbc64db6 ("Btrfs: do not start a transaction during fiemap") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-30Btrfs: fix race leading to fs corruption after transaction abortFilipe Manana
When one transaction is finishing its commit, it is possible for another transaction to start and enter its initial commit phase as well. If the first ends up getting aborted, we have a small time window where the second transaction commit does not notice that the previous transaction aborted and ends up committing, writing a superblock that points to btrees that reference extent buffers (nodes and leafs) that were not persisted to disk. The consequence is that after mounting the filesystem again, we will be unable to load some btree nodes/leafs, either because the content on disk is either garbage (or just zeroes) or corresponds to the old content of a previouly COWed or deleted node/leaf, resulting in the well known error messages "parent transid verify failed on ...". The following sequence diagram illustrates how this can happen. CPU 1 CPU 2 <at transaction N> btrfs_commit_transaction() (...) --> sets transaction state to TRANS_STATE_UNBLOCKED --> sets fs_info->running_transaction to NULL (...) btrfs_start_transaction() start_transaction() wait_current_trans() --> returns immediately because fs_info->running_transaction is NULL join_transaction() --> creates transaction N + 1 --> sets fs_info->running_transaction to transaction N + 1 --> adds transaction N + 1 to the fs_info->trans_list list --> returns transaction handle pointing to the new transaction N + 1 (...) btrfs_sync_file() btrfs_start_transaction() --> returns handle to transaction N + 1 (...) btrfs_write_and_wait_transaction() --> writeback of some extent buffer fails, returns an error btrfs_handle_fs_error() --> sets BTRFS_FS_STATE_ERROR in fs_info->fs_state --> jumps to label "scrub_continue" cleanup_transaction() btrfs_abort_transaction(N) --> sets BTRFS_FS_STATE_TRANS_ABORTED flag in fs_info->fs_state --> sets aborted field in the transaction and transaction handle structures, for transaction N only --> removes transaction from the list fs_info->trans_list btrfs_commit_transaction(N + 1) --> transaction N + 1 was not aborted, so it proceeds (...) --> sets the transaction's state to TRANS_STATE_COMMIT_START --> does not find the previous transaction (N) in the fs_info->trans_list, so it doesn't know that transaction was aborted, and the commit of transaction N + 1 proceeds (...) --> sets transaction N + 1 state to TRANS_STATE_UNBLOCKED btrfs_write_and_wait_transaction() --> succeeds writing all extent buffers created in the transaction N + 1 write_all_supers() --> succeeds --> we now have a superblock on disk that points to trees that refer to at least one extent buffer that was never persisted So fix this by updating the transaction commit path to check if the flag BTRFS_FS_STATE_TRANS_ABORTED is set on fs_info->fs_state if after setting the transaction to the TRANS_STATE_COMMIT_START we do not find any previous transaction in the fs_info->trans_list. If the flag is set, just fail the transaction commit with -EROFS, as we do in other places. The exact error code for the previous transaction abort was already logged and reported. Fixes: 49b25e0540904b ("btrfs: enhance transaction abort infrastructure") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-30Btrfs: fix incremental send failure after deduplicationFilipe Manana
When doing an incremental send operation we can fail if we previously did deduplication operations against a file that exists in both snapshots. In that case we will fail the send operation with -EIO and print a message to dmesg/syslog like the following: BTRFS error (device sdc): Send: inconsistent snapshot, found updated \ extent for inode 257 without updated inode item, send root is 258, \ parent root is 257 This requires that we deduplicate to the same file in both snapshots for the same amount of times on each snapshot. The issue happens because a deduplication only updates the iversion of an inode and does not update any other field of the inode, therefore if we deduplicate the file on each snapshot for the same amount of time, the inode will have the same iversion value (stored as the "sequence" field on the inode item) on both snapshots, therefore it will be seen as unchanged between in the send snapshot while there are new/updated/deleted extent items when comparing to the parent snapshot. This makes the send operation return -EIO and print an error message. Example reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt # Create our first file. The first half of the file has several 64Kb # extents while the second half as a single 512Kb extent. $ xfs_io -f -s -c "pwrite -S 0xb8 -b 64K 0 512K" /mnt/foo $ xfs_io -c "pwrite -S 0xb8 512K 512K" /mnt/foo # Create the base snapshot and the parent send stream from it. $ btrfs subvolume snapshot -r /mnt /mnt/mysnap1 $ btrfs send -f /tmp/1.snap /mnt/mysnap1 # Create our second file, that has exactly the same data as the first # file. $ xfs_io -f -c "pwrite -S 0xb8 0 1M" /mnt/bar # Create the second snapshot, used for the incremental send, before # doing the file deduplication. $ btrfs subvolume snapshot -r /mnt /mnt/mysnap2 # Now before creating the incremental send stream: # # 1) Deduplicate into a subrange of file foo in snapshot mysnap1. This # will drop several extent items and add a new one, also updating # the inode's iversion (sequence field in inode item) by 1, but not # any other field of the inode; # # 2) Deduplicate into a different subrange of file foo in snapshot # mysnap2. This will replace an extent item with a new one, also # updating the inode's iversion by 1 but not any other field of the # inode. # # After these two deduplication operations, the inode items, for file # foo, are identical in both snapshots, but we have different extent # items for this inode in both snapshots. We want to check this doesn't # cause send to fail with an error or produce an incorrect stream. $ xfs_io -r -c "dedupe /mnt/bar 0 0 512K" /mnt/mysnap1/foo $ xfs_io -r -c "dedupe /mnt/bar 512K 512K 512K" /mnt/mysnap2/foo # Create the incremental send stream. $ btrfs send -p /mnt/mysnap1 -f /tmp/2.snap /mnt/mysnap2 ERROR: send ioctl failed with -5: Input/output error This issue started happening back in 2015 when deduplication was updated to not update the inode's ctime and mtime and update only the iversion. Back then we would hit a BUG_ON() in send, but later in 2016 send was updated to return -EIO and print the error message instead of doing the BUG_ON(). A test case for fstests follows soon. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203933 Fixes: 1c919a5e13702c ("btrfs: don't update mtime/ctime on deduped inodes") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-30afs: Fix missing dentry data version updatingDavid Howells
In the in-kernel afs filesystem, the d_fsdata dentry field is used to hold the data version of the parent directory when it was created or when d_revalidate() last caused it to be updated. This is compared to the ->invalid_before field in the directory inode, rather than the actual data version number, thereby allowing changes due to local edits to be ignored. Only if the server data version gets bumped unexpectedly (eg. by a competing client), do we need to revalidate stuff. However, the d_fsdata field should also be updated if an rpc op is performed that modifies that particular dentry. Such ops return the revised data version of the directory(ies) involved, so we should use that. This is particularly problematic for rename, since a dentry from one directory may be moved directly into another directory (ie. mv a/x b/x). It would then be sporting the wrong data version - and if this is in the future, for the destination directory, revalidations would be missed, leading to foreign renames and hard-link deletion being missed. Fix this by the following means: (1) Return the data version number from operations that read the directory contents - if they issue the read. This starts in afs_dir_iterate() and is used, ignored or passed back by its callers. (2) In afs_lookup*(), set the dentry version to the version returned by (1) before d_splice_alias() is called and the dentry published. (3) In afs_d_revalidate(), set the dentry version to that returned from (1) if an rpc call was issued. This means that if a parallel procedure, such as mkdir(), modifies the directory, we won't accidentally use the data version from that. (4) In afs_{mkdir,create,link,symlink}(), set the new dentry's version to the directory data version before d_instantiate() is called. (5) In afs_{rmdir,unlink}, update the target dentry's version to the directory data version as soon as we've updated the directory inode. (6) In afs_rename(), we need to unhash the old dentry before we start so that we don't get afs_d_revalidate() reverting the version change in cross-directory renames. We then need to set both the old and the new dentry versions the data version of the new directory before we call d_move() as d_move() will rehash them. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30afs: Only update d_fsdata if different in afs_d_revalidate()David Howells
In the in-kernel afs filesystem, d_fsdata is set with the data version of the parent directory. afs_d_revalidate() will update this to the current directory version, but it shouldn't do this if it the value it read from d_fsdata is the same as no lock is held and cmpxchg() is not used. Fix the code to only change the value if it is different from the current directory version. Fixes: 260a980317da ("[AFS]: Add "directory write" support.") Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30afs: Fix off-by-one in afs_rename() expected data version calculationDavid Howells
When afs_rename() calculates the expected data version of the target directory in a cross-directory rename, it doesn't increment it as it should, so it always thinks that the target inode is unexpectedly modified on the server. Fixes: a58823ac4589 ("afs: Fix application of status and callback to be under same lock") Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30fs: afs: Fix a possible null-pointer dereference in afs_put_read()Jia-Ju Bai
In afs_read_dir(), there is an if statement on line 255 to check whether req->pages is NULL: if (!req->pages) goto error; If req->pages is NULL, afs_put_read() on line 337 is executed. In afs_put_read(), req->pages[i] is used on line 195. Thus, a possible null-pointer dereference may occur in this case. To fix this possible bug, an if statement is added in afs_put_read() to check req->pages. This bug is found by a static analysis tool STCheck written by us. Fixes: f3ddee8dc4e2 ("afs: Fix directory handling") Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com>
2019-07-30afs: Fix loop index mixup in afs_deliver_vl_get_entry_by_name_u()Marc Dionne
afs_deliver_vl_get_entry_by_name_u() scans through the vl entry received from the volume location server and builds a return list containing the sites that are currently valid. When assigning values for the return list, the index into the vl entry (i) is used rather than the one for the new list (entry->nr_server). If all sites are usable, this works out fine as the indices will match. If some sites are not valid, for example if AFS_VLSF_DONTUSE is set, fs_mask and the uuid will be set for the wrong return site. Fix this by using entry->nr_server as the index into the arrays being filled in rather than i. This can lead to EDESTADDRREQ errors if none of the returned sites have a valid fs_mask. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
2019-07-30afs: Fix the CB.ProbeUuid service handler to reply correctlyDavid Howells
Fix the service handler function for the CB.ProbeUuid RPC call so that it replies in the correct manner - that is an empty reply for success and an abort of 1 for failure. Putting 0 or 1 in an integer in the body of the reply should result in the fileserver throwing an RX_PROTOCOL_ERROR abort and discarding its record of the client; older servers, however, don't necessarily check that all the data got consumed, and so might incorrectly think that they got a positive response and associate the client with the wrong host record. If the client is incorrectly associated, this will result in callbacks intended for a different client being delivered to this one and then, when the other client connects and responds positively, all of the callback promises meant for the client that issued the improper response will be lost and it won't receive any further change notifications. Fixes: 9396d496d745 ("afs: support the CB.ProbeUuid RPC op") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com>
2019-07-29dax: Fix missed wakeup in put_unlocked_entry()Jan Kara
The condition checking whether put_unlocked_entry() needs to wake up following waiter got broken by commit 23c84eb78375 ("dax: Fix missed wakeup with PMD faults"). We need to wake the waiter whenever the passed entry is valid (i.e., non-NULL and not special conflict entry). This could lead to processes never being woken up when waiting for entry lock. Fix the condition. Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/20190729120228.GC17833@quack2.suse.cz Fixes: 23c84eb78375 ("dax: Fix missed wakeup with PMD faults") Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-07-28f2fs: use EINVAL for superblock with invalid magicIcenowy Zheng
The kernel mount_block_root() function expects -EACESS or -EINVAL for a unmountable filesystem when trying to mount the root with different filesystem types. However, in 5.3-rc1 the behavior when F2FS code cannot find valid block changed to return -EFSCORRUPTED(-EUCLEAN), and this error code makes mount_block_root() fail when trying to probe F2FS. When the magic number of the superblock mismatches, it has a high probability that it's just not a F2FS. In this case return -EINVAL seems to be a better result, and this return value can make mount_block_root() probing work again. Return -EINVAL when the superblock has magic mismatch, -EFSCORRUPTED in other cases (the magic matches but the superblock cannot be recognized). Fixes: 10f966bbf521 ("f2fs: use generic EFSBADCRC/EFSCORRUPTED") Signed-off-by: Icenowy Zheng <icenowy@aosc.io> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-28xfs: fix stack contents leakage in the v1 inumber ioctlsDarrick J. Wong
Explicitly initialize the onstack structures to zero so we don't leak kernel memory into userspace when converting the in-core inumbers structure to the v1 inogrp ioctl structure. Add a comment about why we have to use memset to ensure that the padding holes in the structures are set to zero. Fixes: 5f19c7fc6873351 ("xfs: introduce v5 inode group structure") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2019-07-28fs-verity: add data verification hooks for ->readpages()Eric Biggers
Add functions that verify data pages that have been read from a fs-verity file, against that file's Merkle tree. These will be called from filesystems' ->readpage() and ->readpages() methods. Since data verification can block, a workqueue is provided for these methods to enqueue verification work from their bio completion callback. See the "Verifying data" section of Documentation/filesystems/fsverity.rst for more information. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-07-28fs-verity: add the hook for file ->setattr()Eric Biggers
Add a function fsverity_prepare_setattr() which filesystems that support fs-verity must call to deny truncates of verity files. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-07-28fs-verity: add the hook for file ->open()Eric Biggers
Add the fsverity_file_open() function, which prepares an fs-verity file to be read from. If not already done, it loads the fs-verity descriptor from the filesystem and sets up an fsverity_info structure for the inode which describes the Merkle tree and contains the file measurement. It also denies all attempts to open verity files for writing. This commit also begins the include/linux/fsverity.h header, which declares the interface between fs/verity/ and filesystems. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-07-28fs-verity: add Kconfig and the helper functions for hashingEric Biggers
Add the beginnings of the fs/verity/ support layer, including the Kconfig option and various helper functions for hashing. To start, only SHA-256 is supported, but other hash algorithms can easily be added. Reviewed-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-07-28Merge tag 'spdx-5.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull SPDX fixes from Greg KH: "Here are some small SPDX fixes for 5.3-rc2 for things that came in during the 5.3-rc1 merge window that we previously missed. Only three small patches here: - two uapi patches to resolve some SPDX tags that were not correct - fix an invalid SPDX tag in the iomap Makefile file All have been properly reviewed on the public mailing lists" * tag 'spdx-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: iomap: fix Invalid License ID treewide: remove SPDX "WITH Linux-syscall-note" from kernel-space headers again treewide: add "WITH Linux-syscall-note" to SPDX tag of uapi headers
2019-07-27Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "Two fixes for the fair scheduling class: - Prevent freeing memory which is accessible by concurrent readers - Make the RCU annotations for numa groups consistent" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Use RCU accessors consistently for ->numa_group sched/fair: Don't free p->numa_faults with concurrent readers
2019-07-27Merge tag 'Wimplicit-fallthrough-5.3-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull Wimplicit-fallthrough enablement from Gustavo A. R. Silva: "This marks switch cases where we are expecting to fall through, and globally enables the -Wimplicit-fallthrough option in the main Makefile. Finally, some missing-break fixes that have been tagged for -stable: - drm/amdkfd: Fix missing break in switch statement - drm/amdgpu/gfx10: Fix missing break in switch statement With these changes, we completely get rid of all the fall-through warnings in the kernel" * tag 'Wimplicit-fallthrough-5.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: Makefile: Globally enable fall-through warning drm/i915: Mark expected switch fall-throughs drm/amd/display: Mark expected switch fall-throughs drm/amdkfd/kfd_mqd_manager_v10: Avoid fall-through warning drm/amdgpu/gfx10: Fix missing break in switch statement drm/amdkfd: Fix missing break in switch statement perf/x86/intel: Mark expected switch fall-throughs mtd: onenand_base: Mark expected switch fall-through afs: fsclient: Mark expected switch fall-throughs afs: yfsclient: Mark expected switch fall-throughs can: mark expected switch fall-throughs firewire: mark expected switch fall-throughs
2019-07-27autofs_lookup(): hold ->d_lock over playing with ->d_flagsAl Viro
... as well as setting ->d_fsdata, etc. Make all of that atomic. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-27get rid of autofs_info->active_countAl Viro
autofs_add_active() is always called only once (and on a dentry with freshly allocated ino, at that). autofs_del_active() is never called more than once. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-26f2fs: fix to read source block before invalidating itJaegeuk Kim
f2fs_allocate_data_block() invalidates old block address and enable new block address. Then, if we try to read old block by f2fs_submit_page_bio(), it will give WARN due to reading invalid blocks. Let's make the order sanely back. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-26Merge tag 'for-5.3-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Two regression fixes: - hangs caused by a missing barrier in the locking code - memory leaks of extent_state due to bad handling of a cached pointer" * tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix extent_state leak in btrfs_lock_and_flush_ordered_range btrfs: Fix deadlock caused by missing memory barrier
2019-07-26Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull vfs umount_tree() leak fix from Al Viro: "Fix braino introduced in 'switch the remnants of releasing the mountpoint away from fs_pin'. The most visible result is leaking struct mount when mounting btrfs, making it impossible to shut down" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix the struct mount leak in umount_tree()
2019-07-26Merge tag 'for-linus-20190726' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Several io_uring fixes/improvements: - Blocking fix for O_DIRECT (me) - Latter page slowness for registered buffers (me) - Fix poll hang under certain conditions (me) - Defer sequence check fix for wrapped rings (Zhengyuan) - Mismatch in async inc/dec accounting (Zhengyuan) - Memory ordering issue that could cause stall (Zhengyuan) - Track sequential defer in bytes, not pages (Zhengyuan) - NVMe pull request from Christoph - Set of hang fixes for wbt (Josef) - Redundant error message kill for libahci (Ding) - Remove unused blk_mq_sched_started_request() and related ops (Marcos) - drbd dynamic alloc shash descriptor to reduce stack use (Arnd) - blkcg ->pd_stat() non-debug print (Tejun) - bcache memory leak fix (Wei) - Comment fix (Akinobu) - BFQ perf regression fix (Paolo) * tag 'for-linus-20190726' of git://git.kernel.dk/linux-block: (24 commits) io_uring: ensure ->list is initialized for poll commands Revert "nvme-pci: don't create a read hctx mapping without read queues" nvme: fix multipath crash when ANA is deactivated nvme: fix memory leak caused by incorrect subsystem free nvme: ignore subnqn for ADATA SX6000LNP drbd: dynamically allocate shash descriptor block: blk-mq: Remove blk_mq_sched_started_request and started_request bcache: fix possible memory leak in bch_cached_dev_run() io_uring: track io length in async_list based on bytes io_uring: don't use iov_iter_advance() for fixed buffers block: properly handle IOCB_NOWAIT for async O_DIRECT IO blk-mq: allow REQ_NOWAIT to return an error inline io_uring: add a memory barrier before atomic_read rq-qos: use a mb for got_token rq-qos: set ourself TASK_UNINTERRUPTIBLE after we schedule rq-qos: don't reset has_sleepers on spurious wakeups rq-qos: fix missed wake-ups in rq_qos_throttle wait: add wq_has_single_sleeper helper block, bfq: check also in-flight I/O in dispatch plugging block: fix sysfs module parameters directory path in comment ...
2019-07-26fix the struct mount leak in umount_tree()Al Viro
We need to drop everything we remove from the tree, whether mnt_has_parent() is true or not. Usually the bug manifests as a slow memory leak (leaked struct mount for initramfs); it becomes much more visible in mount_subtree() users, such as btrfs. There we leak a struct mount for btrfs superblock being mounted, which prevents fs shutdown on subsequent umount. Fixes: 56cbb429d911 ("switch the remnants of releasing the mountpoint away from fs_pin") Reported-by: Nikolay Borisov <nborisov@suse.com> Tested-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-26btrfs: fix extent_state leak in btrfs_lock_and_flush_ordered_rangeNaohiro Aota
btrfs_lock_and_flush_ordered_range() loads given "*cached_state" into cachedp, which, in general, is NULL. Then, lock_extent_bits() updates "cachedp", but it never goes backs to the caller. Thus the caller still see its "cached_state" to be NULL and never free the state allocated under btrfs_lock_and_flush_ordered_range(). As a result, we will see massive state leak with e.g. fstests btrfs/005. Fix this bug by properly handling the pointers. Fixes: bd80d94efb83 ("btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range") Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-25afs: fsclient: Mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: Warning level 3 was used: -Wimplicit-fallthrough=3 fs/afs/fsclient.c: In function ‘afs_deliver_fs_fetch_acl’: fs/afs/fsclient.c:2199:19: warning: this statement may fall through [-Wimplicit-fallthrough=] call->unmarshall++; ~~~~~~~~~~~~~~~~^~ fs/afs/fsclient.c:2202:2: note: here case 1: ^~~~ fs/afs/fsclient.c:2216:19: warning: this statement may fall through [-Wimplicit-fallthrough=] call->unmarshall++; ~~~~~~~~~~~~~~~~^~ fs/afs/fsclient.c:2219:2: note: here case 2: ^~~~ fs/afs/fsclient.c:2225:19: warning: this statement may fall through [-Wimplicit-fallthrough=] call->unmarshall++; ~~~~~~~~~~~~~~~~^~ fs/afs/fsclient.c:2228:2: note: here case 3: ^~~~ This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-07-25afs: yfsclient: Mark expected switch fall-throughsGustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: fs/afs/yfsclient.c: In function ‘yfs_deliver_fs_fetch_opaque_acl’: fs/afs/yfsclient.c:1984:19: warning: this statement may fall through [-Wimplicit-fallthrough=] call->unmarshall++; ~~~~~~~~~~~~~~~~^~ fs/afs/yfsclient.c:1987:2: note: here case 1: ^~~~ fs/afs/yfsclient.c:2005:19: warning: this statement may fall through [-Wimplicit-fallthrough=] call->unmarshall++; ~~~~~~~~~~~~~~~~^~ fs/afs/yfsclient.c:2008:2: note: here case 2: ^~~~ fs/afs/yfsclient.c:2014:19: warning: this statement may fall through [-Wimplicit-fallthrough=] call->unmarshall++; ~~~~~~~~~~~~~~~~^~ fs/afs/yfsclient.c:2017:2: note: here case 3: ^~~~ fs/afs/yfsclient.c:2035:19: warning: this statement may fall through [-Wimplicit-fallthrough=] call->unmarshall++; ~~~~~~~~~~~~~~~~^~ fs/afs/yfsclient.c:2038:2: note: here case 4: ^~~~ fs/afs/yfsclient.c:2047:19: warning: this statement may fall through [-Wimplicit-fallthrough=] call->unmarshall++; ~~~~~~~~~~~~~~~~^~ fs/afs/yfsclient.c:2050:2: note: here case 5: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 Also, fix some commenting style issues. This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
2019-07-25io_uring: ensure ->list is initialized for poll commandsJens Axboe
Daniel reports that when testing an http server that uses io_uring to poll for incoming connections, sometimes it hard crashes. This is due to an uninitialized list member for the io_uring request. Normally this doesn't trigger and none of the test cases caught it. Reported-by: Daniel Kozak <kozzi11@gmail.com> Tested-by: Daniel Kozak <kozzi11@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-25Merge branch 'access-creds'Linus Torvalds
The access() (and faccessat()) credentials change can cause an unnecessary load on the RCU machinery because every access() call ends up freeing the temporary access credential using RCU. This isn't really noticeable on small machines, but if you have hundreds of cores you can cause huge slowdowns due to RCU storms. It's easy to avoid: the temporary access crededntials aren't actually normally accessed using RCU at all, so we can avoid the whole issue by just marking them as such. * access-creds: access: avoid the RCU grace period for the temporary subjective credentials