path: root/fs
Age        Commit message        Author
2022-01-03  btrfs: introduce item_nr token variant helpers  (Josef Bacik)
The last remaining place where we have the pattern item = btrfs_item_nr(slot); <do something with the item>; is the token helpers. Handle this by introducing token helpers that do the btrfs_item_nr() work inside the helper itself, and then convert all users of the btrfs_item token helpers to the new _nr() variants. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03  btrfs: make btrfs_file_extent_inline_item_len take a slot  (Josef Bacik)
Instead of getting the btrfs_item for this, simply pass in the slot of the item and then use the btrfs_item_size_nr() helper inside of btrfs_file_extent_inline_item_len(). Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03  btrfs: add btrfs_set_item_*_nr() helpers  (Josef Bacik)
We have the pattern of item = btrfs_item_nr(slot); btrfs_set_item_*(leaf, item); in a bunch of places in our code. Fix this by adding btrfs_set_item_*_nr() helpers which will do the appropriate work, and replace those calls with btrfs_set_item_*_nr(leaf, slot); Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03  btrfs: use btrfs_item_size_nr/btrfs_item_offset_nr everywhere  (Josef Bacik)
We have this pattern in a lot of places: item = btrfs_item_nr(slot); btrfs_item_size(leaf, item); when we could simply use btrfs_item_size_nr(leaf, slot); Fix all callers of btrfs_item_size() and btrfs_item_offset() to use the _nr variation of the helpers. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
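For illustration, the conversion described in this series boils down to replacing the two-step lookup with the slot-based helpers; a minimal sketch with hypothetical surrounding variables (not a literal hunk from the patches):

    /* before: fetch the item, then query it */
    item   = btrfs_item_nr(slot);
    size   = btrfs_item_size(leaf, item);
    offset = btrfs_item_offset(leaf, item);

    /* after: the _nr helpers resolve the slot internally */
    size   = btrfs_item_size_nr(leaf, slot);
    offset = btrfs_item_offset_nr(leaf, slot);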
2022-01-03  btrfs: remove no longer needed logic for replaying directory deletes  (Filipe Manana)
Now that we log only dir index keys when logging a directory, we no longer need to deal with dir item keys in the log replay code for replaying directory deletes. This is also true for the case when we replay a log tree created by a kernel that still logs dir items. So remove the remaining code of the replay of directory deletes algorithm that deals with dir item keys. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03  btrfs: only copy dir index keys when logging a directory  (Filipe Manana)
Currently, when logging a directory, we copy both dir items and dir index items from the fs/subvolume tree to the log tree. Both items have exactly the same data (same struct btrfs_dir_item), the difference lies in the key values, where a dir index key contains the index number of a directory entry while the dir item key does not, as it's used for doing fast lookups of an entry by name, while the former is used for sorting entries when listing a directory.

We can exploit that and log only the dir index items, since they contain all the information needed to correctly add, replace and delete directory entries when replaying a log tree. Logging only the dir index items is also backward and forward compatible: an unpatched kernel (without this change) can correctly replay a log tree generated by a patched kernel (with this patch), and a patched kernel can correctly replay a log tree generated by an unpatched kernel.

The backward compatibility is ensured because:

1) For inserting a new dentry: a dentry is only inserted when we find a new dir index key - we can only insert if we know the dir index offset, which is encoded in the dir index key's offset;

2) For deleting dentries: during log replay, before adding or replacing dentries, we first replay dentry deletions. Whenever we find a dir item key or a dir index key in the subvolume/fs tree that is not logged in a range for which the log tree is authoritative, we do the unlink of the dentry, which removes both the existing dir item key and the dir index key. Therefore logging just dir index keys is enough to ensure dentry deletions are correctly replayed;

3) For dentry replacements: they work when we log only dir index keys and this is mostly due to a combination of 1) and 2). If we replace a dentry with name "foobar" to point from inode A to inode B, then we know the dir index key for the new dentry is different from the old one, as it has an index number (key offset) larger than the old one. This results in replaying a deletion, through replay_dir_deletes(), that causes the old dentry to be removed, both the dir item key and the dir index key, as mentioned at 2). Then when processing the new dir index key, we add the new dentry, adding both a new dir item key and a new index key pointing to inode B, as stated in 1).

The forward compatibility, the ability for a patched kernel to replay a log created by an older, unpatched kernel, comes from the changes required for making sure we are able to replay a log that only contains dir index keys - we simply ignore every dir item key we find.

So modify directory logging to log only dir index items, and modify the log replay process to ignore dir item keys, from log trees created by an unpatched kernel, and process only with dir index keys. This reduces the amount of logged metadata by about half, and therefore the time spent logging or fsyncing large directories (less CPU time and less IO).

The following test script was used to measure this change:

  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1

  NUM_NEW_FILES=1000000
  NUM_FILE_DELETES=10000

  mkfs.btrfs -f $DEV
  mount -o ssd $DEV $MNT

  mkdir $MNT/testdir

  for ((i = 1; i <= $NUM_NEW_FILES; i++)); do
      echo -n > $MNT/testdir/file_$i
  done

  start=$(date +%s%N)
  xfs_io -c "fsync" $MNT/testdir
  end=$(date +%s%N)

  dur=$(( (end - start) / 1000000 ))
  echo "dir fsync took $dur ms after adding $NUM_NEW_FILES files"

  # sync to force transaction commit and wipeout the log.
  sync

  del_inc=$(( $NUM_NEW_FILES / $NUM_FILE_DELETES ))
  for ((i = 1; i <= $NUM_NEW_FILES; i += $del_inc)); do
      rm -f $MNT/testdir/file_$i
  done

  start=$(date +%s%N)
  xfs_io -c "fsync" $MNT/testdir
  end=$(date +%s%N)

  dur=$(( (end - start) / 1000000 ))
  echo "dir fsync took $dur ms after deleting $NUM_FILE_DELETES files"
  echo

  umount $MNT

The tests were run on a physical machine, with a non-debug kernel (Debian's default kernel config), for different values of $NUM_NEW_FILES and $NUM_FILE_DELETES, and the results were the following:

** Before patch, NUM_NEW_FILES = 1 000 000, NUM_DELETE_FILES = 10 000 **

dir fsync took 8412 ms after adding 1000000 files
dir fsync took 500 ms after deleting 10000 files

** After patch, NUM_NEW_FILES = 1 000 000, NUM_DELETE_FILES = 10 000 **

dir fsync took 4252 ms after adding 1000000 files (-49.5%)
dir fsync took 269 ms after deleting 10000 files (-46.2%)

** Before patch, NUM_NEW_FILES = 100 000, NUM_DELETE_FILES = 1 000 **

dir fsync took 745 ms after adding 100000 files
dir fsync took 59 ms after deleting 1000 files

** After patch, NUM_NEW_FILES = 100 000, NUM_DELETE_FILES = 1 000 **

dir fsync took 404 ms after adding 100000 files (-45.8%)
dir fsync took 31 ms after deleting 1000 files (-47.5%)

** Before patch, NUM_NEW_FILES = 10 000, NUM_DELETE_FILES = 1 000 **

dir fsync took 67 ms after adding 10000 files
dir fsync took 9 ms after deleting 1000 files

** After patch, NUM_NEW_FILES = 10 000, NUM_DELETE_FILES = 1 000 **

dir fsync took 36 ms after adding 10000 files (-46.3%)
dir fsync took 5 ms after deleting 1000 files (-44.4%)

** Before patch, NUM_NEW_FILES = 1 000, NUM_DELETE_FILES = 100 **

dir fsync took 9 ms after adding 1000 files
dir fsync took 4 ms after deleting 100 files

** After patch, NUM_NEW_FILES = 1 000, NUM_DELETE_FILES = 100 **

dir fsync took 7 ms after adding 1000 files (-22.2%)
dir fsync took 3 ms after deleting 100 files (-25.0%)

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-03  btrfs: remove spurious unlock/lock of unused_bgs_lock  (Nikolay Borisov)
Since both unused block groups and reclaim bgs lists are protected by unused_bgs_lock then free them in the same critical section without doing an extra unlock/lock pair. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
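As a sketch of the resulting shape described above (comments stand in for the actual list teardown; this is illustrative, not the literal diff):

    spin_lock(&fs_info->unused_bgs_lock);
    /* free the entries on the unused block groups list ... */
    /* ... and, without dropping the lock in between, the reclaim bgs list */
    spin_unlock(&fs_info->unused_bgs_lock);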
2022-01-03  btrfs: fix deadlock between quota enable and other quota operations  (Filipe Manana)
When enabling quotas, we attempt to commit a transaction while holding the mutex fs_info->qgroup_ioctl_lock. This can result in a deadlock with other quota operations such as:

- qgroup creation and deletion, ioctl BTRFS_IOC_QGROUP_CREATE;
- adding and removing qgroup relations, ioctl BTRFS_IOC_QGROUP_ASSIGN.

This is because these operations join a transaction and after that they attempt to lock the mutex fs_info->qgroup_ioctl_lock. Acquiring that mutex after joining or starting a transaction is a pattern followed everywhere in qgroups, so the quota enablement operation is the one at fault here, and should not commit a transaction while holding that mutex.

Fix this by making the transaction commit while not holding the mutex. We are safe from two concurrent tasks trying to enable quotas because we are serialized by the rw semaphore fs_info->subvol_sem at btrfs_ioctl_quota_ctl(), which is the only call site for enabling quotas.

When this deadlock happens, it produces a trace like the following:

  INFO: task syz-executor:25604 blocked for more than 143 seconds.
        Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor    state:D stack:24800 pid:25604 ppid: 24873 flags:0x00004004
  Call Trace:
   context_switch kernel/sched/core.c:4940 [inline]
   __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
   schedule+0xd3/0x270 kernel/sched/core.c:6366
   btrfs_commit_transaction+0x994/0x2e90 fs/btrfs/transaction.c:2201
   btrfs_quota_enable+0x95c/0x1790 fs/btrfs/qgroup.c:1120
   btrfs_ioctl_quota_ctl fs/btrfs/ioctl.c:4229 [inline]
   btrfs_ioctl+0x637e/0x7b70 fs/btrfs/ioctl.c:5010
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:874 [inline]
   __se_sys_ioctl fs/ioctl.c:860 [inline]
   __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  RIP: 0033:0x7f86920b2c4d
  RSP: 002b:00007f868f61ac58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00007f86921d90a0 RCX: 00007f86920b2c4d
  RDX: 0000000020005e40 RSI: 00000000c0109428 RDI: 0000000000000008
  RBP: 00007f869212bd80 R08: 0000000000000000 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000246 R12: 00007f86921d90a0
  R13: 00007fff6d233e4f R14: 00007fff6d233ff0 R15: 00007f868f61adc0
  INFO: task syz-executor:25628 blocked for more than 143 seconds.
        Not tainted 5.15.0-rc6 #4
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:syz-executor    state:D stack:29080 pid:25628 ppid: 24873 flags:0x00004004
  Call Trace:
   context_switch kernel/sched/core.c:4940 [inline]
   __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
   schedule+0xd3/0x270 kernel/sched/core.c:6366
   schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:6425
   __mutex_lock_common kernel/locking/mutex.c:669 [inline]
   __mutex_lock+0xc96/0x1680 kernel/locking/mutex.c:729
   btrfs_remove_qgroup+0xb7/0x7d0 fs/btrfs/qgroup.c:1548
   btrfs_ioctl_qgroup_create fs/btrfs/ioctl.c:4333 [inline]
   btrfs_ioctl+0x683c/0x7b70 fs/btrfs/ioctl.c:5014
   vfs_ioctl fs/ioctl.c:51 [inline]
   __do_sys_ioctl fs/ioctl.c:874 [inline]
   __se_sys_ioctl fs/ioctl.c:860 [inline]
   __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:860
   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
   do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
   entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported-by: Hao Sun <sunhao.th@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CACkBjsZQF19bQ1C6=yetF3BvL10OSORpFUcWXTP6HErshDB4dQ@mail.gmail.com/
Fixes: 340f1aa27f36 ("btrfs: qgroups: Move transaction management inside btrfs_quota_enable/disable")
CC: stable@vger.kernel.org # 4.19
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
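A minimal sketch of the fix's shape, assuming the locking described above (illustrative only, not the literal patch to btrfs_quota_enable()):

    /* commit without holding the qgroup ioctl mutex */
    mutex_unlock(&fs_info->qgroup_ioctl_lock);
    ret = btrfs_commit_transaction(trans);
    mutex_lock(&fs_info->qgroup_ioctl_lock);
    /* concurrent quota-enable callers are still serialized by fs_info->subvol_sem */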
2022-01-03  btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range  (Filipe Manana)
When doing a direct IO write against a file range that either has preallocated extents in that range or has regular extents and the file has the NOCOW attribute set, the write fails with -ENOSPC when all of the following conditions are met:

1) There are no data block groups with enough free space matching the size of the write;

2) There's not enough unallocated space for allocating a new data block group;

3) The extents in the target file range are not shared, neither through snapshots nor through reflinks.

This is wrong because a NOCOW write can be done in such a case, and in fact it's possible to do it using a buffered IO write, since when failing to allocate data space, the buffered IO path checks if a NOCOW write is possible.

The failure in the direct IO write path comes from the fact that early on, at btrfs_dio_iomap_begin(), we try to allocate data space for the write and if that fails we return the error and stop - we never check if we can do NOCOW. But later, at btrfs_get_blocks_direct_write(), we check if we can do a NOCOW write into the range, or a subset of the range, and then release the previously reserved data space.

Fix this by doing the data reservation only if needed, when we must COW, at btrfs_get_blocks_direct_write() instead of doing it at btrfs_dio_iomap_begin(). This also simplifies the logic a bit and removes the inefficiency of doing unnecessary data reservations.

The following example test script reproduces the problem:

  $ cat dio-nocow-enospc.sh
  #!/bin/bash

  DEV=/dev/sdj
  MNT=/mnt/sdj

  # Use a small fixed size (1G) filesystem so that it's quick to fill
  # it up.
  #
  # Make sure the mixed block groups feature is not enabled because we
  # later want to not have more space available for allocating data
  # extents but still have enough metadata space free for the file writes.
  mkfs.btrfs -f -b $((1024 * 1024 * 1024)) -O ^mixed-bg $DEV
  mount $DEV $MNT

  # Create our test file with the NOCOW attribute set.
  touch $MNT/foobar
  chattr +C $MNT/foobar

  # Now fill in all unallocated space with data for our test file.
  # This will allocate a data block group that will be full and leave
  # no (or a very small amount of) unallocated space in the device, so
  # that it will not be possible to allocate a new block group later.
  echo
  echo "Creating test file with initial data..."
  xfs_io -c "pwrite -S 0xab -b 1M 0 900M" $MNT/foobar

  # Now try a direct IO write against file range [0, 10M[.
  # This should succeed since this is a NOCOW file and an extent for the
  # range was previously allocated.
  echo
  echo "Trying direct IO write over allocated space..."
  xfs_io -d -c "pwrite -S 0xcd -b 10M 0 10M" $MNT/foobar

  umount $MNT

When running the test:

  $ ./dio-nocow-enospc.sh
  (...)
  Creating test file with initial data...
  wrote 943718400/943718400 bytes at offset 0
  900 MiB, 900 ops; 0:00:01.43 (625.526 MiB/sec and 625.5265 ops/sec)

  Trying direct IO write over allocated space...
  pwrite: No space left on device

A test case for fstests will follow, testing both this direct IO write scenario as well as the buffered IO write scenario to make it less likely to get future regressions on the buffered IO case.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-01-02  cifs: reconnect only the connection and not smb session where possible  (Shyam Prasad N)
With the new per-channel bitmask for reconnect, we have an option to reconnect the tcp session associated with the channel without reconnecting the smb session, i.e. if there are still channels to operate on, we can continue to use the smb session and tcon. However, there are cases where it makes sense to reconnect the smb session even when there are active channels underneath, for example SMB session expiry. With this patch, we'll have an option to do either, and use the correct option for specific cases. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-02  cifs: add WARN_ON for when chan_count goes below minimum  (Shyam Prasad N)
chan_count keeps track of the total number of channels. Since at least the primary channel will always be connected, this value can never go below 1. Warn if that happens. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
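A hedged sketch of the kind of check described (the lock name and the placement in a channel-teardown path are assumptions for illustration):

    /* when tearing down a secondary channel (sketch) */
    spin_lock(&ses->chan_lock);            /* lock name assumed for illustration */
    if (!WARN_ON(ses->chan_count < 2))     /* never drop below the always-connected primary */
            ses->chan_count--;
    spin_unlock(&ses->chan_lock);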
2022-01-02  cifs: adjust DebugData to use chans_need_reconnect for conn status  (Shyam Prasad N)
Use ses->chans_need_reconnect bitmask to print the connection status of each channel under an SMB session. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-02  cifs: use the chans_need_reconnect bitmap for reconnect status  (Shyam Prasad N)
We use the concept of "binding" when one of the secondary channels is in the process of connecting/reconnecting to the server. Until this binding process completes, and the channel is bound to an existing session, we redirect traffic from other established channels onto the binding channel, effectively blocking all traffic till individual channels get reconnected. With my last set of commits, we can get rid of this binding serialization. We now have a bitmap of connection states for each channel. We will use this bitmap instead for tracking channel status. Having a bitmap also now enables us to keep the session alive, as long as even a single channel underneath is alive. Unfortunately, this also means that we need to supply the tcp connection info for the channel during all negotiate and session setup functions. These changes have resulted in slightly bigger code churn. However, I expect perf and robustness improvements in the mchan scenario after this change. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-02  cifs: track individual channel status using chans_need_reconnect  (Shyam Prasad N)
We needed a way to identify the channels under the smb session which are in reconnect, so that the traffic to other channels can continue. So I replaced the bool need_reconnect with a bitmask identifying all the channels that need reconnection (named chans_need_reconnect). When a channel needs reconnection, the bit corresponding to the index of the server in ses->chans is used to set this bitmask. Checking if no channels or all the channels need reconnect then becomes very easy. Also wrote some helper macros for checking and setting the bits. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
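A hypothetical sketch of the bitmask bookkeeping described above (the macro names here are invented for illustration; the actual helpers in the patch may differ):

    /* mark the channel at index i as needing reconnect */
    set_bit(i, &ses->chans_need_reconnect);

    /* hypothetical helpers mirroring the description */
    #define CHAN_NEEDS_RECONNECT(ses, i)   test_bit((i), &(ses)->chans_need_reconnect)
    #define NO_CHANS_NEED_RECONNECT(ses)   ((ses)->chans_need_reconnect == 0)
    #define ALL_CHANS_NEED_RECONNECT(ses)  \
            (hweight_long((ses)->chans_need_reconnect) == (ses)->chan_count)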
2022-01-02  cifs: remove redundant assignment to pointer p  (Colin Ian King)
The pointer p is being assigned with a value that is never read. The pointer is being re-assigned a different value inside the do-while loop. The assignment is redundant and can be removed. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2022-01-02  fs/writeback: Convert inode_switch_wbs_work_fn to folios  (Matthew Wilcox (Oracle))
This gets the statistics correct for large folios by modifying the counters by the number of pages in the folio instead of by 1. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
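For illustration, the core of the change is counting by the folio's page count rather than by 1; a sketch under that assumption (the specific stat helpers adjusted are described in comments rather than named, to avoid guessing the exact calls):

    long nr = folio_nr_pages(folio);   /* number of base pages backing the folio */
    /* statistics that the old per-page code adjusted by 1 are now adjusted by nr,
     * e.g. the per-wb reclaimable/writeback counters: -nr on the old wb, +nr on
     * the new wb during the switch */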
2021-12-31  orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc()  (Christophe JAILLET)
'buffer_index_array' really looks like a bitmap, so it should be allocated as such. When kzalloc is called, a number of bytes is expected, but a number of longs is passed instead. In get(), if not enough memory is allocated, unallocated memory may be read or written. So use bitmap_zalloc() to safely allocate the correct memory size and avoid unexpected behavior. While at it, change the corresponding kfree() into bitmap_free() to keep the semantics. Fixes: ea2c9c9f6574 ("orangefs: bufmap rewrite") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
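A sketch of the before/after allocation described above (variable names shortened; the "before" line illustrates the byte/long mismatch per the commit text, not the exact original statement):

    /* before (per the text above): a count of longs is passed where kzalloc()
     * expects a byte count, so the buffer ends up undersized */
    bitmap = kzalloc(BITS_TO_LONGS(count), GFP_KERNEL);

    /* after: bitmap_zalloc() sizes the allocation from the number of bits */
    bitmap = bitmap_zalloc(count, GFP_KERNEL);
    /* ... use the bitmap ... */
    bitmap_free(bitmap);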
2021-12-31  orangefs: use default_groups in kobj_type  (Greg Kroah-Hartman)
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the orangefs code to use the default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Cc: Mike Marshall <hubcap@omnibond.com> Cc: Martin Brandenburg <martin@omnibond.com> Cc: devel@lists.orangefs.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
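This kind of conversion generally looks like the following sketch (the attribute and ktype names are placeholders, not the actual orangefs ones):

    /* example_attribute is a placeholder struct kobj_attribute defined elsewhere */
    static struct attribute *orangefs_example_attrs[] = {
            &example_attribute.attr,
            NULL,
    };
    ATTRIBUTE_GROUPS(orangefs_example);   /* generates orangefs_example_groups */

    static struct kobj_type orangefs_example_ktype = {
            .sysfs_ops      = &kobj_sysfs_ops,
            /* .default_attrs = orangefs_example_attrs,   old, obsolete field */
            .default_groups = orangefs_example_groups,    /* new, preferred field */
    };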
2021-12-31  Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next  (David S. Miller)
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.
2) Beautify and de-verbose verifier logs, from Christy.
3) Composable verifier types, from Hao.
4) bpf_strncmp helper, from Hou.
5) bpf.h header dependency cleanup, from Jakub.
6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.
7) Sleepable local storage, from KP.
8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-30  fs/mount_setattr: always cleanup mount_kattr  (Christian Brauner)
Make sure that finish_mount_kattr() is called after mount_kattr was successfully built, in both the success and failure case, to prevent leaking any references we took when we built it. We returned early if path lookup failed, thereby risking a leak of an additional reference we took when building mount_kattr when an idmapped mount was requested. Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 9caccd41541a ("fs: introduce MOUNT_ATTR_IDMAP") Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
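A minimal sketch of the cleanup pattern the fix moves to (illustrative control flow; lookup_target_path() is a hypothetical stand-in, not the literal fs/namespace.c code):

    err = build_mount_kattr(&kattr, flags);
    if (err)
            return err;
    err = lookup_target_path(&target);      /* hypothetical stand-in for the path lookup */
    if (err)
            goto out;                        /* previously: returned here and leaked */
    err = do_mount_setattr(&target, &kattr);
    path_put(&target);
    out:
        finish_mount_kattr(&kattr);          /* always drop the references taken at build time */
        return err;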
2021-12-30  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Jakub Kicinski)
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
  commit 077cdda764c7 ("net/mlx5e: TC, Fix memory leak with rules with internal port")
  commit 31108d142f36 ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  commit 4390c6edc0fb ("net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'")
  https://lore.kernel.org/all/20211229065352.30178-1-saeed@kernel.org/

net/smc/smc_wr.c
  commit 49dc9013e34b ("net/smc: Use the bitmap API when applicable")
  commit 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock")
  bitmap_zero()/memset() is removed by the fix

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-31  erofs: add on-disk compressed tail-packing inline support  (Yue Hu)
Introduces erofs compressed tail-packing inline support. This approach adds a new field called `h_idata_size' in the per-file compression header to indicate the encoded size of each tail-packing pcluster. At runtime, it will find the start logical offset of the tail pcluster when initializing per-inode zmap and record such extent (headlcn, idataoff) information to the in-memory inode. Therefore, follow-on requests can directly recognize if one pcluster is a tail-packing inline pcluster or not. Link: https://lore.kernel.org/r/20211228054604.114518-6-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-31  erofs: support inline data decompression  (Yue Hu)
Currently, we already support tail-packing inline for uncompressed files; let's also implement this for compressed files to save I/Os and storage space. Different from normal pclusters, compressed data is available in advance because of other metadata I/Os. Therefore, they directly move into the bypass queue without extra I/O submission. It's the last compression feature before folio/subpage support. Link: https://lore.kernel.org/r/20211228232919.21413-1-xiang@kernel.org Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-31  erofs: support unaligned data decompression  (Gao Xiang)
Previously, compressed data was assumed as block-aligned. This should be changed due to in-block tail-packing inline data. Link: https://lore.kernel.org/r/20211228054604.114518-4-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Yue Hu <huyue2@yulong.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-29  net: Don't include filter.h from net/sock.h  (Jakub Kicinski)
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There were a lot of missing includes that this was masking, primarily in networking this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
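The mechanics are the classic forward-declaration trick; roughly (a sketch of include/net/sock.h, trimmed to the relevant lines):

    /* #include <linux/filter.h>   -- dropped */
    struct sk_filter;               /* forward declaration is enough for pointer members */

    struct sock {
            /* ... */
            struct sk_filter __rcu *sk_filter;   /* only a pointer, no full type needed */
            /* ... */
    };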
2021-12-29  nilfs2: use default_groups in kobj_type  (Greg Kroah-Hartman)
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the nilfs2 code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Link: https://lore.kernel.org/r/20211228144252.390554-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-28  ksmbd: Fix smb2_get_name() kernel-doc comment  (Yang Li)
Remove some warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/ksmbd/smb2pdu.c:623: warning: Function parameter or member 'local_nls' not described in 'smb2_get_name' fs/ksmbd/smb2pdu.c:623: warning: Excess function parameter 'nls_table' description in 'smb2_get_name' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-28  ksmbd: Delete an invalid argument description in smb2_populate_readdir_entry()  (Yang Li)
A warning is reported because of an invalid argument description; it is found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/ksmbd/smb2pdu.c:3406: warning: Excess function parameter 'user_ns' description in 'smb2_populate_readdir_entry' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: 475d6f98804c ("ksmbd: fix translation in smb2_populate_readdir_entry()") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-28  ksmbd: Fix smb2_set_info_file() kernel-doc comment  (Yang Li)
Fix the argument list in the smb2_set_info_file() kernel-doc comment so it matches what the kdoc format and script expect. The warnings were found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/ksmbd/smb2pdu.c:5862: warning: Function parameter or member 'req' not described in 'smb2_set_info_file' fs/ksmbd/smb2pdu.c:5862: warning: Excess function parameter 'info_class' description in 'smb2_set_info_file' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: 9496e268e3af ("ksmbd: add request buffer validation in smb2_set_info") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-28  ksmbd: Fix buffer_check_err() kernel-doc comment  (Yang Li)
Add the description of @rsp_org in buffer_check_err() kernel-doc comment to remove a warning found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/ksmbd/smb2pdu.c:4028: warning: Function parameter or member 'rsp_org' not described in 'buffer_check_err' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Fixes: cb4517201b8a ("ksmbd: remove smb2_buf_length in smb2_hdr") Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-28  ksmbd: fix multi session connection failure  (Namjae Jeon)
When RSS mode is enabled, the Windows client simultaneously sends several session requests to the server. There is a race issue using sess->ntlmssp.cryptkey on N connections : 1 session, so authentication failed using the wrong cryptkey on some sessions. This patch moves cryptkey to the ksmbd_conn structure so that each connection uses its own cryptkey. Tested-by: Ziwei Xie <zw.xie@high-flyer.cn> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-28  ksmbd: set both ipv4 and ipv6 in FSCTL_QUERY_NETWORK_INTERFACE_INFO  (Namjae Jeon)
Set ipv4 and ipv6 address in FSCTL_QUERY_NETWORK_INTERFACE_INFO. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-28  ksmbd: set RSS capable in FSCTL_QUERY_NETWORK_INTERFACE_INFO  (Namjae Jeon)
Set RSS capable in FSCTL_QUERY_NETWORK_INTERFACE_INFO if the netdev has multiple tx queues. Also add ksmbd_compare_user() to avoid a race condition issue in ksmbd_free_user(), because the Windows client simultaneously sends session setup requests for multichannel connections. Tested-by: Ziwei Xie <zw.xie@high-flyer.cn> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
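A hedged sketch of the TX-queue check implied above (the response field and capability flag names are illustrative, not necessarily the exact ksmbd identifiers):

    /* advertise RSS only when the underlying netdev has more than one TX queue */
    if (netdev->real_num_tx_queues > 1)
            nii_rsp->Capability |= cpu_to_le32(RSS_CAPABLE);   /* names illustrative */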
2021-12-28  ksmbd: Remove unused fields from ksmbd_file struct definition  (Marios Makassikis)
These fields are remnants of the not upstreamed SMB1 code. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-28  ksmbd: Remove unused parameter from smb2_get_name()  (Marios Makassikis)
The 'share' parameter is no longer used by smb2_get_name() since commit 265fd1991c1d ("ksmbd: use LOOKUP_BENEATH to prevent the out of share access"). Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-28  ksmbd: use oid registry functions to decode OIDs  (Hyunchul Lee)
Use look_up_OID to decode OIDs rather than implementing functions. Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-29  erofs: introduce z_erofs_fixup_insize  (Gao Xiang)
To prepare for the upcoming ztailpacking feature, introduce z_erofs_fixup_insize() and pageofs_in to wrap up the process to get the exact compressed size via zero padding. Link: https://lore.kernel.org/r/20211228054604.114518-3-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-29  erofs: tidy up z_erofs_lz4_decompress  (Gao Xiang)
To prepare for the upcoming ztailpacking feature and further cleanups, introduce a unique z_erofs_lz4_decompress_ctx to keep the context, including inpages, outpages and oend, which are frequently used by the lz4 decompressor. No logic changes. Link: https://lore.kernel.org/r/20211228054604.114518-2-hsiangkao@linux.alibaba.com Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-28  io_uring: use completion batching for poll rem/upd  (Pavel Begunkov)
Use __io_req_complete() in io_poll_update(), so we can utilise completion batching for both update/remove request and the poll we're killing (if any). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e2bdc6c5abd9e9b80f09b86d8823eb1c780362cd.1639605189.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-28  io_uring: single shot poll removal optimisation  (Pavel Begunkov)
We don't need to poll oneshot request if we've got a desired mask in io_poll_wake(), task_work will clean it up correctly, but as we already hold a wq spinlock, we can remove ourselves and save on additional spinlocking in io_poll_remove_entries(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ee170a344a18c9ef36b554d806c64caadfd61c31.1639605189.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-28  io_uring: poll rework  (Pavel Begunkov)
It's not possible to go forward with the current state of io_uring polling; we need a more straightforward and easier synchronisation. There are a lot of problems with how it is at the moment, including missing events on rewait.

The main idea here is to introduce a notion of request ownership while polling: no one but the owner can modify any part but ->poll_refs of struct io_kiocb, which grants us protection against all sorts of races. The main user of such exclusivity is the poll task_work handler, so before queueing a tw one should have/acquire ownership, which will be handed off to the tw handler. The other user is __io_arm_poll_handler() doing initial poll arming. It starts by taking the ownership, so tw handlers won't be run until it's released later in the function after vfs_poll. note: this also prevents races in __io_queue_proc(). Poll wake/etc. may not be able to get ownership; then they need to increase the poll refcount and the task_work should notice it and retry if necessary, see io_poll_check_events(). There is also an IO_POLL_CANCEL_FLAG flag to notify that we want to kill a request.

It makes cancellations more reliable, enables double multishot polling, fixes double poll rewait, fixes missing poll events and fixes another bunch of races. Even though it adds some overhead for the new refcounting, there are a couple of nice performance wins:

- no req->refs refcounting for poll requests anymore

- if the data is already there (once measured for some test to be 1-2% of all apoll requests), it doesn't add atomics and removes a spin_lock/unlock pair.

- works well with multishots, we don't do remove from queue / add to queue for each new poll event.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6b652927c77ed9580ea4330ac5612f0e0848c946.1639605189.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
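A heavily simplified sketch of the ownership idea (the bit layout and helper shape are assumptions for illustration, not the exact io_uring code):

    #define IO_POLL_CANCEL_FLAG  BIT(31)        /* illustrative bit layout */
    #define IO_POLL_REF_MASK     GENMASK(30, 0)

    /* try to become the exclusive owner of the poll request; everyone else just
     * records that more work is pending and lets the owner retry */
    static bool io_poll_get_ownership(struct io_kiocb *req)
    {
            return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK);
    }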
2021-12-28  io_uring: kill poll linking optimisation  (Pavel Begunkov)
With IORING_FEAT_FAST_POLL in place, io_put_req_find_next() for poll requests doesn't make much sense, and in any case re-adding it shouldn't be a problem considering batching in tctx_task_work(). We can remove it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/15699682bf81610ec901d4e79d6da64baa9f70be.1639605189.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-28  io_uring: move common poll bits  (Pavel Begunkov)
Move some poll helpers/etc up; we'll need them there shortly. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6c5c3dba24c86aad5cd389a54a8c7412e6a0621d.1639605189.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-28  io_uring: refactor poll update  (Pavel Begunkov)
Clean up io_poll_update() and unify cancellation paths for remove and update. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5937138b6265a1285220e2fab1b28132c1d73ce3.1639605189.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-28  io_uring: remove double poll on poll update  (Pavel Begunkov)
Before updating a poll request we should remove it from poll queues, including the double poll entry. Fixes: b69de288e913 ("io_uring: allow events and user_data update of running poll requests") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ac39e7f80152613603b8a6cc29a2b6063ac2434f.1639605189.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-28  kobject: remove kset from struct kset_uevent_ops callbacks  (Greg Kroah-Hartman)
There is no need to pass the pointer to the kset in the struct kset_uevent_ops callbacks as no one uses it, so just remove that pointer entirely. Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Wedson Almeida Filho <wedsonaf@google.com> Link: https://lore.kernel.org/r/20211227163924.3970661-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
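The signature change amounts to the following kind of diff (sketch; the old prototypes are shown in comments):

    struct kset_uevent_ops {
            /* before: int (*filter)(struct kset *kset, struct kobject *kobj); */
            int (*filter)(struct kobject *kobj);
            /* before: const char *(*name)(struct kset *kset, struct kobject *kobj); */
            const char *(*name)(struct kobject *kobj);
            /* before: int (*uevent)(struct kset *kset, struct kobject *kobj,
             *                       struct kobj_uevent_env *env); */
            int (*uevent)(struct kobject *kobj, struct kobj_uevent_env *env);
    };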
2021-12-23  Merge tag '5.16-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd  (Linus Torvalds)
Pull ksmbd fixes from Steve French:
 "Three ksmbd fixes, all for stable as well. Two fix potential uninitialized memory and one fixes a security problem where encryption is unintentionally disabled from some clients"

* tag '5.16-rc5-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: disable SMB2_GLOBAL_CAP_ENCRYPTION for SMB 3.1.1
  ksmbd: fix uninitialized symbol 'pntsd_size'
  ksmbd: fix error code in ndr_read_int32()
2021-12-23  Merge tag 'io_uring-5.16-2021-12-23' of git://git.kernel.dk/linux-block  (Linus Torvalds)
Pull io_uring fix from Jens Axboe:
 "Single fix for not clearing kiocb->ki_pos back to 0 for a stream, destined for stable as well"

* tag 'io_uring-5.16-2021-12-23' of git://git.kernel.dk/linux-block:
  io_uring: zero iocb->ki_pos for stream file types
2021-12-23  ext4: update fast commit TODOs  (Harshad Shirwadkar)
This series takes care of a couple of TODOs and adds new ones. Update the TODOs section to reflect current state and future work that needs to happen. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-5-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23  ext4: simplify updating of fast commit stats  (Harshad Shirwadkar)
Move fast commit stats updating logic to a separate function from ext4_fc_commit(). This significantly improves readability of ext4_fc_commit(). Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-4-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>