summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-10-18mm: kmem: move memcg_kmem_bypass() calls to get_mem/obj_cgroup_from_current()Roman Gushchin
Patch series "mm: kmem: kernel memory accounting in an interrupt context". This patchset implements memcg-based memory accounting of allocations made from an interrupt context. Historically, such allocations were passed unaccounted mostly because charging the memory cgroup of the current process wasn't an option. Also performance reasons were likely a reason too. The remote charging API allows to temporarily overwrite the currently active memory cgroup, so that all memory allocations are accounted towards some specified memory cgroup instead of the memory cgroup of the current process. This patchset extends the remote charging API so that it can be used from an interrupt context. Then it removes the fence that prevented the accounting of allocations made from an interrupt context. It also contains a couple of optimizations/code refactorings. This patchset doesn't directly enable accounting for any specific allocations, but prepares the code base for it. The bpf memory accounting will likely be the first user of it: a typical example is a bpf program parsing an incoming network packet, which allocates an entry in hashmap map to store some information. This patch (of 4): Currently memcg_kmem_bypass() is called before obtaining the current memory/obj cgroup using get_mem/obj_cgroup_from_current(). Moving memcg_kmem_bypass() into get_mem/obj_cgroup_from_current() reduces the number of call sites and allows further code simplifications. Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Link: http://lkml.kernel.org/r/20200827225843.1270629-1-guro@fb.com Link: http://lkml.kernel.org/r/20200827225843.1270629-2-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18mm, memcg: rework remote charging API to support nestingRoman Gushchin
Currently the remote memcg charging API consists of two functions: memalloc_use_memcg() and memalloc_unuse_memcg(), which set and clear the memcg value, which overwrites the memcg of the current task. memalloc_use_memcg(target_memcg); <...> memalloc_unuse_memcg(); It works perfectly for allocations performed from a normal context, however an attempt to call it from an interrupt context or just nest two remote charging blocks will lead to an incorrect accounting. On exit from the inner block the active memcg will be cleared instead of being restored. memalloc_use_memcg(target_memcg); memalloc_use_memcg(target_memcg_2); <...> memalloc_unuse_memcg(); Error: allocation here are charged to the memcg of the current process instead of target_memcg. memalloc_unuse_memcg(); This patch extends the remote charging API by switching to a single function: struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg), which sets the new value and returns the old one. So a remote charging block will look like: old_memcg = set_active_memcg(target_memcg); <...> set_active_memcg(old_memcg); This patch is heavily based on the patch by Johannes Weiner, which can be found here: https://lkml.org/lkml/2020/5/28/806 . Signed-off-by: Roman Gushchin <guro@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dan Schatzberg <dschatzberg@fb.com> Link: https://lkml.kernel.org/r/20200821212056.3769116-1-guro@fb.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18ia64: fix build error with !COREDUMPKrzysztof Kozlowski
Fix linkage error when CONFIG_BINFMT_ELF is selected but CONFIG_COREDUMP is not: ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_phdrs': elfcore.c:(.text+0x172): undefined reference to `dump_emit' ia64-linux-ld: arch/ia64/kernel/elfcore.o: in function `elf_core_write_extra_data': elfcore.c:(.text+0x2b2): undefined reference to `dump_emit' Fixes: 1fcccbac89f5 ("elf coredump: replace ELF_CORE_EXTRA_* macros by functions") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200819064146.12529-1-krzk@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-18ext4: Detect already used quota file earlyJan Kara
When we try to use file already used as a quota file again (for the same or different quota type), strange things can happen. At the very least lockdep annotations may be wrong but also inode flags may be wrongly set / reset. When the file is used for two quota types at once we can even corrupt the file and likely crash the kernel. Catch all these cases by checking whether passed file is already used as quota file and bail early in that case. This fixes occasional generic/219 failure due to lockdep complaint. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reported-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201015110330.28716-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18jbd2: avoid transaction reuse after reformattingchangfengnan
When ext4 is formatted with lazy_journal_init=1 and transactions from the previous filesystem are still on disk, it is possible that they are considered during a recovery after a crash. Because the checksum seed has changed, the CRC check will fail, and the journal recovery fails with checksum error although the journal is otherwise perfectly valid. Fix the problem by checking commit block time stamps to determine whether the data in the journal block is just stale or whether it is indeed corrupt. Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Fengnan Chang <changfengnan@hikvision.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201012164900.20197-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: use the normal helper to get the actual inodeKaixu Xia
Here we use the READ_ONCE to fix race conditions in ->d_compare() and ->d_hash() when they are called in RCU-walk mode, seems we can use the normal helper d_inode_rcu() to get the actual inode. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/1602317416-1260-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: fix bs < ps issue reported with dioread_nolock mount optRitesh Harjani
left shifting m_lblk by blkbits was causing value overflow and hence it was not able to convert unwritten to written extent. So, make sure we typecast it to loff_t before do left shift operation. Also in func ext4_convert_unwritten_io_end_vec(), make sure to initialize ret variable to avoid accidentally returning an uninitialized ret. This patch fixes the issue reported in ext4 for bs < ps with dioread_nolock mount option. Fixes: c8cc88163f40df39e50c ("ext4: Add support for blocksize < pagesize in dioread_nolock") Cc: stable@kernel.org Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/af902b5db99e8b73980c795d84ad7bb417487e76.1602168865.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()Mauricio Faria de Oliveira
This implements journal callbacks j_submit|finish_inode_data_buffers() with different behavior for data=journal: to write-protect pages under commit, preventing changes to buffers writeably mapped to userspace. If a buffer's content changes between commit's checksum calculation and write-out to disk, it can cause journal recovery/mount failures upon a kernel crash or power loss. [ 27.334874] EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, and O_DIRECT support! [ 27.339492] JBD2: Invalid checksum recovering data block 8705 in log [ 27.342716] JBD2: recovery failed [ 27.343316] EXT4-fs (loop0): error loading journal mount: /ext4: can't read superblock on /dev/loop0. In j_submit_inode_data_buffers() we write-protect the inode's pages with write_cache_pages() and redirty w/ writepage callback if needed. In j_finish_inode_data_buffers() there is nothing do to. And in order to use the callbacks, inodes are added to the inode list in transaction in __ext4_journalled_writepage() and ext4_page_mkwrite(). In ext4_page_mkwrite() we must make sure that the buffers are attached to the transaction as jbddirty with write_end_fn(), as already done in __ext4_journalled_writepage(). Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Reported-by: Dann Frazier <dann.frazier@canonical.com> Reported-by: kernel test robot <lkp@intel.com> # wbc.nr_to_write Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20201006004841.600488-5-mfo@canonical.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: data=journal: fixes for ext4_page_mkwrite()Mauricio Faria de Oliveira
These are two fixes for data journalling required by the next patch, discovered while testing it. First, the optimization to return early if all buffers are mapped is not appropriate for the next patch: The inode _must_ be added to the transaction's list in data=journal mode (so to write-protect pages on commit) thus we cannot return early there. Second, once that optimization to reduce transactions was disabled for data=journal mode, more transactions happened, and occasionally hit this warning message: 'JBD2: Spotted dirty metadata buffer'. Reason is, block_page_mkwrite() will set_buffer_dirty() before do_journal_get_write_access() that is there to prevent it. This issue was masked by the optimization. So, on data=journal use __block_write_begin() instead. This also requires page locking and len recalculation. (see block_page_mkwrite() for implementation details.) Finally, as Jan noted there is little sharing between data=journal and other modes in ext4_page_mkwrite(). However, a prototype of ext4_journalled_page_mkwrite() showed there still would be lots of duplicated lines (tens of) that didn't seem worth it. Thus this patch ends up with an ugly goto to skip all non-data journalling code (to avoid long indentations, but that can be changed..) in the beginning, and just a conditional in the transaction section. Well, we skip a common part to data journalling which is the page truncated check, but we do it again after ext4_journal_start() when we re-acquire the page lock (so not to acquire the page lock twice needlessly for data journalling.) Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201006004841.600488-4-mfo@canonical.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18jbd2, ext4, ocfs2: introduce/use journal callbacks ↵Mauricio Faria de Oliveira
j_submit|finish_inode_data_buffers() Introduce journal callbacks to allow different behaviors for an inode in journal_submit|finish_inode_data_buffers(). The existing users of the current behavior (ext4, ocfs2) are adapted to use the previously exported functions that implement the current behavior. Users are callers of jbd2_journal_inode_ranged_write|wait(), which adds the inode to the transaction's inode list with the JI_WRITE|WAIT_DATA flags. Only ext4 and ocfs2 in-tree. Both CONFIG_EXT4_FS and CONFIG_OCSFS2_FS select CONFIG_JBD2, which builds fs/jbd2/commit.c and journal.c that define and export the functions, so we can call directly in ext4/ocfs2. Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201006004841.600488-3-mfo@canonical.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers()Mauricio Faria de Oliveira
Export functions that implement the current behavior done for an inode in journal_submit|finish_inode_data_buffers(). No functional change. Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20201006004841.600488-2-mfo@canonical.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: introduce ext4_sb_bread_unmovable() to replace sb_bread_unmovable()zhangyi (F)
Now we only use sb_bread_unmovable() to read superblock and descriptor block at mount time, so there is no opportunity that we need to clear buffer verified bit and also handle buffer write_io error bit. But for the sake of unification, let's introduce ext4_sb_bread_unmovable() to replace all sb_bread_unmovable(). After this patch, we stop using read helpers in fs/buffer.c. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20200924073337.861472-8-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: use ext4_sb_bread() instead of sb_bread()zhangyi (F)
We have already remove open codes that invoke helpers provide by fs/buffer.c in all places reading metadata buffers. This patch switch to use ext4_sb_bread() to replace all sb_bread() helpers, which is ext4_read_bh() helper back end. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20200924073337.861472-7-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: introduce ext4_sb_breadahead_unmovable() to replace ↵zhangyi (F)
sb_breadahead_unmovable() If we readahead inode tables in __ext4_get_inode_loc(), it may bypass buffer_write_io_error() check, so introduce ext4_sb_breadahead_unmovable() to handle this special case. This patch also replace sb_breadahead_unmovable() in ext4_fill_super() for the sake of unification. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20200924073337.861472-6-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: use ext4_buffer_uptodate() in __ext4_get_inode_loc()zhangyi (F)
We have already introduced ext4_buffer_uptodate() to re-set the uptodate bit on buffer which has been failed to write out to disk. Just remove the redundant codes and switch to use ext4_buffer_uptodate() in __ext4_get_inode_loc(). Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20200924073337.861472-5-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: use common helpers in all places reading metadata bufferszhangyi (F)
Revome all open codes that read metadata buffers, switch to use ext4_read_bh_*() common helpers. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200924073337.861472-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: introduce new metadata buffer read helperszhangyi (F)
The previous patch add clear_buffer_verified() before we read metadata block from disk again, but it's rather easy to miss clearing of this bit because currently we read metadata buffer through different open codes (e.g. ll_rw_block(), bh_submit_read() and invoke submit_bh() directly). So, it's time to add common helpers to unify in all the places reading metadata buffers instead. This patch add 3 helpers: - ext4_read_bh_nowait(): async read metadata buffer if it's actually not uptodate, clear buffer_verified bit before read from disk. - ext4_read_bh(): sync version of read metadata buffer, it will wait until the read operation return and check the return status. - ext4_read_bh_lock(): try to lock the buffer before read buffer, it will skip reading if the buffer is already locked. After this patch, we need to use these helpers in all the places reading metadata buffer instead of different open codes. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200924073337.861472-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: clear buffer verified flag if read meta block from diskzhangyi (F)
The metadata buffer is no longer trusted after we read it from disk again because it is not uptodate for some reasons (e.g. failed to write back). Otherwise we may get below memory corruption problem in ext4_ext_split()->memset() if we read stale data from the newly allocated extent block on disk which has been failed to async write out but miss verify again since the verified bit has already been set on the buffer. [ 29.774674] BUG: unable to handle kernel paging request at ffff88841949d000 ... [ 29.783317] Oops: 0002 [#2] SMP [ 29.784219] R10: 00000000000f4240 R11: 0000000000002e28 R12: ffff88842fa1c800 [ 29.784627] CPU: 1 PID: 126 Comm: kworker/u4:3 Tainted: G D W [ 29.785546] R13: ffffffff9cddcc20 R14: ffffffff9cddd420 R15: ffff88842fa1c2f8 [ 29.786679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),BIOS ?-20190727_0738364 [ 29.787588] FS: 0000000000000000(0000) GS:ffff88842fa00000(0000) knlGS:0000000000000000 [ 29.789288] Workqueue: writeback wb_workfn [ 29.790319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 29.790321] (flush-8:0) [ 29.790844] CR2: 0000000000000008 CR3: 00000004234f2000 CR4: 00000000000006f0 [ 29.791924] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 29.792839] RIP: 0010:__memset+0x24/0x30 [ 29.793739] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 29.794256] Code: 90 90 90 90 90 90 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 033 [ 29.795161] Kernel panic - not syncing: Fatal exception in interrupt ... [ 29.808149] Call Trace: [ 29.808475] ext4_ext_insert_extent+0x102e/0x1be0 [ 29.809085] ext4_ext_map_blocks+0xa89/0x1bb0 [ 29.809652] ext4_map_blocks+0x290/0x8a0 [ 29.809085] ext4_ext_map_blocks+0xa89/0x1bb0 [ 29.809652] ext4_map_blocks+0x290/0x8a0 [ 29.810161] ext4_writepages+0xc85/0x17c0 ... Fix this by clearing buffer's verified bit if we read meta block from disk again. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200924073337.861472-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: limit entries returned when counting fsmap recordsDarrick J. Wong
If userspace asked fsmap to try to count the number of entries, we cannot return more than UINT_MAX entries because fmh_entries is u32. Therefore, stop counting if we hit this limit or else we will waste time to return truncated results. Fixes: 0c9ec4beecac ("ext4: support GETFSMAP ioctls") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Link: https://lore.kernel.org/r/20201001222148.GA49520@magnolia Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: make mb_check_counter per groupChunguang Xu
Make bb_check_counter per group, so each group has the same chance to be checked, which can expose errors more easily. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/1601292995-32205-2-git-send-email-brookxu@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: delete invalid comments near mb_buddy_adjust_borderChunguang Xu
The comment near mb_buddy_adjust_border seems meaningless, just clear it. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Link: https://lore.kernel.org/r/1601292995-32205-1-git-send-email-brookxu@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: fix bdev write error check failed when mount fs with roZhang Xiaoxu
Consider a situation when a filesystem was uncleanly shutdown and the orphan list is not empty and a read-only mount is attempted. The orphan list cleanup during mount will fail with: ext4_check_bdev_write_error:193: comm mount: Error while async write back metadata This happens because sbi->s_bdev_wb_err is not initialized when mounting the filesystem in read only mode and so ext4_check_bdev_write_error() falsely triggers. Initialize sbi->s_bdev_wb_err unconditionally to avoid this problem. Fixes: bc71726c7257 ("ext4: abort the filesystem if failed to async write metadata buffer") Cc: stable@kernel.org Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200928020556.710971-1-zhangxiaoxu5@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: rename system_blks to s_system_blks inside ext4_sb_infoChunguang Xu
Rename system_blks to s_system_blks inside ext4_sb_info, keep the naming rules consistent with other variables, which is convenient for code reading and writing. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/1600916623-544-2-git-send-email-brookxu@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: rename journal_dev to s_journal_dev inside ext4_sb_infoChunguang Xu
Rename journal_dev to s_journal_dev inside ext4_sb_info, keep the naming rules consistent with other variables, which is convenient for code reading and writing. Signed-off-by: Chunguang Xu <brookxu@tencent.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/1600916623-544-1-git-send-email-brookxu@tencent.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18jbd2: fix the comment of struct jbd2_journal_handleHui Su
the struct name was modified long ago, but the comment still use struct handle_s. Signed-off-by: Hui Su <sh_def@163.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200922171231.GA53120@rlk Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: add trace exit in exception path.Zhang Qilong
Missing trace exit in exception path of ext4_sync_file and ext4_ind_map_blocks. Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20200921124738.23352-1-zhangqilong3@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: optimize file overwritesRitesh Harjani
In case if the file already has underlying blocks/extents allocated then we don't need to start a journal txn and can directly return the underlying mapping. Currently ext4_iomap_begin() is used by both DAX & DIO path. We can check if the write request is an overwrite & then directly return the mapping information. This could give a significant perf boost for multi-threaded writes specially random overwrites. On PPC64 VM with simulated pmem(DAX) device, ~10x perf improvement could be seen in random writes (overwrite). Also bcoz this optimizes away the spinlock contention during jbd2 slab cache allocation (jbd2_journal_handle). On x86 VM, ~2x perf improvement was observed. Reported-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/88e795d8a4d5cd22165c7ebe857ba91d68d8813e.1600401668.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: remove unused including <linux/version.h>Tian Tao
Remove including <linux/version.h> that don't need it. Signed-off-by: Tian Tao <tiantao6@hisilicon.com> Link: https://lore.kernel.org/r/1600397165-42873-1-git-send-email-tiantao6@hisilicon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: fix superblock checksum calculation raceConstantine Sapuntzakis
The race condition could cause the persisted superblock checksum to not match the contents of the superblock, causing the superblock to be considered corrupt. An example of the race follows. A first thread is interrupted in the middle of a checksum calculation. Then, another thread changes the superblock, calculates a new checksum, and sets it. Then, the first thread resumes and sets the checksum based on the older superblock. To fix, serialize the superblock checksum calculation using the buffer header lock. While a spinlock is sufficient, the buffer header is already there and there is precedent for locking it (e.g. in ext4_commit_super). Tested the patch by booting up a kernel with the patch, creating a filesystem and some files (including some orphans), and then unmounting and remounting the file system. Cc: stable@kernel.org Signed-off-by: Constantine Sapuntzakis <costa@purestorage.com> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200914161014.22275-1-costa@purestorage.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: fix error handling code in add_new_gdbDinghao Liu
When ext4_journal_get_write_access() fails, we should terminate the execution flow and release n_group_desc, iloc.bh, dind and gdb_bh. Cc: stable@kernel.org Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20200829025403.3139-1-dinghao.liu@zju.edu.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: disallow modifying DAX inode flag if inline_data has been setXiao Yang
inline_data is mutually exclusive to DAX so enabling both of them triggers the following issue: ------------------------------------------ # mkfs.ext4 -F -O inline_data /dev/pmem1 ... # mount /dev/pmem1 /mnt # echo 'test' >/mnt/file # lsattr -l /mnt/file /mnt/file Inline_Data # xfs_io -c "chattr +x" /mnt/file # xfs_io -c "lsattr -v" /mnt/file [dax] /mnt/file # umount /mnt # mount /dev/pmem1 /mnt # cat /mnt/file cat: /mnt/file: Numerical result out of range ------------------------------------------ Fixes: b383a73f2b83 ("fs/ext4: Introduce DAX inode flag") Signed-off-by: Xiao Yang <yangx.jy@cn.fujitsu.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://lore.kernel.org/r/20200828084330.15776-1-yangx.jy@cn.fujitsu.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: remove unused argument from ext4_(inc|dec)_countNikolay Borisov
The 'handle' argument is not used for anything so simply remove it. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200826133116.11592-1-nborisov@suse.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: do not interpret high bytes if 64bit feature is disabledPetr Malat
Fields s_free_blocks_count_hi, s_r_blocks_count_hi and s_blocks_count_hi are not valid if EXT4_FEATURE_INCOMPAT_64BIT is not enabled and should be treated as zeroes. Signed-off-by: Petr Malat <oss@malat.biz> Link: https://lore.kernel.org/r/20200825150016.3363-1-oss@malat.biz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: delete duplicated words + other fixesRandy Dunlap
Delete repeated words in fs/ext4/. {the, this, of, we, after} Also change spelling of "xttr" in inline.c to "xattr" in 2 places. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200805024850.12129-1-rdunlap@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: flag as supporting buffered async readsJens Axboe
ext4 uses generic_file_read_iter(), which already supports this. Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/fb90cc2d-b12c-738f-21a4-dd7a8ae0556a@kernel.dk Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: fix leaking sysfs kobject after failed mountEric Biggers
ext4_unregister_sysfs() only deletes the kobject. The reference to it needs to be put separately, like ext4_put_super() does. This addresses the syzbot report "memory leak in kobject_set_name_vargs (3)" (https://syzkaller.appspot.com/bug?extid=9f864abad79fae7c17e1). Reported-by: syzbot+9f864abad79fae7c17e1@syzkaller.appspotmail.com Fixes: 72ba74508b28 ("ext4: release sysfs kobject when failing to enable quotas on mount") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20200922162456.93657-1-ebiggers@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: discard preallocations before releasing group lockJan Kara
ext4_mb_discard_group_preallocations() can be releasing group lock with preallocations accumulated on its local list. Thus although discard_pa_seq was incremented and concurrent allocating processes will be retrying allocations, it can happen that premature ENOSPC error is returned because blocks used for preallocations are not available for reuse yet. Make sure we always free locally accumulated preallocations before releasing group lock. Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20200924150959.4335-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: fix dead loop in ext4_mb_new_blocksYe Bin
As we test disk offline/online with running fsstress, we find fsstress process is keeping running state. kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114 .... kworker/u32:3-262 [004] ...1 140.787471: ext4_mb_discard_preallocations: dev 8,32 needed 114 ext4_mb_new_blocks repeat: ext4_mb_discard_preallocations_should_retry(sb, ac, &seq) freed = ext4_mb_discard_preallocations ext4_mb_discard_group_preallocations this_cpu_inc(discard_pa_seq); ---> freed == 0 seq_retry = ext4_get_discard_pa_seq_sum for_each_possible_cpu(__cpu) __seq += per_cpu(discard_pa_seq, __cpu); if (seq_retry != *seq) { *seq = seq_retry; ret = true; } As we see seq_retry is sum of discard_pa_seq every cpu, if ext4_mb_discard_group_preallocations return zero discard_pa_seq in this cpu maybe increase one, so condition "seq_retry != *seq" have always been met. Ritesh Harjani suggest to in ext4_mb_discard_group_preallocations function we only increase discard_pa_seq when there is some PA to free. Fixes: 07b5b8e1ac40 ("ext4: mballoc: introduce pcpu seqcnt for freeing PA to improve ENOSPC handling") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200916113859.1556397-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-18ext4: implement swap_activate aops using iomapRitesh Harjani
After moving ext4's bmap to iomap interface, swapon functionality on files created using fallocate (which creates unwritten extents) are failing. This is since iomap_bmap interface returns 0 for unwritten extents and thus generic_swapfile_activate considers this as holes and hence bail out with below kernel msg :- [340.915835] swapon: swapfile has holes To fix this we need to implement ->swap_activate aops in ext4 which will use ext4_iomap_report_ops. Since we only need to return the list of extents so ext4_iomap_report_ops should be enough. Cc: stable@kernel.org Reported-by: Yuxuan Shui <yshuiv7@gmail.com> Fixes: ac58e4fb03f ("ext4: move ext4 bmap to use iomap infrastructure") Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/r/20200904091653.1014334-1-riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2020-10-17coccinelle: api: add kfree_mismatch scriptDenis Efremov
Check that alloc and free types of functions match each other. Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
2020-10-17mm: use limited read-ahead to satisfy readJens Axboe
For the case where read-ahead is disabled on the file, or if the cgroup is congested, ensure that we can at least do 1 page of read-ahead to make progress on the read in an async fashion. This could potentially be larger, but it's not needed in terms of functionality, so let's error on the side of caution as larger counts of pages may run into reclaim issues (particularly if we're congested). This makes sure we're not hitting the potentially sync ->readpage() path for IO that is marked IOCB_WAITQ, which could cause us to block. It also means we'll use the same path for IO, regardless of whether or not read-ahead happens to be disabled on the lower level device. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: Hao_Xu <haoxu@linux.alibaba.com> [axboe: updated for new ractl API] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17mm: mark async iocb read as NOWAIT once some data has been copiedJens Axboe
Once we've copied some data for an iocb that is marked with IOCB_WAITQ, we should no longer attempt to async lock a new page. Instead make sure we return the copied amount, and let the caller retry, instead of returning -EIOCBQUEUED for a new page. This should only be possible with read-ahead disabled on the below device, and multiple threads racing on the same file. Haven't been able to reproduce on anything else. Cc: stable@vger.kernel.org # v5.9 Fixes: 1a0a7853b901 ("mm: support async buffered reads in generic_file_buffered_read()") Reported-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17Merge tag 'perf-tools-for-v5.10-2020-10-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: - cgroup improvements for 'perf stat', allowing for compact specification of events and cgroups in the command line. - Support per thread topdown metrics in 'perf stat'. - Support sample-read topdown metric group in 'perf record' - Show start of latency in addition to its start in 'perf sched latency'. - Add min, max to 'perf script' futex-contention output, in addition to avg. - Allow usage of 'perf_event_attr->exclusive' attribute via the new ':e' event modifier. - Add 'snapshot' command to 'perf record --control', using it with Intel PT. - Support FIFO file names as alternative options to 'perf record --control'. - Introduce branch history "streams", to compare 'perf record' runs with 'perf diff' based on branch records and report hot streams. - Support PE executable symbol tables using libbfd, to profile, for instance, wine binaries. - Add filter support for option 'perf ftrace -F/--funcs'. - Allow configuring the 'disassembler_style' 'perf annotate' knob via 'perf config' - Update CascadelakeX and SkylakeX JSON vendor events files. - Add support for parsing perchip/percore JSON vendor events. - Add power9 hv_24x7 core level metric events. - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD zen1. - Enable Family 19h users by matching Zen2 AMD vendor events. - Use debuginfod in 'perf probe' when required debug files not found locally. - Display negative tid in non-sample events in 'perf script'. - Make GTK2 support opt-in - Add build test with GTK+ - Add missing -lzstd to the fast path feature detection - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for use in 'perf trace'. - Show python test script in verbose mode. - Fix uncore metric expressions - Msan uninitialized use fixes. - Use condition variables in 'perf bench numa' - Autodetect python3 binary in systems without python2. - Support md5 build ids in addition to sha1. - Add build id 'perf test' regression test. - Fix printable strings in python3 scripts. - Fix off by ones in 'perf trace' in arches using libaudit. - Fix JSON event code for events referencing std arch events. - Introduce 'perf test' shell script for Arm CoreSight testing. - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata event and in 'perf test tsc'. - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores", fixes and documentation update. - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and debuginfo files. - Do not print 'Metric Groups:' unnecessarily in 'perf list' - Refcounting fixes in the event parsing code. - Add expand cgroup event 'perf test' entry. - Fix out of bounds CPU map access when handling armv8_pmu events in 'perf stat'. - Add build-id injection 'perf bench' benchmark. - Enter namespace when reading build-id in 'perf inject'. - Do not load map/dso when injecting build-id speeding up the 'perf inject' process. - Add --buildid-all option to avoid processing all samples, just the mmap metadata events. - Add feature test to check if libbfd has buildid support - Add 'perf test' entry for PE binary format support. - Fix typos in power8 PMU vendor events JSON files. - Hide libtraceevent non API functions. * tag 'perf-tools-for-v5.10-2020-10-15' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (113 commits) perf c2c: Update documentation for metrics reorganization perf c2c: Add metrics "RMT Load Hit" perf c2c: Correct LLC load hit metrics perf c2c: Change header for LLC local hit perf c2c: Use more explicit headers for HITM perf c2c: Change header from "LLC Load Hitm" to "Load Hitm" perf c2c: Organize metrics based on memory hierarchy perf c2c: Display "Total Stores" as a standalone metrics perf c2c: Display the total numbers continuously perf bench: Use condition variables in numa. perf jevents: Fix event code for events referencing std arch events perf diff: Support hot streams comparison perf streams: Report hot streams perf streams: Calculate the sum of total streams hits perf streams: Link stream pair perf streams: Compare two streams perf streams: Get the evsel_streams by evsel_idx perf streams: Introduce branch history "streams" perf intel-pt: Improve PT documentation slightly perf tools: Add support for exclusive groups/events ...
2020-10-17Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "A usual cycle for RDMA with a typical mix of driver and core subsystem updates: - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns, usnic, qib, qedr, cxgb4, hns, bnxt_re - Various rtrs fixes and updates - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA wasn't working right - Use tracepoints instead of pr_debug in the CM code - Scrub the locking in ucma and cma to close more syzkaller bugs - Use tasklet_setup in the subsystem - Revert the idea that 'destroy' operations are not allowed to fail at the driver level. This proved unworkable from a HW perspective. - Revise how the umem API works so drivers make fewer mistakes using it - XRC support for qedr - Convert uverbs objects RWQ and MW to new the allocation scheme - Large queue entry sizes for hns - Use hmm_range_fault() for mlx5 On Demand Paging - uverbs APIs to inspect the GID table instead of sysfs - Move some of the RDMA code for building large page SGLs into lib/scatterlist" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits) RDMA/ucma: Fix use after free in destroy id flow RDMA/rxe: Handle skb_clone() failure in rxe_recv.c RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI RDMA: Explicitly pass in the dma_device to ib_register_device lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values IB/mlx4: Convert rej_tmout radix-tree to XArray RDMA/rxe: Fix bug rejecting all multicast packets RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt() RDMA/rxe: Remove duplicate entries in struct rxe_mr IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS IB/rdmavt: Fix sizeof mismatch MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl. RDMA/bnxt_re: Use rdma_umem_for_each_dma_block() RDMA/umem: Move to allocate SG table from pages lib/scatterlist: Add support in dynamic allocation of SG table from pages tools/testing/scatterlist: Show errors in human readable form tools/testing/scatterlist: Rejuvenate bit-rotten test RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces RDMA/uverbs: Expose the new GID query API to user space ...
2020-10-17Merge tag 'i3c/for-5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux Pull i3c updates from Boris Brezillon: - Fix DAA for the pre-reserved address case - Fix an error path in the cadence driver * tag 'i3c/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux: i3c: master: Fix error return in cdns_i3c_master_probe() i3c: master: fix for SETDASA and DAA process i3c: master add i3c_master_attach_boardinfo to preserve boardinfo
2020-10-17Merge tag 'mtd/for-5.10' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull MTD updates from Richard Weinberger: "NAND core changes: - Drop useless 'depends on' in Kconfig - Add an extra level in the Kconfig hierarchy - Trivial spellings - Dynamic allocation of the interface configurations - Dropping the default ONFI timing mode - Various cleanup (types, structures, naming, comments) - Hide the chip->data_interface indirection - Add the generic rb-gpios property - Add the ->choose_interface_config() hook - Introduce nand_choose_best_sdr_timings() - Use default values for tPROG_max and tBERS_max - Avoid redefining tR_max and tCCS_min - Add a helper to find the closest ONFI mode - bcm63xx MTD parsers: simplify CFE detection Raw NAND controller drivers changes: - fsl-upm: Deprecation of specific DT properties - fsl_upm: Driver rework and cleanup in favor of ->exec_op() - Ingenic: Cleanup ARRAY_SIZE() vs sizeof() use - brcmnand: ECC error handling on EDU transfers - brcmnand: Don't default to EDU transfers - qcom: Set BAM mode only if not set already - qcom: Avoid write to unavailable register - gpio: Driver rework in favor of ->exec_op() - tango: ->exec_op() conversion - mtk: ->exec_op() conversion Raw NAND chip drivers changes: - toshiba: Implement ->choose_interface_config() for TH58NVG2S3HBAI4 - toshiba: Implement ->choose_interface_config() for TC58NVG0S3E - toshiba: Implement ->choose_interface_config() for TC58TEG5DCLTA00 - hynix: Implement ->choose_interface_config() for H27UCG8T2ATR-BC HyperBus changes: - DMA support for TI's AM654 HyperBus controller driver. - HyperBus frontend driver for Renesas RPC-IF driver. SPI NOR core changes: - Support for Winbond w25q64jwm flash - Enable 4K sector support for mx25l12805d SPI NOR controller drivers changes: - intel-spi Add Alder Lake-S PCI ID MTD Core changes: - mtdoops: Don't run panic write twice - mtdconcat: Correctly handle panic write - Use DEFINE_SHOW_ATTRIBUTE" * tag 'mtd/for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: (76 commits) mtd: hyperbus: Fix build failure when only RPCIF_HYPERBUS is enabled mtd: hyperbus: add Renesas RPC-IF driver Revert "mtd: spi-nor: Prefer asynchronous probe" mtd: parsers: bcm63xx: Do not make it modular mtd: spear_smi: Enable compile testing mtd: maps: vmu-flash: fix typos for struct memcard mtd: physmap: Add Baikal-T1 physically mapped ROM support mtd: maps: vmu-flash: simplify the return expression of probe_maple_vmu mtd: onenand: simplify the return expression of onenand_transfer_auto_oob mtd: rawnand: cadence: remove a redundant dev_err call mtd: rawnand: ams-delta: Fix non-OF build warning mtd: rawnand: Don't overwrite the error code from nand_set_ecc_soft_ops() mtd: rawnand: Introduce nand_set_ecc_on_host_ops() mtd: rawnand: atmel: Check return values for nand_read_data_op mtd: rawnand: vf610: Remove unused function vf610_nfc_transfer_size() mtd: rawnand: qcom: Simplify with dev_err_probe() mtd: rawnand: marvell: Fix and update kerneldoc mtd: rawnand: marvell: Simplify with dev_err_probe() mtd: rawnand: gpmi: Simplify with dev_err_probe() mtd: rawnand: atmel: Simplify with dev_err_probe() ...
2020-10-17Merge tag 'thermal-v5.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux Pull thermal updates from Daniel Lezcano: - Fix Kconfig typo "acces" -> "access" (Colin Ian King) - Use dev_error_probe() to simplify the error handling on imx and imx8 platforms (Anson Huang) - Use dedicated kobj_to_dev() instead of container_of() in the sysfs core code (Tian Tao) - Fix coding style by adding braces to a one line conditional statement on rcar (Geert Uytterhoeven) - Add DT binding documentation for the r8a774e1 platform and update the Kconfig description supporting RZ/G2 SoCs (Lad Prabhakar) - Simplify the return expression of stm_thermal_prepare on the stm32 platform (Qinglang Miao) - Fix the unit in the function documentation for the idle injection cooling device (Zhuguang Qing) - Remove an unecessary mutex_init() in the core code (Qinglang Miao) - Add support for keep alive events in the core code and the specific int340x (Srinivas Pandruvada) - Remove unused thermal zone variable in devfreq and cpufreq cooling devices (Zhuguang Qing) - Add the A100's THS controller support (Yangtao Li) - Add power management on the omap3's bandgap sensor (Adam Ford) - Fix a missing nlmsg_free in the netlink core error path (Jing Xiangfeng) * tag 'thermal-v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: thermal: core: Adding missing nlmsg_free() in thermal_genl_sampling_temp() thermal: ti-soc-thermal: Enable addition power management thermal: sun8i: Add A100's THS controller support thermal: sun8i: add TEMP_CALIB_MASK for calibration data in sun50i_h6_ths_calibrate dt-bindings: thermal: sun8i: Add binding for A100's THS controller thermal: cooling: Remove unused variable *tz thermal: int340x: Add keep alive response method thermal: core: Add new event for sending keep alive notifications thermal: int340x: Provide notification for OEM variable change thermal: core: remove unnecessary mutex_init() thermal/idle_inject: Fix comment of idle_duration_us and name of latency_ns thermal: Kconfig: Update description for RCAR_GEN3_THERMAL config thermal: stm32: simplify the return expression of stm_thermal_prepare() dt-bindings: thermal: rcar-gen3-thermal: Add r8a774e1 support thermal: rcar_thermal: Add missing braces to conditional statement thermal: Use kobj_to_dev() instead of container_of() thermal: imx8mm: Use dev_err_probe() to simplify error handling thermal: imx: Use dev_err_probe() to simplify error handling drivers: thermal: Kconfig: fix spelling mistake "acces" -> "access"
2020-10-17io_uring: fix double poll mask initPavel Begunkov
__io_queue_proc() is used by both, poll reqs and apoll. Don't use req->poll.events to copy poll mask because for apoll it aliases with private data of the request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io-wq: inherit audit loginuid and sessionidJens Axboe
Make sure the async io-wq workers inherit the loginuid and sessionid from the original task, and restore them to unset once we're done with the async work item. While at it, disable the ability for kernel threads to write to their own loginuid. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17io_uring: use percpu counters to track inflight requestsJens Axboe
Even though we place the req_issued and req_complete in separate cachelines, there's considerable overhead in doing the atomics particularly on the completion side. Get rid of having the two counters, and just use a percpu_counter for this. That's what it was made for, after all. This considerably reduces the overhead in __io_free_req(). Signed-off-by: Jens Axboe <axboe@kernel.dk>