summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-06-05vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr()Cong Wang
After commit 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of") CONFIG_BPF_JIT does not depend on CONFIG_MODULES any more and bpf jit also uses the [MODULES_VADDR, MODULES_END] memory region. But is_vmalloc_or_module_addr() still checks CONFIG_MODULES, which then returns false for a bpf jit memory region when CONFIG_MODULES is not defined. It leads to the following kernel BUG: [ 1.567023] ------------[ cut here ]------------ [ 1.567883] kernel BUG at mm/vmalloc.c:745! [ 1.568477] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 1.569367] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.9.0+ #448 [ 1.570247] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 [ 1.570786] RIP: 0010:vmalloc_to_page+0x48/0x1ec [ 1.570786] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84 [ 1.570786] RSP: 0018:ffff888007787960 EFLAGS: 00010212 [ 1.570786] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c [ 1.570786] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8 [ 1.570786] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000 [ 1.570786] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8 [ 1.570786] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640 [ 1.570786] FS: 0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000 [ 1.570786] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.570786] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0 [ 1.570786] Call Trace: [ 1.570786] <TASK> [ 1.570786] ? __die_body+0x1b/0x58 [ 1.570786] ? die+0x31/0x4b [ 1.570786] ? do_trap+0x9d/0x138 [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] ? do_error_trap+0xcd/0x102 [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] ? handle_invalid_op+0x2f/0x38 [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] ? exc_invalid_op+0x2b/0x41 [ 1.570786] ? asm_exc_invalid_op+0x16/0x20 [ 1.570786] ? vmalloc_to_page+0x26/0x1ec [ 1.570786] ? vmalloc_to_page+0x48/0x1ec [ 1.570786] __text_poke+0xb6/0x458 [ 1.570786] ? __pfx_text_poke_memcpy+0x10/0x10 [ 1.570786] ? __pfx___mutex_lock+0x10/0x10 [ 1.570786] ? __pfx___text_poke+0x10/0x10 [ 1.570786] ? __pfx_get_random_u32+0x10/0x10 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] text_poke_copy_locked+0x70/0x84 [ 1.570786] text_poke_copy+0x32/0x4f [ 1.570786] bpf_arch_text_copy+0xf/0x27 [ 1.570786] bpf_jit_binary_pack_finalize+0x26/0x5a [ 1.570786] bpf_int_jit_compile+0x576/0x8ad [ 1.570786] ? __pfx_bpf_int_jit_compile+0x10/0x10 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] ? __kmalloc_node_track_caller+0x2b5/0x2e0 [ 1.570786] bpf_prog_select_runtime+0x7c/0x199 [ 1.570786] bpf_prepare_filter+0x1e9/0x25b [ 1.570786] ? __pfx_bpf_prepare_filter+0x10/0x10 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] ? _find_next_bit+0x29/0x7e [ 1.570786] bpf_prog_create+0xb8/0xe0 [ 1.570786] ptp_classifier_init+0x75/0xa1 [ 1.570786] ? __pfx_ptp_classifier_init+0x10/0x10 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] ? register_pernet_subsys+0x36/0x42 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] sock_init+0x99/0xa3 [ 1.570786] ? __pfx_sock_init+0x10/0x10 [ 1.570786] do_one_initcall+0x104/0x2c4 [ 1.570786] ? __pfx_do_one_initcall+0x10/0x10 [ 1.570786] ? parameq+0x25/0x2d [ 1.570786] ? rcu_is_watching+0x1c/0x3c [ 1.570786] ? trace_kmalloc+0x81/0xb2 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] ? __kmalloc+0x29c/0x2c7 [ 1.570786] ? srso_return_thunk+0x5/0x5f [ 1.570786] do_initcalls+0xf9/0x123 [ 1.570786] kernel_init_freeable+0x24f/0x289 [ 1.570786] ? __pfx_kernel_init+0x10/0x10 [ 1.570786] kernel_init+0x19/0x13a [ 1.570786] ret_from_fork+0x24/0x41 [ 1.570786] ? __pfx_kernel_init+0x10/0x10 [ 1.570786] ret_from_fork_asm+0x1a/0x30 [ 1.570786] </TASK> [ 1.570819] ---[ end trace 0000000000000000 ]--- [ 1.571463] RIP: 0010:vmalloc_to_page+0x48/0x1ec [ 1.572111] Code: 0f 00 00 e8 eb 1a 05 00 b8 37 00 00 00 48 ba fe ff ff ff ff 1f 00 00 4c 03 25 76 49 c6 02 48 c1 e0 28 48 01 e8 48 39 d0 76 02 <0f> 0b 4c 89 e7 e8 bf 1a 05 00 49 8b 04 24 48 a9 9f ff ff ff 0f 84 [ 1.574632] RSP: 0018:ffff888007787960 EFLAGS: 00010212 [ 1.575129] RAX: 000036ffa0000000 RBX: 0000000000000640 RCX: ffffffff8147e93c [ 1.576097] RDX: 00001ffffffffffe RSI: dffffc0000000000 RDI: ffffffff840e32c8 [ 1.577084] RBP: ffffffffa0000000 R08: 0000000000000000 R09: 0000000000000000 [ 1.578077] R10: ffff888007787a88 R11: ffffffff8475d8e7 R12: ffffffff83e80ff8 [ 1.578810] R13: 0000000000000640 R14: 0000000000000640 R15: 0000000000000640 [ 1.579823] FS: 0000000000000000(0000) GS:ffff88806cc00000(0000) knlGS:0000000000000000 [ 1.580992] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.581869] CR2: ffff888006a01000 CR3: 0000000003e80000 CR4: 0000000000350ef0 [ 1.582800] Kernel panic - not syncing: Fatal exception [ 1.583765] ---[ end Kernel panic - not syncing: Fatal exception ]--- Fix this by checking CONFIG_EXECMEM instead. Link: https://lkml.kernel.org/r/20240528160838.102223-1-xiyou.wangcong@gmail.com Fixes: 2c9e5d4a0082 ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of") Signed-off-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm: page_alloc: fix highatomic typing in multi-block buddiesJohannes Weiner
Christoph reports a page allocator splat triggered by xfstests: generic/176 214s ... [ 1204.507931] run fstests generic/176 at 2024-05-27 12:52:30 XFS (nvme0n1): Mounting V5 Filesystem cd936307-415f-48a3-b99d-a2d52ae1f273 XFS (nvme0n1): Ending clean mount XFS (nvme1n1): Mounting V5 Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850 XFS (nvme1n1): Ending clean mount XFS (nvme1n1): Unmounting Filesystem ab3ee1a4-af62-4934-9a6a-6c2fde321850 XFS (nvme1n1): Mounting V5 Filesystem 7099b02d-9c58-4d1d-be1d-2cc472d12cd9 XFS (nvme1n1): Ending clean mount ------------[ cut here ]------------ page type is 3, passed migratetype is 1 (nr=512) WARNING: CPU: 0 PID: 509870 at mm/page_alloc.c:645 expand+0x1c5/0x1f0 Modules linked in: i2c_i801 crc32_pclmul i2c_smbus [last unloaded: scsi_debug] CPU: 0 PID: 509870 Comm: xfs_io Not tainted 6.10.0-rc1+ #2437 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 RIP: 0010:expand+0x1c5/0x1f0 Code: 05 16 70 bf 02 01 e8 ca fc ff ff 8b 54 24 34 44 89 e1 48 c7 c7 80 a2 28 83 48 89 c6 b8 01 00 3 RSP: 0018:ffffc90003b2b968 EFLAGS: 00010082 RAX: 0000000000000000 RBX: ffffffff83fa9480 RCX: 0000000000000000 RDX: 0000000000000005 RSI: 0000000000000027 RDI: 00000000ffffffff RBP: 00000000001f2600 R08: 00000000fffeffff R09: 0000000000000001 R10: 0000000000000000 R11: ffffffff83676200 R12: 0000000000000009 R13: 0000000000000200 R14: 0000000000000001 R15: ffffea0007c98000 FS: 00007f72ca3d5780(0000) GS:ffff8881f9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f72ca1fff38 CR3: 00000001aa0c6002 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff07f0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x7b/0x120 ? expand+0x1c5/0x1f0 ? report_bug+0x191/0x1c0 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? expand+0x1c5/0x1f0 ? expand+0x1c5/0x1f0 __rmqueue_pcplist+0x3a9/0x730 get_page_from_freelist+0x7a0/0xf00 __alloc_pages_noprof+0x153/0x2e0 __folio_alloc_noprof+0x10/0xa0 __filemap_get_folio+0x16b/0x370 iomap_write_begin+0x496/0x680 While trying to service a movable allocation (page type 1), the page allocator runs into a two-pageblock buddy on the movable freelist whose second block is typed as highatomic (page type 3). This inconsistency is caused by the highatomic reservation system operating on single pageblocks, while MAX_ORDER can be bigger than that - in this configuration, pageblock_order is 9 while MAX_PAGE_ORDER is 10. The test case is observed to make several adjacent order-3 requests with __GFP_DIRECT_RECLAIM cleared, which marks the surrounding block as highatomic. Upon freeing, the blocks merge into an order-10 buddy. When the highatomic pool is drained later on, this order-10 buddy gets moved back to the movable list, but only the first pageblock is marked movable again. A subsequent expand() of this buddy warns about the tail being of a different type. This is a long-standing bug that's surfaced by the recent block type warnings added to the allocator. The consequences seem mostly benign, it just results in odd behavior: the highatomic tail blocks are not properly drained, instead they end up on the movable list first, then go back to the highatomic list after an alloc-free cycle. To fix this, make the highatomic reservation code aware that allocations/buddies can be larger than a pageblock. While it's an old quirk, the recently added type consistency warnings seem to be the most prominent consequence of it. Set the Fixes: tag accordingly to highlight this backporting dependency. Link: https://lkml.kernel.org/r/20240530114203.GA1222079@cmpxchg.org Fixes: e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Tested-by: Christoph Hellwig <hch@lst.de> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05nilfs2: fix potential kernel bug due to lack of writeback flag waitingRyusuke Konishi
Destructive writes to a block device on which nilfs2 is mounted can cause a kernel bug in the folio/page writeback start routine or writeback end routine (__folio_start_writeback in the log below): kernel BUG at mm/page-writeback.c:3070! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI ... RIP: 0010:__folio_start_writeback+0xbaa/0x10e0 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f> 0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00 ... Call Trace: <TASK> nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2] nilfs_segctor_construct+0x181/0x6b0 [nilfs2] nilfs_segctor_thread+0x548/0x11c0 [nilfs2] kthread+0x2f0/0x390 ret_from_fork+0x4b/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> This is because when the log writer starts a writeback for segment summary blocks or a super root block that use the backing device's page cache, it does not wait for the ongoing folio/page writeback, resulting in an inconsistent writeback state. Fix this issue by waiting for ongoing writebacks when putting folios/pages on the backing device into writeback state. Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com Fixes: 9ff05123e3bf ("nilfs2: segment constructor") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05memcg: remove the lockdep assert from __mod_objcg_mlstate()Sebastian Andrzej Siewior
The assert was introduced in the commit cited below as an insurance that the semantic is the same after the local_irq_save() has been removed and the function has been made static. The original requirement to disable interrupt was due the modification of per-CPU counters which require interrupts to be disabled because the counter update operation is not atomic and some of the counters are updated from interrupt context. All callers of __mod_objcg_mlstate() acquire a lock (memcg_stock.stock_lock) which disables interrupts on !PREEMPT_RT and the lockdep assert is satisfied. On PREEMPT_RT the interrupts are not disabled and the assert triggers. The safety of the counter update is already ensured by VM_WARN_ON_IRQS_ENABLED() which is part of __mod_memcg_lruvec_state() and does not require yet another check. Remove the lockdep assert from __mod_objcg_mlstate(). Link: https://lkml.kernel.org/r/20240528141341.rz_rytN_@linutronix.de Fixes: 91882c1617c1 ("memcg: simple cleanup of stats update functions") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptesBarry Song
We are passing a huge nr to __clear_young_dirty_ptes() right now. While we should pass the number of pages, we are actually passing CONT_PTE_SIZE. This is causing lots of crashes of MADV_FREE, panic oops could vary everytime. Link: https://lkml.kernel.org/r/20240524005444.135417-1-21cnbao@gmail.com Fixes: 89e86854fb0a ("mm/arm64: override clear_young_dirty_ptes() batch helper") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Lance Yang <ioworker0@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Barry Song <21cnbao@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Jeff Xie <xiehuan09@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=nBarry Song
if CONFIG_SYSFS is not enabled in config, we get the below error, All errors (new ones prefixed by >>): s390-linux-ld: mm/memory.o: in function `count_mthp_stat': >> include/linux/huge_mm.h:285:(.text+0x191c): undefined reference to `mthp_stats' s390-linux-ld: mm/huge_memory.o:(.rodata+0x10): undefined reference to `mthp_stats' vim +285 include/linux/huge_mm.h 279 280 static inline void count_mthp_stat(int order, enum mthp_stat_item item) 281 { 282 if (order <= 0 || order > PMD_ORDER) 283 return; 284 > 285 this_cpu_inc(mthp_stats.stats[order][item]); 286 } 287 Link: https://lkml.kernel.org/r/20240523210045.40444-1-21cnbao@gmail.com Fixes: ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202405231728.tCAogiSI-lkp@intel.com/ Signed-off-by: Barry Song <v-songbaohua@oppo.com> Tested-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05mm: drop the 'anon_' prefix for swap-out mTHP countersBaolin Wang
The mTHP swap related counters: 'anon_swpout' and 'anon_swpout_fallback' are confusing with an 'anon_' prefix, since the shmem can swap out non-anonymous pages. So drop the 'anon_' prefix to keep consistent with the old swap counter names. This is needed in 6.10-rcX to avoid having an inconsistent ABI out in the field. Link: https://lkml.kernel.org/r/7a8989c13299920d7589007a30065c3e2c19f0e0.1716431702.git.baolin.wang@linux.alibaba.com Fixes: d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters") Fixes: 42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI") Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Suggested-by: "Huang, Ying" <ying.huang@intel.com> Acked-by: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05drm/vmwgfx: Don't destroy Screen Target when CRTC is enabled but inactiveIan Forbes
drm_crtc_helper_funcs::atomic_disable can be called even when the CRTC is still enabled. This can occur when the mode changes or the CRTC is set as inactive. In the case where the CRTC is being set as inactive we only want to blank the screen. The Screen Target should remain intact as long as the mode has not changed and CRTC is enabled. This fixes a bug with GDM where locking the screen results in a permanent black screen because the Screen Target is no longer defined. Fixes: 7b0062036c3b ("drm/vmwgfx: Implement virtual crc generation") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Reviewed-by: Martin Krastev <martin.krastev@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240531203358.26677-1-ian.forbes@broadcom.com
2024-06-05drm/vmwgfx: Standardize use of kibibytes when loggingIan Forbes
Use the same standard abbreviation KiB instead of incorrect variants. Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-5-ian.forbes@broadcom.com
2024-06-05drm/vmwgfx: Remove STDU logic from generic mode_valid functionIan Forbes
STDU has its own mode_valid function now so this logic can be removed from the generic version. Fixes: 935f795045a6 ("drm/vmwgfx: Refactor drm connector probing for display modes") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-4-ian.forbes@broadcom.com
2024-06-05drm/vmwgfx: 3D disabled should not effect STDU memory limitsIan Forbes
This limit became a hard cap starting with the change referenced below. Surface creation on the device will fail if the requested size is larger than this limit so altering the value arbitrarily will expose modes that are too large for the device's hard limits. Fixes: 7ebb47c9f9ab ("drm/vmwgfx: Read new register for GB memory when available") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-3-ian.forbes@broadcom.com
2024-06-05drm/vmwgfx: Filter modes which exceed graphics memoryIan Forbes
SVGA requires individual surfaces to fit within graphics memory (max_mob_pages) which means that modes with a final buffer size that would exceed graphics memory must be pruned otherwise creation will fail. Additionally llvmpipe requires its buffer height and width to be a multiple of its tile size which is 64. As a result we have to anticipate that llvmpipe will round up the mode size passed to it by the compositor when it creates buffers and filter modes where this rounding exceeds graphics memory. This fixes an issue where VMs with low graphics memory (< 64MiB) configured with high resolution mode boot to a black screen because surface creation fails. Fixes: d947d1b71deb ("drm/vmwgfx: Add and connect connector helper function") Signed-off-by: Ian Forbes <ian.forbes@broadcom.com> Signed-off-by: Zack Rusin <zack.rusin@broadcom.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-2-ian.forbes@broadcom.com
2024-06-05Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-06-05 We've added 8 non-merge commits during the last 6 day(s) which contain a total of 9 files changed, 34 insertions(+), 35 deletions(-). The main changes are: 1) Fix a potential use-after-free in bpf_link_free when the link uses dealloc_deferred to free the link object but later still tests for presence of link->ops->dealloc, from Cong Wang. 2) Fix BPF test infra to set the run context for rawtp test_run callback where syzbot reported a crash, from Jiri Olsa. 3) Fix bpf_session_cookie BTF_ID in the special_kfunc_set list to exclude it for the case of !CONFIG_FPROBE, also from Jiri Olsa. 4) Fix a Coverity static analysis report to not close() a link_fd of -1 in the multi-uprobe feature detector, from Andrii Nakryiko. 5) Revert support for redirect to any xsk socket bound to the same umem as it can result in corrupted ring state which can lead to a crash when flushing rings. A different approach will be pursued for bpf-next to address it safely, from Magnus Karlsson. 6) Fix inet_csk_accept prototype in test_sk_storage_tracing.c which caused BPF CI failure after the last tree fast forwarding, from Andrii Nakryiko. 7) Fix a coccicheck warning in BPF devmap that iterator variable cannot be NULL, from Thorsten Blum. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: Revert "xsk: Document ability to redirect to any socket bound to the same umem" Revert "xsk: Support redirect to any socket bound to the same umem" bpf: Set run context for rawtp test_run callback bpf: Fix a potential use-after-free in bpf_link_free() bpf, devmap: Remove unnecessary if check in for loop libbpf: don't close(-1) in multi-uprobe feature detector bpf: Fix bpf_session_cookie BTF_ID in special_kfunc_set list selftests/bpf: fix inet_csk_accept prototype in test_sk_storage_tracing.c ==================== Link: https://lore.kernel.org/r/20240605091525.22628-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-06Merge tag 'drm-xe-fixes-2024-06-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes Driver Changes: - drm/xe/pf: Update the LMTT when freeing VF GT config Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zl8uFrQp0YjTtX4p@fedora
2024-06-05ptp: Fix error message on failed pin verificationKarol Kolacinski
On failed verification of PTP clock pin, error message prints channel number instead of pin index after "pin", which is incorrect. Fix error message by adding channel number to the message and printing pin number instead of channel number. Fixes: 6092315dfdec ("ptp: introduce programmable pins.") Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Link: https://lore.kernel.org/r/20240604120555.16643-1-karol.kolacinski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAPEric Dumazet
If one TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided, taprio_parse_mqprio_opt() must validate it, or userspace can inject arbitrary data to the kernel, the second time taprio_change() is called. First call (with valid attributes) sets dev->num_tc to a non zero value. Second call (with arbitrary mqprio attributes) returns early from taprio_parse_mqprio_opt() and bad things can happen. Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05Merge tag 'thermal-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "Fix issues related to the handling of invalid trip points in the thermal core and in the thermal debug code that have been overlooked by some recent thermal control core changes" * tag 'thermal-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: trip: Trigger trip down notifications when trips involved in mitigation become invalid thermal: core: Introduce thermal_trip_crossed() thermal/debugfs: Allow tze_seq_show() to print statistics for invalid trips thermal/debugfs: Print initial trip temperature and hysteresis in tze_seq_show()
2024-06-05Merge tag 'acpi-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "These fix the ACPI EC and AC drivers, the ACPI APEI error injection driver and build issues related to the dev_is_pnp() macro referring to pnp_bus_type that is not exported to modules. Specifics: - Fix error handling during EC operation region accesses in the ACPI EC driver (Armin Wolf) - Fix a memory leak in the APEI error injection driver introduced during its converion to a platform driver (Dan Williams) - Fix build failures related to the dev_is_pnp() macro by redefining it as a proper function and exporting it to modules as appropriate and unexport pnp_bus_type which need not be exported any more (Andy Shevchenko) - Update the ACPI AC driver to use power_supply_changed() to let the power supply core handle configuration changes properly (Thomas Weißschuh)" * tag 'acpi-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: AC: Properly notify powermanagement core about changes PNP: Hide pnp_bus_type from the non-PNP code PNP: Make dev_is_pnp() to be a function and export it for modules ACPI: EC: Avoid returning AE_OK on errors in address space handler ACPI: EC: Abort address space access upon error ACPI: APEI: EINJ: Fix einj_dev release leak
2024-06-05Merge tag 'pm-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix the intel_pstate and amd-pstate cpufreq drivers and the cpupower utility. Specifics: - Fix a recently introduced unchecked HWP MSR access in the intel_pstate driver (Srinivas Pandruvada) - Add missing conversion from MHz to KHz to amd_pstate_set_boost() to address sysfs inteface inconsistency and fix P-state frequency reporting on AMD Family 1Ah CPUs in the cpupower utility (Dhananjay Ugwekar) - Get rid of an excess global header file used by the amd-pstate cpufreq driver (Arnd Bergmann)" * tag 'pm-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: Fix unchecked HWP MSR access cpufreq: amd-pstate: Fix the inconsistency in max frequency units cpufreq: amd-pstate: remove global header file tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs
2024-06-05net/mlx5: Fix tainted pointer delete is case of flow rules creation failAleksandr Mishin
In case of flow rule creation fail in mlx5_lag_create_port_sel_table(), instead of previously created rules, the tainted pointer is deleted deveral times. Fix this bug by using correct flow rules pointers. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 352899f384d4 ("net/mlx5: Lag, use buckets in hash mode") Signed-off-by: Aleksandr Mishin <amishin@t-argos.ru> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240604100552.25201-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-05x86/amd_nb: Check for invalid SMN readsYazen Ghannam
AMD Zen-based systems use a System Management Network (SMN) that provides access to implementation-specific registers. SMN accesses are done indirectly through an index/data pair in PCI config space. The PCI config access may fail and return an error code. This would prevent the "read" value from being updated. However, the PCI config access may succeed, but the return value may be invalid. This is in similar fashion to PCI bad reads, i.e. return all bits set. Most systems will return 0 for SMN addresses that are not accessible. This is in line with AMD convention that unavailable registers are Read-as-Zero/Writes-Ignored. However, some systems will return a "PCI Error Response" instead. This value, along with an error code of 0 from the PCI config access, will confuse callers of the amd_smn_read() function. Check for this condition, clear the return value, and set a proper error code. Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230403164244.471141-1-yazen.ghannam@amd.com
2024-06-05drm/xe: Don't overmap identity VRAM mappingMatthew Brost
Overmapping the identity VRAM mapping is triggering hardware bugs on certain platforms. Use 2M pages for the last unaligned (to 1G) VRAM chunk. v2: - Always use 2M pages for last chunk (Fei Yang) - break loop when 2M pages are used - Add assert for usable_size being 2M aligned v3: - Fix checkpatch Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Fei Yang <fei.yang@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Fei Yang <fei.yang@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240603181824.1927675-1-matthew.brost@intel.com
2024-06-05drm/xe: flush engine buffers before signalling user fence on all enginesAndrzej Hajda
Tests show that user fence signalling requires kind of write barrier, otherwise not all writes performed by the workload will be available to userspace. It is already done for render and compute, we need it also for the rest: video, gsc, copy. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240605-fix_user_fence_posted-v3-2-06e7932f784a@intel.com
2024-06-05Revert "drm/xe: flush gtt before signalling user fence on all engines"Andrzej Hajda
This reverts commit 38007fa96419a9db9719f170b9e8a7877821cdd1. Signaling user-fence after seqno write does not seem to be good solution. Instead of changing order separate barrier should be put before user-fence, this will be done in separate patch. v2: added fixes tag in case reverted patch gets backported to stable Fixes: 38007fa96419 ("drm/xe: flush gtt before signalling user fence on all engines") Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240605-fix_user_fence_posted-v3-1-06e7932f784a@intel.com
2024-06-05Merge tag 'for-6.10-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A fix for fast fsync that needs to handle errors during writes after some COW failure so it does not lead to an inconsistent state" * tag 'for-6.10-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: ensure fast fsync waits for ordered extents after a write failure
2024-06-05Merge tag 'bcachefs-2024-06-05' of https://evilpiepirate.org/git/bcachefsLinus Torvalds
Pull bcachefs fixes from Kent Overstreet: "Just a few small fixes" * tag 'bcachefs-2024-06-05' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix trans->locked assert bcachefs: Rereplicate now moves data off of durability=0 devices bcachefs: Fix GFP_KERNEL allocation in break_cycle()
2024-06-05Merge tag 'nvme-6.10-2024-06-05' of git://git.infradead.org/nvme into block-6.10Jens Axboe
Pull NVMe fixes from Keith: "nvme fixes Linux 6.10 - Use reserved tags for special fabrics operations (Chunguang) - Persistent Reservation status masking fix (Weiwen)" * tag 'nvme-6.10-2024-06-05' of git://git.infradead.org/nvme: nvme: fix nvme_pr_* status code parsing nvme-fabrics: use reserved tag for reg read/write command
2024-06-05null_blk: fix validation of block sizeAndreas Hindborg
Block size should be between 512 and PAGE_SIZE and be a power of 2. The current check does not validate this, so update the check. Without this patch, null_blk would Oops due to a null pointer deref when loaded with bs=1536 [1]. Link: https://lore.kernel.org/all/87wmn8mocd.fsf@metaspace.dk/ Signed-off-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240603192645.977968-1-nmi@metaspace.dk [axboe: remove unnecessary braces and != 0 check] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-06-05drm/amdgpu/pptable: Fix UBSAN array-index-out-of-boundsTasos Sahanidis
Flexible arrays used [1] instead of []. Replace the former with the latter to resolve multiple UBSAN warnings observed on boot with a BONAIRE card. In addition, use the __counted_by attribute where possible to hint the length of the arrays to the compiler and any sanitizers. Signed-off-by: Tasos Sahanidis <tasos@tasossah.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amd: Fix shutdown (again) on some SMU v13.0.4/11 platformsMario Limonciello
commit cd94d1b182d2 ("dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users") attempted to fix shutdown issues that were reported since commit 31729e8c21ec ("drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11") but caused issues for some people. Adjust the workaround flow to properly only apply in the S4 case: -> For shutdown go through SMU_MSG_PrepareMp1ForUnload -> For S4 go through SMU_MSG_GfxDeviceDriverReset and SMU_MSG_PrepareMp1ForUnload Reported-and-tested-by: lectrode <electrodexsnet@gmail.com> Closes: https://github.com/void-linux/void-packages/issues/50417 Cc: stable@vger.kernel.org Fixes: cd94d1b182d2 ("dm/amd/pm: Fix problems with reboot/shutdown for some SMU 13.0.4/13.0.11 users") Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05Merge tag 'i2c-for-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "This should have been my second pull request during the merge window but one dependency in the drm subsystem fell through the cracks and was only applied for rc2. Now we can finally remove I2C_CLASS_SPD" * tag 'i2c-for-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: Remove I2C_CLASS_SPD i2c: synquacer: Remove a clk reference from struct synquacer_i2c
2024-06-05Merge tag 'tpmdd-next-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fixes from Jarkko Sakkinen: "The bug fix for tpm_tis_core_init() is not that critical but still makes sense to get into release for the sake of better quality. I included the Intel CPU model define change mainly to help Tony just a bit, as for this subsystem it cannot realistically speaking cause any possible harm" * tag 'tpmdd-next-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: Switch to new Intel CPU model defines tpm_tis: Do *not* flush uninitialized work
2024-06-05drm/xe/vm: Simplify if conditionThorsten Blum
The if condition !A || A && B can be simplified to !A || B. Fixes the following Coccinelle/coccicheck warning reported by excluded_middle.cocci: WARNING !A || A && B is equivalent to !A || B Compile-tested only. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240603180005.191578-1-thorsten.blum@toblux.com
2024-06-05btrfs: fix leak of qgroup extent records after transaction abortFilipe Manana
Qgroup extent records are created when delayed ref heads are created and then released after accounting extents at btrfs_qgroup_account_extents(), called during the transaction commit path. If a transaction is aborted we free the qgroup records by calling btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(), unless we don't have delayed references. We are incorrectly assuming that no delayed references means we don't have qgroup extents records. We can currently have no delayed references because we ran them all during a transaction commit and the transaction was aborted after that due to some error in the commit path. So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs() even if we don't have any delayed references. Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/ Fixes: 81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort") CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05btrfs: fix crash on racing fsync and size-extending write into preallocOmar Sandoval
We have been seeing crashes on duplicate keys in btrfs_set_item_key_safe(): BTRFS critical (device vdb): slot 4 key (450 108 8192) new key (450 108 8192) ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:2620! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 3139 Comm: xfs_io Kdump: loaded Not tainted 6.9.0 #6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:btrfs_set_item_key_safe+0x11f/0x290 [btrfs] With the following stack trace: #0 btrfs_set_item_key_safe (fs/btrfs/ctree.c:2620:4) #1 btrfs_drop_extents (fs/btrfs/file.c:411:4) #2 log_one_extent (fs/btrfs/tree-log.c:4732:9) #3 btrfs_log_changed_extents (fs/btrfs/tree-log.c:4955:9) #4 btrfs_log_inode (fs/btrfs/tree-log.c:6626:9) #5 btrfs_log_inode_parent (fs/btrfs/tree-log.c:7070:8) #6 btrfs_log_dentry_safe (fs/btrfs/tree-log.c:7171:8) #7 btrfs_sync_file (fs/btrfs/file.c:1933:8) #8 vfs_fsync_range (fs/sync.c:188:9) #9 vfs_fsync (fs/sync.c:202:9) #10 do_fsync (fs/sync.c:212:9) #11 __do_sys_fdatasync (fs/sync.c:225:9) #12 __se_sys_fdatasync (fs/sync.c:223:1) #13 __x64_sys_fdatasync (fs/sync.c:223:1) #14 do_syscall_x64 (arch/x86/entry/common.c:52:14) #15 do_syscall_64 (arch/x86/entry/common.c:83:7) #16 entry_SYSCALL_64+0xaf/0x14c (arch/x86/entry/entry_64.S:121) So we're logging a changed extent from fsync, which is splitting an extent in the log tree. But this split part already exists in the tree, triggering the BUG(). This is the state of the log tree at the time of the crash, dumped with drgn (https://github.com/osandov/drgn/blob/main/contrib/btrfs_tree.py) to get more details than btrfs_print_leaf() gives us: >>> print_extent_buffer(prog.crashed_thread().stack_trace()[0]["eb"]) leaf 33439744 level 0 items 72 generation 9 owner 18446744073709551610 leaf 33439744 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da item 0 key (450 INODE_ITEM 0) itemoff 16123 itemsize 160 generation 7 transid 9 size 8192 nbytes 8473563889606862198 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 204 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417704.983333333 (2024-05-22 15:41:44) mtime 1716417704.983333333 (2024-05-22 15:41:44) otime 17592186044416.000000000 (559444-03-08 01:40:16) item 1 key (450 INODE_REF 256) itemoff 16110 itemsize 13 index 195 namelen 3 name: 193 item 2 key (450 XATTR_ITEM 1640047104) itemoff 16073 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 3 key (450 EXTENT_DATA 0) itemoff 16020 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 4096 ram 12288 extent compression 0 (none) item 4 key (450 EXTENT_DATA 4096) itemoff 15967 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 4096 nr 8192 item 5 key (450 EXTENT_DATA 8192) itemoff 15914 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 ... So the real problem happened earlier: notice that items 4 (4k-12k) and 5 (8k-12k) overlap. Both are prealloc extents. Item 4 straddles i_size and item 5 starts at i_size. Here is the state of the filesystem tree at the time of the crash: >>> root = prog.crashed_thread().stack_trace()[2]["inode"].root >>> ret, nodes, slots = btrfs_search_slot(root, BtrfsKey(450, 0, 0)) >>> print_extent_buffer(nodes[0]) leaf 30425088 level 0 items 184 generation 9 owner 5 leaf 30425088 flags 0x100000000000000 fs uuid e5bd3946-400c-4223-8923-190ef1f18677 chunk uuid d58cb17e-6d02-494a-829a-18b7d8a399da ... item 179 key (450 INODE_ITEM 0) itemoff 4907 itemsize 160 generation 7 transid 7 size 4096 nbytes 12288 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 sequence 6 flags 0x10(PREALLOC) atime 1716417703.220000000 (2024-05-22 15:41:43) ctime 1716417703.220000000 (2024-05-22 15:41:43) mtime 1716417703.220000000 (2024-05-22 15:41:43) otime 1716417703.220000000 (2024-05-22 15:41:43) item 180 key (450 INODE_REF 256) itemoff 4894 itemsize 13 index 195 namelen 3 name: 193 item 181 key (450 XATTR_ITEM 1640047104) itemoff 4857 itemsize 37 location key (0 UNKNOWN.0 0) type XATTR transid 7 data_len 1 name_len 6 name: user.a data a item 182 key (450 EXTENT_DATA 0) itemoff 4804 itemsize 53 generation 9 type 1 (regular) extent data disk byte 303144960 nr 12288 extent data offset 0 nr 8192 ram 12288 extent compression 0 (none) item 183 key (450 EXTENT_DATA 8192) itemoff 4751 itemsize 53 generation 9 type 2 (prealloc) prealloc data disk byte 303144960 nr 12288 prealloc data offset 8192 nr 4096 Item 5 in the log tree corresponds to item 183 in the filesystem tree, but nothing matches item 4. Furthermore, item 183 is the last item in the leaf. btrfs_log_prealloc_extents() is responsible for logging prealloc extents beyond i_size. It first truncates any previously logged prealloc extents that start beyond i_size. Then, it walks the filesystem tree and copies the prealloc extent items to the log tree. If it hits the end of a leaf, then it calls btrfs_next_leaf(), which unlocks the tree and does another search. However, while the filesystem tree is unlocked, an ordered extent completion may modify the tree. In particular, it may insert an extent item that overlaps with an extent item that was already copied to the log tree. This may manifest in several ways depending on the exact scenario, including an EEXIST error that is silently translated to a full sync, overlapping items in the log tree, or this crash. This particular crash is triggered by the following sequence of events: - Initially, the file has i_size=4k, a regular extent from 0-4k, and a prealloc extent beyond i_size from 4k-12k. The prealloc extent item is the last item in its B-tree leaf. - The file is fsync'd, which copies its inode item and both extent items to the log tree. - An xattr is set on the file, which sets the BTRFS_INODE_COPY_EVERYTHING flag. - The range 4k-8k in the file is written using direct I/O. i_size is extended to 8k, but the ordered extent is still in flight. - The file is fsync'd. Since BTRFS_INODE_COPY_EVERYTHING is set, this calls copy_inode_items_to_log(), which calls btrfs_log_prealloc_extents(). - btrfs_log_prealloc_extents() finds the 4k-12k prealloc extent in the filesystem tree. Since it starts before i_size, it skips it. Since it is the last item in its B-tree leaf, it calls btrfs_next_leaf(). - btrfs_next_leaf() unlocks the path. - The ordered extent completion runs, which converts the 4k-8k part of the prealloc extent to written and inserts the remaining prealloc part from 8k-12k. - btrfs_next_leaf() does a search and finds the new prealloc extent 8k-12k. - btrfs_log_prealloc_extents() copies the 8k-12k prealloc extent into the log tree. Note that it overlaps with the 4k-12k prealloc extent that was copied to the log tree by the first fsync. - fsync calls btrfs_log_changed_extents(), which tries to log the 4k-8k extent that was written. - This tries to drop the range 4k-8k in the log tree, which requires adjusting the start of the 4k-12k prealloc extent in the log tree to 8k. - btrfs_set_item_key_safe() sees that there is already an extent starting at 8k in the log tree and calls BUG(). Fix this by detecting when we're about to insert an overlapping file extent item in the log tree and truncating the part that would overlap. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-06-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "This is dominated by a couple large series for ARM and x86 respectively, but apart from that things are calm. ARM: - Large set of FP/SVE fixes for pKVM, addressing the fallout from the per-CPU data rework and making sure that the host is not involved in the FP/SVE switching any more - Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH is completely supported - Fix for the respective priorities of Failed PAC, Illegal Execution state and Instruction Abort exceptions - Fix the handling of AArch32 instruction traps failing their condition code, which was broken by the introduction of ESR_EL2.ISS2 - Allow vcpus running in AArch32 state to be restored in System mode - Fix AArch32 GPR restore that would lose the 64 bit state under some conditions RISC-V: - No need to use mask when hart-index-bits is 0 - Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext() x86: - Fixes and debugging help for the #VE sanity check. Also disable it by default, even for CONFIG_DEBUG_KERNEL, because it was found to trigger spuriously (most likely a processor erratum as the exact symptoms vary by generation). - Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled situation (GIF=0 or interrupt shadow) when the processor supports virtual NMI. While generally KVM will not request an NMI window when virtual NMIs are supported, in this case it *does* have to single-step over the interrupt shadow or enable the STGI intercept, in order to deliver the latched second NMI. - Drop support for hand tuning APIC timer advancement from userspace. Since we have adaptive tuning, and it has proved to work well, drop the module parameter for manual configuration and with it a few stupid bugs that it had" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (32 commits) KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attr KVM: arm64: Ensure that SME controls are disabled in protected mode KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx format KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVM KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM KVM: arm64: Specialize handling of host fpsimd state on trap KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helper KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_state KVM: arm64: Reintroduce __sve_save_state KVM: x86: Drop support for hand tuning APIC timer advancement from userspace KVM: SEV-ES: Delegate LBR virtualization to the processor KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent KVM: SEV-ES: Prevent MSR access post VMSA encryption RISC-V: KVM: Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext function RISC-V: KVM: No need to use mask when hart-index-bit is 0 KVM: arm64: nv: Expose BTI and CSV_frac to a guest hypervisor KVM: arm64: nv: Fix relative priorities of exceptions generated by ERETAx KVM: arm64: AArch32: Fix spurious trapping of conditional instructions KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode ...
2024-06-05drm/amdgpu: add RAS is_rma flagTao Zhou
Set the flag to true if bad page number reaches threshold. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amd/display: Add null checks for 'stream' and 'plane' before dereferencingSrinivasan Shanmugam
This commit adds null checks for the 'stream' and 'plane' variables in the dcn30_apply_idle_power_optimizations function. These variables were previously assumed to be null at line 922, but they were used later in the code without checking if they were null. This could potentially lead to a null pointer dereference, which would cause a crash. The null checks ensure that 'stream' and 'plane' are not null before they are used, preventing potential crashes. Fixes the below static smatch checker: drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn30/dcn30_hwseq.c:938 dcn30_apply_idle_power_optimizations() error: we previously assumed 'stream' could be null (see line 922) drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn30/dcn30_hwseq.c:940 dcn30_apply_idle_power_optimizations() error: we previously assumed 'plane' could be null (see line 922) Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Hersen Wu <hersenxs.wu@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amd/pm: remove dead code in si_convert_power_level_to_smcJesse Zhang
Since gmc_pg is false, setting mcFlags with SISLANDS_SMC_MC_PG_EN cannot be reach. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amd/display: Prevent IPX From Link Detect and Set ModeFangzhi Zuo
IPX involvment proven to affect LT, causing link loss. Need to prevent IPX enabled in LT process in which link detect and set mode are main procedures that have LT taken place. Reviewed-by: Roman Li <roman.li@amd.com> Acked-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amdkfd: remove logically dead codeJesse Zhang
idr_for_each_entry can ensure that mem is not empty during the loop. So don't need check mem again. Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amdgpu: fix failure mapping legacy queue when FLRLin.Cao
Flag "mes.ring.shced.ready" will be set as true after mes hw init and set as false when mes hw fini to avoid duplicate initialization. But hw fini will not be called when function level reset, which will cause mes hw init be skipped during FLR, which will leads to mapping legacy queue fail. Set this flag as false when post reset will fix this issue. Signed-off-by: Lin.Cao <lincao12@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amd/display: Fetch Mall caps from DCDaniel Sa
[Why] When performing P-State switching with Subvp on 8k (downscaled to 4k). corruption can be seen on the screen. MALL data was not being fetched from DC, and the system things there is more MALL space then what is actually available. [How] Read MALL size from dc caps. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Acked-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Signed-off-by: Daniel Sa <daniel.sa@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amd/display: fix YUV video color corruption in DCN401Samson Tam
[Why] Missing check causes sequence error which results in chroma filter coefficients not being updated in certain modes when we display YUV video in fullscreen. This results in color corruption in video [How] Add back chroma_coef_mode check in dscl_set_scl_filter so that filter coefficients are calculated and updated when we have YUV surface Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Signed-off-by: Samson Tam <samson.tam@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amdkfd: add reset cause in gpu pre-reset smi eventEric Huang
reset cause is requested by customer as additional info for gpu reset smi event. v2: integerate reset sources suggested by Lijo Lazar Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amdgpu: Update soc24_enum.h and soc21_enum.hFrank Min
Update to latest changes. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amdgpu: Set PTE_IS_PTE bit for gfx12Frank Min
Set PTE_IS_PTE bit while PRT is enabled on gfx12. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amd/display: Updated optc401_set_drr to use dcn401 functionsRelja Vojvodic
why: optc_401_set_drr was using an old optc3 function to update vtotal min and max, causing crashes when disabling FAMS2 how: Updated dcn401 to point to opt401 function for vtotal updates. This version of the function has FAMS2 logic that allows for FAMS2 to be disabled. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Acked-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Signed-off-by: Relja Vojvodic <relja.vojvodic@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amdgpu: add reset sources in gpu reset contextEric Huang
reset source or reset cause is very useful info for reset context, it will be used by events API. Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-05drm/amd/display: Add UCLK p-state support message to dcn401Dillon Varone
[WHY&HOW] Improves on the SMU interface to explicitly declare P-State support. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>