summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-01-23btrfs: avoid starting new transaction when cleaning qgroup during subvolume dropFilipe Manana
At btrfs_qgroup_cleanup_dropped_subvolume() all we want to commit the current transaction in order to have all the qgroup rfer/excl numbers up to date. However we are using btrfs_start_transaction(), which joins the current transaction if there is one that is not yet committing, but also starts a new one if there is none or if the current one is already committing (its state is >= TRANS_STATE_COMMIT_START). This later case results in unnecessary IO, wasting time and a pointless rotation of the backup roots in the super block. So instead of using btrfs_start_transaction() followed by a btrfs_commit_transaction(), use btrfs_commit_current_transaction() which achieves our purpose and avoids starting and committing new transactions. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-23btrfs: fix use-after-free when attempting to join an aborted transactionFilipe Manana
When we are trying to join the current transaction and if it's aborted, we read its 'aborted' field after unlocking fs_info->trans_lock and without holding any extra reference count on it. This means that a concurrent task that is aborting the transaction may free the transaction before we read its 'aborted' field, leading to a use-after-free. Fix this by reading the 'aborted' field while holding fs_info->trans_lock since any freeing task must first acquire that lock and set fs_info->running_transaction to NULL before freeing the transaction. This was reported by syzbot and Dmitry with the following stack traces from KASAN: ================================================================== BUG: KASAN: slab-use-after-free in join_transaction+0xd9b/0xda0 fs/btrfs/transaction.c:278 Read of size 4 at addr ffff888011839024 by task kworker/u4:9/1128 CPU: 0 UID: 0 PID: 1128 Comm: kworker/u4:9 Not tainted 6.13.0-rc7-syzkaller-00019-gc45323b7560e #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Workqueue: events_unbound btrfs_async_reclaim_data_space Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0x169/0x550 mm/kasan/report.c:489 kasan_report+0x143/0x180 mm/kasan/report.c:602 join_transaction+0xd9b/0xda0 fs/btrfs/transaction.c:278 start_transaction+0xaf8/0x1670 fs/btrfs/transaction.c:697 flush_space+0x448/0xcf0 fs/btrfs/space-info.c:803 btrfs_async_reclaim_data_space+0x159/0x510 fs/btrfs/space-info.c:1321 process_one_work kernel/workqueue.c:3236 [inline] process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317 worker_thread+0x870/0xd30 kernel/workqueue.c:3398 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Allocated by task 5315: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:377 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:394 kasan_kmalloc include/linux/kasan.h:260 [inline] __kmalloc_cache_noprof+0x243/0x390 mm/slub.c:4329 kmalloc_noprof include/linux/slab.h:901 [inline] join_transaction+0x144/0xda0 fs/btrfs/transaction.c:308 start_transaction+0xaf8/0x1670 fs/btrfs/transaction.c:697 btrfs_create_common+0x1b2/0x2e0 fs/btrfs/inode.c:6572 lookup_open fs/namei.c:3649 [inline] open_last_lookups fs/namei.c:3748 [inline] path_openat+0x1c03/0x3590 fs/namei.c:3984 do_filp_open+0x27f/0x4e0 fs/namei.c:4014 do_sys_openat2+0x13e/0x1d0 fs/open.c:1402 do_sys_open fs/open.c:1417 [inline] __do_sys_creat fs/open.c:1495 [inline] __se_sys_creat fs/open.c:1489 [inline] __x64_sys_creat+0x123/0x170 fs/open.c:1489 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 5336: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:582 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:233 [inline] slab_free_hook mm/slub.c:2353 [inline] slab_free mm/slub.c:4613 [inline] kfree+0x196/0x430 mm/slub.c:4761 cleanup_transaction fs/btrfs/transaction.c:2063 [inline] btrfs_commit_transaction+0x2c97/0x3720 fs/btrfs/transaction.c:2598 insert_balance_item+0x1284/0x20b0 fs/btrfs/volumes.c:3757 btrfs_balance+0x992/0x10c0 fs/btrfs/volumes.c:4633 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The buggy address belongs to the object at ffff888011839000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 36 bytes inside of freed 2048-byte region [ffff888011839000, ffff888011839800) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11838 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff) page_type: f5(slab) raw: 00fff00000000040 ffff88801ac42000 ffffea0000493400 dead000000000002 raw: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000 head: 00fff00000000040 ffff88801ac42000 ffffea0000493400 dead000000000002 head: 0000000000000000 0000000000080008 00000001f5000000 0000000000000000 head: 00fff00000000003 ffffea0000460e01 ffffffffffffffff 0000000000000000 head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 57, tgid 57 (kworker/0:2), ts 67248182943, free_ts 67229742023 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1558 prep_new_page mm/page_alloc.c:1566 [inline] get_page_from_freelist+0x365c/0x37a0 mm/page_alloc.c:3476 __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4753 alloc_pages_mpol_noprof+0x3e1/0x780 mm/mempolicy.c:2269 alloc_slab_page+0x6a/0x110 mm/slub.c:2423 allocate_slab+0x5a/0x2b0 mm/slub.c:2589 new_slab mm/slub.c:2642 [inline] ___slab_alloc+0xc27/0x14a0 mm/slub.c:3830 __slab_alloc+0x58/0xa0 mm/slub.c:3920 __slab_alloc_node mm/slub.c:3995 [inline] slab_alloc_node mm/slub.c:4156 [inline] __do_kmalloc_node mm/slub.c:4297 [inline] __kmalloc_node_track_caller_noprof+0x2e9/0x4c0 mm/slub.c:4317 kmalloc_reserve+0x111/0x2a0 net/core/skbuff.c:609 __alloc_skb+0x1f3/0x440 net/core/skbuff.c:678 alloc_skb include/linux/skbuff.h:1323 [inline] alloc_skb_with_frags+0xc3/0x820 net/core/skbuff.c:6612 sock_alloc_send_pskb+0x91a/0xa60 net/core/sock.c:2884 sock_alloc_send_skb include/net/sock.h:1803 [inline] mld_newpack+0x1c3/0xaf0 net/ipv6/mcast.c:1747 add_grhead net/ipv6/mcast.c:1850 [inline] add_grec+0x1492/0x19a0 net/ipv6/mcast.c:1988 mld_send_cr net/ipv6/mcast.c:2114 [inline] mld_ifc_work+0x691/0xd90 net/ipv6/mcast.c:2651 page last free pid 5300 tgid 5300 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1127 [inline] free_unref_page+0xd3f/0x1010 mm/page_alloc.c:2659 __slab_free+0x2c2/0x380 mm/slub.c:4524 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329 kasan_slab_alloc include/linux/kasan.h:250 [inline] slab_post_alloc_hook mm/slub.c:4119 [inline] slab_alloc_node mm/slub.c:4168 [inline] __do_kmalloc_node mm/slub.c:4297 [inline] __kmalloc_noprof+0x236/0x4c0 mm/slub.c:4310 kmalloc_noprof include/linux/slab.h:905 [inline] kzalloc_noprof include/linux/slab.h:1037 [inline] fib_create_info+0xc14/0x25b0 net/ipv4/fib_semantics.c:1435 fib_table_insert+0x1f6/0x1f20 net/ipv4/fib_trie.c:1231 fib_magic+0x3d8/0x620 net/ipv4/fib_frontend.c:1112 fib_add_ifaddr+0x40c/0x5e0 net/ipv4/fib_frontend.c:1156 fib_netdev_event+0x375/0x490 net/ipv4/fib_frontend.c:1494 notifier_call_chain+0x1a5/0x3f0 kernel/notifier.c:85 __dev_notify_flags+0x207/0x400 dev_change_flags+0xf0/0x1a0 net/core/dev.c:9045 do_setlink+0xc90/0x4210 net/core/rtnetlink.c:3109 rtnl_changelink net/core/rtnetlink.c:3723 [inline] __rtnl_newlink net/core/rtnetlink.c:3875 [inline] rtnl_newlink+0x1bb6/0x2210 net/core/rtnetlink.c:4012 Memory state around the buggy address: ffff888011838f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888011838f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888011839000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888011839080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888011839100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Reported-by: syzbot+45212e9d87a98c3f5b42@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/678e7da5.050a0220.303755.007c.GAE@google.com/ Reported-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/linux-btrfs/CACT4Y+ZFBdo7pT8L2AzM=vegZwjp-wNkVJZQf0Ta3vZqtExaSw@mail.gmail.com/ Fixes: 871383be592b ("btrfs: add missing unlocks to transaction abort paths") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-23btrfs: do not output error message if a qgroup has been already cleaned upQu Wenruo
[BUG] There is a bug report that btrfs outputs the following error message: BTRFS info (device nvme0n1p2): qgroup scan completed (inconsistency flag cleared) BTRFS warning (device nvme0n1p2): failed to cleanup qgroup 0/1179: -2 [CAUSE] The error itself is pretty harmless, and the end user should ignore it. When a subvolume is fully dropped, btrfs will call btrfs_qgroup_cleanup_dropped_subvolume() to delete the qgroup. However if a qgroup rescan happened before a subvolume fully dropped, qgroup for that subvolume will not be re-created, as rescan will only create new qgroup if there is a BTRFS_ROOT_REF_KEY found. But before we drop a subvolume, the subvolume is unlinked thus there is no BTRFS_ROOT_REF_KEY. In that case, btrfs_remove_qgroup() will fail with -ENOENT and trigger the above error message. [FIX] Just ignore -ENOENT error from btrfs_remove_qgroup() inside btrfs_qgroup_cleanup_dropped_subvolume(). Reported-by: John Shand <jshand2013@gmail.com> Link: https://bugzilla.suse.com/show_bug.cgi?id=1236056 Fixes: 839d6ea4f86d ("btrfs: automatically remove the subvolume qgroup") Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-23btrfs: fix assertion failure when splitting ordered extent after transaction ↵Filipe Manana
abort If while we are doing a direct IO write a transaction abort happens, we mark all existing ordered extents with the BTRFS_ORDERED_IOERR flag (done at btrfs_destroy_ordered_extents()), and then after that if we enter btrfs_split_ordered_extent() and the ordered extent has bytes left (meaning we have a bio that doesn't cover the whole ordered extent, see details at btrfs_extract_ordered_extent()), we will fail on the following assertion at btrfs_split_ordered_extent(): ASSERT(!(flags & ~BTRFS_ORDERED_TYPE_FLAGS)); because the BTRFS_ORDERED_IOERR flag is set and the definition of BTRFS_ORDERED_TYPE_FLAGS is just the union of all flags that identify the type of write (regular, nocow, prealloc, compressed, direct IO, encoded). Fix this by returning an error from btrfs_extract_ordered_extent() if we find the BTRFS_ORDERED_IOERR flag in the ordered extent. The error will be the error that resulted in the transaction abort or -EIO if no transaction abort happened. This was recently reported by syzbot with the following trace: FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 1 CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 fail_dump lib/fault-inject.c:53 [inline] should_fail_ex+0x3b0/0x4e0 lib/fault-inject.c:154 should_failslab+0xac/0x100 mm/failslab.c:46 slab_pre_alloc_hook mm/slub.c:4072 [inline] slab_alloc_node mm/slub.c:4148 [inline] __do_kmalloc_node mm/slub.c:4297 [inline] __kmalloc_noprof+0xdd/0x4c0 mm/slub.c:4310 kmalloc_noprof include/linux/slab.h:905 [inline] kzalloc_noprof include/linux/slab.h:1037 [inline] btrfs_chunk_alloc_add_chunk_item+0x244/0x1100 fs/btrfs/volumes.c:5742 reserve_chunk_space+0x1ca/0x2c0 fs/btrfs/block-group.c:4292 check_system_chunk fs/btrfs/block-group.c:4319 [inline] do_chunk_alloc fs/btrfs/block-group.c:3891 [inline] btrfs_chunk_alloc+0x77b/0xf80 fs/btrfs/block-group.c:4187 find_free_extent_update_loop fs/btrfs/extent-tree.c:4166 [inline] find_free_extent+0x42d1/0x5810 fs/btrfs/extent-tree.c:4579 btrfs_reserve_extent+0x422/0x810 fs/btrfs/extent-tree.c:4672 btrfs_new_extent_direct fs/btrfs/direct-io.c:186 [inline] btrfs_get_blocks_direct_write+0x706/0xfa0 fs/btrfs/direct-io.c:321 btrfs_dio_iomap_begin+0xbb7/0x1180 fs/btrfs/direct-io.c:525 iomap_iter+0x697/0xf60 fs/iomap/iter.c:90 __iomap_dio_rw+0xeb9/0x25b0 fs/iomap/direct-io.c:702 btrfs_dio_write fs/btrfs/direct-io.c:775 [inline] btrfs_direct_write+0x610/0xa30 fs/btrfs/direct-io.c:880 btrfs_do_write_iter+0x2a0/0x760 fs/btrfs/file.c:1397 do_iter_readv_writev+0x600/0x880 vfs_writev+0x376/0xba0 fs/read_write.c:1050 do_pwritev fs/read_write.c:1146 [inline] __do_sys_pwritev2 fs/read_write.c:1204 [inline] __se_sys_pwritev2+0x196/0x2b0 fs/read_write.c:1195 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f1281f85d29 RSP: 002b:00007f12819fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000148 RAX: ffffffffffffffda RBX: 00007f1282176080 RCX: 00007f1281f85d29 RDX: 0000000000000001 RSI: 0000000020000240 RDI: 0000000000000005 RBP: 00007f12819fe090 R08: 0000000000000000 R09: 0000000000000003 R10: 0000000000007000 R11: 0000000000000246 R12: 0000000000000002 R13: 0000000000000000 R14: 00007f1282176080 R15: 00007ffcb9e23328 </TASK> BTRFS error (device loop0 state A): Transaction aborted (error -12) BTRFS: error (device loop0 state A) in btrfs_chunk_alloc_add_chunk_item:5745: errno=-12 Out of memory BTRFS info (device loop0 state EA): forced readonly assertion failed: !(flags & ~BTRFS_ORDERED_TYPE_FLAGS), in fs/btrfs/ordered-data.c:1234 ------------[ cut here ]------------ kernel BUG at fs/btrfs/ordered-data.c:1234! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI CPU: 0 UID: 0 PID: 5321 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 RIP: 0010:btrfs_split_ordered_extent+0xd8d/0xe20 fs/btrfs/ordered-data.c:1234 RSP: 0018:ffffc9000d1df2b8 EFLAGS: 00010246 RAX: 0000000000000057 RBX: 000000000006a000 RCX: 9ce21886c4195300 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: 0000000000000091 R08: ffffffff817f0a3c R09: 1ffff92001a3bdf4 R10: dffffc0000000000 R11: fffff52001a3bdf5 R12: 1ffff1100a45f401 R13: ffff8880522fa018 R14: dffffc0000000000 R15: 000000000006a000 FS: 00007f12819fe6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000557750bd7da8 CR3: 00000000400ea000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> btrfs_extract_ordered_extent fs/btrfs/direct-io.c:702 [inline] btrfs_dio_submit_io+0x4be/0x6d0 fs/btrfs/direct-io.c:737 iomap_dio_submit_bio fs/iomap/direct-io.c:85 [inline] iomap_dio_bio_iter+0x1022/0x1740 fs/iomap/direct-io.c:447 __iomap_dio_rw+0x13b7/0x25b0 fs/iomap/direct-io.c:703 btrfs_dio_write fs/btrfs/direct-io.c:775 [inline] btrfs_direct_write+0x610/0xa30 fs/btrfs/direct-io.c:880 btrfs_do_write_iter+0x2a0/0x760 fs/btrfs/file.c:1397 do_iter_readv_writev+0x600/0x880 vfs_writev+0x376/0xba0 fs/read_write.c:1050 do_pwritev fs/read_write.c:1146 [inline] __do_sys_pwritev2 fs/read_write.c:1204 [inline] __se_sys_pwritev2+0x196/0x2b0 fs/read_write.c:1195 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f1281f85d29 RSP: 002b:00007f12819fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000148 RAX: ffffffffffffffda RBX: 00007f1282176080 RCX: 00007f1281f85d29 RDX: 0000000000000001 RSI: 0000000020000240 RDI: 0000000000000005 RBP: 00007f12819fe090 R08: 0000000000000000 R09: 0000000000000003 R10: 0000000000007000 R11: 0000000000000246 R12: 0000000000000002 R13: 0000000000000000 R14: 00007f1282176080 R15: 00007ffcb9e23328 </TASK> Modules linked in: ---[ end trace 0000000000000000 ]--- RIP: 0010:btrfs_split_ordered_extent+0xd8d/0xe20 fs/btrfs/ordered-data.c:1234 RSP: 0018:ffffc9000d1df2b8 EFLAGS: 00010246 RAX: 0000000000000057 RBX: 000000000006a000 RCX: 9ce21886c4195300 RDX: 0000000000000000 RSI: 0000000080000000 RDI: 0000000000000000 RBP: 0000000000000091 R08: ffffffff817f0a3c R09: 1ffff92001a3bdf4 R10: dffffc0000000000 R11: fffff52001a3bdf5 R12: 1ffff1100a45f401 R13: ffff8880522fa018 R14: dffffc0000000000 R15: 000000000006a000 FS: 00007f12819fe6c0(0000) GS:ffff88801fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000557750bd7da8 CR3: 00000000400ea000 CR4: 0000000000352ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 In this case the transaction abort was due to (an injected) memory allocation failure when attempting to allocate a new chunk. Reported-by: syzbot+f60d8337a5c8e8d92a77@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/6777f2dd.050a0220.178762.0045.GAE@google.com/ Fixes: 52b1fdca23ac ("btrfs: handle completed ordered extents in btrfs_split_ordered_extent") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-23btrfs: fix lockdep splat while merging a relocation rootFilipe Manana
When COWing a relocation tree path, at relocation.c:replace_path(), we can trigger a lockdep splat while we are in the btrfs_search_slot() call against the relocation root. This happens in that callchain at ctree.c:read_block_for_search() when we happen to find a child extent buffer already loaded through the fs tree with a lockdep class set to the fs tree. So when we attempt to lock that extent buffer through a relocation tree we have to reset the lockdep class to the class for a relocation tree, since a relocation tree has extent buffers that used to belong to a fs tree and may currently be already loaded (we swap extent buffers between the two trees at the end of replace_path()). However we are missing calls to btrfs_maybe_reset_lockdep_class() to reset the lockdep class at ctree.c:read_block_for_search() before we read lock an extent buffer, just like we did for btrfs_search_slot() in commit b40130b23ca4 ("btrfs: fix lockdep splat with reloc root extent buffers"). So add the missing btrfs_maybe_reset_lockdep_class() calls before the attempts to read lock an extent buffer at ctree.c:read_block_for_search(). The lockdep splat was reported by syzbot and it looks like this: ====================================================== WARNING: possible circular locking dependency detected 6.13.0-rc5-syzkaller-00163-gab75170520d4 #0 Not tainted ------------------------------------------------------ syz.0.0/5335 is trying to acquire lock: ffff8880545dbc38 (btrfs-tree-01){++++}-{4:4}, at: btrfs_tree_read_lock_nested+0x2f/0x250 fs/btrfs/locking.c:146 but task is already holding lock: ffff8880545dba58 (btrfs-treloc-02/1){+.+.}-{4:4}, at: btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (btrfs-treloc-02/1){+.+.}-{4:4}: reacquire_held_locks+0x3eb/0x690 kernel/locking/lockdep.c:5374 __lock_release kernel/locking/lockdep.c:5563 [inline] lock_release+0x396/0xa30 kernel/locking/lockdep.c:5870 up_write+0x79/0x590 kernel/locking/rwsem.c:1629 btrfs_force_cow_block+0x14b3/0x1fd0 fs/btrfs/ctree.c:660 btrfs_cow_block+0x371/0x830 fs/btrfs/ctree.c:755 btrfs_search_slot+0xc01/0x3180 fs/btrfs/ctree.c:2153 replace_path+0x1243/0x2740 fs/btrfs/relocation.c:1224 merge_reloc_root+0xc46/0x1ad0 fs/btrfs/relocation.c:1692 merge_reloc_roots+0x3b3/0x980 fs/btrfs/relocation.c:1942 relocate_block_group+0xb0a/0xd40 fs/btrfs/relocation.c:3754 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4087 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3494 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4278 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4655 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #1 (btrfs-tree-01/1){+.+.}-{4:4}: lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849 down_write_nested+0xa2/0x220 kernel/locking/rwsem.c:1693 btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189 btrfs_init_new_buffer fs/btrfs/extent-tree.c:5052 [inline] btrfs_alloc_tree_block+0x41c/0x1440 fs/btrfs/extent-tree.c:5132 btrfs_force_cow_block+0x526/0x1fd0 fs/btrfs/ctree.c:573 btrfs_cow_block+0x371/0x830 fs/btrfs/ctree.c:755 btrfs_search_slot+0xc01/0x3180 fs/btrfs/ctree.c:2153 btrfs_insert_empty_items+0x9c/0x1a0 fs/btrfs/ctree.c:4351 btrfs_insert_empty_item fs/btrfs/ctree.h:688 [inline] btrfs_insert_inode_ref+0x2bb/0xf80 fs/btrfs/inode-item.c:330 btrfs_rename_exchange fs/btrfs/inode.c:7990 [inline] btrfs_rename2+0xcb7/0x2b90 fs/btrfs/inode.c:8374 vfs_rename+0xbdb/0xf00 fs/namei.c:5067 do_renameat2+0xd94/0x13f0 fs/namei.c:5224 __do_sys_renameat2 fs/namei.c:5258 [inline] __se_sys_renameat2 fs/namei.c:5255 [inline] __x64_sys_renameat2+0xce/0xe0 fs/namei.c:5255 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f -> #0 (btrfs-tree-01){++++}-{4:4}: check_prev_add kernel/locking/lockdep.c:3161 [inline] check_prevs_add kernel/locking/lockdep.c:3280 [inline] validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904 __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849 down_read_nested+0xb5/0xa50 kernel/locking/rwsem.c:1649 btrfs_tree_read_lock_nested+0x2f/0x250 fs/btrfs/locking.c:146 btrfs_tree_read_lock fs/btrfs/locking.h:188 [inline] read_block_for_search+0x718/0xbb0 fs/btrfs/ctree.c:1610 btrfs_search_slot+0x1274/0x3180 fs/btrfs/ctree.c:2237 replace_path+0x1243/0x2740 fs/btrfs/relocation.c:1224 merge_reloc_root+0xc46/0x1ad0 fs/btrfs/relocation.c:1692 merge_reloc_roots+0x3b3/0x980 fs/btrfs/relocation.c:1942 relocate_block_group+0xb0a/0xd40 fs/btrfs/relocation.c:3754 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4087 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3494 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4278 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4655 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f other info that might help us debug this: Chain exists of: btrfs-tree-01 --> btrfs-tree-01/1 --> btrfs-treloc-02/1 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(btrfs-treloc-02/1); lock(btrfs-tree-01/1); lock(btrfs-treloc-02/1); rlock(btrfs-tree-01); *** DEADLOCK *** 8 locks held by syz.0.0/5335: #0: ffff88801e3ae420 (sb_writers#13){.+.+}-{0:0}, at: mnt_want_write_file+0x5e/0x200 fs/namespace.c:559 #1: ffff888052c760d0 (&fs_info->reclaim_bgs_lock){+.+.}-{4:4}, at: __btrfs_balance+0x4c2/0x26b0 fs/btrfs/volumes.c:4183 #2: ffff888052c74850 (&fs_info->cleaner_mutex){+.+.}-{4:4}, at: btrfs_relocate_block_group+0x775/0xd90 fs/btrfs/relocation.c:4086 #3: ffff88801e3ae610 (sb_internal#2){.+.+}-{0:0}, at: merge_reloc_root+0xf11/0x1ad0 fs/btrfs/relocation.c:1659 #4: ffff888052c76470 (btrfs_trans_num_writers){++++}-{0:0}, at: join_transaction+0x405/0xda0 fs/btrfs/transaction.c:288 #5: ffff888052c76498 (btrfs_trans_num_extwriters){++++}-{0:0}, at: join_transaction+0x405/0xda0 fs/btrfs/transaction.c:288 #6: ffff8880545db878 (btrfs-tree-01/1){+.+.}-{4:4}, at: btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189 #7: ffff8880545dba58 (btrfs-treloc-02/1){+.+.}-{4:4}, at: btrfs_tree_lock_nested+0x2f/0x250 fs/btrfs/locking.c:189 stack backtrace: CPU: 0 UID: 0 PID: 5335 Comm: syz.0.0 Not tainted 6.13.0-rc5-syzkaller-00163-gab75170520d4 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_circular_bug+0x13a/0x1b0 kernel/locking/lockdep.c:2074 check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2206 check_prev_add kernel/locking/lockdep.c:3161 [inline] check_prevs_add kernel/locking/lockdep.c:3280 [inline] validate_chain+0x18ef/0x5920 kernel/locking/lockdep.c:3904 __lock_acquire+0x1397/0x2100 kernel/locking/lockdep.c:5226 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849 down_read_nested+0xb5/0xa50 kernel/locking/rwsem.c:1649 btrfs_tree_read_lock_nested+0x2f/0x250 fs/btrfs/locking.c:146 btrfs_tree_read_lock fs/btrfs/locking.h:188 [inline] read_block_for_search+0x718/0xbb0 fs/btrfs/ctree.c:1610 btrfs_search_slot+0x1274/0x3180 fs/btrfs/ctree.c:2237 replace_path+0x1243/0x2740 fs/btrfs/relocation.c:1224 merge_reloc_root+0xc46/0x1ad0 fs/btrfs/relocation.c:1692 merge_reloc_roots+0x3b3/0x980 fs/btrfs/relocation.c:1942 relocate_block_group+0xb0a/0xd40 fs/btrfs/relocation.c:3754 btrfs_relocate_block_group+0x77d/0xd90 fs/btrfs/relocation.c:4087 btrfs_relocate_chunk+0x12c/0x3b0 fs/btrfs/volumes.c:3494 __btrfs_balance+0x1b0f/0x26b0 fs/btrfs/volumes.c:4278 btrfs_balance+0xbdc/0x10c0 fs/btrfs/volumes.c:4655 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3670 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:906 [inline] __se_sys_ioctl+0xf5/0x170 fs/ioctl.c:892 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f1ac6985d29 Code: ff ff c3 (...) RSP: 002b:00007f1ac63fe038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f1ac6b76160 RCX: 00007f1ac6985d29 RDX: 0000000020000180 RSI: 00000000c4009420 RDI: 0000000000000007 RBP: 00007f1ac6a01b08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000001 R14: 00007f1ac6b76160 R15: 00007fffda145a88 </TASK> Reported-by: syzbot+63913e558c084f7f8fdc@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/677b3014.050a0220.3b53b0.0064.GAE@google.com/ Fixes: 99785998ed1c ("btrfs: reduce lock contention when eb cache miss for btree search") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-01-23Merge tag 'fs_for_v6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull isofs update from Jan Kara: "Partial conversion of isofs to folios" * tag 'fs_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: isofs: Partially convert zisofs_read_folio to use a folio
2025-01-23dt-bindings: ufs: Correct indentation and style in DTS exampleKrzysztof Kozlowski
DTS example in the bindings should be indented with 2- or 4-spaces and aligned with opening '- |', so correct any differences like 3-spaces or mixtures 2- and 4-spaces in one binding. No functional changes here, but saves some comments during reviews of new patches built on existing code. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> # renesas Link: https://lore.kernel.org/r/20250107131019.246517-1-krzysztof.kozlowski@linaro.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-01-23Merge tag 'fsnotify_for_v6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull inotify update from Jan Kara: "A small inotify strcpy() cleanup" * tag 'fsnotify_for_v6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify: Use strscpy() for event->name copies
2025-01-23Merge tag 'xfs-merge-6.14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull XFS updates from Carlos Maiolino: "This is mostly focused on the implementation of reflink and reverse-mapping support for XFS's real-time devices. It also includes several bugfixes. - Implement reflink support for the realtime device - Implement reverse-mapping support for the realtime device - Several bug fixes and cleanups" * tag 'xfs-merge-6.14' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (121 commits) xfs: fix buffer lookup vs release race xfs: check for dead buffers in xfs_buf_find_insert xfs: add a b_iodone callback to struct xfs_buf xfs: move b_li_list based retry handling to common code xfs: simplify xfsaild_resubmit_item xfs: always complete the buffer inline in xfs_buf_submit xfs: remove the extra buffer reference in xfs_buf_submit xfs: move invalidate_kernel_vmap_range to xfs_buf_ioend xfs: simplify buffer I/O submission xfs: move in-memory buftarg handling out of _xfs_buf_ioapply xfs: move write verification out of _xfs_buf_ioapply xfs: remove xfs_buf_delwri_submit_buffers xfs: simplify xfs_buf_delwri_pushbuf xfs: move xfs_buf_iowait out of (__)xfs_buf_submit xfs: remove the incorrect comment about the b_pag field xfs: remove the incorrect comment above xfs_buf_free_maps xfs: fix a double completion for buffers on in-memory targets xfs/libxfs: replace kmalloc() and memcpy() with kmemdup() xfs: constify feature checks xfs: refactor xfs_fs_statfs ...
2025-01-23PM: Revert "Add EXPORT macros for exporting PM functions"Andy Shevchenko
Revert commit 41a337b40e98 ("Add EXPORT macros for exporting PM functions") because the macros added by it are still unused almost two years after they had been introduced. Reported-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20250116154354.149297-1-andriy.shevchenko@linux.intel.com [ rjw: New changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-23PM: hibernate: Add error handling for syscore_suspend()Wentao Liang
In hibernation_platform_enter(), the code did not check the return value of syscore_suspend(), potentially leading to a situation where syscore_resume() would be called even if syscore_suspend() failed. This could cause unpredictable behavior or system instability. Modify the code sequence in question to properly handle errors returned by syscore_suspend(). If an error occurs in the suspend path, the code now jumps to label 'Enable_irqs' skipping the syscore_resume() call and only enabling interrupts after setting the system state to SYSTEM_RUNNING. Fixes: 40dc166cb5dd ("PM / Core: Introduce struct syscore_ops for core subsystems PM") Signed-off-by: Wentao Liang <vulab@iscas.ac.cn> Link: https://patch.msgid.link/20250119143205.2103-1-vulab@iscas.ac.cn [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-23vfio/platform: check the bounds of read/write syscallsAlex Williamson
count and offset are passed from user space and not checked, only offset is capped to 40 bits, which can be used to read/write out of bounds of the device. Fixes: 6e3f26456009 (“vfio/platform: read and write support for the device fd”) Cc: stable@vger.kernel.org Reported-by: Mostafa Saleh <smostafa@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Mostafa Saleh <smostafa@google.com> Tested-by: Mostafa Saleh <smostafa@google.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2025-01-23cpufreq/schedutil: Only bind threads if neededChristian Loehle
Remove the unconditional binding of sugov kthreads to the affected CPUs if the cpufreq driver indicates that updates can happen from any CPU. This allows userspace to set affinities to either save power (waking up bigger CPUs on HMP can be expensive) or increasing performance (by letting the utilized CPUs run without preemption of the sugov kthread). Signed-off-by: Christian Loehle <christian.loehle@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://patch.msgid.link/5a8deed4-7764-4729-a9d4-9520c25fa7e8@arm.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-23cpufreq: ACPI: Remove set_boost in acpi_cpufreq_cpu_init()Lifeng Zheng
At the end of cpufreq_online() in cpufreq.c, set_boost is executed and the per-policy boost flag is set to mirror the cpufreq_driver boost, so it is not necessary to run set_boost in acpi_cpufreq_cpu_init(). Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20250117101457.1530653-5-zhenglifeng1@huawei.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-23cpufreq: CPPC: Fix wrong max_freq in policy initializationLifeng Zheng
In policy initialization, policy->max and policy->cpuinfo.max_freq are always set to the value calculated from caps->nominal_perf. This will cause the frequency stay on base frequency even if the policy is already boosted when a CPU is going online. Fix this by using policy->boost_enabled to determine which value should be set. Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20250117101457.1530653-4-zhenglifeng1@huawei.com [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-23cpufreq: Introduce a more generic way to set default per-policy boost flagLifeng Zheng
In cpufreq_online() of cpufreq.c, the per-policy boost flag is already set to mirror the cpufreq_driver boost during init but using freq_table to judge if the policy has boost frequency. There are two drawbacks to this approach: 1. It doesn't work for the cpufreq drivers that do not use a frequency table. For now, acpi-cpufreq and amd-pstate have to enable boost in policy initialization. And cppc_cpufreq never set policy to boost when going online no matter what the cpufreq_driver boost flag is. 2. If the CPU goes offline when cpufreq_driver boost is enabled and then goes online when cpufreq_driver boost is disabled, the per-policy boost flag will incorrectly remain true. Running set_boost at the end of the online process is a more generic way for all cpufreq drivers. Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Link: https://patch.msgid.link/20250117101457.1530653-3-zhenglifeng1@huawei.com Acked-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-23cpufreq: Fix re-boost issue after hotplugging a CPULifeng Zheng
It turns out that CPUX will stay on the base frequency after performing these operations: 1. boost all CPUs: echo 1 > /sys/devices/system/cpu/cpufreq/boost 2. offline one CPU: echo 0 > /sys/devices/system/cpu/cpuX/online 3. deboost all CPUs: echo 0 > /sys/devices/system/cpu/cpufreq/boost 4. online CPUX: echo 1 > /sys/devices/system/cpu/cpuX/online 5. boost all CPUs again: echo 1 > /sys/devices/system/cpu/cpufreq/boost This is because max_freq_req of the policy is not updated during the online process, and the value of max_freq_req before the last offline is retained. When the CPU is boosted again, freq_qos_update_request() will do nothing because the old value is the same as the new one. This causes the CPU to stay at the base frequency. Updating max_freq_req in cpufreq_online() will solve this problem. Signed-off-by: Lifeng Zheng <zhenglifeng1@huawei.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/20250117101457.1530653-2-zhenglifeng1@huawei.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-23cpufreq: s3c64xx: Fix compilation warningViresh Kumar
The driver generates following warning when regulator support isn't enabled in the kernel. Fix it. drivers/cpufreq/s3c64xx-cpufreq.c: In function 's3c64xx_cpufreq_set_target': >> drivers/cpufreq/s3c64xx-cpufreq.c:55:22: warning: variable 'old_freq' set but not used [-Wunused-but-set-variable] 55 | unsigned int old_freq, new_freq; | ^~~~~~~~ >> drivers/cpufreq/s3c64xx-cpufreq.c:54:30: warning: variable 'dvfs' set but not used [-Wunused-but-set-variable] 54 | struct s3c64xx_dvfs *dvfs; | ^~~~ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501191803.CtfT7b2o-lkp@intel.com/ Cc: 5.4+ <stable@vger.kernel.org> # v5.4+ Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://patch.msgid.link/236b227e929e5adc04d1e9e7af6845a46c8e9432.1737525916.git.viresh.kumar@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-23pwm: Ensure callbacks exist before calling themUwe Kleine-König
If one of the waveform functions is called for a chip that only supports .apply(), we want that an error code is returned and not a NULL pointer exception. Fixes: 6c5126c6406d ("pwm: Provide new consumer API functions for waveforms") Cc: stable@vger.kernel.org Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Tested-by: Trevor Gamblin <tgamblin@baylibre.com> Link: https://lore.kernel.org/r/20250123172709.391349-2-u.kleine-koenig@baylibre.com Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org>
2025-01-23ACPI: x86: Add skip i2c clients quirk for Vexia EDU ATLA 10 tablet 5VHans de Goede
The Vexia EDU ATLA 10 tablet comes in 2 different versions with significantly different mainboards. The only outward difference is that the charging barrel on one is marked 5V and the other is marked 9V. Both ship with Android 4.4 as factory OS and have the usual broken DSDT issues for x86 Android tablets. Add a quirk to skip ACPI I2C client enumeration for the 5V version to complement the existing quirk for the 9V version. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://patch.msgid.link/20250123132202.18209-1-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-01-23drm/v3d: Assign job pointer to NULL before signaling the fenceMaíra Canal
In commit e4b5ccd392b9 ("drm/v3d: Ensure job pointer is set to NULL after job completion"), we introduced a change to assign the job pointer to NULL after completing a job, indicating job completion. However, this approach created a race condition between the DRM scheduler workqueue and the IRQ execution thread. As soon as the fence is signaled in the IRQ execution thread, a new job starts to be executed. This results in a race condition where the IRQ execution thread sets the job pointer to NULL simultaneously as the `run_job()` function assigns a new job to the pointer. This race condition can lead to a NULL pointer dereference if the IRQ execution thread sets the job pointer to NULL after `run_job()` assigns it to the new job. When the new job completes and the GPU emits an interrupt, `v3d_irq()` is triggered, potentially causing a crash. [ 466.310099] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000c0 [ 466.318928] Mem abort info: [ 466.321723] ESR = 0x0000000096000005 [ 466.325479] EC = 0x25: DABT (current EL), IL = 32 bits [ 466.330807] SET = 0, FnV = 0 [ 466.333864] EA = 0, S1PTW = 0 [ 466.337010] FSC = 0x05: level 1 translation fault [ 466.341900] Data abort info: [ 466.344783] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 [ 466.350285] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 466.355350] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 466.360677] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000089772000 [ 466.367140] [00000000000000c0] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 [ 466.375875] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP [ 466.382163] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device algif_hash algif_skcipher af_alg bnep binfmt_misc vc4 snd_soc_hdmi_codec drm_display_helper cec brcmfmac_wcc spidev rpivid_hevc(C) drm_client_lib brcmfmac hci_uart drm_dma_helper pisp_be btbcm brcmutil snd_soc_core aes_ce_blk v4l2_mem2mem bluetooth aes_ce_cipher snd_compress videobuf2_dma_contig ghash_ce cfg80211 gf128mul snd_pcm_dmaengine videobuf2_memops ecdh_generic sha2_ce ecc videobuf2_v4l2 snd_pcm v3d sha256_arm64 rfkill videodev snd_timer sha1_ce libaes gpu_sched snd videobuf2_common sha1_generic drm_shmem_helper mc rp1_pio drm_kms_helper raspberrypi_hwmon spi_bcm2835 gpio_keys i2c_brcmstb rp1 raspberrypi_gpiomem rp1_mailbox rp1_adc nvmem_rmem uio_pdrv_genirq uio i2c_dev drm ledtrig_pattern drm_panel_orientation_quirks backlight fuse dm_mod ip_tables x_tables ipv6 [ 466.458429] CPU: 0 UID: 1000 PID: 2008 Comm: chromium Tainted: G C 6.13.0-v8+ #18 [ 466.467336] Tainted: [C]=CRAP [ 466.470306] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT) [ 466.476157] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 466.483143] pc : v3d_irq+0x118/0x2e0 [v3d] [ 466.487258] lr : __handle_irq_event_percpu+0x60/0x228 [ 466.492327] sp : ffffffc080003ea0 [ 466.495646] x29: ffffffc080003ea0 x28: ffffff80c0c94200 x27: 0000000000000000 [ 466.502807] x26: ffffffd08dd81d7b x25: ffffff80c0c94200 x24: ffffff8003bdc200 [ 466.509969] x23: 0000000000000001 x22: 00000000000000a7 x21: 0000000000000000 [ 466.517130] x20: ffffff8041bb0000 x19: 0000000000000001 x18: 0000000000000000 [ 466.524291] x17: ffffffafadfb0000 x16: ffffffc080000000 x15: 0000000000000000 [ 466.531452] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 [ 466.538613] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffffd08c527eb0 [ 466.545777] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 466.552941] x5 : ffffffd08c4100d0 x4 : ffffffafadfb0000 x3 : ffffffc080003f70 [ 466.560102] x2 : ffffffc0829e8058 x1 : 0000000000000001 x0 : 0000000000000000 [ 466.567263] Call trace: [ 466.569711] v3d_irq+0x118/0x2e0 [v3d] (P) [ 466.573826] __handle_irq_event_percpu+0x60/0x228 [ 466.578546] handle_irq_event+0x54/0xb8 [ 466.582391] handle_fasteoi_irq+0xac/0x240 [ 466.586498] generic_handle_domain_irq+0x34/0x58 [ 466.591128] gic_handle_irq+0x48/0xd8 [ 466.594798] call_on_irq_stack+0x24/0x58 [ 466.598730] do_interrupt_handler+0x88/0x98 [ 466.602923] el0_interrupt+0x44/0xc0 [ 466.606508] __el0_irq_handler_common+0x18/0x28 [ 466.611050] el0t_64_irq_handler+0x10/0x20 [ 466.615156] el0t_64_irq+0x198/0x1a0 [ 466.618740] Code: 52800035 3607faf3 f9442e80 52800021 (f9406018) [ 466.624853] ---[ end trace 0000000000000000 ]--- [ 466.629483] Kernel panic - not syncing: Oops: Fatal exception in interrupt [ 466.636384] SMP: stopping secondary CPUs [ 466.640320] Kernel Offset: 0x100c400000 from 0xffffffc080000000 [ 466.646259] PHYS_OFFSET: 0x0 [ 466.649141] CPU features: 0x100,00000170,00901250,0200720b [ 466.654644] Memory Limit: none [ 466.657706] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- Fix the crash by assigning the job pointer to NULL before signaling the fence. This ensures that the job pointer is cleared before any new job starts execution, preventing the race condition and the NULL pointer dereference crash. Cc: stable@vger.kernel.org Fixes: e4b5ccd392b9 ("drm/v3d: Ensure job pointer is set to NULL after job completion") Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Tested-by: Phil Elwell <phil@raspberrypi.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250123012403.20447-1-mcanal@igalia.com
2025-01-23hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYINGFrederic Weisbecker
hrtimers are migrated away from the dying CPU to any online target at the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth timers handling tasks involved in the CPU hotplug forward progress. However wakeups can still be performed by the outgoing CPU after CPUHP_AP_HRTIMERS_DYING. Those can result again in bandwidth timers being armed. Depending on several considerations (crystal ball power management based election, earliest timer already enqueued, timer migration enabled or not), the target may eventually be the current CPU even if offline. If that happens, the timer is eventually ignored. The most notable example is RCU which had to deal with each and every of those wake-ups by deferring them to an online CPU, along with related workarounds: _ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying) _ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU) _ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq) The problem isn't confined to RCU though as the stop machine kthread (which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end of its work through cpu_stop_signal_done() and performs a wake up that eventually arms the deadline server timer: WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0 CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0 RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0 Call Trace: <TASK> start_dl_timer enqueue_dl_entity dl_server_start enqueue_task_fair enqueue_task ttwu_do_activate try_to_wake_up complete cpu_stopper_thread Instead of providing yet another bandaid to work around the situation, fix it in the hrtimers infrastructure instead: always migrate away a timer to an online target whenever it is enqueued from an offline CPU. This will also allow to revert all the above RCU disgraceful hacks. Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Reported-by: Vlad Poenaru <vlad.wing@gmail.com> Reported-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Tested-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/all/20250117232433.24027-1-frederic@kernel.org Closes: 20241213203739.1519801-1-usamaarif642@gmail.com
2025-01-23hrtimers: Mark is_migration_base() with __always_inlineAndy Shevchenko
When is_migration_base() is unused, it prevents kernel builds with clang, `make W=1` and CONFIG_WERROR=y: kernel/time/hrtimer.c:156:20: error: unused function 'is_migration_base' [-Werror,-Wunused-function] 156 | static inline bool is_migration_base(struct hrtimer_clock_base *base) | ^~~~~~~~~~~~~~~~~ Fix this by marking it with __always_inline. [ tglx: Use __always_inline instead of __maybe_unused and move it into the usage sites conditional ] Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250116160745.243358-1-andriy.shevchenko@linux.intel.com
2025-01-23Merge branch 'pci/misc'Bjorn Helgaas
- Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI hotplug driver (Thomas Weißschuh) - Update PCI_EXP_LNKCAP_SLS comment (Lukas Wunner) - Drop superfluous pm_wakeup.h include (Wolfram Sang) - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong Zhang) - Correct documentation of the 'config_acs=' kernel parameter (Akihiko Odaki) * pci/misc: Documentation: Fix pci=config_acs= example PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT PCI: Don't include 'pm_wakeup.h' directly PCI: Update code comment on PCI_EXP_LNKCAP_SLS for PCIe r3.0 PCI/ACPI: Constify 'struct bin_attribute' PCI/P2PDMA: Constify 'struct bin_attribute' PCI/VPD: Constify 'struct bin_attribute' PCI/sysfs: Constify 'struct bin_attribute'
2025-01-23Merge branch 'pci/controller/xilinx-cpm'Bjorn Helgaas
- Add DT binding and driver support for Xilinx Versal CPM5 (Thippeswamy Havalige) * pci/controller/xilinx-cpm: PCI: xilinx-cpm: Add support for Versal CPM5 Root Port Controller 1 dt-bindings: PCI: xilinx-cpm: Add compatible string for CPM5 host1
2025-01-23Merge branch 'pci/controller/rockchip'Bjorn Helgaas
- Add struct rockchip_pcie_ep kernel-doc to fix warnings (Damien Le Moal) - Simplify clock and reset handling by using bulk interfaces (Anand Moon) - Pass typed rockchip_pcie (not void) pointer to rockchip_pcie_disable_clocks() (Anand Moon) - Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails (Dan Carpenter) * pci/controller/rockchip: PCI: rockchip-ep: Fix error code in rockchip_pcie_ep_init_ob_mem() PCI: rockchip: Refactor rockchip_pcie_disable_clocks() signature PCI: rockchip: Simplify reset control handling by using reset_control_bulk*() function PCI: rockchip: Simplify clock handling by using clk_bulk*() functions PCI: rockchip: Add missing fields descriptions for struct rockchip_pcie_ep
2025-01-23Merge branch 'pci/controller/rcar-ep'Bjorn Helgaas
- Avoid passing stack buffer as resource name (King Dix) * pci/controller/rcar-ep: PCI: rcar-ep: Fix incorrect variable used when calling devm_request_mem_region()
2025-01-23Merge branch 'pci/controller/mvebu'Bjorn Helgaas
- Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen) * pci/controller/mvebu: PCI: mvebu: Enable module autoloading
2025-01-23Merge branch 'pci/controller/microchip'Bjorn Helgaas
- Set up the inbound address translation based on whether the platform allows coherent or non-coherent DMA (Daire McNamara) - Update DT binding such that platforms are DMA-coherent by default and must specify 'dma-noncoherent' if needed (Conor Dooley) * pci/controller/microchip: dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent PCI: microchip: Set inbound address translation for coherent or non-coherent mode
2025-01-23Merge branch 'pci/controller/mediatek'Bjorn Helgaas
- Use clk_bulk_prepare_enable() instead of separate clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi) - Rearrange reset assert/deassert so they're both done in the *_power_up() callbacks (Lorenzo Bianconi) - Document that Airoha EN7581 requires PHY init and power-on before PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo Bianconi) - Move Airoha EN7581 post-reset delay from the en7581 clock .enable() method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi) - Sleep instead of delay during Airoha EN7581 power-up, since this is a non-atomic context (Lorenzo Bianconi) - Skip PERST# assertion on Airoha EN7581 during probe and suspend/resume to avoid a hardware defect (Lorenzo Bianconi) - Enable async probe to reduce system startup time (Douglas Anderson) * pci/controller/mediatek: PCI: mediatek-gen3: Enable async probe by default PCI: mediatek-gen3: Avoid PCIe resetting via PERST# for Airoha EN7581 SoC PCI: mediatek-gen3: Rely on msleep() in mtk_pcie_en7581_power_up() PCI: mediatek-gen3: Move reset delay in mtk_pcie_en7581_power_up() PCI: mediatek-gen3: Add comment about initialization order in mtk_pcie_en7581_power_up() PCI: mediatek-gen3: Move reset/assert callbacks in .power_up() PCI: mediatek-gen3: Rely on clk_bulk_prepare_enable() in mtk_pcie_en7581_power_up()
2025-01-23Merge branch 'pci/controller/layerscape'Bjorn Helgaas
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_property_read_u32_array() (Krzysztof Kozlowski) * pci/controller/layerscape: PCI: layerscape: Use syscon_regmap_lookup_by_phandle_args
2025-01-23Merge branch 'pci/controller/imx6'Bjorn Helgaas
- Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank Li) - Add DT binding for optional i.MX95 Refclk and driver support to enable it if the platform hasn't enabled it (Richard Zhu) - Configure PHY based on controller being in Root Complex or Endpoint mode (Frank Li) - Rely on dbi2 and iATU base addresses from DT via dw_pcie_get_resources() instead of hardcoding them in imx6 (Richard Zhu) - Skip controller_id computation for i.MX7D since it only has one controller (Richard Zhu) - Deassert apps_reset in imx_pcie_deassert_core_reset() since it is asserted in imx_pcie_assert_core_reset() (Richard Zhu) - Add missing reference clock enable or disable logic for IMX6SX, IMX7D, IMX8MM (Richard Zhu) - Remove redundant imx7d_pcie_init_phy() since imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu) * pci/controller/imx6: PCI: imx6: Clean up comments and whitespace PCI: imx6: Remove surplus imx7d_pcie_init_phy() function PCI: imx6: Add missing reference clock disable logic PCI: imx6: Deassert apps_reset in imx_pcie_deassert_core_reset() PCI: imx6: Skip controller_id generation logic for i.MX7D PCI: imx6: Fetch dbi2 and iATU base addesses from DT PCI: imx6: Configure PHY based on Root Complex or Endpoint mode PCI: imx6: Add Refclk for i.MX95 PCIe dt-bindings: PCI: fsl,imx6q-pcie: Add Refclk for i.MX95 RC PCI: imx6: Add i.MX8Q PCIe Endpoint (EP) support dt-bindings: PCI: fsl,imx6q-pcie-ep: Add compatible string fsl,imx8q-pcie-ep # Conflicts: # drivers/pci/controller/dwc/pci-imx6.c
2025-01-23Merge branch 'pci/controller/dwc'Bjorn Helgaas
- Fix potential string truncation in dw_pcie_edma_irq_verify() (Niklas Cassel) - Don't wait for link up in DWC core if driver can detect Link Up event (Krishna chaitanya chundru) - If qcom 'global' IRQ is supported for detection of Link Up events, tell DWC core not to wait for link up (Krishna chaitanya chundru) - Update ICC and OPP votes after Link Up events (Krishna chaitanya chundru) - Use dw-rockchip dll_link_up IRQ to detect Link Up and enumerate devices so users don't have to manually rescan (Niklas Cassel) - In dw-rockchip, the 'sys' interrupt is required and detects Link Up events, so tell DWC core not to wait for link up (Niklas Cassel) - Always stop link in dw_pcie_suspend_noirq(), which is required at least for i.MX8QM to re-establish link on resume (Richard Zhu) - Drop racy and unnecessary LTSSM state check before sending PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu) - Add stubs for dw_pcie_suspend_noirq() dw_pcie_resume_noirq() when CONFIG_PCIE_DW_HOST is not defined so drivers don't need #ifdefs (Bjorn Helgaas) - Use DWC core suspend/resume functions for imx6 (Frank Li) - Add imx6 suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard Zhu) - Add struct of_pci_range.parent_bus_addr for devices that need their immediate parent bus address, not the CPU address, e.g., to program an internal Address Translation Unit (iATU) (Frank Li) * pci/controller/dwc: PCI: dwc: Simplify config resource lookup of: address: Add parent_bus_addr to struct of_pci_range PCI: imx6: Add i.MX8MQ, i.MX8Q and i.MX95 PM support PCI: imx6: Use DWC common suspend resume method PCI: dwc: Add dw_pcie_suspend_noirq(), dw_pcie_resume_noirq() stubs for !CONFIG_PCIE_DW_HOST PCI: dwc: Remove LTSSM state test in dw_pcie_suspend_noirq() PCI: dwc: Always stop link in the dw_pcie_suspend_noirq PCI: dw-rockchip: Don't wait for link since we can detect Link Up PCI: dw-rockchip: Enumerate endpoints based on dll_link_up IRQ PCI: qcom: Update ICC and OPP values after Link Up event PCI: qcom: Don't wait for link if we can detect Link Up PCI: dwc: Don't wait for link up if driver can detect Link Up event PCI: dwc: Fix potential truncation in dw_pcie_edma_irq_verify() # Conflicts: # drivers/pci/controller/dwc/pci-imx6.c
2025-01-23Merge branch 'pci/controller/dra7xx'Bjorn Helgaas
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_parse_phandle_with_fixed_args() or of_property_read_u32_index() (Krzysztof Kozlowski) * pci/controller/dra7xx: PCI: dra7xx: Use syscon_regmap_lookup_by_phandle_args
2025-01-23Merge branch 'pci/controller/iommu-map'Bjorn Helgaas
- Add host bridge .enable_device() and .disable_device() hooks for bridges that need to configure things like Requester ID to StreamID mapping when enabling devices (Frank Li) - Add imx6 Requester ID to StreamID mapping configuration when enabling devices (Frank Li) - Extend struct pci_ecam_ops with .enable_device() and .disable_device() hooks so drivers that use pci_host_common_probe() instead of their own .probe() have a way to set the .enable_device() callbacks (Marc Zyngier) - Convert pcie-apple StreamID mapping configuration from a bus notifier to the .enable_device() and .disable_device() callbacks (Marc Zyngier) * pci/controller/iommu-map: PCI: apple: Convert to {en,dis}able_device() callbacks PCI: host-generic: Allow {en,dis}able_device() to be provided via pci_ecam_ops PCI: imx6: Add IOMMU and ITS MSI support for i.MX95 PCI: Add enable_device() and disable_device() callbacks for bridges
2025-01-23Merge branch 'pci/dt-bindings'Bjorn Helgaas
- Convert mobiveil-pcie.txt to YAML and update 'interrupt-names' and 'reg-names' (Frank Li) - Add qcom DT SM8550 and SM8650 optional 'global' interrupt for link events (Neil Armstrong) - Add qcom DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta Mylavarapu) * pci/dt-bindings: dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML
2025-01-23Merge branch 'pci/endpoint-test'Bjorn Helgaas
- Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing dma_chan_rx (Mohamed Khalfella) - Correct the DMA MEMCPY test so it doesn't fail if the Endpoint supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam) - Add pci-epf-test and pci_endpoint_test support for capabilities (Niklas Cassel) - Add Endpoint test for consecutive BARs (Niklas Cassel) - Remove redundant comparison from Endpoint BAR test because a > 1MB BAR can always be exactly covered by iterating with a 1MB buffer (Hans Zhang) - Correct the PCI Endpoint test IOCTL return value (Manivannan Sadhasivam) - Move PCI Endpoint tests from tools/pci to Kselftests (Manivannan Sadhasivam) - Convert PCI Endpoint tests to the Kselftest framework (Manivannan Sadhasivam) * pci/endpoint-test: selftests: pci_endpoint: Migrate to Kselftest framework selftests: Move PCI Endpoint tests from tools/pci to Kselftests misc: pci_endpoint_test: Fix IOCTL return value misc: pci_endpoint_test: Remove redundant 'remainder' test misc: pci_endpoint_test: Add consecutive BAR test misc: pci_endpoint_test: Add support for capabilities PCI: endpoint: pci-epf-test: Add support for capabilities PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
2025-01-23Merge branch 'pci/endpoint'Bjorn Helgaas
- Destroy the EPC device in devm_pci_epc_destroy(), which previously didn't call devres_release() (Zijun Hu) - Simplify pci_epc_get() with class_find_device_by_name() (Zijun Hu) - Finish virtual EP removal in pci_epf_remove_vepf(), which previously caused a subsequent pci_epf_add_vepf() to fail with -EBUSY (Zijun Hu) - Write BAR_MASK before iATU registers in pci_epc_set_bar() so we don't depend on the BAR_MASK reset value being larger than the requested BAR size (Niklas Cassel) - Prevent changing BAR size/flags in pci_epc_set_bar() to prevent reads from bypassing the iATU if we reduced the BAR size (Niklas Cassel) - Verify address alignment when programming iATU so we don't attempt to write bits that are read-only because of the BAR size, which could lead to directing accesses to the wrong address (Niklas Cassel) - Implement artpec6 pci_epc_features so we can rely on all drivers supporting it so we can use it in EPC core code (Niklas Cassel) - Check for BARs of fixed size to prevent endpoint drivers from trying to change their size (Niklas Cassel) - Verify that requested BAR size is a power of two when endpoint driver sets the BAR (Niklas Cassel) * pci/endpoint: PCI: endpoint: Verify that requested BAR size is a power of two PCI: endpoint: Add size check for fixed size BARs in pci_epc_set_bar() PCI: artpec6: Implement dw_pcie_ep operation get_features PCI: dwc: ep: Add 'address' alignment to 'size' check in dw_pcie_prog_ep_inbound_atu() PCI: dwc: ep: Prevent changing BAR size/flags in pci_epc_set_bar() PCI: dwc: ep: Write BAR_MASK before iATU registers in pci_epc_set_bar() PCI: endpoint: Finish virtual EP removal in pci_epf_remove_vepf() PCI: endpoint: Simplify pci_epc_get() PCI: endpoint: Destroy the EPC device in devm_pci_epc_destroy() PCI: endpoint: Replace magic number '6' by PCI_STD_NUM_BARS
2025-01-23Merge branch 'pci/switchtec'Bjorn Helgaas
- Add Microchip PCI100X device IDs (Rakesh Babu Saladi) * pci/switchtec: PCI: switchtec: Add Microchip PCI100X device IDs
2025-01-23Merge branch 'pci/pm'Bjorn Helgaas
- Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because the system can't wake up from suspend (Werner Sembach) * pci/pm: PCI: Avoid putting some root ports into D3 on TUXEDO Sirius Gen1
2025-01-23Merge branch 'pci/pci-sysfs'Bjorn Helgaas
- Move reset related sysfs code from pci.c to pci-sysfs.c where other similar code lives (Ilpo Järvinen) - Simplify reset_method_store() memory management by using __free() instead of explicit kfree() cleanup (Ilpo Järvinen) - Drop unnecessary zero initializer (Ilpo Järvinen) * pci/pci-sysfs: PCI/sysfs: Remove unnecessary zero in initializer PCI/sysfs: Use __free() in reset_method_store() PCI/sysfs: Move reset related sysfs code to correct file
2025-01-23Merge branch 'pci/of'Bjorn Helgaas
- Unexport of_pci_parse_bus_range() since it's only used in of.c (Bjorn Helgaas) - Drop 'No bus range found' message so we don't complain when DTs don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas) - Simplify devm_of_pci_get_host_bridge_resources() interface by dropping parameters that are always the same default values (Bjorn Helgaas) - Update comment reference to of_pci_get_host_bridge_resources(), which no longer exists (Bjorn Helgaas) - Rename the drivers/pci/of_property.c struct of_pci_range to of_pci_range_entry to avoid confusion with the global of_pci_range in include/linux/of_address.h (Bjorn Helgaas) * pci/of: PCI: of_property: Rename struct of_pci_range to of_pci_range_entry sparc/PCI: Update reference to devm_of_pci_get_host_bridge_resources() PCI: of: Simplify devm_of_pci_get_host_bridge_resources() interface PCI: of: Drop 'No bus range found' message PCI: Unexport of_pci_parse_bus_range()
2025-01-23Merge branch 'pci/err'Bjorn Helgaas
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging rather than building their own (Ilpo Järvinen) - Move TLP Log handling to its own file (Ilpo Järvinen) - Add #defines for TLP Header/Prefix log sizes (Ilpo Järvinen) - Store number of supported End-End TLP Prefixes always so we can read the correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen) - Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log() (Ilpo Järvinen) - Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix Log (Ilpo Järvinen) * pci/err: PCI: Add pcie_print_tlp_log() to print TLP Header and Prefix Log PCI: Add TLP Prefix reading to pcie_read_tlp_log() PCI: Store number of supported End-End TLP Prefixes PCI: Use unsigned int i in pcie_read_tlp_log() PCI: Use same names in pcie_read_tlp_log() prototype and definition PCI: Add defines for TLP Header/Prefix log sizes PCI: Move TLP Log handling to its own file PCI: Don't expose pcie_read_tlp_log() outside PCI subsystem
2025-01-23Merge branch 'pci/enumeration'Bjorn Helgaas
- Batch sizing of multiple BARs while memory decoding is disabled instead of disabling/enabling decoding for each BAR individually; this optimizes virtualized environments where toggling decoding enable is expensive (Alex Williamson) * pci/enumeration: PCI: Batch BAR sizing operations
2025-01-23Merge branch 'pci/dpc'Bjorn Helgaas
- Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor BIOSes that don't configure it correctly (Takashi Iwai) * pci/dpc: PCI/DPC: Quirk PIO log size for Intel Raptor Lake-P
2025-01-23Merge branch 'pci/devres'Bjorn Helgaas
- Update resource request API documentation to encourage callers to supply a driver name when requesting resources (Philipp Stanner) - Export pci_intx_unmanaged() and pcim_intx() (always managed) so callers of pci_intx() (which is sometimes managed) can explicitly choose the one they need (Philipp Stanner) - Convert drivers from pci_intx() to always-managed pcim_intx() or never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix, pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna, ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner) - Remove pci_intx_unmanaged() since pci_intx() is now always unmanaged and pcim_intx() is always managed (Philipp Stanner) * pci/devres: PCI: Remove devres from pci_intx() net/ethernet: Use never-managed version of pci_intx() HID: amd_sfh: Use always-managed version of pcim_intx() wifi: qtnfmac: use always-managed version of pcim_intx() ata: Use always-managed version of pci_intx() PCI/MSI: Use never-managed version of pci_intx() vfio/pci: Use never-managed version of pci_intx() misc: Use never-managed version of pci_intx() ntb: Use never-managed version of pci_intx() drivers/xen: Use never-managed version of pci_intx() PCI: Export pci_intx_unmanaged() and pcim_intx() PCI: Encourage resource request API users to supply driver name
2025-01-23Merge branch 'pci/aspm'Bjorn Helgaas
- Save parent L1 PM Substates config so when we restore it along with an endpoint's config, the parent info isn't junk (Jian-Hong Pan) * pci/aspm: PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()
2025-01-23io_uring/alloc_cache: get rid of _nocache() helperJens Axboe
Just allow passing in NULL for the cache, if the type in question doesn't have a cache associated with it. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-23io_uring: get rid of alloc cache init_once handlingJens Axboe
init_once is called when an object doesn't come from the cache, and hence needs initial clearing of certain members. While the whole struct could get cleared by memset() in that case, a few of the cache members are large enough that this may cause unnecessary overhead if the caches used aren't large enough to satisfy the workload. For those cases, some churn of kmalloc+kfree is to be expected. Ensure that the 3 users that need clearing put the members they need cleared at the start of the struct, and wrap the rest of the struct in a struct group so the offset is known. While at it, improve the interaction with KASAN such that when/if KASAN writes to members inside the struct that should be retained over caching, it won't trip over itself. For rw and net, the retaining of the iovec over caching is disabled if KASAN is enabled. A helper will free and clear those members in that case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-01-23io_uring/uring_cmd: cleanup struct io_uring_cmd_data layoutJens Axboe
A few spots in uring_cmd assume that the SQEs copied are always at the start of the structure, and hence mix req->async_data and the struct itself. Clean that up and use the proper indices. Signed-off-by: Jens Axboe <axboe@kernel.dk>