summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-10-01btrfs: disable rate limiting when debug enabledLeo Martins
Disable ratelimiting for btrfs_printk when CONFIG_BTRFS_DEBUG is enabled. This allows for more verbose output which is often needed by functions like btrfs_dump_space_info(). Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Leo Martins <loemra.dev@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: wait for fixup workers before stopping cleaner kthread during umountFilipe Manana
During unmount, at close_ctree(), we have the following steps in this order: 1) Park the cleaner kthread - this doesn't destroy the kthread, it basically halts its execution (wake ups against it work but do nothing); 2) We stop the cleaner kthread - this results in freeing the respective struct task_struct; 3) We call btrfs_stop_all_workers() which waits for any jobs running in all the work queues and then free the work queues. Syzbot reported a case where a fixup worker resulted in a crash when doing a delayed iput on its inode while attempting to wake up the cleaner at btrfs_add_delayed_iput(), because the task_struct of the cleaner kthread was already freed. This can happen during unmount because we don't wait for any fixup workers still running before we call kthread_stop() against the cleaner kthread, which stops and free all its resources. Fix this by waiting for any fixup workers at close_ctree() before we call kthread_stop() against the cleaner and run pending delayed iputs. The stack traces reported by syzbot were the following: BUG: KASAN: slab-use-after-free in __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065 Read of size 8 at addr ffff8880272a8a18 by task kworker/u8:3/52 CPU: 1 UID: 0 PID: 52 Comm: kworker/u8:3 Not tainted 6.12.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: btrfs-fixup btrfs_work_helper Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 __lock_acquire+0x77/0x2050 kernel/locking/lockdep.c:5065 lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5825 __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline] _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162 class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline] try_to_wake_up+0xb0/0x1480 kernel/sched/core.c:4154 btrfs_writepage_fixup_worker+0xc16/0xdf0 fs/btrfs/inode.c:2842 btrfs_work_helper+0x390/0xc50 fs/btrfs/async-thread.c:314 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Allocated by task 2: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:319 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:345 kasan_slab_alloc include/linux/kasan.h:247 [inline] slab_post_alloc_hook mm/slub.c:4086 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_node_noprof+0x16b/0x320 mm/slub.c:4187 alloc_task_struct_node kernel/fork.c:180 [inline] dup_task_struct+0x57/0x8c0 kernel/fork.c:1107 copy_process+0x5d1/0x3d50 kernel/fork.c:2206 kernel_clone+0x223/0x880 kernel/fork.c:2787 kernel_thread+0x1bc/0x240 kernel/fork.c:2849 create_kthread kernel/kthread.c:412 [inline] kthreadd+0x60d/0x810 kernel/kthread.c:765 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Freed by task 61: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object mm/kasan/common.c:247 [inline] __kasan_slab_free+0x59/0x70 mm/kasan/common.c:264 kasan_slab_free include/linux/kasan.h:230 [inline] slab_free_hook mm/slub.c:2343 [inline] slab_free mm/slub.c:4580 [inline] kmem_cache_free+0x1a2/0x420 mm/slub.c:4682 put_task_struct include/linux/sched/task.h:144 [inline] delayed_put_task_struct+0x125/0x300 kernel/exit.c:228 rcu_do_batch kernel/rcu/tree.c:2567 [inline] rcu_core+0xaaa/0x17a0 kernel/rcu/tree.c:2823 handle_softirqs+0x2c5/0x980 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1037 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1037 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702 Last potentially related work creation: kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47 __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541 __call_rcu_common kernel/rcu/tree.c:3086 [inline] call_rcu+0x167/0xa70 kernel/rcu/tree.c:3190 context_switch kernel/sched/core.c:5318 [inline] __schedule+0x184b/0x4ae0 kernel/sched/core.c:6675 schedule_idle+0x56/0x90 kernel/sched/core.c:6793 do_idle+0x56a/0x5d0 kernel/sched/idle.c:354 cpu_startup_entry+0x42/0x60 kernel/sched/idle.c:424 start_secondary+0x102/0x110 arch/x86/kernel/smpboot.c:314 common_startup_64+0x13e/0x147 The buggy address belongs to the object at ffff8880272a8000 which belongs to the cache task_struct of size 7424 The buggy address is located 2584 bytes inside of freed 7424-byte region [ffff8880272a8000, ffff8880272a9d00) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x272a8 head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 flags: 0xfff00000000040(head|node=0|zone=1|lastcpupid=0x7ff) page_type: f5(slab) raw: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000 raw: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000 head: 00fff00000000040 ffff88801bafa500 dead000000000122 0000000000000000 head: 0000000000000000 0000000080040004 00000001f5000000 0000000000000000 head: 00fff00000000003 ffffea00009caa01 ffffffffffffffff 0000000000000000 head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected page_owner tracks the page as allocated page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 2, tgid 2 (kthreadd), ts 71247381401, free_ts 71214998153 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537 prep_new_page mm/page_alloc.c:1545 [inline] get_page_from_freelist+0x3039/0x3180 mm/page_alloc.c:3457 __alloc_pages_noprof+0x256/0x6c0 mm/page_alloc.c:4733 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 alloc_slab_page+0x6a/0x120 mm/slub.c:2413 allocate_slab+0x5a/0x2f0 mm/slub.c:2579 new_slab mm/slub.c:2632 [inline] ___slab_alloc+0xcd1/0x14b0 mm/slub.c:3819 __slab_alloc+0x58/0xa0 mm/slub.c:3909 __slab_alloc_node mm/slub.c:3962 [inline] slab_alloc_node mm/slub.c:4123 [inline] kmem_cache_alloc_node_noprof+0x1fe/0x320 mm/slub.c:4187 alloc_task_struct_node kernel/fork.c:180 [inline] dup_task_struct+0x57/0x8c0 kernel/fork.c:1107 copy_process+0x5d1/0x3d50 kernel/fork.c:2206 kernel_clone+0x223/0x880 kernel/fork.c:2787 kernel_thread+0x1bc/0x240 kernel/fork.c:2849 create_kthread kernel/kthread.c:412 [inline] kthreadd+0x60d/0x810 kernel/kthread.c:765 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 page last free pid 5230 tgid 5230 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1108 [inline] free_unref_page+0xcd0/0xf00 mm/page_alloc.c:2638 discard_slab mm/slub.c:2678 [inline] __put_partials+0xeb/0x130 mm/slub.c:3146 put_cpu_partial+0x17c/0x250 mm/slub.c:3221 __slab_free+0x2ea/0x3d0 mm/slub.c:4450 qlink_free mm/kasan/quarantine.c:163 [inline] qlist_free_all+0x9a/0x140 mm/kasan/quarantine.c:179 kasan_quarantine_reduce+0x14f/0x170 mm/kasan/quarantine.c:286 __kasan_slab_alloc+0x23/0x80 mm/kasan/common.c:329 kasan_slab_alloc include/linux/kasan.h:247 [inline] slab_post_alloc_hook mm/slub.c:4086 [inline] slab_alloc_node mm/slub.c:4135 [inline] kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4142 getname_flags+0xb7/0x540 fs/namei.c:139 do_sys_openat2+0xd2/0x1d0 fs/open.c:1409 do_sys_open fs/open.c:1430 [inline] __do_sys_openat fs/open.c:1446 [inline] __se_sys_openat fs/open.c:1441 [inline] __x64_sys_openat+0x247/0x2a0 fs/open.c:1441 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Memory state around the buggy address: ffff8880272a8900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880272a8980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8880272a8a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8880272a8a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8880272a8b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Reported-by: syzbot+8aaf2df2ef0164ffe1fb@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/66fb36b1.050a0220.aab67.003b.GAE@google.com/ CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: fix a NULL pointer dereference when failed to start a new trasacntionQu Wenruo
[BUG] Syzbot reported a NULL pointer dereference with the following crash: FAULT_INJECTION: forcing a failure. start_transaction+0x830/0x1670 fs/btrfs/transaction.c:676 prepare_to_relocate+0x31f/0x4c0 fs/btrfs/relocation.c:3642 relocate_block_group+0x169/0xd20 fs/btrfs/relocation.c:3678 ... BTRFS info (device loop0): balance: ended with status: -12 Oops: general protection fault, probably for non-canonical address 0xdffffc00000000cc: 0000 [#1] PREEMPT SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000660-0x0000000000000667] RIP: 0010:btrfs_update_reloc_root+0x362/0xa80 fs/btrfs/relocation.c:926 Call Trace: <TASK> commit_fs_roots+0x2ee/0x720 fs/btrfs/transaction.c:1496 btrfs_commit_transaction+0xfaf/0x3740 fs/btrfs/transaction.c:2430 del_balance_item fs/btrfs/volumes.c:3678 [inline] reset_balance_state+0x25e/0x3c0 fs/btrfs/volumes.c:3742 btrfs_balance+0xead/0x10c0 fs/btrfs/volumes.c:4574 btrfs_ioctl_balance+0x493/0x7c0 fs/btrfs/ioctl.c:3673 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [CAUSE] The allocation failure happens at the start_transaction() inside prepare_to_relocate(), and during the error handling we call unset_reloc_control(), which makes fs_info->balance_ctl to be NULL. Then we continue the error path cleanup in btrfs_balance() by calling reset_balance_state() which will call del_balance_item() to fully delete the balance item in the root tree. However during the small window between set_reloc_contrl() and unset_reloc_control(), we can have a subvolume tree update and created a reloc_root for that subvolume. Then we go into the final btrfs_commit_transaction() of del_balance_item(), and into btrfs_update_reloc_root() inside commit_fs_roots(). That function checks if fs_info->reloc_ctl is in the merge_reloc_tree stage, but since fs_info->reloc_ctl is NULL, it results a NULL pointer dereference. [FIX] Just add extra check on fs_info->reloc_ctl inside btrfs_update_reloc_root(), before checking fs_info->reloc_ctl->merge_reloc_tree. That DEAD_RELOC_TREE handling is to prevent further modification to the reloc tree during merge stage, but since there is no reloc_ctl at all, we do not need to bother that. Reported-by: syzbot+283673dbc38527ef9f3d@syzkaller.appspotmail.com Link: https://lore.kernel.org/linux-btrfs/66f6bfa7.050a0220.38ace9.0019.GAE@google.com/ CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: send: fix invalid clone operation for file that got its size decreasedFilipe Manana
During an incremental send we may end up sending an invalid clone operation, for the last extent of a file which ends at an unaligned offset that matches the final i_size of the file in the send snapshot, in case the file had its initial size (the size in the parent snapshot) decreased in the send snapshot. In this case the destination will fail to apply the clone operation because its end offset is not sector size aligned and it ends before the current size of the file. Sending the truncate operation always happens when we finish processing an inode, after we process all its extents (and xattrs, names, etc). So fix this by ensuring the file has a valid size before we send a clone operation for an unaligned extent that ends at the final i_size of the file. The size we truncate to matches the start offset of the clone range but it could be any value between that start offset and the final size of the file since the clone operation will expand the i_size if the current size is smaller than the end offset. The start offset of the range was chosen because it's always sector size aligned and avoids a truncation into the middle of a page, which results in dirtying the page due to filling part of it with zeroes and then making the clone operation at the receiver trigger IO. The following test reproduces the issue: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT # Create a file with a size of 256K + 5 bytes, having two extents, one # with a size of 128K and another one with a size of 128K + 5 bytes. last_ext_size=$((128 * 1024 + 5)) xfs_io -f -d -c "pwrite -S 0xab -b 128K 0 128K" \ -c "pwrite -S 0xcd -b $last_ext_size 128K $last_ext_size" \ $MNT/foo # Another file which we will later clone foo into, but initially with # a larger size than foo. xfs_io -f -c "pwrite -S 0xef 0 1M" $MNT/bar btrfs subvolume snapshot -r $MNT/ $MNT/snap1 # Now resize bar and clone foo into it. xfs_io -c "truncate 0" \ -c "reflink $MNT/foo" $MNT/bar btrfs subvolume snapshot -r $MNT/ $MNT/snap2 rm -f /tmp/send-full /tmp/send-inc btrfs send -f /tmp/send-full $MNT/snap1 btrfs send -p $MNT/snap1 -f /tmp/send-inc $MNT/snap2 umount $MNT mkfs.btrfs -f $DEV mount $DEV $MNT btrfs receive -f /tmp/send-full $MNT btrfs receive -f /tmp/send-inc $MNT umount $MNT Running it before this patch: $ ./test.sh (...) At subvol snap1 At snapshot snap2 ERROR: failed to clone extents to bar: Invalid argument A test case for fstests will be sent soon. Reported-by: Ben Millwood <thebenmachine@gmail.com> Link: https://lore.kernel.org/linux-btrfs/CAJhrHS2z+WViO2h=ojYvBPDLsATwLbg+7JaNCyYomv0fUxEpQQ@mail.gmail.com/ Fixes: 46a6e10a1ab1 ("btrfs: send: allow cloning non-aligned extent if it ends at i_size") CC: stable@vger.kernel.org # 6.11 Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: tracepoints: end assignment with semicolon at btrfs_qgroup_extent ↵Filipe Manana
event class While running checkpatch.pl against a patch that modifies the btrfs_qgroup_extent event class, it complained about using a comma instead of a semicolon: $ ./scripts/checkpatch.pl qgroups/0003-btrfs-qgroups-remove-bytenr-field-from-struct-btrfs_.patch WARNING: Possible comma where semicolon could be used #215: FILE: include/trace/events/btrfs.h:1720: + __entry->bytenr = bytenr, __entry->num_bytes = rec->num_bytes; total: 0 errors, 1 warnings, 184 lines checked So replace the comma with a semicolon to silence checkpatch and possibly other tools. It also makes the code consistent with the rest. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: drop the backref cache during relocation if we commitJosef Bacik
Since the inception of relocation we have maintained the backref cache across transaction commits, updating the backref cache with the new bytenr whenever we COWed blocks that were in the cache, and then updating their bytenr once we detected a transaction id change. This works as long as we're only ever modifying blocks, not changing the structure of the tree. However relocation does in fact change the structure of the tree. For example, if we are relocating a data extent, we will look up all the leaves that point to this data extent. We will then call do_relocation() on each of these leaves, which will COW down to the leaf and then update the file extent location. But, a key feature of do_relocation() is the pending list. This is all the pending nodes that we modified when we updated the file extent item. We will then process all of these blocks via finish_pending_nodes, which calls do_relocation() on all of the nodes that led up to that leaf. The purpose of this is to make sure we don't break sharing unless we absolutely have to. Consider the case that we have 3 snapshots that all point to this leaf through the same nodes, the initial COW would have created a whole new path. If we did this for all 3 snapshots we would end up with 3x the number of nodes we had originally. To avoid this we will cycle through each of the snapshots that point to each of these nodes and update their pointers to point at the new nodes. Once we update the pointer to the new node we will drop the node we removed the link for and all of its children via btrfs_drop_subtree(). This is essentially just btrfs_drop_snapshot(), but for an arbitrary point in the snapshot. The problem with this is that we will never reflect this in the backref cache. If we do this btrfs_drop_snapshot() for a node that is in the backref tree, we will leave the node in the backref tree. This becomes a problem when we change the transid, as now the backref cache has entire subtrees that no longer exist, but exist as if they still are pointed to by the same roots. In the best case scenario you end up with "adding refs to an existing tree ref" errors from insert_inline_extent_backref(), where we attempt to link in nodes on roots that are no longer valid. Worst case you will double free some random block and re-use it when there's still references to the block. This is extremely subtle, and the consequences are quite bad. There isn't a way to make sure our backref cache is consistent between transid's. In order to fix this we need to simply evict the entire backref cache anytime we cross transid's. This reduces performance in that we have to rebuild this backref cache every time we change transid's, but fixes the bug. This has existed since relocation was added, and is a pretty critical bug. There's a lot more cleanup that can be done now that this functionality is going away, but this patch is as small as possible in order to fix the problem and make it easy for us to backport it to all the kernels it needs to be backported to. Followup series will dismantle more of this code and simplify relocation drastically to remove this functionality. We have a reproducer that reproduced the corruption within a few minutes of running. With this patch it survives several iterations/hours of running the reproducer. Fixes: 3fd0a5585eb9 ("Btrfs: Metadata ENOSPC handling for balance") CC: stable@vger.kernel.org Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: also add stripe entries for NOCOW writesJohannes Thumshirn
NOCOW writes do not generate stripe_extent entries in the RAID stripe tree, as the RAID stripe-tree feature initially was designed with a zoned filesystem in mind and on a zoned filesystem, we do not allow NOCOW writes. But the RAID stripe-tree feature is independent from the zoned feature, so we must also do NOCOW writes for RAID stripe-tree filesystems. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01btrfs: send: fix buffer overflow detection when copying path to cache entryFilipe Manana
Starting with commit c0247d289e73 ("btrfs: send: annotate struct name_cache_entry with __counted_by()") we annotated the variable length array "name" from the name_cache_entry structure with __counted_by() to improve overflow detection. However that alone was not correct, because the length of that array does not match the "name_len" field - it matches that plus 1 to include the NUL string terminator, so that makes a fortified kernel think there's an overflow and report a splat like this: strcpy: detected buffer overflow: 20 byte write of buffer size 19 WARNING: CPU: 3 PID: 3310 at __fortify_report+0x45/0x50 CPU: 3 UID: 0 PID: 3310 Comm: btrfs Not tainted 6.11.0-prnet #1 Hardware name: CompuLab Ltd. sbc-ihsw/Intense-PC2 (IPC2), BIOS IPC2_3.330.7 X64 03/15/2018 RIP: 0010:__fortify_report+0x45/0x50 Code: 48 8b 34 (...) RSP: 0018:ffff97ebc0d6f650 EFLAGS: 00010246 RAX: 7749924ef60fa600 RBX: ffff8bf5446a521a RCX: 0000000000000027 RDX: 00000000ffffdfff RSI: ffff97ebc0d6f548 RDI: ffff8bf84e7a1cc8 RBP: ffff8bf548574080 R08: ffffffffa8c40e10 R09: 0000000000005ffd R10: 0000000000000004 R11: ffffffffa8c70e10 R12: ffff8bf551eef400 R13: 0000000000000000 R14: 0000000000000013 R15: 00000000000003a8 FS: 00007fae144de8c0(0000) GS:ffff8bf84e780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fae14691690 CR3: 00000001027a2003 CR4: 00000000001706f0 Call Trace: <TASK> ? __warn+0x12a/0x1d0 ? __fortify_report+0x45/0x50 ? report_bug+0x154/0x1c0 ? handle_bug+0x42/0x70 ? exc_invalid_op+0x1a/0x50 ? asm_exc_invalid_op+0x1a/0x20 ? __fortify_report+0x45/0x50 __fortify_panic+0x9/0x10 __get_cur_name_and_parent+0x3bc/0x3c0 get_cur_path+0x207/0x3b0 send_extent_data+0x709/0x10d0 ? find_parent_nodes+0x22df/0x25d0 ? mas_nomem+0x13/0x90 ? mtree_insert_range+0xa5/0x110 ? btrfs_lru_cache_store+0x5f/0x1e0 ? iterate_extent_inodes+0x52d/0x5a0 process_extent+0xa96/0x11a0 ? __pfx_lookup_backref_cache+0x10/0x10 ? __pfx_store_backref_cache+0x10/0x10 ? __pfx_iterate_backrefs+0x10/0x10 ? __pfx_check_extent_item+0x10/0x10 changed_cb+0x6fa/0x930 ? tree_advance+0x362/0x390 ? memcmp_extent_buffer+0xd7/0x160 send_subvol+0xf0a/0x1520 btrfs_ioctl_send+0x106b/0x11d0 ? __pfx___clone_root_cmp_sort+0x10/0x10 _btrfs_ioctl_send+0x1ac/0x240 btrfs_ioctl+0x75b/0x850 __se_sys_ioctl+0xca/0x150 do_syscall_64+0x85/0x160 ? __count_memcg_events+0x69/0x100 ? handle_mm_fault+0x1327/0x15c0 ? __se_sys_rt_sigprocmask+0xf1/0x180 ? syscall_exit_to_user_mode+0x75/0xa0 ? do_syscall_64+0x91/0x160 ? do_user_addr_fault+0x21d/0x630 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fae145eeb4f Code: 00 48 89 (...) RSP: 002b:00007ffdf1cb09b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fae145eeb4f RDX: 00007ffdf1cb0ad0 RSI: 0000000040489426 RDI: 0000000000000004 RBP: 00000000000078fe R08: 00007fae144006c0 R09: 00007ffdf1cb0927 R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffdf1cb1ce8 R13: 0000000000000003 R14: 000055c499fab2e0 R15: 0000000000000004 </TASK> Fix this by not storing the NUL string terminator since we don't actually need it for name cache entries, this way "name_len" corresponds to the actual size of the "name" array. This requires marking the "name" array field with __nonstring and using memcpy() instead of strcpy() as recommended by the guidelines at: https://github.com/KSPP/linux/issues/90 Reported-by: David Arendt <admin@prnet.org> Link: https://lore.kernel.org/linux-btrfs/cee4591a-3088-49ba-99b8-d86b4242b8bd@prnet.org/ Fixes: c0247d289e73 ("btrfs: send: annotate struct name_cache_entry with __counted_by()") CC: stable@vger.kernel.org # 6.11 Tested-by: David Arendt <admin@prnet.org> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-10-01Documentation: add missing folio_queue entryChristian Brauner
Add missing folio_queue entry. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20241001133920.6e28637b@canb.auug.org.au Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01folio_queue: fix documentationChristian Brauner
s/folioq_count/folioq_full/ Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20241001134729.3f65ae78@canb.auug.org.au Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01Input: adp5589-keys - fix adp5589_gpio_get_value()Nuno Sa
The adp5589 seems to have the same behavior as similar devices as explained in commit 910a9f5636f5 ("Input: adp5588-keys - get value from data out when dir is out"). Basically, when the gpio is set as output we need to get the value from ADP5589_GPO_DATA_OUT_A register instead of ADP5589_GPI_STATUS_A. Fixes: 9d2e173644bb ("Input: ADP5589 - new driver for I2C Keypad Decoder and I/O Expander") Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20241001-b4-dev-adp5589-fw-conversion-v1-2-fca0149dfc47@analog.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-10-01Input: adp5589-keys - fix NULL pointer dereferenceNuno Sa
We register a devm action to call adp5589_clear_config() and then pass the i2c client as argument so that we can call i2c_get_clientdata() in order to get our device object. However, i2c_set_clientdata() is only being set at the end of the probe function which means that we'll get a NULL pointer dereference in case the probe function fails early. Fixes: 30df385e35a4 ("Input: adp5589-keys - use devm_add_action_or_reset() for register clear") Signed-off-by: Nuno Sa <nuno.sa@analog.com> Link: https://lore.kernel.org/r/20241001-b4-dev-adp5589-fw-conversion-v1-1-fca0149dfc47@analog.com Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-10-01netfs: Fix a KMSAN uninit-value error in netfs_clear_bufferChang Yu
Use folioq_count instead of folioq_nr_slots to fix a KMSAN uninit-value error in netfs_clear_buffer Signed-off-by: Chang Yu <marcus.yu.56@gmail.com> Link: https://lore.kernel.org/r/ZvuXWC2bYpvQsWgS@gmail.com Fixes: cd0277ed0c18 ("netfs: Use new folio_queue data type and iterator instead of xarray iter") Acked-by: David Howells <dhowells@redhat.com> Reported-by: syzbot+921873345a95f4dae7e9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=921873345a95f4dae7e9 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01ipv4: ip_gre: Fix drops of small packets in ipgre_xmitAnton Danilov
Regression Description: Depending on the options specified for the GRE tunnel device, small packets may be dropped. This occurs because the pskb_network_may_pull function fails due to the packet's insufficient length. For example, if only the okey option is specified for the tunnel device, original (before encapsulation) packets smaller than 28 bytes (including the IPv4 header) will be dropped. This happens because the required length is calculated relative to the network header, not the skb->head. Here is how the required length is computed and checked: * The pull_len variable is set to 28 bytes, consisting of: * IPv4 header: 20 bytes * GRE header with Key field: 8 bytes * The pskb_network_may_pull function adds the network offset, shifting the checkable space further to the beginning of the network header and extending it to the beginning of the packet. As a result, the end of the checkable space occurs beyond the actual end of the packet. Instead of ensuring that 28 bytes are present in skb->head, the function is requesting these 28 bytes starting from the network header. For small packets, this requested length exceeds the actual packet size, causing the check to fail and the packets to be dropped. This issue affects both locally originated and forwarded packets in DMVPN-like setups. How to reproduce (for local originated packets): ip link add dev gre1 type gre ikey 1.9.8.4 okey 1.9.8.4 \ local <your-ip> remote 0.0.0.0 ip link set mtu 1400 dev gre1 ip link set up dev gre1 ip address add 192.168.13.1/24 dev gre1 ip neighbor add 192.168.13.2 lladdr <remote-ip> dev gre1 ping -s 1374 -c 10 192.168.13.2 tcpdump -vni gre1 tcpdump -vni <your-ext-iface> 'ip proto 47' ip -s -s -d link show dev gre1 Solution: Use the pskb_may_pull function instead the pskb_network_may_pull. Fixes: 80d875cfc9d3 ("ipv4: ip_gre: Avoid skb_pull() failure in ipgre_xmit()") Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240924235158.106062-1-littlesmilingcloud@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01Revert "Input: Add driver for PixArt PS/2 touchpad"Dmitry Torokhov
This reverts commit 740ff03d7238214a318cdcfd96dec51832b053d2 because current PixArt detection is too greedy and claims devices that are not PixArt. Reported-by: Benjamin Tissoires <bentiss@kernel.org> Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2314756 Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-10-01net: stmmac: dwmac4: extend timeout for VLAN Tag register busy bit checkShenwei Wang
Increase the timeout for checking the busy bit of the VLAN Tag register from 10µs to 500ms. This change is necessary to accommodate scenarios where Energy Efficient Ethernet (EEE) is enabled. Overnight testing revealed that when EEE is active, the busy bit can remain set for up to approximately 300ms. The new 500ms timeout provides a safety margin. Fixes: ed64639bc1e0 ("net: stmmac: Add support for VLAN Rx filtering") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Link: https://patch.msgid.link/20240924205424.573913-1-shenwei.wang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01iov_iter: fix advancing slot in iter_folioq_get_pages()Omar Sandoval
iter_folioq_get_pages() decides to advance to the next folioq slot when it has reached the end of the current folio. However, it is checking offset, which is the beginning of the current part, instead of iov_offset, which is adjusted to the end of the current part, so it doesn't advance the slot when it's supposed to. As a result, on the next iteration, we'll use the same folio with an out-of-bounds offset and return an unrelated page. This manifested as various crashes and other failures in 9pfs in drgn's VM testing setup and BPF CI. Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios") Link: https://lore.kernel.org/linux-fsdevel/20240923183432.1876750-1-chantr4@gmail.com/ Tested-by: Manu Bretelle <chantr4@gmail.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Link: https://lore.kernel.org/r/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com Tested-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Leon Romanovsky <leon@kernel.org> Tested-by: Joey Gouly <joey.gouly@arm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-01Merge branch 'net-two-fixes-for-qdisc_pkt_len_init'Paolo Abeni
Eric Dumazet says: ==================== net: two fixes for qdisc_pkt_len_init() Inspired by one syzbot report. At least one qdisc (fq_codel) depends on qdisc_skb_cb(skb)->pkt_len having a sane value (not zero) With the help of af_packet, syzbot was able to fool qdisc_pkt_len_init() to precisely set qdisc_skb_cb(skb)->pkt_len to zero. First patch fixes this issue. Second one (a separate one to help future bisections) adds more sanity check to SKB_GSO_DODGY users. ==================== Link: https://patch.msgid.link/20240924150257.1059524-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: add more sanity checks to qdisc_pkt_len_init()Eric Dumazet
One path takes care of SKB_GSO_DODGY, assuming skb->len is bigger than hdr_len. virtio_net_hdr_to_skb() does not fully dissect TCP headers, it only make sure it is at least 20 bytes. It is possible for an user to provide a malicious 'GSO' packet, total length of 80 bytes. - 20 bytes of IPv4 header - 60 bytes TCP header - a small gso_size like 8 virtio_net_hdr_to_skb() would declare this packet as a normal GSO packet, because it would see 40 bytes of payload, bigger than gso_size. We need to make detect this case to not underflow qdisc_skb_cb(skb)->pkt_len. Fixes: 1def9238d4aa ("net_sched: more precise pkt_len computation") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: avoid potential underflow in qdisc_pkt_len_init() with UFOEric Dumazet
After commit 7c6d2ecbda83 ("net: be more gentle about silly gso requests coming from user") virtio_net_hdr_to_skb() had sanity check to detect malicious attempts from user space to cook a bad GSO packet. Then commit cf9acc90c80ec ("net: virtio_net_hdr_to_skb: count transport header in UFO") while fixing one issue, allowed user space to cook a GSO packet with the following characteristic : IPv4 SKB_GSO_UDP, gso_size=3, skb->len = 28. When this packet arrives in qdisc_pkt_len_init(), we end up with hdr_len = 28 (IPv4 header + UDP header), matching skb->len Then the following sets gso_segs to 0 : gso_segs = DIV_ROUND_UP(skb->len - hdr_len, shinfo->gso_size); Then later we set qdisc_skb_cb(skb)->pkt_len to back to zero :/ qdisc_skb_cb(skb)->pkt_len += (gso_segs - 1) * hdr_len; This leads to the following crash in fq_codel [1] qdisc_pkt_len_init() is best effort, we only want an estimation of the bytes sent on the wire, not crashing the kernel. This patch is fixing this particular issue, a following one adds more sanity checks for another potential bug. [1] [ 70.724101] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 70.724561] #PF: supervisor read access in kernel mode [ 70.724561] #PF: error_code(0x0000) - not-present page [ 70.724561] PGD 10ac61067 P4D 10ac61067 PUD 107ee2067 PMD 0 [ 70.724561] Oops: Oops: 0000 [#1] SMP NOPTI [ 70.724561] CPU: 11 UID: 0 PID: 2163 Comm: b358537762 Not tainted 6.11.0-virtme #991 [ 70.724561] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 70.724561] RIP: 0010:fq_codel_enqueue (net/sched/sch_fq_codel.c:120 net/sched/sch_fq_codel.c:168 net/sched/sch_fq_codel.c:230) sch_fq_codel [ 70.724561] Code: 24 08 49 c1 e1 06 44 89 7c 24 18 45 31 ed 45 31 c0 31 ff 89 44 24 14 4c 03 8b 90 01 00 00 eb 04 39 ca 73 37 4d 8b 39 83 c7 01 <49> 8b 17 49 89 11 41 8b 57 28 45 8b 5f 34 49 c7 07 00 00 00 00 49 All code ======== 0: 24 08 and $0x8,%al 2: 49 c1 e1 06 shl $0x6,%r9 6: 44 89 7c 24 18 mov %r15d,0x18(%rsp) b: 45 31 ed xor %r13d,%r13d e: 45 31 c0 xor %r8d,%r8d 11: 31 ff xor %edi,%edi 13: 89 44 24 14 mov %eax,0x14(%rsp) 17: 4c 03 8b 90 01 00 00 add 0x190(%rbx),%r9 1e: eb 04 jmp 0x24 20: 39 ca cmp %ecx,%edx 22: 73 37 jae 0x5b 24: 4d 8b 39 mov (%r9),%r15 27: 83 c7 01 add $0x1,%edi 2a:* 49 8b 17 mov (%r15),%rdx <-- trapping instruction 2d: 49 89 11 mov %rdx,(%r9) 30: 41 8b 57 28 mov 0x28(%r15),%edx 34: 45 8b 5f 34 mov 0x34(%r15),%r11d 38: 49 c7 07 00 00 00 00 movq $0x0,(%r15) 3f: 49 rex.WB Code starting with the faulting instruction =========================================== 0: 49 8b 17 mov (%r15),%rdx 3: 49 89 11 mov %rdx,(%r9) 6: 41 8b 57 28 mov 0x28(%r15),%edx a: 45 8b 5f 34 mov 0x34(%r15),%r11d e: 49 c7 07 00 00 00 00 movq $0x0,(%r15) 15: 49 rex.WB [ 70.724561] RSP: 0018:ffff95ae85e6fb90 EFLAGS: 00000202 [ 70.724561] RAX: 0000000002000000 RBX: ffff95ae841de000 RCX: 0000000000000000 [ 70.724561] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001 [ 70.724561] RBP: ffff95ae85e6fbf8 R08: 0000000000000000 R09: ffff95b710a30000 [ 70.724561] R10: 0000000000000000 R11: bdf289445ce31881 R12: ffff95ae85e6fc58 [ 70.724561] R13: 0000000000000000 R14: 0000000000000040 R15: 0000000000000000 [ 70.724561] FS: 000000002c5c1380(0000) GS:ffff95bd7fcc0000(0000) knlGS:0000000000000000 [ 70.724561] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 70.724561] CR2: 0000000000000000 CR3: 000000010c568000 CR4: 00000000000006f0 [ 70.724561] Call Trace: [ 70.724561] <TASK> [ 70.724561] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) [ 70.724561] ? page_fault_oops (arch/x86/mm/fault.c:715) [ 70.724561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539) [ 70.724561] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) [ 70.724561] ? fq_codel_enqueue (net/sched/sch_fq_codel.c:120 net/sched/sch_fq_codel.c:168 net/sched/sch_fq_codel.c:230) sch_fq_codel [ 70.724561] dev_qdisc_enqueue (net/core/dev.c:3784) [ 70.724561] __dev_queue_xmit (net/core/dev.c:3880 (discriminator 2) net/core/dev.c:4390 (discriminator 2)) [ 70.724561] ? irqentry_enter (kernel/entry/common.c:237) [ 70.724561] ? sysvec_apic_timer_interrupt (./arch/x86/include/asm/hardirq.h:74 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2) arch/x86/kernel/apic/apic.c:1043 (discriminator 2)) [ 70.724561] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4)) [ 70.724561] ? asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:702) [ 70.724561] ? virtio_net_hdr_to_skb.constprop.0 (./include/linux/virtio_net.h:129 (discriminator 1)) [ 70.724561] packet_sendmsg (net/packet/af_packet.c:3145 (discriminator 1) net/packet/af_packet.c:3177 (discriminator 1)) [ 70.724561] ? _raw_spin_lock_bh (./arch/x86/include/asm/atomic.h:107 (discriminator 4) ./include/linux/atomic/atomic-arch-fallback.h:2170 (discriminator 4) ./include/linux/atomic/atomic-instrumented.h:1302 (discriminator 4) ./include/asm-generic/qspinlock.h:111 (discriminator 4) ./include/linux/spinlock.h:187 (discriminator 4) ./include/linux/spinlock_api_smp.h:127 (discriminator 4) kernel/locking/spinlock.c:178 (discriminator 4)) [ 70.724561] ? netdev_name_node_lookup_rcu (net/core/dev.c:325 (discriminator 1)) [ 70.724561] __sys_sendto (net/socket.c:730 (discriminator 1) net/socket.c:745 (discriminator 1) net/socket.c:2210 (discriminator 1)) [ 70.724561] ? __sys_setsockopt (./include/linux/file.h:34 net/socket.c:2355) [ 70.724561] __x64_sys_sendto (net/socket.c:2222 (discriminator 1) net/socket.c:2218 (discriminator 1) net/socket.c:2218 (discriminator 1)) [ 70.724561] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1)) [ 70.724561] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130) [ 70.724561] RIP: 0033:0x41ae09 Fixes: cf9acc90c80ec ("net: virtio_net_hdr_to_skb: count transport header in UFO") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jonathan Davies <jonathan.davies@nutanix.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jonathan Davies <jonathan.davies@nutanix.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: ethernet: ti: cpsw_ale: Fix warning on some platformsRoger Quadros
The number of register fields cannot be assumed to be ALE_FIELDS_MAX as some platforms can have lesser fields. Solve this by embedding the actual number of fields available in platform data and use that instead of ALE_FIELDS_MAX. Gets rid of the below warning on BeagleBone Black [ 1.007735] WARNING: CPU: 0 PID: 33 at drivers/base/regmap/regmap.c:1208 regmap_field_init+0x88/0x9c [ 1.007802] invalid empty mask defined [ 1.007812] Modules linked in: [ 1.007842] CPU: 0 UID: 0 PID: 33 Comm: kworker/u4:3 Not tainted 6.11.0-01459-g508403ab7b74-dirty #840 [ 1.007867] Hardware name: Generic AM33XX (Flattened Device Tree) [ 1.007890] Workqueue: events_unbound deferred_probe_work_func [ 1.007935] Call trace: [ 1.007957] unwind_backtrace from show_stack+0x10/0x14 [ 1.007999] show_stack from dump_stack_lvl+0x50/0x64 [ 1.008033] dump_stack_lvl from __warn+0x70/0x124 [ 1.008077] __warn from warn_slowpath_fmt+0x194/0x1a8 [ 1.008113] warn_slowpath_fmt from regmap_field_init+0x88/0x9c [ 1.008154] regmap_field_init from devm_regmap_field_alloc+0x48/0x64 [ 1.008193] devm_regmap_field_alloc from cpsw_ale_create+0xfc/0x320 [ 1.008251] cpsw_ale_create from cpsw_init_common+0x214/0x354 [ 1.008286] cpsw_init_common from cpsw_probe+0x4ac/0xb88 Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Closes: https://lore.kernel.org/netdev/CAMuHMdUf-tKRDzkz2_m8qdFTFutefddU0NTratVrEjRTzA3yQQ@mail.gmail.com/ Fixes: 11cbcfeaa79e ("net: ethernet: ti: cpsw_ale: use regfields for number of Entries and Policers") Signed-off-by: Roger Quadros <rogerq@kernel.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240924-am65-cpsw-multi-rx-fix-v1-1-0ca3fa9a1398@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: microchip: Make FDMA config symbol invisibleGeert Uytterhoeven
There is no need to ask the user about enabling Microchip FDMA functionality, as all drivers that use it select the FDMA symbol. Hence make the symbol invisible, unless when compile-testing. Fixes: 30e48a75df9c6ead ("net: microchip: add FDMA library") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Daniel Machon <daniel.machon@microchip.com> Link: https://patch.msgid.link/8e2bcd8899c417a962b7ee3f75b29f35b25d7933.1727171879.git.geert+renesas@glider.be Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: fec: Reload PTP registers after link-state changeCsókás, Bence
On link-state change, the controller gets reset, which clears all PTP registers, including PHC time, calibrated clock correction values etc. For correct IEEE 1588 operation we need to restore these after the reset. Fixes: 6605b730c061 ("FEC: Add time stamping code and a PTP hardware clock") Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20240924093705.2897329-2-csokas.bence@prolan.hu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: fec: Restart PPS after link state changeCsókás, Bence
On link state change, the controller gets reset, causing PPS to drop out. Re-enable PPS if it was enabled before the controller reset. Fixes: 6605b730c061 ("FEC: Add time stamping code and a PTP hardware clock") Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> Link: https://patch.msgid.link/20240924093705.2897329-1-csokas.bence@prolan.hu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: pcs: xpcs: fix the wrong register that was written backJiawen Wu
The value is read from the register TXGBE_RX_GEN_CTL3, and it should be written back to TXGBE_RX_GEN_CTL3 when it changes some fields. Cc: stable@vger.kernel.org Fixes: f629acc6f210 ("net: pcs: xpcs: support to switch mode for Wangxun NICs") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reported-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20240924022857.865422-1-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: ethernet: lantiq_etop: fix memory disclosureAleksander Jan Bajkowski
When applying padding, the buffer is not zeroed, which results in memory disclosure. The mentioned data is observed on the wire. This patch uses skb_put_padto() to pad Ethernet frames properly. The mentioned function zeroes the expanded buffer. In case the packet cannot be padded it is silently dropped. Statistics are also not incremented. This driver does not support statistics in the old 32-bit format or the new 64-bit format. These will be added in the future. In its current form, the patch should be easily backported to stable versions. Ethernet MACs on Amazon-SE and Danube cannot do padding of the packets in hardware, so software padding must be applied. Fixes: 504d4721ee8e ("MIPS: Lantiq: Add ethernet driver") Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://patch.msgid.link/20240923214949.231511-2-olek2@wp.pl Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: Fix gso_features_check to check for both dev->gso_{ipv4_,}max_sizeDaniel Borkmann
Commit 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") added a dev->gso_max_size test to gso_features_check() in order to fall back to GSO when needed. This was added as it was noticed that some drivers could misbehave if TSO packets get too big. However, the check doesn't respect dev->gso_ipv4_max_size limit. For instance, a device could be configured with BIG TCP for IPv4, but not IPv6. Therefore, add a netif_get_gso_max_size() equivalent to netif_get_gro_max_size() and use the helper to respect both limits before falling back to GSO engine. Fixes: 24ab059d2ebd ("net: check dev->gso_max_size in gso_features_check()") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240923212242.15669-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: Add netif_get_gro_max_size helper for GRODaniel Borkmann
Add a small netif_get_gro_max_size() helper which returns the maximum IPv4 or IPv6 GRO size of the netdevice. We later add a netif_get_gso_max_size() equivalent as well for GSO, so that these helpers can be used consistently instead of open-coded checks. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240923212242.15669-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-01net: dsa: improve shutdown sequenceVladimir Oltean
Alexander Sverdlin presents 2 problems during shutdown with the lan9303 driver. One is specific to lan9303 and the other just happens to reproduce there. The first problem is that lan9303 is unique among DSA drivers in that it calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown, not remove): phy_state_machine() -> ... -> dsa_user_phy_read() -> ds->ops->phy_read() -> lan9303_phy_read() -> chip->ops->phy_read() -> lan9303_mdio_phy_read() -> dev_get_drvdata() But we never stop the phy_state_machine(), so it may continue to run after dsa_switch_shutdown(). Our common pattern in all DSA drivers is to set drvdata to NULL to suppress the remove() method that may come afterwards. But in this case it will result in an NPD. The second problem is that the way in which we set dp->conduit->dsa_ptr = NULL; is concurrent with receive packet processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL, but afterwards, rather than continuing to use that non-NULL value, dev->dsa_ptr is dereferenced again and again without NULL checks: dsa_conduit_find_user() and many other places. In between dereferences, there is no locking to ensure that what was valid once continues to be valid. Both problems have the common aspect that closing the conduit interface solves them. In the first case, dev_close(conduit) triggers the NETDEV_GOING_DOWN event in dsa_user_netdevice_event() which closes user ports as well. dsa_port_disable_rt() calls phylink_stop(), which synchronously stops the phylink state machine, and ds->ops->phy_read() will thus no longer call into the driver after this point. In the second case, dev_close(conduit) should do this, as per Documentation/networking/driver.rst: | Quiescence | ---------- | | After the ndo_stop routine has been called, the hardware must | not receive or transmit any data. All in flight packets must | be aborted. If necessary, poll or wait for completion of | any reset commands. So it should be sufficient to ensure that later, when we zeroize conduit->dsa_ptr, there will be no concurrent dsa_switch_rcv() call on this conduit. The addition of the netif_device_detach() function is to ensure that ioctls, rtnetlinks and ethtool requests on the user ports no longer propagate down to the driver - we're no longer prepared to handle them. The race condition actually did not exist when commit 0650bf52b31f ("net: dsa: be compatible with masters which unregister on shutdown") first introduced dsa_switch_shutdown(). It was created later, when we stopped unregistering the user interfaces from a bad spot, and we just replaced that sequence with a racy zeroization of conduit->dsa_ptr (one which doesn't ensure that the interfaces aren't up). Reported-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Closes: https://lore.kernel.org/netdev/2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com/ Closes: https://lore.kernel.org/netdev/c1bf4de54e829111e0e4a70e7bd1cf523c9550ff.camel@siemens.com/ Fixes: ee534378f005 ("net: dsa: fix panic when DSA master device unbinds on shutdown") Reviewed-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Tested-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20240913203549.3081071-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-09-30cifs: Remove intermediate object of failed create reparse callPali Rohár
If CREATE was successful but SMB2_OP_SET_REPARSE failed then remove the intermediate object created by CREATE. Otherwise empty object stay on the server when reparse call failed. This ensures that if the creating of special files is unsupported by the server then no empty file stay on the server as a result of unsupported operation. Fixes: 102466f303ff ("smb: client: allow creating special files via reparse points") Signed-off-by: Pali Rohár <pali@kernel.org> Acked-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-30Revert "smb: client: make SHA-512 TFM ephemeral"Steve French
The original patch causes a crash with signed mounts when using the SMB2.1 dialect RIP: 0010:smb2_calc_signature+0x10e/0x460 [cifs] Code: 46 30 00 00 00 00 49 c7 46 38 00 00 00 00 0f 85 3e 01 00 00 48 8b 83 a8 02 00 00 48 89 85 68 ff ff ff 49 8b b4 24 58 01 00 00 <48> 8b 38 ba 10 00 00 00 e8 55 0f 0c e0 41 89 c7 85 c0 0f 85 44 01 RSP: 0018:ffffb349422fb5c8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff98028765b800 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff980200f2b100 RDI: 0000000000000000 RBP: ffffb349422fb680 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff980235e37800 R13: ffffb349422fb900 R14: ffff98027c160700 R15: ffff98028765b820 FS: 000074139b98f780(0000) GS:ffff98097b980000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000011cb78006 CR4: 00000000003726f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? show_regs+0x6c/0x80 ? __die+0x24/0x80 ? page_fault_oops+0x175/0x5c0 ? hrtimer_try_to_cancel.part.0+0x55/0xf0 ? do_user_addr_fault+0x4b2/0x870 ? exc_page_fault+0x85/0x1c0 ? asm_exc_page_fault+0x27/0x30 ? smb2_calc_signature+0x10e/0x460 [cifs] ? smb2_calc_signature+0xa7/0x460 [cifs] ? kmem_cache_alloc_noprof+0x101/0x300 smb2_sign_rqst+0xa2/0xe0 [cifs] smb2_setup_request+0x12d/0x240 [cifs] compound_send_recv+0x304/0x1220 [cifs] cifs_send_recv+0x22/0x40 [cifs] SMB2_tcon+0x2d9/0x8c0 [cifs] cifs_get_smb_ses+0x910/0xef0 [cifs] ? cifs_get_smb_ses+0x910/0xef0 [cifs] cifs_mount_get_session+0x6a/0x250 [cifs] Reported-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Suggested-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> This reverts commit 220d83b52c7d16ec3c168b82f4e6ce59c645f7ab.
2024-09-30Merge tag 'sched_ext-for-6.12-rc1-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - When sched_ext is in bypass mode (e.g. while disabling the BPF scheduler), it was using one DSQ to implement global FIFO scheduling as all it has to do is guaranteeing reasonable forward progress. On multi-socket machines, this can lead to live-lock conditions under certain workloads. Fixed by splitting the queue used for FIFO scheduling per NUMA node. This required several preparation patches. - Hotplug tests on powerpc could reliably trigger deadlock while enabling a BPF scheduler. This was caused by cpu_hotplug_lock nesting inside scx_fork_rwsem and then CPU hotplug path trying to fork a new thread while holding cpu_hotplug_lock. Fixed by restructuring locking in enable and disable paths so that the two locks are not coupled. This required several preparation patches which also fixed a couple other issues in the enable path. - A build fix for !CONFIG_SMP - Userspace tooling sync and updates * tag 'sched_ext-for-6.12-rc1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Remove redundant p->nr_cpus_allowed checker sched_ext: Decouple locks in scx_ops_enable() sched_ext: Decouple locks in scx_ops_disable_workfn() sched_ext: Add scx_cgroup_enabled to gate cgroup operations and fix scx_tg_online() sched_ext: Enable scx_ops_init_task() separately sched_ext: Fix SCX_TASK_INIT -> SCX_TASK_READY transitions in scx_ops_enable() sched_ext: Initialize in bypass mode sched_ext: Remove SCX_OPS_PREPPING sched_ext: Relocate check_hotplug_seq() call in scx_ops_enable() sched_ext: Use shorter slice while bypassing sched_ext: Split the global DSQ per NUMA node sched_ext: Relocate find_user_dsq() sched_ext: Allow only user DSQs for scx_bpf_consume(), scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new() scx_flatcg: Use a user DSQ for fallback instead of SCX_DSQ_GLOBAL tools/sched_ext: Receive misc updates from SCX repo sched_ext: Add __COMPAT helpers for features added during v6.12 devel cycle sched_ext: Build fix for !CONFIG_SMP
2024-09-30Merge tag 'probes-fixes-v6.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fix from Masami Hiramatsu: - uprobes: fix kernel info leak via "[uprobes]" vma Fix uprobes not to expose the uninitialized page for trampoline buffer to user space, which can leak kernel info. * tag 'probes-fixes-v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: uprobes: fix kernel info leak via "[uprobes]" vma
2024-09-30Merge tag 'vfs-6.12-rc2.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "afs: - Fix setting of the server responding flag - Remove unused struct afs_address_list and afs_put_address_list() function - Fix infinite loop because of unresponsive servers - Ensure that afs_retry_request() function is correctly added to the afs_req_ops netfs operations table netfs: - Fix netfs_folio tracepoint handling to handle NULL mappings - Add a missing folio_queue API documentation - Ensure that netfs_write_folio() correctly advances the iterator via iov_iter_advance() - Fix a dentry leak during concurrent cull and cookie lookup operations in cachefiles pidfs: - Correctly handle accessing another task's pid namespace" * tag 'vfs-6.12-rc2.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: netfs: Fix the netfs_folio tracepoint to handle NULL mapping netfs: Add folio_queue API documentation netfs: Advance iterator correctly rather than jumping it afs: Fix the setting of the server responding flag afs: Remove unused struct and function prototype afs: Fix possible infinite loop with unresponsive servers pidfs: check for valid pid namespace afs: Fix missing wire-up of afs_retry_request() cachefiles: fix dentry leak in cachefiles_open_file()
2024-09-30netfs: Fix the netfs_folio tracepoint to handle NULL mappingDavid Howells
Fix the netfs_folio tracepoint to handle folios that have a NULL mapping pointer. In such a case, just substitute a zero inode number. Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2917423.1727697556@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-30netfs: Add folio_queue API documentationDavid Howells
Add API documentation for folio_queue. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2912369.1727691281@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-doc@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-29bcachefs: rename version -> bversion for big endian buildsGuenter Roeck
Builds on big endian systems fail as follows. fs/bcachefs/bkey.h: In function 'bch2_bkey_format_add_key': fs/bcachefs/bkey.h:557:41: error: 'const struct bkey' has no member named 'bversion' The original commit only renamed the variable for little endian builds. Rename it for big endian builds as well to fix the problem. Fixes: cf49f8a8c277 ("bcachefs: rename version -> bversion") Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-09-29close_range(): fix the logics in descriptor table trimmingAl Viro
Cloning a descriptor table picks the size that would cover all currently opened files. That's fine for clone() and unshare(), but for close_range() there's an additional twist - we clone before we close, and it would be a shame to have close_range(3, ~0U, CLOSE_RANGE_UNSHARE) leave us with a huge descriptor table when we are not going to keep anything past stderr, just because some large file descriptor used to be open before our call has taken it out. Unfortunately, it had been dealt with in an inherently racy way - sane_fdtable_size() gets a "don't copy anything past that" argument (passed via unshare_fd() and dup_fd()), close_range() decides how much should be trimmed and passes that to unshare_fd(). The problem is, a range that used to extend to the end of descriptor table back when close_range() had looked at it might very well have stuff grown after it by the time dup_fd() has allocated a new files_struct and started to figure out the capacity of fdtable to be attached to that. That leads to interesting pathological cases; at the very least it's a QoI issue, since unshare(CLONE_FILES) is atomic in a sense that it takes a snapshot of descriptor table one might have observed at some point. Since CLOSE_RANGE_UNSHARE close_range() is supposed to be a combination of unshare(CLONE_FILES) with plain close_range(), ending up with a weird state that would never occur with unshare(2) is confusing, to put it mildly. It's not hard to get rid of - all it takes is passing both ends of the range down to sane_fdtable_size(). There we are under ->files_lock, so the race is trivially avoided. So we do the following: * switch close_files() from calling unshare_fd() to calling dup_fd(). * undo the calling convention change done to unshare_fd() in 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" * introduce struct fd_range, pass a pointer to that to dup_fd() and sane_fdtable_size() instead of "trim everything past that point" they are currently getting. NULL means "we are not going to be punching any holes"; NR_OPEN_MAX is gone. * make sane_fdtable_size() use find_last_bit() instead of open-coding it; it's easier to follow that way. * while we are at it, have dup_fd() report errors by returning ERR_PTR(), no need to use a separate int *errorp argument. Fixes: 60997c3d45d9 "close_range: add CLOSE_RANGE_UNSHARE" Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-09-30uprobes: fix kernel info leak via "[uprobes]" vmaOleg Nesterov
xol_add_vma() maps the uninitialized page allocated by __create_xol_area() into userspace. On some architectures (x86) this memory is readable even without VM_READ, VM_EXEC results in the same pgprot_t as VM_EXEC|VM_READ, although this doesn't really matter, debugger can read this memory anyway. Link: https://lore.kernel.org/all/20240929162047.GA12611@redhat.com/ Reported-by: Will Deacon <will@kernel.org> Fixes: d4b3b6384f98 ("uprobes/core: Allocate XOL slots for uprobes use") Cc: stable@vger.kernel.org Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2024-09-29smb: Update comments about some reparse point tagsPali Rohár
NFS-style reparse points are recognized only by the Windows NFS server 2012 and new. Windows 8 does not contain Windows NFS server, so these reparse points are not used on Windows 8. Reparse points with IO_REPARSE_TAG_AF_UNIX tag were primarily introduced for native Win32 AF_UNIX sockets and later were re-used by also by WSL: https://devblogs.microsoft.com/commandline/af_unix-comes-to-windows/ https://devblogs.microsoft.com/commandline/windowswsl-interop-with-af_unix/ Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-29cifs: Check for UTF-16 null codepoint in SFU symlink target locationPali Rohár
Check that read buffer of SFU symlink target location does not contain UTF-16 null codepoint (via UniStrnlen() call) because Linux cannot process symlink with null byte, it truncates everything in buffer after null byte. Fixes: cf2ce67345d6 ("cifs: Add support for reading SFU symlink location") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-09-29Linux 6.12-rc1v6.12-rc1Linus Torvalds
2024-09-29x86: kvm: fix build errorLinus Torvalds
The cpu_emergency_register_virt_callback() function is used unconditionally by the x86 kvm code, but it is declared (and defined) conditionally: #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD) void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback); ... leading to a build error when neither KVM_INTEL nor KVM_AMD support is enabled: arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’: arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration] 12517 | cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’: arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration] 12522 | cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix the build by defining empty helper functions the same way the old cpu_emergency_disable_virtualization() function was dealt with for the same situation. Maybe we could instead have made the call sites conditional, since the callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak fallback. I'll leave that to the kvm people to argue about, this at least gets the build going for that particular config. Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Kai Huang <kai.huang@intel.com> Cc: Chao Gao <chao.gao@intel.com> Cc: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-09-29Merge tag 'mailbox-v6.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox Pull mailbox updates from Jassi Brar: - fix kconfig dependencies (mhu-v3, omap2+) - use devie name instead of genereic imx_mu_chan as interrupt name (imx) - enable sa8255p and qcs8300 ipc controllers (qcom) - Fix timeout during suspend mode (bcm2835) - convert to use use of_property_match_string (mailbox) - enable mt8188 (mediatek) - use devm_clk_get_enabled helpers (spreadtrum) - fix device-id typo (rockchip) * tag 'mailbox-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: mailbox, remoteproc: omap2+: fix compile testing dt-bindings: mailbox: qcom-ipcc: Document QCS8300 IPCC dt-bindings: mailbox: qcom-ipcc: document the support for SA8255p dt-bindings: mailbox: mtk,adsp-mbox: Add compatible for MT8188 mailbox: Use of_property_match_string() instead of open-coding mailbox: bcm2835: Fix timeout during suspend mode mailbox: sprd: Use devm_clk_get_enabled() helpers mailbox: rockchip: fix a typo in module autoloading mailbox: imx: use device name in interrupt name mailbox: ARM_MHU_V3 should depend on ARM64
2024-09-29Merge tag 'i2c-for-6.12-rc1-additional_fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: - fix DesignWare driver ENABLE-ABORT sequence, ensuring ABORT can always be sent when needed - check for PCLK in the SynQuacer controller as an optional clock, allowing ACPI to directly provide the clock rate - KEBA driver Kconfig dependency fix - fix XIIC driver power suspend sequence * tag 'i2c-for-6.12-rc1-additional_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: xiic: Fix pm_runtime_set_suspended() with runtime pm enabled i2c: keba: I2C_KEBA should depend on KEBA_CP500 i2c: synquacer: Deal with optional PCLK correctly i2c: designware: fix controller is holding SCL low while ENABLE bit is disabled
2024-09-29Merge tag 'dma-mapping-6.12-2024-09-29' of ↵Linus Torvalds
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping fix from Christoph Hellwig: - handle chained SGLs in the new tracing code (Christoph Hellwig) * tag 'dma-mapping-6.12-2024-09-29' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: fix DMA API tracing for chained scatterlists
2024-09-29Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds
Pull more SCSI updates from James Bottomley: "These are mostly minor updates. There are two drivers (lpfc and mpi3mr) which missed the initial pull and a core change to retry a start/stop unit which affect suspend/resume" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits) scsi: lpfc: Update lpfc version to 14.4.0.5 scsi: lpfc: Support loopback tests with VMID enabled scsi: lpfc: Revise TRACE_EVENT log flag severities from KERN_ERR to KERN_WARNING scsi: lpfc: Ensure DA_ID handling completion before deleting an NPIV instance scsi: lpfc: Fix kref imbalance on fabric ndlps from dev_loss_tmo handler scsi: lpfc: Restrict support for 32 byte CDBs to specific HBAs scsi: lpfc: Update phba link state conditional before sending CMF_SYNC_WQE scsi: lpfc: Add ELS_RSP cmd to the list of WQEs to flush in lpfc_els_flush_cmd() scsi: mpi3mr: Update driver version to 8.12.0.0.50 scsi: mpi3mr: Improve wait logic while controller transitions to READY state scsi: mpi3mr: Update MPI Headers to revision 34 scsi: mpi3mr: Use firmware-provided timestamp update interval scsi: mpi3mr: Enhance the Enable Controller retry logic scsi: sd: Fix off-by-one error in sd_read_block_characteristics() scsi: pm8001: Do not overwrite PCI queue mapping scsi: scsi_debug: Remove a useless memset() scsi: pmcraid: Convert comma to semicolon scsi: sd: Retry START STOP UNIT commands scsi: mpi3mr: A performance fix scsi: ufs: qcom: Update MODE_MAX cfg_bw value ...
2024-09-29Merge tag 'bcachefs-2024-09-28' of git://evilpiepirate.org/bcachefsLinus Torvalds
Pull more bcachefs updates from Kent Overstreet: "Assorted minor syzbot fixes, and for bigger stuff: Fix two disk accounting rewrite bugs: - Disk accounting keys use the version field of bkey so that journal replay can tell which updates have been applied to the btree. This is set in the transaction commit path, after we've gotten our journal reservation (and our time ordering), but the BCH_TRANS_COMMIT_skip_accounting_apply flag that journal replay uses was incorrectly skipping this for new updates generated prior to journal replay. This fixes the underlying cause of an assertion pop in disk_accounting_read. - A couple of fixes for disk accounting + device removal. Checking if acocunting replicas entries were marked in the superblock was being done at the wrong point, when deltas in the journal could still zero them out, and then additionally we'd try to add a missing replicas entry to the superblock without checking if it referred to an invalid (removed) device. A whole slew of repair fixes: - fix infinite loop in propagate_key_to_snapshot_leaves(), this fixes an infinite loop when repairing a filesystem with many snapshots - fix incorrect transaction restart handling leading to occasional "fsck counted ..." warnings - fix warning in __bch2_fsck_err() for bkey fsck errors - check_inode() in fsck now correctly checks if the filesystem was clean - there shouldn't be pending logged ops if the fs was clean, we now check for this - remove_backpointer() doesn't remove a dirent that doesn't actually point to the inode - many more fsck errors are AUTOFIX" * tag 'bcachefs-2024-09-28' of git://evilpiepirate.org/bcachefs: (35 commits) bcachefs: check_subvol_path() now prints subvol root inode bcachefs: remove_backpointer() now checks if dirent points to inode bcachefs: dirent_points_to_inode() now warns on mismatch bcachefs: Fix lost wake up bcachefs: Check for logged ops when clean bcachefs: BCH_FS_clean_recovery bcachefs: Convert disk accounting BUG_ON() to WARN_ON() bcachefs: Fix BCH_TRANS_COMMIT_skip_accounting_apply bcachefs: Check for accounting keys with bversion=0 bcachefs: rename version -> bversion bcachefs: Don't delete unlinked inodes before logged op resume bcachefs: Fix BCH_SB_ERRS() so we can reorder bcachefs: Fix fsck warnings from bkey validation bcachefs: Move transaction commit path validation to as late as possible bcachefs: Fix disk accounting attempting to mark invalid replicas entry bcachefs: Fix unlocked access to c->disk_sb.sb in bch2_replicas_entry_validate() bcachefs: Fix accounting read + device removal bcachefs: bch_accounting_mode bcachefs: fix transaction restart handling in check_extents(), check_dirents() bcachefs: kill inode_walker_entry.seen_this_pos ...
2024-09-29Merge tag 'x86-urgent-2024-09-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Fix TDX MMIO #VE fault handling, and add two new Intel model numbers for 'Pantherlake' and 'Diamond Rapids'" * tag 'x86-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu: Add two Intel CPU model numbers x86/tdx: Fix "in-kernel MMIO" check
2024-09-29Merge tag 'locking-urgent-2024-09-29' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "lockdep: - Fix potential deadlock between lockdep and RCU (Zhiguo Niu) - Use str_plural() to address Coccinelle warning (Thorsten Blum) - Add debuggability enhancement (Luis Claudio R. Goncalves) static keys & calls: - Fix static_key_slow_dec() yet again (Peter Zijlstra) - Handle module init failure correctly in static_call_del_module() (Thomas Gleixner) - Replace pointless WARN_ON() in static_call_module_notify() (Thomas Gleixner) <linux/cleanup.h>: - Add usage and style documentation (Dan Williams) rwsems: - Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS (Waiman Long) atomic ops, x86: - Redeclare x86_32 arch_atomic64_{add,sub}() as void (Uros Bizjak) - Introduce the read64_nonatomic macro to x86_32 with cx8 (Uros Bizjak)" Signed-off-by: Ingo Molnar <mingo@kernel.org> * tag 'locking-urgent-2024-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Move is_rwsem_reader_owned() and rwsem_owner() under CONFIG_DEBUG_RWSEMS jump_label: Fix static_key_slow_dec() yet again static_call: Replace pointless WARN_ON() in static_call_module_notify() static_call: Handle module init failure correctly in static_call_del_module() locking/lockdep: Simplify character output in seq_line() lockdep: fix deadlock issue between lockdep and rcu lockdep: Use str_plural() to fix Coccinelle warning cleanup: Add usage and style documentation lockdep: suggest the fix for "lockdep bfs error:-1" on print_bfs_bug locking/atomic/x86: Redeclare x86_32 arch_atomic64_{add,sub}() as void locking/atomic/x86: Introduce the read64_nonatomic macro to x86_32 with cx8