linux.git - Linus' kernel tree

Age	Commit message	Author
2024-11-26	Merge tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds
	Pull vfs exportfs updates from Christian Brauner: "This contains work to bring NFS connectable file handles to userspace servers. The name_to_handle_at() system call is extended to encode connectable file handles. Such file handles can be resolved to an open file with a connected path. So far userspace NFS servers couldn't make use of this functionality even though the kernel does already support it. This is achieved by introducing a new flag for name_to_handle_at(). Similarly, the open_by_handle_at() system call is taught to understand connectable file handles explicitly created via name_to_handle_at()" * tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: open_by_handle_at() support for decoding "explicit connectable" file handles fs: name_to_handle_at() support for "explicit connectable" file handles fs: prepare for "explicit connectable" file handles
2024-11-26	Merge tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs	Linus Torvalds
	Pull pid_namespace rust bindings from Christian Brauner: "This contains my Rust bindings for pid namespaces needed for various rust drivers. Here's a description of the basic C semantics and how they are mapped to Rust. The pid namespace of a task doesn't ever change once the task is alive. An unshare(CLONE_NEWPID) or setns(fd_pidns/pidfd, CLONE_NEWPID) will not have an effect on the calling task's pid namespace. It will only affect the pid namespace of children created by the calling task. This invariant guarantees that after having acquired a reference to a task's pid namespace it will remain unchanged. When a task has exited and been reaped release_task() will be called. This will set the pid namespace of the task to NULL. So retrieving the pid namespace of a task that is dead will return NULL. Note that neither holding the RCU lock nor holding a reference count to the task will prevent release_task() from being called. In order to retrieve the pid namespace of a task the task_active_pid_ns() function can be used. There are two cases to consider: (1) retrieving the pid namespace of the current task (2) retrieving the pid namespace of a non-current task From system call context retrieving the pid namespace for case (1) is always safe and requires neither RCU locking nor a reference count to be held. Retrieving the pid namespace after release_task() for current will return NULL but no codepath like that is exposed to Rust. Retrieving the pid namespace from system call context for (2) requires RCU protection. Accessing a pid namespace outside of RCU protection requires a reference count that must've been acquired while holding the RCU lock. Note that accessing a non-current task means NULL can be returned as the non-current task could have already passed through release_task(). To retrieve (1) the current_pid_ns!() macro should be used. It ensures that the returned pid namespace cannot outlive the calling scope. The associated current_pid_ns() function should not be called directly as it could be abused to create an unbounded lifetime for the pid namespace. The current_pid_ns!() macro allows Rust to handle the common case of accessing current's pid namespace without RCU protection and without having to acquire a reference count. For (2) the task_get_pid_ns() method must be used. This will always acquire a reference on the pid namespace and will return an Option to force the caller to explicitly handle the case where pid namespace is None. Something that tends to be forgotten when doing the equivalent operation in C. Missing RCU primitives make it difficult to perform operations that are otherwise safe without holding a reference count as long as RCU protection is guaranteed. But it is not important currently. But we do want it in the future. Note that for (2) the required RCU protection around calling task_active_pid_ns() synchronizes against putting the last reference of the associated struct pid of task->thread_pid. The struct pid stored in that field is used to retrieve the pid namespace of the caller. When release_task() is called task->thread_pid will be NULLed and put_pid() on said struct pid will be delayed in free_pid() via call_rcu() allowing everyone with an RCU protected access to the struct pid acquired from task->thread_pid to finish" * tag 'vfs-6.13.rust.pid_namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: rust: add PidNamespace
2024-11-26	Merge tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux	Linus Torvalds
	Pull nfsd updates from Chuck Lever: "Jeff Layton contributed a scalability improvement to NFSD's NFSv4 backchannel session implementation. This improvement is intended to increase the rate at which NFSD can safely recall NFSv4 delegations from clients, to avoid the need to revoke them. Revoking requires a slow state recovery process. A wide variety of bug fixes and other incremental improvements make up the bulk of commits in this series. As always I am grateful to the NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle" * tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits) nfsd: allow for up to 32 callback session slots nfs_common: must not hold RCU while calling nfsd_file_put_local nfsd: get rid of include ../internal.h nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur NFSD: Add nfsd4_copy time-to-live NFSD: Add a laundromat reaper for async copy state NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD NFSD: Free async copy information in nfsd4_cb_offload_release() NFSD: Fix nfsd4_shutdown_copy() NFSD: Add a tracepoint to record canceled async COPY operations nfsd: make nfsd4_session->se_flags a bool nfsd: remove nfsd4_session->se_bchannel nfsd: make use of warning provided by refcount_t nfsd: Don't fail OP_SETCLIENTID when there are too many clients. svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init() xdrgen: Remove program_stat_to_errno() call sites xdrgen: Update the files included in client-side source code xdrgen: Remove check for "nfs_ok" in C templates xdrgen: Remove tracepoint call site ...
2024-11-26	Merge tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs	Linus Torvalds
	Pull f2fs updates from Jaegeuk Kim: "This series introduces a device aliasing feature where user can carve out partitions but reclaim the space back by deleting aliased file in root dir. In addition to that, there're numerous minor bug fixes in zoned device support, checkpoint=disable, extent cache management, fiemap, and lazytime mount option. The full list of noticeable changes can be found below. Enhancements: - introduce device aliasing file - add stats in debugfs to show multiple devices - add a sysfs node to limit max read extent count per-inode - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable - decrease spare area for pinned files for zoned devices Fixes: - Revert "f2fs: remove unreachable lazytime mount option parsing" - adjust unusable cap before checkpoint=disable mode - fix to drop all discards after creating snapshot on lvm device - fix to shrink read extent node in batches - fix changing cursegs if recovery fails on zoned device - fix to adjust appropriate length for fiemap - fix fiemap failure issue when page size is 16KB - fix to avoid forcing direct write to use buffered IO on inline_data inode - fix to map blocks correctly for direct write - fix to account dirty data in __get_secs_required() - fix null-ptr-deref in f2fs_submit_page_bio() - fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks" * tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: fix to drop all discards after creating snapshot on lvm device f2fs: add a sysfs node to limit max read extent count per-inode f2fs: fix to shrink read extent node in batches f2fs: print message if fscorrupted was found in f2fs_new_node_page() f2fs: clear SBI_POR_DOING before initing inmem curseg f2fs: fix changing cursegs if recovery fails on zoned device f2fs: adjust unusable cap before checkpoint=disable mode f2fs: fix to requery extent which cross boundary of inquiry f2fs: fix to adjust appropriate length for fiemap f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK} f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow f2fs: replace deprecated strcpy with strscpy Revert "f2fs: remove unreachable lazytime mount option parsing" f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode f2fs: fix to map blocks correctly for direct write f2fs: fix race in concurrent f2fs_stop_gc_thread f2fs: fix fiemap failure issue when page size is 16KB f2fs: remove redundant atomic file check in defragment f2fs: fix to convert log type to segment data type correctly f2fs: clean up the unused variable additional_reserved_segments ...
2024-11-26	Merge tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse	Linus Torvalds
	Pull fuse updates from Miklos Szeredi: - Add page -> folio conversions (Joanne Koong, Josef Bacik) - Allow max size of fuse requests to be configurable with a sysctl (Joanne Koong) - Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun) - Fix large kernel reads (like a module load) in virtio_fs (Hou Tao) - Fix attribute inconsistency in case readdirplus (and plain lookup in corner cases) is racing with inode eviction (Zhang Tianci) - Fix a WARN_ON triggered by virtio_fs (Asahi Lina) * tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits) virtiofs: dax: remove ->writepages() callback fuse: check attributes staleness on fuse_iget() fuse: remove pages for requests and exclusively use folios fuse: convert direct io to use folios mm/writeback: add folio_mark_dirty_lock() fuse: convert writebacks to use folios fuse: convert retrieves to use folios fuse: convert ioctls to use folios fuse: convert writes (non-writeback) to use folios fuse: convert reads to use folios fuse: convert readdir to use folios fuse: convert readlink to use folios fuse: convert cuse to use folios fuse: add support in virtio for requests using folios fuse: support folios in struct fuse_args_pages and fuse_copy_pages() fuse: convert fuse_notify_store to use folios fuse: convert fuse_retrieve to use folios fuse: use the folio based vmstat helpers fuse: convert fuse_writepage_need_send to take a folio fuse: convert fuse_do_readpage to use folios ...
2024-11-26	Merge tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2	Linus Torvalds
	Pull gfs2 updates from Andreas Gruenbacher: - Fix the code that cleans up left-over unlinked files. Various fixes and minor improvements in deleting files cached or held open remotely. - Simplify the use of dlm's DLM_LKF_QUECVT flag. - A few other minor cleanups. * tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (21 commits) gfs2: Prevent inode creation race gfs2: Only defer deletes when we have an iopen glock gfs2: Simplify DLM_LKF_QUECVT use gfs2: gfs2_evict_inode clarification gfs2: Make gfs2_inode_refresh static gfs2: Use get_random_u32 in gfs2_orlov_skip gfs2: Randomize GLF_VERIFY_DELETE work delay gfs2: Use mod_delayed_work in gfs2_queue_try_to_evict gfs2: Update to the evict / remote delete documentation gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode gfs2: Clean up delete work processing gfs2: Minor delete_work_func cleanup gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glock gfs2: Rename dinode_demise to evict_behavior gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETE gfs2: Faster gfs2_upgrade_iopen_glock wakeups KMSAN: uninit-value in inode_go_dump (5) gfs2: Fix unlinked inode cleanup gfs2: Allow immediate GLF_VERIFY_DELETE work gfs2: Initialize gl_no_formal_ino earlier ...
2024-11-26	RISC-V: Remove unnecessary include from compat.h	Palmer Dabbelt
	Without this I get a bunch of build errors like In file included from ./include/linux/sched/task_stack.h:12, from ./arch/riscv/include/asm/compat.h:12, from ./arch/riscv/include/asm/pgtable.h:115, from ./include/linux/pgtable.h:6, from ./include/linux/mm.h:30, from arch/riscv/kernel/asm-offsets.c:8: ./include/linux/kasan.h:50:37: error: ‘MAX_PTRS_PER_PTE’ undeclared here (not in a function); did you mean ‘PTRS_PER_PTE’? 50 | extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PTE ./include/linux/kasan.h:51:8: error: unknown type name ‘pmd_t’; did you mean ‘pgd_t’? 51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD]; | ^~~~~ | pgd_t ./include/linux/kasan.h:51:37: error: ‘MAX_PTRS_PER_PMD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’? 51 | extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PGD ./include/linux/kasan.h:52:8: error: unknown type name ‘pud_t’; did you mean ‘pgd_t’? 52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD]; | ^~~~~ | pgd_t ./include/linux/kasan.h:52:37: error: ‘MAX_PTRS_PER_PUD’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’? 52 | extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PGD ./include/linux/kasan.h:53:8: error: unknown type name ‘p4d_t’; did you mean ‘pgd_t’? 53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D]; | ^~~~~ | pgd_t ./include/linux/kasan.h:53:37: error: ‘MAX_PTRS_PER_P4D’ undeclared here (not in a function); did you mean ‘PTRS_PER_PGD’? 53 | extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D]; | ^~~~~~~~~~~~~~~~ | PTRS_PER_PGD Link: https://lore.kernel.org/r/20241126143250.29708-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-11-26	Merge branch 'ovl.fixes'	Christian Brauner
	Bring in an overlayfs fix for v6.13-rc1 that fixes a bug introduced by the overlayfs changes merged for v6.13. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-26	fs/backing_file: fix wrong argument in callback	Amir Goldstein
	Commit 48b50624aec4 ("backing-file: clean up the API") unintentionally changed the argument in the ->accessed() callback from the user file to the backing file. Fixes: 48b50624aec4 ("backing-file: clean up the API") Reported-by: syzbot+8d1206605b05ca9a0e6a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-unionfs/67447b3c.050a0220.1cc393.0085.GAE@google.com/ Tested-by: syzbot+8d1206605b05ca9a0e6a@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20241126145342.364869-1-amir73il@gmail.com Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-26	Bluetooth: SCO: remove the redundant sco_conn_put	Edward Adam Davis
	When adding conn, it is necessary to increase and retain the conn reference count at the same time. Another problem was fixed along the way, conn_put is missing when hcon is NULL in the timeout routine. Fixes: e6720779ae61 ("Bluetooth: SCO: Use kref to track lifetime of sco_conn") Reported-and-tested-by: syzbot+489f78df4709ac2bfdd3@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=489f78df4709ac2bfdd3 Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-26	Bluetooth: MGMT: Fix possible deadlocks	Luiz Augusto von Dentz
	This fixes possible deadlocks like the following caused by hci_cmd_sync_dequeue causing the destroy function to run: INFO: task kworker/u19:0:143 blocked for more than 120 seconds. Tainted: G W O 6.8.0-2024-03-19-intel-next-iLS-24ww14 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u19:0 state:D stack:0 pid:143 tgid:143 ppid:2 flags:0x00004000 Workqueue: hci0 hci_cmd_sync_work [bluetooth] Call Trace: <TASK> __schedule+0x374/0xaf0 schedule+0x3c/0xf0 schedule_preempt_disabled+0x1c/0x30 __mutex_lock.constprop.0+0x3ef/0x7a0 __mutex_lock_slowpath+0x13/0x20 mutex_lock+0x3c/0x50 mgmt_set_connectable_complete+0xa4/0x150 [bluetooth] ? kfree+0x211/0x2a0 hci_cmd_sync_dequeue+0xae/0x130 [bluetooth] ? __pfx_cmd_complete_rsp+0x10/0x10 [bluetooth] cmd_complete_rsp+0x26/0x80 [bluetooth] mgmt_pending_foreach+0x4d/0x70 [bluetooth] __mgmt_power_off+0x8d/0x180 [bluetooth] ? _raw_spin_unlock_irq+0x23/0x40 hci_dev_close_sync+0x445/0x5b0 [bluetooth] hci_set_powered_sync+0x149/0x250 [bluetooth] set_powered_sync+0x24/0x60 [bluetooth] hci_cmd_sync_work+0x90/0x150 [bluetooth] process_one_work+0x13e/0x300 worker_thread+0x2f7/0x420 ? __pfx_worker_thread+0x10/0x10 kthread+0x107/0x140 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x3d/0x60 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Tested-by: Kiran K <kiran.k@intel.com> Fixes: f53e1c9c726d ("Bluetooth: MGMT: Fix possible crash on mgmt_index_removed") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-26	Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync	Luiz Augusto von Dentz
	This fixes the following crash: ================================================================== BUG: KASAN: slab-use-after-free in set_powered_sync+0x3a/0xc0 net/bluetooth/mgmt.c:1353 Read of size 8 at addr ffff888029b4dd18 by task kworker/u9:0/54 CPU: 1 UID: 0 PID: 54 Comm: kworker/u9:0 Not tainted 6.11.0-rc6-syzkaller-01155-gf723224742fc #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 Workqueue: hci0 hci_cmd_sync_work Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 set_powered_sync+0x3a/0xc0 net/bluetooth/mgmt.c:1353 hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:328 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312 worker_thread+0x86d/0xd10 kernel/workqueue.c:3389 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Allocated by task 5247: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 poison_kmalloc_redzone mm/kasan/common.c:370 [inline] __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387 kasan_kmalloc include/linux/kasan.h:211 [inline] __kmalloc_cache_noprof+0x19c/0x2c0 mm/slub.c:4193 kmalloc_noprof include/linux/slab.h:681 [inline] kzalloc_noprof include/linux/slab.h:807 [inline] mgmt_pending_new+0x65/0x250 net/bluetooth/mgmt_util.c:269 mgmt_pending_add+0x36/0x120 net/bluetooth/mgmt_util.c:296 set_powered+0x3cd/0x5e0 net/bluetooth/mgmt.c:1394 hci_mgmt_cmd+0xc47/0x11d0 net/bluetooth/hci_sock.c:1712 hci_sock_sendmsg+0x7b8/0x11c0 net/bluetooth/hci_sock.c:1832 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 sock_write_iter+0x2dd/0x400 net/socket.c:1160 new_sync_write fs/read_write.c:497 [inline] vfs_write+0xa72/0xc90 fs/read_write.c:590 ksys_write+0x1a0/0x2c0 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Freed by task 5246: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2256 [inline] slab_free mm/slub.c:4477 [inline] kfree+0x149/0x360 mm/slub.c:4598 settings_rsp+0x2bc/0x390 net/bluetooth/mgmt.c:1443 mgmt_pending_foreach+0xd1/0x130 net/bluetooth/mgmt_util.c:259 __mgmt_power_off+0x112/0x420 net/bluetooth/mgmt.c:9455 hci_dev_close_sync+0x665/0x11a0 net/bluetooth/hci_sync.c:5191 hci_dev_do_close net/bluetooth/hci_core.c:483 [inline] hci_dev_close+0x112/0x210 net/bluetooth/hci_core.c:508 sock_do_ioctl+0x158/0x460 net/socket.c:1222 sock_ioctl+0x629/0x8e0 net/socket.c:1341 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Reported-by: syzbot+03d6270b6425df1605bf@syzkaller.appspotmail.com Tested-by: syzbot+03d6270b6425df1605bf@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=03d6270b6425df1605bf Fixes: 275f3f648702 ("Bluetooth: Fix not checking MGMT cmd pending queue") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-26	Merge branch 'bnxt_en-bug-fixes'	Paolo Abeni
	Michael Chan says: ==================== bnxt_en: Bug fixes This patchset fixes several things: 1. AER recovery for RoCE when NIC interface is down. 2. Set ethtool backplane link modes correctly. 3. Update RSS ring ID during RX queue restart. 4. Crash with XDP and MTU change. 5. PCIe completion timeout when reading PHC after shutdown. ==================== Link: https://patch.msgid.link/20241122224547.984808-1-michael.chan@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	bnxt_en: Unregister PTP during PCI shutdown and suspend	Michael Chan
	If we go through the PCI shutdown or suspend path, we shut down the NIC but PTP remains registered. If the kernel continues to run for a little bit, the periodic PTP .do_aux_work() function may be called and it will read the PHC from the BAR register. Since the device has already been disabled, it will cause a PCIe completion timeout. Fix it by calling bnxt_ptp_clear() in the PCI shutdown/suspend handlers. bnxt_ptp_clear() will unregister from PTP and .do_aux_work() will be canceled. In bnxt_resume(), we need to re-initialize PTP. Fixes: a521c8a01d26 ("bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one()") Cc: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	bnxt_en: Refactor bnxt_ptp_init()	Michael Chan
	Instead of passing the 2nd parameter phc_cfg to bnxt_ptp_init(), store it in bp->ptp_cfg so that the caller doesn't need to know what the value should be. In the next patch, we'll need to call bnxt_ptp_init() in bnxt_resume() and this will make it easier. Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	bnxt_en: Fix receive ring space parameters when XDP is active	Shravya KN
	The MTU setting at the time an XDP multi-buffer is attached determines whether the aggregation ring will be used and the rx_skb_func handler. This is done in bnxt_set_rx_skb_mode(). If the MTU is later changed, the aggregation ring setting may need to be changed and it may become out-of-sync with the settings initially done in bnxt_set_rx_skb_mode(). This may result in random memory corruption and crashes as the HW may DMA data larger than the allocated buffer size, such as: BUG: kernel NULL pointer dereference, address: 00000000000003c0 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G S OE 6.1.0-226bf9805506 #1 Hardware name: Wiwynn Delta Lake PVT BZA.02601.0150/Delta Lake-Class1, BIOS F0E_3A12 08/26/2021 RIP: 0010:bnxt_rx_pkt+0xe97/0x1ae0 [bnxt_en] Code: 8b 95 70 ff ff ff 4c 8b 9d 48 ff ff ff 66 41 89 87 b4 00 00 00 e9 0b f7 ff ff 0f b7 43 0a 49 8b 95 a8 04 00 00 25 ff 0f 00 00 <0f> b7 14 42 48 c1 e2 06 49 03 95 a0 04 00 00 0f b6 42 33f RSP: 0018:ffffa19f40cc0d18 EFLAGS: 00010202 RAX: 00000000000001e0 RBX: ffff8e2c805c6100 RCX: 00000000000007ff RDX: 0000000000000000 RSI: ffff8e2c271ab990 RDI: ffff8e2c84f12380 RBP: ffffa19f40cc0e48 R08: 000000000001000d R09: 974ea2fcddfa4cbf R10: 0000000000000000 R11: ffffa19f40cc0ff8 R12: ffff8e2c94b58980 R13: ffff8e2c952d6600 R14: 0000000000000016 R15: ffff8e2c271ab990 FS: 0000000000000000(0000) GS:ffff8e3b3f840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000003c0 CR3: 0000000e8580a004 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> __bnxt_poll_work+0x1c2/0x3e0 [bnxt_en] To address the issue, we now call bnxt_set_rx_skb_mode() within bnxt_change_mtu() to properly set the AGG rings configuration and update rx_skb_func based on the new MTU value. 
Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on the current MTU. Fixes: 08450ea98ae9 ("bnxt_en: Fix max_mtu setting for multi-buf XDP") Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	bnxt_en: Fix queue start to update vnic RSS table	Somnath Kotur
	HWRM_RING_FREE followed by a HWRM_RING_ALLOC is not guaranteed to have the same FW ring ID as before. So we must reinitialize the RSS table with the correct ring IDs. Otherwise, traffic may not resume properly if the restarted ring ID is stale. Since this feature is only supported on P5_PLUS chips, we call bnxt_vnic_set_rss_p5() to update the HW RSS table. Fixes: 2d694c27d32e ("bnxt_en: implement netdev_queue_mgmt_ops") Cc: David Wei <dw@davidwei.uk> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	bnxt_en: Set backplane link modes correctly for ethtool	Shravya KN
	Use the return value from bnxt_get_media() to determine the port and link modes. bnxt_get_media() returns the proper BNXT_MEDIA_KR when the PHY is backplane. This will correct the ethtool settings for backplane devices. Fixes: 5d4e1bf60664 ("bnxt_en: extend media types to supported and autoneg modes") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: Shravya KN <shravya.k-n@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	bnxt_en: Reserve rings after PCIe AER recovery if NIC interface is down	Saravanan Vajravel
	After successful PCIe AER recovery, FW will reset all resource reservations. If it is IF_UP, the driver will call bnxt_open() and all resources will be reserved again. If it is IF_DOWN, we should call bnxt_reserve_rings() so that we can reserve resources including RoCE resources to allow RoCE to resume after AER. Without this patch, RoCE fails to resume in this IF_DOWN scenario. Later, if it becomes IF_UP, bnxt_open() will see that resources have been reserved and will not reserve again. Fixes: fb1e6e562b37 ("bnxt_en: Fix AER recovery.") Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	thermal: sun8i: Use scoped device node handling to simplify error paths	Krzysztof Kozlowski
	Obtain the device node reference with scoped/cleanup.h to reduce error handling and make the code a bit simpler. Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20241010-b4-cleanup-h-of-node-put-thermal-v4-6-bfbe29ad81f4@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
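The scoped handling used by this thermal series builds on the compiler's `cleanup` variable attribute (GCC and Clang). The following is a minimal userspace sketch of that idea, not the kernel's cleanup.h API: `toy_node`, `toy_node_get()`, `toy_node_put()`, `TOY_SCOPED_NODE` and `toy_lookup()` are hypothetical names invented for illustration.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for a device node. */
struct toy_node {
	int refcount;
};

static struct toy_node *toy_node_get(struct toy_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void toy_node_put(struct toy_node *n)
{
	if (n)
		n->refcount--;
}

/* A cleanup handler receives a pointer to the variable leaving scope. */
static void toy_node_put_cleanup(struct toy_node **np)
{
	toy_node_put(*np);
}

/* Hypothetical annotation: drop the reference when the variable goes out of scope. */
#define TOY_SCOPED_NODE __attribute__((cleanup(toy_node_put_cleanup)))

/* Every return path releases the reference without an explicit put or goto. */
int toy_lookup(struct toy_node *parent, int fail_early)
{
	struct toy_node *np TOY_SCOPED_NODE = toy_node_get(parent);

	if (!np)
		return -1;
	if (fail_early)
		return -2;	/* reference dropped automatically here */
	return 0;		/* and here */
}
```

After any call to `toy_lookup()`, the reference count of `parent` is back at its starting value, which is the property that lets the error paths shrink.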
2024-11-26	thermal: tegra: Simplify with scoped for each OF child loop	Krzysztof Kozlowski
	Use scoped for_each_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20241010-b4-cleanup-h-of-node-put-thermal-v4-5-bfbe29ad81f4@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-26	thermal: qcom-spmi-adc-tm5: Simplify with scoped for each OF child loop	Krzysztof Kozlowski
	Use scoped for_each_available_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20241010-b4-cleanup-h-of-node-put-thermal-v4-4-bfbe29ad81f4@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-26	thermal: of: Use scoped device node handling to simplify of_thermal_zone_find()	Krzysztof Kozlowski
	Obtain the device node reference with scoped/cleanup.h to reduce error handling and make the code a bit simpler. Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20241010-b4-cleanup-h-of-node-put-thermal-v4-3-bfbe29ad81f4@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-26	thermal: of: Use scoped memory and OF handling to simplify thermal_of_trips_init()	Krzysztof Kozlowski
	Obtain the device node reference and allocate memory with scoped/cleanup.h to reduce error handling and make the code a bit simpler. The code is not equivalent in one minor aspect: outgoing parameter "*ntrips" will not be zeroed on errors of memory allocation. This difference is not important, because code was already not zeroing it in case of earlier errors and the only caller does not rely on ntrips being 0 in case of errors. Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Chen-Yu Tsai <wenst@chromium.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://patch.msgid.link/20241010-b4-cleanup-h-of-node-put-thermal-v4-2-bfbe29ad81f4@linaro.org [ rjw: Rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-26	thermal: of: Simplify thermal_of_should_bind with scoped for each OF child	Krzysztof Kozlowski
	Use scoped for_each_child_of_node_scoped() when iterating over device nodes to make code a bit simpler. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20241010-b4-cleanup-h-of-node-put-thermal-v4-1-bfbe29ad81f4@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-11-26	thermal: gov_power_allocator: Add missing NULL pointer check	Rafael J. Wysocki
	Commit 0dc23567c206 ("thermal: core: Move lists of thermal instances to trip descriptors") overlooked the case in which the Power Allocator governor attempts to bind to a tripless thermal zone and params->trip_max is NULL in check_power_actors(). No power actors can be found in that case, so check_power_actors() needs to be made return 0 then to restore its previous behavior. Fixes: 0dc23567c206 ("thermal: core: Move lists of thermal instances to trip descriptors") Closes: https://lore.kernel.org/linux-pm/Z0NeGF4ryCe_b5rr@sashalap/ Reported-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/2761105.mvXUDI8C0e@rjwysocki.net
2024-11-26	net: hsr: fix hsr_init_sk() vs network/transport headers.	Eric Dumazet
	The following sequence in hsr_init_sk() is invalid: skb_reset_mac_header(skb); skb_reset_mac_len(skb); skb_reset_network_header(skb); skb_reset_transport_header(skb); It is invalid because skb_reset_mac_len() needs the correct network header, which should be after the mac header. This patch moves the skb_reset_network_header() and skb_reset_transport_header() calls before the call to dev_hard_header(). As a result, skb->mac_len is no longer set to a value close to 65535. Fixes: 48b491a5cc74 ("net: hsr: fix mac_len checks") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: George McCollister <george.mccollister@gmail.com> Link: https://patch.msgid.link/20241122171343.897551-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	spi: Fix acpi deferred irq probe	Stanislaw Gruszka
	When probing an SPI device, take care of deferred probing of the ACPI IRQ GPIO, as is already done for the OF/DT case. From a practical standpoint, this fixes an issue with the vsc-tp driver on the Dell XP 9340 laptop, which tries to request an interrupt with spi->irq equal to -EPROBE_DEFER and fails to probe with the following error: vsc-tp spi-INTC10D0:00: probe with driver vsc-tp failed with error -22 Suggested-by: Hans de Goede <hdegoede@redhat.com> Fixes: 33ada67da352 ("ACPI / spi: attach GPIO IRQ from ACPI description to SPI device") Cc: stable@vger.kernel.org Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Tested-by: Alexis Lothoré <alexis.lothore@bootlin.com> # Dell XPS9320, ov01a10 Link: https://patch.msgid.link/20241122094224.226773-1-stanislaw.gruszka@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-26	spi: atmel-quadspi: Fix register name in verbose logging function	Csókás, Bence
	`atmel_qspi_reg_name()` is used for pretty-printing register offsets for verbose logging of register accesses. However, due to a typo (likely a copy-paste error), QSPI_RD's offset prints as "MR", the name of the previous register. Fix this typo. Fixes: c528ecfbef04 ("spi: atmel-quadspi: Add verbose debug facilities to monitor register accesses") Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> Reviewed-by: Alexander Dahl <ada@thorsis.com> Link: https://patch.msgid.link/20241122141302.2599636-1-csokas.bence@prolan.hu Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-26	Merge branch 'octeontx2-af-misc-rpm-fixes'	Paolo Abeni
	Hariprasad Kelam says: ==================== octeontx2-af: misc RPM fixes There are a few issues with the RPM driver, such as FIFO overflow and network performance problems due to wrong FIFO values. This patchset adds fixes for them. Patch 1: Fixes the mismatch between the lmac type reported by the driver and the actual hardware configuration. Patch 2: Addresses low network performance observed even on RPMs with larger FIFO lengths. Patch 3 & 4: Fix the stale FEC counters reported by the driver by accessing the correct CSRs. Patch 5: Resolves the issue related to RPM FIFO overflow during system reboots. ==================== Link: https://patch.msgid.link/20241122162035.5842-1-hkelam@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	octeontx2-af: Quiesce traffic before NIX block reset	Hariprasad Kelam
	During initialization, the AF driver resets all blocks. The RPM (MAC) block and NIX block operate on a credit-based model. When the NIX block resets during active traffic flow, it doesn't release credits to the RPM block. This causes the RPM FIFO to overflow, leaving receive traffic stuck. To address this issue, the patch introduces the following changes: 1. Stop receiving traffic at the MAC level during AF driver initialization. 2. Perform an X2P reset (prevents the RXFIFO of all LMACs from pushing data). 3. Reset the NIX block. 4. Clear the X2P reset and re-enable receiving traffic. Fixes: 54d557815e15 ("octeontx2-af: Reset all RVU blocks") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	octeontx2-af: RPM: fix stale FCFEC counters	Hariprasad Kelam
	The corrected words register (FCFECX_VL0_CCW_LO) and the uncorrected words register (FCFECX_VL0_NCCW_LO) of the FCFEC counter have a different LMAC offset and need to be accessed differently. Fixes: 84ad3642115d ("octeontx2-af: Add FEC stats for RPM/RPM_USX block") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	octeontx2-af: RPM: fix stale RSFEC counters	Hariprasad Kelam
	The earlier patch sets the 'Stats control register' for RPM receive/transmit statistics instead of RSFEC statistics, causing the driver to return stale FEC counters. Fixes: 84ad3642115d ("octeontx2-af: Add FEC stats for RPM/RPM_USX block") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	octeontx2-af: RPM: Fix low network performance	Hariprasad Kelam
	Low network performance is observed even on RPMs with larger FIFO lengths. The cn10kb silicon has three RPM blocks with the following FIFO sizes: RPM0: 256KB, RPM1: 256KB, RPM2: 128KB. The current design stores the FIFO length in a common structure for all RPMs (mac_ops). As a result, the FIFO length of the last RPM is applied to all RPMs, leading to reduced network performance. This patch resolves the problem by storing the FIFO length in the per-MAC structure (cgx). Fixes: b9d0fedc6234 ("octeontx2-af: cn10kb: Add RPM_USX MAC support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	octeontx2-af: RPM: Fix mismatch in lmac type	Hariprasad Kelam
	Due to a bug in the previous patch, there is a mismatch between the lmac type reported by the driver and the actual hardware configuration. Fixes: 3ad3f8f93c81 ("octeontx2-af: cn10k: MAC internal loopback support") Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	net: stmmac: dwmac-socfpga: Set RX watchdog interrupt as broken	Maxime Chevallier
	On DWMAC3 and later, there's a RX Watchdog interrupt that's used for interrupt coalescing. It's known to be buggy on some platforms, and dwmac-socfpga appears to be one of them. Changing the interrupt coalescing from ethtool doesn't appear to have any effect here. Without disabling RIWT (Received Interrupt Watchdog Timer, I believe...), we observe latencies while receiving traffic that amount to around ~0.4ms. This was discovered with NTP but can be easily reproduced with a simple ping. Without this patch : 64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.657 ms With this patch : 64 bytes from 192.168.5.2: icmp_seq=1 ttl=64 time=0.254 ms Fixes: 801d233b7302 ("net: stmmac: Add SOCFPGA glue driver") Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/20241122141256.764578-1-maxime.chevallier@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	marvell: pxa168_eth: fix call balance of pep->clk handling routines	Vitalii Mordan
	If the clock pep->clk was not enabled in pxa168_eth_probe, it should not be disabled in any path. Conversely, if it was enabled in pxa168_eth_probe, it must be disabled in all error paths to ensure proper cleanup. Use the devm_clk_get_enabled helper function to ensure proper call balance for pep->clk. Found by Linux Verification Center (linuxtesting.org) with Klever. Fixes: a49f37eed22b ("net: add Fast Ethernet driver for PXA168.") Signed-off-by: Vitalii Mordan <mordan@ispras.ru> Link: https://patch.msgid.link/20241121200658.2203871-1-mordan@ispras.ru Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26	can: mcp251xfd: mcp251xfd_get_tef_len(): work around erratum DS80000789E 6.	Marc Kleine-Budde
	Commit b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum") introduced mcp251xfd_get_tef_len() to get the number of unhandled transmit events from the Transmit Event FIFO (TEF). As the TEF has no head index, the driver uses the TX-FIFO's tail index instead, assuming that sent frames are completed. When calculating the number of unhandled TEF events, that commit didn't take mcp2518fd erratum DS80000789E 6. into account. According to that erratum, the FIFOCI bits of a FIFOSTA register, here the TX-FIFO tail index, might be corrupted. However, here it seems that the bit indicating that the TX-FIFO is empty (MCP251XFD_REG_FIFOSTA_TFERFFIF) is not correct while the TX-FIFO tail index is. Assume that the TX-FIFO is indeed empty if: - Chip's head and tail index are equal (len == 0). - The TX-FIFO is less than half full. (The TX-FIFO empty case has already been checked at the beginning of this function.) - No free buffers in the TX ring. If the TX-FIFO is assumed to be empty, assume that the TEF is full and return the number of elements in the TX-FIFO (which equals the number of TEF elements). If these assumptions are false, the driver might read too many objects from the TEF. mcp251xfd_handle_tefif_one() checks the sequence numbers and will refuse to process old events. Reported-by: Renjaya Raga Zenta <renjaya.zenta@formulatrix.com> Closes: https://patch.msgid.link/CAJ7t6HgaeQ3a_OtfszezU=zB-FqiZXqrnATJ3UujNoQJJf7GgA@mail.gmail.com Fixes: b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum") Tested-by: Renjaya Raga Zenta <renjaya.zenta@formulatrix.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20241126-mcp251xfd-fix-length-calculation-v2-1-c2ed516ed6ba@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-11-26	Merge patch series "Fix {rx,tx}_errors CAN statistics"	Marc Kleine-Budde
	Dario Binacchi <dario.binacchi@amarulasolutions.com> says: This series extends the patch 4d6d26537940 ("can: c_can: fix {rx,tx}_errors statistics"), already merged into the mainline, to other CAN devices that similarly do not correctly increment the error counters for reception/transmission. Changes in v2: - Fix patches 7 through 12 to ensure that statistics are updated even if the allocation of skb fails. - Add five new patches (i. e. 1-5), created during the further analysis of the code while correcting patches from the v1 series (i. e. 7-12). Link: https://patch.msgid.link/20241122221650.633981-1-dario.binacchi@amarulasolutions.com [mkl: omitted patch 3] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2024-11-26	LoongArch: Update Loongson-3 default config file	Huacai Chen
	1, Enable ACPI_BGRT. 2, Enable MODULE COMPRESS. 3, Enable common DM targets. 4, Enable FS_ENCRYPTION and FS_VERITY. 5, Enable CPUFreq governors and drivers. 6, Enable PVPANIC MMIO and PCI drivers. 7, Enable some HID input drivers. 8, Enable some ASoC codec drivers. 9, Enable some Realtek WiFi drivers. 10, Remove some obsolete config options. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: dts: Add I2S support to Loongson-2K2000	Binbin Zhou
	The module is supported, adding it. Not all Loongson-2K1000 boards have an i2s interface, here is an example of enabling it: sound { compatible = "loongson,ls-audio-card"; model = "Loongson-ASoC"; mclk-fs = <512>; cpu { sound-dai = <&i2s>; }; codec { sound-dai = <&es8323>; }; }; &i2c1 { status = "okay"; #address-cells = <1>; #size-cells = <0>; es8323:es8323@10 { compatible = "everest,es8323"; reg = <0x10>; #sound-dai-cells = <0>; }; }; &i2s { status = "okay"; clock-frequency = <175000000>; #sound-dai-cells = <0>; }; Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: dts: Add I2S support to Loongson-2K1000	Binbin Zhou
	The module is supported, adding it. Not all Loongson-2K1000 boards have an i2s interface, here is an example of enabling it: sound { compatible = "loongson,ls-audio-card"; model = "Loongson-ASoC"; mclk-fs = <512>; cpu { sound-dai = <&i2s>; }; codec { sound-dai = <&uda1342>; }; }; &apbdma2 { status = "okay"; }; &apbdma3 { status = "okay"; }; &i2c3 { status = "okay"; pinctrl-0 = <&i2c1_pins_default>; pinctrl-names = "default"; #address-cells = <1>; #size-cells = <0>; uda1342: codec@1a { compatible = "nxp,uda1342"; reg = <0x1a>; #sound-dai-cells = <0>; }; }; &i2s { status = "okay"; pinctrl-0 = <&hda_pins_default>; pinctrl-names = "default"; }; Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: Allow to enable PREEMPT_LAZY	Huacai Chen
	LoongArch now supports PREEMPT_RT. It uses GENERIC_ENTRY, so adding the TIF bit (TIF_NEED_RESCHED_LAZY) related definitions and selecting the Kconfig symbol (ARCH_HAS_PREEMPT_LAZY) is enough to make it work. Signed-off-by: Xi Ruoyao <xry111@xry111.site> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: Allow to enable PREEMPT_RT	Huacai Chen
	It is really time. LoongArch has all the required architecture related changes, that have been identified over time, in order to enable PREEMPT_RT. With the recent printk changes, the last known road block has been addressed. Allow to enable PREEMPT_RT on LoongArch. Below are the latency data from cyclictest on a 4-core Loongson-3A5000 machine, with a "make -j8" kernel building workload in the background. 1. PREEMPT kernel with default configuration: ./cyclictest -a -t -m -i200 -d0 -p99 policy: fifo: loadavg: 8.78 8.96 8.64 10/296 64800 T: 0 ( 4592) P:99 I:200 C:14838617 Min: 3 Act: 6 Avg: 8 Max: 844 T: 1 ( 4593) P:99 I:200 C:14838765 Min: 3 Act: 9 Avg: 8 Max: 909 T: 2 ( 4594) P:99 I:200 C:14838510 Min: 3 Act: 7 Avg: 8 Max: 832 T: 3 ( 4595) P:99 I:200 C:14838631 Min: 3 Act: 8 Avg: 8 Max: 931 2. PREEMPT_RT kernel with default configuration: ./cyclictest -a -t -m -i200 -d0 -p99 policy: fifo: loadavg: 10.38 10.47 10.35 9/336 77788 T: 0 ( 3941) P:99 I:200 C:19439626 Min: 3 Act: 12 Avg: 8 Max: 227 T: 1 ( 3942) P:99 I:200 C:19439624 Min: 2 Act: 11 Avg: 8 Max: 184 T: 2 ( 3943) P:99 I:200 C:19439623 Min: 3 Act: 4 Avg: 7 Max: 223 T: 3 ( 3944) P:99 I:200 C:19439623 Min: 2 Act: 10 Avg: 7 Max: 226 3. PREEMPT_RT kernel with tuned configuration: ./cyclictest -a -t -m -i200 -d0 -p99 policy: fifo: loadavg: 10.52 10.66 10.62 12/334 109397 T: 0 ( 4765) P:99 I:200 C:29335186 Min: 3 Act: 6 Avg: 8 Max: 62 T: 1 ( 4766) P:99 I:200 C:29335185 Min: 3 Act: 10 Avg: 8 Max: 52 T: 2 ( 4767) P:99 I:200 C:29335184 Min: 3 Act: 8 Avg: 8 Max: 64 T: 3 ( 4768) P:99 I:200 C:29335183 Min: 3 Act: 12 Avg: 8 Max: 53 Main instruments of tuned configuration include: Disable the boot rom space in BIOS, in order to avoid kernel's speculative access to low- speed memory (i.e. boot rom space); Disable CPUFreq scaling; Disable RTC synchronization in the ntpd/chronyd service (also avoid other RTC accesses when running low-latency workloads). Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: Select HAVE_POSIX_CPU_TIMERS_TASK_WORK	Huacai Chen
	Move POSIX CPU timer expiry and signal delivery into task context to allow PREEMPT_RT setups to coexist with KVM. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: Fix sleeping in atomic context for PREEMPT_RT	Huacai Chen
	Commit bab1c299f3945ffe79 ("LoongArch: Fix sleeping in atomic context in setup_tlb_handler()") changes the gfp flag from GFP_KERNEL to GFP_ATOMIC for alloc_pages_node(). However, for PREEMPT_RT kernels we can still get a "sleeping in atomic context" error: [ 0.372259] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [ 0.372266] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1 [ 0.372268] preempt_count: 1, expected: 0 [ 0.372270] RCU nest depth: 1, expected: 1 [ 0.372272] 3 locks held by swapper/1/0: [ 0.372274] #0: 900000000c9f5e60 (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x524/0x1c60 [ 0.372294] #1: 90000000087013b8 (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x50/0x140 [ 0.372305] #2: 900000047fffd388 (&zone->lock){+.+.}-{3:3}, at: __rmqueue_pcplist+0x30c/0xea0 [ 0.372314] irq event stamp: 0 [ 0.372316] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [ 0.372322] hardirqs last disabled at (0): [<9000000005947320>] copy_process+0x9c0/0x26e0 [ 0.372329] softirqs last enabled at (0): [<9000000005947320>] copy_process+0x9c0/0x26e0 [ 0.372335] softirqs last disabled at (0): [<0000000000000000>] 0x0 [ 0.372341] CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.12.0-rc7+ #1891 [ 0.372346] Hardware name: Loongson Loongson-3A5000-7A1000-1w-CRB/Loongson-LS3A5000-7A1000-1w-CRB, BIOS vUDK2018-LoongArch-V2.0.0-prebeta9 10/21/2022 [ 0.372349] Stack : 0000000000000089 9000000005a0db9c 90000000071519c8 9000000100388000 [ 0.372486] 900000010038b890 0000000000000000 900000010038b898 9000000007e53788 [ 0.372492] 900000000815bcc8 900000000815bcc0 900000010038b700 0000000000000001 [ 0.372498] 0000000000000001 4b031894b9d6b725 00000000055ec000 9000000100338fc0 [ 0.372503] 00000000000000c4 0000000000000001 000000000000002d 0000000000000003 [ 0.372509] 0000000000000030 0000000000000003 00000000055ec000 0000000000000003 [ 0.372515] 900000000806d000 9000000007e53788 00000000000000b0 0000000000000004 [ 
0.372521] 0000000000000000 0000000000000000 900000000c9f5f10 0000000000000000 [ 0.372526] 90000000076f12d8 9000000007e53788 9000000005924778 0000000000000000 [ 0.372532] 00000000000000b0 0000000000000004 0000000000000000 0000000000070000 [ 0.372537] ... [ 0.372540] Call Trace: [ 0.372542] [<9000000005924778>] show_stack+0x38/0x180 [ 0.372548] [<90000000071519c4>] dump_stack_lvl+0x94/0xe4 [ 0.372555] [<900000000599b880>] __might_resched+0x1a0/0x260 [ 0.372561] [<90000000071675cc>] rt_spin_lock+0x4c/0x140 [ 0.372565] [<9000000005cbb768>] __rmqueue_pcplist+0x308/0xea0 [ 0.372570] [<9000000005cbed84>] get_page_from_freelist+0x564/0x1c60 [ 0.372575] [<9000000005cc0d98>] __alloc_pages_noprof+0x218/0x1820 [ 0.372580] [<900000000593b36c>] tlb_init+0x1ac/0x298 [ 0.372585] [<9000000005924b74>] per_cpu_trap_init+0x114/0x140 [ 0.372589] [<9000000005921964>] cpu_probe+0x4e4/0xa60 [ 0.372592] [<9000000005934874>] start_secondary+0x34/0xc0 [ 0.372599] [<900000000715615c>] smpboot_entry+0x64/0x6c This is because in PREEMPT_RT kernels normal spinlocks are replaced by rt spinlocks and rt_spin_lock() will cause sleeping. Fix it by disabling NUMA optimization completely for PREEMPT_RT kernels. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: Reduce min_delta for the arch clockevent device	Huacai Chen
	Now the min_delta is 0x600 (1536) for LoongArch's constant clockevent device. For a 100MHz hardware timer this means ~15us. This is a little big, especially for PREEMPT_RT enabled kernels. So reduce it to 100 for PREEMPT_RT kernel, and 1000 for others (we don't want too small values to affect performance). Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: BPF: Sign-extend return values	Tiezhu Yang
	(1) Description of Problem: When testing BPF JIT with the latest compiler toolchains on LoongArch, there are some strange failing test cases; dmesg shows something like this: # dmesg -t | grep FAIL | head -1 ... ret -3 != -3 (0xfffffffd != 0xfffffffd)FAIL ... (2) Steps to Reproduce: # echo 1 > /proc/sys/net/core/bpf_jit_enable # modprobe test_bpf (3) Additional Info: There are no failing test cases when compiling with a lower version of GCC such as 13.3.0; the problems only appear with a higher version of GCC such as 14.2.0. This is because the problems were hidden by the lower version of GCC due to redundant sign extension instructions generated by the compiler, but with the optimizations of the higher version of GCC, the sign extension instructions have been removed. (4) Root Cause Analysis: The LoongArch architecture does not expose sub-registers, and holds all 32-bit values in a sign-extended format. BPF, on the other hand, exposes sub-registers and uses zero-extension (similar to arm64/x86). This has led to some subtle bugs, where a BPF JITted program has not sign-extended the a0 register (return value in LoongArch land) before passing the return value up to the kernel, for example: | int from_bpf(void); | | long foo(void) | { | return from_bpf(); | } Here, a0 would be 0xffffffff instead of the expected 0xffffffffffffffff. Internally, the LoongArch JIT uses a5 as a dedicated register for BPF return values. That is to say, the LoongArch BPF uses a5 for BPF return values, which are zero-extended, whereas the LoongArch ABI uses a0, which is sign-extended. (5) Final Solution: Keep a5 zero-extended, but explicitly sign-extend a0 (which is used outside BPF land). Because libbpf currently defines the return value of an ebpf program as a 32-bit unsigned integer, just use addi.w to extend bit 31 into bits 63 through 32 of a5 to a0. This is similar to commit 2f1b0d3d7331 ("riscv, bpf: Sign-extend return values"). 
Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support") Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: Fix build failure with GCC 15 (-std=gnu23)	Tiezhu Yang
	Whenever I try to build the kernel with upcoming GCC 15 which defaults to -std=gnu23 I get a build failure: CC arch/loongarch/vdso/vgetcpu.o In file included from ./include/uapi/linux/posix_types.h:5, from ./include/uapi/linux/types.h:14, from ./include/linux/types.h:6, from ./include/linux/kasan-checks.h:5, from ./include/asm-generic/rwonce.h:26, from ./arch/loongarch/include/generated/asm/rwonce.h:1, from ./include/linux/compiler.h:317, from ./include/asm-generic/bug.h:5, from ./arch/loongarch/include/asm/bug.h:60, from ./include/linux/bug.h:5, from ./include/linux/mmdebug.h:5, from ./include/linux/mm.h:6, from ./arch/loongarch/include/asm/vdso.h:10, from arch/loongarch/vdso/vgetcpu.c:6: ./include/linux/stddef.h:11:9: error: expected identifier before 'false' 11 | false = 0, | ^~~~~ ./include/linux/types.h:35:33: error: two or more data types in declaration specifiers 35 | typedef _Bool bool; | ^~~~ ./include/linux/types.h:35:1: warning: useless type name in empty declaration 35 | typedef _Bool bool; | ^~~~~~~ The kernel builds explicitly with -std=gnu11 in the top Makefile, but arch/loongarch/vdso does not use KBUILD_CFLAGS from the rest of the kernel, so just add the -std=gnu11 flag to arch/loongarch/vdso/Makefile. By the way, commit e8c07082a810 ("Kbuild: move to -std=gnu11") made a similar change for arch/arm64/kernel/vdso32/Makefile. Fixes: c6b99bed6b8f ("LoongArch: Add VDSO and VSYSCALL support") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-11-26	LoongArch: Explicitly specify code model in Makefile	Huacai Chen
	LoongArch's toolchain may change the default code model from normal to medium. This is unnecessary for kernel, and generates some relocations which cannot be handled by the module loader. So explicitly specify the code model to normal in Makefile (for Rust 'normal' is 'small'). Cc: stable@vger.kernel.org Tested-by: Haiyong Sun <sunhaiyong@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>