linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-11-18	nfsd: remove nfsd4_session->se_bchannel	Jeff Layton
	This field is written and is never consulted again. Remove it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	nfsd: make use of warning provided by refcount_t	NeilBrown
	refcount_t, by design, checks for unwanted situations and provides warnings. It is rarely useful to have explicit warnings with refcount usage. In this case we have an explicit warning if a refcount_t reaches zero when decremented. Simply using refcount_dec() will provide a similar warning and also mark the refcount_t as saturated to avoid any possible use-after-free. This patch drops the warning and uses refcount_dec() instead of refcount_dec_and_test(). Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	nfsd: Don't fail OP_SETCLIENTID when there are too many clients.	NeilBrown
	Failing OP_SETCLIENTID or OP_EXCHANGE_ID should only happen if there is memory allocation failure. Putting a hard limit on the number of clients is not really helpful as it will either happen too early and prevent clients that the server can easily handle, or too late and allow clients when the server is swamped. The calculated limit is still useful for expiring courtesy clients where there are "too many" clients, but it shouldn't prevent the creation of active clients. Testing of lots of clients against small-mem servers reports repeated NFS4ERR_DELAY responses which doesn't seem helpful. There may have been reports of similar problems in production use. Also remove an outdated comment - we do use a slab cache. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init()	Ye Bin
	There's issue as follows: RPC: Registered rdma transport module. RPC: Registered rdma backchannel transport module. RPC: Unregistered rdma transport module. RPC: Unregistered rdma backchannel transport module. BUG: unable to handle page fault for address: fffffbfff80c609a PGD 123fee067 P4D 123fee067 PUD 123fea067 PMD 10c624067 PTE 0 Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI RIP: 0010:percpu_counter_destroy_many+0xf7/0x2a0 Call Trace: <TASK> __die+0x1f/0x70 page_fault_oops+0x2cd/0x860 spurious_kernel_fault+0x36/0x450 do_kern_addr_fault+0xca/0x100 exc_page_fault+0x128/0x150 asm_exc_page_fault+0x26/0x30 percpu_counter_destroy_many+0xf7/0x2a0 mmdrop+0x209/0x350 finish_task_switch.isra.0+0x481/0x840 schedule_tail+0xe/0xd0 ret_from_fork+0x23/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> If register_sysctl() return NULL, then svc_rdma_proc_cleanup() will not destroy the percpu counters which init in svc_rdma_proc_init(). If CONFIG_HOTPLUG_CPU is enabled, residual nodes may be in the 'percpu_counters' list. The above issue may occur once the module is removed. If the CONFIG_HOTPLUG_CPU configuration is not enabled, memory leakage occurs. To solve above issue just destroy all percpu counters when register_sysctl() return NULL. Fixes: 1e7e55731628 ("svcrdma: Restore read and write stats") Fixes: 22df5a22462e ("svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter") Fixes: df971cd853c0 ("svcrdma: Convert rdma_stat_recv to a per-CPU counter") Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	xdrgen: Remove program_stat_to_errno() call sites	Chuck Lever
	Refactor: Translating an on-the-wire value to a local host errno is architecturally a job for the proc function, not the XDR decoder. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	xdrgen: Update the files included in client-side source code	Chuck Lever
	In particular, client-side source code needs the definition of "struct rpc_procinfo" and does not want header files that pull in "struct svc_rqst". Otherwise, the source does not compile. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	xdrgen: Remove check for "nfs_ok" in C templates	Chuck Lever
	Obviously, "nfs_ok" is defined only for NFS protocols. Other XDR protocols won't know "nfs_ok" from Adam. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	xdrgen: Remove tracepoint call site	Chuck Lever
	This tracepoint was a "note to self" and is not operational. It is added only to client-side code, which so far we haven't needed. It will cause immediate breakage once we start generating client code, though, so remove it now. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	nfsd: release svc_expkey/svc_export with rcu_work	Yang Erkun
	The last reference for `cache_head` can be reduced to zero in `c_show` and `e_show`(using `rcu_read_lock` and `rcu_read_unlock`). Consequently, `svc_export_put` and `expkey_put` will be invoked, leading to two issues: 1. The `svc_export_put` will directly free ex_uuid. However, `e_show`/`c_show` will access `ex_uuid` after `cache_put`, which can trigger a use-after-free issue, shown below. ================================================================== BUG: KASAN: slab-use-after-free in svc_export_show+0x362/0x430 [nfsd] Read of size 1 at addr ff11000010fdc120 by task cat/870 CPU: 1 UID: 0 PID: 870 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 print_address_description.constprop.0+0x2c/0x3a0 print_report+0xb9/0x280 kasan_report+0xae/0xe0 svc_export_show+0x362/0x430 [nfsd] c_show+0x161/0x390 [sunrpc] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 proc_reg_read+0xe1/0x140 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 830: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 __kasan_kmalloc+0x8f/0xa0 __kmalloc_node_track_caller_noprof+0x1bc/0x400 kmemdup_noprof+0x22/0x50 svc_export_parse+0x8a9/0xb80 [nfsd] cache_do_downcall+0x71/0xa0 [sunrpc] cache_write_procfs+0x8e/0xd0 [sunrpc] proc_reg_write+0xe1/0x140 vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 868: kasan_save_stack+0x20/0x40 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x60 __kasan_slab_free+0x37/0x50 kfree+0xf3/0x3e0 svc_export_put+0x87/0xb0 [nfsd] cache_purge+0x17f/0x1f0 [sunrpc] nfsd_destroy_serv+0x226/0x2d0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1a5/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e 2. We cannot sleep while using `rcu_read_lock`/`rcu_read_unlock`. However, `svc_export_put`/`expkey_put` will call path_put, which subsequently triggers a sleeping operation due to the following `dput`. ============================= WARNING: suspicious RCU usage 5.10.0-dirty #141 Not tainted ----------------------------- ... Call Trace: dump_stack+0x9a/0xd0 ___might_sleep+0x231/0x240 dput+0x39/0x600 path_put+0x1b/0x30 svc_export_put+0x17/0x80 e_show+0x1c9/0x200 seq_read_iter+0x63f/0x7c0 seq_read+0x226/0x2d0 vfs_read+0x113/0x2c0 ksys_read+0xc9/0x170 do_syscall_64+0x33/0x40 entry_SYSCALL_64_after_hwframe+0x67/0xd1 Fix these issues by using `rcu_work` to help release `svc_expkey`/`svc_export`. This approach allows for an asynchronous context to invoke `path_put` and also facilitates the freeing of `uuid/exp/key` after an RCU grace period. Fixes: 9ceddd9da134 ("knfsd: Allow lockless lookups of the exports") Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	SUNRPC: make sure cache entry active before cache_show	Yang Erkun
	The function `c_show` was called with protection from RCU. This only ensures that `cp` will not be freed. Therefore, the reference count for `cp` can drop to zero, which will trigger a refcount use-after-free warning when `cache_get` is called. To resolve this issue, use `cache_get_rcu` to ensure that `cp` remains active. ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 7 PID: 822 at lib/refcount.c:25 refcount_warn_saturate+0xb1/0x120 CPU: 7 UID: 0 PID: 822 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb1/0x120 Call Trace: <TASK> c_show+0x2fc/0x380 [sunrpc] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 proc_reg_read+0xe1/0x140 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	nfsd: make sure exp active before svc_export_show	Yang Erkun
	The function `e_show` was called with protection from RCU. This only ensures that `exp` will not be freed. Therefore, the reference count for `exp` can drop to zero, which will trigger a refcount use-after-free warning when `exp_get` is called. To resolve this issue, use `cache_get_rcu` to ensure that `exp` remains active. ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 819 at lib/refcount.c:25 refcount_warn_saturate+0xb1/0x120 CPU: 3 UID: 0 PID: 819 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb1/0x120 ... Call Trace: <TASK> e_show+0x20b/0x230 [nfsd] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: bf18f163e89c ("NFSD: Using exp_get for export getting") Cc: stable@vger.kernel.org # 4.20+ Signed-off-by: Yang Erkun <yangerkun@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	lockd: Remove unneeded initialization of file_lock::c.flc_flags	Chuck Lever
	Since commit 75c7940d2a86 ("lockd: set missing fl_flags field when retrieving args"), nlmsvc_retrieve_args() initializes the flc_flags field. svcxdr_decode_lock() no longer needs to do this. This clean up removes one dependency on the nlm_lock:fl field. No behavior change is expected. Analysis: svcxdr_decode_lock() is called by: nlm4svc_decode_testargs() nlm4svc_decode_lockargs() nlm4svc_decode_cancargs() nlm4svc_decode_unlockargs() nlm4svc_decode_testargs() is used by: - NLMPROC4_TEST and NLMPROC4_TEST_MSG, which call nlmsvc_retrieve_args() - NLMPROC4_GRANTED and NLMPROC4_GRANTED_MSG, which don't pass the lock's file_lock to the generic lock API nlm4svc_decode_lockargs() is used by: - NLMPROC4_LOCK and NLM4PROC4_LOCK_MSG, which call nlmsvc_retrieve_args() - NLMPROC4_UNLOCK and NLM4PROC4_UNLOCK_MSG, which call nlmsvc_retrieve_args() - NLMPROC4_NM_LOCK, which calls nlmsvc_retrieve_args() nlm4svc_decode_cancargs() is used by: - NLMPROC4_CANCEL and NLMPROC4_CANCEL_MSG, which call nlmsvc_retrieve_args() nlm4svc_decode_unlockargs() is used by: - NLMPROC4_UNLOCK and NLMPROC4_UNLOCK_MSG, which call nlmsvc_retrieve_args() All callers except GRANTED/GRANTED_MSG eventually call nlmsvc_retrieve_args() before using nlm_lock::fl.c.flc_flags. Thus this change is safe. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	lockd: Remove unused parameter to nlmsvc_testlock()	Chuck Lever
	The nlm_cookie parameter has been unused since commit 09802fd2a8ca ("lockd: rip out deferred lock handling from testlock codepath"). Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	lockd: Remove some snippets of unfinished code	Chuck Lever
	Clean up. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	lockd: Remove unnecessary memset()	Chuck Lever
	Since commit 103cc1fafee4 ("SUNRPC: Parametrize how much of argsize should be zeroed") (and possibly long before that, even) all of the memory underlying rqstp->rq_argp is zeroed already. There's no need for the memset() in nlm4svc_decode_shareargs(). Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	lockd: Remove unused typedef	Chuck Lever
	Clean up: Looks like the last usage of this typedef was removed by commit 026fec7e7c47 ("sunrpc: properly type pc_decode callbacks") in 2017. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	NFSD: Cap the number of bytes copied by nfs4_reset_recoverydir()	Chuck Lever
	It's only current caller already length-checks the string, but let's be safe. Fixes: 0964a3d3f1aa ("[PATCH] knfsd: nfsd4 reboot dirname fix") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	NFSD: Remove unused values from nfsd4_encode_components_esc()	Chuck Lever
	Clean up. The computed value of @p is saved each time through the loop but is never used. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	NFSD: Remove unused results in nfsd4_encode_pathname4()	Chuck Lever
	Clean up. The result of "*p++" is saved, but is not used before it is overwritten. The result of xdr_encode_opaque() is saved each time through the loop but is never used. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	NFSD: Prevent NULL dereference in nfsd4_process_cb_update()	Chuck Lever
	@ses is initialized to NULL. If __nfsd4_find_backchannel() finds no available backchannel session, setup_callback_client() will try to dereference @ses and segfault. Fixes: dcbeaa68dbbd ("nfsd4: allow backchannel recovery") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	NFSD: Remove a never-true comparison	Chuck Lever
	fh_size is an unsigned int, thus it can never be less than 0. Fixes: d8b26071e65e ("NFSD: simplify struct nfsfh") Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	NFSD: Remove dead code in nfsd4_create_session()	Chuck Lever
	Clean up. AFAICT, there is no way to reach the out_free_conn label with @old set to a non-NULL value, so the expire_client(old) call is never reached and can be removed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	nfsd: refine and rename NFSD_MAY_LOCK	NeilBrown
	NFSD_MAY_LOCK means a few different things. - it means that GSS is not required. - it means that with NFSEXP_NOAUTHNLM, authentication is not required - it means that OWNER_OVERRIDE is allowed. None of these are specific to locking, they are specific to the NLM protocol. So: - rename to NFSD_MAY_NLM - set NFSD_MAY_OWNER_OVERRIDE and NFSD_MAY_BYPASS_GSS in nlm_fopen() so that NFSD_MAY_NLM doesn't need to imply these. - move the test on NFSEXP_NOAUTHNLM out of nfsd_permission() and into fh_verify where other special-case tests on the MAY flags happen. nfsd_permission() can be called from other places than fh_verify(), but none of these will have NFSD_MAY_NLM. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	NFSD: Replace use of NFSD_MAY_LOCK in nfsd4_lock()	Chuck Lever
	NFSv4 LOCK operations should not avoid the set of authorization checks that apply to all other NFSv4 operations. Also, the "no_auth_nlm" export option should apply only to NLM LOCK requests. It's not necessary or sensible to apply it to NFSv4 LOCK operations. Instead, set no permission bits when calling fh_verify(). Subsequent stateid processing handles authorization checks. Reported-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	nfsd: replace call_rcu by kfree_rcu for simple kmem_cache_free callback	Julia Lawall
	Since SLOB was removed and since commit 6c6c47b063b5 ("mm, slab: call kvfree_rcu_barrier() from kmem_cache_destroy()"), it is not necessary to use call_rcu when the callback only performs kmem_cache_free. Use kfree_rcu() directly. The changes were made using Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Reviewed-by: Jeff Layton <jlayton@kernel.org> Acked-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	xdrgen: Add a utility for extracting XDR from RFCs	Chuck Lever
	For convenience, copy the XDR extraction script from RFC Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	nfsd: Fix NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT	Pali Rohár
	Currently NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT do not bypass only GSS, but bypass any method. This is a problem specially for NFS3 AUTH_NULL-only exports. The purpose of NFSD_MAY_BYPASS_GSS_ON_ROOT is described in RFC 2623, section 2.3.2, to allow mounting NFS2/3 GSS-only export without authentication. So few procedures which do not expose security risk used during mount time can be called also with AUTH_NONE or AUTH_SYS, to allow client mount operation to finish successfully. The problem with current implementation is that for AUTH_NULL-only exports, the NFSD_MAY_BYPASS_GSS_ON_ROOT is active also for NFS3 AUTH_UNIX mount attempts which confuse NFS3 clients, and make them think that AUTH_UNIX is enabled and is working. Linux NFS3 client never switches from AUTH_UNIX to AUTH_NONE on active mount, which makes the mount inaccessible. Fix the NFSD_MAY_BYPASS_GSS and NFSD_MAY_BYPASS_GSS_ON_ROOT implementation and really allow to bypass only exports which have enabled some real authentication (GSS, TLS, or any other). The result would be: For AUTH_NULL-only export if client attempts to do mount with AUTH_UNIX flavor then it will receive access errors, which instruct client that AUTH_UNIX flavor is not usable and will either try other auth flavor (AUTH_NULL if enabled) or fails mount procedure. Similarly if client attempt to do mount with AUTH_NULL flavor and only AUTH_UNIX flavor is enabled then the client will receive access error. This should fix problems with AUTH_NULL-only or AUTH_UNIX-only exports if client attempts to mount it with other auth flavor (e.g. with AUTH_NULL for AUTH_UNIX-only export, or with AUTH_UNIX for AUTH_NULL-only export). Signed-off-by: Pali Rohár <pali@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	nfsd: Fill NFSv4.1 server implementation fields in OP_EXCHANGE_ID response	Pali Rohár
	NFSv4.1 OP_EXCHANGE_ID response from server may contain server implementation details (domain, name and build time) in optional nfs_impl_id4 field. Currently nfsd does not fill this field. Send these information in NFSv4.1 OP_EXCHANGE_ID response. Fill them with the same values as what is Linux NFSv4.1 client doing. Domain is hardcoded to "kernel.org", name is composed in the same way as "uname -srvm" output and build time is hardcoded to zeros. NFSv4.1 client and server implementation fields are useful for statistic purposes or for identifying type of clients and servers. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	lockd: Fix comment about NLMv3 backwards compatibility	Pali Rohár
	NLMv2 is completely different protocol than NLMv1 and NLMv3, and in original Sun implementation is used for RPC loopback callbacks from statd to lockd services. Linux does not use nor does not implement NLMv2. Hence, NLMv3 is not backward compatible with NLMv2. But NLMv3 is backward compatible with NLMv1. Fix comment. Signed-off-by: Pali Rohár <pali@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	nfsd: new tracepoint for after op_func in compound processing	Jeff Layton
	Turn nfsd_compound_encode_err tracepoint into a class and add a new nfsd_compound_op_err tracepoint. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-11-18	Merge tag 'for-6.13/io_uring-20241118' of git://git.kernel.dk/linux	Linus Torvalds
	Pull io_uring updates from Jens Axboe: - Cleanups of the eventfd handling code, making it fully private. - Support for sending a sync message to another ring, without having a ring available to send a normal async message. - Get rid of the separate unlocked hash table, unify everything around the single locked one. - Add support for ring resizing. It can be hard to appropriately size the CQ ring upfront, if the application doesn't know how busy it will be. This results in applications sizing rings for the most busy case, which can be wasteful. With ring resizing, they can start small and grow the ring, if needed. - Add support for fixed wait regions, rather than needing to copy the same wait data tons of times for each wait operation. - Rewrite the resource node handling, which before was serialized per ring. This caused issues with particularly fixed files, where one file waiting on IO could hold up putting and freeing of other unrelated files. Now each node is handled separately. New code is much simpler too, and was a net 250 line reduction in code. - Add support for just doing partial buffer clones, rather than always cloning the entire buffer table. - Series adding static NAPI support, where a specific NAPI instance is used rather than having a list of them available that need lookup. - Add support for mapped regions, and also convert the fixed wait support mentioned above to that concept. This avoids doing special mappings for various planned features, and folds the existing registered wait into that too. - Add support for hybrid IO polling, which is a variant of strict IOPOLL but with an initial sleep delay to avoid spinning too early and wasting resources on devices that aren't necessarily in the < 5 usec category wrt latencies. - Various cleanups and little fixes. * tag 'for-6.13/io_uring-20241118' of git://git.kernel.dk/linux: (79 commits) io_uring/region: fix error codes after failed vmap io_uring: restore back registered wait arguments io_uring: add memory region registration io_uring: introduce concept of memory regions io_uring: temporarily disable registered waits io_uring: disable ENTER_EXT_ARG_REG for IOPOLL io_uring: fortify io_pin_pages with a warning switch io_msg_ring() to CLASS(fd) io_uring: fix invalid hybrid polling ctx leaks io_uring/uring_cmd: fix buffer index retrieval io_uring/rsrc: add & apply io_req_assign_buf_node() io_uring/rsrc: remove '->ctx_ptr' of 'struct io_rsrc_node' io_uring/rsrc: pass 'struct io_ring_ctx' reference to rsrc helpers io_uring: avoid normal tw intermediate fallback io_uring/napi: add static napi tracking strategy io_uring/napi: clean up __io_napi_do_busy_loop io_uring/napi: Use lock guards io_uring/napi: improve __io_napi_add io_uring/napi: fix io_napi_entry RCU accesses io_uring/napi: protect concurrent io_napi_entry timeout accesses ...
2024-11-18	Merge tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux	Linus Torvalds
	Pull block updates from Jens Axboe: - NVMe updates via Keith: - Use uring_cmd helper (Pavel) - Host Memory Buffer allocation enhancements (Christoph) - Target persistent reservation support (Guixin) - Persistent reservation tracing (Guixen) - NVMe 2.1 specification support (Keith) - Rotational Meta Support (Matias, Wang, Keith) - Volatile cache detection enhancment (Guixen) - MD updates via Song: - Maintainers update - raid5 sync IO fix - Enhance handling of faulty and blocked devices - raid5-ppl atomic improvement - md-bitmap fix - Support for manually defining embedded partition tables - Zone append fixes and cleanups - Stop sending the queued requests in the plug list to the driver ->queue_rqs() handle in reverse order. - Zoned write plug cleanups - Cleanups disk stats tracking and add support for disk stats for passthrough IO - Add preparatory support for file system atomic writes - Add lockdep support for queue freezing. Already found a bunch of issues, and some fixes for that are in here. More will be coming. - Fix race between queue stopping/quiescing and IO queueing - ublk recovery improvements - Fix ublk mmap for 64k pages - Various fixes and cleanups * tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux: (118 commits) MAINTAINERS: Update git tree for mdraid subsystem block: make struct rq_list available for !CONFIG_BLOCK block/genhd: use seq_put_decimal_ull for diskstats decimal values block: don't reorder requests in blk_mq_add_to_batch block: don't reorder requests in blk_add_rq_to_plug block: add a rq_list type block: remove rq_list_move virtio_blk: reverse request order in virtio_queue_rqs nvme-pci: reverse request order in nvme_queue_rqs btrfs: validate queue limits block: export blk_validate_limits nvmet: add tracing of reservation commands nvme: parse reservation commands's action and rtype to string nvmet: report ns's vwc not present md/raid5: Increase r5conf.cache_name size block: remove the ioprio field from struct request block: remove the write_hint field from struct request nvme: check ns's volatile write cache not present nvme: add rotational support nvme: use command set independent id ns if available ...
2024-11-18	Merge tag 'ata-6.13-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux Pull ata updates from Niklas Cassel: - Fix typos in comments (Yan Zhen) - Remove unused macro definitions (Damien Le Moal) - Switch back to the .remove() callback (Uwe Kleine-König) - Make use of the get_unaligned_be24() helper instead of open coding (Andy Shevchenko) - Refactor and cleanup ata_scsi_simulate() command emulation, such that all commands use ata_scsi_rbuf_fill() with its own callback (Damien Le Moal) - Improve ata_scsi_simulate() command emulation by accurately setting the SCSI command residual (number of bytes not filled) in the command reply (Damien Le Moal) - Add missing iommus property in ahci-platform device tree binding (Frank Wunderlich) * tag 'ata-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux: dt-bindings: ata: ahci-platform: add missing iommus property ata: libata-scsi: Return residual for emulated SCSI commands ata: libata-scsi: Remove struct ata_scsi_args ata: libata-scsi: Document all VPD page inquiry actors ata: libata-scsi: Refactor ata_scsiop_maint_in() ata: libata-scsi: Refactor ata_scsiop_read_cap() ata: libata-scsi: Refactor ata_scsi_simulate() ata: libata-scsi: Refactor scsi_6_lba_len() with use of get_unaligned_be24() ata: Switch back to struct platform_driver::remove() ata: libata: Remove unused macro definitions ata: Fix typos in the comment
2024-11-18	Merge tag 'for-6.13-tag' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "Changes outside of btrfs: add io_uring command flag to track a dying task (the rest will go via the block git tree). User visible changes: - wire encoded read (ioctl) to io_uring commands, this can be used on itself, in the future this will allow 'send' to be asynchronous. As a consequence, the encoded read ioctl can also work in non-blocking mode - new ioctl to wait for cleaned subvolumes, no need to use the generic and root-only SEARCH_TREE ioctl, will be used by "btrfs subvol sync" - recognize different paths/symlinks for the same devices and don't report them during rescanning, this can be observed with LVM or DM - seeding device use case change, the sprout device (the one capturing new writes) will not clear the read-only status of the super block; this prevents accumulating space from deleted snapshots Performance improvements: - reduce lock contention when traversing extent buffers - reduce extent tree lock contention when searching for inline backref - switch from rb-trees to xarray for delayed ref tracking, improvements due to better cache locality, branching factors and more compact data structures - enable extent map shrinker again (prevent memory exhaustion under some types of IO load), reworked to run in a single worker thread (there used to be problems causing long stalls under memory pressure) Core changes: - raid-stripe-tree feature updates: - make device replace and scrub work - implement partial deletion of stripe extents - new selftests - split the config option BTRFS_DEBUG and add EXPERIMENTAL for features that are experimental or with known problems so we don't misuse debugging config for that - subpage mode updates (sector < page): - update compression implementations - update writepage, writeback - continued folio API conversions: - buffered writes - make buffered write copy one page at a time, preparatory work for future integration with large folios, may cause performance drop - proper locking of root item regarding starting send - error handling improvements - code cleanups and refactoring: - dead code removal - unused parameter reduction - lockdep assertions" * tag 'for-6.13-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (119 commits) btrfs: send: check for read-only send root under critical section btrfs: send: check for dead send root under critical section btrfs: remove check for NULL fs_info at btrfs_folio_end_lock_bitmap() btrfs: fix warning on PTR_ERR() against NULL device at btrfs_control_ioctl() btrfs: fix a typo in btrfs_use_zone_append btrfs: avoid superfluous calls to free_extent_map() in btrfs_encoded_read() btrfs: simplify logic to decrement snapshot counter at btrfs_mksnapshot() btrfs: remove hole from struct btrfs_delayed_node btrfs: update stale comment for struct btrfs_delayed_ref_node::add_list btrfs: add new ioctl to wait for cleaned subvolumes btrfs: simplify range tracking in cow_file_range() btrfs: remove conditional path allocation in btrfs_read_locked_inode() btrfs: push cleanup into btrfs_read_locked_inode() io_uring/cmd: let cmds to know about dying task btrfs: add struct io_btrfs_cmd as type for io_uring_cmd_to_pdu() btrfs: add io_uring command for encoded reads (ENCODED_READ ioctl) btrfs: move priv off stack in btrfs_encoded_read_regular_fill_pages() btrfs: don't sleep in btrfs_encoded_read() if IOCB_NOWAIT is set btrfs: change btrfs_encoded_read() so that reading of extent is done by caller btrfs: remove pointless iocb::ki_pos addition in btrfs_encoded_read() ...
2024-11-18	Merge tag 'ext4_for_linus-6.13-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of miscellaneous ext4 bug fixes and cleanups this cycle, most notably in the journaling code, bufered I/O, and compiler warning cleanups" * tag 'ext4_for_linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits) jbd2: Fix comment describing journal_init_common() ext4: prevent an infinite loop in the lazyinit thread ext4: use struct_size() to improve ext4_htree_store_dirent() ext4: annotate struct fname with __counted_by() jbd2: avoid dozens of -Wflex-array-member-not-at-end warnings ext4: use str_yes_no() helper function ext4: prevent delalloc to nodelalloc on remount jbd2: make b_frozen_data allocation always succeed ext4: cleanup variable name in ext4_fc_del() ext4: use string choices helpers jbd2: remove the 'success' parameter from the jbd2_do_replay() function jbd2: remove useless 'block_error' variable jbd2: factor out jbd2_do_replay() jbd2: refactor JBD2_COMMIT_BLOCK process in do_one_pass() jbd2: unified release of buffer_head in do_one_pass() jbd2: remove redundant judgments for check v1 checksum ext4: use ERR_CAST to return an error-valued pointer mm: zero range of eof folio exposed by inode size extension ext4: partial zero eof block on unaligned inode size extension ext4: disambiguate the return value of ext4_dio_write_end_io() ...
2024-11-18	dt-bindings: net: renesas,ether: Drop undocumented "micrel,led-mode"	Rob Herring (Arm)
	"micrel,led-mode" is not yet documented by a schema. It's irrelevant to the example, so just drop it. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20241113225742.1784723-2-robh@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-18	pinctrl: airoha: Use unsigned long for bit search	Kees Cook
	Instead of risking alignment problems and causing (false positive) array bound warnings when casting a u32 to (64-bit) unsigned long, just use a native unsigned long for doing bit searches. Avoids warning with GCC 15's -Warray-bounds -fdiagnostics-details: In file included from ../include/linux/bitmap.h:11, from ../include/linux/cpumask.h:12, from ../arch/x86/include/asm/paravirt.h:21, from ../arch/x86/include/asm/irqflags.h:80, from ../include/linux/irqflags.h:18, from ../include/linux/spinlock.h:59, from ../include/linux/irq.h:14, from ../include/linux/irqchip/chained_irq.h:10, from ../include/linux/gpio/driver.h:8, from ../drivers/pinctrl/mediatek/pinctrl-airoha.c:11: In function 'find_next_bit', inlined from 'airoha_irq_handler' at ../drivers/pinctrl/mediatek/pinctrl-airoha.c:2394:3: ../include/linux/find.h:65:23: error: array subscript 'long unsigned int[0]' is partly outside array bounds of 'u32[1]' {aka 'unsigned int[1]'} [-Werror=array-bounds=] 65 \| val = *addr & GENMASK(size - 1, offset); \| ^~~~~ ../drivers/pinctrl/mediatek/pinctrl-airoha.c: In function 'airoha_irq_handler': ../drivers/pinctrl/mediatek/pinctrl-airoha.c:2387:21: note: object 'status' of size 4 2387 \| u32 status; \| ^~~~~~ Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/20241117114534.work.292-kees@kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2024-11-18	Merge tag 'pull-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull statx updates from Al Viro: "Sanitize struct filename and lookup flags handling in statx and friends" * tag 'pull-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: libfs: kill empty_dir_getattr() fs: Simplify getattr interface function checking AT_GETATTR_NOSEC flag fs/stat.c: switch to CLASS(fd_raw) kill getname_statx_lookup_flags() io_statx_prep(): use getname_uflags()
2024-11-18	pinctrl: k210: Undef K210_PC_DEFAULT	zhang jiao
	When the temporary macro K210_PC_DEFAULT is not needed anymore, use its name in the #undef statement instead of the incorrect "DEFAULT" name. Fixes: d4c34d09ab03 ("pinctrl: Add RISC-V Canaan Kendryte K210 FPIOA driver") Signed-off-by: zhang jiao <zhangjiao2@cmss.chinamobile.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/20241113071201.5440-1-zhangjiao2@cmss.chinamobile.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2024-11-18	jump_label: rust: pass a mut ptr to `static_key_count`	Alice Ryhl
	When building the rust_print sample with CONFIG_JUMP_LABEL=n, the Rust static key support falls back to using static_key_count. This function accepts a mutable pointer to the `struct static_key`, but the Rust abstractions are incorrectly passing a const pointer. This means that builds using CONFIG_JUMP_LABEL=n and SAMPLE_RUST_PRINT=y fail with the following error message: error[E0308]: mismatched types --> <root>/samples/rust/rust_print_main.rs:87:5 \| 87 \| / kernel::declare_trace! { 88 \| \| /// # Safety 89 \| \| /// 90 \| \| /// Always safe to call. 91 \| \| unsafe fn rust_sample_loaded(magic: c_int); 92 \| \| } \| \| ^ \| \| \| \| \|_____types differ in mutability \| arguments to this function are incorrect \| = note: expected raw pointer `mut kernel::bindings::static_key` found raw pointer `const kernel::bindings::static_key` note: function defined here --> <root>/rust/bindings/bindings_helpers_generated.rs:33:12 \| 33 \| pub fn static_key_count(key: *mut static_key) -> c_int; \| ^^^^^^^^^^^^^^^^ To fix this, insert a pointer cast so that the pointer is mutable. Link: https://lore.kernel.org/20241118202727.73646-1-aliceryhl@google.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202411181440.qEdcuyh6-lkp@intel.com/ Fixes: 169484ab6677 ("rust: add arch_static_branch") Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-18	Merge branch 'for-6.13/bpf' into for-linus	Jiri Kosina
	- improvement of the way hid-bpf coexists with specific drivers (others than hid-generic) that are already bound to devices (Benjamin Tissoires)
2024-11-18	Merge branch 'for-6.13/bug-on-to-warn-on' into for-linus	Jiri Kosina
	- removal of three way-too-aggressive BUG_ON()s from HID drivers (He Lugang)
2024-11-18	Merge branch 'for-6.13/core' into for-linus	Jiri Kosina
	- assorted cleanups and small code fixes (Dmitry Torokhov, Yan Zhen, Nathan Chancellor, Andy Shevchenko)
2024-11-18	Merge branch 'for-6.13/corsair' into for-linus	Jiri Kosina
	- support for Corsair Void headset family (Stuart Hayhurst)
2024-11-18	Merge tag 'pull-ufs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	Linus Torvalds
	Pull ufs updates from Al Viro: "ufs cleanups, fixes and folio conversion" * tag 'pull-ufs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: ufs: ufs_sb_private_info: remove unused s_{2,3}apb fields ufs: Convert ufs_change_blocknr() to take a folio ufs: Pass a folio to ufs_new_fragments() ufs: Convert ufs_inode_getfrag() to take a folio ufs: Convert ufs_extend_tail() to take a folio ufs: Convert ufs_inode_getblock() to take a folio ufs: take the handling of free block counters into a helper clean ufs_trunc_direct() up a bit... ufs: get rid of ubh_{ubhcpymem,memcpyubh}() ufs_inode_getfrag(): remove junk comment ufs_free_fragments(): fix the braino in sanity check ufs_clusteracct(): switch to passing fragment number ufs: untangle ubh_...block...(), part 3 ufs: untangle ubh_...block...(), part 2 ufs: untangle ubh_...block...() macros, part 1 ufs: fix ufs_read_cylinder() failure handling ufs: missing ->splice_write() ufs: fix handling of delete_entry and set_link failures
2024-11-18	Merge branch 'for-6.13/goodix' into for-linus	Jiri Kosina
	- Support for Goodix GT7986U SPI (Charles Wang) - assorted code cleanups and fixes (Charles Wang)
2024-11-18	Merge branch 'for-6.13/i2c-hid' into for-linus	Jiri Kosina
	- code cleanup (Uwe Kleine-König)
2024-11-18	Merge branch 'for-6.13/intel-ish' into for-linus	Jiri Kosina
	- exposing firmware versions for Intel-ISH devices that load firmware from the host (Zhang Lixu) - switch to flex-array members (Erick Archer)
2024-11-18	Merge branch 'for-6.13/kysona' into for-linus	Jiri Kosina
	- initial vendor-specific driver for Kysona, currently adding support for Kysona M600 (Lode Willems)
2024-11-18	Merge branch 'for-6.13/logitech' into for-linus	Jiri Kosina
	- unused variable removal in hidpp_root_get_feature() (Bastien Nocera)