summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2023-01-25cifs: Fix oops due to uncleared server->smbd_conn in reconnectDavid Howells
In smbd_destroy(), clear the server->smbd_conn pointer after freeing the smbd_connection struct that it points to so that reconnection doesn't get confused. Fixes: 8ef130f9ec27 ("CIFS: SMBD: Implement function to destroy a SMB Direct connection") Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Long Li <longli@microsoft.com> Cc: Pavel Shilovsky <piastryyy@gmail.com> Cc: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-24Merge tag 'nfsd-6.2-5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fix from Chuck Lever: - Nail another UAF in NFSD's filecache * tag 'nfsd-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: don't free files unconditionally in __nfsd_file_cache_purge
2023-01-24Merge tag 'scmi-updates-6.3' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/drivers Arm SCMI updates for v6.3 The main addition is a unified userspace interface for SCMI irrespective of the underlying transport and along with some changed to refactor the SCMI stack probing sequence. 1. SCMI unified userspace interface This is to have a unified way of testing an SCMI platform firmware implementation for compliance, fuzzing etc., from the perspective of the non-secure OSPM irrespective of the underlying transport supporting SCMI. It is just for testing/development and not a feature intended fo use in production. Currently an SCMI Compliance Suite[1] can only work by injecting SCMI messages using the mailbox test driver only which makes it transport specific and can't be used with any other transport like virtio, smc/hvc, optee, etc. Also the shared memory can be transport specific and it is better to even abstract/hide those details while providing the userspace access. So in order to scale with any transport, we need a unified interface for the same. In order to achieve that, SCMI "raw mode support" is being added through debugfs which is more configurable as well. A userspace application can inject bare SCMI binary messages into the SCMI core stack; such messages will be routed by the SCMI regular kernel stack to the backend platform firmware using the configured transport transparently. This eliminates the to know about the specific underlying transport internals that will be taken care of by the SCMI core stack itself. Further no additional changes needed in the device tree like in the mailbox-test driver. [1] https://gitlab.arm.com/tests/scmi-tests 2. Refactoring of the SCMI stack probing sequence On some platforms, SCMI transport can be provide by OPTEE/TEE which introduces certain dependency in the probe ordering. In order to address the same, the SCMI bus is split into its own module which continues to be initialized at subsys_initcall, while the SCMI core stack, including its various transport backends (like optee, mailbox, virtio, smc), is now moved into a separate module at module_init level. This allows the other possibly dependent subsystems to register and/or access SCMI bus well before the core SCMI stack and its dependent transport backends. * tag 'scmi-updates-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: (31 commits) firmware: arm_scmi: Clarify raw per-channel ABI documentation firmware: arm_scmi: Add per-channel raw injection support firmware: arm_scmi: Add the raw mode co-existence support firmware: arm_scmi: Call raw mode hooks from the core stack firmware: arm_scmi: Reject SCMI drivers when configured in raw mode firmware: arm_scmi: Add debugfs ABI documentation for raw mode firmware: arm_scmi: Add core raw transmission support firmware: arm_scmi: Add debugfs ABI documentation for common entries firmware: arm_scmi: Populate a common SCMI debugfs root debugfs: Export debugfs_create_str symbol include: trace: Add platform and channel instance references firmware: arm_scmi: Add internal platform/channel identifiers firmware: arm_scmi: Move errors defs and code to common.h firmware: arm_scmi: Add xfer helpers to provide raw access firmware: arm_scmi: Add flags field to xfer firmware: arm_scmi: Refactor scmi_wait_for_message_response firmware: arm_scmi: Refactor polling helpers firmware: arm_scmi: Refactor xfer in-flight registration routines firmware: arm_scmi: Split bus and driver into distinct modules firmware: arm_scmi: Introduce a new lifecycle for protocol devices ... Link: https://lore.kernel.org/r/20230120162152.1438456-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2023-01-24ext4: make xattr char unsignedness in hash explicitLinus Torvalds
Commit f3bbac32475b ("ext4: deal with legacy signed xattr name hash values") added a hashing function for the legacy case of having the xattr hash calculated using a signed 'char' type. It left the unsigned case alone, since it's all implicitly handled by the '-funsigned-char' compiler option. However, there's been some noise about back-porting it all into stable kernels that lack the '-funsigned-char', so let's just make that at least possible by making the whole 'this uses unsigned char' very explicit in the code itself. Whether such a back-port is really warranted or not, I'll leave to others, but at least together with this change it is technically sensible. Also, add a 'pr_warn_once()' for reporting the "hey, signedness for this hash calculation has changed" issue. Hopefully it never triggers except for that xfstests generic/454 test-case, but even if it does it's just good information to have. If for no other reason than "we can remove the legacy signed hash code entirely if nobody ever sees the message any more". Cc: Sasha Levin <sashal@kernel.org> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Andreas Dilger <adilger@dilger.ca> Cc: Theodore Ts'o <tytso@mit.edu>, Cc: Jason Donenfeld <Jason@zx2c4.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-24fuse: fixes after adapting to new posix acl apiChristian Brauner
This cycle we ported all filesystems to the new posix acl api. While looking at further simplifications in this area to remove the last remnants of the generic dummy posix acl handlers we realized that we regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use of posix acls. With the change to a dedicated posix acl api interacting with posix acls doesn't go through the old xattr codepaths anymore and instead only relies the get acl and set acl inode operations. Before this change fuse daemons that don't set FUSE_POSIX_ACL were able to get and set posix acl albeit with two caveats. First, that posix acls aren't cached. And second, that they aren't used for permission checking in the vfs. We regressed that use-case as we currently refuse to retrieve any posix acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons would see a change in behavior. We can restore the old behavior in multiple ways. We could change the new posix acl api and look for a dedicated xattr handler and if we find one prefer that over the dedicated posix acl api. That would break the consistency of the new posix acl api so we would very much prefer not to do that. We could introduce a new ACL_*_CACHE sentinel that would instruct the vfs permission checking codepath to not call into the filesystem and ignore acls. But a more straightforward fix for v6.2 is to do the same thing that Overlayfs does and give fuse a separate get acl method for permission checking. Overlayfs uses this to express different needs for vfs permission lookup and acl based retrieval via the regular system call path as well. Let fuse do the same for now. This way fuse can continue to refuse to retrieve posix acls for daemons that don't set FUSE_POSXI_ACL for permission checking while allowing a fuse server to retrieve it via the usual system calls. In the future, we could extend the get acl inode operation to not just pass a simple boolean to indicate rcu lookup but instead make it a flag argument. Then in addition to passing the information that this is an rcu lookup to the filesystem we could also introduce a flag that tells the filesystem that this is a request from the vfs to use these acls for permission checking. Then fuse could refuse the get acl request for permission checking when the daemon doesn't have FUSE_POSIX_ACL set in the same get acl method. This would also help Overlayfs and allow us to remove the second method for it as well. But since that change is more invasive as we need to update the get acl inode operation for multiple filesystems we should not do this as a fix for v6.2. Instead we will do this for the v6.3 merge window. Fwiw, since posix acls are now always correctly translated in the new posix acl api we could also allow them to be used for daemons without FUSE_POSIX_ACL that are not mounted on the host. But this is behavioral change and again if dones should be done for v6.3. For now, let's just restore the original behavior. A nice side-effect of this change is that for fuse daemons with and without FUSE_POSIX_ACL the same code is used for posix acls in a backwards compatible way. This also means we can remove the legacy xattr handlers completely. We've also added comments to explain the expected behavior for daemons without FUSE_POSIX_ACL into the code. Fixes: 318e66856dde ("xattr: use posix acl api") Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-24Merge branch 'iomap-for-next' of ↵Andreas Gruenbacher
git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git
2023-01-23fs: dlm: remove unnecessary waker_up() callsAlexander Aring
The wake_up() is already handled inside of midcomms_node_reset() when switching the state to CLOSED state. So there is not need to call it after midcomms_node_reset() again. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: move state change into else branchAlexander Aring
Currently we can switch at first into DLM_CLOSE_WAIT state and then do another state change if a condition is true. Instead of doing two state changes we handle the other state change inside an else branch of this condition. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: remove newline in log_printAlexander Aring
There is an API difference between log_print() and other printk()s to put a newline or not. This one was introduced by mistake because log_print() adds a newline. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: reduce the shutdown timeout to 5 secsAlexander Aring
When a shutdown is stuck, time out after 5 seconds instead of 3 minutes. After this timeout we try a forced shutdown. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: make dlm sequence id more robustAlexander Aring
When joining a new lockspace, use a random number to initialize a sequence number used in messages. This makes it easier to detect sequence number mismatches in message replies during tests that repeatedly join and leave a lockspace. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: wait until all midcomms nodes detect versionAlexander Aring
The current dlm version detection is very complex due to backwards compatablilty with earlier dlm protocol versions. It takes some time to detect if a peer node has a specific DLM version. If it's not detected, we just cut the socket connection. There could be cases where the local node has not detected the version yet, but the peer node has. In these cases, we are trying to shutdown the dlm connection with a FIN/ACK message exchange to be sure the other peer is ready to shutdown the connection on dlm application level. However this mechanism is only available on DLM protocol version 3.2 and we need to be sure the DLM version is detected before. To make it more robust we introduce a a "best effort" wait to wait for the version detection before shutdown the dlm connection. This need to be done before the kthread recoverd for recovery handling is stopped, because recovery handling will trigger enough messages to have a version detection going on. It is a corner case which was detected by modprobe dlm_locktroture module and rmmod dlm_locktorture module directly afterwards (in a looping behaviour). In practice probably nobody would leave a lockspace immediately after joining it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: ignore unexpected non dlm opts msgsAlexander Aring
This patch ignores unexpected RCOM_NAMES/RCOM_STATUS messages. To be backwards compatible, those messages are not part of the new reliable DLM OPTS encapsulation header, and have their own retransmit handling using sequence number matching When we get unexpected non dlm opts messages, we should allow them and let RCOM message handling filter them out using sequence numbers. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: bring back previous shutdown handlingAlexander Aring
This patch mostly reverts commit 4f567acb0b86 ("fs: dlm: remove socket shutdown handling"). There can be situations where the dlm midcomms nodes hash and lowcomms connection hash are not equal, but we need to guarantee that the lowcomms are all closed on a last release of a dlm lockspace, when a shutdown is invoked. This patch guarantees that we always close all sockets managed by the lowcomms connection hash, and calls shutdown for the last message sent. This ensures we don't cut the socket, which could cause the peer to get a connection reset. In future we should try to merge the midcomms/lowcomms hashes into one hash and not handle both in separate hashes. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: send FIN ack back in right casesAlexander Aring
This patch moves to send a ack back for receiving a FIN message only when we are in valid states. In other cases and there might be a sender waiting for a ack we just let it timeout at the senders time and hopefully all other cleanups will remove the FIN message on their sending queue. As an example we should never send out an ACK being in LAST_ACK state or we cannot assume a working socket communication when we are in CLOSED state. Cc: stable@vger.kernel.org Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: move sending fin message into state change handlingAlexander Aring
This patch moves the send fin handling, which should appear in a specific state change, into the state change handling while the per node state_lock is held. I experienced issues with other messages because we changed the state and a fin message was sent out in a different state. Cc: stable@vger.kernel.org Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: don't set stop rx flag after node resetAlexander Aring
Similar to the stop tx flag, the rx flag should warn about a dlm message being received at DLM_FIN state change, when we are assuming no other dlm application messages. If we receive a FIN message and we are in the state DLM_FIN_WAIT2 we call midcomms_node_reset() which puts the midcomms node into DLM_CLOSED state. Afterwards we should not set the DLM_NODE_FLAG_STOP_RX flag any more. This patch changes the setting DLM_NODE_FLAG_STOP_RX in those state changes when we receive a FIN message and we assume there will be no other dlm application messages received until we hit DLM_CLOSED state. Cc: stable@vger.kernel.org Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: fix race setting stop tx flagAlexander Aring
This patch sets the stop tx flag before we commit the dlm message. This flag will report about unexpected transmissions after we send the DLM_FIN message out, which should be the last message sent. When we commit the dlm fin message, it could be that we already got an ack back and the CLOSED state change already happened. We should not set this flag when we are in CLOSED state. To avoid this race we simply set the tx flag before the state change can be in progress by moving it before dlm_midcomms_commit_mhandle(). Cc: stable@vger.kernel.org Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: be sure to call dlm_send_queue_flush()Alexander Aring
If we release a midcomms node structure, there should be nothing left inside the dlm midcomms send queue. However, sometimes this is not true because I believe some DLM_FIN message was not acked... if we run into a shutdown timeout, then we should be sure there is no pending send dlm message inside this queue when releasing midcomms node structure. Cc: stable@vger.kernel.org Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: fix use after free in midcomms commitAlexander Aring
While working on processing dlm message in softirq context I experienced the following KASAN use-after-free warning: [ 151.760477] ================================================================== [ 151.761803] BUG: KASAN: use-after-free in dlm_midcomms_commit_mhandle+0x19d/0x4b0 [ 151.763414] Read of size 4 at addr ffff88811a980c60 by task lock_torture/1347 [ 151.765284] CPU: 7 PID: 1347 Comm: lock_torture Not tainted 6.1.0-rc4+ #2828 [ 151.766778] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-3.module+el8.7.0+16134+e5908aa2 04/01/2014 [ 151.768726] Call Trace: [ 151.769277] <TASK> [ 151.769748] dump_stack_lvl+0x5b/0x86 [ 151.770556] print_report+0x180/0x4c8 [ 151.771378] ? kasan_complete_mode_report_info+0x7c/0x1e0 [ 151.772241] ? dlm_midcomms_commit_mhandle+0x19d/0x4b0 [ 151.773069] kasan_report+0x93/0x1a0 [ 151.773668] ? dlm_midcomms_commit_mhandle+0x19d/0x4b0 [ 151.774514] __asan_load4+0x7e/0xa0 [ 151.775089] dlm_midcomms_commit_mhandle+0x19d/0x4b0 [ 151.775890] ? create_message.isra.29.constprop.64+0x57/0xc0 [ 151.776770] send_common+0x19f/0x1b0 [ 151.777342] ? remove_from_waiters+0x60/0x60 [ 151.778017] ? lock_downgrade+0x410/0x410 [ 151.778648] ? __this_cpu_preempt_check+0x13/0x20 [ 151.779421] ? rcu_lockdep_current_cpu_online+0x88/0xc0 [ 151.780292] _convert_lock+0x46/0x150 [ 151.780893] convert_lock+0x7b/0xc0 [ 151.781459] dlm_lock+0x3ac/0x580 [ 151.781993] ? 0xffffffffc0540000 [ 151.782522] ? torture_stop+0x120/0x120 [dlm_locktorture] [ 151.783379] ? dlm_scan_rsbs+0xa70/0xa70 [ 151.784003] ? preempt_count_sub+0xd6/0x130 [ 151.784661] ? is_module_address+0x47/0x70 [ 151.785309] ? torture_stop+0x120/0x120 [dlm_locktorture] [ 151.786166] ? 0xffffffffc0540000 [ 151.786693] ? lockdep_init_map_type+0xc3/0x360 [ 151.787414] ? 0xffffffffc0540000 [ 151.787947] torture_dlm_lock_sync.isra.3+0xe9/0x150 [dlm_locktorture] [ 151.789004] ? torture_stop+0x120/0x120 [dlm_locktorture] [ 151.789858] ? 0xffffffffc0540000 [ 151.790392] ? lock_torture_cleanup+0x20/0x20 [dlm_locktorture] [ 151.791347] ? delay_tsc+0x94/0xc0 [ 151.791898] torture_ex_iter+0xc3/0xea [dlm_locktorture] [ 151.792735] ? torture_start+0x30/0x30 [dlm_locktorture] [ 151.793606] lock_torture+0x177/0x270 [dlm_locktorture] [ 151.794448] ? torture_dlm_lock_sync.isra.3+0x150/0x150 [dlm_locktorture] [ 151.795539] ? lock_torture_stats+0x80/0x80 [dlm_locktorture] [ 151.796476] ? do_raw_spin_lock+0x11e/0x1e0 [ 151.797152] ? mark_held_locks+0x34/0xb0 [ 151.797784] ? _raw_spin_unlock_irqrestore+0x30/0x70 [ 151.798581] ? __kthread_parkme+0x79/0x110 [ 151.799246] ? trace_preempt_on+0x2a/0xf0 [ 151.799902] ? __kthread_parkme+0x79/0x110 [ 151.800579] ? preempt_count_sub+0xd6/0x130 [ 151.801271] ? __kasan_check_read+0x11/0x20 [ 151.801963] ? __kthread_parkme+0xec/0x110 [ 151.802630] ? lock_torture_stats+0x80/0x80 [dlm_locktorture] [ 151.803569] kthread+0x192/0x1d0 [ 151.804104] ? kthread_complete_and_exit+0x30/0x30 [ 151.804881] ret_from_fork+0x1f/0x30 [ 151.805480] </TASK> [ 151.806111] Allocated by task 1347: [ 151.806681] kasan_save_stack+0x26/0x50 [ 151.807308] kasan_set_track+0x25/0x30 [ 151.807920] kasan_save_alloc_info+0x1e/0x30 [ 151.808609] __kasan_slab_alloc+0x63/0x80 [ 151.809263] kmem_cache_alloc+0x1ad/0x830 [ 151.809916] dlm_allocate_mhandle+0x17/0x20 [ 151.810590] dlm_midcomms_get_mhandle+0x96/0x260 [ 151.811344] _create_message+0x95/0x180 [ 151.811994] create_message.isra.29.constprop.64+0x57/0xc0 [ 151.812880] send_common+0x129/0x1b0 [ 151.813467] _convert_lock+0x46/0x150 [ 151.814074] convert_lock+0x7b/0xc0 [ 151.814648] dlm_lock+0x3ac/0x580 [ 151.815199] torture_dlm_lock_sync.isra.3+0xe9/0x150 [dlm_locktorture] [ 151.816258] torture_ex_iter+0xc3/0xea [dlm_locktorture] [ 151.817129] lock_torture+0x177/0x270 [dlm_locktorture] [ 151.817986] kthread+0x192/0x1d0 [ 151.818518] ret_from_fork+0x1f/0x30 [ 151.819369] Freed by task 1336: [ 151.819890] kasan_save_stack+0x26/0x50 [ 151.820514] kasan_set_track+0x25/0x30 [ 151.821128] kasan_save_free_info+0x2e/0x50 [ 151.821812] __kasan_slab_free+0x107/0x1a0 [ 151.822483] kmem_cache_free+0x204/0x5e0 [ 151.823152] dlm_free_mhandle+0x18/0x20 [ 151.823781] dlm_mhandle_release+0x2e/0x40 [ 151.824454] rcu_core+0x583/0x1330 [ 151.825047] rcu_core_si+0xe/0x20 [ 151.825594] __do_softirq+0xf4/0x5c2 [ 151.826450] Last potentially related work creation: [ 151.827238] kasan_save_stack+0x26/0x50 [ 151.827870] __kasan_record_aux_stack+0xa2/0xc0 [ 151.828609] kasan_record_aux_stack_noalloc+0xb/0x20 [ 151.829415] call_rcu+0x4c/0x760 [ 151.829954] dlm_mhandle_delete+0x97/0xb0 [ 151.830718] dlm_process_incoming_buffer+0x2fc/0xb30 [ 151.831524] process_dlm_messages+0x16e/0x470 [ 151.832245] process_one_work+0x505/0xa10 [ 151.832905] worker_thread+0x67/0x650 [ 151.833507] kthread+0x192/0x1d0 [ 151.834046] ret_from_fork+0x1f/0x30 [ 151.834900] The buggy address belongs to the object at ffff88811a980c30 which belongs to the cache dlm_mhandle of size 88 [ 151.836894] The buggy address is located 48 bytes inside of 88-byte region [ffff88811a980c30, ffff88811a980c88) [ 151.839007] The buggy address belongs to the physical page: [ 151.839904] page:0000000076cf5d62 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11a980 [ 151.841378] flags: 0x8000000000000200(slab|zone=2) [ 151.842141] raw: 8000000000000200 0000000000000000 dead000000000122 ffff8881089b43c0 [ 151.843401] raw: 0000000000000000 0000000000220022 00000001ffffffff 0000000000000000 [ 151.844640] page dumped because: kasan: bad access detected [ 151.845822] Memory state around the buggy address: [ 151.846602] ffff88811a980b00: fb fb fb fb fc fc fc fc fa fb fb fb fb fb fb fb [ 151.847761] ffff88811a980b80: fb fb fb fc fc fc fc fa fb fb fb fb fb fb fb fb [ 151.848921] >ffff88811a980c00: fb fb fc fc fc fc fa fb fb fb fb fb fb fb fb fb [ 151.850076] ^ [ 151.851085] ffff88811a980c80: fb fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb [ 151.852269] ffff88811a980d00: fc fc fc fc fa fb fb fb fb fb fb fb fb fb fb fc [ 151.853428] ================================================================== [ 151.855618] Disabling lock debugging due to kernel taint It is accessing a mhandle in dlm_midcomms_commit_mhandle() and the mhandle was freed by a call_rcu() call in dlm_process_incoming_buffer(), dlm_mhandle_delete(). It looks like it was freed because an ack of this message was received. There is a short race between committing the dlm message to be transmitted and getting an ack back. If the ack is faster than returning from dlm_midcomms_commit_msg_3_2(), then we run into a use-after free because we still need to reference the mhandle when calling srcu_read_unlock(). To avoid that, we don't allow that mhandle to be freed between dlm_midcomms_commit_msg_3_2() and srcu_read_unlock() by using rcu read lock. We can do that because mhandle is protected by rcu handling. Cc: stable@vger.kernel.org Fixes: 489d8e559c65 ("fs: dlm: add reliable connection if reconnect") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23fs: dlm: start midcomms before scandAlexander Aring
The scand kthread can send dlm messages out, especially dlm remove messages to free memory for unused rsb on other nodes. To send out dlm messages, midcomms must be initialized. This patch moves the midcomms start before scand is started. Cc: stable@vger.kernel.org Fixes: e7fd41792fc0 ("[DLM] The core of the DLM for GFS2/CLVM") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-01-23nfsd: don't free files unconditionally in __nfsd_file_cache_purgeJeff Layton
nfsd_file_cache_purge is called when the server is shutting down, in which case, tearing things down is generally fine, but it also gets called when the exports cache is flushed. Instead of walking the cache and freeing everything unconditionally, handle it the same as when we have a notification of conflicting access. Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-by: Ruben Vestergaard <rubenv@drcmr.dk> Reported-by: Torkil Svensgaard <torkil@drcmr.dk> Reported-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Shachar Kagan <skagan@nvidia.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-01-23net/sock: Introduce trace_sk_data_ready()Peilin Ye
As suggested by Cong, introduce a tracepoint for all ->sk_data_ready() callback implementations. For example: <...> iperf-609 [002] ..... 70.660425: sk_data_ready: family=2 protocol=6 func=sock_def_readable iperf-609 [002] ..... 70.660436: sk_data_ready: family=2 protocol=6 func=sock_def_readable <...> Suggested-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-23zonefs: Cache zone group directory inodesDamien Le Moal
Since looking up any zone file inode requires looking up first the inode for the directory representing the zone group of the file, ensuring that the zone group inodes are always cached is desired. To do so, take an extra reference on the zone groups directory inodes on mount, thus avoiding the eviction of these inodes from the inode cache until the volume is unmounted. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2023-01-23zonefs: Dynamically create file inodes when neededDamien Le Moal
Allocating and initializing all inodes and dentries for all files results in a very large memory usage with high capacity zoned block devices. For instance, with a 26 TB SMR HDD with over 96000 zones, mounting the disk with zonefs results in about 130 MB of memory used, the vast majority of this space being used for vfs inodes and dentries. However, since a user will rarely access all zones at the same time, dynamically creating file inodes and dentries on demand, similarly to regular file systems, can significantly reduce memory usage. This patch modifies mount processing to not create the inodes and dentries for zone files. Instead, the directory inode operation zonefs_lookup() and directory file operation zonefs_readdir() are introduced to allocate and initialize inodes on-demand using the helper functions zonefs_get_dir_inode() and zonefs_get_zgroup_inode(). Implementation of these functions is simple, relying on the static nature of zonefs directories and files. Directory inodes are linked to the volume zone groups (struct zonefs_zone_group) they represent by using the directory inode i_private field. This simplifies the implementation of the lookup and readdir operations. Unreferenced zone file inodes can be evicted from the inode cache at any time. In such case, the only inode information that cannot be recreated from the zone information that is saved in the zone group data structures attached to the volume super block is the inode uid, gid and access rights. These values may have been changed by the user. To keep these attributes for the life time of the mount, as before, the inode mode, uid and gid are saved in the inode zone information and the saved values are used to initialize regular file inodes when an inode lookup happens. The zone information mode, uid and gid are initialized in zonefs_init_zgroup() using the default values. With these changes, the static minimal memory usage of a zonefs volume is mostly reduced to the array of zone information for each zone group. For the 26 TB SMR hard-disk mentioned above, the memory usage after mount becomes about 5.4 MB, a reduction by a factor of 24 from the initial 130 MB memory use. Co-developed-by: Jorgen Hansen <Jorgen.Hansen@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2023-01-23zonefs: Separate zone information from inode informationDamien Le Moal
In preparation for adding dynamic inode allocation, separate an inode zone information from the zonefs inode structure. The new data structure zonefs_zone is introduced to store in memory information about a zone that must be kept throughout the lifetime of the device mount. Linking between a zone file inode and its zone information is done by setting the inode i_private field to point to a struct zonefs_zone. Using the i_private pointer avoids the need for adding a pointer in struct zonefs_inode_info. Beside the vfs inode, this structure is reduced to a mutex and a write open counter. One struct zonefs_zone is created per file inode on mount. These structures are organized in an array using the new struct zonefs_zone_group data structure to represent zone groups. The zonefs_zone arrays are indexed per file number (the index of a struct zonefs_zone in its array directly gives the file number/name for that zone file inode). Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2023-01-23zonefs: Reduce struct zonefs_inode_info sizeDamien Le Moal
Instead of using the i_ztype field in struct zonefs_inode_info to indicate the zone type of an inode, introduce the new inode flag ZONEFS_ZONE_CNV to be set in the i_flags field of struct zonefs_inode_info to identify conventional zones. If this flag is not set, the zone of an inode is considered to be a sequential zone. The helpers zonefs_zone_is_cnv(), zonefs_zone_is_seq(), zonefs_inode_is_cnv() and zonefs_inode_is_seq() are introduced to simplify testing the zone type of a struct zonefs_inode_info and of a struct inode. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2023-01-23zonefs: Simplify IO error handlingDamien Le Moal
Simplify zonefs_check_zone_condition() by moving the code that changes an inode access rights to the new function zonefs_inode_update_mode(). Furthermore, since on mount an inode wpoffset is always zero when zonefs_check_zone_condition() is called during an inode initialization, the "mount" boolean argument is not necessary for the readonly zone case. This argument is thus removed. zonefs_io_error_cb() is also modified to use the inode offline and zone state flags instead of checking the device zone condition. The multiple calls to zonefs_check_zone_condition() are reduced to the first call on entry, which allows removing the "warn" argument. zonefs_inode_update_mode() is also used to update an inode access rights as zonefs_io_error_cb() modifies the inode flags depending on the volume error handling mode (defined with a mount option). Since an inode mode change differs for read-only zones between mount time and IO error time, the flag ZONEFS_ZONE_INIT_MODE is used to differentiate both cases. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2023-01-23zonefs: Reorganize codeDamien Le Moal
Move all code related to zone file operations from super.c to the new file.c file. Inode and zone management code remains in super.c. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2023-01-22Merge tag 'gfs2-v6.2-rc4-fix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 writepage fix from Andreas Gruenbacher: - Fix a regression introduced by commit "gfs2: stop using generic_writepages in gfs2_ail1_start_one". * tag 'gfs2-v6.2-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"
2023-01-22Merge 6.2-rc5 into driver-core-nextGreg Kroah-Hartman
We need the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-22Revert "gfs2: stop using generic_writepages in gfs2_ail1_start_one"Andreas Gruenbacher
Commit b2b0a5e97855 switched from generic_writepages() to filemap_fdatawrite_wbc() in gfs2_ail1_start_one() on the path to replacing ->writepage() with ->writepages() and eventually eliminating the former. Function gfs2_ail1_start_one() is called from gfs2_log_flush(), our main function for flushing the filesystem log. Unfortunately, at least as implemented today, ->writepage() and ->writepages() are entirely different operations for journaled data inodes: while the former creates and submits transactions covering the data to be written, the latter flushes dirty buffers out to disk. With gfs2_ail1_start_one() now calling ->writepages(), we end up creating filesystem transactions while we are in the course of a log flush, which immediately deadlocks on the sdp->sd_log_flush_lock semaphore. Work around that by going back to how things used to work before commit b2b0a5e97855 for now; figuring out a superior solution will take time we don't have available right now. However ... Since the removal of generic_writepages() is imminent, open-code it here. We're already inside a blk_start_plug() ... blk_finish_plug() section here, so skip that part of the original generic_writepages(). This reverts commit b2b0a5e978552e348f85ad9c7568b630a5ede659. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de>
2023-01-21ext4: deal with legacy signed xattr name hash valuesLinus Torvalds
We potentially have old hashes of the xattr names generated on systems with signed 'char' types. Now that everybody uses '-funsigned-char', those hashes will no longer match. This only happens if you use xattrs names that have the high bit set, which probably doesn't happen in practice, but the xfstest generic/454 shows it. Instead of adding a new "signed xattr hash filesystem" bit and having to deal with all the possible combinations, just calculate the hash both ways if the first one fails, and always generate new hashes with the proper unsigned char version. Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202212291509.704a11c9-oliver.sang@intel.com Link: https://lore.kernel.org/all/CAHk-=whUNjwqZXa-MH9KMmc_CpQpoFKFjAB9ZKHuu=TbsouT4A@mail.gmail.com/ Exposed-by: 3bc753c06dd0 ("kbuild: treat char as always unsigned") Cc: Eric Biggers <ebiggers@kernel.org> Cc: Andreas Dilger <adilger@dilger.ca> Cc: Theodore Ts'o <tytso@mit.edu>, Cc: Jason Donenfeld <Jason@zx2c4.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-20Merge tag '6.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: - important fix for packet signature calculation error - three fixes to correct DFS deadlock, and DFS refresh problem - remove an unused DFS function, and duplicate tcon refresh code - DFS cache lookup fix - uninitialized rc fix * tag '6.2-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: remove unused function cifs: do not include page data when checking signature cifs: fix return of uninitialized rc in dfs_cache_update_tgthint() cifs: handle cache lookup errors different than -ENOENT cifs: remove duplicate code in __refresh_tcon() cifs: don't take exclusive lock for updating target hints cifs: avoid re-lookups in dfs_cache_find() cifs: fix potential deadlock in cache_refresh_path()
2023-01-20ksmbd: do not sign response to session request for guest loginMarios Makassikis
If ksmbd.mountd is configured to assign unknown users to the guest account ("map to guest = bad user" in the config), ksmbd signs the response. This is wrong according to MS-SMB2 3.3.5.5.3: 12. If the SMB2_SESSION_FLAG_IS_GUEST bit is not set in the SessionFlags field, and Session.IsAnonymous is FALSE, the server MUST sign the final session setup response before sending it to the client, as follows: [...] This fixes libsmb2 based applications failing to establish a session ("Wrong signature in received"). Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-20ksmbd: add max connections parameterNamjae Jeon
Add max connections parameter to limit number of maximum simultaneous connections. Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Cc: stable@vger.kernel.org Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-01-20Merge tag 'for-6.2-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fix potential out-of-bounds access to leaf data when seeking in an inline file - fix potential crash in quota when rescan races with disable - reimplement super block signature scratching by marking page/folio dirty and syncing block device, allow removing write_one_page * tag 'for-6.2-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix race between quota rescan and disable leading to NULL pointer deref btrfs: fix invalid leaf access due to inline extent during lseek btrfs: stop using write_one_page in btrfs_scratch_superblock btrfs: factor out scratching of one regular super block
2023-01-20debugfs: Export debugfs_create_str symbolCristian Marussi
Needed by SCMI Raw mode support when compiled as a loadable module. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/r/20230118121426.492864-10-cristian.marussi@arm.com Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2023-01-19sysv: fix handling of delete_entry and set_link failuresAl Viro
similar to minixfs series - make sysv_set_link() report failures, lift dir_put_page() into the callers of sysv_set_link() and sysv_delete_entry(), make sysv_rename() handle failures in both. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19fs/sysv: Replace kmap() with kmap_local_page()Fabio M. De Francesco
kmap() is being deprecated in favor of kmap_local_page(). There are two main problems with kmap(): (1) It comes with an overhead as the mapping space is restricted and protected by a global lock for synchronization and (2) it also requires global TLB invalidation when the kmap’s pool wraps and it might block when the mapping space is fully utilized until a slot becomes available. With kmap_local_page() the mappings are per thread, CPU local, can take page faults, and can be called from any context (including interrupts). It is faster than kmap() in kernels with HIGHMEM enabled. Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. Since kmap_local_page() would not break the strict rules of local mappings (i.e., the thread locality and the stack based nesting), this function can be easily and safely replace the deprecated API. Therefore, replace kmap() with kmap_local_page() in fs/sysv. kunmap_local() requires the mapping address, so return that address from dir_get_page() to be used in dir_put_page(). Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19fs/sysv: Use dir_put_page() in sysv_rename()Fabio M. De Francesco
Use the dir_put_page() helper in sysv_rename() instead of open-coding two kunmap() + put_page(). Cc: Al Viro <viro@zeniv.linux.org.uk> Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19fs/sysv: Change the signature of dir_get_page()Fabio M. De Francesco
Change the signature of dir_get_page() in order to prepare this function to the conversion to the use of kmap_local_page(). Change also those call sites which are required to adjust to the new signature. Cc: Ira Weiny <ira.weiny@intel.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19fs/sysv: Use the offset_in_page() helperFabio M. De Francesco
Use the offset_in_page() helper because it is more suitable than doing explicit subtractions between pointers to directory entries and kernel virtual addresses of mapped pages. Cc: Ira Weiny <ira.weiny@intel.com> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19sysv: don't flush page immediately for DIRSYNC directoriesChristoph Hellwig
We do not need to writeout modified directory blocks immediately when modifying them while the page is locked. It is enough to do the flush somewhat later which has the added benefit that inode times can be flushed as well. It also allows us to stop depending on write_one_page() function. Ported from an ext2 patch by Jan Kara. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19minix_rename(): minix_delete_entry() might failAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19minix: don't flush page immediately for DIRSYNC directoriesChristoph Hellwig
We do not need to writeout modified directory blocks immediately when modifying them while the page is locked. It is enough to do the flush somewhat later which has the added benefit that inode times can be flushed as well. It also allows us to stop depending on write_one_page() function. Ported from an ext2 patch by Jan Kara. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19minix: fix error handling in minix_set_linkChristoph Hellwig
If minix_prepare_chunk fails, updating c/mtime and marking the dir inode dirty is wrong, as the inode hasn't been modified. Also propagate the error to the caller. Note that this moves the dir_put_page call later, but that matches other uses of this helper in the directory code. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19minix: fix error handling in minix_delete_entryChristoph Hellwig
If minix_prepare_chunk fails, updating c/mtime and marking the dir inode dirty is wrong, as the inode hasn't been modified. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19minix: move releasing pages into unlink and renameChristoph Hellwig
Instead of consuming the page reference and kmap in the low-level minix_delete_entry and minix_set_link helpers, do it in the callers where that code can be shared with the error cleanup path. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-01-19Merge tag 'zonefs-6.2-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: - A single patch to fix sync write operations to detect and handle errors due to external zone corruptions resulting in writes at invalid location, from me. * tag 'zonefs-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Detect append writes at invalid locations