linux-arm.git - Russell King's ARM Linux kernel tree

author	Heming Zhao <heming.zhao@suse.com>	2024-07-09 18:41:19 +0800
committer	Song Liu <song@kernel.org>	2024-07-12 01:30:17 +0000
commit	fff42f213824fa434a4b6cf906b4331fe6e9302b (patch)
tree	8256583524f88dbb5ba545cefc8257ba44b06a41 /mm/kfence/core.c
parent	3c1743a685b19bc17cf65af4a2eb149fd3b15c50 (diff)

md-cluster: fix hanging issue while a new disk adding

The commit 1bbe254e4336 ("md-cluster: check for timeout while a new disk adding") is correct in terms of code syntax but does not suit the real clustered code logic.

When a timeout occurs while adding a new disk, if recv_daemon() bypasses the unlock for ack_lockres:CR, another node will be waiting to grab the EX lock. This will cause the cluster to hang indefinitely.

How to fix:

1. In dlm_lock_sync(), change the wait behaviour from forever to a timeout. This avoids the hang when another node fails to handle a cluster msg. Another result of this change is that if another node receives an unknown msg (e.g. a new msg_type), the old code will hang, whereas the new code will time out and fail. This could help cluster_md handle a new msg_type from different nodes with different kernel/module versions (e.g. the user only updates one leg's kernel and monitors the stability of the new kernel).

2. The old code for __sendmsg() always returns 0 (success) under the design (it must successfully unlock ->message_lockres). This commit makes this function return an error number when an error occurs.

Fixes: 1bbe254e4336 ("md-cluster: check for timeout while a new disk adding")
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Su Yue <glass.su@suse.com>
Acked-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240709104120.22243-1-heming.zhao@suse.com

Diffstat (limited to 'mm/kfence/core.c')

0 files changed, 0 insertions, 0 deletions

