linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-05-06	wifi: iwlwifi: mvm: don't always unblock EMLSR	Miri Korenblit
	When an event occurs to unblock EMLSR, the code attempts to re-enable EMLSR. However, the current implementation always tries to activate EMLSR, regardless of whether the blocker was set before the unblocking event or not. If EMLSR was already unblocked, there is no need to re-activate it. Fixes: 6cf7df9f013f ("wifi: iwlwifi: mvm: Add helper functions to update EMLSR status") Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240505091420.eb861402dac9.I6a1d9f774f5551cfab60ea37b71a62640496af9b@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06	wifi: iwlwifi: mvm: Always allow entering EMLSR from debugfs	Miri Korenblit
	EMLSR can't be activated from mac80211. Except for the debugfs, which is intended for testing purposes. Currently we don't allow entering EMLSR from debugfs if EMLSR is blocked, i.e. if mvmvif::esr_disable_reason is not 0. But we need a way to activate EMLSR regardless of the vif being blocked, for testing. Remove the check of esr_disable_reason Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240505091420.bc3c24d9e0e6.Iad60e22a0d7e2b2b989051e1140b6dc98bef7bcc@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06	wifi: iwlwifi: mvm: add a debugfs for (un)blocking EMLSR	Miri Korenblit
	This is needed for testing purposes. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240505091420.eba2b6f0664c.I5f058e02abda11bf2eccfd2bcb59ca26bae87a3a@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06	wifi: iwlwifi: mvm: trigger link selection after exiting EMLSR	Miri Korenblit
	If the reason for exiting EMLSR was a blocking reason, wait for the corresponding unblocking event: - if there is an ongoing scan - do nothing. Link selection will be triggered at the end of it. - If more than 30 seconds passed since the exit, trigger MLO scan, which will trigger link selection - If less then 30 seconds passed since exit, reuse the latest link selection result If the reason for exiting EMLSR was an exit reason (IWL_MVM_EXIT_*), schedule MLO scan in 30 seconds. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Reviewed-by: Ilan Peer <ilan.peer@intel.com> Link: https://msgid.link/20240505091420.6a808c4ae8f5.Ia79605838eb6deee9358bec633ef537f2653db92@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06	wifi: iwlwifi: cleanup EMLSR when BT is active handling	Miri Korenblit
	BT Coex disables EMLSR only for a 2.4 GHz link, but doesn't block the vif from using EMLSR with a different link pair. In addition, storing it in mvmvif:disable_esr_reason requires extracting the BT Coex bit before checking if EMLSR is blocked or not for a specific vif. Therefore, change the BT Coex bit to be an exit reason and not a blocker. On link selection, EMLSR mode will be re-calculated for the 2.4 GHz link instead of checking that bit. While at it, move the relevant function declarations to the EMLSR functions area in mvm.h Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://msgid.link/20240505091420.a2e93b67c895.I183a0039ef076613144648cc46fbe9ab3d47c574@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06	Merge wireless into wireless-next	Johannes Berg
	Given how late we are in the cycle, merge the two fixes from wireless into wireless-next as they don't see that urgent. This way, the wireless tree won't need rebasing later. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-05-06	netfilter: nft_set_pipapo: merge deactivate helper into caller	Florian Westphal
	Its the only remaining call site so there is no need for this to be separated anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06	netfilter: nft_set_pipapo: prepare walk function for on-demand clone	Florian Westphal
	The existing code uses iter->type to figure out what data is needed, the live copy (READ) or clone (UPDATE). Without pending updates, priv->clone and priv->match will point to different memory locations, but they have identical content. Future patch will make priv->clone == NULL if there are no pending changes, in this case we must copy the live data for the UPDATE case. Currently this would require GFP_ATOMIC allocation. Split the walk function in two parts: one that does the walk and one that decides which data is needed. In the UPDATE case, callers hold the transaction mutex so we do not need the rcu read lock. This allows to use GFP_KERNEL allocation while cloning. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06	netfilter: nft_set_pipapo: prepare destroy function for on-demand clone	Florian Westphal
	Once priv->clone can be NULL in case no insertions/removals occurred in the last transaction we need to drop set elements from priv->match if priv->clone is NULL. While at it, condense this function by reusing the pipapo_free_match helper instead of open-coded version. The rcu_barrier() is removed, its not needed: old call_rcu instances for pipapo_reclaim_match do not access struct nft_set. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06	netfilter: nft_set_pipapo: make pipapo_clone helper return NULL	Florian Westphal
	Currently it returns an error pointer, but the only possible failure is ENOMEM. After a followup patch, we'd need to discard the errno code, i.e. x = pipapo_clone() if (IS_ERR(x)) return NULL or make more changes to fix up callers to expect IS_ERR() code from set->ops->deactivate(). So simplify this and make it return ptr-or-null. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06	netfilter: nft_set_pipapo: move prove_locking helper around	Florian Westphal
	Preparation patch, the helper will soon get called from insert function too. Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06	netfilter: conntrack: remove flowtable early-drop test	Florian Westphal
	Not sure why this special case exists. Early drop logic (which kicks in when conntrack table is full) should be independent of flowtable offload and only consider assured bit (i.e., two-way traffic was seen). flowtable entries hold a reference to the conntrack entry (struct nf_conn) that has been offloaded. The conntrack use count is not decremented until after the entry is free'd. This change therefore will not result in exceeding the conntrack table limit. It does allow early-drop of tcp flows even when they've been offloaded, but only if they have been offloaded before syn-ack was received or after at least one peer has sent a fin. Currently 'fin' packet reception already stops offloading, so this should not impact offloading either. Cc: Vlad Buslov <vladbu@nvidia.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06	netfilter: conntrack: documentation: remove reference to non-existent sysctl	Florian Westphal
	The referenced sysctl doesn't exist anymore. Fixes: 4592ee7f525c ("netfilter: conntrack: remove offload_pickup sysctl again") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06	netfilter: use NF_DROP instead of -NF_DROP	Jason Xing
	At the beginning in 2009 one patch [1] introduced collecting drop counter in nf_conntrack_in() by returning -NF_DROP. Later, another patch [2] changed the return value of tcp_packet() which now is renamed to nf_conntrack_tcp_packet() from -NF_DROP to NF_DROP. As we can see, that -NF_DROP should be corrected. Similarly, there are other two points where the -NF_DROP is used. Well, as NF_DROP is equal to 0, inverting NF_DROP makes no sense as patch [2] said many years ago. [1] commit 7d1e04598e5e ("netfilter: nf_conntrack: account packets drop by tcp_packet()") [2] commit ec8d540969da ("netfilter: conntrack: fix dropping packet after l4proto->packet()") Signed-off-by: Jason Xing <kernelxing@tencent.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-06	bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf()	Kent Overstreet
	We're using mutex_lock() inside a wait_event() conditional - prepare_to_wait() has already flipped task state, so potentially blocking ops need annotation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-05-06	orangefs: fix out-of-bounds fsid access	Mike Marshall
	Arnd Bergmann sent a patch to fsdevel, he says: "orangefs_statfs() copies two consecutive fields of the superblock into the statfs structure, which triggers a warning from the string fortification helpers" Jan Kara suggested an alternate way to do the patch to make it more readable. I ran both ideas through xfstests and both seem fine. This patch is based on Jan Kara's suggestion. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2024-05-06	LoongArch: KVM: Add mmio trace events support	Bibo Mao
	Add mmio trace events support, currently generic mmio events KVM_TRACE_MMIO_WRITE/xxx_READ/xx_READ_UNSATISFIED are added here. Also vcpu id field is added for all kvm trace events, since perf KVM tool parses vcpu id information for kvm entry event. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06	LoongArch: KVM: Add software breakpoint support	Bibo Mao
	When VM runs in kvm mode, system will not exit to host mode when executing a general software breakpoint instruction such as INSN_BREAK, trap exception happens in guest mode rather than host mode. In order to debug guest kernel on host side, one mechanism should be used to let VM exit to host mode. Here a hypercall instruction with a special code is used for software breakpoint usage. VM exits to host mode and kvm hypervisor identifies the special hypercall code and sets exit_reason with KVM_EXIT_DEBUG. And then let qemu handle it. Idea comes from ppc kvm, one api KVM_REG_LOONGARCH_DEBUG_INST is added to get the hypercall code. VMM needs get sw breakpoint instruction with this api and set the corresponding sw break point for guest kernel. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06	LoongArch: KVM: Add PV IPI support on guest side	Bibo Mao
	PARAVIRT config option and PV IPI is added for the guest side, function pv_ipi_init() is used to add IPI sending and IPI receiving hooks. This function firstly checks whether system runs in VM mode, and if kernel runs in VM mode, it will call function kvm_para_available() to detect the current hypervirsor type (now only KVM type detection is supported). The paravirt functions can work only if current hypervisor type is KVM, since there is only KVM supported on LoongArch now. PV IPI uses virtual IPI sender and virtual IPI receiver functions. With virtual IPI sender, IPI message is stored in memory rather than emulated HW. IPI multicast is also supported, and 128 vcpus can received IPIs at the same time like X86 KVM method. Hypercall method is used for IPI sending. With virtual IPI receiver, HW SWI0 is used rather than real IPI HW. Since VCPU has separate HW SWI0 like HW timer, there is no trap in IPI interrupt acknowledge. Since IPI message is stored in memory, there is no trap in getting IPI message. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06	LoongArch: KVM: Add PV IPI support on host side	Bibo Mao
	On LoongArch system, IPI hw uses iocsr registers. There are one iocsr register access on IPI sending, and two iocsr access on IPI receiving for the IPI interrupt handler. In VM mode all iocsr accessing will cause VM to trap into hypervisor. So with one IPI hw notification there will be three times of trap. In this patch PV IPI is added for VM, hypercall instruction is used for IPI sender, and hypervisor will inject an SWI to the destination vcpu. During the SWI interrupt handler, only CSR.ESTAT register is written to clear irq. CSR.ESTAT register access will not trap into hypervisor, so with PV IPI supported, there is one trap with IPI sender, and no trap with IPI receiver, there is only one trap with IPI notification. Also this patch adds IPI multicast support, the method is similar with x86. With IPI multicast support, IPI notification can be sent to at most 128 vcpus at one time. It greatly reduces the times of trapping into hypervisor. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06	LoongArch: KVM: Add vcpu mapping from physical cpuid	Bibo Mao
	Physical CPUID is used for interrupt routing for irqchips such as ipi, msgint and eiointc interrupt controllers. Physical CPUID is stored at the CSR register LOONGARCH_CSR_CPUID, it can not be changed once vcpu is created and the physical CPUIDs of two vcpus cannot be the same. Different irqchips have different size declaration about physical CPUID, the max CPUID value for CSR LOONGARCH_CSR_CPUID on Loongson-3A5000 is 512, the max CPUID supported by IPI hardware is 1024, while for eiointc irqchip is 256, and for msgint irqchip is 65536. The smallest value from all interrupt controllers is selected now, and the max cpuid size is defines as 256 by KVM which comes from the eiointc irqchip. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06	LoongArch: KVM: Add cpucfg area for kvm hypervisor	Bibo Mao
	Instruction cpucfg can be used to get processor features. And there is a trap exception when it is executed in VM mode, and also it can be used to provide cpu features to VM. On real hardware cpucfg area 0 - 20 is used by now. Here one specified area 0x40000000 -- 0x400000ff is used for KVM hypervisor to provide PV features, and the area can be extended for other hypervisors in future. This area will never be used for real HW, it is only used by software. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06	LoongArch: KVM: Add hypercall instruction emulation	Bibo Mao
	On LoongArch system, there is a hypercall instruction special for virtualization. When system executes this instruction on host side, there is an illegal instruction exception reported, however it will trap into host when it is executed in VM mode. When hypercall is emulated, A0 register is set with value KVM_HCALL_INVALID_CODE, rather than inject EXCCODE_INE invalid instruction exception. So VM can continue to executing the next code. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06	LoongArch/smp: Refine some ipi functions on LoongArch platform	Bibo Mao
	Refine the ipi handling on LoongArch platform, there are three modifications: 1. Add generic function get_percpu_irq(), replacing some percpu irq functions such as get_ipi_irq()/get_pmc_irq()/get_timer_irq() with get_percpu_irq(). 2. Change definition about parameter action called by function loongson_send_ipi_single() and loongson_send_ipi_mask(), and it is defined as decimal encoding format at ipi sender side. Normal decimal encoding is used rather than binary bitmap encoding for ipi action, ipi hw sender uses decimal encoding code, and ipi receiver will get binary bitmap encoding, the ipi hw will convert it into bitmap in ipi message buffer. 3. Add a structure smp_ops on LoongArch platform so that pv ipi can be used later. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-05-06	perf dsos: Switch hand crafted code to bsearch()	Ian Rogers
	Switch to using the bsearch library function rather than having a hand written binary search. Const-ify some static functions to avoid compiler warnings. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Chengen Du <chengen.du@canonical.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com> Link: https://lore.kernel.org/r/20240504213803.218974-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-06	nfsd: set security label during create operations	Stephen Smalley
	When security labeling is enabled, the client can pass a file security label as part of a create operation for the new file, similar to mode and other attributes. At present, the security label is received by nfsd and passed down to nfsd_create_setattr(), but nfsd_setattr() is never called and therefore the label is never set on the new file. This bug may have been introduced on or around commit d6a97d3f589a ("NFSD: add security label to struct nfsd_attrs"). Looking at nfsd_setattr() I am uncertain as to whether the same issue presents for file ACLs and therefore requires a similar fix for those. An alternative approach would be to introduce a new LSM hook to set the "create SID" of the current task prior to the actual file creation, which would atomically label the new inode at creation time. This would be better for SELinux and a similar approach has been used previously (see security_dentry_create_files_as) but perhaps not usable by other LSMs. Reproducer: 1. Install a Linux distro with SELinux - Fedora is easiest 2. git clone https://github.com/SELinuxProject/selinux-testsuite 3. Install the requisite dependencies per selinux-testsuite/README.md 4. Run something like the following script: MOUNT=$HOME/selinux-testsuite sudo systemctl start nfs-server sudo exportfs -o rw,no_root_squash,security_label localhost:$MOUNT sudo mkdir -p /mnt/selinux-testsuite sudo mount -t nfs -o vers=4.2 localhost:$MOUNT /mnt/selinux-testsuite pushd /mnt/selinux-testsuite/ sudo make -C policy load pushd tests/filesystem sudo runcon -t test_filesystem_t ./create_file -f trans_test_file \ -e test_filesystem_filetranscon_t -v sudo rm -f trans_test_file popd sudo make -C policy unload popd sudo umount /mnt/selinux-testsuite sudo exportfs -u localhost:$MOUNT sudo rmdir /mnt/selinux-testsuite sudo systemctl stop nfs-server Expected output: <eliding noise from commands run prior to or after the test itself> Process context: unconfined_u:unconfined_r:test_filesystem_t:s0-s0:c0.c1023 Created file: trans_test_file File context: unconfined_u:object_r:test_filesystem_filetranscon_t:s0 File context is correct Actual output: <eliding noise from commands run prior to or after the test itself> Process context: unconfined_u:unconfined_r:test_filesystem_t:s0-s0:c0.c1023 Created file: trans_test_file File context: system_u:object_r:test_file_t:s0 File context error, expected: test_filesystem_filetranscon_t got: test_file_t Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	NFSD: Add COPY status code to OFFLOAD_STATUS response	Chuck Lever
	Clients that send an OFFLOAD_STATUS might want to distinguish between an async COPY operation that is still running, has completed successfully, or that has failed. The intention of this patch is to make NFSD behave like this: * Copy still running: OFFLOAD_STATUS returns NFS4_OK, the number of bytes copied so far, and an empty osr_status array * Copy completed successfully: OFFLOAD_STATUS returns NFS4_OK, the number of bytes copied, and an osr_status of NFS4_OK * Copy failed: OFFLOAD_STATUS returns NFS4_OK, the number of bytes copied, and an osr_status other than NFS4_OK * Copy operation lost, canceled, or otherwise unrecognized: OFFLOAD_STATUS returns NFS4ERR_BAD_STATEID NB: Though RFC 7862 Section 11.2 lists a small set of NFS status codes that are valid for OFFLOAD_STATUS, there do not seem to be any explicit spec limits on the status codes that may be returned in the osr_status field. At this time we have no unit tests for COPY and its brethren, as pynfs does not yet implement support for NFSv4.2. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	NFSD: Record status of async copy operation in struct nfsd4_copy	Chuck Lever
	After a client has started an asynchronous COPY operation, a subsequent OFFLOAD_STATUS operation will need to report the status code once that COPY operation has completed. The recorded status record will be used by a subsequent patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	SUNRPC: Remove comment for sp_lock	Guoqing Jiang
	It is obsolete since sp_lock was discarded in commit 580a25756a9f ("SUNRPC: discard sp_lock"). Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev> Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	NFSD: add listener-{set,get} netlink command	Lorenzo Bianconi
	Introduce write_ports netlink command. For listener-set, userspace is expected to provide a NFS listeners list it wants enabled. All other sockets will be closed. Reviewed-by: Jeff Layton <jlayton@kernel.org> Co-developed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	SUNRPC: add a new svc_find_listener helper	Jeff Layton
	svc_find_listener will return the transport instance pointer for the endpoint accepting connections/peer traffic from the specified transport class and matching sockaddr. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	SUNRPC: introduce svc_xprt_create_from_sa utility routine	Lorenzo Bianconi
	Add svc_xprt_create_from_sa utility routine and refactor svc_xprt_create() codebase in order to introduce the capability to create a svc port from socket address. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	NFSD: add write_version to netlink command	Lorenzo Bianconi
	Introduce write_version netlink command through a "declarative" interface. This patch introduces a change in behavior since for version-set userspace is expected to provide a NFS major/minor version list it wants to enable while all the other ones will be disabled. (procfs write_version command implements imperative interface where the admin writes +3/-3 to enable/disable a single version. Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	NFSD: convert write_threads to netlink command	Lorenzo Bianconi
	Introduce write_threads netlink command similar to the one available through the procfs. Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Co-developed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	NFSD: allow callers to pass in scope string to nfsd_svc	Jeff Layton
	Currently admins set this by using unshare to create a new uts namespace, and then resetting the hostname. With the new netlink interface we can just pass this in directly. Prepare nfsd_svc for this change. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	NFSD: move nfsd_mutex handling into nfsd_svc callers	Jeff Layton
	Currently nfsd_svc holds the nfsd_mutex over the whole function. For some of the later netlink patches though, we want to do some other things to the server before starting it. Move the mutex handling into the callers. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	lockd: host: Remove unnecessary statements＇host = NULL;＇	Li kunyu
	In 'nlm_alloc_host', the host has already been assigned a value of NULL when defined, so 'host=NULL;' Can be deleted. Signed-off-by: Li kunyu <kunyu@nfschina.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: don't create nfsv4recoverydir in nfsdfs when not used.	NeilBrown
	When CONFIG_NFSD_LEGACY_CLIENT_TRACKING is not set, the virtual file /proc/fs/nfsd/nfsv4recoverydir is created but responds EINVAL to any access. This is not useful, is somewhat surprising, and it causes ltp to complain. The only known user of this file is in nfs-utils, which handles non-existence and read-failure equally well. So there is nothing to gain from leaving the file present but inaccessible. So this patch removes the file when its content is not available - i.e. when that config option is not selected. Also remove the #ifdef which hides some of the enum values when CONFIG_NFSD_V$ not selection. simple_fill_super() quietly ignores array entries that are not present, so having slots in the array that don't get used is perfectly acceptable. So there is no value in this #ifdef. Reported-by: Petr Vorel <pvorel@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Fixes: 74fd48739d04 ("nfsd: new Kconfig option for legacy client tracking") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: optimise recalculate_deny_mode() for a common case	NeilBrown
	recalculate_deny_mode() takes time that is linear in the number of stateids active on the file. When called from release_openowner -> free_ol_stateid_reaplist ->nfs4_free_ol_stateid -> release_all_access the number of times it is called is linear in the number of stateids. The net result is that time taken by release_openowner is quadratic in the number of stateids. When the nfsd server is shut down while there are many active stateids this can result in a soft lockup. ("CPU stuck for 302s" seen in one case). In many cases all the states have the same deny modes and there is no need to examine the entire list in recalculate_deny_mode(). In particular, recalculate_deny_mode() will only reduce the deny mode, never increase it. So if some prefix of the list causes the original deny mode to be required, there is no need to examine the remainder of the list. So we can improve recalculate_deny_mode() to usually run in constant time, so release_openowner will typically be only linear in the number of states. Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: add tracepoint in mark_client_expired_locked	Jeff Layton
	Show client info alongside the number of cl_rpc_users. If that's elevated, then we can infer that this function returned nfserr_jukebox. [ cel: For additional debugging of RPC user refcounting ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Vladimir Benes <vbenes@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: new tracepoint for check_slot_seqid	Chuck Lever
	Replace a dprintk in check_slot_seqid with tracepoints. These new tracepoints track slot sequence numbers during operation. Suggested-by: Jeffrey Layton <jlayton@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: drop extraneous newline from nfsd tracepoints	Jeff Layton
	We never want a newline in tracepoint output. Signed-off-by: Jeff Layton <jlayton@kernel.org> Tested-by: Vladimir Benes <vbenes@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	sunrpc: removed redundant procp check	Aleksandr Aprelkov
	since vs_proc pointer is dereferenced before getting it's address there's no need to check for NULL. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 8e5b67731d08 ("SUNRPC: Add a callback to initialise server requests") Signed-off-by: Aleksandr Aprelkov <aaprelkov@usergate.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	fs: nfsd: use group allocation/free of per-cpu counters API	Kefeng Wang
	Use group allocation/free of per-cpu counters api to accelerate nfsd percpu_counters init/destroy(), and also squash the nfsd_percpu_counters_init/reset/destroy() and nfsd_counters_init/destroy() into callers to simplify code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: trivial GET_DIR_DELEGATION support	Jeff Layton
	This adds basic infrastructure for handing GET_DIR_DELEGATION calls from clients, including the decoders and encoders. For now, it always just returns NFS4_OK + GDD4_UNAVAIL. Eventually clients may start sending this operation, and it's better if we can return GDD4_UNAVAIL instead of having to abort the whole compound. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	NFSD: Move callback_wq into struct nfs4_client	Chuck Lever
	Commit 883820366747 ("nfsd: update workqueue creation") made the callback_wq single-threaded, presumably to protect modifications of cl_cb_client. See documenting comment for nfsd4_process_cb_update(). However, cl_cb_client is per-lease. There's no other reason that all callback operations need to be dispatched via a single thread. The single threading here means all client callbacks can be blocked by a problem with one client. Change the NFSv4 callback client so it serializes per-lease instead of serializing all NFSv4 callback operations on the server. Reported-by: Dai Ngo <dai.ngo@oracle.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: drop st_mutex before calling move_to_close_lru()	NeilBrown
	move_to_close_lru() is currently called with ->st_mutex held. This can lead to a deadlock as move_to_close_lru() waits for sc_count to drop to 2, and some threads holding a reference might be waiting for the mutex. These references will never be dropped so sc_count will never reach 2. There can be no harm in dropping ->st_mutex before move_to_close_lru() because the only place that takes the mutex is nfsd4_lock_ol_stateid(), and it quickly aborts if sc_type is NFS4_CLOSED_STID, which it will be before move_to_close_lru() is called. See also https://lore.kernel.org/lkml/4dd1fe21e11344e5969bb112e954affb@jd.com/T/ where this problem was raised but not successfully resolved. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: replace rp_mutex to avoid deadlock in move_to_close_lru()	NeilBrown
	move_to_close_lru() waits for sc_count to become zero while holding rp_mutex. This can deadlock if another thread holds a reference and is waiting for rp_mutex. By the time we get to move_to_close_lru() the openowner is unhashed and cannot be found any more. So code waiting for the mutex can safely retry the lookup if move_to_close_lru() has started. So change rp_mutex to an atomic_t with three states: RP_UNLOCK - state is still hashed, not locked for reply RP_LOCKED - state is still hashed, is locked for reply RP_UNHASHED - state is not hashed, no code can get a lock. Use wait_var_event() to wait for either a lock, or for the owner to be unhashed. In the latter case, retry the lookup. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: move nfsd4_cstate_assign_replay() earlier in open handling.	NeilBrown
	Rather than taking the rp_mutex (via nfsd4_cstate_assign_replay) in nfsd4_cleanup_open_state() (which seems counter-intuitive), take it and assign rp_owner as soon as possible - in nfsd4_process_open1(). This will support a future change when nfsd4_cstate_assign_replay() might fail. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-05-06	nfsd: perform all find_openstateowner_str calls in the one place.	NeilBrown
	Currently find_openstateowner_str look ups are done both in nfsd4_process_open1() and alloc_init_open_stateowner() - the latter possibly being a surprise based on its name. It would be easier to follow, and more conformant to common patterns, if the lookup was all in the one place. So replace alloc_init_open_stateowner() with find_or_alloc_open_stateowner() and use the latter in nfsd4_process_open1() without any calls to find_openstateowner_str(). This means all finds are find_openstateowner_str_locked() and find_openstateowner_str() is no longer needed. So discard find_openstateowner_str() and rename find_openstateowner_str_locked() to find_openstateowner_str(). Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>