linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-07-09	PM: domains: Allow devices attached to genpd to be managed by HW	Ulf Hansson
	Some power-domains may be capable of relying on the HW to control the power for a device that's hooked up to it. Typically, for these kinds of configurations the consumer driver should be able to change the behavior of power domain at runtime, control the power domain in SW mode for certain configurations and handover the control to HW mode for other usecases. To allow a consumer driver to change the behaviour of the PM domain for its device, let's provide a new function, dev_pm_genpd_set_hwmode(). Moreover, let's add a corresponding optional genpd callback, ->set_hwmode_dev(), which the genpd provider should implement if it can support switching between HW controlled mode and SW controlled mode. Similarly, add the dev_pm_genpd_get_hwmode() to allow consumers to read the current mode and its corresponding optional genpd callback, ->get_hwmode_dev(), which the genpd provider can also implement to synchronize the initial HW mode state in genpd_add_device() by reading back the mode from the hardware. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Jagadeesh Kona <quic_jkona@quicinc.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Taniya Das <quic_tdas@quicinc.com> Link: https://lore.kernel.org/r/20240624044809.17751-2-quic_jkona@quicinc.com
2024-07-09	dt-bindings: power: add Amlogic A5 power domains	Xianwei Zhao
	Add devicetree binding document and related header file for Amlogic A5 secure power domains. Signed-off-by: Hongyu Chen <hongyu.chen1@amlogic.com> Signed-off-by: Xianwei Zhao <xianwei.zhao@amlogic.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://lore.kernel.org/r/20240627-a5_secpower-v1-1-1f47dde1270c@amlogic.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2024-07-09	mfd: tmio: Move header to platform_data	Wolfram Sang
	All the MFD components are gone from the header meanwhile. Only the MMC relevant data is left which makes it a platform_data for the MMC controller. Move the header to the now fitting directory. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240213220221.2380-14-wsa+renesas@sang-engineering.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-07-09	mfd: tmio: Sanitize comments	Wolfram Sang
	Reformat the comments to utilize the maximum line length and use single line comments where appropriate. Remove superfluous comments, too. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240213220221.2380-13-wsa+renesas@sang-engineering.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-07-09	mfd: tmio: Update include files	Wolfram Sang
	Remove meanwhile unneeded includes, only add types.h for dma_addr_t. Also, remove an obsolete forward declaration while here. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240213220221.2380-12-wsa+renesas@sang-engineering.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-07-09	mfd: tmio: Remove obsolete io accessors	Wolfram Sang
	Since commit 568494db6809 ("mtd: remove tmio_nand driver") and commit aceae7848624 ("fbdev: remove tmiofb driver"), these accessors have no users anymore. Remove them. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240213220221.2380-10-wsa+renesas@sang-engineering.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-07-09	mfd: tmio: Remove obsolete platform_data	Wolfram Sang
	With commit 8971bb812e3c ("mfd: remove toshiba tmio drivers"), all users of platform data for NAND and framebuffers are gone. So, remove definitions from the header, too. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20240213220221.2380-9-wsa+renesas@sang-engineering.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-07-09	wifi: mac80211: add radio index to ieee80211_chanctx_conf	Felix Fietkau
	Will be used to explicitly assign a channel context to a wiphy radio. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/59f76f57d935f155099276be22badfa671d5bfd9.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-09	wifi: cfg80211: add helper for checking if a chandef is valid on a radio	Felix Fietkau
	Check if the full channel width is in the radio's frequency range. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/7c8ea146feb6f37cee62e5ba6be5370403695797.1720514221.git-series.nbd@nbd.name [add missing Return: documentation] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-09	sctp: Fix typos and improve comments	Thorsten Blum
	Fix typos s/steam/stream/ and spell out Schedule/Unschedule in the comments. Compile-tested only. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240704202558.62704-2-thorsten.blum@toblux.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-09	wifi: cfg80211: extend interface combination check for multi-radio	Felix Fietkau
	Add a field in struct iface_combination_params to check per-radio interface combinations instead of per-wiphy ones. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/32b28da89c2d759b0324deeefe2be4cee91de18e.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-09	wifi: cfg80211: add support for advertising multiple radios belonging to a wiphy	Felix Fietkau
	The prerequisite for MLO support in cfg80211/mac80211 is that all the links participating in MLO must be from the same wiphy/ieee80211_hw. To meet this expectation, some drivers may need to group multiple discrete hardware each acting as a link in MLO under single wiphy. With this change, supported frequencies and interface combinations of each individual radio are reported to user space. This allows user space to figure out the limitations of what combination of channels can be used concurrently. Even for non-MLO devices, this improves support for devices capable of running on multiple channels at the same time. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/18a88f9ce82b1c9f7c12f1672430eaf2bb0be295.1720514221.git-series.nbd@nbd.name Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-09	soc: samsung: exynos-pmu: add support for PMU_ALIVE non atomic registers	Peter Griffin
	Not all registers in PMU_ALIVE block support atomic set/clear operations. GS101_SYSIP_DAT0 and GS101_SYSTEM_CONFIGURATION registers are two regs where attempting atomic access fails. As documentation on exactly which registers support atomic operations is not forthcoming. We default to atomic access, unless the register is explicitly added to the tensor_is_atomic() function. Update the comment to reflect this as well. Reviewed-by: Will McVicker <willmcvicker@google.com> Tested-by: Will McVicker <willmcvicker@google.com> Signed-off-by: Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20240628223506.1237523-4-peter.griffin@linaro.org Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240702063514.6215-2-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09	Merge tag 'reset-for-v6.11-2' of git://git.pengutronix.de/pza/linux into ↵	Arnd Bergmann
	soc/drivers Reset controller updates for v6.11, part 2 This tag adds USB VBUS regulator control for Renesas RZ/G2L SoCs, which also touches PHY driver and device tree, and pulls in a new regulator_hardware_enable() helper. The Tegra BPMP reset driver can be compiled under COMPILE_TEST now. * tag 'reset-for-v6.11-2' of git://git.pengutronix.de/pza/linux: arm64: dts: renesas: rz-smarc: Replace fixed regulator for USB VBUS phy: renesas: phy-rcar-gen3-usb2: Control VBUS for RZ/G2L SoCs reset: renesas: Add USB VBUS regulator device as child dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document USB VBUS regulator reset: tegra-bpmp: allow building under COMPILE_TEST regulator: core: Add helper for allow HW access to enable/disable regulator Link: https://lore.kernel.org/r/20240703100809.2773890-1-p.zabel@pengutronix.de Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09	Merge tag 'qcom-drivers-for-6.11' of ↵	Arnd Bergmann
	https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/drivers Qualcomm driver updates for v6.11 Support for Shared Memory (shm) Bridge is added, which provides a stricter interface for handling of buffers passed to TrustZone. The X1Elite platform is added to uefisecapp allow list, to instantiate the efivars implementation. A new in-kernel implementation of the pd-mapper (or servreg) service is introduced, to replace the userspace dependency for USB Type-C and battery management. Support for sharing interrupts across multiple bwmon instances is added, and a refcount imbalance issue is corrected. The LLCC support for recent platforms is corrected, and SA8775P support is added. A new interface is added to SMEM, to expose "feature codes". One example of the usecase for this is to indicate to the GPU driver which frequencies are available on the given device. The interrupt consumer and provider side of SMP2P is updated to provide more useful names in interrupt stats. Support for using the mailbox binding and driver for outgoing IPC interrupt in the SMSM driver is introduced. socinfo driver learns about SDM670 and IPQ5321, as well as get some updates to the X1E PMICs. pmic_glink is bumped to now support managing 3 USB Type-C ports. * tag 'qcom-drivers-for-6.11' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (48 commits) soc: qcom: smp2p: Use devname for interrupt descriptions soc: qcom: smsm: Add missing mailbox dependency to Kconfig soc: qcom: add missing pd-mapper dependencies soc: qcom: icc-bwmon: Allow for interrupts to be shared across instances dt-bindings: interconnect: qcom,msm8998-bwmon: Add X1E80100 BWMON instances dt-bindings: interconnect: qcom,msm8998-bwmon: Remove opp-table from the required list firmware: qcom: tzmem: export devm_qcom_tzmem_pool_new() soc: qcom: add pd-mapper implementation soc: qcom: pdr: extract PDR message marshalling data soc: qcom: pdr: fix parsing of domains lists soc: qcom: pdr: protect locator_addr with the main mutex firmware: qcom: scm: clarify the comment in qcom_scm_pas_init_image() firmware: qcom: scm: add support for SHM bridge memory carveout firmware: qcom: tzmem: enable SHM Bridge support firmware: qcom: scm: add support for SHM bridge operations firmware: qcom: qseecom: convert to using the TZ allocator firmware: qcom: scm: make qcom_scm_qseecom_app_get_id() use the TZ allocator firmware: qcom: scm: make qcom_scm_lmh_dcvsh() use the TZ allocator firmware: qcom: scm: make qcom_scm_ice_set_key() use the TZ allocator firmware: qcom: scm: make qcom_scm_assign_mem() use the TZ allocator ... Link: https://lore.kernel.org/r/20240705034410.13968-1-andersson@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-07-09	block: Validate logical block size in blk_validate_limits()	John Garry
	Some drivers validate that their own logical block size. It is no harm to always do this, so validate in blk_validate_limits(). This allows us to remove the validation in most of those drivers. Add a comment to blk_validate_block_size() to inform users that self- validation of LBS is usually unnecessary. Signed-off-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240708091651.177447-3-john.g.garry@oracle.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-08	Merge tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme into ↵	Jens Axboe
	for-6.11/block Pull NVMe updates from Keith: "nvme updates for Linux 6.11 - Device initialization memory leak fixes (Keith) - More constants defined (Weiwen) - Target debugfs support (Hannes) - PCIe subsystem reset enhancements (Keith) - Queue-depth multipath policy (Redhat and PureStorage) - Implement get_unique_id (Christoph) - Authentication error fixes (Gaosheng)" * tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme: (21 commits) nvmet-auth: fix nvmet_auth hash error handling nvme: implement ->get_unique_id nvme-multipath: implement "queue-depth" iopolicy nvme-multipath: prepare for "queue-depth" iopolicy nvme-pci: do not directly handle subsys reset fallout lpfc_nvmet: implement 'host_traddr' nvme-fcloop: implement 'host_traddr' nvmet-fc: implement host_traddr() nvmet-rdma: implement host_traddr() nvmet-tcp: implement host_traddr() nvmet: add 'host_traddr' callback for debugfs nvmet: add debugfs support mailmap: add entry for Weiwen Hu nvme: rename CDR/MORE/DNR to NVME_STATUS_* nvme: fix status magic numbers nvme: rename nvme_sc_to_pr_err to nvme_status_to_pr_err nvme: split device add from initialization nvme: fc: split controller bringup handling nvme: rdma: split controller bringup handling nvme: tcp: split controller bringup handling ...
2024-07-08	jbd2: precompute number of transaction descriptor blocks	Jan Kara
	Instead of computing the number of descriptor blocks a transaction can have each time we need it (which is currently when starting each transaction but will become more frequent later) precompute the number once during journal initialization together with maximum transaction size. We perform the precomputation whenever journal feature set is updated similarly as for computation of journal->j_revoke_records_per_block. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08	jbd2: make jbd2_journal_get_max_txn_bufs() internal	Jan Kara
	There's no reason to have jbd2_journal_get_max_txn_bufs() public function. Currently all users are internal and can use journal->j_max_transaction_buffers instead. This saves some unnecessary recomputations of the limit as a bonus which becomes important as this function gets more complex in the following patch. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-09	cpufreq: Make cpufreq_driver->exit() return void	Lizhe
	The cpufreq core doesn't check the return type of the exit() callback and there is not much the core can do on failures at that point. Just drop the returned value and make it return void. Signed-off-by: Lizhe <sensor1010@163.com> [ Viresh: Reworked the patches to fix all missing changes together. ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> # Mediatek Acked-by: Sudeep Holla <sudeep.holla@arm.com> # scpi, scmi, vexpress Acked-by: Mario Limonciello <mario.limonciello@amd.com> # amd Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> # bmips Acked-by: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Kevin Hilman <khilman@baylibre.com> # omap
2024-07-08	of: dynamic: Introduce of_changeset_add_prop_bool()	Herve Codina
	APIs to add some properties in a changeset exist but nothing to add a DT boolean property (i.e. a property without any values). Fill this lack with of_changeset_add_prop_bool(). Signed-off-by: Herve Codina <herve.codina@bootlin.com> Link: https://lore.kernel.org/r/20240527161450.326615-16-herve.codina@bootlin.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-07-08	of: dynamic: Constify parameter in of_changeset_add_prop_string_array()	Herve Codina
	The str_array parameter has no reason to be an un-const array. Indeed, elements of the 'str_array' array are not changed by the code. Constify the 'str_array' array parameter. With this const qualifier added, the following construction is allowed: static const char * const tab_str[] = { "string1", "string2" }; of_changeset_add_prop_string_array(..., tab_str, ARRAY_SIZE(tab_str)); Signed-off-by: Herve Codina <herve.codina@bootlin.com> Link: https://lore.kernel.org/r/20240527161450.326615-14-herve.codina@bootlin.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-07-08	Input: make events() method return number of events processed	Dmitry Torokhov
	In preparation to consolidating filtering and event processing in the input core change events() method to return number of events processed by it. Reviewed-by: Jeff LaBundy <jeff@labundy.com> Reviewed-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240703213756.3375978-4-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-07-08	uaccess: always export _copy_[from\|to]_user with CONFIG_RUST	Arnd Bergmann
	Rust code needs to be able to access _copy_from_user and _copy_to_user so that it can skip the check_copy_size check in cases where the length is known at compile-time, mirroring the logic for when C code will skip check_copy_size. To do this, we ensure that exported versions of these methods are available when CONFIG_RUST is enabled. Alice has verified that this patch passes the CONFIG_TEST_USER_COPY test on x86 using the Android cuttlefish emulator. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/r/20240528-alice-mm-v7-2-78222c31b8f4@google.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2024-07-08	bpf: Fix too early release of tcx_entry	Daniel Borkmann
	Pedro Pinto and later independently also Hyunwoo Kim and Wongi Lee reported an issue that the tcx_entry can be released too early leading to a use after free (UAF) when an active old-style ingress or clsact qdisc with a shared tc block is later replaced by another ingress or clsact instance. Essentially, the sequence to trigger the UAF (one example) can be as follows: 1. A network namespace is created 2. An ingress qdisc is created. This allocates a tcx_entry, and &tcx_entry->miniq is stored in the qdisc's miniqp->p_miniq. At the same time, a tcf block with index 1 is created. 3. chain0 is attached to the tcf block. chain0 must be connected to the block linked to the ingress qdisc to later reach the function tcf_chain0_head_change_cb_del() which triggers the UAF. 4. Create and graft a clsact qdisc. This causes the ingress qdisc created in step 1 to be removed, thus freeing the previously linked tcx_entry: rtnetlink_rcv_msg() => tc_modify_qdisc() => qdisc_create() => clsact_init() [a] => qdisc_graft() => qdisc_destroy() => __qdisc_destroy() => ingress_destroy() [b] => tcx_entry_free() => kfree_rcu() // tcx_entry freed 5. Finally, the network namespace is closed. This registers the cleanup_net worker, and during the process of releasing the remaining clsact qdisc, it accesses the tcx_entry that was already freed in step 4, causing the UAF to occur: cleanup_net() => ops_exit_list() => default_device_exit_batch() => unregister_netdevice_many() => unregister_netdevice_many_notify() => dev_shutdown() => qdisc_put() => clsact_destroy() [c] => tcf_block_put_ext() => tcf_chain0_head_change_cb_del() => tcf_chain_head_change_item() => clsact_chain_head_change() => mini_qdisc_pair_swap() // UAF There are also other variants, the gist is to add an ingress (or clsact) qdisc with a specific shared block, then to replace that qdisc, waiting for the tcx_entry kfree_rcu() to be executed and subsequently accessing the current active qdisc's miniq one way or another. The correct fix is to turn the miniq_active boolean into a counter. What can be observed, at step 2 above, the counter transitions from 0->1, at step [a] from 1->2 (in order for the miniq object to remain active during the replacement), then in [b] from 2->1 and finally [c] 1->0 with the eventual release. The reference counter in general ranges from [0,2] and it does not need to be atomic since all access to the counter is protected by the rtnl mutex. With this in place, there is no longer a UAF happening and the tcx_entry is freed at the correct time. Fixes: e420bed02507 ("bpf: Add fd-based tcx multi-prog infra with link support") Reported-by: Pedro Pinto <xten@osec.io> Co-developed-by: Pedro Pinto <xten@osec.io> Signed-off-by: Pedro Pinto <xten@osec.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Hyunwoo Kim <v4bel@theori.io> Cc: Wongi Lee <qwerty@theori.io> Cc: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20240708133130.11609-1-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-07-08	dt-bindings: clock: airoha: Add reset support to EN7581 clock binding	Lorenzo Bianconi
	Introduce reset capability to EN7581 device-tree clock binding documentation. Add reset register mapping between misc scu and pb scu ones in order to follow the memory order. This change is not introducing any backward compatibility issue since the EN7581 dts is not upstream yet. Fixes: 0a382be005cf ("dt-bindings: clock: airoha: add EN7581 binding") Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/28fef3e83062d5d71e7b4be4b47583f851a15bf8.1719485847.git.lorenzo@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-07-08	nfsd: new netlink ops to get/set server pool_mode	Jeff Layton
	Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-08	sunrpc: refactor pool_mode setting code	Jeff Layton
	Allow the pool_mode setting code to be called from internal callers so we can call it from a new netlink op. Add a new svc_pool_map_get function to return the current setting. Change the existing module parameter handling to use the new interfaces under the hood. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-08	sunrpc: fix up the special handling of sv_nrpools == 1	Jeff Layton
	Only pooled services take a reference to the svc_pool_map. The sunrpc code has always used the sv_nrpools value to detect whether the service is pooled. The problem there is that nfsd is a pooled service, but when it's running in "global" pool_mode, it doesn't take a reference to the pool map because it has a sv_nrpools value of 1. This means that we have two separate codepaths for starting the server, depending on whether it's pooled or not. Fix this by adding a new flag to the svc_serv, that indicates whether the serv is pooled. With this we can have the nfsd service unconditionally take a reference, regardless of pool_mode. Note that this is a behavior change for /sys/module/sunrpc/parameters/pool_mode. Usually this file does not allow you to change the pool-mode while there are nfsd threads running, but if the pool-mode is "global" it's allowed. My assumption is that this is a bug, since it probably should never have worked this way. This patch changes the behavior such that you get back EBUSY even when nfsd is running in global mode. I think this is more reasonable behavior, and given that most people set this today using the module parameter, it's doubtful anyone will notice. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-07-08	nfs: move nfs_wait_on_request to write.c	Christoph Hellwig
	nfs_wait_on_request is now only used in write.c. Move it there and mark it static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests	Christoph Hellwig
	Fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests to prepare for future changes to this code, and move the helpers to write.c as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	nfs: simplify nfs_folio_find_and_lock_request	Christoph Hellwig
	nfs_folio_find_and_lock_request and the nfs_page_group_lock_head helper called by it spend quite some effort to deal with head vs subrequests. But given that only the head request can be stashed in the folio private data, non of that is required. Fold the locking logic from nfs_page_group_lock_head into nfs_folio_find_and_lock_request and simplify the result based on the invariant that we always find the head request in the folio private data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	nfs: remove dead code for the old swap over NFS implementation	Christoph Hellwig
	Remove the code testing folio_test_swapcache either explicitly or implicitly in pagemap.h headers, as is now handled using the direct I/O path and not the buffered I/O path that these helpers are located in. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	nfs: Block on write congestion	Jan Kara
	Commit 6df25e58532b ("nfs: remove reliance on bdi congestion") introduced NFS-private solution for limiting number of writes outstanding against a particular server. Unlike previous bdi congestion this algorithm actually works and limits number of outstanding writeback pages to nfs_congestion_kb which scales with amount of client's memory and is capped at 256 MB. As a result some workloads such as random buffered writes over NFS got slower (from ~170 MB/s to ~126 MB/s). The fio command to reproduce is: fio --direct=0 --ioengine=sync --thread --invalidate=1 --group_reporting=1 --runtime=300 --fallocate=posix --ramp_time=10 --new_group --rw=randwrite --size=64256m --numjobs=4 --bs=4k --fsync_on_close=1 --end_fsync=1 This happens because the client sends ~256 MB worth of dirty pages to the server and any further background writeback request is ignored until the number of writeback pages gets below the threshold of 192 MB. By the time this happens and clients decides to trigger another round of writeback, the server often has no pages to write and the disk is idle. To fix this problem and make the client react faster to eased congestion of the server by blocking waiting for congestion to resolve instead of aborting writeback. This improves the random 4k buffered write throughput to 184 MB/s. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4/pNFS: Do layout state recovery upon reboot	Trond Myklebust
	Some pNFS implementations, such as flexible files, want the client to send the layout stats and layout errors that may have incurred while the metadata server was booting. To do so, the client sends a layoutreturn with an all-zero stateid while the server is in grace during reboot recovery. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4.1: constify the stateid argument in nfs41_test_stateid()	Trond Myklebust
	Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	Return the delegation when deleting sillyrenamed files	Lance Shelton
	Add a callback to return the delegation in order to allow generic NFS code to return the delegation when appropriate. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4: Detect support for OPEN4_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION	Trond Myklebust
	If the server supports the NFSv4.2 protocol extension to optimise away returning a stateid when it returns a delegation, then we cache that information in another capability flag. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4: Add support for the FATTR4_OPEN_ARGUMENTS attribute	Trond Myklebust
	Query the server for the OPEN arguments that it supports so that we can figure out which extensions we can use. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4: Add a capability for delegated attributes	Trond Myklebust
	Cache whether or not the server may have support for delegated attributes in a capability flag. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4: Add recovery of attribute delegations	Trond Myklebust
	After a reboot of the NFSv4.2 server, the recovery code needs to specify whether the delegation to be recovered is an attribute delegation or not. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4: Add a flags argument to the 'have_delegation' callback	Trond Myklebust
	This argument will be used to allow the caller to specify whether or not they need to know that this is an attribute delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4: Plumb in XDR support for the new delegation-only setattr op	Trond Myklebust
	We want to send the updated atime and mtime as part of the delegreturn compound. Add a special structure to hold those variables. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4: Add new attribute delegation definitions	Trond Myklebust
	Add the attribute delegation XDR definitions from the spec. Signed-off-by: Tom Haynes <loghyr@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	NFSv4: Clean up open delegation return structure	Trond Myklebust
	Instead of having the fields open coded in the struct nfs_openres, add a separate structure for them so that we can reuse that code for the WANT_DELEGATION case. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	xprtrdma: Handle device removal outside of the CM event handler	Chuck Lever
	Wait for all disconnects to complete to ensure the transport has divested all of its hardware resources before the underlying RDMA device can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	rpcrdma: Implement generic device removal	Chuck Lever
	Commit e87a911fed07 ("nvme-rdma: use ib_client API to detect device removal") explains the benefits of handling device removal outside of the CM event handler. Sketch in an IB device removal notification mechanism that can be used by both the client and server side RPC-over-RDMA transport implementations. Suggested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08	interconnect: icc-clk: Add devm_icc_clk_register	Varadarajan Narayanan
	Wrap icc_clk_register to create devm_icc_clk_register to be able to release the resources properly. Acked-by: Georgi Djakov <djakov@kernel.org> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Varadarajan Narayanan <quic_varada@quicinc.com> Link: https://lore.kernel.org/r/20240430064214.2030013-4-quic_varada@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-07-08	interconnect: icc-clk: Specify master/slave ids	Varadarajan Narayanan
	Presently, icc-clk driver autogenerates the master and slave ids. However, devices with multiple nodes on the interconnect could have other constraints and may not match with the auto generated node ids. Hence, modify the driver to use the master/slave ids provided by the caller instead of auto generating. Also, update clk-cbf-8996 accordingly. Acked-by: Georgi Djakov <djakov@kernel.org> Signed-off-by: Varadarajan Narayanan <quic_varada@quicinc.com> Link: https://lore.kernel.org/r/20240430064214.2030013-2-quic_varada@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2024-07-08	Merge branch '20240430064214.2030013-3-quic_varada@quicinc.com' into ↵	Bjorn Andersson
	clk-for-6.11 Merge the IPQ9574 interconnect binding through a topic branch, to make it possible to use the constants in the DeviceTree source branch as well.