linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-03-10	smb: client: retry compound request without reusing lease	Meetakshi Setiya
	There is a shortcoming in the current implementation of the file lease mechanism exposed when the lease keys were attempted to be reused for unlink, rename and set_path_size operations for a client. As per MS-SMB2, lease keys are associated with the file name. Linux smb client maintains lease keys with the inode. If the file has any hardlinks, it is possible that the lease for a file be wrongly reused for an operation on the hardlink or vice versa. In these cases, the mentioned compound operations fail with STATUS_INVALID_PARAMETER. This patch adds a fallback to the old mechanism of not sending any lease with these compound operations if the request with lease key fails with STATUS_INVALID_PARAMETER. Resending the same request without lease key should not hurt any functionality, but might impact performance especially in cases where the error is not because of the usage of wrong lease key and we might end up doing an extra roundtrip. Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-10	smb: client: do not defer close open handles to deleted files	Meetakshi Setiya
	When a file/dentry has been deleted before closing all its open handles, currently, closing them can add them to the deferred close list. This can lead to problems in creating file with the same name when the file is re-created before the deferred close completes. This issue was seen while reusing a client's already existing lease on a file for compound operations and xfstest 591 failed because of the deferred close handle that remained valid even after the file was deleted and was being reused to create a file with the same name. The server in this case returns an error on open with STATUS_DELETE_PENDING. Recreating the file would fail till the deferred handles are closed (duration specified in closetimeo). This patch fixes the issue by flagging all open handles for the deleted file (file path to be precise) by setting status_file_deleted to true in the cifsFileInfo structure. As per the information classes specified in MS-FSCC, SMB2 query info response from the server has a DeletePending field, set to true to indicate that deletion has been requested on that file. If this is the case, flag the open handles for this file too. When doing close in cifs_close for each of these handles, check the value of this boolean field and do not defer close these handles if the corresponding filepath has been deleted. Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-10	smb: client: reuse file lease key in compound operations	Meetakshi Setiya
	Currently, when a rename, unlink or set path size compound operation is requested on a file that has a lot of dirty pages to be written to the server, we do not send the lease key for these requests. As a result, the server can assume that this request is from a new client, and send a lease break notification to the same client, on the same connection. As a response to the lease break, the client can consume several credits to write the dirty pages to the server. Depending on the server's credit grant implementation, the server can stop granting more credits to this connection, and this can cause a deadlock (which can only be resolved when the lease timer on the server expires). One of the problems here is that the client is sending no lease key, even if it has a lease for the file. This patch fixes the problem by reusing the existing lease key on the file for rename, unlink and set path size compound operations so that the client does not break its own lease. A very trivial example could be a set of commands by a client that maintains open handle (for write) to a file and then tries to copy the contents of that file to another one, eg., tail -f /dev/null > myfile & mv myfile myfile2 Presently, the network capture on the client shows that the move (or rename) would trigger a lease break on the same client, for the same file. With the lease key reused, the lease break request-response overhead is eliminated, thereby reducing the roundtrips performed for this set of operations. The patch fixes the bug described above and also provides perf benefit. Signed-off-by: Meetakshi Setiya <msetiya@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-10	smb3: update allocation size more accurately on write completion	Steve French
	Changes to allocation size are approximated for extending writes of cached files until the server returns the actual value (on SMB3 close or query info for example), but it was setting the estimated value for number of blocks to larger than the file size even if the file is likely sparse which breaks various xfstests (e.g. generic/129, 130, 221, 228). When i_size and i_blocks are updated in write completion do not increase allocation size more than what was written (rounded up to 512 bytes). Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-10	cifs: minor update to list of reviewers	Steve French
	Add Bharath for reviewing deferred close and leases Acked-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-10	smb: remove SLAB_MEM_SPREAD flag usage	Chengming Zhou
	The SLAB_MEM_SPREAD flag is already a no-op as of 6.8-rc1, remove its usage so we can delete it from slab. No functional change. Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-0-02f1753e8303@suse.cz/ Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-10	cifs: allow changing password during remount	Steve French
	There are cases where a session is disconnected and password has changed on the server (or expired) for this user and this currently can not be fixed without unmount and mounting again. This patch allows remount to change the password (for the non Kerberos case, Kerberos ticket refresh is handled differently) when the session is disconnected and the user can not reconnect due to still using old password. Future patches should also allow us to setup the keyring (cifscreds) to have an "alternate password" so we would be able to change the password before the session drops (without the risk of races between when the password changes and the disconnect occurs - ie cases where the old password is still needed because the new password has not fully rolled out to all servers yet). Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-10	cifs: prevent updating file size from server if we have a read/write lease	Bharath SM
	In cases of large directories, the readdir operation may span multiple round trips to retrieve contents. This introduces a potential race condition in case of concurrent write and readdir operations. If the readdir operation initiates before a write has been processed by the server, it may update the file size attribute to an older value. Address this issue by avoiding file size updates from readdir when we have read/write lease. Scenario: 1) process1: open dir xyz 2) process1: readdir instance 1 on xyz 3) process2: create file.txt for write 4) process2: write x bytes to file.txt 5) process2: close file.txt 6) process2: open file.txt for read 7) process1: readdir 2 - overwrites file.txt inode size to 0 8) process2: read contents of file.txt - bug, short read with 0 bytes Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Bharath SM <bharathsm@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-03-10	mailbox: imx: support i.MX95 Generic/ELE/V2X MU	Peng Fan
	Add i.MX95 Generic/ELE/V2X MU support, its register layout is same as i.MX8ULP, but the Parameter registers would show different TR/RR. Since the driver already supports get TR/RR from Parameter registers, not hardcoding the number, this patch just add the compatible entry to reuse i.MX8ULP S4 cfg data. Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2024-03-10	mailbox: imx: populate sub-nodes	Peng Fan
	Some MUs such as i.MX95 MU, have internal SRAM which could be used for SCMI shared memory, so populate the sub-nodes to use the SRAM. Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2024-03-10	mailbox: imx: get RR/TR registers num from Parameter register	Peng Fan
	i.MX8ULP, i.MX93 MU has a Parameter register encoded as below: BIT: 15 --- 8 \| 7 --- 0 RR_NUM TR_NUM So to make driver easy to support more variants, get the RR/TR registers number from Parameter register. The patch only adds support the specific MU, such as ELE MU. For generic MU, not add support for number larger than 4. Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2024-03-10	mailbox: imx: support return value of init	Peng Fan
	There will be changes that init may fail, so adding return value for init function. Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2024-03-10	dt-bindings: mailbox: fsl,mu: add i.MX95 Generic/ELE/V2X MU compatible	Peng Fan
	Add i.MX95 Generic, Secure Enclave and V2X Message Unit compatible string. And the MUs in AONMIX has internal RAMs for SCMI shared buffer usage. Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Jassi Brar <jassisinghbrar@gmail.com>
2024-03-10	Linux 6.8v6.8	Linus Torvalds

2024-03-10	hwmon: (dell-smm) Add XPS 9315 to fan control whitelist	Armin Wolf
	A user reported that on this machine, disabling BIOS fan control is necessary in order to change the fan speed. Signed-off-by: Armin Wolf <W_Armin@gmx.de> Acked-by: Pali Rohár <pali@kernel.org> Link: https://lore.kernel.org/r/20240309212025.13758-1-W_Armin@gmx.de Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-03-10	PCI: hv: Fix ring buffer size calculation	Michael Kelley
	For a physical PCI device that is passed through to a Hyper-V guest VM, current code specifies the VMBus ring buffer size as 4 pages. But this is an inappropriate dependency, since the amount of ring buffer space needed is unrelated to PAGE_SIZE. For example, on x86 the ring buffer size ends up as 16 Kbytes, while on ARM64 with 64 Kbyte pages, the ring size bloats to 256 Kbytes. The ring buffer for PCI pass-thru devices is used for only a few messages during device setup and removal, so any space above a few Kbytes is wasted. Fix this by declaring the ring buffer size to be a fixed 16 Kbytes. Furthermore, use the VMBUS_RING_SIZE() macro so that the ring buffer header is properly accounted for, and so the size is rounded up to a page boundary, using the page size for which the kernel is built. While w/64 Kbyte pages this results in a 64 Kbyte ring buffer header plus a 64 Kbyte ring buffer, that's the smallest possible with that page size. It's still 128 Kbytes better than the current code. Link: https://lore.kernel.org/linux-pci/20240216202240.251818-1-mhklinux@outlook.com Signed-off-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Reviewed-by: Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: Long Li <longli@microsoft.com> Cc: <stable@vger.kernel.org> # 5.15.x
2024-03-10	Merge tag 'trace-ring-buffer-v6.8-rc7' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Do not allow large strings (> 4096) as single write to trace_marker The size of a string written into trace_marker was determined by the size of the sub-buffer in the ring buffer. That size is dependent on the PAGE_SIZE of the architecture as it can be mapped into user space. But on PowerPC, where PAGE_SIZE is 64K, that made the limit of the string of writing into trace_marker 64K. One of the selftests looks at the size of the ring buffer sub-buffers and writes that plus more into the trace_marker. The write will take what it can and report back what it consumed so that the user space application (like echo) will write the rest of the string. The string is stored in the ring buffer and can be read via the "trace" or "trace_pipe" files. The reading of the ring buffer uses vsnprintf(), which uses a precision "%.s" to make sure it only reads what is stored in the buffer, as a bug could cause the string to be non terminated. With the combination of the precision change and the PAGE_SIZE of 64K allowing huge strings to be added into the ring buffer, plus the test that would actually stress that limit, a bug was reported that the precision used was too big for "%.s" as the string was close to 64K in size and the max precision of vsnprintf is 32K. Linus suggested not to have that precision as it could hide a bug if the string was again stored without a nul byte. Another issue that was brought up is that the trace_seq buffer is also based on PAGE_SIZE even though it is not tied to the architecture limit like the ring buffer sub-buffer is. Having it be 64K * 2 is simply just too big and wasting memory on systems with 64K page sizes. It is now hardcoded to 8K which is what all other architectures with 4K PAGE_SIZE has. Finally, the write to trace_marker is now limited to 4K as there is no reason to write larger strings into trace_marker. - ring_buffer_wait() should not loop. The ring_buffer_wait() does not have the full context (yet) on if it should loop or not. Just exit the loop as soon as its woken up and let the callers decide to loop or not (they already do, so it's a bit redundant). - Fix shortest_full field to be the smallest amount in the ring buffer that a waiter is waiting for. The "shortest_full" field is updated when a new waiter comes in and wants to wait for a smaller amount of data in the ring buffer than other waiters. But after all waiters are woken up, it's not reset, so if another waiter comes in wanting to wait for more data, it will be woken up when the ring buffer has a smaller amount from what the previous waiters were waiting for. - The wake up all waiters on close is incorrectly called frome .release() and not from .flush() so it will never wake up any waiters as the .release() will not get called until all .read() calls are finished. And the wakeup is for the waiters in those .read() calls. * tag 'trace-ring-buffer-v6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Use .flush() call to wake up readers ring-buffer: Fix resetting of shortest_full ring-buffer: Fix waking up ring buffer readers tracing: Limit trace_marker writes to just 4K tracing: Limit trace_seq size to just 8K and not depend on architecture PAGE_SIZE tracing: Remove precision vsnprintf() check from print event
2024-03-10	PCI: dwc: endpoint: Fix advertised resizable BAR size	Niklas Cassel
	The commit message in commit fc9a77040b04 ("PCI: designware-ep: Configure Resizable BAR cap to advertise the smallest size") claims that it modifies the Resizable BAR capability to only advertise support for 1 MB size BARs. However, the commit writes all zeroes to PCI_REBAR_CAP (the register which contains the possible BAR sizes that a BAR be resized to). According to the spec, it is illegal to not have a bit set in PCI_REBAR_CAP, and 1 MB is the smallest size allowed. Set bit 4 in PCI_REBAR_CAP, so that we actually advertise support for a 1 MB BAR size. Before: Capabilities: [2e8 v1] Physical Resizable BAR BAR 0: current size: 1MB BAR 1: current size: 1MB BAR 2: current size: 1MB BAR 3: current size: 1MB BAR 4: current size: 1MB BAR 5: current size: 1MB After: Capabilities: [2e8 v1] Physical Resizable BAR BAR 0: current size: 1MB, supported: 1MB BAR 1: current size: 1MB, supported: 1MB BAR 2: current size: 1MB, supported: 1MB BAR 3: current size: 1MB, supported: 1MB BAR 4: current size: 1MB, supported: 1MB BAR 5: current size: 1MB, supported: 1MB Fixes: fc9a77040b04 ("PCI: designware-ep: Configure Resizable BAR cap to advertise the smallest size") Link: https://lore.kernel.org/linux-pci/20240307111520.3303774-1-cassel@kernel.org Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: <stable@vger.kernel.org> # 5.2
2024-03-10	Merge tag 'phy-fixes3-6.8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy Pull phy fixes from Vinod Koul: - fixes for Qualcomm qmp-combo driver for ordering of drm and type-c switch registartion due to drivers might not probe defer after having registered child devices to avoid triggering a probe deferral loop. This fixes internal display on Lenovo ThinkPad X13s * tag 'phy-fixes3-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: qcom-qmp-combo: fix type-c switch registration phy: qcom-qmp-combo: fix drm bridge registration
2024-03-10	PCI: cadence: Clear the ARI Capability Next Function Number of the last function	Jasko-EXT Wojciech
	Next Function Number field in ARI Capability Register for last function must be zero by default as per the PCIe specification, indicating there is no next higher number function but that's not happening in our case, so this patch clears the Next Function Number field for last function used. [kwilczynski: white spaces update for one define] Link: https://lore.kernel.org/linux-pci/20231202085015.3048516-1-s-vadapalli@ti.com Signed-off-by: Jasko-EXT Wojciech <wojciech.jasko-EXT@continental-corporation.com> Signed-off-by: Achal Verma <a-verma1@ti.com> Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Vignesh Raghavendra <vigneshr@ti.com>
2024-03-10	PCI: dwc: Strengthen the MSI address allocation logic	Ajay Agarwal
	There can be platforms that do not use/have 32-bit DMA addresses. The current implementation of 32-bit IOVA allocation can fail for such platforms, eventually leading to the probe failure. Try to allocate a 32-bit msi_data. If this allocation fails, attempt a 64-bit address allocation. Please note that if the 64-bit MSI address is allocated, then the EPs supporting 32-bit MSI address only will not work. Link: https://lore.kernel.org/linux-pci/20240221153840.1789979-1-ajayagarwal@google.com Tested-by: Will McVicker <willmcvicker@google.com> Signed-off-by: Ajay Agarwal <ajayagarwal@google.com> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Reviewed-by: Serge Semin <fancer.lancer@gmail.com> Reviewed-by: Will McVicker <willmcvicker@google.com>
2024-03-10	PCI: brcmstb: Fix broken brcm_pcie_mdio_write() polling	Jonathan Bell
	The MDIO_WT_DONE() macro tests bit 31, which is always 0 (== done) as readw_poll_timeout_atomic() does a 16-bit read. Replace with the readl variant. [kwilczynski: commit log] Fixes: ca5dcc76314d ("PCI: brcmstb: Replace status loops with read_poll_timeout_atomic()") Link: https://lore.kernel.org/linux-pci/20240217133722.14391-1-wahrenst@gmx.net Signed-off-by: Jonathan Bell <jonathan@raspberrypi.com> Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>
2024-03-10	PCI: qcom: Add X1E80100 PCIe support	Abel Vesa
	Add the compatible and the driver data for X1E80100 PCIe controller. There are 5 controller instances found on this platform, out of which 2 are Gen3 with speeds of up to 8.0GT/s, while the other 3 are Gen4 with speeds of up to 16GT/s. The version of the controller is 1.38.0 for all instances, but they are compatible with 1.9.0 config. The max link width is x8 for one controller, x4 for two of others and x2 for the two left. [kwilczynski: commit log] Link: https://lore.kernel.org/linux-pci/20240301-x1e80100-pci-v4-2-7ab7e281d647@linaro.org Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
2024-03-10	dt-bindings: PCI: qcom: Document the X1E80100 PCIe Controller	Abel Vesa
	Add dedicated schema for the PCIe controllers found on X1E80100. Link: https://lore.kernel.org/linux-pci/20240301-x1e80100-pci-v4-1-7ab7e281d647@linaro.org Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
2024-03-10	PCI: qcom: Enable BDF to SID translation properly	Manivannan Sadhasivam
	Qcom SoCs making use of ARM SMMU require BDF to SID translation table in the driver to properly map the SID for the PCIe devices based on their BDF identifier. This is currently achieved with the help of qcom_pcie_config_sid_1_9_0() function for SoCs supporting the 1_9_0 config. But With newer Qcom SoCs starting from SM8450, BDF to SID translation is set to bypass mode by default in hardware. Due to this, the translation table that is set in the qcom_pcie_config_sid_1_9_0() is essentially unused and the default SID is used for all endpoints in SoCs starting from SM8450. This is a security concern and also warrants swapping the DeviceID in DT while using the GIC ITS to handle MSIs from endpoints. The swapping is currently done like below in DT when using GIC ITS: /* * MSIs for BDF (1:0.0) only works with Device ID 0x5980. * Hence, the IDs are swapped. */ msi-map = <0x0 &gic_its 0x5981 0x1>, <0x100 &gic_its 0x5980 0x1>; Here, swapping of the DeviceIDs ensure that the endpoint with BDF (1:0.0) gets the DeviceID 0x5980 which is associated with the default SID as per the iommu mapping in DT. So MSIs were delivered with IDs swapped so far. But this also means the Root Port (0:0.0) won't receive any MSIs (for PME, AER etc...) So let's fix these issues by clearing the BDF to SID bypass mode for all SoCs making use of the 1_9_0 config. This allows the PCIe devices to use the correct SID, thus avoiding the DeviceID swapping hack in DT and also achieving the isolation between devices. Fixes: 4c9398822106 ("PCI: qcom: Add support for configuring BDF to SID mapping for SM8250") Link: https://lore.kernel.org/linux-pci/20240307-pci-bdf-sid-fix-v1-1-9423a7e2d63c@linaro.org Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org # 5.11
2024-03-10	tracing: Use .flush() call to wake up readers	Steven Rostedt (Google)
	The .release() function does not get called until all readers of a file descriptor are finished. If a thread is blocked on reading a file descriptor in ring_buffer_wait(), and another thread closes the file descriptor, it will not wake up the other thread as ring_buffer_wake_waiters() is called by .release(), and that will not get called until the .read() is finished. The issue originally showed up in trace-cmd, but the readers are actually other processes with their own file descriptors. So calling close() would wake up the other tasks because they are blocked on another descriptor then the one that was closed(). But there's other wake ups that solve that issue. When a thread is blocked on a read, it can still hang even when another thread closed its descriptor. This is what the .flush() callback is for. Have the .flush() wake up the readers. Link: https://lore.kernel.org/linux-trace-kernel/20240308202432.107909457@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linke li <lilinke99@qq.com> Cc: Rabin Vincent <rabin@rab.in> Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-03-10	ring-buffer: Fix resetting of shortest_full	Steven Rostedt (Google)
	The "shortest_full" variable is used to keep track of the waiter that is waiting for the smallest amount on the ring buffer before being woken up. When a tasks waits on the ring buffer, it passes in a "full" value that is a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to 100% full buffer. As all waiters are on the same wait queue, the wake up happens for the waiter with the smallest percentage. The problem is that the smallest_full on the cpu_buffer that stores the smallest amount doesn't get reset when all the waiters are woken up. It does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace). This means that tasks may be woken up more often then when they want to be. Instead, have the shortest_full field get reset just before waking up all the tasks. If the tasks wait again, they will update the shortest_full before sleeping. Also add locking around setting of shortest_full in the poll logic, and change "work" to "rbwork" to match the variable name for rb_irq_work structures that are used in other places. Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.948914369@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linke li <lilinke99@qq.com> Cc: Rabin Vincent <rabin@rab.in> Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-03-10	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds
	Pull kvm fixes from Paolo Bonzini: "KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating an inconsistent ABI (KVM_MEM_GUEST_MEMFD is not writable from userspace, so there would be no way to write to a read-only guest_memfd). - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely for development and testing. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD dirty logging test that caused false passes. x86 fixes: - Fix missing marking of a guest page as dirty when emulating an atomic access. - Check for mmu_notifier invalidation events before faulting in the pfn, and before acquiring mmu_lock, to avoid unnecessary work and lock contention with preemptible kernels (including CONFIG_PREEMPT_DYNAMIC in non-preemptible mode). - Disable AMD DebugSwap by default, it breaks VMSA signing and will be re-enabled with a better VM creation API in 6.10. - Do the cache flush of converted pages in svm_register_enc_region() before dropping kvm->lock, to avoid a race with unregistering of the same region and the consequent use-after-free issue" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: SEV: disable SEV-ES DebugSwap by default KVM: x86/mmu: Retry fault before acquiring mmu_lock if mapping is changing KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region() KVM: selftests: Add a testcase to verify GUEST_MEMFD and READONLY are exclusive KVM: selftests: Create GUEST_MEMFD for relevant invalid flags testcases KVM: x86/mmu: Restrict KVM_SW_PROTECTED_VM to the TDP MMU KVM: x86: Update KVM_SW_PROTECTED_VM docs to make it clear they're a WIP KVM: Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY KVM: x86: Mark target gfn of emulated atomic instruction as dirty
2024-03-10	ring-buffer: Fix waking up ring buffer readers	Steven Rostedt (Google)
	A task can wait on a ring buffer for when it fills up to a specific watermark. The writer will check the minimum watermark that waiters are waiting for and if the ring buffer is past that, it will wake up all the waiters. The waiters are in a wait loop, and will first check if a signal is pending and then check if the ring buffer is at the desired level where it should break out of the loop. If a file that uses a ring buffer closes, and there's threads waiting on the ring buffer, it needs to wake up those threads. To do this, a "wait_index" was used. Before entering the wait loop, the waiter will read the wait_index. On wakeup, it will check if the wait_index is different than when it entered the loop, and will exit the loop if it is. The waker will only need to update the wait_index before waking up the waiters. This had a couple of bugs. One trivial one and one broken by design. The trivial bug was that the waiter checked the wait_index after the schedule() call. It had to be checked between the prepare_to_wait() and the schedule() which it was not. The main bug is that the first check to set the default wait_index will always be outside the prepare_to_wait() and the schedule(). That's because the ring_buffer_wait() doesn't have enough context to know if it should break out of the loop. The loop itself is not needed, because all the callers to the ring_buffer_wait() also has their own loop, as the callers have a better sense of what the context is to decide whether to break out of the loop or not. Just have the ring_buffer_wait() block once, and if it gets woken up, exit the function and let the callers decide what to do next. Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNSRZfg@mail.gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.792933613@goodmis.org Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linke li <lilinke99@qq.com> Cc: Rabin Vincent <rabin@rab.in> Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-03-10	erofs: support compressed inodes over fscache	Jingbo Xu
	Since fscache can utilize iov_iter to write dest buffers, bio_vec can be used in this way too. To simplify this, pseudo bios are prepared and bio_vec will be filled with bio_add_page(). And a common .bi_end_io will be called directly to handle I/O completions. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240308094159.40547-2-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-03-10	erofs: make iov_iter describe target buffers over fscache	Jingbo Xu
	So far the fscache mode supports uncompressed data only, and the data read from fscache is put directly into the target page cache. As the support for compressed data in fscache mode is going to be introduced, rework the fscache internals so that the following compressed part could make the raw data read from fscache be directed to the target buffer it wants, decompress the raw data, and finally fill the page cache with the decompressed data. As the first step, a new structure, i.e. erofs_fscache_io (io), is introduced to describe a generic read request from the fscache, while the caller can specify the target buffer it wants in the iov_iter structure (io->iter). Besides, the caller can also specify its completion callback and private data through erofs_fscache_io, which will be called to make further handling, e.g. unlocking the page cache for uncompressed data or decompressing the read raw data, when the read request from the fscache completes. Now erofs_fscache_read_io_async() serves as a generic interface for reading raw data from fscache for both compressed and uncompressed data. The erofs_fscache_rq structure is kept to describe a request to fill the page cache in the specified range. Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240308094159.40547-1-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-03-10	erofs: fix lockdep false positives on initializing erofs_pseudo_mnt	Baokun Li
	Lockdep reported the following issue when mounting erofs with a domain_id: ============================================ WARNING: possible recursive locking detected 6.8.0-rc7-xfstests #521 Not tainted -------------------------------------------- mount/396 is trying to acquire lock: ffff907a8aaaa0e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0 but task is already holding lock: ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&type->s_umount_key#50/1); lock(&type->s_umount_key#50/1); * DEADLOCK * May be due to missing lock nesting notation 2 locks held by mount/396: #0: ffff907a8aaa90e0 (&type->s_umount_key#50/1){+.+.}-{3:3}, at: alloc_super+0xe3/0x3d0 #1: ffffffffc00e6f28 (erofs_domain_list_lock){+.+.}-{3:3}, at: erofs_fscache_register_fs+0x3d/0x270 [erofs] stack backtrace: CPU: 1 PID: 396 Comm: mount Not tainted 6.8.0-rc7-xfstests #521 Call Trace: <TASK> dump_stack_lvl+0x64/0xb0 validate_chain+0x5c4/0xa00 __lock_acquire+0x6a9/0xd50 lock_acquire+0xcd/0x2b0 down_write_nested+0x45/0xd0 alloc_super+0xe3/0x3d0 sget_fc+0x62/0x2f0 vfs_get_super+0x21/0x90 vfs_get_tree+0x2c/0xf0 fc_mount+0x12/0x40 vfs_kern_mount.part.0+0x75/0x90 kern_mount+0x24/0x40 erofs_fscache_register_fs+0x1ef/0x270 [erofs] erofs_fc_fill_super+0x213/0x380 [erofs] This is because the file_system_type of both erofs and the pseudo-mount point of domain_id is erofs_fs_type, so two successive calls to alloc_super() are considered to be using the same lock and trigger the warning above. Therefore add a nodev file_system_type called erofs_anon_fs_type in fscache.c to silence this complaint. Because kern_mount() takes a pointer to struct file_system_type, not its (string) name. So we don't need to call register_filesystem(). In addition, call init_pseudo() in erofs_anon_init_fs_context() as suggested by Al Viro, so that we can remove erofs_fc_fill_pseudo_super(), erofs_fc_anon_get_tree(), and erofs_anon_context_ops. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: a9849560c55e ("erofs: introduce a pseudo mnt to manage shared cookies") Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-and-tested-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Yang Erkun <yangerkun@huawei.com> Link: https://lore.kernel.org/r/20240307101018.2021925-1-libaokun1@huawei.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2024-03-10	erofs: refine managed cache operations to folios	Gao Xiang
	Convert erofs_try_to_free_all_cached_pages() and z_erofs_cache_release_folio(). Besides, erofs_page_is_managed() is moved to zdata.c and renamed as erofs_folio_is_managed(). Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-6-hsiangkao@linux.alibaba.com
2024-03-10	erofs: convert z_erofs_submissionqueue_endio() to folios	Gao Xiang
	Use bio_for_each_folio() to iterate over each folio in the bio and there is no large folios for now. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-5-hsiangkao@linux.alibaba.com
2024-03-10	erofs: convert z_erofs_fill_bio_vec() to folios	Gao Xiang
	Introduce a folio member to `struct z_erofs_bvec` and convert most of z_erofs_fill_bio_vec() to folios, which is still straight-forward. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-4-hsiangkao@linux.alibaba.com
2024-03-10	erofs: get rid of `justfound` debugging tag	Gao Xiang
	`justfound` is introduced to identify cached folios that are just added to compressed bvecs so that more checks can be applied in the I/O submission path. EROFS is quite now stable compared to the codebase at that stage. `justfound` becomes a burden for upcoming features. Drop it. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-3-hsiangkao@linux.alibaba.com
2024-03-10	erofs: convert z_erofs_do_read_page() to folios	Gao Xiang
	It is a straight-forward conversion. Besides, it's renamed as z_erofs_scan_folio(). Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-2-hsiangkao@linux.alibaba.com
2024-03-10	erofs: convert z_erofs_onlinepage_.* to folios	Gao Xiang
	Online folios are locked file-backed folios which will eventually keep decoded (e.g. decompressed) data of each inode for end users to utilize. It may belong to a few pclusters and contain other data (e.g. compressed data for inplace I/Os) temporarily in a time-sharing manner to reduce memory footprints for low-ended storage devices with high latencies under heary I/O pressure. Apart from folio_end_read() usage, it's a straight-forward conversion. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240305091448.1384242-1-hsiangkao@linux.alibaba.com
2024-03-10	openrisc: Use asm-generic's version of fix_to_virt() & virt_to_fix()	Dawei Li
	Openrisc's implementation of fix_to_virt() & virt_to_fix() share same functionality with ones of asm generic. Plus, generic version of fix_to_virt() can trap invalid index at compile time. Thus, Replace the arch-specific implementations with asm generic's ones. Signed-off-by: Dawei Li <set_pte_at@outlook.com> Signed-off-by: Stafford Horne <shorne@gmail.com>
2024-03-10	openrisc: Call setup_memory() earlier in the init sequence	Oreoluwa Babatunde
	The unflatten_and_copy_device_tree() function contains a call to memblock_alloc(). This means that memblock is allocating memory before any of the reserved memory regions are set aside in the setup_memory() function which calls early_init_fdt_scan_reserved_mem(). Therefore, there is a possibility for memblock to allocate from any of the reserved memory regions. Hence, move the call to setup_memory() to be earlier in the init sequence so that the reserved memory regions are set aside before any allocations are done using memblock. Signed-off-by: Oreoluwa Babatunde <quic_obabatun@quicinc.com> Signed-off-by: Stafford Horne <shorne@gmail.com>
2024-03-10	dt-bindings: pinctrl: qcom: update compatible name for match with driver	Tengfei Fan
	Use compatible name "qcom,sm4450-tlmm" instead of "qcom,sm4450-pinctrl" to match the compatible name in sm4450 pinctrl driver. Fixes: 7bf8b78f86db ("dt-bindings: pinctrl: qcom: Add SM4450 pinctrl") Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Tengfei Fan <quic_tengfan@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240129092512.23602-2-quic_tengfan@quicinc.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2024-03-09	exec: Simplify remove_arg_zero() error path	Kees Cook
	We don't need the "out" label any more, so remove "ret" and return directly on error. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Kees Cook <keescook@chromium.org> --- Cc: Eric Biederman <ebiederm@xmission.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: linux-mm@kvack.org Cc: linux-fsdevel@vger.kernel.org
2024-03-09	pstore/zone: Don't clear memory twice	Christophe JAILLET
	There is no need to call memset(..., 0, ...) on memory allocated by kcalloc(). It is already zeroed. Remove the redundant call. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/fa2597400051c18c6ca11187b0e4b906729991b2.1709972649.git.christophe.jaillet@wanadoo.fr Signed-off-by: Kees Cook <keescook@chromium.org>
2024-03-09	NFSD: Clean up nfsd4_encode_replay()	Chuck Lever
	Replace open-coded encoding logic with the use of conventional XDR utility functions. Add a tracepoint to make replays observable in field troubleshooting situations. The WARN_ON is removed. A stack trace is of little use, as there is only one call site for nfsd4_encode_replay(), and a buffer length shortage here is unlikely. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-03-09	Merge tag 'i2c-for-6.8-rc8' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Two patches from Heiner for the i801 are targeting muxes discovered while working on some other features. Essentially, there is a reordering when adding optional slaves and proper cleanup upon registering a mux device. Christophe fixes the exit path in the wmt driver that was leaving the clocks hanging, and the last fix from Tommy avoids false error reports in IRQ" * tag 'i2c-for-6.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: aspeed: Fix the dummy irq expected print i2c: wmt: Fix an error handling path in wmt_i2c_probe() i2c: i801: Avoid potential double call to gpiod_remove_lookup_table i2c: i801: Fix using mux_pdev before it's set
2024-03-09	Merge tag 'firewire-fixes-6.8-final' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fix from Takashi Sakamoto: "A fix to suppress a warning about unreleased IRQ for 1394 OHCI hardware when disabling MSI. In Linux kernel v6.5, a PCI driver for 1394 OHCI hardware was optimized into the managed device resources. Edmund Raile points out that the change brings the warning about unreleased IRQ at the call of pci_disable_msi(), since the API expects that the relevant IRQ has already been released in advance. As long as the API is called in .remove callback of PCI device operation, it is prohibited to maintain the IRQ as the part of managed device resource. As a workaround, the IRQ is explicitly released at .remove callback, before the call of pci_disable_msi(). pci_disable_msi() is legacy API nowadays in PCI MSI implementation. I have a plan to replace it with the modern API in the development for the future version of Linux kernel. So at present I keep them as is" * tag 'firewire-fixes-6.8-final' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: ohci: prevent leak of left-over IRQ on unbind
2024-03-09	SEV: disable SEV-ES DebugSwap by default	Paolo Bonzini
	The DebugSwap feature of SEV-ES provides a way for confidential guests to use data breakpoints. However, because the status of the DebugSwap feature is recorded in the VMSA, enabling it by default invalidates the attestation signatures. In 6.10 we will introduce a new API to create SEV VMs that will allow enabling DebugSwap based on what the user tells KVM to do. Contextually, we will change the legacy KVM_SEV_ES_INIT API to never enable DebugSwap. For compatibility with kernels that pre-date the introduction of DebugSwap, as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable the feature by default. If anybody wants to use it, for now they can enable the sev_es_debug_swap_enabled module parameter, but this will result in a warning. Fixes: d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-03-09	Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of ↵	Paolo Bonzini
	https://github.com/kvm-x86/linux into HEAD KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating ABI that KVM can't sanely support. - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely a development and testing vehicle, and come with zero guarantees. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-09	Merge tag 'kvm-x86-fixes-6.8-2' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini
	KVM x86 fixes for 6.8, round 2: - When emulating an atomic access, mark the gfn as dirty in the memslot to fix a bug where KVM could fail to mark the slot as dirty during live migration, ultimately resulting in guest data corruption due to a dirty page not being re-copied from the source to the target. - Check for mmu_notifier invalidation events before faulting in the pfn, and before acquiring mmu_lock, to avoid unnecessary work and lock contention. Contending mmu_lock is especially problematic on preemptible kernels, as KVM may yield mmu_lock in response to the contention, which severely degrades overall performance due to vCPUs making it difficult for the task that triggered invalidation to make forward progress. Note, due to another kernel bug, this fix isn't limited to preemtible kernels, as any kernel built with CONFIG_PREEMPT_DYNAMIC=y will yield contended rwlocks and spinlocks. https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
2024-03-09	arm64, bpf: Use bpf_prog_pack for arm64 bpf trampoline	Puranjay Mohan
	We used bpf_prog_pack to aggregate bpf programs into huge page to relieve the iTLB pressure on the system. This was merged for ARM64[1] We can apply it to bpf trampoline as well. This would increase the preformance of fentry and struct_ops programs. [1] https://lore.kernel.org/bpf/20240228141824.119877-1-puranjay12@gmail.com/ Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Pu Lehui <pulehui@huawei.com> Message-ID: <20240304202803.31400-1-puranjay12@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>