summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-05-01KVM: arm64: Avoid BUG-ing from the host abort pathQuentin Perret
Under certain circumstances __get_fault_info() may resolve the faulting address using the AT instruction. Given that this is being done outside of the host lock critical section, it is racy and the resolution via AT may fail. We currently BUG() in this situation, which is obviously less than ideal. Moving the address resolution to the critical section may have a performance impact, so let's keep it where it is, but bail out and return to the host to try a second time. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01KVM: arm64: Issue CMOs when tearing down guest s2 pagesQuentin Perret
On the guest teardown path, pKVM will zero the pages used to back the guest data structures before returning them to the host as they may contain secrets (e.g. in the vCPU registers). However, the zeroing is done using a cacheable alias, and CMOs are missing, hence giving the host a potential opportunity to read the original content of the guest structs from memory. Fix this by issuing CMOs after zeroing the pages. Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01KVM: arm64: Do not re-initialize the KVM lockFuad Tabba
The lock is already initialized in core KVM code at kvm_create_vm(). Fixes: 9d0c063a4d1d ("KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1") Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01KVM: arm64: Refactor checks for FP state ownershipFuad Tabba
To avoid direct comparison against the fp_owner enum, add a new function that performs the check, host_owns_fp_regs(), to complement the existing guest_owns_fp_regs(). To check for fpsimd state ownership, use the helpers instead of directly using the enums. No functional change intended. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01KVM: arm64: Move guest_owns_fp_regs() to increase its scopeFuad Tabba
guest_owns_fp_regs() will be used to check fpsimd state ownership across kvm/arm64. Therefore, move it to kvm_host.h to widen its scope. Moreover, the host state is not per-vcpu anymore, the vcpu parameter isn't used, so remove it as well. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01KVM: arm64: Initialize the kvm host data's fpsimd_state pointer in pKVMFuad Tabba
Since the host_fpsimd_state has been removed from kvm_vcpu_arch, it isn't pointing to the hyp's version of the host fp_regs in protected mode. Initialize the host_data fpsimd_state point to the host_data's context fp_regs on pKVM initialization. Fixes: 51e09b5572d6 ("KVM: arm64: Exclude host_fpsimd_state pointer from kvm_vcpu_arch") Signed-off-by: Fuad Tabba <tabba@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240423150538.2103045-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01KVM: arm64: Remove duplicated AA64MMFR1_EL1 XNXRussell King
Commit d5a32b60dc18 ("KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1") made certain fields in these registers writable, but in doing so, ID_AA64MMFR1_EL1_XNX was listed twice. Remove the duplication. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Zenghui Yu <zenghui.yu@linux.dev> Link: https://lore.kernel.org/r/E1s2AxF-00AWLv-03@rmk-PC.armlinux.org.uk Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-05-01drm/xe/vm: prevent UAF in rebind_work_func()Matthew Auld
We flush the rebind worker during the vm close phase, however in places like preempt_fence_work_func() we seem to queue the rebind worker without first checking if the vm has already been closed. The concern here is the vm being closed with the worker flushed, but then being rearmed later, which looks like potential uaf, since there is no actual refcounting to track the queued worker. We can't take the vm->lock here in preempt_rebind_work_func() to first check if the vm is closed since that will deadlock, so instead flush the worker again when the vm refcount reaches zero. v2: - Grabbing vm->lock in the preempt worker creates a deadlock, so checking the closed state is tricky. Instead flush the worker when the refcount reaches zero. It should be impossible to queue the preempt worker without already holding vm ref. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1676 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1591 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1364 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1304 Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1249 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423074721.119633-4-matthew.auld@intel.com (cherry picked from commit 3d44d67c441a9fe6f81a1d705f7de009a32a5b35) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-05-01drm/amd/display: Disable panel replay by default for nowMario Limonciello
Panel replay was enabled by default in commit 5950efe25ee0 ("drm/amd/display: Enable Panel Replay for static screen use case"), but it isn't working properly at least on some BOE and AUO panels. Instead of being static the screen is solid black when active. As it's a new feature that was just introduced that regressed VRR disable it for now so that problem can be properly root caused. Cc: Tom Chung <chiahsuan.chung@amd.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3344 Fixes: 5950efe25ee0 ("drm/amd/display: Enable Panel Replay for static screen use case") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-05-01net: core: reject skb_copy(_expand) for fraglist GSO skbsFelix Fietkau
SKB_GSO_FRAGLIST skbs must not be linearized, otherwise they become invalid. Return NULL if such an skb is passed to skb_copy or skb_copy_expand, in order to prevent a crash on a potential later call to skb_gso_segment. Fixes: 3a1296a38d0c ("net: Support GRO/GSO fraglist chaining.") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-01net: bridge: fix multicast-to-unicast with fraglist GSOFelix Fietkau
Calling skb_copy on a SKB_GSO_FRAGLIST skb is not valid, since it returns an invalid linearized skb. This code only needs to change the ethernet header, so pskb_copy is the right function to call here. Fixes: 6db6f0eae605 ("bridge: multicast to unicast") Signed-off-by: Felix Fietkau <nbd@nbd.name> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-01nvme-tcp: strict pdu pacing to avoid send stalls on TLSHannes Reinecke
TLS requires a strict pdu pacing via MSG_EOR to signal the end of a record and subsequent encryption. If we do not set MSG_EOR at the end of a sequence the record won't be closed, encryption doesn't start, and we end up with a send stall as the message will never be passed on to the TCP layer. So do not check for the queue status when TLS is enabled but rather make the MSG_MORE setting dependent on the current request only. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-01nvmet: fix nvme status code when namespace is disabledSagi Grimberg
If the user disabled a nvmet namespace, it is removed from the subsystem namespaces list. When nvmet processes a command directed to an nsid that was disabled, it cannot differentiate between a nsid that is disabled vs. a non-existent namespace, and resorts to return NVME_SC_INVALID_NS with the dnr bit set. This translates to a non-retryable status for the host, which translates to a user error. We should expect disabled namespaces to not cause an I/O error in a multipath environment. Address this by searching a configfs item for the namespace nvmet failed to find, and if we found one, conclude that the namespace is disabled (perhaps temporarily). Return NVME_SC_INTERNAL_PATH_ERROR in this case and keep DNR bit cleared. Reported-by: Jirong Feng <jirong.feng@easystack.cn> Tested-by: Jirong Feng <jirong.feng@easystack.cn> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-01nvmet-tcp: fix possible memory leak when tearing down a controllerSagi Grimberg
When we teardown the controller, we wait for pending I/Os to complete (sq->ref on all queues to drop to zero) and then we go over the commands, and free their command buffers in case they are still fetching data from the host (e.g. processing nvme writes) and have yet to take a reference on the sq. However, we may miss the case where commands have failed before executing and are queued for sending a response, but will never occur because the queue socket is already down. In this case we may miss deallocating command buffers. Solve this by freeing all commands buffers as nvmet_tcp_free_cmd_buffers is idempotent anyways. Reported-by: Yi Zhang <yi.zhang@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-01nvme: cancel pending I/O if nvme controller is in terminal stateNilay Shroff
While I/O is running, if the pci bus error occurs then in-flight I/O can not complete. Worst, if at this time, user (logically) hot-unplug the nvme disk then the nvme_remove() code path can't forward progress until in-flight I/O is cancelled. So these sequence of events may potentially hang hot-unplug code path indefinitely. This patch helps cancel the pending/in-flight I/O from the nvme request timeout handler in case the nvme controller is in the terminal (DEAD/DELETING/DELETING_NOIO) state and that helps nvme_remove() code path forward progress and finish successfully. Link: https://lore.kernel.org/all/199be893-5dfa-41e5-b6f2-40ac90ebccc4@linux.ibm.com/ Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-01nvmet-auth: replace pr_debug() with pr_err() to report an error.Maurizio Lombardi
In nvmet_auth_host_hash(), if a mismatch is detected in the hash length the kernel should print an error. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-01nvmet-auth: return the error code to the nvmet_auth_host_hash() callersMaurizio Lombardi
If the nvmet_auth_host_hash() function fails, the error code should be returned to its callers. Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-01nvme: find numa distance only if controller has valid numa idNilay Shroff
On system where native nvme multipath is configured and iopolicy is set to numa but the nvme controller numa node id is undefined or -1 (NUMA_NO_NODE) then avoid calculating node distance for finding optimal io path. In such case we may access numa distance table with invalid index and that may potentially refer to incorrect memory. So this patch ensures that if the nvme controller numa node id is -1 then instead of calculating node distance for finding optimal io path, we set the numa node distance of such controller to default 10 (LOCAL_DISTANCE). Link: https://lore.kernel.org/all/20240413090614.678353-1-nilay@linux.ibm.com/ Signed-off-by: Nilay Shroff <nilay@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-05-01s390/paes: Reestablish retry loop in paesHarald Freudenberger
With commit ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation") the retry loop to retry derivation of a protected key from a secure key has been removed. This was based on the assumption that theses retries are not needed any more as proper retries are done in the zcrypt layer. However, tests have revealed that there exist some cases with master key change in the HSM and immediately (< 1 second) attempt to derive a protected key from a secure key with exact this HSM may eventually fail. The low level functions in zcrypt_ccamisc.c and zcrypt_ep11misc.c detect and report this temporary failure and report it to the caller as -EBUSY. The re-established retry loop in the paes implementation catches exactly this -EBUSY and eventually may run some retries. Fixes: ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation") Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-01s390/zcrypt: Use EBUSY to indicate temp unavailabilityHarald Freudenberger
Use -EBUSY instead of -EAGAIN in zcrypt_ccamisc.c in cases where the CCA card returns 8/2290 to indicate a temporarily unavailability of this function. Fixes: ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation") Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-01s390/zcrypt: Handle ep11 cprb return codeHarald Freudenberger
An EP11 reply cprb contains a field ret_code which may hold an error code different than the error code stored in the payload of the cprb. As of now all the EP11 misc functions do not evaluate this field but focus on the error code in the payload. Before checking the payload error, first the cprb error field should be evaluated which is introduced with this patch. If the return code value 0x000c0003 is seen, this indicates a busy situation which is reflected by -EBUSY in the zcrpyt_ep11misc.c low level function. A higher level caller should consider to retry after waiting a dedicated duration (say 1 second). Fixes: ed6776c96c60 ("s390/crypto: remove retry loop with sleep from PAES pkey invocation") Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-01s390/zcrypt: Fix wrong format string in debug feature printoutHarald Freudenberger
Fix wrong format string debug feature: %04x was used to print out a 32 bit value. - changed to %08x. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Reviewed-by: Holger Dengler <dengler@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-05-01x86/mm: Remove broken vsyscall emulation code from the page fault codeLinus Torvalds
The syzbot-reported stack trace from hell in this discussion thread actually has three nested page faults: https://lore.kernel.org/r/000000000000d5f4fc0616e816d4@google.com ... and I think that's actually the important thing here: - the first page fault is from user space, and triggers the vsyscall emulation. - the second page fault is from __do_sys_gettimeofday(), and that should just have caused the exception that then sets the return value to -EFAULT - the third nested page fault is due to _raw_spin_unlock_irqrestore() -> preempt_schedule() -> trace_sched_switch(), which then causes a BPF trace program to run, which does that bpf_probe_read_compat(), which causes that page fault under pagefault_disable(). It's quite the nasty backtrace, and there's a lot going on. The problem is literally the vsyscall emulation, which sets current->thread.sig_on_uaccess_err = 1; and that causes the fixup_exception() code to send the signal *despite* the exception being caught. And I think that is in fact completely bogus. It's completely bogus exactly because it sends that signal even when it *shouldn't* be sent - like for the BPF user mode trace gathering. In other words, I think the whole "sig_on_uaccess_err" thing is entirely broken, because it makes any nested page-faults do all the wrong things. Now, arguably, I don't think anybody should enable vsyscall emulation any more, but this test case clearly does. I think we should just make the "send SIGSEGV" be something that the vsyscall emulation does on its own, not this broken per-thread state for something that isn't actually per thread. The x86 page fault code actually tried to deal with the "incorrect nesting" by having that: if (in_interrupt()) return; which ignores the sig_on_uaccess_err case when it happens in interrupts, but as shown by this example, these nested page faults do not need to be about interrupts at all. IOW, I think the only right thing is to remove that horrendously broken code. The attached patch looks like the ObviouslyCorrect(tm) thing to do. NOTE! This broken code goes back to this commit in 2011: 4fc3490114bb ("x86-64: Set siginfo and context on vsyscall emulation faults") ... and back then the reason was to get all the siginfo details right. Honestly, I do not for a moment believe that it's worth getting the siginfo details right here, but part of the commit says: This fixes issues with UML when vsyscall=emulate. ... and so my patch to remove this garbage will probably break UML in this situation. I do not believe that anybody should be running with vsyscall=emulate in 2024 in the first place, much less if you are doing things like UML. But let's see if somebody screams. Reported-and-tested-by: syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/CAHk-=wh9D6f7HUkDgZHKmDCHUQmp+Co89GP+b8+z+G56BKeyNg@mail.gmail.com
2024-05-01spi: fix null pointer dereference within spi_syncMans Rullgard
If spi_sync() is called with the non-empty queue and the same spi_message is then reused, the complete callback for the message remains set while the context is cleared, leading to a null pointer dereference when the callback is invoked from spi_finalize_current_message(). With function inlining disabled, the call stack might look like this: _raw_spin_lock_irqsave from complete_with_flags+0x18/0x58 complete_with_flags from spi_complete+0x8/0xc spi_complete from spi_finalize_current_message+0xec/0x184 spi_finalize_current_message from spi_transfer_one_message+0x2a8/0x474 spi_transfer_one_message from __spi_pump_transfer_message+0x104/0x230 __spi_pump_transfer_message from __spi_transfer_message_noqueue+0x30/0xc4 __spi_transfer_message_noqueue from __spi_sync+0x204/0x248 __spi_sync from spi_sync+0x24/0x3c spi_sync from mcp251xfd_regmap_crc_read+0x124/0x28c [mcp251xfd] mcp251xfd_regmap_crc_read [mcp251xfd] from _regmap_raw_read+0xf8/0x154 _regmap_raw_read from _regmap_bus_read+0x44/0x70 _regmap_bus_read from _regmap_read+0x60/0xd8 _regmap_read from regmap_read+0x3c/0x5c regmap_read from mcp251xfd_alloc_can_err_skb+0x1c/0x54 [mcp251xfd] mcp251xfd_alloc_can_err_skb [mcp251xfd] from mcp251xfd_irq+0x194/0xe70 [mcp251xfd] mcp251xfd_irq [mcp251xfd] from irq_thread_fn+0x1c/0x78 irq_thread_fn from irq_thread+0x118/0x1f4 irq_thread from kthread+0xd8/0xf4 kthread from ret_from_fork+0x14/0x28 Fix this by also setting message->complete to NULL when the transfer is complete. Fixes: ae7d2346dc89 ("spi: Don't use the message queue if possible in spi_sync") Signed-off-by: Mans Rullgard <mans@mansr.com> Link: https://lore.kernel.org/r/20240430182705.13019-1-mans@mansr.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-04-30drm/amdgpu: fix doorbell regressionShashank Sharma
This patch adds a missed handling of PL domain doorbell while handling VRAM faults. Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: a6ff969fe9cb ("drm/amdgpu: fix visible VRAM handling during faults") Reviewed-by: Christian Koenig <christian.koenig@amd.com> Signed-off-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30drm/amdkfd: Flush the process wq before creating a kfd_processLancelot SIX
There is a race condition when re-creating a kfd_process for a process. This has been observed when a process under the debugger executes exec(3). In this scenario: - The process executes exec. - This will eventually release the process's mm, which will cause the kfd_process object associated with the process to be freed (kfd_process_free_notifier decrements the reference count to the kfd_process to 0). This causes kfd_process_ref_release to enqueue kfd_process_wq_release to the kfd_process_wq. - The debugger receives the PTRACE_EVENT_EXEC notification, and tries to re-enable AMDGPU traps (KFD_IOC_DBG_TRAP_ENABLE). - When handling this request, KFD tries to re-create a kfd_process. This eventually calls kfd_create_process and kobject_init_and_add. At this point the call to kobject_init_and_add can fail because the old kfd_process.kobj has not been freed yet by kfd_process_wq_release. This patch proposes to avoid this race by making sure to drain kfd_process_wq before creating a new kfd_process object. This way, we know that any cleanup task is done executing when we reach kobject_init_and_add. Signed-off-by: Lancelot SIX <lancelot.six@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30drm/amd/display: Disable seamless boot on 128b/132b encodingSung Joon Kim
[why] preOS will not support display mode programming and link training for UHBR rates. [how] If we detect a sink that's UHBR capable, disable seamless boot Reviewed-by: Anthony Koo <anthony.koo@amd.com> Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Sung Joon Kim <sungjoon.kim@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30mptcp: ensure snd_nxt is properly initialized on connectPaolo Abeni
Christoph reported a splat hinting at a corrupted snd_una: WARNING: CPU: 1 PID: 38 at net/mptcp/protocol.c:1005 __mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Modules linked in: CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.9.0-rc1-gbbeac67456c9 #59 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:__mptcp_clean_una+0x4b3/0x620 net/mptcp/protocol.c:1005 Code: be 06 01 00 00 bf 06 01 00 00 e8 a8 12 e7 fe e9 00 fe ff ff e8 8e 1a e7 fe 0f b7 ab 3e 02 00 00 e9 d3 fd ff ff e8 7d 1a e7 fe <0f> 0b 4c 8b bb e0 05 00 00 e9 74 fc ff ff e8 6a 1a e7 fe 0f 0b e9 RSP: 0018:ffffc9000013fd48 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffff8881029bd280 RCX: ffffffff82382fe4 RDX: ffff8881003cbd00 RSI: ffffffff823833c3 RDI: 0000000000000001 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: fefefefefefefeff R12: ffff888138ba8000 R13: 0000000000000106 R14: ffff8881029bd908 R15: ffff888126560000 FS: 0000000000000000(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f604a5dae38 CR3: 0000000101dac002 CR4: 0000000000170ef0 Call Trace: <TASK> __mptcp_clean_una_wakeup net/mptcp/protocol.c:1055 [inline] mptcp_clean_una_wakeup net/mptcp/protocol.c:1062 [inline] __mptcp_retrans+0x7f/0x7e0 net/mptcp/protocol.c:2615 mptcp_worker+0x434/0x740 net/mptcp/protocol.c:2767 process_one_work+0x1e0/0x560 kernel/workqueue.c:3254 process_scheduled_works kernel/workqueue.c:3335 [inline] worker_thread+0x3c7/0x640 kernel/workqueue.c:3416 kthread+0x121/0x170 kernel/kthread.c:388 ret_from_fork+0x44/0x50 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243 </TASK> When fallback to TCP happens early on a client socket, snd_nxt is not yet initialized and any incoming ack will copy such value into snd_una. If the mptcp worker (dumbly) tries mptcp-level re-injection after such ack, that would unconditionally trigger a send buffer cleanup using 'bad' snd_una values. We could easily disable re-injection for fallback sockets, but such dumb behavior already helped catching a few subtle issues and a very low to zero impact in practice. Instead address the issue always initializing snd_nxt (and write_seq, for consistency) at connect time. Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485 Tested-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240429-upstream-net-20240429-mptcp-snd_nxt-init-connect-v1-1-59ceac0a7dcb@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30drm/amd/display: Fix DC mode screen flickering on DCN321Leo Ma
[Why && How] Screen flickering saw on 4K@60 eDP with high refresh rate external monitor when booting up in DC mode. DC Mode Capping is disabled which caused wrong UCLK being used. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Leo Ma <hanghong.ma@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30drm/amd/display: Add VCO speed parameter for DCN31 FPURodrigo Siqueira
Add VCO speed parameters in the bounding box array. Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30e1000e: change usleep_range to udelay in PHY mdic accessVitaly Lifshits
This is a partial revert of commit 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems"). The referenced commit used usleep_range inside the PHY access routines, which are sometimes called from an atomic context. This can lead to a kernel panic in some scenarios, such as cable disconnection and reconnection on vPro systems. Solve this by changing the usleep_range calls back to udelay. Fixes: 6dbdd4de0362 ("e1000e: Workaround for sporadic MDI error on Meteor Lake systems") Cc: stable@vger.kernel.org Reported-by: Jérôme Carretero <cJ@zougloub.eu> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218740 Closes: https://lore.kernel.org/lkml/a7eb665c74b5efb5140e6979759ed243072cb24a.camel@zougloub.eu/ Co-developed-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Dima Ruinskiy <dima.ruinskiy@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240429171040.1152516-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30drm/amdgpu: once more fix the call oder in amdgpu_ttm_move() v2Christian König
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because only this way we have the correct order for DMA-buf and VM move notifications as well. Also rework the statistic handling so that we don't update the eviction counter before the move. v2: add missing NULL check Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3171 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> CC: stable@vger.kernel.org
2024-04-30drm/amd/display: Allocate zero bw after bw alloc enableMeenakshikumar Somasundaram
[Why] During DP tunnel creation, CM preallocates BW and reduces estimated BW of other DPIA. CM release preallocation only when allocation is complete. Display mode validation logic validates timings based on bw available per host router. In multi display setup, this causes bw allocation failure when allocation greater than estimated bw. [How] Do zero alloc to make the CM to release preallocation and update estimated BW correctly for all DPIAs per host router. Reviewed-by: PeiChen Huang <peichen.huang@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30drm/amd/display: Fix incorrect DSC instance for MSTHersen Wu
[Why] DSC debugfs, such as dp_dsc_clock_en_read, use aconnector->dc_link to find pipe_ctx for display. Displays connected to MST hub share the same dc_link. DSC instance is from pipe_ctx. This causes incorrect DSC instance for display connected to MST hub. [How] Add aconnector->sink check to find pipe_ctx. CC: stable@vger.kernel.org Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Hersen Wu <hersenxs.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30drm/amd/display: Atom Integrated System Info v2_2 for DCN35Gabe Teeger
New request from KMD/VBIOS in order to support new UMA carveout model. This fixes a null dereference from accessing Ctx->dc_bios->integrated_info while it was NULL. DAL parses through the BIOS and extracts the necessary integrated_info but was missing a case for the new BIOS version 2.3. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Gabe Teeger <gabe.teeger@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30net: dsa: mv88e6xxx: Fix number of databases for 88E6141 / 88E6341Marek Behún
The Topaz family (88E6141 and 88E6341) only support 256 Forwarding Information Tables. Fixes: a75961d0ebfd ("net: dsa: mv88e6xxx: Add support for ethernet switch 88E6341") Fixes: 1558727a1c1b ("net: dsa: mv88e6xxx: Add support for ethernet switch 88E6141") Signed-off-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Link: https://lore.kernel.org/r/20240429133832.9547-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30drm/amd/display: Add dtbclk access to dcn315Swapnil Patel
[Why & How] Currently DCN315 clk manager is missing code to enable/disable dtbclk. Because of this, "optimized_required" flag is constantly set and this prevents FreeSync from engaging for certain high bandwidth display Modes which require DTBCLK. Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Swapnil Patel <swapnil.patel@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30cxgb4: Properly lock TX queue for the selftest.Sebastian Andrzej Siewior
The selftest for the driver sends a dummy packet and checks if the packet will be received properly as it should be. The regular TX path and the selftest can use the same network queue so locking is required and was missing in the selftest path. This was addressed in the commit cited below. Unfortunately locking the TX queue requires BH to be disabled which is not the case in selftest path which is invoked in process context. Lockdep should be complaining about this. Use __netif_tx_lock_bh() for TX queue locking. Fixes: c650e04898072 ("cxgb4: Fix race between loopback and normal Tx path") Reported-by: "John B. Wyatt IV" <jwyatt@redhat.com> Closes: https://lore.kernel.org/all/Zic0ot5aGgR-V4Ks@thinkpad2021/ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20240429091147.YWAaal4v@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30rxrpc: Fix using alignmask being zero for __page_frag_alloc_align()Yunsheng Lin
rxrpc_alloc_data_txbuf() may be called with data_align being zero in none_alloc_txbuf() and rxkad_alloc_txbuf(), data_align is supposed to be an order-based alignment value, but zero is not a valid order-based alignment value, and '~(data_align - 1)' doesn't result in a valid mask-based alignment value for __page_frag_alloc_align(). Fix it by passing a valid order-based alignment value in none_alloc_txbuf() and rxkad_alloc_txbuf(). Also use page_frag_alloc_align() expecting an order-based alignment value in rxrpc_alloc_data_txbuf() to avoid doing the alignment converting operation and to catch possible invalid alignment value in the future. Remove the 'if (data_align)' checking too, as it is always true for a valid order-based alignment value. Fixes: 6b2536462fd4 ("rxrpc: Fix use of changed alignment param to page_frag_alloc_align()") Fixes: 49489bb03a50 ("rxrpc: Do zerocopy using MSG_SPLICE_PAGES and page frags") CC: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240428111640.27306-1-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30drm/amd/display: Ensure that dmcub support flag is set for DCN20Rodrigo Siqueira
In the DCN20 resource initialization, ensure that DMCUB support starts configured as true. Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30drm/amd/display: Handle Y carry-over in VCP X.Y calculationGeorge Shen
Theoretically rare corner case where ceil(Y) results in rounding up to an integer. If this happens, the 1 should be carried over to the X value. CC: stable@vger.kernel.org Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30drm/amdgpu: Fix VRAM memory accountingMukul Joshi
Subtract the VRAM pinned memory when checking for available memory in amdgpu_amdkfd_reserve_mem_limit function since that memory is not available for use. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-04-30ublk: remove segment count and size limitsUday Shankar
ublk_drv currently creates block devices with the default max_segments and max_segment_size limits of BLK_MAX_SEGMENTS (128) and BLK_MAX_SEGMENT_SIZE (65536) respectively. These defaults can artificially constrain the I/O size seen by the ublk server - for example, suppose that the ublk server has configured itself to accept I/Os up to 1M and the application is also issuing 1M sized I/Os. If the I/O buffer used by the application is backed by 4K pages, the buffer could consist of up to 1M / 4K = 256 physically discontiguous segments (even if the buffer is virtually contiguous). As such, the I/O could exceed the default max_segments limit and get split. This can cause unnecessary performance issues if the ublk server is optimized to handle 1M I/Os. The block layer's segment count/size limits exist to model hardware constraints which don't exist in ublk_drv's case, so just remove those limits for the block devices created by ublk_drv. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Riley Thomasson <riley@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240430211623.2802036-1-ushankar@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-04-30clk: samsung: Revert "clk: Use device_get_match_data()"Marek Szyprowski
device_get_match_data() function should not be used on the device other than the one matched to the given driver, because it always returns the match_data of the matched driver. In case of exynos-clkout driver, the original code matches the OF IDs on the PARENT device, so replacing it with of_device_get_match_data() broke the driver. This has been already pointed once in commit 2bc5febd05ab ("clk: samsung: Revert "clk: samsung: exynos-clkout: Use of_device_get_match_data()""). To avoid further confusion, add a comment about this special case, which requires direct of_match_device() call to pass custom IDs array. This partially reverts commit 409c39ec92a35e3708f5b5798c78eae78512cd71. Cc: <stable@vger.kernel.org> Fixes: 409c39ec92a3 ("clk: Use device_get_match_data()") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20240425075628.838497-1-m.szyprowski@samsung.com Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20240430184656.357805-1-krzysztof.kozlowski@linaro.org Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2024-04-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fix from Paolo Bonzini: "A pretty straightforward fix for a NULL pointer dereference, plus the accompanying reproducer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Add test for uaccesses to non-existent vgic-v2 CPUIF KVM: arm64: vgic-v2: Check for non-NULL vCPU in vgic_v2_parse_attr()
2024-04-30Input: amimouse - mark driver struct with __refdata to prevent section mismatchUwe Kleine-König
As described in the added code comment, a reference to .exit.text is ok for drivers registered via module_platform_driver_probe(). Make this explicit to prevent the following section mismatch warning WARNING: modpost: drivers/input/mouse/amimouse: section mismatch in reference: amimouse_driver+0x8 (section: .data) -> amimouse_remove (section: .exit.text) that triggers on an allmodconfig W=1 build. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Link: https://lore.kernel.org/r/2e3783106bf6bd9a7bdeb12b706378fb16316471.1711748999.git.u.kleine-koenig@pengutronix.de Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2024-04-30Merge tag 'kvmarm-fixes-6.9-2' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.9, part #2 - Fix + test for a NULL dereference resulting from unsanitised user input in the vgic-v2 device attribute accessors
2024-04-30usb: typec: tcpm: Check for port partner validity before consuming itBadhri Jagan Sridharan
typec_register_partner() does not guarantee partner registration to always succeed. In the event of failure, port->partner is set to the error value or NULL. Given that port->partner validity is not checked, this results in the following crash: Unable to handle kernel NULL pointer dereference at virtual address xx pc : run_state_machine+0x1bc8/0x1c08 lr : run_state_machine+0x1b90/0x1c08 .. Call trace: run_state_machine+0x1bc8/0x1c08 tcpm_state_machine_work+0x94/0xe4 kthread_worker_fn+0x118/0x328 kthread+0x1d0/0x23c ret_from_fork+0x10/0x20 To prevent the crash, check for port->partner validity before derefencing it in all the call sites. Cc: stable@vger.kernel.org Fixes: c97cd0b4b54e ("usb: typec: tcpm: set initial svdm version based on pd revision") Signed-off-by: Badhri Jagan Sridharan <badhri@google.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20240427202812.3435268-1-badhri@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-30usb: typec: tcpm: enforce ready state when queueing alt mode vdmRD Babiera
Before sending Enter Mode for an Alt Mode, there is a gap between Discover Modes and the Alt Mode driver queueing the Enter Mode VDM for the port partner to send a message to the port. If this message results in unregistering Alt Modes such as in a DR_SWAP, then the following deadlock can occur with respect to the DisplayPort Alt Mode driver: 1. The DR_SWAP state holds port->lock. Unregistering the Alt Mode driver results in a cancel_work_sync() that waits for the current dp_altmode_work to finish. 2. dp_altmode_work makes a call to tcpm_altmode_enter. The deadlock occurs because tcpm_queue_vdm_unlock attempts to hold port->lock. Before attempting to grab the lock, ensure that the port is in a state vdm_run_state_machine can run in. Alt Mode unregistration will not occur in these states. Fixes: 03eafcfb60c0 ("usb: typec: tcpm: Add tcpm_queue_vdm_unlocked() helper") Cc: stable@vger.kernel.org Signed-off-by: RD Babiera <rdbabiera@google.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20240423202356.3372314-2-rdbabiera@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-04-30usb: typec: tcpm: unregister existing source caps before re-registrationAmit Sunil Dhamne
Check and unregister existing source caps in tcpm_register_source_caps function before registering new ones. This change fixes following warning when port partner resends source caps after negotiating PD contract for the purpose of re-negotiation. [ 343.135030][ T151] sysfs: cannot create duplicate filename '/devices/virtual/usb_power_delivery/pd1/source-capabilities' [ 343.135071][ T151] Call trace: [ 343.135076][ T151] dump_backtrace+0xe8/0x108 [ 343.135099][ T151] show_stack+0x18/0x24 [ 343.135106][ T151] dump_stack_lvl+0x50/0x6c [ 343.135119][ T151] dump_stack+0x18/0x24 [ 343.135126][ T151] sysfs_create_dir_ns+0xe0/0x140 [ 343.135137][ T151] kobject_add_internal+0x228/0x424 [ 343.135146][ T151] kobject_add+0x94/0x10c [ 343.135152][ T151] device_add+0x1b0/0x4c0 [ 343.135187][ T151] device_register+0x20/0x34 [ 343.135195][ T151] usb_power_delivery_register_capabilities+0x90/0x20c [ 343.135209][ T151] tcpm_pd_rx_handler+0x9f0/0x15b8 [ 343.135216][ T151] kthread_worker_fn+0x11c/0x260 [ 343.135227][ T151] kthread+0x114/0x1bc [ 343.135235][ T151] ret_from_fork+0x10/0x20 [ 343.135265][ T151] kobject: kobject_add_internal failed for source-capabilities with -EEXIST, don't try to register things with the same name in the same directory. Fixes: 8203d26905ee ("usb: typec: tcpm: Register USB Power Delivery Capabilities") Cc: linux-usb@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Amit Sunil Dhamne <amitsd@google.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20240424223227.1807844-1-amitsd@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>