linux.git - Linus' kernel tree

Age	Commit message	Author
2025-03-21	s390/pci: Support mmap() of PCI resources except for ISM devices	Niklas Schnelle
	So far s390 does not allow mmap() of PCI resources to user-space via the usual mechanisms, though it does use it for RDMA. For the PCI sysfs resource files and /proc/bus/pci it defines neither HAVE_PCI_MMAP nor ARCH_GENERIC_PCI_MMAP_RESOURCE. For vfio-pci s390 previously relied on disabled VFIO_PCI_MMAP and now relies on setting pdev->non_mappable_bars for all devices. This is partly because access to mapped PCI resources from user-space requires special PCI load/store memory-I/O (MIO) instructions, or the special MMIO syscalls when these are not available. Still, such access is possible and useful not just for RDMA, in fact not being able to mmap() PCI resources has previously caused extra work when testing devices. One thing that doesn't work with PCI resources mapped to user-space though is the s390 specific virtual ISM device. Not only because the BAR size of 256 TiB prevents mapping the whole BAR but also because access requires use of the legacy PCI instructions which are not accessible to user-space on systems with the newer MIO PCI instructions. Now with the pdev->non_mappable_bars flag ISM can be excluded from mapping its resources while making this functionality available for all other PCI devices. To this end introduce a minimal implementation of PCI_QUIRKS and use that to set pdev->non_mappable_bars for ISM devices only. Then also set ARCH_GENERIC_PCI_MMAP_RESOURCE to take advantage of the generic implementation of pci_mmap_resource_range() enabling only the newer sysfs mmap() interface. This follows the recommendation in Documentation/PCI/sysfs-pci.rst. Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-3-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21	s390/pci: Introduce pdev->non_mappable_bars and replace VFIO_PCI_MMAP	Niklas Schnelle
	The ability to map PCI resources to user-space is controlled by global defines. For vfio there is VFIO_PCI_MMAP which is only disabled on s390 and controls mapping of PCI resources using vfio-pci with a fallback option via the pread()/pwrite() interface. For the PCI core there is ARCH_GENERIC_PCI_MMAP_RESOURCE which enables a generic implementation for mapping PCI resources plus the newer sysfs interface. Then there is HAVE_PCI_MMAP which can be used with custom definitions of pci_mmap_resource_range() and the historical /proc/bus/pci interface. Both mechanisms are all or nothing. For s390 mapping PCI resources is possible and useful for testing and certain applications such as QEMU's vfio-pci based user-space NVMe driver. For certain devices, however, access to PCI resources via mappings to user-space is not possible and these must be excluded from the general PCI resource mapping mechanisms. Introduce pdev->non_mappable_bars to indicate that a PCI device's BARs can not be accessed via mappings to user-space. In the future this enables per-device restrictions of PCI resource mapping. For now, set this flag for all PCI devices on s390 in line with the existing, general disable of PCI resource mapping. As s390 is the only user of the VFIO_PCI_MMAP Kconfig option this can already be replaced with a check of this new flag. Also add similar checks in the other code protected by HAVE_PCI_MMAP and ARCH_GENERIC_PCI_MMAP_RESOURCE respectively, in preparation for enabling these for supported devices. Link: https://lore.kernel.org/lkml/20250212132808.08dcf03c.alex.williamson@redhat.com/ Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-2-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-21	s390/pci: Fix s390_mmio_read/write syscall page fault handling	Niklas Schnelle
	The s390 MMIO syscalls when using the classic PCI instructions do not cause a page fault when follow_pfnmap_start() fails due to the page not being present. Besides being a general deficiency this breaks vfio-pci's mmap() handling once VFIO_PCI_MMAP gets enabled as this lazily maps on first access. Fix this by following a failed follow_pfnmap_start() with fixup_user_page() and retrying the follow_pfnmap_start(). Also fix a VM_READ vs VM_WRITE mixup in the read syscall. Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-1-c5c0f1d26efd@linux.ibm.com Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
2025-03-21	PCI: Fix NULL dereference in SR-IOV VF creation error path	Shay Drory
	Clean up when virtfn setup fails to prevent NULL pointer dereference during device removal. The kernel oops below occurred due to incorrect error handling flow when pci_setup_device() fails. Add pci_iov_scan_device(), which handles virtfn allocation and setup and cleans up if pci_setup_device() fails, so pci_iov_add_virtfn() doesn't need to call pci_stop_and_remove_bus_device(). This prevents accessing partially initialized virtfn devices during removal. BUG: kernel NULL pointer dereference, address: 00000000000000d0 RIP: 0010:device_del+0x3d/0x3d0 Call Trace: pci_remove_bus_device+0x7c/0x100 pci_iov_add_virtfn+0xfa/0x200 sriov_enable+0x208/0x420 mlx5_core_sriov_configure+0x6a/0x160 [mlx5_core] sriov_numvfs_store+0xae/0x1a0 Link: https://lore.kernel.org/r/20250310084524.599225-1-shayd@nvidia.com Fixes: e3f30d563a38 ("PCI: Make pci_destroy_dev() concurrent safe") Signed-off-by: Shay Drory <shayd@nvidia.com> [bhelgaas: commit log, return ERR_PTR(-ENOMEM) directly] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Keith Busch <kbusch@kernel.org>
2025-03-21	tracepoint: Print the function symbol when tracepoint_debug is set	Huang Shijie
	When tracepoint_debug is set, we may get the output in kernel log: [ 380.013843] Probe 0 : 00000000f0d68cda It is not readable, so change to print the function symbol. After this patch, the output may become: [ 55.225555] Probe 0 : perf_trace_sched_wakeup_template+0x0/0x20 Link: https://lore.kernel.org/20250307033858.4134-1-shijie@os.amperecomputing.com Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-21	io_uring/net: only import send_zc buffer once	Caleb Sander Mateos
	io_send_zc() guards its call to io_send_zc_import() with if (!done_io) in an attempt to avoid calling it redundantly on the same req. However, if the initial non-blocking issue returns -EAGAIN, done_io will stay 0. This causes the subsequent issue to unnecessarily re-import the buffer. Add an explicit flag "imported" to io_sr_msg to track if its buffer has already been imported. Clear the flag in io_send_zc_prep(). Call io_send_zc_import() and set the flag in io_send_zc() if it is unset. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 54cdcca05abd ("io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr") Link: https://lore.kernel.org/r/20250321184819.3847386-2-csander@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
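The flag logic described above can be sketched in plain C. This is only an illustration under stated assumptions: `struct sketch_sr_msg`, `sketch_prep()`, `sketch_import()` and `sketch_issue()` are hypothetical stand-ins, not the kernel's io_sr_msg or io_send_zc() code, and the `would_block` parameter merely simulates a non-blocking issue that returns -EAGAIN before any data has moved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-request send state. */
struct sketch_sr_msg {
	bool imported;     /* set once the buffer has been imported */
	int  done_io;      /* amount of I/O completed so far */
	int  import_calls; /* illustration only: counts import attempts */
};

/* Prep step: clear the flag so each new request imports exactly once. */
static void sketch_prep(struct sketch_sr_msg *sr)
{
	assert(sr != NULL);
	sr->imported = false;
	sr->done_io = 0;
	sr->import_calls = 0;
}

/* Stand-in for the buffer import; always succeeds in this sketch. */
static int sketch_import(struct sketch_sr_msg *sr)
{
	sr->import_calls++;
	return 0;
}

/* One issue attempt. Guarding on 'imported' instead of 'done_io' means a
 * retry after -EAGAIN (where done_io is still 0) does not import again. */
static int sketch_issue(struct sketch_sr_msg *sr, bool would_block)
{
	if (!sr->imported) {
		int ret = sketch_import(sr);

		if (ret)
			return ret;
		sr->imported = true;
	}
	if (would_block)
		return -EAGAIN;
	sr->done_io += 1;
	return 0;
}
```

Calling `sketch_prep()`, then `sketch_issue()` once with `would_block` set and once without, leaves `import_calls` at 1, whereas a guard on `done_io` would have imported twice.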
2025-03-21	io_uring/cmd: introduce io_uring_cmd_import_fixed_vec	Pavel Begunkov
	io_uring_cmd_import_fixed_vec() is a cmd helper around vectored registered buffer import functions, which caches the memory under the hood. The lifetime of the vector and hence the iterator is bound to the request. Furthermore, the user is not allowed to call it multiple times for a single request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/97487a80dec3fb8cf8aeedf1f9026ef6d503fe4b.1742579999.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-21	io_uring/cmd: add iovec cache for commands	Pavel Begunkov
	Add iou_vec to commands and wire caching for it, but don't expose it to users just yet. We need the vec cleared on initial alloc, but since we can't place it at the beginning at the moment, zero the entire async_data. It's cached, so the performance cost affects only the initial allocation, and it might not be a bad idea since we're exposing those bits to outside drivers. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c0f2145b75791bc6106eb4e72add2cf6a2c72a7a.1742579999.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-21	x86/hyperv: Add comments about hv_vpset and var size hypercall input args	Michael Kelley
	Current code varies in how the size of the variable size input header for hypercalls is calculated when the input contains struct hv_vpset. Surprisingly, this variation is correct, as different hypercalls make different choices for what portion of struct hv_vpset is treated as part of the variable size input header. The Hyper-V TLFS is silent on these details, but the behavior has been confirmed with Hyper-V developers. To avoid future confusion about these differences, add comments to struct hv_vpset, and to hypercall call sites with input that contains a struct hv_vpset. The comments describe the overall situation and the calculation that should be used at each particular call site. No functional change as only comments are updated. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250318214919.958953-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250318214919.958953-1-mhklinux@outlook.com>
2025-03-21	Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs	Nuno Das Neves
	Provide a set of IOCTLs for creating and managing child partitions when running as root partition on Hyper-V. The new driver is enabled via CONFIG_MSHV_ROOT. A brief overview of the interface: MSHV_CREATE_PARTITION is the entry point, returning a file descriptor representing a child partition. IOCTLs on this fd can be used to map memory, create VPs, etc. Creating a VP returns another file descriptor representing that VP which in turn has another set of corresponding IOCTLs for running the VP, getting/setting state, etc. MSHV_ROOT_HVCALL is a generic "passthrough" hypercall IOCTL which can be used for a number of partition or VP hypercalls. This is for hypercalls that do not affect any state in the kernel driver, such as getting and setting VP registers and partition properties, translating addresses, etc. It is "passthrough" because the binary input and output for the hypercall is only interpreted by the VMM - the kernel driver does nothing but insert the VP and partition id where necessary (which are always in the same place), and execute the hypercall. 
Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Co-developed-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Jinank Jain <jinankjain@microsoft.com> Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Co-developed-by: Muminul Islam <muislam@microsoft.com> Signed-off-by: Muminul Islam <muislam@microsoft.com> Co-developed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Co-developed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Co-developed-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Roman Kisel <romank@linux.microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-21	eth: bnxt: fix out-of-range access of vnic_info array	Taehee Yoo
	The bnxt_queue_{start | stop}() access vnic_info as much as allocated, which indicates bp->nr_vnics. So, it should not reach bp->vnic_info[bp->nr_vnics]. Fixes: 661958552eda ("eth: bnxt: do not use BNXT_VNIC_NTUPLE unconditionally in queue restart logic") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20250316025837.939527-1-ap420073@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	selftests/timers: Improve skew_consistency by testing with other clockids	John Stultz
	Lei Chen reported a bug with CLOCK_MONOTONIC_COARSE having inconsistencies when NTP is adjusting the clock frequency. This has gone seemingly undetected for ~15 years, illustrating a clear gap in our testing. The skew_consistency test is intended to catch this sort of problem, but was focused on only evaluating CLOCK_MONOTONIC, and thus missed the problem on CLOCK_MONOTONIC_COARSE. So adjust the test to run with all clockids for 60 seconds each instead of 10 minutes with just CLOCK_MONOTONIC. Reported-by: Lei Chen <lei.chen@smartx.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250320200306.1712599-2-jstultz@google.com Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
2025-03-21	timekeeping: Fix possible inconsistencies in _COARSE clockids	John Stultz
	Lei Chen raised an issue with CLOCK_MONOTONIC_COARSE seeing time inconsistencies. Lei tracked down that this was being caused by the adjustment tk->tkr_mono.xtime_nsec -= offset; which is made to compensate for the unaccumulated cycles in offset when the multiplicator is adjusted forward, so that the non-_COARSE clockids don't see inconsistencies. However, the _COARSE clockid getter functions use the adjusted xtime_nsec value directly and do not compensate the negative offset via the clocksource delta multiplied with the new multiplicator. In that case the caller can observe time going backwards in consecutive calls. By design, this negative adjustment should be fine, because the logic run from timekeeping_adjust() is done after it accumulated approximately multiplicator * interval_cycles into xtime_nsec. The accumulated value is always larger than the mult_adj * offset value, which is subtracted from xtime_nsec. Both operations are done together under the tk_core.lock, so the net change to xtime_nsec is always positive. However, do_adjtimex() calls into timekeeping_advance() as well, to apply the NTP frequency adjustment immediately. In this case, timekeeping_advance() does not return early when the offset is smaller than interval_cycles. In that case there is no time accumulated into xtime_nsec. But the subsequent call into timekeeping_adjust(), which modifies the multiplicator, subtracts from xtime_nsec to correct for the new multiplicator. Here because there was no accumulation, xtime_nsec becomes smaller than before, which opens a window up to the next accumulation, where the _COARSE clockid getters, which don't compensate for the offset, can observe the inconsistency. To fix this, rework the timekeeping_advance() logic so that when invoked from do_adjtimex(), the time is immediately forwarded to accumulate also the sub-interval portion into xtime. 
That means the remaining offset becomes zero and the subsequent multiplier adjustment therefore does not modify xtime_nsec. There is another related inconsistency. If xtime is forwarded due to the instantaneous multiplier adjustment, the NTP error, which was accumulated with the previous setting, becomes meaningless. Therefore clear the NTP error as well, after forwarding the clock for the instantaneous multiplier update. Fixes: da15cfdae033 ("time: Introduce CLOCK_REALTIME_COARSE") Reported-by: Lei Chen <lei.chen@smartx.com> Signed-off-by: John Stultz <jstultz@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250320200306.1712599-1-jstultz@google.com Closes: https://lore.kernel.org/lkml/20250310030004.3705801-1-lei.chen@smartx.com/
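The arithmetic behind the inconsistency can be followed with a small model. This is a simplified sketch, not the kernel's timekeeping code: `struct sketch_tk` and its helpers are hypothetical, shifts and units are ignored, and the numbers in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified timekeeper state. */
struct sketch_tk {
	int64_t xtime_nsec; /* accumulated time */
	int64_t mult;       /* multiplicator: time units per cycle */
	int64_t offset;     /* cycles not yet accumulated */
};

/* Non-COARSE read: accumulated time plus the unaccumulated cycle delta. */
static int64_t sketch_read_fine(const struct sketch_tk *tk)
{
	return tk->xtime_nsec + tk->offset * tk->mult;
}

/* COARSE read: uses xtime_nsec directly, with no delta compensation. */
static int64_t sketch_read_coarse(const struct sketch_tk *tk)
{
	return tk->xtime_nsec;
}

/* Multiplicator adjustment: subtract offset * mult_adj so that the
 * fine-grained read stays continuous across the change. */
static void sketch_adjust_mult(struct sketch_tk *tk, int64_t mult_adj)
{
	assert(tk != NULL);
	tk->mult += mult_adj;
	tk->xtime_nsec -= tk->offset * mult_adj;
}

/* The fix in spirit: forward time first so the remaining offset is zero
 * and the later adjustment subtracts nothing. */
static void sketch_forward(struct sketch_tk *tk)
{
	assert(tk != NULL);
	tk->xtime_nsec += tk->offset * tk->mult;
	tk->offset = 0;
}
```

With `xtime_nsec = 1000`, `mult = 10`, `offset = 5`, adjusting the multiplicator by +2 without accumulation leaves the fine read at 1050 but drops the coarse read from 1000 to 990. Calling `sketch_forward()` first moves the coarse read to 1050 and the adjustment then leaves it unchanged.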
2025-03-21	mptcp: sockopt: fix getting freebind & transparent	Matthieu Baerts (NGI0)
	When adding a socket option support in MPTCP, both the get and set parts are supposed to be implemented. IP(V6)_FREEBIND and IP(V6)_TRANSPARENT support for the setsockopt part has been added a while ago, but it looks like the get part got forgotten. It should have been present as a way to verify a setting has been set as expected, and not to act differently from TCP or any other socket types. Everything was in place to expose it, just the last step was missing. Only new code is added to cover these specific getsockopt(), that seems safe. Fixes: c9406a23c116 ("mptcp: sockopt: add SOL_IP freebind & transparent options") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-3-122dbb249db3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	mptcp: sockopt: fix getting IPV6_V6ONLY	Matthieu Baerts (NGI0)
	When adding a socket option support in MPTCP, both the get and set parts are supposed to be implemented. IPV6_V6ONLY support for the setsockopt part has been added a while ago, but it looks like the get part got forgotten. It should have been present as a way to verify a setting has been set as expected, and not to act differently from TCP or any other socket types. Not supporting this getsockopt(IPV6_V6ONLY) blocks some apps which want to check the default value, before doing extra actions. On Linux, the default value is 0, but this can be changed with the net.ipv6.bindv6only sysctl knob. On Windows, it is set to 1 by default. So supporting the get part, like for all other socket options, is important. Everything was in place to expose it, just the last step was missing. Only new code is added to cover this specific getsockopt(), that seems safe. Fixes: c9b95a135987 ("mptcp: support IPV6_V6ONLY setsockopt") Cc: stable@vger.kernel.org Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/550 Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-net-mptcp-fix-data-stream-corr-sockopt-v1-2-122dbb249db3@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
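An application-side check of the kind described above might look like the following user-space sketch. `sketch_query_v6only()` is a hypothetical helper name; the fallback definition of IPPROTO_MPTCP is only there for older libc headers and is an assumption about the build environment.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* fallback for libc headers that lack it */
#endif

/* Create a fresh IPv6 stream socket for the given protocol and return
 * its IPV6_V6ONLY value (0 or 1), or -1 if the socket cannot be created
 * or the option cannot be read. */
static int sketch_query_v6only(int protocol)
{
	int val = -1;
	socklen_t len = sizeof(val);
	int fd = socket(AF_INET6, SOCK_STREAM, protocol);

	if (fd < 0)
		return -1;
	if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &val, &len) < 0)
		val = -1;
	else
		assert(len == sizeof(val));
	close(fd);
	return val;
}
```

Calling `sketch_query_v6only(IPPROTO_TCP)` and `sketch_query_v6only(IPPROTO_MPTCP)` side by side shows whether the two protocols answer the query the same way; on a kernel without the fix the MPTCP call would be expected to return -1.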
2025-03-21	Merge branch 'netconsole-add-support-for-userdata-release'	Paolo Abeni
	Breno Leitao says: ==================== netconsole: Add support for userdata release I am submitting a series of patches that introduce a new feature for the netconsole subsystem, specifically the addition of the 'release' field to the sysdata structure. This feature allows the kernel release/version to be appended to the userdata dictionary in every message sent, enhancing the information available for debugging and monitoring purposes. This complements the already supported release prepend feature, which was added some time ago. The release prepend appends the release information at the message header, which is not ideal for two reasons: 1) It is difficult to determine if a message includes this information, making it hard and resource-intensive to parse. 2) When a message is fragmented, the release information is appended to every message fragment, consuming valuable space in the packet. The "release prepend" feature was created before the concept of userdata and sysdata. Now that this format has proven successful, we are implementing the release feature as part of this enhanced structure. This patch series aims to improve the netconsole subsystem by providing a more efficient and user-friendly way to include kernel release information in messages. I believe these changes will significantly aid in system analysis and troubleshooting. Suggested-by: Manu Bretelle <chantr4@gmail.com> Signed-off-by: Breno Leitao <leitao@debian.org> ==================== Link: https://patch.msgid.link/20250314-netcons_release-v1-0-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	docs: netconsole: document release feature	Breno Leitao
	Add documentation explaining the kernel release auto-population feature in netconsole. This feature appends kernel version information to the userdata dictionary in every message sent when enabled via the `release_enabled` file in the configfs hierarchy. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-netcons_release-v1-6-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	selftests: netconsole: Add tests for 'release' feature in sysdata	Breno Leitao
	Expands the self-tests to include the 'release' feature in sysdata. Verifies that enabling the 'release' feature appends the correct data and ensures that disabling it functions as expected. When enabled, the message should have an item similar to the following in the userdata: `release=$(uname -r)` Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-netcons_release-v1-5-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	netconsole: append release to sysdata	Breno Leitao
	Append the init_utsname()->release to sysdata buffer before sending the message in case the feature is set. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-netcons_release-v1-4-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
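A rough user-space analogue of this append step is sketched below. It uses uname() where the kernel uses init_utsname(), and the helper name `sketch_append_release()`, the buffer handling, and the exact " release=<value>\n" layout (leading space, trailing newline) are assumptions made for illustration, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Append " release=<kernel release>\n" to buf, starting at byte 'used'.
 * Returns the new used length, or -1 if uname() fails or the entry does
 * not fit in the remaining space. */
static long sketch_append_release(char *buf, size_t size, size_t used)
{
	struct utsname uts;
	int ret;

	assert(buf != NULL && used <= size);
	if (uname(&uts) != 0)
		return -1;
	ret = snprintf(buf + used, size - used, " release=%s\n", uts.release);
	if (ret < 0 || (size_t)ret >= size - used)
		return -1;
	return (long)(used + (size_t)ret);
}
```

The value written matches what `uname -r` prints on the same machine, which is what the selftest in this series compares against.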
2025-03-21	netconsole: add 'sysdata' suffix to related functions	Breno Leitao
	This commit appends a common "sysdata" suffix to functions responsible for appending data to sysdata. This change enhances code clarity and prevents naming conflicts with other "append" functions, particularly in anticipation of the upcoming inclusion of the `release` field in the next patch. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-netcons_release-v1-3-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	netconsole: implement configfs for release_enabled	Breno Leitao
	Implement the configfs helpers to show and set release_enabled configfs directories under userdata. When enabled, set the feature bit in netconsole_target->sysdata_fields. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-netcons_release-v1-2-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	netconsole: introduce 'release' as a new sysdata field	Breno Leitao
	This commit adds a new feature to the sysdata structure, allowing the kernel release/version to be appended as part of sysdata. Additionally, it updates the logic to count this new field as a used entry when enabled. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314-netcons_release-v1-1-07979c4b86af@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net: airoha: fix CONFIG_DEBUG_FS check	Arnd Bergmann
	The #if check causes a build failure when CONFIG_DEBUG_FS is turned off: In file included from drivers/net/ethernet/airoha/airoha_eth.c:17: drivers/net/ethernet/airoha/airoha_eth.h:543:5: error: "CONFIG_DEBUG_FS" is not defined, evaluates to 0 [-Werror=undef] 543 | #if CONFIG_DEBUG_FS | ^~~~~~~~~~~~~~~ Replace it with the correct #ifdef. Fixes: 3fe15c640f38 ("net: airoha: Introduce PPE debugfs support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://patch.msgid.link/20250314155009.4114308-1-arnd@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
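The difference between the two preprocessor checks can be shown in isolation. In this sketch CONFIG_DEBUG_FS is deliberately left undefined to mimic a configuration with the option turned off, and `sketch_debugfs_init()` is a hypothetical function, not the airoha driver's API.

```c
#include <assert.h>
#include <stddef.h>

/* '#if CONFIG_DEBUG_FS' would expand an undefined macro to 0, which
 * triggers -Wundef (fatal with -Werror=undef). '#ifdef' only asks
 * whether the macro exists, so it is safe when the option is off. */
#ifdef CONFIG_DEBUG_FS
/* Real variant: would register the debugfs entries. */
static int sketch_debugfs_init(const char *name)
{
	assert(name != NULL);
	return 1;
}
#else
/* Stub used when debugfs support is compiled out. */
static inline int sketch_debugfs_init(const char *name)
{
	assert(name != NULL);
	return 0;
}
#endif
```

Compiling this with `-Wundef -Werror` succeeds, while changing the first check to `#if CONFIG_DEBUG_FS` reproduces the class of error quoted in the commit message.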
2025-03-21	Merge tag 'io_uring-6.14-20250321' of git://git.kernel.dk/linux	Linus Torvalds
	Pull io_uring fix from Jens Axboe: "Single fix heading to stable, fixing an issue with io_req_msg_cleanup() sometimes too eagerly clearing cleanup flags" * tag 'io_uring-6.14-20250321' of git://git.kernel.dk/linux: io_uring/net: don't clear REQ_F_NEED_CLEANUP unconditionally
2025-03-21	ALSA: timer: Don't take register_mutex with copy_from/to_user()	Takashi Iwai
	The infamous mmap_lock taken in copy_from/to_user() can be often problematic when it's called inside another mutex, as they might lead to deadlocks. In the case of ALSA timer code, the bad pattern is with guard(mutex)(&register_mutex) that covers copy_from/to_user() -- which was mistakenly introduced at converting to guard(), and it had been carefully worked around in the past. This patch fixes those pieces simply by moving copy_from/to_user() out of the register mutex lock again. Fixes: 3923de04c817 ("ALSA: pcm: oss: Use guard() for setup") Reported-by: syzbot+2b96f44164236dda0f3b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/67dd86c8.050a0220.25ae54.0059.GAE@google.com Link: https://patch.msgid.link/20250321172653.14310-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-03-21	MAINTAINERS: update bridge entry	Nikolay Aleksandrov
	Roopa has decided to withdraw as a bridge maintainer and Ido has agreed to step up and co-maintain the bridge with me. He has been very helpful in bridge patch reviews and has contributed a lot to the bridge over the years. Add an entry for Roopa to CREDITS and also add bridge's headers to its MAINTAINERS entry. Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250314100631.40999-1-razor@blackwall.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net: mctp: Remove unnecessary cast in mctp_cb	Herbert Xu
	The void * cast in mctp_cb is unnecessary as it's already been done at the start of the function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/Z9PwOQeBSYlgZlHq@gondor.apana.org.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	PCI/bwctrl: Fix pcie_bwctrl_select_speed() return type	Ilpo Järvinen
	pcie_bwctrl_select_speed() should take __fls() of the speed bit, not return it as a raw value. Instead of directly returning 2.5GT/s speed bit, simply assign the fallback speed (2.5GT/s) into supported_speeds variable to share the normal return path that calls pcie_supported_speeds2target_speed() to calculate __fls(). This code path is not very likely to execute because pcie_get_supported_speeds() should provide valid ->supported_speeds but a spec violating device could fail to synthesize any speed in pcie_get_supported_speeds(). It could also happen in case the supported_speeds intersection is empty (also a violation of the current PCIe specs). Link: https://lore.kernel.org/r/20250321163103.5145-1-ilpo.jarvinen@linux.intel.com Fixes: de9a6c8d5dbf ("PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
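Why returning the raw speed bit is wrong can be seen from a stand-alone sketch. The SKETCH_SLS_* constants are assumed to mirror the supported link speeds bit layout, and `sketch_fls()` and `sketch_select_speed()` are simplified, hypothetical stand-ins for __fls() and the kernel function, which takes different arguments.

```c
#include <assert.h>

/* Assumed bit layout of the supported link speeds vector. */
#define SKETCH_SLS_2_5GB 0x02u
#define SKETCH_SLS_5_0GB 0x04u
#define SKETCH_SLS_8_0GB 0x08u

/* Index of the most significant set bit; word must be non-zero. */
static unsigned int sketch_fls(unsigned int word)
{
	unsigned int bit = 0;

	assert(word != 0);
	while (word >>= 1)
		bit++;
	return bit;
}

/* Pick the highest speed common to both ends. An empty intersection
 * falls back to the 2.5GT/s bit and still goes through sketch_fls(),
 * so every return value is a bit index rather than a raw bit mask. */
static unsigned int sketch_select_speed(unsigned int port_speeds,
					unsigned int dev_speeds)
{
	unsigned int supported = port_speeds & dev_speeds;

	if (!supported)
		supported = SKETCH_SLS_2_5GB;
	return sketch_fls(supported);
}
```

For the fallback case the function returns 1, the bit index of the 2.5GT/s bit, whereas returning SKETCH_SLS_2_5GB directly would yield 2, which read as an index would correspond to the 5GT/s bit.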
2025-03-21	Merge branch 'net-phy-remove-calls-to-devm_hwmon_sanitize_name'	Paolo Abeni
	Heiner Kallweit says: ==================== net: phy: remove calls to devm_hwmon_sanitize_name Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in devm_hwmon_device_register_with_info") we can simply provide NULL as name argument. ==================== Link: https://patch.msgid.link/198f3cd0-6c39-4783-afe7-95576a4b8539@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net: phy: marvell-88q2xxx: remove call to devm_hwmon_sanitize_name	Heiner Kallweit
	Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in devm_hwmon_device_register_with_info") we can simply provide NULL as name argument. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/59c485e4-983c-42f6-9114-916703a62e3f@gmail.com Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net: phy: mxl-gpy: remove call to devm_hwmon_sanitize_name	Heiner Kallweit
	Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in devm_hwmon_device_register_with_info") we can simply provide NULL as name argument. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/e34c4802-20ce-4556-a47c-812e602e8526@gmail.com Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net: phy: tja11xx: remove call to devm_hwmon_sanitize_name	Heiner Kallweit
	Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in devm_hwmon_device_register_with_info") we can simply provide NULL as name argument. Note that neither priv->hwmon_name nor priv->hwmon_dev are used outside tja11xx_hwmon_register. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/4452cb7e-1a2f-4213-b49f-9de196be9204@gmail.com Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	net: phy: realtek: remove call to devm_hwmon_sanitize_name	Heiner Kallweit
	Since c909e68f8127 ("hwmon: (core) Use device name as a fallback in devm_hwmon_device_register_with_info") we can simply provide NULL as name argument. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/6e8d26f4-8d0a-4c83-aec3-378847a377eb@gmail.com Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21	PCI: pciehp: Don't enable HPIE when resuming in poll mode	Ilpo Järvinen
	PCIe hotplug can operate in poll mode without interrupt handlers using a polling kthread only. eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend") failed to consider that and enables HPIE (Hot-Plug Interrupt Enable) unconditionally when resuming the Port. Only set HPIE if non-poll mode is in use. This makes pcie_enable_interrupt() match how pcie_enable_notification() already handles HPIE. Link: https://lore.kernel.org/r/20250321162114.3939-1-ilpo.jarvinen@linux.intel.com Fixes: eb34da60edee ("PCI: pciehp: Disable hotplug interrupt during suspend") Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Lukas Wunner <lukas@wunner.de>
2025-03-21	of: address: Allow to specify nonposted-mmio per-device	Konrad Dybcio
	Certain IP blocks may strictly require/expect a nE mapping to function correctly, while others may be fine without it (which is preferred for performance reasons). Allow specifying nonposted-mmio on a per-device basis. Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250319-topic-nonposted_mmio-v1-2-dfb886fbd15f@oss.qualcomm.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-03-21	of: address: Expand nonposted-mmio to non-Apple Silicon platforms	Konrad Dybcio
	The nE memory attribute may be utilized by various implementations, not limited to Apple Silicon platforms. Drop the early CONFIG_ARCH_APPLE check. Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250319-topic-nonposted_mmio-v1-1-dfb886fbd15f@oss.qualcomm.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2025-03-21	docs: dt-bindings: Specify ordering for properties within groups	Dragan Simic
	Ordering of the individual properties inside each property group benefits from applying natural sort order [1] by the property names, because it results in more logical and more usable property lists, similarly to what's already the case with the alpha-numerical ordering of the nodes without unit addresses. Let's have this clearly specified in the DTS coding style, and let's expand the provided node example a bit, to actually show the results of applying natural sort order. Applying strict alpha-numerical ordering can result in property lists that are suboptimal from the usability standpoint. For the provided example, which stems from a real-world DT, [2][3][4] applying strict alpha-numerical ordering produces the following undesirable result: vdd-0v9-supply = <&board_vreg1>; vdd-12v-supply = <&board_vreg3>; vdd-1v8-supply = <&board_vreg4>; vdd-3v3-supply = <&board_vreg2>; Having the properties sorted in natural order by their associated voltages is more logical, more usable, and a bit more consistent. [1] https://en.wikipedia.org/wiki/Natural_sort_order [2] https://lore.kernel.org/linux-rockchip/b39cfd7490d8194f053bf3971f13a43472d1769e.1740941097.git.dsimic@manjaro.org/ [3] https://lore.kernel.org/linux-rockchip/174104113599.8946.16805724674396090918.b4-ty@sntech.de/ [4] https://lore.kernel.org/linux-rockchip/757afa87255212dfa5abf4c0e31deb08@manjaro.org/ Signed-off-by: Dragan Simic <dsimic@manjaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/6468619098f94d8acb00de0431c414c5fcfbbdbf.1742532899.git.dsimic@manjaro.org Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
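The difference between the two orderings can be reproduced with a small comparator. This is one possible natural-order comparison written for illustration, assuming plain ASCII names with unsigned decimal runs; `sketch_natural_cmp()` and `sketch_sort_natural()` are hypothetical names and are not part of any DT tooling.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Compare two names, treating each run of digits as one number. */
static int sketch_natural_cmp(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);
	while (*a && *b) {
		if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
			char *end_a, *end_b;
			unsigned long na = strtoul(a, &end_a, 10);
			unsigned long nb = strtoul(b, &end_b, 10);

			if (na != nb)
				return na < nb ? -1 : 1;
			a = end_a;
			b = end_b;
		} else {
			if (*a != *b)
				return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
			a++;
			b++;
		}
	}
	if (*a)
		return 1;
	if (*b)
		return -1;
	return 0;
}

/* qsort() adapter for an array of C strings. */
static int sketch_natural_cmp_qsort(const void *pa, const void *pb)
{
	return sketch_natural_cmp(*(const char *const *)pa,
				  *(const char *const *)pb);
}

/* Sort property names in natural order, in place. */
static void sketch_sort_natural(const char **names, size_t count)
{
	qsort(names, count, sizeof(*names), sketch_natural_cmp_qsort);
}
```

Applied to the four supply properties from the commit message, this yields vdd-0v9-supply, vdd-1v8-supply, vdd-3v3-supply, vdd-12v-supply, because 12 compares as a number greater than 3 instead of character by character.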
2025-03-21	drm/amd/pm: Update feature list for smu_v13_0_6	Asad Kamal
	Update feature list for smu_v13_0_6 to show vcn & smu deep sleep feature enable status Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21	drm/amdgpu: Add parameter documentation for amdgpu_sync_fence	Srinivasan Shanmugam
	Document the 'flags' parameter, which specifies memory allocation behavior while creating a sync entry. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c:162: warning: Function parameter or struct member 'flags' not described in 'amdgpu_sync_fence' Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21	drm/amdgpu/discovery: optionally use fw based ip discovery	Alex Deucher
	On chips without native IP discovery support, use the fw binary if available, otherwise we can continue without it. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21	drm/amdgpu/discovery: use specific ip_discovery.bin for legacy asics	Flora Cui
	vega10/vega12/vega20/raven/raven2/picasso/arcturus/aldebaran Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21	drm/amdgpu/discovery: check ip_discovery fw file available	Flora Cui
	Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21	drm/amd/pm: Remove unnecessary UQ10 to UINT conversion	Asad Kamal
	A few of the metrics for smu_v13_0_12 are not reported in Q10 format; remove the UQ10 to UINT conversion for those. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21	drm/amd/pm: Remove unnecessary UQ10 to UINT conversion	Asad Kamal
	A few of the metrics for smu_v13_0_6 are not reported in Q10 format; remove the UQ10 to UINT conversion for those. v2: Move smu_v13_0_12 changes to separate patch (Kevin) Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
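A small sketch of why the conversion matters for the two commits above, assuming that UQ10 means an unsigned fixed-point value with 10 fractional bits. The helper names `to_uq10` and `uq10_to_uint` are hypothetical and are not the driver's own identifiers.

```python
Q10_FRACTION_BITS = 10  # assumption: UQ10 carries 10 fractional bits

def to_uq10(value: float) -> int:
    """Encode a non-negative number as unsigned Q10 fixed point."""
    return int(round(value * (1 << Q10_FRACTION_BITS)))

def uq10_to_uint(raw: int) -> int:
    """Drop the fractional bits, keeping only the integer part."""
    return raw >> Q10_FRACTION_BITS

# A metric that really is reported in Q10 decodes as expected.
q10_metric = to_uq10(45.5)
decoded_ok = uq10_to_uint(q10_metric)

# A metric that is already a plain integer is destroyed by the same shift.
plain_metric = 45
decoded_wrong = uq10_to_uint(plain_metric)

print(q10_metric, decoded_ok, decoded_wrong)
```

Applying the shift to a value that was never scaled divides it by 1024, so small plain integers collapse to zero. Under this assumption, that is why the conversion has to be dropped for fields that are not in Q10 format.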
2025-03-21	drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA	Jesse.zhang@amd.com
	This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that all relevant page table cache levels (L1 PTEs, L2 PTEs, and L2 PDEs) are invalidated. - Modified the `sdma_v4_4_2_ring_emit_vm_flush` function to use the new `sdma_v4_4_2_get_invalidate_req` function. The updated function emits the necessary register writes and waits to perform a VM flush for the specified VMID. It updates the PTB address registers and issues a VM invalidation request using the specified VM invalidation engine. - Included the necessary header file `gc/gc_9_0_sh_mask.h` to provide access to the required register definitions. v2: vm flush by the vm invalidation packet (Lijo) v3: code style and define the macro for the vm invalidation packet (Christian) v4: Format definition sdma vm invalidate packet (Lijo) Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
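The idea of composing an invalidation request from a VMID, a flush type, and per-cache-level flags can be sketched as below. Every bit position and name here is a hypothetical placeholder chosen for illustration; the real field layout lives in `gc/gc_9_0_sh_mask.h` and may differ, and this is not the body of `sdma_v4_4_2_get_invalidate_req`.

```python
# Hypothetical field layout, for illustration only (NOT the hardware values).
PER_VMID_SHIFT = 0           # assumed: one request bit per VMID, 16 VMIDs
FLUSH_TYPE_SHIFT = 16        # assumed: 2-bit flush type field
INVALIDATE_L2_PTES = 1 << 18
INVALIDATE_L2_PDES = 1 << 19
INVALIDATE_L1_PTES = 1 << 20

def build_invalidate_req(vmid: int, flush_type: int) -> int:
    """Compose a request word that invalidates every cache level for one VMID."""
    if not 0 <= vmid < 16:
        raise ValueError("vmid out of range")
    if not 0 <= flush_type < 4:
        raise ValueError("flush_type out of range")
    req = (1 << vmid) << PER_VMID_SHIFT      # select the VMID to invalidate
    req |= flush_type << FLUSH_TYPE_SHIFT    # how aggressive the flush is
    # Invalidate all relevant levels: L1 PTEs, L2 PTEs and L2 PDEs.
    req |= INVALIDATE_L1_PTES | INVALIDATE_L2_PTES | INVALIDATE_L2_PDES
    return req

example_req = build_invalidate_req(vmid=3, flush_type=2)
print(hex(example_req))
```

Keeping the request construction in one helper means the ring code that emits the flush only has to write the returned word to the request register of its assigned invalidation engine.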
2025-03-21	drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush	Jesse.zhang@amd.com
	- Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and avoids the issue of insufficient VM invalidation engines. - Add synchronization for GPU TLB flush operations in gmc_v9_0.c. Use spin_lock and spin_unlock to ensure thread safety and prevent race conditions during TLB flush operations. This improves the stability and reliability of the driver, especially in multi-threaded environments. v2: replace the sdma ring check with a function `amdgpu_sdma_is_page_queue` to check if a ring is an SDMA page queue.(Lijo) v3: Add GC version check, only enabled on GC9.4.3/9.4.4/9.5.0 v4: Fix code style and add more detailed description (Christian) v5: Remove dependency on vm_inv_eng loop order, explicitly lookup shared inv_eng(Christian/Lijo) v6: Added search shared ring function amdgpu_sdma_get_shared_ring (Lijo) Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
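The sharing scheme and the flush serialization can be modelled roughly as follows. This is a simplified sketch with hypothetical names (`Ring`, `assign_inv_engines`, `flush_tlb`) and a Python lock standing in for the kernel spinlock; it does not mirror the driver's data structures.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Ring:
    name: str
    shares_with: Optional[str] = None  # set for page queues: partner gfx ring
    inv_eng: Optional[int] = None      # assigned VM invalidation engine

def assign_inv_engines(rings: List[Ring], num_engines: int) -> None:
    """Give each non-page ring its own engine; page rings reuse their partner's."""
    by_name = {r.name: r for r in rings}
    free = list(range(num_engines))
    for r in rings:                    # pass 1: rings that own an engine
        if r.shares_with is None:
            if not free:
                raise RuntimeError("no free VM invalidation engine")
            r.inv_eng = free.pop(0)
    for r in rings:                    # pass 2: explicit lookup, order-independent
        if r.shares_with is not None:
            r.inv_eng = by_name[r.shares_with].inv_eng

tlb_lock = threading.Lock()            # stands in for the spinlock

def flush_tlb(ring: Ring, log: list) -> None:
    """Serialize flushes so rings sharing an engine cannot interleave."""
    with tlb_lock:
        log.append((ring.inv_eng, ring.name))

rings = [Ring("sdma0.page", shares_with="sdma0.gfx"), Ring("sdma0.gfx"),
         Ring("sdma1.gfx"), Ring("sdma1.page", shares_with="sdma1.gfx")]
assign_inv_engines(rings, num_engines=2)
flush_log: list = []
for r in rings:
    flush_tlb(r, flush_log)
```

Four rings fit into two engines here, which would fail if every ring demanded its own. Resolving the partner by name in a second pass is one way to avoid depending on the order in which rings are visited.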
2025-03-21	drm/amd/amdgpu: Increase max rings to enable SDMA page ring	Jesse.zhang@amd.com
	Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21	drm/amdgpu: Decode deferred error type in gfx aca bank parser	Xiang Liu
	When injecting an uncorrected error with a background workload, the deferred errors among the uncorrected errors need to be identified by checking the deferred and poison bits of the status register. v2: refine checking for deferred error v2: log possible DEs among CEs v2: generate CPER records for DEs among UEs Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
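One plausible reading of the classification described above is sketched here. The bit positions are assumptions borrowed from the common MCA status layout (UC at bit 61, deferred at bit 44, poison at bit 43) and the function name is hypothetical; the actual parser may apply additional conditions.

```python
# Assumed status-register bit positions (MCA-style layout, not verified
# against the amdgpu ACA definitions).
STATUS_UC = 1 << 61        # uncorrected error
STATUS_DEFERRED = 1 << 44  # error was deferred
STATUS_POISON = 1 << 43    # poisoned data was consumed or created

def classify_aca_status(status: int) -> str:
    """Return 'DE', 'UE' or 'CE' for a bank status value."""
    # The deferred check comes first so that deferred errors hiding among
    # uncorrected (or corrected) reports are still labelled as deferred.
    if status & (STATUS_DEFERRED | STATUS_POISON):
        return "DE"
    if status & STATUS_UC:
        return "UE"
    return "CE"

samples = {
    "plain uncorrected": STATUS_UC,
    "uncorrected + poison": STATUS_UC | STATUS_POISON,
    "deferred only": STATUS_DEFERRED,
    "no flags": 0,
}
results = {label: classify_aca_status(value) for label, value in samples.items()}
print(results)
```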
2025-03-21	drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.5 GPUs	Srinivasan Shanmugam
	Enable the cleaner shader for GFX11.5.0/11.5.1 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX11.5.0/11.5.1 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Mario Sopena-Novales <mario.novales@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-21	drm/amdgpu/mes: clean up SDMA HQD loop	Alex Deucher
	Follow the same logic as the other IP types. Reviewed-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>