summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-11-07net: arc: rockchip: fix emac mdio node supportJohan Jonker
The binding emac_rockchip.txt is converted to YAML. Changed against the original binding is an added MDIO subnode. This make the driver failed to find the PHY, and given the 'mdio has invalid PHY address' it is probably looking in the wrong node. Fix emac_mdio.c so that it can handle both old and new device trees. Fixes: 1dabb74971b3 ("ARM: dts: rockchip: restyle emac nodes") Signed-off-by: Johan Jonker <jbx6244@gmail.com> Tested-by: Andy Yan <andyshrk@163.com> Link: https://lore.kernel.org/r/20220603163539.537-3-jbx6244@gmail.com Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07net: arc: fix the device for dma_map_single/dma_unmap_singleJohan Jonker
The ndev->dev and pdev->dev aren't the same device, use ndev->dev.parent which has dma_mask, ndev->dev.parent is just pdev->dev. Or it would cause the following issue: [ 39.933526] ------------[ cut here ]------------ [ 39.938414] WARNING: CPU: 1 PID: 501 at kernel/dma/mapping.c:149 dma_map_page_attrs+0x90/0x1f8 Fixes: f959dcd6ddfd ("dma-direct: Fix potential NULL pointer dereference") Signed-off-by: David Wu <david.wu@rock-chips.com> Signed-off-by: Johan Jonker <jbx6244@gmail.com> Signed-off-by: Andy Yan <andy.yan@rock-chips.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07media: videobuf2-core: copy vb planes unconditionallyTudor Ambarus
Copy the relevant data from userspace to the vb->planes unconditionally as it's possible some of the fields may have changed after the buffer has been validated. Keep the dma_buf_put(planes[plane].dbuf) calls in the first `if (!reacquired)` case, in order to be close to the plane validation code where the buffers were got in the first place. Cc: stable@vger.kernel.org Fixes: 95af7c00f35b ("media: videobuf2-core: release all planes first in __prepare_dmabuf()") Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org> Tested-by: Will McVicker <willmcvicker@google.com> Acked-by: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
2024-11-07Merge branch 'virtio_net-make-rss-interact-properly-with-queue-number'Paolo Abeni
Philo Lu says: ==================== virtio_net: Make RSS interact properly with queue number With this patch set, RSS updates with queue_pairs changing: - When virtnet_probe, init default rss and commit - When queue_pairs changes _without_ user rss configuration, update rss with the new queue number - When queue_pairs changes _with_ user rss configuration, keep rss as user configured Patch 1 and 2 fix possible out of bound errors for indir_table and key. Patch 3 and 4 add RSS update in probe() and set_queues(). ==================== Link: https://patch.msgid.link/20241104085706.13872-1-lulie@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07virtio_net: Update rss when set queuePhilo Lu
RSS configuration should be updated with queue number. In particular, it should be updated when (1) rss enabled and (2) default rss configuration is used without user modification. During rss command processing, device updates queue_pairs using rss.max_tx_vq. That is, the device updates queue_pairs together with rss, so we can skip the sperate queue_pairs update (VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET below) and return directly. Also remove the `vi->has_rss ?` check when setting vi->rss.max_tx_vq, because this is not used in the other hash_report case. Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07virtio_net: Sync rss config to device when virtnet_probePhilo Lu
During virtnet_probe, default rss configuration is initialized, but was not committed to the device. This patch fix this by sending rss command after device ready in virtnet_probe. Otherwise, the actual rss configuration used by device can be different with that read by user from driver, which may confuse the user. If the command committing fails, driver rss will be disabled. Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Joe Damato <jdamato@fastly.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07virtio_net: Add hash_key_length checkPhilo Lu
Add hash_key_length check in virtnet_probe() to avoid possible out of bound errors when setting/reading the hash key. Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Joe Damato <jdamato@fastly.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07virtio_net: Support dynamic rss indirection table sizePhilo Lu
When reading/writing virtio_net_ctrl_rss, we get the indirection table size from vi->rss_indir_table_size, which is initialized in virtnet_probe(). However, the actual size of indirection_table was set as VIRTIO_NET_RSS_MAX_TABLE_LEN=128. This collision may cause issues if the vi->rss_indir_table_size exceeds 128. This patch instead uses dynamic indirection table, allocated with vi->rss after vi->rss_indir_table_size initialized. And free it in virtnet_remove(). In virtnet_commit_rss_command(), sgs for rss is initialized differently with hash_report. So indirection_table is not used if !vi->has_rss, and then we don't need to alloc indirection_table for hash_report only uses. Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Joe Damato <jdamato@fastly.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07arm64: fix .data.rel.ro size assertion when CONFIG_LTO_CLANGMasahiro Yamada
Commit be2881824ae9 ("arm64/build: Assert for unwanted sections") introduced an assertion to ensure that the .data.rel.ro section does not exist. However, this check does not work when CONFIG_LTO_CLANG is enabled, because .data.rel.ro matches the .data.[0-9a-zA-Z_]* pattern in the DATA_MAIN macro. Move the ASSERT() above the RW_DATA() line. Fixes: be2881824ae9 ("arm64/build: Assert for unwanted sections") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241106161843.189927-1-masahiroy@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-11-07netfilter: nf_tables: wait for rcu grace period on net_device removalPablo Neira Ayuso
8c873e219970 ("netfilter: core: free hooks with call_rcu") removed synchronize_net() call when unregistering basechain hook, however, net_device removal event handler for the NFPROTO_NETDEV was not updated to wait for RCU grace period. Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal") does not remove basechain rules on device removal, I was hinted to remove rules on net_device removal later, see 5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on netdevice removal"). Although NETDEV_UNREGISTER event is guaranteed to be handled after synchronize_net() call, this path needs to wait for rcu grace period via rcu callback to release basechain hooks if netns is alive because an ongoing netlink dump could be in progress (sockets hold a reference on the netns). Note that nf_tables_pre_exit_net() unregisters and releases basechain hooks but it is possible to see NETDEV_UNREGISTER at a later stage in the netns exit path, eg. veth peer device in another netns: cleanup_net() default_device_exit_batch() unregister_netdevice_many_notify() notifier_call_chain() nf_tables_netdev_event() __nft_release_basechain() In this particular case, same rule of thumb applies: if netns is alive, then wait for rcu grace period because netlink dump in the other netns could be in progress. Otherwise, if the other netns is going away then no netlink dump can be in progress and basechain hooks can be released inmediately. While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain validation, which should not ever happen. Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-07arm64: Kconfig: Make SME depend on BROKEN for nowMark Rutland
Although support for SME was merged in v5.19, we've since uncovered a number of issues with the implementation, including issues which might corrupt the FPSIMD/SVE/SME state of arbitrary tasks. While there are patches to address some of these issues, ongoing review has highlighted additional functional problems, and more time is necessary to analyse and fix these. For now, mark SME as BROKEN in the hope that we can fix things properly in the near future. As SME is an OPTIONAL part of ARMv9.2+, and there is very little extant hardware, this should not adversely affect the vast majority of users. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org # 5.19 Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20241106164220.2789279-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2024-11-07arm64: smccc: Remove broken support for SMCCCv1.3 SVE discard hintMark Rutland
SMCCCv1.3 added a hint bit which callers can set in an SMCCC function ID (AKA "FID") to indicate that it is acceptable for the SMCCC implementation to discard SVE and/or SME state over a specific SMCCC call. The kernel support for using this hint is broken and SMCCC calls may clobber the SVE and/or SME state of arbitrary tasks, though FPSIMD state is unaffected. The kernel support is intended to use the hint when there is no SVE or SME state to save, and to do this it checks whether TIF_FOREIGN_FPSTATE is set or TIF_SVE is clear in assembly code: | ldr <flags>, [<current_task>, #TSK_TI_FLAGS] | tbnz <flags>, #TIF_FOREIGN_FPSTATE, 1f // Any live FP state? | tbnz <flags>, #TIF_SVE, 2f // Does that state include SVE? | | 1: orr <fid>, <fid>, ARM_SMCCC_1_3_SVE_HINT | 2: | << SMCCC call using FID >> This is not safe as-is: (1) SMCCC calls can be made in a preemptible context and preemption can result in TIF_FOREIGN_FPSTATE being set or cleared at arbitrary points in time. Thus checking for TIF_FOREIGN_FPSTATE provides no guarantee. (2) TIF_FOREIGN_FPSTATE only indicates that the live FP/SVE/SME state in the CPU does not belong to the current task, and does not indicate that clobbering this state is acceptable. When the live CPU state is clobbered it is necessary to update fpsimd_last_state.st to ensure that a subsequent context switch will reload FP/SVE/SME state from memory rather than consuming the clobbered state. This and the SMCCC call itself must happen in a critical section with preemption disabled to avoid races. (3) Live SVE/SME state can exist with TIF_SVE clear (e.g. with only TIF_SME set), and checking TIF_SVE alone is insufficient. Remove the broken support for the SMCCCv1.3 SVE saving hint. This is effectively a revert of commits: * cfa7ff959a78 ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint") * a7c3acca5380 ("arm64: smccc: Save lr before calling __arm_smccc_sve_check()") ... leaving behind the ARM_SMCCC_VERSION_1_3 and ARM_SMCCC_1_3_SVE_HINT definitions, since these are simply definitions from the SMCCC specification, and the latter is used in KVM via ARM_SMCCC_CALL_HINTS. If we want to bring this back in future, we'll probably want to handle this logic in C where we can use all the usual FPSIMD/SVE/SME helper functions, and that'll likely require some rework of the SMCCC code and/or its callers. Fixes: cfa7ff959a78 ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: stable@vger.kernel.org Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241106160448.2712997-1-mark.rutland@arm.com Signed-off-by: Will Deacon <will@kernel.org>
2024-11-07x86/sev: Cleanup vc_handle_msr()Borislav Petkov (AMD)
Carve out the MSR_SVSM_CAA into a helper with the suggestion that upcoming future users should do the same. Rename that silly exit_info_1 into what it actually means in this function - whether the MSR access is a read or a write. No functional changes. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Link: https://lore.kernel.org/r/20241106172647.GAZyum1zngPDwyD2IJ@fat_crate.local
2024-11-07s390/pci: Add header guards and includes to internal headersNiklas Schnelle
While pci_iov.h has include guards it doesn't include the necessary header for struct zpci_dev, pci_bus.h on the other hand lacks even basic include guards. This isn't only fragile and breaks convention but also confuses tooling such as clang-analyzer. Add both include guards and the necessary includes. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07s390/uvdevice: Fix and slightly improve kernel-doc commentHeiko Carstens
Fix incorrect kernel-doc comment style, add missing return statement, fix incorrect parameter name, and add some additional consistency across all kernel-doc comments. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07s390/uvdevice: Support longer secret listsSteffen Eiden
Enable the list IOCTL to provide lists longer than one page (85 entries). The list IOCTL now accepts any argument length in page granularity. It fills the argument up to this length with entries until the list ends. User space unaware of this enhancement will still receive one page of data and an uv_rc 0x0100. Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20241104153609.1361388-1-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Message-ID: <20241104153609.1361388-1-seiden@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07s390/sparsemem: Provide phys_to_target_node() with CONFIG_NUMAHeiko Carstens
Add a trival phys_to_target_node() implementation which always returns 0 if CONFIG_NUMA is enabled, since the s390 NUMA implementation only supports node 0. This is similar to memory_add_physaddr_to_nid() in order to avoid runtime warnings. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07s390/configs: Enable CONFIG_VIRTIO_MEMHeiko Carstens
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07Merge branch 'virtio-mem' into featuresHeiko Carstens
David Hildenbrand says: ==================== virtio-mem: s390 support Let's finally add s390 support for virtio-mem; my last RFC was sent 4 years ago, and a lot changed in the meantime. The latest QEMU series is available at [1], which contains some more details and a usage example on s390 (last patch). There is not too much in here: The biggest part is querying a new diag(500) STORAGE_LIMIT hypercall to obtain the proper "max_physmem_end". The last three patches are not strictly required but certainly nice-to-have. Note that -- in contrast to standby memory -- virtio-mem memory must be configured to be automatically onlined as soon as hotplugged. The easiest approach is using the "memhp_default_state=" kernel parameter or by using proper udev rules. More details can be found at [2]. I have reviving+upstreaming a systemd service to handle configuring that on my todo list, but for some reason I keep getting distracted ... I tested various things, including: * Various memory hotplug/hotunplug combinations * Device hotplug/hotunplug * /proc/iomem output * reboot * kexec * kdump: make sure we properly enter the "kdump mode" in the virtio-mem driver kdump support for virtio-mem memory on s390 will be sent out separately. v2 -> v3 * "s390/kdump: make is_kdump_kernel() consistently return "true" in kdump environments only" -> Sent out separately [3] * "s390/physmem_info: query diag500(STORAGE LIMIT) to support QEMU/KVM memory devices" -> No query function for diag500 for now. -> Update comment above setup_ident_map_size(). -> Optimize/rewrite diag500_storage_limit() [Heiko] -> Change handling in detect_physmem_online_ranges [Alexander] -> Improve documentation. * "s390/sparsemem: provide memory_add_physaddr_to_nid() with CONFIG_NUMA" -> Added after testing on systems with CONFIG_NUMA=y v1 -> v2: * Document the new diag500 subfunction * Use "s390" instead of "s390x" consistently [1] https://lkml.kernel.org/r/20241008105455.2302628-1-david@redhat.com [2] https://virtio-mem.gitlab.io/user-guide/user-guide-linux.html [3] https://lkml.kernel.org/r/20241023090651.1115507-1-david@redhat.com ==================== Link: https://lore.kernel.org/r/20241025141453.1210600-1-david@redhat.com/ Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07s390/sparsemem: Provide memory_add_physaddr_to_nid() with CONFIG_NUMADavid Hildenbrand
virtio-mem uses memory_add_physaddr_to_nid() to determine the NID to use for memory it adds. We currently fallback to the dummy implementation in mm/numa.c with CONFIG_NUMA, which will end up triggering an undesired pr_info_once(): Unknown online node for memory at 0x100000000, assuming node 0 On s390, we map all cpus and memory to node 0, so let's add a simple memory_add_physaddr_to_nid() implementation that does exactly that, but without complaining. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-8-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07s390/sparsemem: Reduce section size to 128 MiBDavid Hildenbrand
Ever since commit 421c175c4d609 ("[S390] Add support for memory hot-add.") we've been using a section size of 256 MiB on s390 and 32 MiB on s390. Before that, we were using a section size of 32 MiB on both architectures. Now that we have a new mechanism to expose additional memory to a VM -- virtio-mem -- reduce the section size to 128 MiB to allow for more flexibility and reduce the metadata overhead when dealing with hot(un)plug granularity smaller than 256 MiB. 128 MiB has been used by x86-64 since the very beginning. arm64 with 4k base pages switched to 128 MiB as well: it's just big enough on these architectures to allows for using a huge page (2 MiB) in the vmemmap in sane setups with sizeof(struct page) == 64 bytes and a huge page mapping in the direct mapping, while still allowing for small hot(un)plug granularity. For s390, we could even switch to a 64 MiB section size, as our huge page size is 1 MiB: but the smaller the section size, the more sections we'll have to manage especially on bigger machines. Making it consistent with x86-64 and arm64 feels like the right thing for now. Note that the smallest memory hot(un)plug granularity is also limited by the memory block size, determined by extracting the memory increment size from SCLP. Under QEMU/KVM, implementing virtio-mem, we expose 0; therefore, we'll end up with a memory block size of 128 MiB with a 128 MiB section size. Tested-by: Mario Casquero <mcasquer@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-7-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07lib/Kconfig.debug: Default STRICT_DEVMEM to "y" on s390David Hildenbrand
virtio-mem currently depends on !DEVMEM | STRICT_DEVMEM. Let's default STRICT_DEVMEM to "y" just like we do for arm64 and x86. There could be ways in the future to filter access to virtio-mem device memory even without STRICT_DEVMEM, but for now let's just keep it simple. Tested-by: Mario Casquero <mcasquer@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: David Hildenbrand <david@redhat.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-6-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07virtio-mem: s390 supportDavid Hildenbrand
Now that s390 code is prepared for memory devices that reside above the maximum storage increment exposed through SCLP, everything is in place to unlock virtio-mem support. As virtio-mem in Linux currently supports logically onlining/offlining memory in pageblock granularity, we have an effective hot(un)plug granularity of 1 MiB on s390. As virito-mem adds/removes individual Linux memory blocks (256MB), we will currently never use gigantic pages in the identity mapping. It is worth noting that neither storage keys nor storage attributes (e.g., data / nodat) are touched when onlining memory blocks, which is good because we are not supposed to touch these parts for unplugged device blocks that are logically offline in Linux. We will currently never initialize storage keys for virtio-mem memory -- IOW, storage_key_init_range() is never called. It could be added in the future when plugging device blocks. But as that function essentially does nothing without modifying the code (changing PAGE_DEFAULT_ACC), that's just fine for now. kexec should work as intended and just like on other architectures that support virtio-mem: we will never place kexec binaries on virtio-mem memory, and never indicate virtio-mem memory to the 2nd kernel. The device driver in the 2nd kernel can simply reset the device -- turning all memory unplugged, to then start plugging memory and adding them to Linux, without causing trouble because the memory is already used elsewhere. The special s390 kdump mode, whereby the 2nd kernel creates the ELF core header, won't currently dump virtio-mem memory. The virtio-mem driver has a special kdump mode, from where we can detect memory ranges to dump. Based on this, support for dumping virtio-mem memory can be added in the future fairly easily. Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-5-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07s390/physmem_info: Query diag500(STORAGE LIMIT) to support QEMU/KVM memory ↵David Hildenbrand
devices To support memory devices under QEMU/KVM, such as virtio-mem, we have to prepare our kernel virtual address space accordingly and have to know the highest possible physical memory address we might see later: the storage limit. The good old SCLP interface is not suitable for this use case. In particular, memory owned by memory devices has no relationship to storage increments, it is always detected using the device driver, and unaware OSes (no driver) must never try making use of that memory. Consequently this memory is located outside of the "maximum storage increment"-indicated memory range. Let's use our new diag500 STORAGE_LIMIT subcode to query this storage limit that can exceed the "maximum storage increment", and use the existing interfaces (i.e., SCLP) to obtain information about the initial memory that is not owned+managed by memory devices. If a hypervisor does not support such memory devices, the address exposed through diag500 STORAGE_LIMIT will correspond to the maximum storage increment exposed through SCLP. To teach kdump on s390 to include memory owned by memory devices, there will be ways to query the relevant memory ranges from the device via a driver running in special kdump mode (like virtio-mem already implements to filter /proc/vmcore access so we don't end up reading from unplugged device blocks). Update setup_ident_map_size(), to clarify that there can be more than just online and standby memory. Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-4-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07Documentation: s390-diag.rst: Document diag500(STORAGE LIMIT) subfunctionDavid Hildenbrand
Let's document our new diag500 subfunction that can be implemented by userspace. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-3-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07Documentation: s390-diag.rst: Make diag500 a generic KVM hypercallDavid Hildenbrand
Let's make it a generic KVM hypercall, allowing other subfunctions to be more independent of virtio. While at it, document that unsupported/unimplemented subfunctions result in a SPECIFICATION exception. This is a preparation for documenting a new subfunction. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-2-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07s390/kvm: Mask extra bits from program interrupt codeClaudio Imbrenda
The program interrupt code has some extra bits that are sometimes set by hardware for various reasons; those bits should be ignored when the program interrupt number is needed for interrupt handling. Fixes: 05066cafa925 ("s390/mm/fault: Handle guest-related program interrupts in KVM") Reported-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Tested-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241031120316.25462-1-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-07net: stmmac: Fix unbalanced IRQ wake disable warning on single irq caseNícolas F. R. A. Prado
Commit a23aa0404218 ("net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls") introduced checks to prevent unbalanced enable and disable IRQ wake calls. However it only initialized the auxiliary variable on one of the paths, stmmac_request_irq_multi_msi(), missing the other, stmmac_request_irq_single(). Add the same initialization on stmmac_request_irq_single() to prevent "Unbalanced IRQ <x> wake disable" warnings from being printed the first time disable_irq_wake() is called on platforms that run on that code path. Fixes: a23aa0404218 ("net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls") Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241101-stmmac-unbalanced-wake-single-fix-v1-1-5952524c97f0@collabora.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-06net: vertexcom: mse102x: Fix possible double free of TX skbStefan Wahren
The scope of the TX skb is wider than just mse102x_tx_frame_spi(), so in case the TX skb room needs to be expanded, we should free the the temporary skb instead of the original skb. Otherwise the original TX skb pointer would be freed again in mse102x_tx_work(), which leads to crashes: Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP CPU: 0 PID: 712 Comm: kworker/0:1 Tainted: G D 6.6.23 Hardware name: chargebyte Charge SOM DC-ONE (DT) Workqueue: events mse102x_tx_work [mse102x] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_release_data+0xb8/0x1d8 lr : skb_release_data+0x1ac/0x1d8 sp : ffff8000819a3cc0 x29: ffff8000819a3cc0 x28: ffff0000046daa60 x27: ffff0000057f2dc0 x26: ffff000005386c00 x25: 0000000000000002 x24: 00000000ffffffff x23: 0000000000000000 x22: 0000000000000001 x21: ffff0000057f2e50 x20: 0000000000000006 x19: 0000000000000000 x18: ffff00003fdacfcc x17: e69ad452d0c49def x16: 84a005feff870102 x15: 0000000000000000 x14: 000000000000024a x13: 0000000000000002 x12: 0000000000000000 x11: 0000000000000400 x10: 0000000000000930 x9 : ffff00003fd913e8 x8 : fffffc00001bc008 x7 : 0000000000000000 x6 : 0000000000000008 x5 : ffff00003fd91340 x4 : 0000000000000000 x3 : 0000000000000009 x2 : 00000000fffffffe x1 : 0000000000000000 x0 : 0000000000000000 Call trace: skb_release_data+0xb8/0x1d8 kfree_skb_reason+0x48/0xb0 mse102x_tx_work+0x164/0x35c [mse102x] process_one_work+0x138/0x260 worker_thread+0x32c/0x438 kthread+0x118/0x11c ret_from_fork+0x10/0x20 Code: aa1303e0 97fffab6 72001c1f 54000141 (f9400660) Cc: stable@vger.kernel.org Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support") Signed-off-by: Stefan Wahren <wahrenst@gmx.net> Link: https://patch.msgid.link/20241105163101.33216-1-wahrenst@gmx.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07btrfs: fix the length of reserved qgroup to freeHaisu Wang
The dealloc flag may be cleared and the extent won't reach the disk in cow_file_range when errors path. The reserved qgroup space is freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range"). However, the length of untouched region to free needs to be adjusted with the correct remaining region size. Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range") CC: stable@vger.kernel.org # 6.11+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Haisu Wang <haisuwang@tencent.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-07btrfs: reinitialize delayed ref list after deleting it from the listFilipe Manana
At insert_delayed_ref() if we need to update the action of an existing ref to BTRFS_DROP_DELAYED_REF, we delete the ref from its ref head's ref_add_list using list_del(), which leaves the ref's add_list member not reinitialized, as list_del() sets the next and prev members of the list to LIST_POISON1 and LIST_POISON2, respectively. If later we end up calling drop_delayed_ref() against the ref, which can happen during merging or when destroying delayed refs due to a transaction abort, we can trigger a crash since at drop_delayed_ref() we call list_empty() against the ref's add_list, which returns false since the list was not reinitialized after the list_del() and as a consequence we call list_del() again at drop_delayed_ref(). This results in an invalid list access since the next and prev members are set to poison pointers, resulting in a splat if CONFIG_LIST_HARDENED and CONFIG_DEBUG_LIST are set or invalid poison pointer dereferences otherwise. So fix this by deleting from the list with list_del_init() instead. Fixes: 1d57ee941692 ("btrfs: improve delayed refs iterations") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-07btrfs: fix per-subvolume RO/RW flags with new mount APIQu Wenruo
[BUG] With util-linux 2.40.2, the 'mount' utility is already utilizing the new mount API. e.g: # strace mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test/ ... fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0 fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0 fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0 fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0 fsmount(3, FSMOUNT_CLOEXEC, 0) = 4 mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0 move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0 But this leads to a new problem, that per-subvolume RO/RW mount no longer works, if the initial mount is RO: # mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test # mount -o rw,subvol=subv2 /dev/test/scratch1 /mnt/scratch # mount | grep mnt /dev/mapper/test-scratch1 on /mnt/test type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=256,subvol=/subv1) /dev/mapper/test-scratch1 on /mnt/scratch type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=257,subvol=/subv2) # touch /mnt/scratch/foobar touch: cannot touch '/mnt/scratch/foobar': Read-only file system This is a common use cases on distros. [CAUSE] We have a workaround for remount to handle the RO->RW change, but if the mount is using the new mount API, we do not do that, and rely on the mount tool NOT to set the ro flag. But that's not how the mount tool is doing for the new API: fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0 fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0 fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0 <<<< Setting RO flag for super block fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0 fsmount(3, FSMOUNT_CLOEXEC, 0) = 4 mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0 move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0 This means we will set the super block RO at the first mount. Later RW mount will not try to reconfigure the fs to RW because the mount tool is already using the new API. This totally breaks the per-subvolume RO/RW mount behavior. [FIX] Do not skip the reconfiguration even if using the new API. The old comments are just expecting any mount tool to properly skip the RO flag set even if we specify "ro", which is not the reality. Update the comments regarding the backward compatibility on the kernel level so it works with old and new mount utilities. CC: stable@vger.kernel.org # 6.8+ Fixes: f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-07irqchip/gic-v3: Force propagation of the active state with a read-backMarc Zyngier
Christoffer reports that on some implementations, writing to GICR_ISACTIVER0 (and similar GICD registers) can race badly with a guest issuing a deactivation of that interrupt via the system register interface. There are multiple reasons to this: - this uses an early write-acknoledgement memory type (nGnRE), meaning that the write may only have made it as far as some interconnect by the time the store is considered "done" - the GIC itself is allowed to buffer the write until it decides to take it into account (as long as it is in finite time) The effects are that the activation may not have taken effect by the time the kernel enters the guest, forcing an immediate exit, or that a guest deactivation occurs before the interrupt is active, doing nothing. In order to guarantee that the write to the ISACTIVER register has taken effect, read back from it, forcing the interconnect to propagate the write, and the GIC to process the write before returning the read. Reported-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20241106084418.3794612-1-maz@kernel.org
2024-11-06Merge tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "These are mostly fixes that came up during the nfs bakeathon the other week. Stable Fixes: - Fix KMSAN warning in decode_getfattr_attrs() Other Bugfixes: - Handle -ENOTCONN in xs_tcp_setup_socked() - NFSv3: only use NFS timeout for MOUNT when protocols are compatible - Fix attribute delegation behavior on exclusive create and a/mtime changes - Fix localio to cope with racing nfs_local_probe() - Avoid i_lock contention in fs_clear_invalid_mapping()" * tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: nfs: avoid i_lock contention in nfs_clear_invalid_mapping nfs_common: fix localio to cope with racing nfs_local_probe() NFS: Further fixes to attribute delegation a/mtime changes NFS: Fix attribute delegation behaviour on exclusive create nfs: Fix KMSAN warning in decode_getfattr_attrs() NFSv3: only use NFS timeout for MOUNT when protocols are compatible sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()
2024-11-06media: dvbdev: fix the logic when DVB_DYNAMIC_MINORS is not setMauro Carvalho Chehab
When CONFIG_DVB_DYNAMIC_MINORS, ret is not initialized, and a semaphore is left at the wrong state, in case of errors. Make the code simpler and avoid mistakes by having just one error check logic used weather DVB_DYNAMIC_MINORS is used or not. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202410201717.ULWWdJv8-lkp@intel.com/ Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Link: https://lore.kernel.org/r/9e067488d8935b8cf00959764a1fa5de85d65725.1730926254.git.mchehab+huawei@kernel.org
2024-11-06io_uring: avoid normal tw intermediate fallbackPavel Begunkov
When a DEFER_TASKRUN io_uring is terminating it requeues deferred task work items as normal tw, which can further fallback to kthread execution. Avoid this extra step and always push them to the fallback kthread. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d1cd472cec2230c66bd1c8d412a5833f0af75384.1730772720.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring/napi: add static napi tracking strategyOlivier Langlois
Add the static napi tracking strategy. That allows the user to manually manage the napi ids list for busy polling, and eliminate the overhead of dynamically updating the list from the fast path. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Link: https://lore.kernel.org/r/96943de14968c35a5c599352259ad98f3c0770ba.1728828877.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring/napi: clean up __io_napi_do_busy_loopOlivier Langlois
__io_napi_do_busy_loop now requires to have loop_end in its parameters. This makes the code cleaner and also has the benefit of removing a branch since the only caller not passing NULL for loop_end_arg is also setting the value conditionally. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Link: https://lore.kernel.org/r/d5b9bb91b1a08fff50525e1c18d7b4709b9ca100.1728828877.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring/napi: Use lock guardsOlivier Langlois
Convert napi locks to use the shiny new Scope-Based Resource Management machinery. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Link: https://lore.kernel.org/r/2680ca47ee183cfdb89d1a40c84d349edeb620ab.1728828877.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring/napi: improve __io_napi_addOlivier Langlois
1. move the sock->sk pointer validity test outside the function to avoid the function call overhead and to make the function more more reusable 2. change its name to __io_napi_add_id to be more precise about it is doing 3. return an error code to report errors Signed-off-by: Olivier Langlois <olivier@trillion01.com> Link: https://lore.kernel.org/r/d637fa3b437d753c0f4e44ff6a7b5bf2c2611270.1728828877.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring/napi: fix io_napi_entry RCU accessesOlivier Langlois
correct 3 RCU structures modifications that were not using the RCU functions to make their update. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Link: https://lore.kernel.org/r/9f53b5169afa8c7bf3665a0b19dc2f7061173530.1728828877.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring/napi: protect concurrent io_napi_entry timeout accessesOlivier Langlois
io_napi_entry timeout value can be updated while accessed from the poll functions. Its concurrent accesses are wrapped with READ_ONCE()/WRITE_ONCE() macros to avoid incorrect compiler optimizations. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Link: https://lore.kernel.org/r/3de3087563cf98f75266fd9f85fdba063a8720db.1728828877.git.olivier@trillion01.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring: prevent speculating sq_array indexingPavel Begunkov
The SQ index array consists of user provided indexes, which io_uring then uses to index the SQ, and so it's susceptible to speculation. For all other queues io_uring tracks heads and tails in kernel, and they shouldn't need any special care. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c6c7a25962924a55869e317e4fdb682dfdc6b279.1730687889.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring: move struct io_kiocb from task_struct to io_uring_taskJens Axboe
Rather than store the task_struct itself in struct io_kiocb, store the io_uring specific task_struct. The life times are the same in terms of io_uring, and this avoids doing some dereferences through the task_struct. For the hot path of putting local task references, we can deref req->tctx instead, which we'll need anyway in that function regardless of whether it's local or remote references. This is mostly straight forward, except the original task PF_EXITING check needs a bit of tweaking. task_work is _always_ run from the originating task, except in the fallback case, where it's run from a kernel thread. Replace the potentially racy (in case of fallback work) checks for req->task->flags with current->flags. It's either the still the original task, in which case PF_EXITING will be sane, or it has PF_KTHREAD set, in which case it's fallback work. Both cases should prevent moving forward with the given request. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring: remove task ref helpersJens Axboe
They are only used right where they are defined, just open-code them inside io_put_task(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring: move cancelations to be io_uring_task basedJens Axboe
Right now the task_struct pointer is used as the key to match a task, but in preparation for some io_kiocb changes, move it to using struct io_uring_task instead. No functional changes intended in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring/rsrc: split io_kiocb node type assignmentsJens Axboe
Currently the io_rsrc_node assignment in io_kiocb is an array of two pointers, as two nodes may be assigned to a request - one file node, and one buffer node. However, the buffer node can co-exist with the provided buffers, as currently it's not supported to use both provided and registered buffers at the same time. This crucially brings struct io_kiocb down to 4 cache lines again, as before it spilled into the 5th cacheline. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06io_uring/rsrc: encode node type and ctx togetherJens Axboe
Rather than keep the type field separate rom ctx, use the fact that we can encode up to 4 types of nodes in the LSB of the ctx pointer. Doesn't reclaim any space right now on 64-bit archs, but it leaves a full int for future use. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-06ASoC: SOF: amd: Fix for incorrect DMA ch status register offsetVenkata Prasad Potturu
DMA ch status register offset change in acp7.0 platform Incorrect DMA channel status register offset check lead to firmware boot failure. [ 14.432497] snd_sof_amd_acp70 0000:c4:00.5: ------------[ DSP dump start ]------------ [ 14.432533] snd_sof_amd_acp70 0000:c4:00.5: Firmware boot failure due to timeout [ 14.432549] snd_sof_amd_acp70 0000:c4:00.5: fw_state: SOF_FW_BOOT_IN_PROGRESS (3) [ 14.432610] snd_sof_amd_acp70 0000:c4:00.5: invalid header size 0x71c41000. FW oops is bogus [ 14.432626] snd_sof_amd_acp70 0000:c4:00.5: unexpected fault 0x71c40000 trace 0x71c40000 [ 14.432642] snd_sof_amd_acp70 0000:c4:00.5: ------------[ DSP dump end ]------------ [ 14.432657] snd_sof_amd_acp70 0000:c4:00.5: error: failed to boot DSP firmware -5 [ 14.432672] snd_sof_amd_acp70 0000:c4:00.5: fw_state change: 3 -> 4 [ 14.433260] dmic-codec dmic-codec: ASoC: Unregistered DAI 'dmic-hifi' [ 14.433319] snd_sof_amd_acp70 0000:c4:00.5: fw_state change: 4 -> 0 [ 14.433358] snd_sof_amd_acp70 0000:c4:00.5: error: sof_probe_work failed err: -5 Update correct register offset for DMA ch status register. Fixes: 490be7ba2a01 ("ASoC: SOF: amd: add support for acp7.0 based platform") Signed-off-by: Venkata Prasad Potturu <venkataprasad.potturu@amd.com> Link: https://patch.msgid.link/20241106142658.1240929-1-venkataprasad.potturu@amd.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-06ACPI: processor: Move arch_init_invariance_cppc() call laterMario Limonciello
arch_init_invariance_cppc() is called at the end of acpi_cppc_processor_probe() in order to configure frequency invariance based upon the values from _CPC. This however doesn't work on AMD CPPC shared memory designs that have AMD preferred cores enabled because _CPC needs to be analyzed from all cores to judge if preferred cores are enabled. This issue manifests to users as a warning since commit 21fb59ab4b97 ("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"): ``` Could not retrieve highest performance (-19) ``` However the warning isn't the cause of this, it was actually commit 279f838a61f9 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()") which exposed the issue. To fix this problem, change arch_init_invariance_cppc() into a new weak symbol that is called at the end of acpi_processor_driver_init(). Each architecture that supports it can declare the symbol to override the weak one. Define it for x86, in arch/x86/kernel/acpi/cppc.c, and for all of the architectures using the generic arch_topology.c code. Fixes: 279f838a61f9 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()") Reported-by: Ivan Shapovalov <intelfx@intelfx.name> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219431 Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://patch.msgid.link/20241104222855.3959267-1-superm1@kernel.org [ rjw: Changelog edit ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>