summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2025-03-21io_uring/cmd: introduce io_uring_cmd_import_fixed_vecPavel Begunkov
io_uring_cmd_import_fixed_vec() is a cmd helper around vectored registered buffer import functions, which caches the memory under the hood. The lifetime of the vectore and hence the iterator is bound to the request. Furthermore, the user is not allowed to call it multiple times for a single request. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/97487a80dec3fb8cf8aeedf1f9026ef6d503fe4b.1742579999.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-21x86/hyperv: Add comments about hv_vpset and var size hypercall input argsMichael Kelley
Current code varies in how the size of the variable size input header for hypercalls is calculated when the input contains struct hv_vpset. Surprisingly, this variation is correct, as different hypercalls make different choices for what portion of struct hv_vpset is treated as part of the variable size input header. The Hyper-V TLFS is silent on these details, but the behavior has been confirmed with Hyper-V developers. To avoid future confusion about these differences, add comments to struct hv_vpset, and to hypercall call sites with input that contains a struct hv_vpset. The comments describe the overall situation and the calculation that should be used at each particular call site. No functional change as only comments are updated. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250318214919.958953-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250318214919.958953-1-mhklinux@outlook.com>
2025-03-21Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMsNuno Das Neves
Provide a set of IOCTLs for creating and managing child partitions when running as root partition on Hyper-V. The new driver is enabled via CONFIG_MSHV_ROOT. A brief overview of the interface: MSHV_CREATE_PARTITION is the entry point, returning a file descriptor representing a child partition. IOCTLs on this fd can be used to map memory, create VPs, etc. Creating a VP returns another file descriptor representing that VP which in turn has another set of corresponding IOCTLs for running the VP, getting/setting state, etc. MSHV_ROOT_HVCALL is a generic "passthrough" hypercall IOCTL which can be used for a number of partition or VP hypercalls. This is for hypercalls that do not affect any state in the kernel driver, such as getting and setting VP registers and partition properties, translating addresses, etc. It is "passthrough" because the binary input and output for the hypercall is only interpreted by the VMM - the kernel driver does nothing but insert the VP and partition id where necessary (which are always in the same place), and execute the hypercall. Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Co-developed-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Jinank Jain <jinankjain@microsoft.com> Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Co-developed-by: Muminul Islam <muislam@microsoft.com> Signed-off-by: Muminul Islam <muislam@microsoft.com> Co-developed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Co-developed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Co-developed-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Roman Kisel <romank@linux.microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-21net: mctp: Remove unnecessary cast in mctp_cbHerbert Xu
The void * cast in mctp_cb is unnecessary as it's already been done at the start of the function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/Z9PwOQeBSYlgZlHq@gondor.apana.org.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containersTrond Myklebust
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to work with the pNFS flexfiles client. In these circumstances, the client needs to treat the ENETDOWN and ENETUNREACH errors as fatal, and should abandon the attempted I/O. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-21NFS: Treat ENETUNREACH errors as fatal in containersTrond Myklebust
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to work with the generic NFS client. If the flag is set, the client will receive ENETDOWN and ENETUNREACH errors from the RPC layer, and is expected to treat them as being fatal. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-21NFS: Add a mount option to make ENETUNREACH errors fatalTrond Myklebust
If the NFS client was initially created in a container, and that container is torn down, there is usually no possibity to go back and destroy any NFS clients that are hung because their virtual network devices have been unlinked. Add a flag that tells the NFS client that in these circumstances, it should treat ENETDOWN and ENETUNREACH errors as fatal to the NFS client. The option defaults to being on when the mount happens from inside a net namespace that is not "init_net". Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Tested-by: Jeff Layton <jlayton@kernel.org> Acked-by: Chuck Lever <chuck.lever@oracle.com>
2025-03-21net: remove sb1000 cable modem driverArnd Bergmann
This one is hilariously outdated, it provided a faster downlink over TV cable for users of analog modems in the 1990s, through an ISA card. The web page for the userspace tools has been broken for 25 years, and the driver has only ever seen mechanical updates. Link: http://web.archive.org/web/20000611165545/http://home.adelphia.net:80/~siglercm/sb1000.html Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250312085236.2531870-1-arnd@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-21sunrpc: Add a sysfs file for adding a new xprtAnna Schumaker
Writing to this file will clone the 'main' xprt of an xprt_switch and add it to be used as an additional connection. -- Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> v3: Replace call to xprt_iter_get_xprt() with xprt_iter_get_next() Link: https://lore.kernel.org/r/20250207204225.594002-5-anna@kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-21NFS: Extend rdirplus mount option with "force|none"Benjamin Coddington
There are certain users that wish to force the NFS client to choose READDIRPLUS over READDIR for a particular mount. Update the "rdirplus" mount option to optionally accept values. For "rdirplus=force", the NFS client will always attempt to use READDDIRPLUS. The setting of "rdirplus=none" is aliased to the existing "nordirplus". Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/c4cf0de4c8be0930b91bc74bee310d289781cd3b.1741885071.git.bcodding@redhat.com Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-03-21landlock: Add the errata interfaceMickaël Salaün
Some fixes may require user space to check if they are applied on the running kernel before using a specific feature. For instance, this applies when a restriction was previously too restrictive and is now getting relaxed (e.g. for compatibility reasons). However, non-visible changes for legitimate use (e.g. security fixes) do not require an erratum. Because fixes are backported down to a specific Landlock ABI, we need a way to avoid cherry-pick conflicts. The solution is to only update a file related to the lower ABI impacted by this issue. All the ABI files are then used to create a bitmask of fixes. The new errata interface is similar to the one used to get the supported Landlock ABI version, but it returns a bitmask instead because the order of fixes may not match the order of versions, and not all fixes may apply to all versions. The actual errata will come with dedicated commits. The description is not actually used in the code but serves as documentation. Create the landlock_abi_version symbol and use its value to check errata consistency. Update test_base's create_ruleset_checks_ordering tests and add errata tests. This commit is backportable down to the first version of Landlock. Fixes: 3532b0b4352c ("landlock: Enable user space to infer supported features") Cc: Günther Noack <gnoack@google.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-03-21crypto: lib/chacha - remove unused arch-specific init supportEric Biggers
All implementations of chacha_init_arch() just call chacha_init_generic(), so it is pointless. Just delete it, and replace chacha_init() with what was previously chacha_init_generic(). Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21crypto: remove obsolete 'comp' compression APIArd Biesheuvel
The 'comp' compression API has been superseded by the acomp API, which is a bit more cumbersome to use, but ultimately more flexible when it comes to hardware implementations. Now that all the users and implementations have been removed, let's remove the core plumbing of the 'comp' API as well. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21xfrm: ipcomp: Use crypto_acomp interfaceHerbert Xu
Replace the legacy comperssion interface with the new acomp interface. This is the first user to make full user of the asynchronous nature of acomp by plugging into the existing xfrm resume interface. As a result of SG support by acomp, the linear scratch buffer in ipcomp can be removed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21crypto: acomp - Add support for foliosHerbert Xu
For many users, it's easier to supply a folio rather than an SG list since they already have them. Add support for folios to the acomp interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21crypto: acomp - Add ACOMP_REQUEST_ALLOC and acomp_request_alloc_extraHerbert Xu
Add ACOMP_REQUEST_ALLOC which is a wrapper around acomp_request_alloc that falls back to a synchronous stack reqeust if the allocation fails. Also add ACOMP_REQUEST_ON_STACK which stores the request on the stack only. The request should be freed with acomp_request_free. Finally add acomp_request_alloc_extra which gives the user extra memory to use in conjunction with the request. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21crypto: acomp - Remove dst_freeHerbert Xu
Remove the unused dst_free hook. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21crypto: scomp - Remove support for some non-trivial SG listsHerbert Xu
As the only user of acomp/scomp uses a trivial single-page SG list, remove support for everything else in preprataion for the addition of virtual address support. However, keep support for non-trivial source SG lists as that user is currently jumping through hoops in order to linearise the source data. Limit the source SG linearisation buffer to a single page as that user never goes over that. The only other potential user is also unlikely to exceed that (IPComp) and it can easily do its own linearisation if necessary. Also keep the destination SG linearisation for IPComp. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21crypto: scatterwalk - Use nth_page instead of doing it by handHerbert Xu
Curiously, the Crypto API scatterwalk incremented pages by hand rather than using nth_page. Possibly because scatterwalk predates nth_page (the following commit is from the history tree): commit 3957f2b34960d85b63e814262a8be7d5ad91444d Author: James Morris <jmorris@intercode.com.au> Date: Sun Feb 2 07:35:32 2003 -0800 [CRYPTO]: in/out scatterlist support for ciphers. Fix this by using nth_page. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21lib/scatterlist: Add SG_MITER_LOCAL and use itHerbert Xu
Add kmap_local support to the scatterlist iterator. Use it for all the helper functions in lib/scatterlist. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21crypto: scatterwalk - simplify map and unmap calling conventionEric Biggers
Now that the address returned by scatterwalk_map() is always being stored into the same struct scatter_walk that is passed in, make scatterwalk_map() do so itself and return void. Similarly, now that scatterwalk_unmap() is always being passed the address field within a struct scatter_walk, make scatterwalk_unmap() take a pointer to struct scatter_walk instead of the address directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2025-03-21netfilter: fib: avoid lookup if socket is availableFlorian Westphal
In case the fib match is used from the input hook we can avoid the fib lookup if early demux assigned a socket for us: check that the input interface matches sk-cached one. Rework the existing 'lo bypass' logic to first check sk, then for loopback interface type to elide the fib lookup. This speeds up fib matching a little, before: 93.08 GBit/s (no rules at all) 75.1 GBit/s ("fib saddr . iif oif missing drop" in prerouting) 75.62 GBit/s ("fib saddr . iif oif missing drop" in input) After: 92.48 GBit/s (no rules at all) 75.62 GBit/s (fib rule in prerouting) 90.37 GBit/s (fib rule in input). Numbers for the 'no rules' and 'prerouting' are expected to closely match in-between runs, the 3rd/input test case exercises the the 'avoid lookup if cached ifindex in sk matches' case. Test used iperf3 via veth interface, lo can't be used due to existing loopback test. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-03-21bus: fsl-mc: Remove deadcodeDr. David Alan Gilbert
fsl_mc_allocator_driver_exit() was added explicitly by commit 1e8ac83b6caf ("bus: fsl-mc: add fsl_mc_allocator cleanup function") but was never used. Remove it. fsl_mc_portal_reset() was added in 2015 by commit 197f4d6a4a00 ("staging: fsl-mc: fsl-mc object allocator driver") but was never used. Remove it. fsl_mc_portal_reset() was the only caller of dpmcp_reset(). Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com> Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/20241115152055.279732-1-linux@treblig.org Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
2025-03-21PCI/MSI: Convert pci_msi_ignore_mask to per MSI domain flagRoger Pau Monne
Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both MSI and MSI-X entries globally, regardless of the IRQ chip they are using. Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over event channels, to prevent PCI code from attempting to toggle the maskbit, as it's Xen that controls the bit. However, the pci_msi_ignore_mask being global will affect devices that use MSI interrupts but are not routing those interrupts over event channels (not using the Xen pIRQ chip). One example is devices behind a VMD PCI bridge. In that scenario the VMD bridge configures MSI(-X) using the normal IRQ chip (the pIRQ one in the Xen case), and devices behind the bridge configure the MSI entries using indexes into the VMD bridge MSI table. The VMD bridge then demultiplexes such interrupts and delivers to the destination device(s). Having pci_msi_ignore_mask set in that scenario prevents (un)masking of MSI entries for devices behind the VMD bridge. Move the signaling of no entry masking into the MSI domain flags, as that allows setting it on a per-domain basis. Set it for the Xen MSI domain that uses the pIRQ chip, while leaving it unset for the rest of the cases. Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and with Xen dropping usage the variable is unneeded. This fixes using devices behind a VMD bridge on Xen PV hardware domains. Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean Linux cannot use them. By inhibiting the usage of VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask bodge devices behind a VMD bridge do work fine when use from a Linux Xen hardware domain. That's the whole point of the series. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Juergen Gross <jgross@suse.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Message-ID: <20250219092059.90850-4-roger.pau@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-03-20io_uring: enable toggle of iowait usage when waiting on CQEsJens Axboe
By default, io_uring marks a waiting task as being in iowait, if it's sleeping waiting on events and there are pending requests. This isn't necessarily always useful, and may be confusing on non-storage setups where iowait isn't expected. It can also cause extra power usage, by preventing the CPU from entering lower sleep states. This adds a new enter flag, IORING_ENTER_NO_IOWAIT. If set, then io_uring will not account the sleeping task as being in iowait. If the kernel supports this feature, then it will be marked by having the IORING_FEAT_NO_IOWAIT feature flag set. As the kernel currently does not support separating the iowait accounting and CPU frequency boosting, the IORING_ENTER_NO_IOWAIT controls both of these at the same time. In the future, if those do end up being split, then it'd be possible to control them separately. However, it seems more likely that the kernel will decouple iowait and CPU frequency boosting anyway. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-03-20scsi: ufs: core: Fix a race condition related to device commandsBart Van Assche
There is a TOCTOU race in ufshcd_compl_one_cqe(): hba->dev_cmd.complete may be cleared from another thread after it has been checked and before it is used. Fix this race by moving the device command completion from the stack of the device command submitter into struct ufs_hba. This patch fixes the following kernel crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Call trace: _raw_spin_lock_irqsave+0x34/0x80 complete+0x24/0xb8 ufshcd_compl_one_cqe+0x13c/0x4f0 ufshcd_mcq_poll_cqe_lock+0xb4/0x108 ufshcd_intr+0x2f4/0x444 __handle_irq_event_percpu+0xbc/0x250 handle_irq_event+0x48/0xb0 Fixes: 5a0b0cb9bee7 ("[SCSI] ufs: Add support for sending NOP OUT UPIU") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20250314225206.1487838-1-bvanassche@acm.org Reviewed-by: Peter Wang <peter.wang@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-03-20bpf: Add struct_ops context information to struct bpf_prog_auxJuntong Deng
This patch adds struct_ops context information to struct bpf_prog_aux. This context information will be used in the kfunc filter. Currently the added context information includes struct_ops member offset and a pointer to struct bpf_struct_ops. Signed-off-by: Juntong Deng <juntong.deng@outlook.com> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://patch.msgid.link/20250319215358.2287371-2-ameryhung@gmail.com
2025-03-20nvme-tcp: request secure channel concatenationHannes Reinecke
Add a fabrics option 'concat' to request secure channel concatenation as specified the NVME Base Specification v2.1, section 8.3.4.3: Secure Channel Concatenation. When secure channel concatenation is enabled a 'generated PSK' is inserted into the keyring such that it's available after reset. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-03-20nvme-keyring: add nvme_tls_psk_refresh()Hannes Reinecke
Add a function to refresh a generated PSK in the specified keyring. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-03-20nvme: add nvme_auth_derive_tls_psk()Hannes Reinecke
Add a function to derive the TLS PSK as specified TP8018. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-03-20nvme: add nvme_auth_generate_digest()Hannes Reinecke
Add a function to calculate the PSK digest as specified in TP8018. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-03-20nvme: add nvme_auth_generate_psk()Hannes Reinecke
Add a function to generate a NVMe PSK from the shared credentials negotiated by DH-HMAC-CHAP. Signed-off-by: Hannes Reinecke <hare@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-03-20crypto,fs: Separate out hkdf_extract() and hkdf_expand()Hannes Reinecke
Separate out the HKDF functions into a separate module to to make them available to other callers. And add a testsuite to the module with test vectors from RFC 5869 (and additional vectors for SHA384 and SHA512) to ensure the integrity of the algorithm. Signed-off-by: Hannes Reinecke <hare@kernel.org> Acked-by: Eric Biggers <ebiggers@kernel.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Keith Busch <kbusch@kernel.org>
2025-03-20pds_core: add new fwctl auxiliary_deviceShannon Nelson
Add support for a new fwctl-based auxiliary_device for creating a channel for fwctl support into the AMD/Pensando DSC. Link: https://patch.msgid.link/r/20250320194412.67983-4-shannon.nelson@amd.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-03-20PCI: Move cardbus IO size declarations into pci/pci.hIlpo Järvinen
For some reason, cardbus related io/mem size declarations are in linux/pci.h, whereas non-cardbus sizes are already in pci/pci.h. Move all them into one place in pci/pci.h. Link: https://lore.kernel.org/r/20250311174701.3586-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Make pci_setup_bridge() staticIlpo Järvinen
pci_setup_bridge() is only used within setup-bus.c. Therefore, make it a static function. Link: https://lore.kernel.org/r/20250311174701.3586-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Move resource reassignment func declarations into pci/pci.hIlpo Järvinen
Neither pci_reassign_bridge_resources() nor pci_reassign_resource() is used outside of the PCI subsystem. They seem to be naturally static functions but since resource fitting/assignment is split between setup-bus.c and setup-res.c, they fall into different sides of the divide and need to be declared. Move the declarations of pci_reassign_bridge_resources() and pci_reassign_resource() into pci/pci.h to keep them internal to PCI subsystem. Link: https://lore.kernel.org/r/20250311174701.3586-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20PCI: Move pci_rescan_bus_bridge_resize() declaration to pci/pci.hIlpo Järvinen
pci_rescan_bus_bridge_resize() is only used by code inside PCI subsystem. The comment also falsely advertises it to be for hotplug drivers, yet the only caller is from sysfs store function. Move the function declaration into pci/pci.h. Link: https://lore.kernel.org/r/20250311174701.3586-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2025-03-20cpumask: align text in commentJoel Savitz
Since commit 4e1a7df45480 ("cpumask: Add enabled cpumask for present CPUs that can be brought online") introduced cpu_enabled_mask, the comment line describing the mask has been slightly out of alignment with the adjacent lines. Fix this by removing a single space character. Signed-off-by: Joel Savitz <jsavitz@redhat.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2025-03-20hyperv: Add definitions for root partition driver to hv headersNuno Das Neves
A few additional definitions are required for the mshv driver code (to follow). Introduce those here and clean up a little bit while at it. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-10-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-10-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-20x86: hyperv: Add mshv_handler() irq handler and setup functionNuno Das Neves
Add mshv_handler() to process messages related to managing guest partitions such as intercepts, doorbells, and scheduling messages. In a (non-nested) root partition, the same interrupt vector is shared between the vmbus and mshv_root drivers. Introduce a stub for mshv_handler() and call it in sysvec_hyperv_callback alongside vmbus_handler(). Even though both handlers will be called for every Hyper-V interrupt, the messages for each driver are delivered to different offsets within the SYNIC message page, so they won't step on each other. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Wei Liu <wei.liu@kernel.org> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-9-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-9-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-20hyperv: Introduce hv_recommend_using_aeoi()Nuno Das Neves
Factor out the check for enabling auto eoi, to be reused in root partition code. No functional changes. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-5-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-5-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-20arm64/hyperv: Add some missing functions to arm64Nuno Das Neves
These non-nested msr and fast hypercall functions are present in x86, but they must be available in both architectures for the root partition driver code. While at it, remove the redundant 'extern' keywords from the hv_do_hypercall() variants in asm-generic/mshyperv.h. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Roman Kisel <romank@linux.microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-4-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-4-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-20x86/mshyperv: Add support for extended Hyper-V featuresStanislav Kinsburskii
Extend the "ms_hyperv_info" structure to include a new field, "ext_features", for capturing extended Hyper-V features. Update the "ms_hyperv_init_platform" function to retrieve these features using the cpuid instruction and include them in the informational output. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1741980536-3865-3-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-3-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-20hyperv: Log hypercall status codes as stringsNuno Das Neves
Introduce hv_status_printk() macros as a convenience to log hypercall errors, formatting them with the status code (HV_STATUS_*) as a raw hex value and also as a string, which saves some time while debugging. Create a table of HV_STATUS_ codes with strings and mapped errnos, and use it for hv_result_to_string() and hv_result_to_errno(). Use the new hv_status_printk()s in hv_proc.c, hyperv-iommu.c, and irqdomain.c hypercalls to aid debugging in the root partition. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Link: https://lore.kernel.org/r/1741980536-3865-2-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1741980536-3865-2-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-20hyperv: Remove unused union and structsThorsten Blum
The union vmpacket_largest_possible_header and several structs have not been used for a long time afaict - remove them. Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20250311091634.494888-2-thorsten.blum@linux.dev Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250311091634.494888-2-thorsten.blum@linux.dev>
2025-03-20hyperv: Add CONFIG_MSHV_ROOT to gate root partition supportNuno Das Neves
CONFIG_MSHV_ROOT allows kernels built to run as a normal Hyper-V guest to exclude the root partition code, which is expected to grow significantly over time. This option is a tristate so future driver code can be built as a (m)odule, allowing faster development iteration cycles. If CONFIG_MSHV_ROOT is disabled, don't compile hv_proc.c, and stub hv_root_partition() to return false unconditionally. This allows the compiler to optimize away root partition code blocks since they will be disabled at compile time. In the case of booting as root partition *without* CONFIG_MSHV_ROOT enabled, print a critical error (the kernel will likely crash). Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1740167795-13296-4-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1740167795-13296-4-git-send-email-nunodasneves@linux.microsoft.com>
2025-03-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR (net-6.14-rc8). Conflict: tools/testing/selftests/net/Makefile 03544faad761 ("selftest: net: add proc_net_pktgen") 3ed61b8938c6 ("selftests: net: test for lwtunnel dst ref loops") tools/testing/selftests/net/config: 85cb3711acb8 ("selftests: net: Add test cases for link and peer netns") 3ed61b8938c6 ("selftests: net: test for lwtunnel dst ref loops") Adjacent commits: tools/testing/selftests/net/Makefile c935af429ec2 ("selftests: net: add support for testing SO_RCVMARK and SO_RCVPRIORITY") 355d940f4d5a ("Revert "selftests: Add IPv6 link-local address generation tests for GRE devices."") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-03-20Merge branch 'kvm-nvmx-and-vm-teardown' into HEADPaolo Bonzini
The immediate issue being fixed here is a nVMX bug where KVM fails to detect that, after nested VM-Exit, L1 has a pending IRQ (or NMI). However, checking for a pending interrupt accesses the legacy PIC, and x86's kvm_arch_destroy_vm() currently frees the PIC before destroying vCPUs, i.e. checking for IRQs during the forced nested VM-Exit results in a NULL pointer deref; that's a prerequisite for the nVMX fix. The remaining patches attempt to bring a bit of sanity to x86's VM teardown code, which has accumulated a lot of cruft over the years. E.g. KVM currently unloads each vCPU's MMUs in a separate operation from destroying vCPUs, all because when guest SMP support was added, KVM had a kludgy MMU teardown flow that broke when a VM had more than one 1 vCPU. And that oddity lived on, for 18 years... Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-03-20Merge tag 'kvmarm-6.15' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 6.15 - Nested virtualization support for VGICv3, giving the nested hypervisor control of the VGIC hardware when running an L2 VM - Removal of 'late' nested virtualization feature register masking, making the supported feature set directly visible to userspace - Support for emulating FEAT_PMUv3 on Apple silicon, taking advantage of an IMPLEMENTATION DEFINED trap that covers all PMUv3 registers - Paravirtual interface for discovering the set of CPU implementations where a VM may run, addressing a longstanding issue of guest CPU errata awareness in big-little systems and cross-implementation VM migration - Userspace control of the registers responsible for identifying a particular CPU implementation (MIDR_EL1, REVIDR_EL1, AIDR_EL1), allowing VMs to be migrated cross-implementation - pKVM updates, including support for tracking stage-2 page table allocations in the protected hypervisor in the 'SecPageTable' stat - Fixes to vPMU, ensuring that userspace updates to the vPMU after KVM_RUN are reflected into the backing perf events