Age | Commit message (Collapse) | Author |
|
devm_hwspin_lock_request() was added by 2018's
commit 4f1acd758b08 ("hwspinlock: Add devm_xxx() APIs to request/free
hwlock") however, it's never been used, everyone uses the
devm_hwspin_lock_request_specific() call instead.
Remove it.
Similarly, the none-devm variant isn't used.
Remove it, and the referring documentation.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20241027205445.239108-1-linux@treblig.org
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
SIOCBRDELIF is passed to dev_ioctl() first and later forwarded to
br_ioctl_call(), which causes unnecessary RTNL dance and the splat
below [0] under RTNL pressure.
Let's say Thread A is trying to detach a device from a bridge and
Thread B is trying to remove the bridge.
In dev_ioctl(), Thread A bumps the bridge device's refcnt by
netdev_hold() and releases RTNL because the following br_ioctl_call()
also re-acquires RTNL.
In the race window, Thread B could acquire RTNL and try to remove
the bridge device. Then, rtnl_unlock() by Thread B will release RTNL
and wait for netdev_put() by Thread A.
Thread A, however, must hold RTNL after the unlock in dev_ifsioc(),
which may take long under RTNL pressure, resulting in the splat by
Thread B.
Thread A (SIOCBRDELIF) Thread B (SIOCBRDELBR)
---------------------- ----------------------
sock_ioctl sock_ioctl
`- sock_do_ioctl `- br_ioctl_call
`- dev_ioctl `- br_ioctl_stub
|- rtnl_lock |
|- dev_ifsioc '
' |- dev = __dev_get_by_name(...)
|- netdev_hold(dev, ...) .
/ |- rtnl_unlock ------. |
| |- br_ioctl_call `---> |- rtnl_lock
Race | | `- br_ioctl_stub |- br_del_bridge
Window | | | |- dev = __dev_get_by_name(...)
| | | May take long | `- br_dev_delete(dev, ...)
| | | under RTNL pressure | `- unregister_netdevice_queue(dev, ...)
| | | | `- rtnl_unlock
\ | |- rtnl_lock <-' `- netdev_run_todo
| |- ... `- netdev_run_todo
| `- rtnl_unlock |- __rtnl_unlock
| |- netdev_wait_allrefs_any
|- netdev_put(dev, ...) <----------------'
Wait refcnt decrement
and log splat below
To avoid blocking SIOCBRDELBR unnecessarily, let's not call
dev_ioctl() for SIOCBRADDIF and SIOCBRDELIF.
In the dev_ioctl() path, we do the following:
1. Copy struct ifreq by get_user_ifreq in sock_do_ioctl()
2. Check CAP_NET_ADMIN in dev_ioctl()
3. Call dev_load() in dev_ioctl()
4. Fetch the master dev from ifr.ifr_name in dev_ifsioc()
3. can be done by request_module() in br_ioctl_call(), so we move
1., 2., and 4. to br_ioctl_stub().
Note that 2. is also checked later in add_del_if(), but it's better
performed before RTNL.
SIOCBRADDIF and SIOCBRDELIF have been processed in dev_ioctl() since
the pre-git era, and there seems to be no specific reason to process
them there.
[0]:
unregister_netdevice: waiting for wpan3 to become free. Usage count = 2
ref_tracker: wpan3@ffff8880662d8608 has 1/1 users at
__netdev_tracker_alloc include/linux/netdevice.h:4282 [inline]
netdev_hold include/linux/netdevice.h:4311 [inline]
dev_ifsioc+0xc6a/0x1160 net/core/dev_ioctl.c:624
dev_ioctl+0x255/0x10c0 net/core/dev_ioctl.c:826
sock_do_ioctl+0x1ca/0x260 net/socket.c:1213
sock_ioctl+0x23a/0x6c0 net/socket.c:1318
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:906 [inline]
__se_sys_ioctl fs/ioctl.c:892 [inline]
__x64_sys_ioctl+0x1a4/0x210 fs/ioctl.c:892
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcb/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Fixes: 893b19587534 ("net: bridge: fix ioctl locking")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Reported-by: yan kang <kangyan91@outlook.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/netdev/SY8P300MB0421225D54EB92762AE8F0F2A1D32@SY8P300MB0421.AUSP300.PROD.OUTLOOK.COM/
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://patch.msgid.link/20250316192851.19781-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
So far s390 does not allow mmap() of PCI resources to user-space via the
usual mechanisms, though it does use it for RDMA. For the PCI sysfs
resource files and /proc/bus/pci it defines neither HAVE_PCI_MMAP nor
ARCH_GENERIC_PCI_MMAP_RESOURCE. For vfio-pci s390 previously relied on
disabled VFIO_PCI_MMAP and now relies on setting pdev->non_mappable_bars
for all devices.
This is partly because access to mapped PCI resources from user-space
requires special PCI load/store memory-I/O (MIO) instructions, or the
special MMIO syscalls when these are not available. Still, such access is
possible and useful not just for RDMA, in fact not being able to mmap() PCI
resources has previously caused extra work when testing devices.
One thing that doesn't work with PCI resources mapped to user-space though
is the s390 specific virtual ISM device. Not only because the BAR size of
256 TiB prevents mapping the whole BAR but also because access requires use
of the legacy PCI instructions which are not accessible to user-space on
systems with the newer MIO PCI instructions.
Now with the pdev->non_mappable_bars flag ISM can be excluded from mapping
its resources while making this functionality available for all other PCI
devices. To this end introduce a minimal implementation of PCI_QUIRKS and
use that to set pdev->non_mappable_bars for ISM devices only. Then also set
ARCH_GENERIC_PCI_MMAP_RESOURCE to take advantage of the generic
implementation of pci_mmap_resource_range() enabling only the newer sysfs
mmap() interface. This follows the recommendation in
Documentation/PCI/sysfs-pci.rst.
Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-3-c5c0f1d26efd@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
The ability to map PCI resources to user-space is controlled by global
defines. For vfio there is VFIO_PCI_MMAP which is only disabled on s390 and
controls mapping of PCI resources using vfio-pci with a fallback option via
the pread()/pwrite() interface.
For the PCI core there is ARCH_GENERIC_PCI_MMAP_RESOURCE which enables a
generic implementation for mapping PCI resources plus the newer sysfs
interface. Then there is HAVE_PCI_MMAP which can be used with custom
definitions of pci_mmap_resource_range() and the historical /proc/bus/pci
interface. Both mechanisms are all or nothing.
For s390 mapping PCI resources is possible and useful for testing and
certain applications such as QEMU's vfio-pci based user-space NVMe driver.
For certain devices, however access to PCI resources via mappings to
user-space is not possible and these must be excluded from the general PCI
resource mapping mechanisms.
Introduce pdev->non_mappable_bars to indicate that a PCI device's BARs can
not be accessed via mappings to user-space. In the future this enables
per-device restrictions of PCI resource mapping.
For now, set this flag for all PCI devices on s390 in line with the
existing, general disable of PCI resource mapping. As s390 is the only user
of the VFI_PCI_MMAP Kconfig options this can already be replaced with a
check of this new flag. Also add similar checks in the other code protected
by HAVE_PCI_MMAP respectively ARCH_GENERIC_PCI_MMAP in preparation for
enabling these for supported devices.
Link: https://lore.kernel.org/lkml/20250212132808.08dcf03c.alex.williamson@redhat.com/
Link: https://lore.kernel.org/r/20250226-vfio_pci_mmap-v7-2-c5c0f1d26efd@linux.ibm.com
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
io_uring_cmd_import_fixed_vec() is a cmd helper around vectored
registered buffer import functions, which caches the memory under
the hood. The lifetime of the vectore and hence the iterator is bound to
the request. Furthermore, the user is not allowed to call it multiple
times for a single request.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/97487a80dec3fb8cf8aeedf1f9026ef6d503fe4b.1742579999.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Current code varies in how the size of the variable size input header
for hypercalls is calculated when the input contains struct hv_vpset.
Surprisingly, this variation is correct, as different hypercalls make
different choices for what portion of struct hv_vpset is treated as part
of the variable size input header. The Hyper-V TLFS is silent on these
details, but the behavior has been confirmed with Hyper-V developers.
To avoid future confusion about these differences, add comments to
struct hv_vpset, and to hypercall call sites with input that contains
a struct hv_vpset. The comments describe the overall situation and
the calculation that should be used at each particular call site.
No functional change as only comments are updated.
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20250318214919.958953-1-mhklinux@outlook.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250318214919.958953-1-mhklinux@outlook.com>
|
|
Provide a set of IOCTLs for creating and managing child partitions when
running as root partition on Hyper-V. The new driver is enabled via
CONFIG_MSHV_ROOT.
A brief overview of the interface:
MSHV_CREATE_PARTITION is the entry point, returning a file descriptor
representing a child partition. IOCTLs on this fd can be used to map
memory, create VPs, etc.
Creating a VP returns another file descriptor representing that VP which
in turn has another set of corresponding IOCTLs for running the VP,
getting/setting state, etc.
MSHV_ROOT_HVCALL is a generic "passthrough" hypercall IOCTL which can be
used for a number of partition or VP hypercalls. This is for hypercalls
that do not affect any state in the kernel driver, such as getting and
setting VP registers and partition properties, translating addresses,
etc. It is "passthrough" because the binary input and output for the
hypercall is only interpreted by the VMM - the kernel driver does
nothing but insert the VP and partition id where necessary (which are
always in the same place), and execute the hypercall.
Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com>
Co-developed-by: Jinank Jain <jinankjain@microsoft.com>
Signed-off-by: Jinank Jain <jinankjain@microsoft.com>
Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com>
Co-developed-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Co-developed-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com>
Co-developed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Co-developed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Link: https://lore.kernel.org/r/1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1741980536-3865-11-git-send-email-nunodasneves@linux.microsoft.com>
|
|
The void * cast in mctp_cb is unnecessary as it's already been done
at the start of the function.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jeremy Kerr <jk@codeconstruct.com.au>
Link: https://patch.msgid.link/Z9PwOQeBSYlgZlHq@gondor.apana.org.au
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to work with the pNFS
flexfiles client. In these circumstances, the client needs to treat the
ENETDOWN and ENETUNREACH errors as fatal, and should abandon the
attempted I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Propagate the NFS_MOUNT_NETUNREACH_FATAL flag to work with the generic
NFS client. If the flag is set, the client will receive ENETDOWN and
ENETUNREACH errors from the RPC layer, and is expected to treat them as
being fatal.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
|
|
If the NFS client was initially created in a container, and that
container is torn down, there is usually no possibity to go back and
destroy any NFS clients that are hung because their virtual network
devices have been unlinked.
Add a flag that tells the NFS client that in these circumstances, it
should treat ENETDOWN and ENETUNREACH errors as fatal to the NFS client.
The option defaults to being on when the mount happens from inside a net
namespace that is not "init_net".
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
|
|
This one is hilariously outdated, it provided a faster downlink over
TV cable for users of analog modems in the 1990s, through an ISA card.
The web page for the userspace tools has been broken for 25 years, and
the driver has only ever seen mechanical updates.
Link: http://web.archive.org/web/20000611165545/http://home.adelphia.net:80/~siglercm/sb1000.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312085236.2531870-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Writing to this file will clone the 'main' xprt of an xprt_switch and
add it to be used as an additional connection.
--
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
v3: Replace call to xprt_iter_get_xprt() with xprt_iter_get_next()
Link: https://lore.kernel.org/r/20250207204225.594002-5-anna@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
There are certain users that wish to force the NFS client to choose
READDIRPLUS over READDIR for a particular mount. Update the "rdirplus" mount
option to optionally accept values. For "rdirplus=force", the NFS client
will always attempt to use READDDIRPLUS. The setting of "rdirplus=none" is
aliased to the existing "nordirplus".
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Link: https://lore.kernel.org/r/c4cf0de4c8be0930b91bc74bee310d289781cd3b.1741885071.git.bcodding@redhat.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Some fixes may require user space to check if they are applied on the
running kernel before using a specific feature. For instance, this
applies when a restriction was previously too restrictive and is now
getting relaxed (e.g. for compatibility reasons). However, non-visible
changes for legitimate use (e.g. security fixes) do not require an
erratum.
Because fixes are backported down to a specific Landlock ABI, we need a
way to avoid cherry-pick conflicts. The solution is to only update a
file related to the lower ABI impacted by this issue. All the ABI files
are then used to create a bitmask of fixes.
The new errata interface is similar to the one used to get the supported
Landlock ABI version, but it returns a bitmask instead because the order
of fixes may not match the order of versions, and not all fixes may
apply to all versions.
The actual errata will come with dedicated commits. The description is
not actually used in the code but serves as documentation.
Create the landlock_abi_version symbol and use its value to check errata
consistency.
Update test_base's create_ruleset_checks_ordering tests and add errata
tests.
This commit is backportable down to the first version of Landlock.
Fixes: 3532b0b4352c ("landlock: Enable user space to infer supported features")
Cc: Günther Noack <gnoack@google.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250318161443.279194-3-mic@digikod.net
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
All implementations of chacha_init_arch() just call
chacha_init_generic(), so it is pointless. Just delete it, and replace
chacha_init() with what was previously chacha_init_generic().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The 'comp' compression API has been superseded by the acomp API, which
is a bit more cumbersome to use, but ultimately more flexible when it
comes to hardware implementations.
Now that all the users and implementations have been removed, let's
remove the core plumbing of the 'comp' API as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Replace the legacy comperssion interface with the new acomp
interface. This is the first user to make full user of the
asynchronous nature of acomp by plugging into the existing xfrm
resume interface.
As a result of SG support by acomp, the linear scratch buffer
in ipcomp can be removed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
For many users, it's easier to supply a folio rather than an SG
list since they already have them. Add support for folios to the
acomp interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add ACOMP_REQUEST_ALLOC which is a wrapper around acomp_request_alloc
that falls back to a synchronous stack reqeust if the allocation
fails.
Also add ACOMP_REQUEST_ON_STACK which stores the request on the stack
only.
The request should be freed with acomp_request_free.
Finally add acomp_request_alloc_extra which gives the user extra
memory to use in conjunction with the request.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Remove the unused dst_free hook.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
As the only user of acomp/scomp uses a trivial single-page SG
list, remove support for everything else in preprataion for the
addition of virtual address support.
However, keep support for non-trivial source SG lists as that
user is currently jumping through hoops in order to linearise
the source data.
Limit the source SG linearisation buffer to a single page as
that user never goes over that. The only other potential user
is also unlikely to exceed that (IPComp) and it can easily do
its own linearisation if necessary.
Also keep the destination SG linearisation for IPComp.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Curiously, the Crypto API scatterwalk incremented pages by hand
rather than using nth_page. Possibly because scatterwalk predates
nth_page (the following commit is from the history tree):
commit 3957f2b34960d85b63e814262a8be7d5ad91444d
Author: James Morris <jmorris@intercode.com.au>
Date: Sun Feb 2 07:35:32 2003 -0800
[CRYPTO]: in/out scatterlist support for ciphers.
Fix this by using nth_page.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Add kmap_local support to the scatterlist iterator. Use it for
all the helper functions in lib/scatterlist.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Now that the address returned by scatterwalk_map() is always being
stored into the same struct scatter_walk that is passed in, make
scatterwalk_map() do so itself and return void.
Similarly, now that scatterwalk_unmap() is always being passed the
address field within a struct scatter_walk, make scatterwalk_unmap()
take a pointer to struct scatter_walk instead of the address directly.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
In case the fib match is used from the input hook we can avoid the fib
lookup if early demux assigned a socket for us: check that the input
interface matches sk-cached one.
Rework the existing 'lo bypass' logic to first check sk, then
for loopback interface type to elide the fib lookup.
This speeds up fib matching a little, before:
93.08 GBit/s (no rules at all)
75.1 GBit/s ("fib saddr . iif oif missing drop" in prerouting)
75.62 GBit/s ("fib saddr . iif oif missing drop" in input)
After:
92.48 GBit/s (no rules at all)
75.62 GBit/s (fib rule in prerouting)
90.37 GBit/s (fib rule in input).
Numbers for the 'no rules' and 'prerouting' are expected to
closely match in-between runs, the 3rd/input test case exercises the
the 'avoid lookup if cached ifindex in sk matches' case.
Test used iperf3 via veth interface, lo can't be used due to existing
loopback test.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
fsl_mc_allocator_driver_exit() was added explicitly by
commit 1e8ac83b6caf ("bus: fsl-mc: add fsl_mc_allocator cleanup function")
but was never used.
Remove it.
fsl_mc_portal_reset() was added in 2015 by
commit 197f4d6a4a00 ("staging: fsl-mc: fsl-mc object allocator driver")
but was never used.
Remove it.
fsl_mc_portal_reset() was the only caller of dpmcp_reset().
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20241115152055.279732-1-linux@treblig.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
|
|
Setting pci_msi_ignore_mask inhibits the toggling of the mask bit for both
MSI and MSI-X entries globally, regardless of the IRQ chip they are using.
Only Xen sets the pci_msi_ignore_mask when routing physical interrupts over
event channels, to prevent PCI code from attempting to toggle the maskbit,
as it's Xen that controls the bit.
However, the pci_msi_ignore_mask being global will affect devices that use
MSI interrupts but are not routing those interrupts over event channels
(not using the Xen pIRQ chip). One example is devices behind a VMD PCI
bridge. In that scenario the VMD bridge configures MSI(-X) using the
normal IRQ chip (the pIRQ one in the Xen case), and devices behind the
bridge configure the MSI entries using indexes into the VMD bridge MSI
table. The VMD bridge then demultiplexes such interrupts and delivers to
the destination device(s). Having pci_msi_ignore_mask set in that scenario
prevents (un)masking of MSI entries for devices behind the VMD bridge.
Move the signaling of no entry masking into the MSI domain flags, as that
allows setting it on a per-domain basis. Set it for the Xen MSI domain
that uses the pIRQ chip, while leaving it unset for the rest of the
cases.
Remove pci_msi_ignore_mask at once, since it was only used by Xen code, and
with Xen dropping usage the variable is unneeded.
This fixes using devices behind a VMD bridge on Xen PV hardware domains.
Albeit Devices behind a VMD bridge are not known to Xen, that doesn't mean
Linux cannot use them. By inhibiting the usage of
VMD_FEAT_CAN_BYPASS_MSI_REMAP and the removal of the pci_msi_ignore_mask
bodge devices behind a VMD bridge do work fine when use from a Linux Xen
hardware domain. That's the whole point of the series.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Message-ID: <20250219092059.90850-4-roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
By default, io_uring marks a waiting task as being in iowait, if it's
sleeping waiting on events and there are pending requests. This isn't
necessarily always useful, and may be confusing on non-storage setups
where iowait isn't expected. It can also cause extra power usage, by
preventing the CPU from entering lower sleep states.
This adds a new enter flag, IORING_ENTER_NO_IOWAIT. If set, then
io_uring will not account the sleeping task as being in iowait. If the
kernel supports this feature, then it will be marked by having the
IORING_FEAT_NO_IOWAIT feature flag set.
As the kernel currently does not support separating the iowait
accounting and CPU frequency boosting, the IORING_ENTER_NO_IOWAIT
controls both of these at the same time. In the future, if those do end
up being split, then it'd be possible to control them separately.
However, it seems more likely that the kernel will decouple iowait and
CPU frequency boosting anyway.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
There is a TOCTOU race in ufshcd_compl_one_cqe(): hba->dev_cmd.complete may
be cleared from another thread after it has been checked and before it is
used. Fix this race by moving the device command completion from the stack
of the device command submitter into struct ufs_hba. This patch fixes the
following kernel crash:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Call trace:
_raw_spin_lock_irqsave+0x34/0x80
complete+0x24/0xb8
ufshcd_compl_one_cqe+0x13c/0x4f0
ufshcd_mcq_poll_cqe_lock+0xb4/0x108
ufshcd_intr+0x2f4/0x444
__handle_irq_event_percpu+0xbc/0x250
handle_irq_event+0x48/0xb0
Fixes: 5a0b0cb9bee7 ("[SCSI] ufs: Add support for sending NOP OUT UPIU")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250314225206.1487838-1-bvanassche@acm.org
Reviewed-by: Peter Wang <peter.wang@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch adds struct_ops context information to struct bpf_prog_aux.
This context information will be used in the kfunc filter.
Currently the added context information includes struct_ops member
offset and a pointer to struct bpf_struct_ops.
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://patch.msgid.link/20250319215358.2287371-2-ameryhung@gmail.com
|
|
Add a fabrics option 'concat' to request secure channel concatenation as
specified the NVME Base Specification v2.1, section 8.3.4.3: Secure Channel
Concatenation.
When secure channel concatenation is enabled a 'generated PSK' is inserted
into the keyring such that it's available after reset.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Add a function to refresh a generated PSK in the specified keyring.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Add a function to derive the TLS PSK as specified TP8018.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Add a function to calculate the PSK digest as specified in TP8018.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Add a function to generate a NVMe PSK from the shared credentials
negotiated by DH-HMAC-CHAP.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Separate out the HKDF functions into a separate module to
to make them available to other callers.
And add a testsuite to the module with test vectors
from RFC 5869 (and additional vectors for SHA384 and SHA512)
to ensure the integrity of the algorithm.
Signed-off-by: Hannes Reinecke <hare@kernel.org>
Acked-by: Eric Biggers <ebiggers@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Add support for a new fwctl-based auxiliary_device for creating a
channel for fwctl support into the AMD/Pensando DSC.
Link: https://patch.msgid.link/r/20250320194412.67983-4-shannon.nelson@amd.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
For some reason, cardbus related io/mem size declarations are in
linux/pci.h, whereas non-cardbus sizes are already in pci/pci.h.
Move all them into one place in pci/pci.h.
Link: https://lore.kernel.org/r/20250311174701.3586-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pci_setup_bridge() is only used within setup-bus.c. Therefore, make it a
static function.
Link: https://lore.kernel.org/r/20250311174701.3586-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Neither pci_reassign_bridge_resources() nor pci_reassign_resource() is used
outside of the PCI subsystem. They seem to be naturally static functions
but since resource fitting/assignment is split between setup-bus.c and
setup-res.c, they fall into different sides of the divide and need to be
declared.
Move the declarations of pci_reassign_bridge_resources() and
pci_reassign_resource() into pci/pci.h to keep them internal to PCI
subsystem.
Link: https://lore.kernel.org/r/20250311174701.3586-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
pci_rescan_bus_bridge_resize() is only used by code inside PCI subsystem.
The comment also falsely advertises it to be for hotplug drivers, yet the
only caller is from sysfs store function. Move the function declaration
into pci/pci.h.
Link: https://lore.kernel.org/r/20250311174701.3586-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Since commit 4e1a7df45480 ("cpumask: Add enabled cpumask
for present CPUs that can be brought online") introduced
cpu_enabled_mask, the comment line describing the mask
has been slightly out of alignment with the adjacent
lines.
Fix this by removing a single space character.
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
|
|
A few additional definitions are required for the mshv driver code
(to follow). Introduce those here and clean up a little bit while
at it.
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Link: https://lore.kernel.org/r/1741980536-3865-10-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1741980536-3865-10-git-send-email-nunodasneves@linux.microsoft.com>
|
|
Add mshv_handler() to process messages related to managing guest
partitions such as intercepts, doorbells, and scheduling messages.
In a (non-nested) root partition, the same interrupt vector is shared
between the vmbus and mshv_root drivers.
Introduce a stub for mshv_handler() and call it in
sysvec_hyperv_callback alongside vmbus_handler().
Even though both handlers will be called for every Hyper-V interrupt,
the messages for each driver are delivered to different offsets
within the SYNIC message page, so they won't step on each other.
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Link: https://lore.kernel.org/r/1741980536-3865-9-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1741980536-3865-9-git-send-email-nunodasneves@linux.microsoft.com>
|
|
Factor out the check for enabling auto eoi, to be reused in root
partition code.
No functional changes.
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Link: https://lore.kernel.org/r/1741980536-3865-5-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1741980536-3865-5-git-send-email-nunodasneves@linux.microsoft.com>
|
|
These non-nested msr and fast hypercall functions are present in x86,
but they must be available in both architectures for the root partition
driver code.
While at it, remove the redundant 'extern' keywords from the
hv_do_hypercall() variants in asm-generic/mshyperv.h.
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Link: https://lore.kernel.org/r/1741980536-3865-4-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1741980536-3865-4-git-send-email-nunodasneves@linux.microsoft.com>
|
|
Extend the "ms_hyperv_info" structure to include a new field,
"ext_features", for capturing extended Hyper-V features.
Update the "ms_hyperv_init_platform" function to retrieve these features
using the cpuid instruction and include them in the informational output.
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Tianyu Lan <tiala@microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/1741980536-3865-3-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1741980536-3865-3-git-send-email-nunodasneves@linux.microsoft.com>
|
|
Introduce hv_status_printk() macros as a convenience to log hypercall
errors, formatting them with the status code (HV_STATUS_*) as a raw hex
value and also as a string, which saves some time while debugging.
Create a table of HV_STATUS_ codes with strings and mapped errnos, and
use it for hv_result_to_string() and hv_result_to_errno().
Use the new hv_status_printk()s in hv_proc.c, hyperv-iommu.c, and
irqdomain.c hypercalls to aid debugging in the root partition.
Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Link: https://lore.kernel.org/r/1741980536-3865-2-git-send-email-nunodasneves@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <1741980536-3865-2-git-send-email-nunodasneves@linux.microsoft.com>
|
|
The union vmpacket_largest_possible_header and several structs have not
been used for a long time afaict - remove them.
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20250311091634.494888-2-thorsten.blum@linux.dev
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250311091634.494888-2-thorsten.blum@linux.dev>
|