summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)Author
2025-02-20Merge branch 'for-6.15/io_uring-rx-zc' into for-6.15/io_uring-epoll-waitJens Axboe
* for-6.15/io_uring-rx-zc: (77 commits) io_uring: Rename KConfig to Kconfig io_uring/zcrx: fix leaks on failed registration io_uring/zcrx: recheck ifq on shutdown io_uring/zcrx: add selftest net: add documentation for io_uring zcrx io_uring/zcrx: add copy fallback io_uring/zcrx: throttle receive requests io_uring/zcrx: set pp memory provider for an rx queue io_uring/zcrx: add io_recvzc request io_uring/zcrx: dma-map area for the device io_uring/zcrx: implement zerocopy receive pp memory provider io_uring/zcrx: grab a net device io_uring/zcrx: add io_zcrx_area io_uring/zcrx: add interface queue and refill queue net: add helpers for setting a memory provider on an rx queue net: page_pool: add memory provider helpers net: prepare for non devmem TCP memory providers net: page_pool: add a mp hook to unregister_netdevice* net: page_pool: add callback for mp info printing netdev: add io_uring memory provider info ...
2025-02-20Merge tag 'linux-can-next-for-6.15-20250219' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2025-02-19 this is a pull request of 12 patches for net-next/master. The first 4 patches are by Krzysztof Kozlowski and simplify the c_can driver's c_can_plat_probe() function. Ciprian Marian Costea contributes 3 patches to add S32G2/S32G3 support to the flexcan driver. Ruffalo Lavoisier's patch removes a duplicated word from the mcp251xfd DT bindings documentation. Oleksij Rempel extends the J1939 documentation. The next patch is by Oliver Hartkopp and adds access for the Remote Request Substitution bit in CAN-XL frames. Henrik Brix Andersen's patch for the gs_usb driver adds support for the CANnectivity firmware. The last patch is by Robin van der Gracht and removes a duplicated setup of RX FIFO in the rockchip_canfd driver. linux-can-next-for-6.15-20250219 * tag 'linux-can-next-for-6.15-20250219' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: can: rockchip_canfd: rkcanfd_chip_fifo_setup(): remove duplicated setup of RX FIFO can: gs_usb: add VID/PID for the CANnectivity firmware can: canxl: support Remote Request Substitution bit access can: j1939: Extend stack documentation with buffer size behavior dt-binding: can: mcp251xfd: remove duplicate word can: flexcan: add NXP S32G2/S32G3 SoC support can: flexcan: Add quirk to handle separate interrupt lines for mailboxes dt-bindings: can: fsl,flexcan: add S32G2/S32G3 SoC support can: c_can: Use syscon_regmap_lookup_by_phandle_args can: c_can: Use of_property_present() to test existence of DT property can: c_can: Simplify handling syscon error path can: c_can: Drop useless final probe failure message ==================== Link: https://patch.msgid.link/20250219113354.529611-1-mkl@pengutronix.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-19net: fib_rules: Add port mask attributesIdo Schimmel
Add attributes that allow matching on source and destination ports with a mask. Matching on the source port with a mask is needed in deployments where users encode path information into certain bits of the UDP source port. Temporarily set the type of the attributes to 'NLA_REJECT' while support is being added. Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250217134109.311176-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-19Merge tag 'mm-hotfixes-stable-2025-02-19-17-49' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "18 hotfixes. 5 are cc:stable and the remainder address post-6.13 issues or aren't considered necessary for -stable kernels. 10 are for MM and 8 are for non-MM. All are singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-02-19-17-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: test_xarray: fix failure in check_pause when CONFIG_XARRAY_MULTI is not defined kasan: don't call find_vm_area() in a PREEMPT_RT kernel MAINTAINERS: update Nick's contact info selftests/mm: fix check for running THP tests mm: hugetlb: avoid fallback for specific node allocation of 1G pages memcg: avoid dead loop when setting memory.max mailmap: update Nick's entry mm: pgtable: fix incorrect reclaim of non-empty PTE pages taskstats: modify taskstats version getdelays: fix error format characters mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize() tools/mm: fix build warnings with musl-libc mailmap: add entry for Feng Tang .mailmap: add entries for Jeff Johnson mm,madvise,hugetlb: check for 0-length range after end address adjustment mm/zswap: fix inconsistency when zswap_store_page() fails lib/iov_iter: fix import_iovec_ubuf iovec management procfs: fix a locking bug in a vmcore_add_device_dump() error path
2025-02-19can: canxl: support Remote Request Substitution bit accessOliver Hartkopp
The Remote Request Substitution bit is a dominant bit ("0") in the CAN XL frame. As some CAN XL controllers support to access this bit a new CANXL_RRS value has been defined for the canxl_frame.flags element. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20250124142347.7444-1-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-02-18io_uring: fix spelling error in uapi io_uring.hJens Axboe
This is obviously not that important, but when changes are synced back from the kernel to liburing, the codespell CI ends up erroring because of this misspelling. Let's just correct it and avoid this biting us again on an import. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-18s390/vfio-ap: Signal eventfd when guest AP configuration is changedRorie Reyes
In this patch, an eventfd object is created by the vfio_ap device driver and used to notify userspace when a guests's AP configuration is dynamically changed. Such changes may occur whenever: * An adapter, domain or control domain is assigned to or unassigned from a mediated device that is attached to the guest. * A queue assigned to the mediated device that is attached to a guest is bound to or unbound from the vfio_ap device driver. This can occur either by manually binding/unbinding the queue via the vfio_ap driver's sysfs bind/unbind attribute interfaces, or because an adapter, domain or control domain assigned to the mediated device is added to or removed from the host's AP configuration via an SE/HMC The purpose of this patch is to provide immediate notification of changes made to a guest's AP configuration by the vfio_ap driver. This will enable the guest to take immediate action rather than relying on polling or some other inefficient mechanism to detect changes to its AP configuration. Note that there are corresponding QEMU patches that will be shipped along with this patch (see vfio-ap: Report vfio-ap configuration changes) that will pick up the eventfd signal. Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Tested-by: Anthony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/r/20250107183645.90082-1-rreyes@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2025-02-18Merge tag 'sound-6.14-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A slightly large collection of fixes, spread over various drivers. Almost all are small and device-specific fixes and quirks in ASoC SOF Intel and AMD, Renesas, Cirrus, HD-audio, in addition to a small fix for MIDI 2.0" * tag 'sound-6.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (41 commits) ALSA: seq: Drop UMP events when no UMP-conversion is set ALSA: hda/conexant: Add quirk for HP ProBook 450 G4 mute LED ALSA: hda/cirrus: Reduce codec resume time ALSA: hda/cirrus: Correct the full scale volume set logic virtio_snd.h: clarify that `controls` depends on VIRTIO_SND_F_CTLS ALSA: hda: Add error check for snd_ctl_rename_id() in snd_hda_create_dig_out_ctls() ALSA: hda/tas2781: Fix index issue in tas2781 hda SPI driver ASoC: imx-audmix: remove cpu_mclk which is from cpu dai device ALSA: hda/realtek: Fixup ALC225 depop procedure ALSA: hda/tas2781: Update tas2781 hda SPI driver ASoC: cs35l41: Fix acpi_device_hid() not found ASoC: SOF: amd: Add branch prediction hint in ACP IRQ handler ASoC: SOF: amd: Handle IPC replies before FW_BOOT_COMPLETE ASoC: SOF: amd: Drop unused includes from Vangogh driver ASoC: SOF: amd: Add post_fw_run_delay ACP quirk ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt713_vb_l2_rt1320_l13 ASoC: Intel: soc-acpi-intel-ptl-match: revise typo of rt712_vb + rt1320 support ALSA: Switch to use hrtimer_setup() ALSA: hda: hda-intel: add Panther Lake-H support ASoC: SOF: Intel: pci-ptl: Add support for PTL-H ...
2025-02-17taskstats: modify taskstats versionWang Yaxin
After adding "delay max" and "delay min" to the taskstats structure, the taskstats version needs to be updated. Link: https://lkml.kernel.org/r/20250208144901218Q5ptVpqsQkb2MOEmW4Ujn@zte.com.cn Fixes: f65c64f311ee ("delayacct: add delay min to record delay peak") Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Kun Jiang <jiang.kun2@zte.com.cn> Reviewed-by: xu xin <xu.xin16@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-17netdev-genl: Add an XSK attribute to queuesJoe Damato
Expose a new per-queue nest attribute, xsk, which will be present for queues that are being used for AF_XDP. If the queue is not being used for AF_XDP, the nest will not be present. In the future, this attribute can be extended to include more data about XSK as it is needed. Signed-off-by: Joe Damato <jdamato@fastly.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250214211255.14194-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-17io_uring/zcrx: add io_recvzc requestDavid Wei
Add io_uring opcode OP_RECV_ZC for doing zero copy reads out of a socket. Only the connection should be land on the specific rx queue set up for zero copy, and the socket must be handled by the io_uring instance that the rx queue was registered for zero copy with. That's because neither net_iovs / buffers from our queue can be read by outside applications, nor zero copy is possible if traffic for the zero copy connection goes to another queue. This coordination is outside of the scope of this patch series. Also, any traffic directed to the zero copy enabled queue is immediately visible to the application, which is why CAP_NET_ADMIN is required at the registration step. Of course, no data is actually read out of the socket, it has already been copied by the netdev into userspace memory via DMA. OP_RECV_ZC reads skbs out of the socket and checks that its frags are indeed net_iovs that belong to io_uring. A cqe is queued for each one of these frags. Recall that each cqe is a big cqe, with the top half being an io_uring_zcrx_cqe. The cqe res field contains the len or error. The lower IORING_ZCRX_AREA_SHIFT bits of the struct io_uring_zcrx_cqe::off field contain the offset relative to the start of the zero copy area. The upper part of the off field is trivially zero, and will be used to carry the area id. For now, there is no limit as to how much work each OP_RECV_ZC request does. It will attempt to drain a socket of all available data. This request always operates in multishot mode. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20250215000947.789731-7-dw@davidwei.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-17io_uring/zcrx: add io_zcrx_areaDavid Wei
Add io_zcrx_area that represents a region of userspace memory that is used for zero copy. During ifq registration, userspace passes in the uaddr and len of userspace memory, which is then pinned by the kernel. Each net_iov is mapped to one of these pages. The freelist is a spinlock protected list that keeps track of all the net_iovs/pages that aren't used. For now, there is only one area per ifq and area registration happens implicitly as part of ifq registration. There is no API for adding/removing areas yet. The struct for area registration is there for future extensibility once we support multiple areas and TCP devmem. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20250215000947.789731-3-dw@davidwei.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-17io_uring/zcrx: add interface queue and refill queueDavid Wei
Add a new object called an interface queue (ifq) that represents a net rx queue that has been configured for zero copy. Each ifq is registered using a new registration opcode IORING_REGISTER_ZCRX_IFQ. The refill queue is allocated by the kernel and mapped by userspace using a new offset IORING_OFF_RQ_RING, in a similar fashion to the main SQ/CQ. It is used by userspace to return buffers that it is done with, which will then be re-used by the netdev again. The main CQ ring is used to notify userspace of received data by using the upper 16 bytes of a big CQE as a new struct io_uring_zcrx_cqe. Each entry contains the offset + len to the data. For now, each io_uring instance only has a single ifq. Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20250215000947.789731-2-dw@davidwei.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-17Merge commit '71f0dd5a3293d75d26d405ffbaedfdda4836af32' of ↵Jens Axboe
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into for-6.15/io_uring-rx-zc Merge networking zerocopy receive tree, to get the prep patches for the io_uring rx zc support. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (63 commits) net: add helpers for setting a memory provider on an rx queue net: page_pool: add memory provider helpers net: prepare for non devmem TCP memory providers net: page_pool: add a mp hook to unregister_netdevice* net: page_pool: add callback for mp info printing netdev: add io_uring memory provider info net: page_pool: create hooks for custom memory providers net: generalise net_iov chunk owners net: prefix devmem specific helpers net: page_pool: don't cast mp param to devmem tools: ynl: add all headers to makefile deps eth: fbnic: set IFF_UNICAST_FLT to avoid enabling promiscuous mode when adding unicast addrs eth: fbnic: add MAC address TCAM to debugfs tools: ynl-gen: support limits using definitions tools: ynl-gen: don't output external constants net/mlx5e: Avoid WARN_ON when configuring MQPRIO with HTB offload enabled net/mlx5e: Remove unused mlx5e_tc_flow_action struct net/mlx5: Remove stray semicolon in LAG port selection table creation net/mlx5e: Support FEC settings for 200G per lane link modes net/mlx5: Add support for 200Gbps per lane link modes ...
2025-02-14Merge tag 'thermal-6.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "Fix a regression caused by an inadvertent change of the THERMAL_GENL_ATTR_CPU_CAPABILITY value in one of the recent thermal commits (Zhang Rui) and drop a stale piece of documentation (Daniel Lezcano)" * tag 'thermal-6.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal/cpufreq_cooling: Remove structure member documentation thermal/netlink: Prevent userspace segmentation fault by adjusting UAPI header
2025-02-14virtio_snd.h: clarify that `controls` depends on VIRTIO_SND_F_CTLSStefano Garzarella
As defined in the specification, the `controls` field in the configuration space is only valid/present if VIRTIO_SND_F_CTLS is negotiated. From https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html: 5.14.4 Device Configuration Layout ... controls (driver-read-only) indicates a total number of all available control elements if VIRTIO_SND_F_CTLS has been negotiated. Let's use the same style used in virtio_blk.h to clarify this and to avoid confusion as happened in QEMU (see link). Link: https://gitlab.com/qemu-project/qemu/-/issues/2805 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Link: https://patch.msgid.link/20250213161825.139952-1-sgarzare@redhat.com
2025-02-14landlock: Minor typo and grammar fixes in IPC scoping documentationGünther Noack
* Fix some whitespace, punctuation and minor grammar. * Add a missing sentence about the minimum ABI version, to stay in line with the section next to it. Cc: Tahera Fahimi <fahimitahera@gmail.com> Cc: Tanya Agarwal <tanyaagarwal25699@gmail.com> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20250124154445.162841-1-gnoack@google.com [mic: Add newlines, update doc date] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2025-02-13fs/xattr: bpf: Introduce security.bpf. xattr name prefixSong Liu
Introduct new xattr name prefix security.bpf., and enable reading these xattrs from bpf kfuncs bpf_get_[file|dentry]_xattr(). As we are on it, correct the comments for return value of bpf_get_[file|dentry]_xattr(), i.e. return length the xattr value on success. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Matt Bobrowski <mattbobrowski@google.com> Link: https://lore.kernel.org/r/20250130213549.3353349-2-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-02-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.14-rc3). No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-12Merge patch series "fs: allow changing idmappings"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Currently, it isn't possible to change the idmapping of an idmapped mount. This is becoming an obstacle for various use-cases. /* idmapped home directories with systemd-homed */ On newer systems /home is can be an idmapped mount such that each file on disk is owned by 65536 and a subfolder exists for foreign id ranges such as containers. For example, a home directory might look like this (using an arbitrary folder as an example): user1@localhost:~/data/mount-idmapped$ ls -al /data/ total 16 drwxrwxrwx 1 65536 65536 36 Jan 27 12:15 . drwxrwxr-x 1 root root 184 Jan 27 12:06 .. -rw-r--r-- 1 65536 65536 0 Jan 27 12:07 aaa -rw-r--r-- 1 65536 65536 0 Jan 27 12:07 bbb -rw-r--r-- 1 65536 65536 0 Jan 27 12:07 cc drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers When logging in home is mounted as an idmapped mount with the following idmappings: 65536:$(id -u):1 // uid mapping 65536:$(id -g):1 // gid mapping 2147352576:2147352576:65536 // uid mapping 2147352576:2147352576:65536 // gid mapping So for a user with uid/gid 1000 an idmapped /home would like like this: user1@localhost:~/data/mount-idmapped$ ls -aln /mnt/ total 16 drwxrwxrwx 1 1000 1000 36 Jan 27 12:15 . drwxrwxr-x 1 0 0 184 Jan 27 12:06 .. -rw-r--r-- 1 1000 1000 0 Jan 27 12:07 aaa -rw-r--r-- 1 1000 1000 0 Jan 27 12:07 bbb -rw-r--r-- 1 1000 1000 0 Jan 27 12:07 cc drwxr-xr-x 1 2147352576 2147352576 0 Jan 27 19:06 containers In other words, 65536 is mapped to the user's uid/gid and the range 2147352576 up to 2147352576 + 65536 is an identity mapping for containers. When a container is started a transient uid/gid range is allocated outside of both mappings of the idmapped mount. For example, the container might get the idmapping: $ cat /proc/1742611/uid_map 0 537985024 65536 This container will be allowed to write to disk within the allocated foreign id range 2147352576 to 2147352576 + 65536. To do this an idmapped mount must be created from an already idmapped mount such that: - The mappings for the user's uid/gid must be dropped, i.e., the following mappings are removed: 65536:$(id -u):1 // uid mapping 65536:$(id -g):1 // gid mapping - A mapping for the transient uid/gid range to the foreign uid/gid range is added: 2147352576:537985024:65536 In combination this will mean that the container will write to disk within the foreign id range 2147352576 to 2147352576 + 65536. /* nested containers */ When the outer container makes use of idmapped mounts it isn't posssible to create an idmapped mount for the inner container with a differen idmapping from the outer container's idmapped mount. There are other usecases and the two above just serve as an illustration of the problem. This patchset makes it possible to create a new idmapped mount from an already idmapped mount. It aims to adhere to current performance constraints and requirements: - Idmapped mounts aim to have near zero performance implications for path lookup. That is why no refernce counting, locking or any other mechanism can be required that would impact performance. This works be ensuring that a regular mount transitions to an idmapped mount once going from a static nop_mnt_idmap mapping to a non-static idmapping. - The idmapping of a mount change anymore for the lifetime of the mount afterwards. This not just avoids UAF issues it also avoids pitfalls such as generating non-matching uid/gid values. Changing idmappings could be solved by: - Idmappings could simply be reference counted (above the simple reference count when sharing them across multiple mounts). This would require pairing mnt_idmap_get() with mnt_idmap_put() which would end up being sprinkled everywhere into the VFS and some filesystems that access idmappings directly. It wouldn't just be quite ugly and introduce new complexity it would have a noticeable performance impact. - Idmappings could gain RCU protection. This would help the LOOKUP_RCU case and avoids taking reference counts under RCU. When not under LOOKUP_RCU reference counts need to be acquired on each idmapping. This would require pairing mnt_idmap_get() with mnt_idmap_put() which would end up being sprinkled everywhere into the VFS and some filesystems that access idmappings directly. This would have the same downsides as mentioned earlier. - The earlier solutions work by updating the mnt->mnt_idmap pointer with the new idmapping. Instead of this it would be possible to change the idmapping itself to avoid UAF issues. To do this a sequence counter would have to be added to struct mount. When retrieving the idmapping to generate uid/gid values the sequence counter would need to be sampled and the generation of the uid/gid would spin until the update of the idmap is finished. This has problems as well but the biggest issue will be that this can lead to inconsistent permission checking and inconsistent uid/gid pairs even more than this is already possible today. Specifically, during creation it could happen that: idmap = mnt_idmap(mnt); inode_permission(idmap, ...); may_create(idmap); // create file with uid/gid based on @idmap in between the permission checking and the generation of the uid/gid value the idmapping could change leading to the permission checking and uid/gid value that is actually used to create a file on disk being out of sync. Similarly if two values are generated like: idmap = mnt_idmap(mnt) vfsgid = make_vfsgid(idmap); // idmapping gets update concurrently vfsuid = make_vfsuid(idmap); @vfsgid and @vfsuid could be out of sync if the idmapping was changed in between. The generation of vfsgid/vfsuid could span a lot of codelines so to guard against this a sequence count would have to be passed around. The performance impact of this solutio are less clear but very likely not zero. - Using SRCU similar to fanotify that can sleep. I find that not just ugly but it would have memory consumption implications and is overall pretty ugly. /* solution */ So, to avoid all of these pitfalls creating an idmapped mount from an already idmapped mount will be done atomically, i.e., a new detached mount is created and a new set of mount properties applied to it without it ever having been exposed to userspace at all. This can be done in two ways. A new flag to open_tree() is added OPEN_TREE_CLEAR_IDMAP that clears the old idmapping and returns a mount that isn't idmapped. And then it is possible to set mount attributes on it again including creation of an idmapped mount. This has the consequence that a file descriptor must exist in userspace that doesn't have any idmapping applied and it will thus never work in unpriviledged scenarios. As a container would be able to remove the idmapping of the mount it has been given. That should be avoided. Instead, we add open_tree_attr() which works just like open_tree() but takes an optional struct mount_attr parameter. This is useful beyond idmappings as it fills a gap where a mount never exists in userspace without the necessary mount properties applied. This is particularly useful for mount options such as MOUNT_ATTR_{RDONLY,NOSUID,NODEV,NOEXEC}. To create a new idmapped mount the following works: // Create a first idmapped mount struct mount_attr attr = { .attr_set = MOUNT_ATTR_IDMAP .userns_fd = fd_userns }; fd_tree = open_tree(-EBADF, "/", OPEN_TREE_CLONE, &attr, sizeof(attr)); move_mount(fd_tree, "", -EBADF, "/mnt", MOVE_MOUNT_F_EMPTY_PATH); // Create a second idmapped mount from the first idmapped mount attr.attr_set = MOUNT_ATTR_IDMAP; attr.userns_fd = fd_userns2; fd_tree2 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr)); // Create a second non-idmapped mount from the first idmapped mount: memset(&attr, 0, sizeof(attr)); attr.attr_clr = MOUNT_ATTR_IDMAP; fd_tree2 = open_tree(-EBADF, "/mnt", OPEN_TREE_CLONE, &attr, sizeof(attr)); * patches from https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org: fs: allow changing idmappings fs: add kflags member to struct mount_kattr fs: add open_tree_attr() fs: add copy_mount_setattr() helper fs: add vfs_open_tree() helper Link: https://lore.kernel.org/r/20250128-work-mnt_idmap-update-v2-v1-0-c25feb0d2eb3@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-12statmount: add a new supported_mask fieldJeff Layton
Some of the fields in the statmount() reply can be optional. If the kernel has nothing to emit in that field, then it doesn't set the flag in the reply. This presents a problem: There is currently no way to know what mask flags the kernel supports since you can't always count on them being in the reply. Add a new STATMOUNT_SUPPORTED_MASK flag and field that the kernel can set in the reply. Userland can use this to determine if the fields it requires from the kernel are supported. This also gives us a way to deprecate fields in the future, if that should become necessary. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20250206-statmount-v2-1-6ae70a21c2ab@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-12statmount: allow to retrieve idmappingsChristian Brauner
This adds the STATMOUNT_MNT_UIDMAP and STATMOUNT_MNT_GIDMAP options. It allows the retrieval of idmappings via statmount(). Currently it isn't possible to figure out what idmappings are applied to an idmapped mount. This information is often crucial. Before statmount() the only realistic options for an interface like this would have been to add it to /proc/<pid>/fdinfo/<nr> or to expose it in /proc/<pid>/mountinfo. Both solution would have been pretty ugly and would've shown information that is of strong interest to some application but not all. statmount() is perfect for this. The idmappings applied to an idmapped mount are shown relative to the caller's user namespace. This is the most useful solution that doesn't risk leaking information or confuse the caller. For example, an idmapped mount might have been created with the following idmappings: mount --bind -o X-mount.idmap="0:10000:1000 2000:2000:1 3000:3000:1" /srv /opt Listing the idmappings through statmount() in the same context shows: mnt_id: 2147485088 mnt_parent_id: 2147484816 fs_type: btrfs mnt_root: /srv mnt_point: /opt mnt_opts: ssd,discard=async,space_cache=v2,subvolid=5,subvol=/ mnt_uidmap[0]: 0 10000 1000 mnt_uidmap[1]: 2000 2000 1 mnt_uidmap[2]: 3000 3000 1 mnt_gidmap[0]: 0 10000 1000 mnt_gidmap[1]: 2000 2000 1 mnt_gidmap[2]: 3000 3000 1 But the idmappings might not always be resolvable in the caller's user namespace. For example: unshare --user --map-root In this case statmount() will skip any mappings that fil to resolve in the caller's idmapping: mnt_id: 2147485087 mnt_parent_id: 2147484016 fs_type: btrfs mnt_root: /srv mnt_point: /opt mnt_opts: ssd,discard=async,space_cache=v2,subvolid=5,subvol=/ The caller can differentiate between a mount not being idmapped and a mount that is idmapped but where all mappings fail to resolve in the caller's idmapping by check for the STATMOUNT_MNT_{G,U}IDMAP flag being raised but the number of mappings in ->mnt_{g,u}idmap_num being zero. Note that statmount() requires that the whole range must be resolvable in the caller's user namespace. If a subrange fails to map it will still list the map as not resolvable. This is a practical compromise to avoid having to find which subranges are resovable and wich aren't. Idmappings are listed as a string array with each mapping separated by zero bytes. This allows to retrieve the idmappings and immediately use them for writing to e.g., /proc/<pid>/{g,u}id_map and it also allow for simple iteration like: if (stmnt->mask & STATMOUNT_MNT_UIDMAP) { const char *idmap = stmnt->str + stmnt->mnt_uidmap; for (size_t idx = 0; idx < stmnt->mnt_uidmap_nr; idx++) { printf("mnt_uidmap[%lu]: %s\n", idx, idmap); idmap += strlen(idmap) + 1; } } Link: https://lore.kernel.org/r/20250204-work-mnt_idmap-statmount-v2-2-007720f39f2e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-12f2fs: add ioctl to get IO priority hintJaegeuk Kim
This patch adds an ioctl to give a per-file priority hint to attach REQ_PRIO. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-11thermal/netlink: Prevent userspace segmentation fault by adjusting UAPI headerZhang Rui
The intel-lpmd tool [1], which uses the THERMAL_GENL_ATTR_CPU_CAPABILITY attribute to receive HFI events from kernel space, encounters a segmentation fault after commit 1773572863c4 ("thermal: netlink: Add the commands and the events for the thresholds"). The issue arises because the THERMAL_GENL_ATTR_CPU_CAPABILITY raw value was changed while intel_lpmd still uses the old value. Although intel_lpmd can be updated to check the THERMAL_GENL_VERSION and use the appropriate THERMAL_GENL_ATTR_CPU_CAPABILITY value, the commit itself is questionable. The commit introduced a new element in the middle of enum thermal_genl_attr, which affects many existing attributes and introduces potential risks and unnecessary maintenance burdens for userspace thermal netlink event users. Solve the issue by moving the newly introduced THERMAL_GENL_ATTR_TZ_PREV_TEMP attribute to the end of the enum thermal_genl_attr. This ensures that all existing thermal generic netlink attributes remain unaffected. Link: https://github.com/intel/intel-lpmd [1] Fixes: 1773572863c4 ("thermal: netlink: Add the commands and the events for the thresholds") Signed-off-by: Zhang Rui <rui.zhang@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20250208074907.5679-1-rui.zhang@intel.com [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-02-11tcp: add the ability to control max RTOEric Dumazet
Currently, TCP stack uses a constant (120 seconds) to limit the RTO value exponential growth. Some applications want to set a lower value. Add TCP_RTO_MAX_MS socket option to set a value (in ms) between 1 and 120 seconds. It is discouraged to change the socket rto max on a live socket, as it might lead to unexpected disconnects. Following patch is adding a netns sysctl to control the default value at socket creation time. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-11wifi: nl80211/cfg80211: Stop supporting cooked monitorAlexander Wetzel
Unconditionally start to refuse creating cooked monitor interfaces to phase them out. There is no feature flag for drivers to opt-in for cooked monitor and all known users are using/preferring the modern API since the hostapd release 1.0 in May 2012. Signed-off-by: Alexander Wetzel <Alexander@wetzel-home.de> Link: https://patch.msgid.link/20250204111352.7004-1-Alexander@wetzel-home.de Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-02-10elf: Define note name macrosAkihiko Odaki
elf.h had a comment saying: > Notes used in ET_CORE. Architectures export some of the arch register > sets using the corresponding note types via the PTRACE_GETREGSET and > PTRACE_SETREGSET requests. > The note name for these types is "LINUX", except NT_PRFPREG that is > named "CORE". However, NT_PRSTATUS is also named "CORE". It is also unclear what "these types" refers to. To fix these problems, define a name for each note type. The added definitions are macros so the kernel and userspace can directly refer to them to remove their duplicate definitions of note names. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Baoquan He <bhe@redhat.com> Reviewed-by: Dave Martin <Dave.Martin@arm.com> Link: https://lore.kernel.org/r/20250115-elf-v5-1-0f9e55bbb2fc@daynix.com Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-10blk-crypto: add ioctls to create and prepare hardware-wrapped keysEric Biggers
Until this point, the kernel can use hardware-wrapped keys to do encryption if userspace provides one -- specifically a key in ephemerally-wrapped form. However, no generic way has been provided for userspace to get such a key in the first place. Getting such a key is a two-step process. First, the key needs to be imported from a raw key or generated by the hardware, producing a key in long-term wrapped form. This happens once in the whole lifetime of the key. Second, the long-term wrapped key needs to be converted into ephemerally-wrapped form. This happens each time the key is "unlocked". In Android, these operations are supported in a generic way through KeyMint, a userspace abstraction layer. However, that method is Android-specific and can't be used on other Linux systems, may rely on proprietary libraries, and also misleads people into supporting KeyMint features like rollback resistance that make sense for other KeyMint keys but don't make sense for hardware-wrapped inline encryption keys. Therefore, this patch provides a generic kernel interface for these operations by introducing new block device ioctls: - BLKCRYPTOIMPORTKEY: convert a raw key to long-term wrapped form. - BLKCRYPTOGENERATEKEY: have the hardware generate a new key, then return it in long-term wrapped form. - BLKCRYPTOPREPAREKEY: convert a key from long-term wrapped form to ephemerally-wrapped form. These ioctls are implemented using new operations in blk_crypto_ll_ops. Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> # sm8650 Link: https://lore.kernel.org/r/20250204060041.409950-4-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-02-08iio: introduce the FAULT event typeGuillaume Ranquet
Add a new event type to describe an hardware failure. Reviewed-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: Guillaume Ranquet <granquet@baylibre.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20250127-ad4111_openwire-v5-1-ef2db05c384f@baylibre.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2025-02-06net: ethtool: tsconfig: Fix netlink type of hwtstamp flagsKory Maincent
Fix the netlink type for hardware timestamp flags, which are represented as a bitset of flags. Although only one flag is supported currently, the correct netlink bitset type should be used instead of u32 to keep consistency with other fields. Address this by adding a new named string set description for the hwtstamp flag structure. The code has been introduced in the current release so the uAPI change is still okay. Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Fixes: 6e9e2eed4f39 ("net: ethtool: Add support for tsconfig command to get/set hwtstamp config") Link: https://patch.msgid.link/20250205110304.375086-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06Merge branch 'io_uring-zero-copy-rx'Jakub Kicinski
David Wei says: ==================== io_uring zero copy rx This patchset contains net/ patches needed by a new io_uring request implementing zero copy rx into userspace pages, eliminating a kernel to user copy. We configure a page pool that a driver uses to fill a hw rx queue to hand out user pages instead of kernel pages. Any data that ends up hitting this hw rx queue will thus be dma'd into userspace memory directly, without needing to be bounced through kernel memory. 'Reading' data out of a socket instead becomes a _notification_ mechanism, where the kernel tells userspace where the data is. The overall approach is similar to the devmem TCP proposal. This relies on hw header/data split, flow steering and RSS to ensure packet headers remain in kernel memory and only desired flows hit a hw rx queue configured for zero copy. Configuring this is outside of the scope of this patchset. We share netdev core infra with devmem TCP. The main difference is that io_uring is used for the uAPI and the lifetime of all objects are bound to an io_uring instance. Data is 'read' using a new io_uring request type. When done, data is returned via a new shared refill queue. A zero copy page pool refills a hw rx queue from this refill queue directly. Of course, the lifetime of these data buffers are managed by io_uring rather than the networking stack, with different refcounting rules. This patchset is the first step adding basic zero copy support. We will extend this iteratively with new features e.g. dynamically allocated zero copy areas, THP support, dmabuf support, improved copy fallback, general optimisations and more. In terms of netdev support, we're first targeting Broadcom bnxt. Patches aren't included since Taehee Yoo has already sent a more comprehensive patchset adding support in [1]. Google gve should already support this, and Mellanox mlx5 support is WIP pending driver changes. =========== Performance =========== Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet. Test setup: * AMD EPYC 9454 * Broadcom BCM957508 200G * Kernel v6.11 base [2] * liburing fork [3] * kperf fork [4] * 4K MTU * Single TCP flow With application thread + net rx softirq pinned to _different_ cores: +-------------------------------+ | epoll | io_uring | |-----------|-------------------| | 82.2 Gbps | 116.2 Gbps (+41%) | +-------------------------------+ Pinned to _same_ core: +-------------------------------+ | epoll | io_uring | |-----------|-------------------| | 62.6 Gbps | 80.9 Gbps (+29%) | +-------------------------------+ ===== Links ===== Broadcom bnxt support: [1]: https://lore.kernel.org/20241003160620.1521626-8-ap420073@gmail.com Linux kernel branch including io_uring bits: [2]: https://github.com/isilence/linux.git zcrx/v13 liburing for testing: [3]: https://github.com/isilence/liburing.git zcrx/next kperf for testing: [4]: https://git.kernel.dk/kperf.git ==================== Link: https://patch.msgid.link/20250204215622.695511-1-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06netdev: add io_uring memory provider infoDavid Wei
Add a nested attribute for io_uring memory provider info. For now it is empty and its presence indicates that a particular page pool or queue has an io_uring memory provider attached. $ ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get [{'id': 80, 'ifindex': 2, 'inflight': 64, 'inflight-mem': 262144, 'napi-id': 525}, {'id': 79, 'ifindex': 2, 'inflight': 320, 'inflight-mem': 1310720, 'io_uring': {}, 'napi-id': 525}, ... $ ./cli.py --spec netlink/specs/netdev.yaml --dump queue-get [{'id': 0, 'ifindex': 1, 'type': 'rx'}, {'id': 0, 'ifindex': 1, 'type': 'tx'}, {'id': 0, 'ifindex': 2, 'napi-id': 513, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 514, 'type': 'rx'}, ... {'id': 12, 'ifindex': 2, 'io_uring': {}, 'napi-id': 525, 'type': 'rx'}, ... Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://patch.msgid.link/20250204215622.695511-6-dw@davidwei.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-06ethtool: Add support for 200Gbps per lane link modesJianbo Liu
Define 200G, 400G and 800G link modes using 200Gbps per lane. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-02-05bpf: Add comment about helper freezeLevi Zim
Put a comment after the bpf helper list in uapi bpf.h to prevent people from trying to add new helpers there and direct them to kfuncs. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Levi Zim <rsworktech@outlook.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/bpf/CAEf4BzZvQF+QQ=oip4vdz5A=9bd+OmN-CXk5YARYieaipK9s+A@mail.gmail.com/ Link: https://lore.kernel.org/bpf/20221231004213.h5fx3loccbs5hyzu@macbook-pro-6.dhcp.thefacebook.com/ Link: https://lore.kernel.org/bpf/20250204-bpf-helper-freeze-v1-1-46efd9ff20dc@outlook.com
2025-02-05docs/bpf: Document the semantics of BTF tags with kind_flagIhor Solodrai
Explain the meaning of kind_flag in BTF type_tags and decl_tags. Update uapi btf.h kind_flag comment to reflect the changes. Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250130201239.1429648-3-ihor.solodrai@linux.dev
2025-02-05fanotify: notify on mount attach and detachMiklos Szeredi
Add notifications for attaching and detaching mounts. The following new event masks are added: FAN_MNT_ATTACH - Mount was attached FAN_MNT_DETACH - Mount was detached If a mount is moved, then the event is reported with (FAN_MNT_ATTACH | FAN_MNT_DETACH). These events add an info record of type FAN_EVENT_INFO_TYPE_MNT containing these fields identifying the affected mounts: __u64 mnt_id - the ID of the mount (see statmount(2)) FAN_REPORT_MNT must be supplied to fanotify_init() to receive these events and no other type of event can be received with this report type. Marks are added with FAN_MARK_MNTNS, which records the mount namespace from an nsfs file (e.g. /proc/self/ns/mnt). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250129165803.72138-3-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-05pidfd: add PIDFD_SELF* sentinels to refer to own thread/processLorenzo Stoakes
It is useful to be able to utilise the pidfd mechanism to reference the current thread or process (from a userland point of view - thread group leader from the kernel's point of view). Therefore introduce PIDFD_SELF_THREAD to refer to the current thread, and PIDFD_SELF_THREAD_GROUP to refer to the current thread group leader. For convenience and to avoid confusion from userland's perspective we alias these: * PIDFD_SELF is an alias for PIDFD_SELF_THREAD - This is nearly always what the user will want to use, as they would find it surprising if for instance fd's were unshared()'d and they wanted to invoke pidfd_getfd() and that failed. * PIDFD_SELF_PROCESS is an alias for PIDFD_SELF_THREAD_GROUP - Most users have no concept of thread groups or what a thread group leader is, and from userland's perspective and nomenclature this is what userland considers to be a process. We adjust pidfd_get_task() and the pidfd_send_signal() system call with specific handling for this, implementing this functionality for process_madvise(), process_mrelease() (albeit, using it here wouldn't really make sense) and pidfd_send_signal(). Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Link: https://lore.kernel.org/r/24315a16a3d01a548dd45c7515f7d51c767e954e.1738268370.git.lorenzo.stoakes@oracle.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-05counter: add direction change eventDavid Lechner
Add COUNTER_EVENT_DIRECTION_CHANGE to be used by drivers to emit events when a counter detects a change in direction. Signed-off-by: David Lechner <dlechner@baylibre.com> Link: https://lore.kernel.org/r/20250110-counter-ti-eqep-add-direction-support-v2-2-c6b6f96d2db9@baylibre.com Signed-off-by: William Breathitt Gray <wbg@kernel.org>
2025-01-29Merge tag 'fuse-update-6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: "Add support for io-uring communication between kernel and userspace using IORING_OP_URING_CMD (Bernd Schubert). Following features enable gains in performance compared to the regular interface: - Allow processing multiple requests with less syscall overhead - Combine commit of old and fetch of new fuse request - CPU/NUMA affinity of queues Patches were reviewed by several people, including Pavel Begunkov, io-uring co-maintainer" * tag 'fuse-update-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: prevent disabling io-uring on active connections fuse: enable fuse-over-io-uring fuse: block request allocation until io-uring init is complete fuse: {io-uring} Prevent mount point hang on fuse-server termination fuse: Allow to queue bg requests through io-uring fuse: Allow to queue fg requests through io-uring fuse: {io-uring} Make fuse_dev_queue_{interrupt,forget} non-static fuse: {io-uring} Handle teardown of ring entries fuse: Add io-uring sqe commit and fetch support fuse: {io-uring} Make hash-list req unique finding functions non-static fuse: Add fuse-io-uring handling into fuse_copy fuse: Make fuse_copy non static fuse: {io-uring} Handle SQEs - register commands fuse: make args->in_args[0] to be always the header fuse: Add fuse-io-uring design documentation fuse: Move request bits fuse: Move fuse_get_dev to header file fuse: rename to fuse_dev_end_requests and make non-static
2025-01-27Merge tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Jeff Layton contributed an implementation of NFSv4.2+ attribute delegation, as described here: https://www.ietf.org/archive/id/draft-ietf-nfsv4-delstid-08.html This interoperates with similar functionality introduced into the Linux NFS client in v6.11. An attribute delegation permits an NFS client to manage a file's mtime, rather than flushing dirty data to the NFS server so that the file's mtime reflects the last write, which is considerably slower. Neil Brown contributed dynamic NFSv4.1 session slot table resizing. This facility enables NFSD to increase or decrease the number of slots per NFS session depending on server memory availability. More session slots means greater parallelism. Chuck Lever fixed a long-standing latent bug where NFSv4 COMPOUND encoding screws up when crossing a page boundary in the encoding buffer. This is a zero-day bug, but hitting it is rare and depends on the NFS client implementation. The Linux NFS client does not happen to trigger this issue. A variety of bug fixes and other incremental improvements fill out the list of commits in this release. Great thanks to all contributors, reviewers, testers, and bug reporters who participated during this development cycle" * tag 'nfsd-6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (42 commits) sunrpc: Remove gss_{de,en}crypt_xdr_buf deadcode sunrpc: Remove gss_generic_token deadcode sunrpc: Remove unused xprt_iter_get_xprt Revert "SUNRPC: Reduce thread wake-up rate when receiving large RPC messages" nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION nfsd: handle delegated timestamps in SETATTR nfsd: add support for delegated timestamps nfsd: rework NFS4_SHARE_WANT_* flag handling nfsd: add support for FATTR4_OPEN_ARGUMENTS nfsd: prepare delegation code for handing out *_ATTRS_DELEG delegations nfsd: rename NFS4_SHARE_WANT_* constants to OPEN4_SHARE_ACCESS_WANT_* nfsd: switch to autogenerated definitions for open_delegation_type4 nfs_common: make include/linux/nfs4.h include generated nfs4_1.h nfsd: fix handling of delegated change attr in CB_GETATTR SUNRPC: Document validity guarantees of the pointer returned by reserve_space NFSD: Insulate nfsd4_encode_fattr4() from page boundaries in the encode buffer NFSD: Insulate nfsd4_encode_secinfo() from page boundaries in the encode buffer NFSD: Refactor nfsd4_do_encode_secinfo() again NFSD: Insulate nfsd4_encode_readlink() from page boundaries in the encode buffer NFSD: Insulate nfsd4_encode_read_plus_data() from page boundaries in the encode buffer ...
2025-01-27Merge tag 'for-6.14/dm-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mikulas Patocka: - fix a spelling error in dm-raid - change kzalloc to kcalloc - remove useless test in alloc_multiple_bios - disable REQ_NOWAIT for flushes - dm-transaction-manager: use red-black trees instead of linear lists - atomic writes support for dm-linear, dm-stripe and dm-mirror - dm-crypt: code cleanups and two bugfixes * tag 'for-6.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-crypt: track tag_offset in convert_context dm-crypt: don't initialize cc_sector again dm-crypt: don't update io->sector after kcryptd_crypt_write_io_submit() dm-crypt: use bi_sector in bio when initialize integrity seed dm-crypt: fully initialize clone->bi_iter in crypt_alloc_buffer() dm-crypt: set atomic as false when calling crypt_convert() in kworker dm-mirror: Support atomic writes dm-io: Warn on creating multiple atomic write bios for a region dm-stripe: Enable atomic writes dm-linear: Enable atomic writes dm: Ensure cloned bio is same length for atomic write dm-table: atomic writes support dm-transaction-manager: use red-black trees instead of linear lists dm: disable REQ_NOWAIT for flushes dm: remove useless test in alloc_multiple_bios dm: change kzalloc to kcalloc dm raid: fix spelling errors in raid_ctr()
2025-01-27Merge tag 'char-misc-6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull Char/Misc/IIO driver updates from Greg KH: "Here is the "big" set of char/misc/iio and other smaller driver subsystem updates for 6.14-rc1. Loads of different things in here this development cycle, highlights are: - ntsync "driver" to handle Windows locking types enabling Wine to work much better on many workloads (i.e. games). The driver framework was in 6.13, but now it's enabled and fully working properly. Should make many SteamOS users happy. Even comes with tests! - Large IIO driver updates and bugfixes - FPGA driver updates - Coresight driver updates - MHI driver updates - PPS driver updatesa - const bin_attribute reworking for many drivers - binder driver updates - smaller driver updates and fixes All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (311 commits) ntsync: Fix reference leaks in the remaining create ioctls. spmi: hisi-spmi-controller: Drop duplicated OF node assignment in spmi_controller_probe() spmi: Set fwnode for spmi devices ntsync: fix a file reference leak in drivers/misc/ntsync.c scripts/tags.sh: Don't tag usages of DECLARE_BITMAP dt-bindings: interconnect: qcom,msm8998-bwmon: Add SM8750 CPU BWMONs dt-bindings: interconnect: OSM L3: Document sm8650 OSM L3 compatible dt-bindings: interconnect: qcom-bwmon: Document QCS615 bwmon compatibles interconnect: sm8750: Add missing const to static qcom_icc_desc memstick: core: fix kernel-doc notation intel_th: core: fix kernel-doc warnings binder: log transaction code on failure iio: dac: ad3552r-hs: clear reset status flag iio: dac: ad3552r-common: fix ad3541/2r ranges iio: chemical: bme680: Fix uninitialized variable in __bme680_read_raw() misc: fastrpc: Fix copy buffer page size misc: fastrpc: Fix registered buffer page address misc: fastrpc: Deregister device nodes properly in error scenarios nvmem: core: improve range check for nvmem_cell_write() nvmem: qcom-spmi-sdam: Set size in struct nvmem_config ...
2025-01-27Merge tag 'usb-6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt driver updates from Greg KH: "Here is the USB and Thunderbolt driver updates for 6.14-rc1. Nothing huge in here, just lots of new hardware support and updates for existing drivers. Changes here are: - big gadget f_tcm driver update - other gadget driver updates and fixes - thunderbolt driver updates for new hardware and capabilities and lots more debugging functionality to handle it when things aren't working well. - xhci driver updates - new USB-serial device updates - typec driver updates, including a chrome platform driver (acked by the subsystem maintainers) - other small driver updates All of these have been in linux-next for a while with no reported issues" * tag 'usb-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (123 commits) usb: hcd: Bump local buffer size in rh_string() Revert "usb: gadget: u_serial: Disable ep before setting port to null to fix the crash caused by port being null" usb: typec: tcpci: Prevent Sink disconnection before vPpsShutdown in SPR PPS usb: xhci: tegra: Fix OF boolean read warning usb: host: xhci-plat: add support compatible ID PNP0D15 usb: typec: ucsi: Add a macro definition for UCSI v1.0 usb: dwc3: core: Defer the probe until USB power supply ready usbip: Correct format specifier for seqnum from %d to %u usbip: Fix seqnum sign extension issue in vhci_tx_urb dt-bindings: usb: snps,dwc3: Split core description usb: quirks: Add NO_LPM quirk for TOSHIBA TransMemory-Mx device usb: dwc3: gadget: Reinitiate stream for all host NoStream behavior USB: Use str_enable_disable-like helpers USB: gadget: Use str_enable_disable-like helpers USB: phy: Use str_enable_disable-like helpers USB: typec: Use str_enable_disable-like helpers USB: host: Use str_enable_disable-like helpers USB: Replace own str_plural with common one USB: serial: quatech2: fix null-ptr-deref in qt2_process_read_urb() usb: phy: Remove API devm_usb_put_phy() ...
2025-01-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "A small number of improvements all over the place: - vdpa/octeon support for multiple interrupts - virtio-pci support for error recovery - vp_vdpa support for notification with data - vhost/net fix to set num_buffers for spec compliance - virtio-mem now works with kdump on s390 And small cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (23 commits) virtio_blk: Add support for transport error recovery virtio_pci: Add support for PCIe Function Level Reset vhost/net: Set num_buffers for virtio 1.0 vdpa/octeon_ep: read vendor-specific PCI capability virtio-pci: define type and header for PCI vendor data vdpa/octeon_ep: handle device config change events vdpa/octeon_ep: enable support for multiple interrupts per device vdpa: solidrun: Replace deprecated PCI functions s390/kdump: virtio-mem kdump support (CONFIG_PROC_VMCORE_DEVICE_RAM) virtio-mem: support CONFIG_PROC_VMCORE_DEVICE_RAM virtio-mem: remember usable region size virtio-mem: mark device ready before registering callbacks in kdump mode fs/proc/vmcore: introduce PROC_VMCORE_DEVICE_RAM to detect device RAM ranges in 2nd kernel fs/proc/vmcore: factor out freeing a list of vmcore ranges fs/proc/vmcore: factor out allocating a vmcore range and adding it to a list fs/proc/vmcore: move vmcore definitions out of kcore.h fs/proc/vmcore: prefix all pr_* with "vmcore:" fs/proc/vmcore: disallow vmcore modifications while the vmcore is open fs/proc/vmcore: replace vmcoredd_mutex by vmcore_mutex fs/proc/vmcore: convert vmcore_cb_lock into vmcore_mutex ...
2025-01-27virtio-pci: define type and header for PCI vendor dataShijith Thotton
Added macro definition for VIRTIO_PCI_CAP_VENDOR_CFG to identify the PCI vendor data type in the virtio_pci_cap structure. Defined a new struct virtio_pci_vndr_data for the vendor data capability header as per the specification. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Shijith Thotton <sthotton@marvell.com> Message-Id: <20250103153226.1933479-3-sthotton@marvell.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-26Merge tag 'mm-nonmm-stable-2025-01-24-23-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Mainly individually changelogged singleton patches. The patch series in this pull are: - "lib min_heap: Improve min_heap safety, testing, and documentation" from Kuan-Wei Chiu provides various tightenings to the min_heap library code - "xarray: extract __xa_cmpxchg_raw" from Tamir Duberstein preforms some cleanup and Rust preparation in the xarray library code - "Update reference to include/asm-<arch>" from Geert Uytterhoeven fixes pathnames in some code comments - "Converge on using secs_to_jiffies()" from Easwar Hariharan uses the new secs_to_jiffies() in various places where that is appropriate - "ocfs2, dlmfs: convert to the new mount API" from Eric Sandeen switches two filesystems to the new mount API - "Convert ocfs2 to use folios" from Matthew Wilcox does that - "Remove get_task_comm() and print task comm directly" from Yafang Shao removes now-unneeded calls to get_task_comm() in various places - "squashfs: reduce memory usage and update docs" from Phillip Lougher implements some memory savings in squashfs and performs some maintainability work - "lib: clarify comparison function requirements" from Kuan-Wei Chiu tightens the sort code's behaviour and adds some maintenance work - "nilfs2: protect busy buffer heads from being force-cleared" from Ryusuke Konishi fixes an issues in nlifs when the fs is presented with a corrupted image - "nilfs2: fix kernel-doc comments for function return values" from Ryusuke Konishi fixes some nilfs kerneldoc - "nilfs2: fix issues with rename operations" from Ryusuke Konishi addresses some nilfs BUG_ONs which syzbot was able to trigger - "minmax.h: Cleanups and minor optimisations" from David Laight does some maintenance work on the min/max library code - "Fixes and cleanups to xarray" from Kemeng Shi does maintenance work on the xarray library code" * tag 'mm-nonmm-stable-2025-01-24-23-16' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (131 commits) ocfs2: use str_yes_no() and str_no_yes() helper functions include/linux/lz4.h: add some missing macros Xarray: use xa_mark_t in xas_squash_marks() to keep code consistent Xarray: remove repeat check in xas_squash_marks() Xarray: distinguish large entries correctly in xas_split_alloc() Xarray: move forward index correctly in xas_pause() Xarray: do not return sibling entries from xas_find_marked() ipc/util.c: complete the kernel-doc function descriptions gcov: clang: use correct function param names latencytop: use correct kernel-doc format for func params minmax.h: remove some #defines that are only expanded once minmax.h: simplify the variants of clamp() minmax.h: move all the clamp() definitions after the min/max() ones minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() minmax.h: reduce the #define expansion of min(), max() and clamp() minmax.h: update some comments minmax.h: add whitespace around operators and after commas nilfs2: do not update mtime of renamed directory that is not moved nilfs2: handle errors that nilfs_prepare_chunk() may return CREDITS: fix spelling mistake ...
2025-01-25Merge tag 'pci-v6.14-changes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull pci updates from Bjorn Helgaas: "Enumeration: - Batch sizing of multiple BARs while memory decoding is disabled instead of disabling/enabling decoding for each BAR individually; this optimizes virtualized environments where toggling decoding enable is expensive (Alex Williamson) - Add host bridge .enable_device() and .disable_device() hooks for bridges that need to configure things like Requester ID to StreamID mapping when enabling devices (Frank Li) - Extend struct pci_ecam_ops with .enable_device() and .disable_device() hooks so drivers that use pci_host_common_probe() instead of their own .probe() have a way to set the .enable_device() callbacks (Marc Zyngier) - Drop 'No bus range found' message so we don't complain when DTs don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas) - Rename the drivers/pci/of_property.c struct of_pci_range to of_pci_range_entry to avoid confusion with the global of_pci_range in include/linux/of_address.h (Bjorn Helgaas) Driver binding: - Update resource request API documentation to encourage callers to supply a driver name when requesting resources (Philipp Stanner) - Export pci_intx_unmanaged() and pcim_intx() (always managed) so callers of pci_intx() (which is sometimes managed) can explicitly choose the one they need (Philipp Stanner) - Convert drivers from pci_intx() to always-managed pcim_intx() or never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix, pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna, ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner) - Remove pci_intx_unmanaged() since pci_intx() is now always unmanaged and pcim_intx() is always managed (Philipp Stanner) Error handling: - Unexport pcie_read_tlp_log() to encourage drivers to use PCI core logging rather than building their own (Ilpo Järvinen) - Move TLP Log handling to its own file (Ilpo Järvinen) - Store number of supported End-End TLP Prefixes always so we can read the correct number of DWORDs from the TLP Prefix Log (Ilpo Järvinen) - Read TLP Prefixes in addition to the Header Log in pcie_read_tlp_log() (Ilpo Järvinen) - Add pcie_print_tlp_log() to consolidate printing of TLP Header and Prefix Log (Ilpo Järvinen) - Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor BIOSes that don't configure it correctly (Takashi Iwai) ASPM: - Save parent L1 PM Substates config so when we restore it along with an endpoint's config, the parent info isn't junk (Jian-Hong Pan) Power management: - Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because the system can't wake up from suspend (Werner Sembach) Endpoint framework: - Destroy the EPC device in devm_pci_epc_destroy(), which previously didn't call devres_release() (Zijun Hu) - Finish virtual EP removal in pci_epf_remove_vepf(), which previously caused a subsequent pci_epf_add_vepf() to fail with -EBUSY (Zijun Hu) - Write BAR_MASK before iATU registers in pci_epc_set_bar() so we don't depend on the BAR_MASK reset value being larger than the requested BAR size (Niklas Cassel) - Prevent changing BAR size/flags in pci_epc_set_bar() to prevent reads from bypassing the iATU if we reduced the BAR size (Niklas Cassel) - Verify address alignment when programming iATU so we don't attempt to write bits that are read-only because of the BAR size, which could lead to directing accesses to the wrong address (Niklas Cassel) - Implement artpec6 pci_epc_features so we can rely on all drivers supporting it so we can use it in EPC core code (Niklas Cassel) - Check for BARs of fixed size to prevent endpoint drivers from trying to change their size (Niklas Cassel) - Verify that requested BAR size is a power of two when endpoint driver sets the BAR (Niklas Cassel) Endpoint framework tests: - Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing dma_chan_rx (Mohamed Khalfella) - Correct the DMA MEMCPY test so it doesn't fail if the Endpoint supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam) - Add pci-epf-test and pci_endpoint_test support for capabilities (Niklas Cassel) - Add Endpoint test for consecutive BARs (Niklas Cassel) - Remove redundant comparison from Endpoint BAR test because a > 1MB BAR can always be exactly covered by iterating with a 1MB buffer (Hans Zhang) - Move and convert PCI Endpoint tests from tools/pci to Kselftests (Manivannan Sadhasivam) Apple PCIe controller driver: - Convert StreamID mapping configuration from a bus notifier to the .enable_device() and .disable_device() callbacks (Marc Zyngier) Freescale i.MX6 PCIe controller driver: - Add Requester ID to StreamID mapping configuration when enabling devices (Frank Li) - Use DWC core suspend/resume functions for imx6 (Frank Li) - Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard Zhu) - Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank Li) - Add DT binding for optional i.MX95 Refclk and driver support to enable it if the platform hasn't enabled it (Richard Zhu) - Configure PHY based on controller being in Root Complex or Endpoint mode (Frank Li) - Rely on dbi2 and iATU base addresses from DT via dw_pcie_get_resources() instead of hardcoding them (Richard Zhu) - Deassert apps_reset in imx_pcie_deassert_core_reset() since it is asserted in imx_pcie_assert_core_reset() (Richard Zhu) - Add missing reference clock enable or disable logic for IMX6SX, IMX7D, IMX8MM (Richard Zhu) - Remove redundant imx7d_pcie_init_phy() since imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu) Freescale Layerscape PCIe controller driver: - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_property_read_u32_array() (Krzysztof Kozlowski) Marvell MVEBU PCIe controller driver: - Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen) MediaTek PCIe Gen3 controller driver: - Use clk_bulk_prepare_enable() instead of separate clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi) - Rearrange reset assert/deassert so they're both done in the *_power_up() callbacks (Lorenzo Bianconi) - Document that Airoha EN7581 requires PHY init and power-on before PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo Bianconi) - Move Airoha EN7581 post-reset delay from the en7581 clock .enable() method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi) - Sleep instead of delay during Airoha EN7581 power-up, since this is a non-atomic context (Lorenzo Bianconi) - Skip PERST# assertion on Airoha EN7581 during probe and suspend/resume to avoid a hardware defect (Lorenzo Bianconi) - Enable async probe to reduce system startup time (Douglas Anderson) Microchip PolarFlare PCIe controller driver: - Set up the inbound address translation based on whether the platform allows coherent or non-coherent DMA (Daire McNamara) - Update DT binding such that platforms are DMA-coherent by default and must specify 'dma-noncoherent' if needed (Conor Dooley) Mobiveil PCIe controller driver: - Convert mobiveil-pcie.txt to YAML and update 'interrupt-names' and 'reg-names' (Frank Li) Qualcomm PCIe controller driver: - Add DT SM8550 and SM8650 optional 'global' interrupt for link events (Neil Armstrong) - Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta Mylavarapu) - If 'global' IRQ is supported for detection of Link Up events, tell DWC core not to wait for link up (Krishna chaitanya chundru) Renesas R-Car PCIe controller driver: - Avoid passing stack buffer as resource name (King Dix) Rockchip PCIe controller driver: - Simplify clock and reset handling by using bulk interfaces (Anand Moon) - Pass typed rockchip_pcie (not void) pointer to rockchip_pcie_disable_clocks() (Anand Moon) - Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails (Dan Carpenter) Rockchip DesignWare PCIe controller driver: - Use dll_link_up IRQ to detect Link Up and enumerate devices so users don't have to manually rescan (Niklas Cassel) - Tell DWC core not to wait for link up since the 'sys' interrupt is required and detects Link Up events (Niklas Cassel) Synopsys DesignWare PCIe controller driver: - Don't wait for link up in DWC core if driver can detect Link Up event (Krishna chaitanya chundru) - Update ICC and OPP votes after Link Up events (Krishna chaitanya chundru) - Always stop link in dw_pcie_suspend_noirq(), which is required at least for i.MX8QM to re-establish link on resume (Richard Zhu) - Drop racy and unnecessary LTSSM state check before sending PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu) - Add struct of_pci_range.parent_bus_addr for devices that need their immediate parent bus address, not the CPU address, e.g., to program an internal Address Translation Unit (iATU) (Frank Li) TI DRA7xx PCIe controller driver: - Simplify by using syscon_regmap_lookup_by_phandle_args() instead of syscon_regmap_lookup_by_phandle() followed by of_parse_phandle_with_fixed_args() or of_property_read_u32_index() (Krzysztof Kozlowski) Xilinx Versal CPM PCIe controller driver: - Add DT binding and driver support for Xilinx Versal CPM5 (Thippeswamy Havalige) MicroSemi Switchtec management driver: - Add Microchip PCI100X device IDs (Rakesh Babu Saladi) Miscellaneous: - Move reset related sysfs code from pci.c to pci-sysfs.c where other similar code lives (Ilpo Järvinen) - Simplify reset_method_store() memory management by using __free() instead of explicit kfree() cleanup (Ilpo Järvinen) - Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM ACPI hotplug driver (Thomas Weißschuh) - Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong Zhang) - Correct documentation of the 'config_acs=' kernel parameter (Akihiko Odaki)" * tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits) PCI: Batch BAR sizing operations dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent PCI: microchip: Set inbound address translation for coherent or non-coherent mode Documentation: Fix pci=config_acs= example PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT PCI: Don't include 'pm_wakeup.h' directly selftests: pci_endpoint: Migrate to Kselftest framework selftests: Move PCI Endpoint tests from tools/pci to Kselftests misc: pci_endpoint_test: Fix IOCTL return value dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML PCI: switchtec: Add Microchip PCI100X device IDs misc: pci_endpoint_test: Remove redundant 'remainder' test misc: pci_endpoint_test: Add consecutive BAR test misc: pci_endpoint_test: Add support for capabilities PCI: endpoint: pci-epf-test: Add support for capabilities PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error PCI: dwc: Simplify config resource lookup ...
2025-01-25Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "Loongarch: - Clear LLBCTL if secondary mmu mapping changes - Add hypercall service support for usermode VMM x86: - Add a comment to kvm_mmu_do_page_fault() to explain why KVM performs a direct call to kvm_tdp_page_fault() when RETPOLINE is enabled - Ensure that all SEV code is compiled out when disabled in Kconfig, even if building with less brilliant compilers - Remove a redundant TLB flush on AMD processors when guest CR4.PGE changes - Use str_enabled_disabled() to replace open coded strings - Drop kvm_x86_ops.hwapic_irr_update() as KVM updates hardware's APICv cache prior to every VM-Enter - Overhaul KVM's CPUID feature infrastructure to track all vCPU capabilities instead of just those where KVM needs to manage state and/or explicitly enable the feature in hardware. Along the way, refactor the code to make it easier to add features, and to make it more self-documenting how KVM is handling each feature - Rework KVM's handling of VM-Exits during event vectoring; this plugs holes where KVM unintentionally puts the vCPU into infinite loops in some scenarios (e.g. if emulation is triggered by the exit), and brings parity between VMX and SVM - Add pending request and interrupt injection information to the kvm_exit and kvm_entry tracepoints respectively - Fix a relatively benign flaw where KVM would end up redoing RDPKRU when loading guest/host PKRU, due to a refactoring of the kernel helpers that didn't account for KVM's pre-checking of the need to do WRPKRU - Make the completion of hypercalls go through the complete_hypercall function pointer argument, no matter if the hypercall exits to userspace or not. Previously, the code assumed that KVM_HC_MAP_GPA_RANGE specifically went to userspace, and all the others did not; the new code need not special case KVM_HC_MAP_GPA_RANGE and in fact does not care at all whether there was an exit to userspace or not - As part of enabling TDX virtual machines, support support separation of private/shared EPT into separate roots. When TDX will be enabled, operations on private pages will need to go through the privileged TDX Module via SEAMCALLs; as a result, they are limited and relatively slow compared to reading a PTE. The patches included in 6.14 allow KVM to keep a mirror of the private EPT in host memory, and define entries in kvm_x86_ops to operate on external page tables such as the TDX private EPT - The recently introduced conversion of the NX-page reclamation kthread to vhost_task moved the task under the main process. The task is created as soon as KVM_CREATE_VM was invoked and this, of course, broke userspace that didn't expect to see any child task of the VM process until it started creating its own userspace threads. In particular crosvm refuses to fork() if procfs shows any child task, so unbreak it by creating the task lazily. This is arguably a userspace bug, as there can be other kinds of legitimate worker tasks and they wouldn't impede fork(); but it's not like userspace has a way to distinguish kernel worker tasks right now. Should they show as "Kthread: 1" in proc/.../status? x86 - Intel: - Fix a bug where KVM updates hardware's APICv cache of the highest ISR bit while L2 is active, while ultimately results in a hardware-accelerated L1 EOI effectively being lost - Honor event priority when emulating Posted Interrupt delivery during nested VM-Enter by queueing KVM_REQ_EVENT instead of immediately handling the interrupt - Rework KVM's processing of the Page-Modification Logging buffer to reap entries in the same order they were created, i.e. to mark gfns dirty in the same order that hardware marked the page/PTE dirty - Misc cleanups Generic: - Cleanup and harden kvm_set_memory_region(); add proper lockdep assertions when setting memory regions and add a dedicated API for setting KVM-internal memory regions. The API can then explicitly disallow all flags for KVM-internal memory regions - Explicitly verify the target vCPU is online in kvm_get_vcpu() to fix a bug where KVM would return a pointer to a vCPU prior to it being fully online, and give kvm_for_each_vcpu() similar treatment to fix a similar flaw - Wait for a vCPU to come online prior to executing a vCPU ioctl, to fix a bug where userspace could coerce KVM into handling the ioctl on a vCPU that isn't yet onlined - Gracefully handle xarray insertion failures; even though such failures are impossible in practice after xa_reserve(), reserving an entry is always followed by xa_store() which does not know (or differentiate) whether there was an xa_reserve() before or not RISC-V: - Zabha, Svvptc, and Ziccrse extension support for guests. None of them require anything in KVM except for detecting them and marking them as supported; Zabha adds byte and halfword atomic operations, while the others are markers for specific operation of the TLB and of LL/SC instructions respectively - Virtualize SBI system suspend extension for Guest/VM - Support firmware counters which can be used by the guests to collect statistics about traps that occur in the host Selftests: - Rework vcpu_get_reg() to return a value instead of using an out-param, and update all affected arch code accordingly - Convert the max_guest_memory_test into a more generic mmu_stress_test. The basic gist of the "conversion" is to have the test do mprotect() on guest memory while vCPUs are accessing said memory, e.g. to verify KVM and mmu_notifiers are working as intended - Play nice with treewrite builds of unsupported architectures, e.g. arm (32-bit), as KVM selftests' Makefile doesn't do anything to ensure the target architecture is actually one KVM selftests supports - Use the kernel's $(ARCH) definition instead of the target triple for arch specific directories, e.g. arm64 instead of aarch64, mainly so as not to be different from the rest of the kernel - Ensure that format strings for logging statements are checked by the compiler even when the logging statement itself is disabled - Attempt to whack the last LLC references/misses mole in the Intel PMU counters test by adding a data load and doing CLFLUSH{OPT} on the data instead of the code being executed. It seems that modern Intel CPUs have learned new code prefetching tricks that bypass the PMU counters - Fix a flaw in the Intel PMU counters test where it asserts that events are counting correctly without actually knowing what the events count given the underlying hardware; this can happen if Intel reuses a formerly microarchitecture-specific event encoding as an architectural event, as was the case for Top-Down Slots" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (151 commits) kvm: defer huge page recovery vhost task to later KVM: x86/mmu: Return RET_PF* instead of 1 in kvm_mmu_page_fault() KVM: Disallow all flags for KVM-internal memslots KVM: x86: Drop double-underscores from __kvm_set_memory_region() KVM: Add a dedicated API for setting KVM-internal memslots KVM: Assert slots_lock is held when setting memory regions KVM: Open code kvm_set_memory_region() into its sole caller (ioctl() API) LoongArch: KVM: Add hypercall service support for usermode VMM LoongArch: KVM: Clear LLBCTL if secondary mmu mapping is changed KVM: SVM: Use str_enabled_disabled() helper in svm_hardware_setup() KVM: VMX: read the PML log in the same order as it was written KVM: VMX: refactor PML terminology KVM: VMX: Fix comment of handle_vmx_instruction() KVM: VMX: Reinstate __exit attribute for vmx_exit() KVM: SVM: Use str_enabled_disabled() helper in sev_hardware_setup() KVM: x86: Avoid double RDPKRU when loading host/guest PKRU KVM: x86: Use LVT_TIMER instead of an open coded literal RISC-V: KVM: Add new exit statstics for redirected traps RISC-V: KVM: Update firmware counters for various events RISC-V: KVM: Redirect instruction access fault trap to guest ...
2025-01-24Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: "No major functionality this cycle: - iommufd part of the domain_alloc_paging_flags() conversion - Move IOMMU_HWPT_FAULT_ID_VALID processing out of drivers - Increase a timeout waiting for other threads to drop transient refcounts that syzkaller was hitting - Fix a UBSAN hit in iova_bitmap due to shift out of bounds - Add missing cleanup of fault events during FD shutdown, fixing a memory leak - Improve the fault delivery flow to have a smaller locking critical region that does not include copy_to_user() - Fix 32 bit ABI breakage due to missed implicit padding, and fix the stack memory leakage" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Fix struct iommu_hwpt_pgfault init and padding iommufd/fault: Use a separate spinlock to protect fault->deliver list iommufd/fault: Destroy response and mutex in iommufd_fault_destroy() iommufd: Keep OBJ/IOCTL lists in an alphabetical order iommufd/iova_bitmap: Fix shift-out-of-bounds in iova_bitmap_offset_to_index() iommu: iommufd: fix WARNING in iommufd_device_unbind iommufd: Deal with IOMMU_HWPT_FAULT_ID_VALID in iommufd core iommufd/selftest: Remove domain_alloc_paging()
2025-01-24Merge tag 'platform-drivers-x86-v6.14-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver updates from Ilpo Järvinen: "acer-wmi: - Add support for PH14-51, PH16-72, and Nitro AN515-58 - Add proper hwmon support - Improve error handling when reading "gaming system info" - Replace direct EC reads for the current platform profile with WMI calls to handle EC address variations - Replace custom platform_profile cycling with the generic one ACPI: - platform_profile: Major refactoring and improvements - Support registering multiple platform_profile handlers concurrently to avoid the need to quirk which handler takes precedence - Support reporting "custom" profile for cases where the current profile is ambiguous or when settings tweaks are done outside the pre-defined profile - Abstract and layer platform_profile API better using the class_dev and drvdata - Various minor improvements - Add Documentation and kerneldoc amd/hsmp: - Add support for HSMP protocol v7 amd/pmc: - Support AMD 1Ah family 70h - Support STB with Ryzen desktop SoCs amd/pmf: - Support Custom BIOS inputs for PMF TA - Support passing SRA sensor data from AMD SFH (HID) to PMF TA dell-smo8800: - Move SMO88xx quirk away from the generic i2c-i801 driver - Add accelerometer support for Dell Latitude E6330/E6430 and XPS 9550 - Support probing accelerometer for models yet to be listed in the DMI mapping table because ACPI lacks i2c-address for the accelerometer (behind a module parameter because probing might be dangerous) HID: - amd_sfh: Add support for exporting SRA sensor data hp-wmi: - Add fan and thermal support for Victus 16-s1000 input: - Add key for phone linking - i8042: Add context for the i8042 filter to enable cleaning up the filter related global variables from pdx86 drivers lenovo-wmi-camera: - Use SW_CAMERA_LENS_COVER instead of KEY_CAMERA_ACCESS mellanox mlxbf-pmc: - Add support for monitoring cycle count - Add Documentation thinkpad_acpi: - Add support for phone link key tools/power/x86/intel-speed-select: - Fix Turbo Ratio Limit restore x86-android-tables: - Add support for Vexia EDU ATLA 10 Bluetooth and EC battery driver And miscellaneous cleanups / refactoring / improvements" * tag 'platform-drivers-x86-v6.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (133 commits) platform/x86: acer-wmi: Fix initialization of last_non_turbo_profile platform/x86: acer-wmi: Ignore AC events platform/mellanox: mlxreg-io: use sysfs_emit() instead of sprintf() platform/mellanox: mlxreg-hotplug: use sysfs_emit() instead of sprintf() platform/mellanox: mlxbf-bootctl: use sysfs_emit() instead of sprintf() platform/x86: hp-wmi: Add fan and thermal profile support for Victus 16-s1000 ACPI: platform_profile: Add a prefix to log messages ACPI: platform_profile: Add documentation ACPI: platform_profile: Clean platform_profile_handler ACPI: platform_profile: Move platform_profile_handler ACPI: platform_profile: Remove platform_profile_handler from exported symbols platform/x86: thinkpad_acpi: Use devm_platform_profile_register() platform/x86: inspur_platform_profile: Use devm_platform_profile_register() platform/x86: hp-wmi: Use devm_platform_profile_register() platform/x86: ideapad-laptop: Use devm_platform_profile_register() platform/x86: dell-pc: Use devm_platform_profile_register() platform/x86: asus-wmi: Use devm_platform_profile_register() platform/x86: amd: pmf: sps: Use devm_platform_profile_register() platform/x86: acer-wmi: Use devm_platform_profile_register() platform/surface: surface_platform_profile: Use devm_platform_profile_register() ...