summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)Author
2024-08-14media: rkisp1: Add support for the companding blockPaul Elder
Add support to the rkisp1 driver for the companding block that exists on the i.MX8MP version of the ISP. This requires usage of the new extensible parameters format, and showcases how the format allows for extensions without breaking backward compatibility. Signed-off-by: Paul Elder <paul.elder@ideasonboard.com> Reviewed-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Reviewed-by: Paul Elder <paul.elder@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2024-08-13usb: gadget: f_fs: add capability for dfu functional descriptorDavid Sands
Add the ability for the USB FunctionFS (FFS) gadget driver to be able to create Device Firmware Upgrade (DFU) functional descriptors. [1] This patch allows implementation of DFU in userspace using the FFS gadget. The DFU protocol uses the control pipe (ep0) for all messaging so only the addition of the DFU functional descriptor is needed in the kernel driver. The DFU functional descriptor is written to the ep0 file along with any other descriptors during FFS setup. DFU requires an interface descriptor followed by the DFU functional descriptor. This patch includes documentation of the added descriptor for DFU and conversion of some existing documentation to kernel-doc format so that it can be included in the generated docs. An implementation of DFU 1.1 that implements just the runtime descriptor using the FunctionFS gadget (with rebooting into u-boot for DFU mode) has been tested on an i.MX8 Nano. An implementation of DFU 1.1 that implements both runtime and DFU mode using the FunctionFS gadget has been tested on Xilinx Zynq UltraScale+. Note that for the best performance of firmware update file transfers, the userspace program should respond as quick as possible to the setup packets. [1] https://www.usb.org/sites/default/files/DFU_1.1.pdf Signed-off-by: David Sands <david.sands@biamp.com> Co-developed-by: Chris Wulff <crwulff@gmail.com> Signed-off-by: Chris Wulff <crwulff@gmail.com> Link: https://lore.kernel.org/r/20240811000004.1395888-2-crwulff@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-12net: nexthop: Increase weight to u16Petr Machata
In CLOS networks, as link failures occur at various points in the network, ECMP weights of the involved nodes are adjusted to compensate. With high fan-out of the involved nodes, and overall high number of nodes, a (non-)ECMP weight ratio that we would like to configure does not fit into 8 bits. Instead of, say, 255:254, we might like to configure something like 1000:999. For these deployments, the 8-bit weight may not be enough. To that end, in this patch increase the next hop weight from u8 to u16. Increasing the width of an integral type can be tricky, because while the code still compiles, the types may not check out anymore, and numerical errors come up. To prevent this, the conversion was done in two steps. First the type was changed from u8 to a single-member structure, which invalidated all uses of the field. This allowed going through them one by one and audit for type correctness. Then the structure was replaced with a vanilla u16 again. This should ensure that no place was missed. The UAPI for configuring nexthop group members is that an attribute NHA_GROUP carries an array of struct nexthop_grp entries: struct nexthop_grp { __u32 id; /* nexthop id - must exist */ __u8 weight; /* weight of this nexthop */ __u8 resvd1; __u16 resvd2; }; The field resvd1 is currently validated and required to be zero. We can lift this requirement and carry high-order bits of the weight in the reserved field: struct nexthop_grp { __u32 id; /* nexthop id - must exist */ __u8 weight; /* weight of this nexthop */ __u8 weight_high; __u16 resvd2; }; Keeping the fields split this way was chosen in case an existing userspace makes assumptions about the width of the weight field, and to sidestep any endianness issues. The weight field is currently encoded as the weight value minus one, because weight of 0 is invalid. This same trick is impossible for the new weight_high field, because zero must mean actual zero. With this in place: - Old userspace is guaranteed to carry weight_high of 0, therefore configuring 8-bit weights as appropriate. When dumping nexthops with 16-bit weight, it would only show the lower 8 bits. But configuring such nexthops implies existence of userspace aware of the extension in the first place. - New userspace talking to an old kernel will work as long as it only attempts to configure 8-bit weights, where the high-order bits are zero. Old kernel will bounce attempts at configuring >8-bit weights. Renaming reserved fields as they are allocated for some purpose is commonly done in Linux. Whoever touches a reserved field is doing so at their own risk. nexthop_grp::resvd1 in particular is currently used by at least strace, however they carry an own copy of UAPI headers, and the conversion should be trivial. A helper is provided for decoding the weight out of the two fields. Forcing a conversion seems preferable to bending backwards and introducing anonymous unions or whatever. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://patch.msgid.link/483e2fcf4beb0d9135d62e7d27b46fa2685479d4.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12net: nexthop: Add flag to assert that NHGRP reserved fields are zeroPetr Machata
There are many unpatched kernel versions out there that do not initialize the reserved fields of struct nexthop_grp. The issue with that is that if those fields were to be used for some end (i.e. stop being reserved), old kernels would still keep sending random data through the field, and a new userspace could not rely on the value. In this patch, use the existing NHA_OP_FLAGS, which is currently inbound only, to carry flags back to the userspace. Add a flag to indicate that the reserved fields in struct nexthop_grp are zeroed before dumping. This is reliant on the actual fix from commit 6d745cd0e972 ("net: nexthop: Initialize all fields in dumped nexthops"). Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/21037748d4f9d8ff486151f4c09083bcf12d5df8.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-12nsfs: fix ioctl declarationChristian Brauner
The kernel is writing an object of type __u64, so the ioctl has to be defined to _IOR(NSIO, 0x5, __u64) instead of _IO(NSIO, 0x5). Reported-by: Dmitry V. Levin <ldv@strace.io> Link: https://lore.kernel.org/r/20240730164554.GA18486@altlinux.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-12ethtool: rss: support skipping contexts during dumpJakub Kicinski
Applications may want to deal with dynamic RSS contexts only. So dumping context 0 will be counter-productive for them. Support starting the dump from a given context ID. Alternative would be to implement a dump flag to skip just context 0, not sure which is better... Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-12media: uapi: videodev2: Add V4L2_META_FMT_RK_ISP1_EXT_PARAMSJacopo Mondi
The rkisp1 driver stores ISP configuration parameters in the fixed rkisp1_params_cfg structure. As the members of the structure are part of the userspace API, the structure layout is immutable and cannot be extended further. Introducing new parameters or modifying the existing ones would change the buffer layout and cause breakages in existing applications. The allow for future extensions to the ISP parameters, introduce a new extensible parameters format, with a new format 4CC. Document usage of the new format in the rkisp1 admin guide. Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Reviewed-by: Paul Elder <paul.elder@ideasonboard.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2024-08-12media: uapi: rkisp1-config: Add extensible params formatJacopo Mondi
Add to the rkisp1-config.h header data types and documentation of the extensible parameters format. Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Reviewed-by: Paul Elder <paul.elder@ideasonboard.com> Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2024-08-09Merge patch series "nsfs: iterate through mount namespaces"Christian Brauner
Christian Brauner <brauner@kernel.org> says: Recently, we added the ability to list mounts in other mount namespaces and the ability to retrieve namespace file descriptors without having to go through procfs by deriving them from pidfds. This extends nsfs in two ways: (1) Add the ability to retrieve information about a mount namespace via NS_MNT_GET_INFO. This will return the mount namespace id and the number of mounts currently in the mount namespace. The number of mounts can be used to size the buffer that needs to be used for listmount() and is in general useful without having to actually iterate through all the mounts. The structure is extensible. (2) Add the ability to iterate through all mount namespaces over which the caller holds privilege returning the file descriptor for the next or previous mount namespace. To retrieve a mount namespace the caller must be privileged wrt to it's owning user namespace. This means that PID 1 on the host can list all mounts in all mount namespaces or that a container can list all mounts of its nested containers. Optionally pass a structure for NS_MNT_GET_INFO with NS_MNT_GET_{PREV,NEXT} to retrieve information about the mount namespace in one go. (1) and (2) can be implemented for other namespace types easily. Together with recent api additions this means one can iterate through all mounts in all mount namespaces without ever touching procfs. Here's a sample program list_all_mounts_everywhere.c: // SPDX-License-Identifier: GPL-2.0-or-later #define _GNU_SOURCE #include <asm/unistd.h> #include <assert.h> #include <errno.h> #include <fcntl.h> #include <getopt.h> #include <linux/stat.h> #include <sched.h> #include <stddef.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/ioctl.h> #include <sys/param.h> #include <sys/pidfd.h> #include <sys/stat.h> #include <sys/statfs.h> #define die_errno(format, ...) \ do { \ fprintf(stderr, "%m | %s: %d: %s: " format "\n", __FILE__, \ __LINE__, __func__, ##__VA_ARGS__); \ exit(EXIT_FAILURE); \ } while (0) /* Get the id for a mount namespace */ #define NS_GET_MNTNS_ID _IO(0xb7, 0x5) /* Get next mount namespace. */ struct mnt_ns_info { __u32 size; __u32 nr_mounts; __u64 mnt_ns_id; }; #define MNT_NS_INFO_SIZE_VER0 16 /* size of first published struct */ /* Get information about namespace. */ #define NS_MNT_GET_INFO _IOR(0xb7, 10, struct mnt_ns_info) /* Get next namespace. */ #define NS_MNT_GET_NEXT _IOR(0xb7, 11, struct mnt_ns_info) /* Get previous namespace. */ #define NS_MNT_GET_PREV _IOR(0xb7, 12, struct mnt_ns_info) #define PIDFD_GET_MNT_NAMESPACE _IO(0xFF, 3) #define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */ #define __NR_listmount 458 #define __NR_statmount 457 /* * @mask bits for statmount(2) */ #define STATMOUNT_SB_BASIC 0x00000001U /* Want/got sb_... */ #define STATMOUNT_MNT_BASIC 0x00000002U /* Want/got mnt_... */ #define STATMOUNT_PROPAGATE_FROM 0x00000004U /* Want/got propagate_from */ #define STATMOUNT_MNT_ROOT 0x00000008U /* Want/got mnt_root */ #define STATMOUNT_MNT_POINT 0x00000010U /* Want/got mnt_point */ #define STATMOUNT_FS_TYPE 0x00000020U /* Want/got fs_type */ #define STATMOUNT_MNT_NS_ID 0x00000040U /* Want/got mnt_ns_id */ #define STATMOUNT_MNT_OPTS 0x00000080U /* Want/got mnt_opts */ struct statmount { __u32 size; /* Total size, including strings */ __u32 mnt_opts; __u64 mask; /* What results were written */ __u32 sb_dev_major; /* Device ID */ __u32 sb_dev_minor; __u64 sb_magic; /* ..._SUPER_MAGIC */ __u32 sb_flags; /* SB_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */ __u32 fs_type; /* [str] Filesystem type */ __u64 mnt_id; /* Unique ID of mount */ __u64 mnt_parent_id; /* Unique ID of parent (for root == mnt_id) */ __u32 mnt_id_old; /* Reused IDs used in proc/.../mountinfo */ __u32 mnt_parent_id_old; __u64 mnt_attr; /* MOUNT_ATTR_... */ __u64 mnt_propagation; /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */ __u64 mnt_peer_group; /* ID of shared peer group */ __u64 mnt_master; /* Mount receives propagation from this ID */ __u64 propagate_from; /* Propagation from in current namespace */ __u32 mnt_root; /* [str] Root of mount relative to root of fs */ __u32 mnt_point; /* [str] Mountpoint relative to current root */ __u64 mnt_ns_id; __u64 __spare2[49]; char str[]; /* Variable size part containing strings */ }; struct mnt_id_req { __u32 size; __u32 spare; __u64 mnt_id; __u64 param; __u64 mnt_ns_id; }; #define MNT_ID_REQ_SIZE_VER1 32 /* sizeof second published struct */ #define LSMT_ROOT 0xffffffffffffffff /* root mount */ static int __statmount(__u64 mnt_id, __u64 mnt_ns_id, __u64 mask, struct statmount *stmnt, size_t bufsize, unsigned int flags) { struct mnt_id_req req = { .size = MNT_ID_REQ_SIZE_VER1, .mnt_id = mnt_id, .param = mask, .mnt_ns_id = mnt_ns_id, }; return syscall(__NR_statmount, &req, stmnt, bufsize, flags); } static struct statmount *sys_statmount(__u64 mnt_id, __u64 mnt_ns_id, __u64 mask, unsigned int flags) { size_t bufsize = 1 << 15; struct statmount *stmnt = NULL, *tmp = NULL; int ret; for (;;) { tmp = realloc(stmnt, bufsize); if (!tmp) goto out; stmnt = tmp; ret = __statmount(mnt_id, mnt_ns_id, mask, stmnt, bufsize, flags); if (!ret) return stmnt; if (errno != EOVERFLOW) goto out; bufsize <<= 1; if (bufsize >= UINT_MAX / 2) goto out; } out: free(stmnt); printf("statmount failed"); return NULL; } static ssize_t sys_listmount(__u64 mnt_id, __u64 last_mnt_id, __u64 mnt_ns_id, __u64 list[], size_t num, unsigned int flags) { struct mnt_id_req req = { .size = MNT_ID_REQ_SIZE_VER1, .mnt_id = mnt_id, .param = last_mnt_id, .mnt_ns_id = mnt_ns_id, }; return syscall(__NR_listmount, &req, list, num, flags); } int main(int argc, char *argv[]) { #define LISTMNT_BUFFER 10 __u64 list[LISTMNT_BUFFER], last_mnt_id = 0; int ret, pidfd, fd_mntns; struct mnt_ns_info info = {}; pidfd = pidfd_open(getpid(), 0); if (pidfd < 0) die_errno("pidfd_open failed"); fd_mntns = ioctl(pidfd, PIDFD_GET_MNT_NAMESPACE, 0); if (fd_mntns < 0) die_errno("ioctl(PIDFD_GET_MNT_NAMESPACE) failed"); ret = ioctl(fd_mntns, NS_MNT_GET_INFO, &info); if (ret < 0) die_errno("ioctl(NS_GET_MNTNS_ID) failed"); printf("Listing %u mounts for mount namespace %d:%llu\n", info.nr_mounts, fd_mntns, info.mnt_ns_id); for (;;) { ssize_t nr_mounts; next: nr_mounts = sys_listmount(LSMT_ROOT, last_mnt_id, info.mnt_ns_id, list, LISTMNT_BUFFER, 0); if (nr_mounts <= 0) { printf("Finished listing mounts for mount namespace %d:%llu\n\n", fd_mntns, info.mnt_ns_id); ret = ioctl(fd_mntns, NS_MNT_GET_NEXT, 0); if (ret < 0) die_errno("ioctl(NS_MNT_GET_NEXT) failed"); close(ret); ret = ioctl(fd_mntns, NS_MNT_GET_NEXT, &info); if (ret < 0) { if (errno == ENOENT) { printf("Finished listing all mount namespaces\n"); exit(0); } die_errno("ioctl(NS_MNT_GET_NEXT) failed"); } close(fd_mntns); fd_mntns = ret; last_mnt_id = 0; printf("Listing %u mounts for mount namespace %d:%llu\n", info.nr_mounts, fd_mntns, info.mnt_ns_id); goto next; } for (size_t cur = 0; cur < nr_mounts; cur++) { struct statmount *stmnt; last_mnt_id = list[cur]; stmnt = sys_statmount(last_mnt_id, info.mnt_ns_id, STATMOUNT_SB_BASIC | STATMOUNT_MNT_BASIC | STATMOUNT_MNT_ROOT | STATMOUNT_MNT_POINT | STATMOUNT_MNT_NS_ID | STATMOUNT_MNT_OPTS | STATMOUNT_FS_TYPE, 0); if (!stmnt) { printf("Failed to statmount(%llu) in mount namespace(%llu)\n", last_mnt_id, info.mnt_ns_id); continue; } printf("mnt_id(%u/%llu) | mnt_parent_id(%u/%llu): %s @ %s ==> %s with options: %s\n", stmnt->mnt_id_old, stmnt->mnt_id, stmnt->mnt_parent_id_old, stmnt->mnt_parent_id, stmnt->str + stmnt->fs_type, stmnt->str + stmnt->mnt_root, stmnt->str + stmnt->mnt_point, stmnt->str + stmnt->mnt_opts); free(stmnt); } } exit(0); } * patches from https://lore.kernel.org/r/20240719-work-mount-namespace-v1-0-834113cab0d2@kernel.org: nsfs: iterate through mount namespaces file: add fput() cleanup helper fs: add put_mnt_ns() cleanup helper fs: allow mount namespace fd Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-09nsfs: iterate through mount namespacesChristian Brauner
It is already possible to list mounts in other mount namespaces and to retrieve namespace file descriptors without having to go through procfs by deriving them from pidfds. Augment these abilities by adding the ability to retrieve information about a mount namespace via NS_MNT_GET_INFO. This will return the mount namespace id and the number of mounts currently in the mount namespace. The number of mounts can be used to size the buffer that needs to be used for listmount() and is in general useful without having to actually iterate through all the mounts. The structure is extensible. And add the ability to iterate through all mount namespaces over which the caller holds privilege returning the file descriptor for the next or previous mount namespace. To retrieve a mount namespace the caller must be privileged wrt to it's owning user namespace. This means that PID 1 on the host can list all mounts in all mount namespaces or that a container can list all mounts of its nested containers. Optionally pass a structure for NS_MNT_GET_INFO with NS_MNT_GET_{PREV,NEXT} to retrieve information about the mount namespace in one go. Both ioctls can be implemented for other namespace types easily. Together with recent api additions this means one can iterate through all mounts in all mount namespaces without ever touching procfs. Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-5-834113cab0d2@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-08bpf/bpf_get,set_sockopt: add option to set TCP-BPF sock ops flagsAlan Maguire
Currently the only opportunity to set sock ops flags dictating which callbacks fire for a socket is from within a TCP-BPF sockops program. This is problematic if the connection is already set up as there is no further chance to specify callbacks for that socket. Add TCP_BPF_SOCK_OPS_CB_FLAGS to bpf_setsockopt() and bpf_getsockopt() to allow users to specify callbacks later, either via an iterator over sockets or via a socket-specific program triggered by a setsockopt() on the socket. Previous discussion on this here [1]. [1] https://lore.kernel.org/bpf/f42f157b-6e52-dd4d-3d97-9b86c84c0b00@oracle.com/ Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/20240808150558.1035626-2-alan.maguire@oracle.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-08-08Merge tag 'drm-misc-next-2024-08-01' of ↵Daniel Vetter
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: virtio: - Define DRM capset Cross-subsystem Changes: dma-buf: - heaps: Clean up documentation printk: - Pass description to kmsg_dump() Core Changes: CI: - Update IGT tests - Point upstream repo to GitLab instance modesetting: - Introduce Power Saving Policy property for connectors - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support panic: - Avoid build-time interference with framebuffer console docs: - Document Colorspace property scheduler: - Remove full_recover from drm_sched_start TTM: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory Driver Changes: amdgpu: - Support Power Saving Policy connector property ast: - astdp: Support AST2600 with VGA; Clean up HPD bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable gma500: - Update i2c terminology ivpu: - Add MODULE_FIRMWARE() lcdif: - Fix pixel clock loongson: - Use GEM refcount over TTM's mgag200: - Improve BMC handling - Support VBLANK intterupts nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's panel: - Shutdown fixes plus documentation - Refactor several drivers for better code sharing - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing sti: - Fix module owner stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - Call drm_atomic_helper_shutdown() v3d: - Clean up perfmon vkms: - Clean up Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box
2024-08-08media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flagsHans Verkuil
The cec_msg_set_reply_to() helper function never zeroed the struct cec_msg flags field, this can cause unexpected behavior if flags was uninitialized to begin with. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Fixes: 0dbacebede1e ("[media] cec: move the CEC framework out of staging and to media") Cc: <stable@vger.kernel.org> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2024-08-05media: cec: core: add new CEC_MSG_FL_REPLY_VENDOR_ID flagHans Verkuil
If this flag is set, then the reply is expected to consist of the CEC_MSG_VENDOR_COMMAND_WITH_ID opcode followed by the Vendor ID (as used in bytes 1-4 of the message), followed by the struct cec_msg reply field. Note that this assumes that the byte after the Vendor ID is a vendor-specific opcode. This flag makes it easier to wait for replies to vendor commands, using the same CEC framework support for waiting for regular replies. Support for this flag is indicated by setting the new CEC_CAP_REPLY_VENDOR_ID capability. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2024-07-31binder: frozen notificationYu-Ting Tseng
Frozen processes present a significant challenge in binder transactions. When a process is frozen, it cannot, by design, accept and/or respond to binder transactions. As a result, the sender needs to adjust its behavior, such as postponing transactions until the peer process unfreezes. However, there is currently no way to subscribe to these state change events, making it impossible to implement frozen-aware behaviors efficiently. Introduce a binder API for subscribing to frozen state change events. This allows programs to react to changes in peer process state, mitigating issues related to binder transactions sent to frozen processes. Implementation details: For a given binder_ref, the state of frozen notification can be one of the followings: 1. Userspace doesn't want a notification. binder_ref->freeze is null. 2. Userspace wants a notification but none is in flight. list_empty(&binder_ref->freeze->work.entry) = true 3. A notification is in flight and waiting to be read by userspace. binder_ref_freeze.sent is false. 4. A notification was read by userspace and kernel is waiting for an ack. binder_ref_freeze.sent is true. When a notification is in flight, new state change events are coalesced into the existing binder_ref_freeze struct. If userspace hasn't picked up the notification yet, the driver simply rewrites the state. Otherwise, the notification is flagged as requiring a resend, which will be performed once userspace acks the original notification that's inflight. See https://r.android.com/3070045 for how userspace is going to use this feature. Signed-off-by: Yu-Ting Tseng <yutingtseng@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20240709070047.4055369-4-yutingtseng@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-30Merge tag 'v6.11-rc1' into for-6.12Tejun Heo
Linux 6.11-rc1
2024-07-29x86/elf: Add a new FPU buffer layout info to x86 core filesVignesh Balasubramanian
Add a new .note section containing type, size, offset and flags of every xfeature that is present. This information will be used by debuggers to understand the XSAVE layout of the machine where the core file has been dumped, and to read XSAVE registers, especially during cross-platform debugging. The XSAVE layouts of modern AMD and Intel CPUs differ, especially since Memory Protection Keys and the AVX-512 features have been inculcated into the AMD CPUs. Since AMD never adopted (and hence never left room in the XSAVE layout for) the Intel MPX feature, tools like GDB had assumed a fixed XSAVE layout matching that of Intel (based on the XCR0 mask). Hence, core dumps from AMD CPUs didn't match the known size for the XCR0 mask. This resulted in GDB and other tools not being able to access the values of the AVX-512 and PKRU registers on AMD CPUs. To solve this, an interim solution has been accepted into GDB, and is already a part of GDB 14, see https://sourceware.org/pipermail/gdb-patches/2023-March/198081.html. But it depends on heuristics based on the total XSAVE register set size and the XCR0 mask to infer the layouts of the various register blocks for core dumps, and hence, is not a foolproof mechanism to determine the layout of the XSAVE area. Therefore, add a new core dump note in order to allow GDB/LLDB and other relevant tools to determine the layout of the XSAVE area of the machine where the corefile was dumped. The new core dump note (which is being proposed as a per-process .note section), NT_X86_XSAVE_LAYOUT (0x205) contains an array of structures. Each structure describes an individual extended feature containing offset, size and flags in this format: struct x86_xfeat_component { u32 type; u32 size; u32 offset; u32 flags; }; and in an independent manner, allowing for future extensions without depending on hw arch specifics like CPUID etc. [ bp: Massage commit message, zap trailing whitespace. ] Co-developed-by: Jini Susan George <jinisusan.george@amd.com> Signed-off-by: Jini Susan George <jinisusan.george@amd.com> Co-developed-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Signed-off-by: Vignesh Balasubramanian <vigbalas@amd.com> Link: https://lore.kernel.org/r/20240725161017.112111-2-vigbalas@amd.com
2024-07-29Merge drm/drm-next into drm-misc-nextThomas Zimmermann
Backmerging to get a late RC of v6.10 before moving into v6.11. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-07-29spi: Enable controllers to extend the SPI protocol with MOSI idle configurationMarcelo Schmitt
The behavior of an SPI controller data output line (SDO or MOSI or COPI (Controller Output Peripheral Input) for disambiguation) is usually not specified when the controller is not clocking out data on SCLK edges. However, there do exist SPI peripherals that require specific MOSI line state when data is not being clocked out of the controller. Conventional SPI controllers may set the MOSI line on SCLK edges then bring it low when no data is going out or leave the line the state of the last transfer bit. More elaborated controllers are capable to set the MOSI idle state according to different configurable levels and thus are more suitable for interfacing with demanding peripherals. Add SPI mode bits to allow peripherals to request explicit MOSI idle state when needed. When supporting a particular MOSI idle configuration, the data output line state is expected to remain at the configured level when the controller is not clocking out data. When a device that needs a specific MOSI idle state is identified, its driver should request the MOSI idle configuration by setting the proper SPI mode bit. Acked-by: Nuno Sa <nuno.sa@analog.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Tested-by: David Lechner <dlechner@baylibre.com> Signed-off-by: Marcelo Schmitt <marcelo.schmitt@analog.com> Link: https://patch.msgid.link/9802160b5e5baed7f83ee43ac819cb757a19be55.1720810545.git.marcelo.schmitt@analog.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-07-28nbd: add support for rotational devicesWouter Verhelst
The NBD protocol defines the flag NBD_FLAG_ROTATIONAL to flag that the export in use should be treated as a rotational device. Add support for that flag to the kernel driver. Signed-off-by: Wouter Verhelst <w@uter.be> Reviewed-by: Eric Blake <eblake@redhat.com> Link: https://lore.kernel.org/r/20240725164536.1275851-1-w@uter.be Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-25drm/amdkfd: allow users to target recommended SDMA enginesJonathan Kim
Certain GPUs have better copy performance over xGMI on specific SDMA engines depending on the source and destination GPU. Allow users to create SDMA queues on these recommended engines. Close to 2x overall performance has been observed with this optimization. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-25Merge tag 'net-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. A lot of networking people were at a conference last week, busy catching COVID, so relatively short PR. Current release - regressions: - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP Current release - new code bugs: - l2tp: protect session IDR and tunnel session list with one lock, make sure the state is coherent to avoid a warning - eth: bnxt_en: update xdp_rxq_info in queue restart logic - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field Previous releases - regressions: - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len, the field reuses previously un-validated pad Previous releases - always broken: - tap/tun: drop short frames to prevent crashes later in the stack - eth: ice: add a per-VF limit on number of FDIR filters - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash" * tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) tun: add missing verification for short frame tap: add missing verification for short frame mISDN: Fix a use after free in hfcmulti_tx() gve: Fix an edge case for TSO skb validity check bnxt_en: update xdp_rxq_info in queue restart logic tcp: process the 3rd ACK with sk_socket for TFO/MPTCP selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling MAINTAINERS: make Breno the netconsole maintainer MAINTAINERS: Update bonding entry net: nexthop: Initialize all fields in dumped nexthops net: stmmac: Correct byte order of perfect_match selftests: forwarding: skip if kernel not support setting bridge fdb learning limit tipc: Return non-zero value from tipc_udp_addr2str() on error netfilter: nft_set_pipapo_avx2: disable softinterrupts ice: Fix recipe read procedure ice: Add a per-VF limit on number of FDIR filters net: bonding: correctly annotate RCU in bond_should_notify_peers() ...
2024-07-25Merge tag 'uml-for-linus-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Support for preemption - i386 Rust support - Huge cleanup by Benjamin Berg - UBSAN support - Removal of dead code * tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits) um: vector: always reset vp->opened um: vector: remove vp->lock um: register power-off handler um: line: always fill *error_out in setup_one_line() um: remove pcap driver from documentation um: Enable preemption in UML um: refactor TLB update handling um: simplify and consolidate TLB updates um: remove force_flush_all from fork_handler um: Do not flush MM in flush_thread um: Delay flushing syscalls until the thread is restarted um: remove copy_context_skas0 um: remove LDT support um: compress memory related stub syscalls while adding them um: Rework syscall handling um: Add generic stub_syscall6 function um: Create signal stack memory assignment in stub_data um: Remove stub-data.h include from common-offsets.h um: time-travel: fix signal blocking race/hang um: time-travel: remove time_exit() ...
2024-07-25Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-25 We've added 14 non-merge commits during the last 8 day(s) which contain a total of 19 files changed, 177 insertions(+), 70 deletions(-). The main changes are: 1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and BPF sockhash. Also add test coverage for this case, from Michal Luczaj. 2) Fix a segmentation issue when downgrading gso_size in the BPF helper bpf_skb_adjust_room(), from Fred Li. 3) Fix a compiler warning in resolve_btfids due to a missing type cast, from Liwei Song. 4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte boundary in the fexit_sleep BPF selftest, from Puranjay Mohan. 5) Fix a xsk regression to require a flag when actuating tx_metadata_len, from Stanislav Fomichev. 6) Fix function prototype BTF dumping in libbpf for prototypes that have no input arguments, from Andrii Nakryiko. 7) Fix stacktrace symbol resolution in perf script for BPF programs containing subprograms, from Hou Tao. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids bpf, events: Use prog to emit ksymbol event for main program selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected() af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash bpftool: Fix typo in usage help libbpf: Fix no-args func prototype BTF dumping syntax MAINTAINERS: Update powerpc BPF JIT maintainers MAINTAINERS: Update email address of Naveen selftests/bpf: fexit_sleep: Fix stack allocation for arm64 ==================== Link: https://patch.msgid.link/20240725114312.32197-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-25xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_lenStanislav Fomichev
Julian reports that commit 341ac980eab9 ("xsk: Support tx_metadata_len") can break existing use cases which don't zero-initialize xdp_umem_reg padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we interpret the padding as tx_metadata_len only when being explicitly asked. Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len") Reported-by: Julian Schindel <mail@arctic-alpaca.de> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20240713015253.121248-2-sdf@fomichev.me
2024-07-24drm/virtio: Add DRM capset definitionDmitry Osipenko
Define DRM native context capset in the VirtIO-GPU protocol header. Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240714205502.3409718-1-dmitry.osipenko@collabora.com
2024-07-24Merge tag 'random-6.11-rc1-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "This adds getrandom() support to the vDSO. First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which lets the kernel zero out pages anytime under memory pressure, which enables allocating memory that never gets swapped to disk but also doesn't count as being mlocked. Then, the vDSO implementation of getrandom() is introduced in a generic manner and hooked into random.c. Next, this is implemented on x86. (Also, though it's not ready for this pull, somebody has begun an arm64 implementation already) Finally, two vDSO selftests are added. There are also two housekeeping cleanup commits" * tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: MAINTAINERS: add random.h headers to RNG subsection random: note that RNDGETPOOL was removed in 2.6.9-rc2 selftests/vDSO: add tests for vgetrandom x86: vdso: Wire up getrandom() vDSO implementation random: introduce generic vDSO getrandom() implementation mm: add MAP_DROPPABLE for designating always lazily freeable mappings
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogenous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leafs SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, it's all but dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measure the initial image into guest memory, and finalizing it before launching it. Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake eents for a vCPU executing HLT in L2 (with HLT-exiting disable by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Setup empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-20Merge tag 'landlock-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock updates from Mickaël Salaün: "This simplifies code and improves documentation" * tag 'landlock-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Various documentation improvements landlock: Clarify documentation for struct landlock_ruleset_attr landlock: Use bit-fields for storing handled layer access masks
2024-07-19Merge tag 'char-misc-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc and other driver updates from Greg KH: "Here is the "big" set of char/misc and other driver subsystem changes for 6.11-rc1. Nothing major in here, just loads of new drivers and updates. Included in here are: - IIO api updates and new drivers added - wait_interruptable_timeout() api cleanups for some drivers - MODULE_DESCRIPTION() additions for loads of drivers - parport out-of-bounds fix - interconnect driver updates and additions - mhi driver updates and additions - w1 driver fixes - binder speedups and fixes - eeprom driver updates - coresight driver updates - counter driver update - new misc driver additions - other minor api updates All of these, EXCEPT for the final Kconfig build fix for 32bit systems, have been in linux-next for a while with no reported issues. The Kconfig fixup went in 29 hours ago, so might have missed the latest linux-next, but was acked by everyone involved" * tag 'char-misc-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (330 commits) misc: Kconfig: exclude mrvl-cn10k-dpi compilation for 32-bit systems misc: delete Makefile.rej binder: fix hang of unregistered readers misc: Kconfig: add a new dependency for MARVELL_CN10K_DPI virtio: add missing MODULE_DESCRIPTION() macro agp: uninorth: add missing MODULE_DESCRIPTION() macro spmi: add missing MODULE_DESCRIPTION() macros dev/parport: fix the array out-of-bounds risk samples: configfs: add missing MODULE_DESCRIPTION() macro misc: mrvl-cn10k-dpi: add Octeon CN10K DPI administrative driver misc: keba: Fix missing AUXILIARY_BUS dependency slimbus: Fix struct and documentation alignment in stream.c MAINTAINERS: CC dri-devel list on Qualcomm FastRPC patches misc: fastrpc: use coherent pool for untranslated Compute Banks misc: fastrpc: support complete DMA pool access to the DSP misc: fastrpc: add missing MODULE_DESCRIPTION() macro misc: fastrpc: Add missing dev_err newlines misc: fastrpc: Use memdup_user() nvmem: core: Implement force_ro sysfs attribute nvmem: Use sysfs_emit() for type attribute ...
2024-07-19random: note that RNDGETPOOL was removed in 2.6.9-rc2Jason A. Donenfeld
RNDGETPOOL was thankfully removed twenty years ago, but it's stuck around in headers. Probably removing it from uapi headers isn't great in case there are some weird users out there, but we should at least mark this as having been removed, to save future readers the same goose chase I just went on. Link: https://lore.kernel.org/all/E1By1St-0001TS-Qj@thunk.org/ Link: https://lore.kernel.org/all/Pine.LNX.4.58.0409130937050.4094@ppc970.osdl.org/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19random: introduce generic vDSO getrandom() implementationJason A. Donenfeld
Provide a generic C vDSO getrandom() implementation, which operates on an opaque state returned by vgetrandom_alloc() and produces random bytes the same way as getrandom(). This has the following API signature: ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state, size_t opaque_len); The return value and the first three arguments are the same as ordinary getrandom(), while the last two arguments are a pointer to the opaque allocated state and its size. Were all five arguments passed to the getrandom() syscall, nothing different would happen, and the functions would have the exact same behavior. The actual vDSO RNG algorithm implemented is the same one implemented by drivers/char/random.c, using the same fast-erasure techniques as that. Should the in-kernel implementation change, so too will the vDSO one. It requires an implementation of ChaCha20 that does not use any stack, in order to maintain forward secrecy if a multi-threaded program forks (though this does not account for a similar issue with SA_SIGINFO copying registers to the stack), so this is left as an architecture-specific fill-in. Stack-less ChaCha20 is an easy algorithm to implement on a variety of architectures, so this shouldn't be too onerous. Initially, the state is keyless, and so the first call makes a getrandom() syscall to generate that key, and then uses it for subsequent calls. By keeping track of a generation counter, it knows when its key is invalidated and it should fetch a new one using the syscall. Later, more than just a generation counter might be used. Since MADV_WIPEONFORK is set on the opaque state, the key and related state is wiped during a fork(), so secrets don't roll over into new processes, and the same state doesn't accidentally generate the same random stream. The generation counter, as well, is always >0, so that the 0 counter is a useful indication of a fork() or otherwise uninitialized state. If the kernel RNG is not yet initialized, then the vDSO always calls the syscall, because that behavior cannot be emulated in userspace, but fortunately that state is short lived and only during early boot. If it has been initialized, then there is no need to inspect the `flags` argument, because the behavior does not change post-initialization regardless of the `flags` value. Since the opaque state passed to it is mutated, vDSO getrandom() is not reentrant, when used with the same opaque state, which libc should be mindful of. The function works over an opaque per-thread state of a particular size, which must be marked VM_WIPEONFORK, VM_DONTDUMP, VM_NORESERVE, and VM_DROPPABLE for proper operation. Over time, the nuances of these allocations may change or grow or even differ based on architectural features. The opaque state passed to vDSO getrandom() must be allocated using the mmap_flags and mmap_prot parameters provided by the vgetrandom_opaque_params struct, which also contains the size of each state. That struct can be obtained with a call to vgetrandom(NULL, 0, 0, &params, ~0UL). Then, libc can call mmap(2) and slice up the returned array into a state per each thread, while ensuring that no single state straddles a page boundary. Libc is expected to allocate a chunk of these on first use, and then dole them out to threads as they're created, allocating more when needed. vDSO getrandom() provides the ability for userspace to generate random bytes quickly and safely, and is intended to be integrated into libc's thread management. As an illustrative example, the introduced code in the vdso_test_getrandom self test later in this series might be used to do the same outside of libc. In a libc the various pthread-isms are expected to be elided into libc internals. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19mm: add MAP_DROPPABLE for designating always lazily freeable mappingsJason A. Donenfeld
The vDSO getrandom() implementation works with a buffer allocated with a new system call that has certain requirements: - It shouldn't be written to core dumps. * Easy: VM_DONTDUMP. - It should be zeroed on fork. * Easy: VM_WIPEONFORK. - It shouldn't be written to swap. * Uh-oh: mlock is rlimited. * Uh-oh: mlock isn't inherited by forks. - It shouldn't reserve actual memory, but it also shouldn't crash when page faulting in memory if none is available * Uh-oh: VM_NORESERVE means segfaults. It turns out that the vDSO getrandom() function has three really nice characteristics that we can exploit to solve this problem: 1) Due to being wiped during fork(), the vDSO code is already robust to having the contents of the pages it reads zeroed out midway through the function's execution. 2) In the absolute worst case of whatever contingency we're coding for, we have the option to fallback to the getrandom() syscall, and everything is fine. 3) The buffers the function uses are only ever useful for a maximum of 60 seconds -- a sort of cache, rather than a long term allocation. These characteristics mean that we can introduce VM_DROPPABLE, which has the following semantics: a) It never is written out to swap. b) Under memory pressure, mm can just drop the pages (so that they're zero when read back again). c) It is inherited by fork. d) It doesn't count against the mlock budget, since nothing is locked. e) If there's not enough memory to service a page fault, it's not fatal, and no signal is sent. This way, allocations used by vDSO getrandom() can use: VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE And there will be no problem with OOMing, crashing on overcommitment, using memory when not in use, not wiping on fork(), coredumps, or writing out to swap. In order to let vDSO getrandom() use this, expose these via mmap(2) as MAP_DROPPABLE. Note that this involves removing the MADV_FREE special case from sort_folio(), which according to Yu Zhao is unnecessary and will simply result in an extra call to shrink_folio_list() in the worst case. The chunk removed reenables the swapbacked flag, which we don't want for VM_DROPPABLE, and we can't conditionalize it here because there isn't a vma reference available. Finally, the provided self test ensures that this is working as desired. Cc: linux-mm@kvack.org Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2024-07-19Merge tag 'for-linus-iommufd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: - The iova_bitmap logic for efficiently reporting dirty pages back to userspace has a few more tricky corner case bugs that have been resolved and backed with new tests. The revised version has simpler logic. - Shared branch with iommu for handle support when doing domain attach. Handles allow the domain owner to include additional private data on a per-device basis. - IO Page Fault Reporting to userspace via iommufd. Page faults can be generated on fault capable HWPTs when a translation is not present. Routing them to userspace would allow a VMM to be able to virtualize them into an emulated vIOMMU. This is the next step to fully enabling vSVA support. * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (26 commits) iommufd: Put constants for all the uAPI enums iommufd: Fix error pointer checking iommufd: Add check on user response code iommufd: Remove IOMMUFD_PAGE_RESP_FAILURE iommufd: Require drivers to supply the cache_invalidate_user ops iommufd/selftest: Add coverage for IOPF test iommufd/selftest: Add IOPF support for mock device iommufd: Associate fault object with iommufd_hw_pgtable iommufd: Fault-capable hwpt attach/detach/replace iommufd: Add iommufd fault object iommufd: Add fault and response message definitions iommu: Extend domain attach group with handle support iommu: Add attach handle to struct iopf_group iommu: Remove sva handle list iommu: Introduce domain attachment handle iommufd/iova_bitmap: Remove iterator logic iommufd/iova_bitmap: Dynamic pinning on iova_bitmap_set() iommufd/iova_bitmap: Consolidate iova_bitmap_set exit conditionals iommufd/iova_bitmap: Move initial pinning to iova_bitmap_for_each() iommufd/iova_bitmap: Cache mapped length in iova_bitmap_map struct ...
2024-07-18Merge tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Add support for large folios - Implement rpcrdma generic device removal notification - Add client support for attribute delegations - Use a LAYOUTRETURN during reboot recovery to report layoutstats and errors - Improve throughput for random buffered writes - Add NVMe support to pnfs/blocklayout Bugfixes: - Fix rpcrdma_reqs_reset() - Avoid soft lockups when using UDP - Fix an nfs/blocklayout premature PR key unregestration - Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server - Do not extend writes to the entire folio - Pass explicit offset and count values to tracepoints - Fix a race to wake up sleeping SUNRPC sync tasks - Fix gss_status tracepoint output Cleanups: - Add missing MODULE_DESCRIPTION() macros - Add blocklayout / SCSI layout tracepoints - Remove asm-generic headers from xprtrdma verbs.c - Remove unused 'struct mnt_fhstatus' - Other delegation related cleanups - Other folio related cleanups - Other pNFS related cleanups - Other xprtrdma cleanups" * tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) SUNRPC: Fixup gss_status tracepoint error output SUNRPC: Fix a race to wake a sync task nfs: split nfs_read_folio nfs: pass explicit offset/count to trace events nfs: do not extend writes to the entire folio nfs/blocklayout: add support for NVMe nfs: remove nfs_page_length nfs: remove the unused max_deviceinfo_size field from struct pnfs_layoutdriver_type nfs: don't reuse partially completed requests in nfs_lock_and_join_requests nfs: move nfs_wait_on_request to write.c nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests nfs: simplify nfs_folio_find_and_lock_request nfs: remove nfs_folio_private_request nfs: remove dead code for the old swap over NFS implementation NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server nfs: Block on write congestion nfs: Properly initialize server->writeback nfs: Drop pointless check from nfs_commit_release_pages() nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg ...
2024-07-18Merge tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "There's a lot of stuff in here, amd, i915 and xe have new platform work, lots of core rework around EDID handling, some new COMPILE_TEST options, maintainer changes and a lots of other stuff. Summary: core: - deprecate DRM data and return 0 date - connector: Create a set of helpers to help with HDMI support - Remove driver owner assignments - Allow more drivers to compile with COMPILE_TEST - Conversions to drm_edid - Sprinkle MODULE_DESCRIPTIONS everywhere they are missing - Remove drm_mm_replace_node - print: Add a drm prefix to warn level messages too, remove ___drm_dbg, consolidate prefix handling - New monochrome TV mode variant ttm: - improve number of page faults on some platforms - fix test builds under PREEMPT_RT - more test coverage ci: - Require a more recent version of mesa - improve farm setup and test generation dma-buf: - warn if reserving 0 fence slots - internal API heap enhancements fbdev: - Create memory manager optimized fbdev emulation panic: - Allow to select fonts - improve drm_fb_dma_get_scanout_buffer - Allow to dump kmsg to the screen bridge: - Remove redundant checks on bridge->encoder - Remove drm_bridge_chain_mode_fixup - bridge-connector: Plumb in the new HDMI helper - analogix_dp: Various improvements, handle AUX transfers timeout - samsung-dsim: Fix timings calculation - tc358767: Plenty of small fixes, fix no connector attach, fix clocks - sii902x: state validation improvements panels: - Switch panels from register table initialization to proper code - Now that the panel code tracks the panel state, remove every ad-hoc implementation in the panel drivers - More cleanup of prepare / enable state tracking in drivers - edp: Drop legacy panel compatibles - simple-bridge: Switch to devm_drm_bridge_add - New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology 13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0, BOE nv110wum-l60, IVO t109nw41, WL-355608-A8, PrimeView PM070WL4, Lincoln Technologies LCD197, Ortustech COM35H3P70ULC, AUO G104STN01, K&d kd101ne3-40ti amdgpu: - DCN 4.0.x support - GC 12.0 support - GMC 12.0 support - SDMA 7.0 support - MES12 support - MMHUB 4.1 support - GFX12 modifier and DCC support - lots of IP fixes/updates amdkfd: - Contiguous VRAM allocations - GC 12.0 support - SDMA 7.0 support - SR-IOV fixes - KFD GFX ALU exceptions i915: - Battlemage Xe2 HPD display enablement - Panel Replay enabling - DP AUX-less ALPM/LOBF - Enable link training failure fallback for DP MST links - CMRR (Content Match Refresh Rate) enabling - Increase ADL-S/ADL-P/DG2+ max TMDS bitrate to 6 Gbps - Enable eDP AUX based HDR backlight - Support replaying GPU hangs with captured context image - Automate CCS Mode setting during engine resets - lots of refactoring - Support replaying GPU hangs with captured context image - Increase FLR timeout from 3s to 9s - Enable w/a 16021333562 for DG2, MTL and ARL [guc] xe: - update MAINATINERS - New uapi adding OA functionality to Xe - expose l3 bank mask - fix display detect on ADL-N - runtime PM Fixes - Fix silent backmerge issues - More prep for SR-IOV - HWmon additions - per client usage info - Rework GPU page fault handling - Drop EXEC_QUEUE_FLAG_BANNED - Add BMG PCI IDs - Scheduler fixes and improvements - Rename xe_exec_queue::compute to xe_exec_queue::lr - Use ttm_uncached for BO with NEEDS_UC flag - Rename xe perf layer as xe observation layer - lots of refactoring radeon: - Backlight workaround for iMac - Silence UBSAN flex array warnings msm: - Validate registers XML description against schema in CI - core/dpu: SM7150 support - mdp5: Add support for MSM8937 - gpu: Add param for userspace to know if raytracing is supported - gpu: X185 support (aka gpu in X1 laptop chips) - gpu: a505 support ivpu: - hardware scheduler support - profiling support - improvements to the platform support layer - firmware handling improvements - clocks/power mgmt improvements - scheduler/logging improvements habanalabs: - Gradual sleep in polling memory macro - Reduce Gaudi2 MSI-X interrupt count to 128 - Add Gaudi2-D revision support - Add timestamp to CPLD info - Gaudi2: Assume hard-reset by firmware upon MC SEI severe error - Align Gaudi2 interrupt names - Check for errors after preboot is ready - Change habanalabs maintainer and git repo path mgag200: - refactoring and improvements - Add BMC output - enable polling nouveau: - add registry command line v3d: - perf counters improvements zynqmp: - irq and debugfs improvements atmel-hlcdc: - Support XLCDC in sam9x7 mipi-dbi: - Remove mipi_dbi_machine_little_endian - make SPI bits per word configurable - support RGB888 - allow pixel formats to be specified in the DT sun4i: - Rework the blender setup for DE2 panfrost: - Enable MT8188 support vc4: - Monochrome TV support exynos: - fix fallback mode regression - fix memory leak - Use drm_edid_duplicate() instead of kmemdup() etnaviv: - fix i.MX8MP NPU clock gating - workaround FE register cdc issues on some cores - fix DMA sync handling for cached buffers - fix job timeout handling - keep TS enabled on MMUv2 cores for improved performance mediatek: - Convert to platform remove callback returning void- - Drop chain_mode_fixup call in mode_valid() - Fixes the errors of MediaTek display driver found by IGT - Add display support for the MT8365-EVK board - Fix bit depth overwritten for mtk_ovl_set bit_depth() - Fix possible_crtcs calculation - Fix spurious kfree() ast: - refactor mode setting code stm: - Add LVDS support - DSI PHY updates" * tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel: (2501 commits) drm/amdgpu/mes12: add missing opcode string drm/amdgpu/mes11: update opcode strings Revert "drm/amd/display: Reset freesync config before update new state" drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB drm/xe: Drop trace_xe_hw_fence_free drm/xe/uapi: Rename xe perf layer as xe observation layer drm/amdgpu: remove exp hw support check for gfx12 drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed drm/amdgpu: flush all cached ras bad pages to eeprom drm/amdgpu: select compute ME engines dynamically drm/amd/display: Allow display DCC for DCN401 drm/amdgpu: select compute ME engines dynamically drm/amdgpu/job: Replace DRM_INFO/ERROR logging drm/amdgpu: select compute ME engines dynamically drm/amd/pm: Ignore initial value in smu response register drm/amdgpu: Initialize VF partition mode drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping MAINTAINERS: fix Xinhui's name MAINTAINERS: update powerplay and swsmu drm/qxl: Pin buffer objects for internal mappings ...
2024-07-18landlock: Various documentation improvementsGünther Noack
* Fix some typos, incomplete or confusing phrases. * Split paragraphs where appropriate. * List the same error code multiple times, if it has multiple possible causes. * Bring wording closer to the man page wording, which has undergone more thorough review (esp. for LANDLOCK_ACCESS_FS_WRITE_FILE). * Small semantic clarifications * Call the ephemeral port range "ephemeral" * Clarify reasons for EFAULT in landlock_add_rule() * Clarify @rule_type doc for landlock_add_rule() This is a collection of small fixes which I collected when preparing the corresponding man pages [1]. Cc: Alejandro Colomar <alx@kernel.org> Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Link: https://lore.kernel.org/r/20240715155554.2791018-1-gnoack@google.com [1] Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240715160328.2792835-2-gnoack@google.com [mic: Add label to link, fix formatting spotted by make htmldocs, synchronize userspace-api documentation's date] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-07-17Merge tag 'media/v6.11-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - New sensor drivers: gc05a2, gc08a3 and imx283 - New serializer/deserializer drivers: max96714 and max96717 - New JPEG encoder driver: e5010 - Support for Raspberry Pi PiSP Backend (BE) ISP driver - Old documentation for av7110 driver removed, as a new version was added as Documentation/userspace-api/media/dvb/legacy*.rst - atompisp: Linux firmwares are now available, so drop firmware-related task from TODO and update firmware logic - The imx258 driver has gained several improvements - wave5 driver has gained support for HEVC decoding - em28xx gained support for MyGica UTV3 - av7110 budget-patch driver removed - Lots of other cleanups, improvements and fixes * tag 'media/v6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (301 commits) media: raspberrypi: Switch to remove_new media: uapi: pisp_be_config: Add extra config fields media: uapi: pisp_be_config: Re-sort pisp_be_tiles_config media: uapi: pisp_common: Capitalize all macros media: uapi: pisp_common: Add 32 bpp format test media: uapi: pisp_be_config: Drop BIT() from uAPI media: stm32: dcmipp: correct error handling in dcmipp_create_subdevs media: atomisp: Fix spelling mistakes in sh_css_sp.c media: atomisp: Fix spelling mistake in ia_css_debug.c media: atomisp: Fix spelling mistake in hmm_bo.c media: atomisp: Fix spelling mistake in ia_css_eed1_8.host.c media: atomisp: Fix spelling mistake in sh_css_internal.h media: atomisp: Fix spelling mistake "pipline" -> "pipeline" media: atomisp: Remove unused GPIO related defines and APIs media: atomisp: Replace COMPILATION_ERROR_IF() by static_assert() media: atomisp: Clean up unused macros from math_support.h media: atomisp: csi2-bridge: Add DMI quirk for OV5693 on Xiaomi Mipad2 media: atomisp: Update TODO media: atomisp: Prefix firmware paths with "intel/ipu/" media: atomisp: Remove firmware_name module parameter ...
2024-07-17Merge tag 'for-6.11-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "The highlights are new logic behind background block group reclaim, automatic removal of qgroup after removing a subvolume and new 'rescue=' mount options. The rest is optimizations, cleanups and refactoring. User visible features: - dynamic block group reclaim: - tunable framework to avoid situations where eager data allocations prevent creating new metadata chunks due to lack of unallocated space - reuse sysfs knob bg_reclaim_threshold (otherwise used only in zoned mode) for a fixed value threshold - new on/off sysfs knob "dynamic_reclaim" calculating the value based on heuristics, aiming to keep spare working space for relocating chunks but not to needlessly relocate partially utilized block groups or reclaim newly allocated ones - stats are exported in sysfs per block group type, files "reclaim_*" - this may increase IO load at unexpected times but the corner case of no allocatable block groups is known to be worse - automatically remove qgroup of deleted subvolumes: - adjust qgroup removal conditions, make sure all related subvolume data are already removed, or return EBUSY, also take into account setting of sysfs drop_subtree_threshold - also works in squota mode - mount option updates: new modes of 'rescue=' that allow to mount images (read-only) that could have been partially converted by user space tools - ignoremetacsums - invalid metadata checksums are ignored - ignoresuperflags - super block flags that track conversion in progress (like UUID or checksums) Core: - size of struct btrfs_inode is now below 1024 (on a release config), improved memory packing and other secondary effects - switch tracking of open inodes from rb-tree to xarray, minor performance improvement - reduce number of empty transaction commits when there are no dirty data/metadata - memory allocation optimizations (reduced numbers, reordering out of critical sections) - extent map structure optimizations and refactoring, more sanity checks - more subpage in zoned mode preparations or fixes - general snapshot code cleanups, improvements and documentation - tree-checker updates: more file extent ram_bytes fixes, continued - raid-stripe-tree update (not backward compatible): - remove extent encoding field from the structure, can be inferred from other information - requires btrfs-progs 6.9.1 or newer - cleanups and refactoring - error message updates - error handling improvements - return type and parameter cleanups and improvements" * tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (152 commits) btrfs: fix extent map use-after-free when adding pages to compressed bio btrfs: fix bitmap leak when loading free space cache on duplicate entry btrfs: remove the BUG_ON() inside extent_range_clear_dirty_for_io() btrfs: move extent_range_clear_dirty_for_io() into inode.c btrfs: enhance compression error messages btrfs: fix data race when accessing the last_trans field of a root btrfs: rename the extra_gfp parameter of btrfs_alloc_page_array() btrfs: remove the extra_gfp parameter from btrfs_alloc_folio_array() btrfs: introduce new "rescue=ignoresuperflags" mount option btrfs: introduce new "rescue=ignoremetacsums" mount option btrfs: output the unrecognized super block flags as hex btrfs: remove unused Opt enums btrfs: tree-checker: add extra ram_bytes and disk_num_bytes check btrfs: fix the ram_bytes assignment for truncated ordered extents btrfs: make validate_extent_map() catch ram_bytes mismatch btrfs: ignore incorrect btrfs_file_extent_item::ram_bytes btrfs: cleanup the bytenr usage inside btrfs_extent_item_to_extent_map() btrfs: fix typo in error message in btrfs_validate_super() btrfs: move the direct IO code into its own file btrfs: pass a btrfs_inode to btrfs_set_prop() ...
2024-07-17Merge tag 'dlm-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: - New flag DLM_LSFL_SOFTIRQ_SAFE can be set by code using dlm to indicate callbacks can be run from softirq - Change md-cluster to set DLM_LSFL_SOFTIRQ_SAFE - Clean up for previous changes, e.g. unused code and parameters - Remove custom pre-allocation of rsb structs which is unnecessary with kmem caches - Change idr to xarray for lkb structs in use - Change idr to xarray for rsb structs being recovered - Change outdated naming related to internal rsb states - Fix some incorrect add/remove of rsb on scan list - Use rcu to free rsb structs * tag 'dlm-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: add rcu_barrier before destroy kmem cache dlm: remove DLM_LSFL_SOFTIRQ from exflags fs: dlm: remove unused struct 'dlm_processed_nodes' md-cluster: use DLM_LSFL_SOFTIRQ for dlm_new_lockspace() dlm: implement LSFL_SOFTIRQ_SAFE dlm: introduce DLM_LSFL_SOFTIRQ_SAFE dlm: use LSFL_FS to check for kernel lockspace dlm: use rcu to avoid an extra rsb struct lookup dlm: fix add_scan and del_scan usage dlm: change list and timer names dlm: move recover idr to xarray datastructure dlm: move lkb idr to xarray datastructure dlm: drop own rsb pre allocation mechanism dlm: remove ls_local_handle from struct dlm_ls dlm: remove unused parameter in dlm_midcomms_addr dlm: don't kref_init rsbs created for toss list dlm: remove scand leftovers
2024-07-17Merge tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "This is a light release containing optimizations, code clean-ups, and minor bug fixes. This development cycle focused on work outside of upstream kernel development: - Continuing to build upstream CI for NFSD based on kdevops - Continuing to focus on the quality of NFSD in LTS kernels - Participation in IETF nfsv4 WG discussions about NFSv4 ACLs, directory delegation, and NFSv4.2 COPY offload Notable features for v6.11 that do not come through the NFSD tree include NFS server-side support for the new pNFS NVMe layout type [RFC9561]. Functional testing for pNFS block layouts like this one has been introduced to our kdevops CI harness. Work on improving the resolution of file attribute time stamps in local filesystems is also ongoing tree-wide. As always I am grateful to NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle" * tag 'nfsd-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: nfsd_file_lease_notifier_call gets a file_lease as an argument gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey MAINTAINERS: Add a bugzilla link for NFSD nfsd: new netlink ops to get/set server pool_mode sunrpc: refactor pool_mode setting code nfsd: allow passing in array of thread counts via netlink nfsd: make nfsd_svc take an array of thread counts sunrpc: fix up the special handling of sv_nrpools == 1 SUNRPC: Add a trace point in svc_xprt_deferred_close NFSD: Support write delegations in LAYOUTGET lockd: Use *-y instead of *-objs in Makefile NFSD: Fix nfsdcld warning svcrdma: Handle ADDR_CHANGE CM event properly svcrdma: Refactor the creation of listener CMA ID NFSD: remove unused structs 'nfsd3_voidargs' NFSD: harden svcxdr_dupstr() and svcxdr_tmpalloc() against integer overflows
2024-07-17crypto: ccp: Add the SNP_VLEK_LOAD commandMichael Roth
When requesting an attestation report a guest is able to specify whether it wants SNP firmware to sign the report using either a Versioned Chip Endorsement Key (VCEK), which is derived from chip-unique secrets, or a Versioned Loaded Endorsement Key (VLEK) which is obtained from an AMD Key Derivation Service (KDS) and derived from seeds allocated to enrolled cloud service providers (CSPs). For VLEK keys, an SNP_VLEK_LOAD SNP firmware command is used to load them into the system after obtaining them from the KDS. Add a corresponding userspace interface so to allow the loading of VLEK keys into the system. See SEV-SNP Firmware ABI 1.54, SNP_VLEK_LOAD for more details. Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240501085210.2213060-21-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16Merge tag 'net-next-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Not much excitement - a handful of large patchsets (devmem among them) did not make it in time. Core & protocols: - Use local_lock in addition to local_bh_disable() to protect per-CPU resources in networking, a step closer for local_bh_disable() not to act as a big lock on PREEMPT_RT - Use flex array for netdevice priv area, ensure its cache alignment - Add a sysctl knob to allow user to specify a default rto_min at socket init time. Bit of a big hammer but multiple companies were independently carrying such patch downstream so clearly it's useful - Support scheduling transmission of packets based on CLOCK_TAI - Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off using cpusets - Support multiple L2TPv3 UDP tunnels using the same 5-tuple address - Allow configuration of multipath hash seed, to both allow synchronizing hashing of two routers, and preventing partial accidental sync - Improve TCP compliance with RFC 9293 for simultaneous connect() - Support sending NAT keepalives in IPsec ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it - Support sending supervision HSR frames with MAC addresses stored in ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled - Introduce IPPROTO_SMC for selecting SMC when socket is created - Allow UDP GSO transmit from devices with no checksum offload - openvswitch: add packet sampling via psample, separating the sampled traffic from "upcall" packets sent to user space for forwarding - nf_tables: shrink memory consumption for transaction objects Things we sprinkled into general kernel code: - Power Sequencing subsystem (used by Qualcomm Bluetooth driver for QCA6390) [ Already merged separately - Linus ] - Add IRQ information in sysfs for auxiliary bus - Introduce guard definition for local_lock - Add aligned flavor of __cacheline_group_{begin, end}() markings for grouping fields in structures BPF: - Notify user space (via epoll) when a struct_ops object is getting detached/unregistered - Add new kfuncs for a generic, open-coded bits iterator - Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head - Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible WRT BTF from modules - Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs - riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter - Add the capability to offload the netfilter flowtable in XDP layer through kfuncs Driver API: - Allow users to configure IRQ tresholds between which automatic IRQ moderation can choose - Expand Power Sourcing (PoE) status with power, class and failure reason. Support setting power limits - Track additional RSS contexts in the core, make sure configuration changes don't break them - Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths - Support updating firmware on SFP modules Tests and tooling: - mptcp: use net/lib.sh to manage netns - TCP-AO and TCP-MD5: replace debug prints used by tests with tracepoints - openvswitch: make test self-contained (don't depend on OvS CLI tools) Drivers: - Ethernet high-speed NICs: - Broadcom (bnxt): - increase the max total outstanding PTP TX packets to 4 - add timestamping statistics support - implement netdev_queue_mgmt_ops - support new RSS context API - Intel (100G, ice, idpf): - implement FEC statistics and dumping signal quality indicators - support E825C products (with 56Gbps PHYs) - nVidia/Mellanox: - support HW-GRO - mlx4/mlx5: support per-queue statistics via netlink - obey the max number of EQs setting in sub-functions - AMD/Solarflare: - support new RSS context API - AMD/Pensando: - ionic: rework fix for doorbell miss to lower overhead and skip it on new HW - Wangxun: - txgbe: support Flow Director perfect filters - Ethernet NICs consumer, embedded and virtual: - Add driver for Tehuti Networks TN40xx chips - Add driver for Meta's internal NIC chips - Add driver for Ethernet MAC on Airoha EN7581 SoCs - Add driver for Renesas Ethernet-TSN devices - Google cloud vNIC: - flow steering support - Microsoft vNIC: - support page sizes other than 4KB on ARM64 - vmware vNIC: - support latency measurement (update to version 9) - VirtIO net: - support for Byte Queue Limits - support configuring thresholds for automatic IRQ moderation - support for AF_XDP Rx zero-copy - Synopsys (stmmac): - support for STM32MP13 SoC - let platforms select the right PCS implementation - TI: - icssg-prueth: add multicast filtering support - icssg-prueth: enable PTP timestamping and PPS - Renesas: - ravb: improve Rx performance 30-400% by using page pool, theaded NAPI and timer-based IRQ coalescing - ravb: add MII support for R-Car V4M - Cadence (macb): - macb: add ARP support to Wake-On-LAN - Cortina: - use phylib for RX and TX pause configuration - Ethernet switches: - nVidia/Mellanox: - support configuration of multipath hash seed - report more accurate max MTU - use page_pool to improve Rx performance - MediaTek: - mt7530: add support for bridge port isolation - Qualcomm: - qca8k: add support for bridge port isolation - Microchip: - lan9371/2: add 100BaseTX PHY support - NXP: - vsc73xx: implement VLAN operations - Ethernet PHYs: - aquantia: enable support for aqr115c - aquantia: add support for PHY LEDs - realtek: add support for rtl8224 2.5Gbps PHY - xpcs: add memory-mapped device support - add BroadR-Reach link mode and support in Broadcom's PHY driver - CAN: - add document for ISO 15765-2 protocol support - mcp251xfd: workaround for erratum DS80000789E, use timestamps to catch when device returns incorrect FIFO status - WiFi: - mac80211/cfg80211: - parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers - improvements for 6 GHz regulatory flexibility - multi-link improvements - support multiple radios per wiphy - remove DEAUTH_NEED_MGD_TX_PREP flag - Intel (iwlwifi): - bump FW API to 91 for BZ/SC devices - report 64-bit radiotap timestamp - enable P2P low latency by default - handle Transmit Power Envelope (TPE) advertised by AP - remove support for older FW for new devices - fast resume (keeping the device configured) - mvm: re-enable Multi-Link Operation (MLO) - aggregation (A-MSDU) optimizations - MediaTek (mt76): - mt7925 Multi-Link Operation (MLO) support - Qualcomm (ath10k): - LED support for various chipsets - Qualcomm (ath12k): - remove unsupported Tx monitor handling - support channel 2 in 6 GHz band - support Spatial Multiplexing Power Save (SMPS) in 6 GHz band - supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) - support dynamic VLAN - add panic handler for resetting the firmware state - DebugFS support for datapath statistics - WCN7850: support for Wake on WLAN - Microchip (wilc1000): - read MAC address during probe to make it visible to user space - suspend/resume improvements - TI (wl18xx): - support newer firmware versions - RealTek (rtw89): - preparation for RTL8852BE-VT support - Wake on WLAN support for WiFi 6 chips - 36-bit PCI DMA support - RealTek (rtlwifi): - RTL8192DU support - Broadcom (brcmfmac): - Management Frame Protection support (to enable WPA3) - Bluetooth: - qualcomm: use the power sequencer for QCA6390 - btusb: mediatek: add ISO data transmission functions - hci_bcm4377: add BCM4388 support - btintel: add support for BlazarU core - btintel: add support for Whale Peak2 - btnxpuart: add support for AW693 A1 chipset - btnxpuart: add support for IW615 chipset - btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591" * tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits) eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering" tcp: Replace strncpy() with strscpy() wifi: ath12k: fix build vs old compiler tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child(). eth: fbnic: Write the TCAM tables used for RSS control and Rx to host eth: fbnic: Add L2 address programming eth: fbnic: Add basic Rx handling eth: fbnic: Add basic Tx handling eth: fbnic: Add link detection eth: fbnic: Add initial messaging to notify FW of our presence eth: fbnic: Implement Rx queue alloc/start/stop/free eth: fbnic: Implement Tx queue alloc/start/stop/free eth: fbnic: Allocate a netdevice and napi vectors with queues eth: fbnic: Add FW communication mechanism eth: fbnic: Add message parsing for FW messages eth: fbnic: Add register init to set PCIe/Ethernet device config eth: fbnic: Allocate core device specific structures and devlink interface eth: fbnic: Add scaffolding for Meta's NIC driver PCI: Add Meta Platforms vendor ID net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK ...
2024-07-16Merge tag 'perf-core-2024-07-16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: - Intel PT support enhancements & fixes - Fix leaked SIGTRAP events - Improve and fix the Intel uncore driver - Add support for Intel HBM and CXL uncore counters - Add Intel Lake and Arrow Lake support - AMD uncore driver fixes - Make SIGTRAP and __perf_pending_irq() work on RT - Micro-optimizations - Misc cleanups and fixes * tag 'perf-core-2024-07-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) perf/x86/intel: Add a distinct name for Granite Rapids perf/x86/intel/ds: Fix non 0 retire latency on Raptorlake perf/x86/intel: Hide Topdown metrics events if the feature is not enumerated perf/x86/intel/uncore: Fix the bits of the CHA extended umask for SPR perf: Split __perf_pending_irq() out of perf_pending_irq() perf: Don't disable preemption in perf_pending_task(). perf: Move swevent_htable::recursion into task_struct. perf: Shrink the size of the recursion counter. perf: Enqueue SIGTRAP always via task_work. task_work: Add TWA_NMI_CURRENT as an additional notify mode. perf: Move irq_work_queue() where the event is prepared. perf: Fix event leak upon exec and file release perf: Fix event leak upon exit task_work: Introduce task_work_cancel() again task_work: s/task_work_cancel()/task_work_cancel_func()/ perf/x86/amd/uncore: Fix DF and UMC domain identification perf/x86/amd/uncore: Avoid PMU registration if counters are unavailable perf/x86/intel: Support Perfmon MSRs aliasing perf/x86/intel: Support PERFEVTSEL extension perf/x86: Add config_mask to represent EVENTSEL bitmask ...
2024-07-16Merge tag 'm68k-for-v6.11-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k Pull m68k updates from Geert Uytterhoeven: - Fix bootup lock-ups on Warp1260, Atari TT, and MegaSTe - Miscellaneous fixes and improvements - defconfig updates * tag 'm68k-for-v6.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: m68k: cmpxchg: Fix return value for default case in __arch_xchg() m68k: defconfig: Update defconfigs for v6.10-rc1 m68k: atari: Fix TT bootup freeze / unexpected (SCU) interrupt messages zorro: Use str_plural() in amiga_zorro_probe() m68k: emu: Add missing MODULE_DESCRIPTION() macros m68k: amiga: Turn off Warp1260 interrupts during boot
2024-07-16Merge branch 'kvm-6.11-sev-attestation' into HEADPaolo Bonzini
The GHCB 2.0 specification defines 2 GHCB request types to allow SNP guests to send encrypted messages/requests to firmware: SNP Guest Requests and SNP Extended Guest Requests. These encrypted messages are used for things like servicing attestation requests issued by the guest. Implementing support for these is required to be fully GHCB-compliant. For the most part, KVM only needs to handle forwarding these requests to firmware (to be issued via the SNP_GUEST_REQUEST firmware command defined in the SEV-SNP Firmware ABI), and then forwarding the encrypted response to the guest. However, in the case of SNP Extended Guest Requests, the host is also able to provide the certificate data corresponding to the endorsement key used by firmware to sign attestation report requests. This certificate data is provided by userspace because: 1) It allows for different keys/key types to be used for each particular guest with requiring any sort of KVM API to configure the certificate table in advance on a per-guest basis. 2) It provides additional flexibility with how attestation requests might be handled during live migration where the certificate data for source/dest might be different. 3) It allows all synchronization between certificates and firmware/signing key updates to be handled purely by userspace rather than requiring some in-kernel mechanism to facilitate it. [1] To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO/KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data, but is still enough to provide compliance with the GHCB 2.0 spec.
2024-07-16KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE eventBrijesh Singh
Version 2 of GHCB specification added support for the SNP Guest Request Message NAE event. The event allows for an SEV-SNP guest to make requests to the SEV-SNP firmware through the hypervisor using the SNP_GUEST_REQUEST API defined in the SEV-SNP firmware specification. This is used by guests primarily to request attestation reports from firmware. There are other request types are available as well, but the specifics of what guest requests are being made generally does not affect how they are handled by the hypervisor, which only serves as a proxy for the guest requests and firmware responses. Implement handling for these events. When an SNP Guest Request is issued, the guest will provide its own request/response pages, which could in theory be passed along directly to firmware. However, these pages would need special care: - Both pages are from shared guest memory, so they need to be protected from migration/etc. occurring while firmware reads/writes to them. At a minimum, this requires elevating the ref counts and potentially needing an explicit pinning of the memory. This places additional restrictions on what type of memory backends userspace can use for shared guest memory since there would be some reliance on using refcounted pages. - The response page needs to be switched to Firmware-owned state before the firmware can write to it, which can lead to potential host RMP #PFs if the guest is misbehaved and hands the host a guest page that KVM is writing to for other reasons (e.g. virtio buffers). Both of these issues can be avoided completely by using separately-allocated bounce pages for both the request/response pages and passing those to firmware instead. So that's the approach taken here. Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Co-developed-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Co-developed-by: Ashish Kalra <ashish.kalra@amd.com> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> [mdr: ensure FW command failures are indicated to guest, drop extended request handling to be re-written as separate patch, massage commit] Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240701223148.3798365-2-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-16landlock: Clarify documentation for struct landlock_ruleset_attrGünther Noack
The explanation for @handled_access_fs and @handled_access_net has significant overlap and is better explained together. * Explain the commonalities in structure-level documentation. * Clarify some wording and break up longer sentences. * Put emphasis on the word "handled" to make it clearer that "handled" is a term with special meaning in the context of Landlock. I'd like to transfer this wording into the man pages as well. Signed-off-by: Günther Noack <gnoack@google.com> Cc: Alejandro Colomar <alx@kernel.org> Cc: Konstantin Meskhidze <konstantin.meskhidze@huawei.com> Cc: linux-security-module@vger.kernel.org Link: https://lore.kernel.org/r/20240711165456.2148590-2-gnoack@google.com [mic: Format commit message] Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-07-16Merge tag 'kvm-x86-misc-6.11' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini
KVM x86 misc changes for 6.11 - Add a global struct to consolidate tracking of host values, e.g. EFER, and move "shadow_phys_bits" into the structure as "maxphyaddr". - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX. - Print the name of the APICv/AVIC inhibits in the relevant tracepoint. - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor. - Misc cleanups