summaryrefslogtreecommitdiff
path: root/include/uapi
AgeCommit message (Collapse)Author
2024-03-11Merge tag 'for-6.9/io_uring-20240310' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring updates from Jens Axboe: - Make running of task_work internal loops more fair, and unify how the different methods deal with them (me) - Support for per-ring NAPI. The two minor networking patches are in a shared branch with netdev (Stefan) - Add support for truncate (Tony) - Export SQPOLL utilization stats (Xiaobing) - Multishot fixes (Pavel) - Fix for a race in manipulating the request flags via poll (Pavel) - Cleanup the multishot checking by making it generic, moving it out of opcode handlers (Pavel) - Various tweaks and cleanups (me, Kunwu, Alexander) * tag 'for-6.9/io_uring-20240310' of git://git.kernel.dk/linux: (53 commits) io_uring: Fix sqpoll utilization check racing with dying sqpoll io_uring/net: dedup io_recv_finish req completion io_uring: refactor DEFER_TASKRUN multishot checks io_uring: fix mshot io-wq checks io_uring/net: add io_req_msg_cleanup() helper io_uring/net: simplify msghd->msg_inq checking io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE io_uring/net: remove dependency on REQ_F_PARTIAL_IO for sr->done_io io_uring/net: correctly handle multishot recvmsg retry setup io_uring/net: clear REQ_F_BL_EMPTY in the multishot retry handler io_uring: fix io_queue_proc modifying req->flags io_uring: fix mshot read defer taskrun cqe posting io_uring/net: fix overflow check in io_recvmsg_mshot_prep() io_uring/net: correct the type of variable io_uring/sqpoll: statistics of the true utilization of sq threads io_uring/net: move recv/recvmsg flags out of retry loop io_uring/kbuf: flag request if buffer pool is empty after buffer pick io_uring/net: improve the usercopy for sendmsg/recvmsg io_uring/net: move receive multishot out of the generic msghdr path io_uring/net: unify how recvmsg and sendmsg copy in the msghdr ...
2024-03-11Merge tag 'vfs-6.9.uuid' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs uuid updates from Christian Brauner: "This adds two new ioctl()s for getting the filesystem uuid and retrieving the sysfs path based on the path of a mounted filesystem. Getting the filesystem uuid has been implemented in filesystem specific code for a while it's now lifted as a generic ioctl" * tag 'vfs-6.9.uuid' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: add support for FS_IOC_GETFSSYSFSPATH fs: add FS_IOC_GETFSSYSFSPATH fat: Hook up sb->s_uuid fs: FS_IOC_GETUUID ovl: convert to super_set_uuid() fs: super_set_uuid()
2024-03-11Merge tag 'vfs-6.9.pidfd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pdfd updates from Christian Brauner: - Until now pidfds could only be created for thread-group leaders but not for threads. There was no technical reason for this. We simply had no users that needed support for this. Now we do have users that need support for this. This introduces a new PIDFD_THREAD flag for pidfd_open(). If that flag is set pidfd_open() creates a pidfd that refers to a specific thread. In addition, we now allow clone() and clone3() to be called with CLONE_PIDFD | CLONE_THREAD which wasn't possible before. A pidfd that refers to an individual thread differs from a pidfd that refers to a thread-group leader: (1) Pidfds are pollable. A task may poll a pidfd and get notified when the task has exited. For thread-group leader pidfds the polling task is woken if the thread-group is empty. In other words, if the thread-group leader task exits when there are still threads alive in its thread-group the polling task will not be woken when the thread-group leader exits but rather when the last thread in the thread-group exits. For thread-specific pidfds the polling task is woken if the thread exits. (2) Passing a thread-group leader pidfd to pidfd_send_signal() will generate thread-group directed signals like kill(2) does. Passing a thread-specific pidfd to pidfd_send_signal() will generate thread-specific signals like tgkill(2) does. The default scope of the signal is thus determined by the type of the pidfd. Since use-cases exist where the default scope of the provided pidfd needs to be overriden the following flags are added to pidfd_send_signal(): - PIDFD_SIGNAL_THREAD Send a thread-specific signal. - PIDFD_SIGNAL_THREAD_GROUP Send a thread-group directed signal. - PIDFD_SIGNAL_PROCESS_GROUP Send a process-group directed signal. The scope change will only work if the struct pid is actually used for this scope. For example, in order to send a thread-group directed signal the provided pidfd must be used as a thread-group leader and similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be used as a process group leader. - Move pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds to become useful for the first time. They can now be compared by inode number which are unique for the system lifetime. Instead of stashing struct pid in file->private_data we can now stash it in inode->i_private. This makes it possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. Another side-effect is that file->private_data is now freed up for per-file options for pidfds. Now, each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode and dentry for each struct pid and we reuse that inode and dentry for all pidfds that refer to the same struct pid. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs. The dentry and inode allocation mechanism is moved into generic infrastructure that is now shared between nsfs and pidfs. The path_from_stashed() helper must be provided with a stashing location, an inode number, a mount, and the private data that is supposed to be used and it will provide a path that can be passed to dentry_open(). The helper will try retrieve an existing dentry from the provided stashing location. If a valid dentry is found it is reused. If not a new one is allocated and we try to stash it in the provided location. If this fails we retry until we either find an existing dentry or the newly allocated dentry could be stashed. Subsequent openers of the same namespace or task are then able to reuse it. - Currently it is only possible to get notified when a task has exited, i.e., become a zombie and userspace gets notified with EPOLLIN. We now also support waiting until the task has been reaped, notifying userspace with EPOLLHUP. - Ensure that ESRCH is reported for getfd if a task is exiting instead of the confusing EBADF. - Various smaller cleanups to pidfd functions. * tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits) libfs: improve path_from_stashed() libfs: add stashed_dentry_prune() libfs: improve path_from_stashed() helper pidfs: convert to path_from_stashed() helper nsfs: convert to path_from_stashed() helper libfs: add path_from_stashed() pidfd: add pidfs pidfd: move struct pidfd_fops pidfd: allow to override signal scope in pidfd_send_signal() pidfd: change pidfd_send_signal() to respect PIDFD_THREAD signal: fill in si_code in prepare_kill_siginfo() selftests: add ESRCH tests for pidfd_getfd() pidfd: getfd should always report ESRCH if a task is exiting pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together pidfd: exit: kill the no longer used thread_group_exited() pidfd: change do_notify_pidfd() to use __wake_up(poll_to_key(EPOLLIN)) pid: kill the obsolete PIDTYPE_PID code in transfer_pid() pidfd: kill the no longer needed do_notify_pidfd() in de_thread() pidfd_poll: report POLLHUP when pid_task() == NULL pidfd: implement PIDFD_THREAD flag for pidfd_open() ...
2024-03-11Merge tag 'vfs-6.9.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Misc features, cleanups, and fixes for vfs and individual filesystems. Features: - Support idmapped mounts for hugetlbfs. - Add RWF_NOAPPEND flag for pwritev2(). This allows us to fix a bug where the passed offset is ignored if the file is O_APPEND. The new flag allows a caller to enforce that the offset is honored to conform to posix even if the file was opened in append mode. - Move i_mmap_rwsem in struct address_space to avoid false sharing between i_mmap and i_mmap_rwsem. - Convert efs, qnx4, and coda to use the new mount api. - Add a generic is_dot_dotdot() helper that's used by various filesystems and the VFS code instead of open-coding it multiple times. - Recently we've added stable offsets which allows stable ordering when iterating directories exported through NFS on e.g., tmpfs filesystems. Originally an xarray was used for the offset map but that caused slab fragmentation issues over time. This switches the offset map to the maple tree which has a dense mode that handles this scenario a lot better. Includes tests. - Finally merge the case-insensitive improvement series Gabriel has been working on for a long time. This cleanly propagates case insensitive operations through ->s_d_op which in turn allows us to remove the quite ugly generic_set_encrypted_ci_d_ops() operations. It also improves performance by trying a case-sensitive comparison first and then fallback to case-insensitive lookup if that fails. This also fixes a bug where overlayfs would be able to be mounted over a case insensitive directory which would lead to all sort of odd behaviors. Cleanups: - Make file_dentry() a simple accessor now that ->d_real() is simplified because of the backing file work we did the last two cycles. - Use the dedicated file_mnt_idmap helper in ntfs3. - Use smp_load_acquire/store_release() in the i_size_read/write helpers and thus remove the hack to handle i_size reads in the filemap code. - The SLAB_MEM_SPREAD is a nop now. Remove it from various places in fs/ - It's no longer necessary to perform a second built-in initramfs unpack call because we retain the contents of the previous extraction. Remove it. - Now that we have removed various allocators kfree_rcu() always works with kmem caches and kmalloc(). So simplify various places that only use an rcu callback in order to handle the kmem cache case. - Convert the pipe code to use a lockdep comparison function instead of open-coding the nesting making lockdep validation easier. - Move code into fs-writeback.c that was located in a header but can be made static as it's only used in that one file. - Rewrite the alignment checking iterators for iovec and bvec to be easier to read, and also significantly more compact in terms of generated code. This saves 270 bytes of text on x86-64 (with clang-18) and 224 bytes on arm64 (with gcc-13). In profiles it also saves a bit of time for the same workload. - Switch various places to use KMEM_CACHE instead of kmem_cache_create(). - Use inode_set_ctime_to_ts() in inode_set_ctime_current() - Use kzalloc() in name_to_handle_at() to avoid kernel infoleak. - Various smaller cleanups for eventfds. Fixes: - Fix various comments and typos, and unneeded initializations. - Fix stack allocation hack for clang in the select code. - Improve dump_mapping() debug code on a best-effort basis. - Fix build errors in various selftests. - Avoid wrap-around instrumentation in various places. - Don't allow user namespaces without an idmapping to be used for idmapped mounts. - Fix sysv sb_read() call. - Fix fallback implementation of the get_name() export operation" * tag 'vfs-6.9.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (70 commits) hugetlbfs: support idmapped mounts qnx4: convert qnx4 to use the new mount api fs: use inode_set_ctime_to_ts to set inode ctime to current time libfs: Drop generic_set_encrypted_ci_d_ops ubifs: Configure dentry operations at dentry-creation time f2fs: Configure dentry operations at dentry-creation time ext4: Configure dentry operations at dentry-creation time libfs: Add helper to choose dentry operations at mount-time libfs: Merge encrypted_ci_dentry_ops and ci_dentry_ops fscrypt: Drop d_revalidate once the key is added fscrypt: Drop d_revalidate for valid dentries during lookup fscrypt: Factor out a helper to configure the lookup dentry ovl: Always reject mounting over case-insensitive directories libfs: Attempt exact-match comparison first during casefolded lookup efs: remove SLAB_MEM_SPREAD flag usage jfs: remove SLAB_MEM_SPREAD flag usage minix: remove SLAB_MEM_SPREAD flag usage openpromfs: remove SLAB_MEM_SPREAD flag usage proc: remove SLAB_MEM_SPREAD flag usage qnx6: remove SLAB_MEM_SPREAD flag usage ...
2024-03-11Merge tag 'asoc-v6.9' of ↵Takashi Iwai
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Updates for v6.9 This has been quite a small release, there's a lot of driver specific cleanups and minor enhancements but hardly anything on the core and only one new driver. Highlights include: - SoundWire support for AMD ACP 6.3 systems. - Support for reporting version information for AVS firmware. - Support DSPless mode for Intel Soundwire systems. - Support for configuring CS35L56 amplifiers using EFI calibration data. - Log which component is being operated on as part of power management trace events. - Support for Microchip SAM9x7, NXP i.MX95 and Qualcomm WCD939x
2024-03-11Merge tag 'loongarch-kvm-6.9' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD LoongArch KVM changes for v6.9 * Set reserved bits as zero in CPUCFG. * Start SW timer only when vcpu is blocking. * Do not restart SW timer when it is expired. * Remove unnecessary CSR register saving during enter guest.
2024-03-11Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ethtool: ice: Support for RSS settings to GTP Takeru Hayasaka enables RSS functionality for GTP packets on ice driver with ethtool. A user can include TEID and make RSS work for GTP-U over IPv4 by doing the following:`ethtool -N ens3 rx-flow-hash gtpu4 sde` In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d. gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does not include a TEID. gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that includes a TEID. gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios. gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6. gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended header includes Uplink, applicable to both IPv4 and IPv6. gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink, for both IPv4 and IPv6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-11Merge branch 'for-next' into for-linusTakashi Iwai
Prep for 6.9 merge. Signed-off-by: Takashi Iwai <tiwai@suse.de>
2024-03-09Merge tag 'kvm-x86-guest_memfd_fixes-6.8' of ↵Paolo Bonzini
https://github.com/kvm-x86/linux into HEAD KVM GUEST_MEMFD fixes for 6.8: - Make KVM_MEM_GUEST_MEMFD mutually exclusive with KVM_MEM_READONLY to avoid creating ABI that KVM can't sanely support. - Update documentation for KVM_SW_PROTECTED_VM to make it abundantly clear that such VMs are purely a development and testing vehicle, and come with zero guarantees. - Limit KVM_SW_PROTECTED_VM guests to the TDP MMU, as the long term plan is to support confidential VMs with deterministic private memory (SNP and TDX) only in the TDP MMU. - Fix a bug in a GUEST_MEMFD negative test that resulted in false passes when verifying that KVM_MEM_GUEST_MEMFD memslots can't be dirty logged.
2024-03-08net: nexthop: Expose nexthop group HW stats to user spaceIdo Schimmel
Add netlink support for reading NH group hardware stats. Stats collection is done through a new notifier, NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for a given NH group are thereby asked to collect the stats and report back to core by calling nh_grp_hw_stats_report_delta(). This is similar to what netdevice L3 stats do. Besides exposing number of packets that passed in the HW datapath, also include information on whether any driver actually realizes the counters. The core can tell based on whether it got any _report_delta() reports from the drivers. This allows enabling the statistics at the group at any time, with drivers opting into supporting them. This is also in line with what netdevice L3 stats are doing. So as not to waste time and space, tie the collection and reporting of HW stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS. Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Kees Cook <keescook@chromium.org> # For the __counted_by bits Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add ability to enable / disable hardware statisticsIdo Schimmel
Add netlink support for enabling collection of HW statistics on nexthop groups. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Expose nexthop group stats to user spaceIdo Schimmel
Add netlink support for reading NH group stats. This data is only for statistics of the traffic in the SW datapath. HW nexthop group statistics will be added in the following patches. Emission of the stats is keyed to a new op_stats flag to avoid cluttering the netlink message with stats if the user doesn't need them: NHA_OP_FLAG_DUMP_STATS. Co-developed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08net: nexthop: Add NHA_OP_FLAGSPetr Machata
In order to add per-nexthop statistics, but still not increase netlink message size for consumers that do not care about them, there needs to be a toggle through which the user indicates their desire to get the statistics. To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to use the attribute for carrying of arbitrary operation-specific flags, i.e. not make it specific for get / dump. Add the new attribute to get and dump policies, but do not actually allow any flags yet -- those will come later as the flags themselves are defined. Add the necessary parsing code. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-03-08Merge branches 'arm/mediatek', 'arm/renesas', 'arm/smmu', 'x86/vt-d', ↵Joerg Roedel
'x86/amd' and 'core' into next
2024-03-07netdev: add queue stat for alloc failuresJakub Kicinski
Rx alloc failures are commonly counted by drivers. Support reporting those via netdev-genl queue stats. Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07netdev: add per-queue statisticsJakub Kicinski
The ethtool-nl family does a good job exposing various protocol related and IEEE/IETF statistics which used to get dumped under ethtool -S, with creative names. Queue stats don't have a netlink API, yet, and remain a lion's share of ethtool -S output for new drivers. Not only is that bad because the names differ driver to driver but it's also bug-prone. Intuitively drivers try to report only the stats for active queues, but querying ethtool stats involves multiple system calls, and the number of stats is read separately from the stats themselves. Worse still when user space asks for values of the stats, it doesn't inform the kernel how big the buffer is. If number of stats increases in the meantime kernel will overflow user buffer. Add a netlink API for dumping queue stats. Queue information is exposed via the netdev-genl family, so add the stats there. Support per-queue and sum-for-device dumps. Latter will be useful when subsequent patches add more interesting common stats than just bytes and packets. The API does not currently distinguish between HW and SW stats. The expectation is that the source of the stats will either not matter much (good packets) or be obvious (skb alloc errors). Acked-by: Stanislav Fomichev <sdf@google.com> Reviewed-by: Amritha Nambiar <amritha.nambiar@intel.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20240306195509.1502746-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-08Merge tag 'drm-etnaviv-next-2024-03-07' of ↵Dave Airlie
https://git.pengutronix.de/git/lst/linux into drm-next - various code cleanups - enhancements for NPU and MRT support Signed-off-by: Dave Airlie <airlied@redhat.com> From: Lucas Stach <l.stach@pengutronix.de> Link: https://patchwork.freedesktop.org/patch/msgid/72a783cd98d60f6ebb43b90a6b453eea87224409.camel@pengutronix.de
2024-03-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07arm64/ptrace: Expose FPMR via ptraceMark Brown
Add a new regset to expose FPMR via ptrace. It is not added to the FPSIMD registers since that structure is exposed elsewhere without any allowance for extension we don't add there. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240306-arm64-2023-dpisa-v5-5-c568edc8ed7f@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-03-06bpf: Introduce may_goto instructionAlexei Starovoitov
Introduce may_goto instruction that from the verifier pov is similar to open coded iterators bpf_for()/bpf_repeat() and bpf_loop() helper, but it doesn't iterate any objects. In assembly 'may_goto' is a nop most of the time until bpf runtime has to terminate the program for whatever reason. In the current implementation may_goto has a hidden counter, but other mechanisms can be used. For programs written in C the later patch introduces 'cond_break' macro that combines 'may_goto' with 'break' statement and has similar semantics: cond_break is a nop until bpf runtime has to break out of this loop. It can be used in any normal "for" or "while" loop, like for (i = zero; i < cnt; cond_break, i++) { The verifier recognizes that may_goto is used in the program, reserves additional 8 bytes of stack, initializes them in subprog prologue, and replaces may_goto instruction with: aux_reg = *(u64 *)(fp - 40) if aux_reg == 0 goto pc+off aux_reg -= 1 *(u64 *)(fp - 40) = aux_reg may_goto instruction can be used by LLVM to implement __builtin_memcpy, __builtin_strcmp. may_goto is not a full substitute for bpf_for() macro. bpf_for() doesn't have induction variable that verifiers sees, so 'i' in bpf_for(i, 0, 100) is seen as imprecise and bounded. But when the code is written as: for (i = 0; i < 100; cond_break, i++) the verifier see 'i' as precise constant zero, hence cond_break (aka may_goto) doesn't help to converge the loop. A static or global variable can be used as a workaround: static int zero = 0; for (i = zero; i < 100; cond_break, i++) // works! may_goto works well with arena pointers that don't need to be bounds checked on access. Load/store from arena returns imprecise unbounded scalar and loops with may_goto pass the verifier. Reserve new opcode BPF_JMP | BPF_JCOND for may_goto insn. JCOND stands for conditional pseudo jump. Since goto_or_nop insn was proposed, it may use the same opcode. may_goto vs goto_or_nop can be distinguished by src_reg: code = BPF_JMP | BPF_JCOND src_reg = 0 - may_goto src_reg = 1 - goto_or_nop Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Tested-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240306031929.42666-2-alexei.starovoitov@gmail.com
2024-03-06ethtool: Add GTP RSS hash options to ethtool.hTakeru Hayasaka
This is a patch that enables RSS functionality for GTP packets using ethtool. A user can include TEID and make RSS work for GTP-U over IPv4 by doing the following:`ethtool -N ens3 rx-flow-hash gtpu4 sde` In addition to gtpu(4|6), we now support gtpc(4|6),gtpc(4|6)t,gtpu(4|6)e, gtpu(4|6)u, and gtpu(4|6)d. gtpc(4|6): Used for GTP-C in IPv4 and IPv6, where the GTP header format does not include a TEID. gtpc(4|6)t: Used for GTP-C in IPv4 and IPv6, with a GTP header format that includes a TEID. gtpu(4|6): Used for GTP-U in both IPv4 and IPv6 scenarios. gtpu(4|6)e: Used for GTP-U with extended headers in both IPv4 and IPv6. gtpu(4|6)u: Used when the PSC (PDU session container) in the GTP-U extended header includes Uplink, applicable to both IPv4 and IPv6. gtpu(4|6)d: Used when the PSC in the GTP-U extended header includes Downlink, for both IPv4 and IPv6. GTP generates a flow that includes an ID called TEID to identify the tunnel. This tunnel is created for each UE (User Equipment).By performing RSS based on this flow, it is possible to apply RSS for each communication unit from the UE. Without this, RSS would only be effective within the range of IP addresses. For instance, the PGW can only perform RSS within the IP range of the SGW. Problematic from a load distribution perspective, especially if there's a bias in the terminals connected to a particular base station.This case can be solved by using this patch. Signed-off-by: Takeru Hayasaka <hayatake396@gmail.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-03-06fanotify: Fix misspelling of "writable"Vicki Pfau
Several file system notification system headers have "writable" misspelled as "writtable" in the comments. This patch fixes it in the fanotify header. Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240306020831.1404033-3-vi@endrift.com>
2024-03-06inotify: Fix misspelling of "writable"Vicki Pfau
Several file system notification system headers have "writable" misspelled as "writtable" in the comments. This patch fixes it in the inotify header. Signed-off-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240306020831.1404033-1-vi@endrift.com>
2024-03-06fuse: Use the high bit of request ID for indicating resend requestsZhao Chen
Some FUSE daemons want to know if the received request is a resend request. The high bit of the fuse request ID is utilized for indicating this, enabling the receiver to perform appropriate handling. The init flag "FUSE_HAS_RESEND" is added to indicate this feature. Signed-off-by: Zhao Chen <winters.zc@antgroup.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-06fuse: Introduce a new notification type for resend pending requestsZhao Chen
When a FUSE daemon panics and failover, we aim to minimize the impact on applications by reusing the existing FUSE connection. During this process, another daemon is employed to preserve the FUSE connection's file descriptor. The new started FUSE Daemon will takeover the fd and continue to provide service. However, it is possible for some inflight requests to be lost and never returned. As a result, applications awaiting replies would become stuck forever. To address this, we can resend these pending requests to the new started FUSE daemon. This patch introduces a new notification type "FUSE_NOTIFY_RESEND", which can trigger resending of the pending requests, ensuring they are properly processed again. Signed-off-by: Zhao Chen <winters.zc@antgroup.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-06fuse: add support for explicit export disablingJingbo Xu
open_by_handle_at(2) can fail with -ESTALE with a valid handle returned by a previous name_to_handle_at(2) for evicted fuse inodes, which is especially common when entry_valid_timeout is 0, e.g. when the fuse daemon is in "cache=none" mode. The time sequence is like: name_to_handle_at(2) # succeed evict fuse inode open_by_handle_at(2) # fail The root cause is that, with 0 entry_valid_timeout, the dput() called in name_to_handle_at(2) will trigger iput -> evict(), which will send FUSE_FORGET to the daemon. The following open_by_handle_at(2) will send a new FUSE_LOOKUP request upon inode cache miss since the previous inode eviction. Then the fuse daemon may fail the FUSE_LOOKUP request with -ENOENT as the cached metadata of the requested inode has already been cleaned up during the previous FUSE_FORGET. The returned -ENOENT is treated as -ESTALE when open_by_handle_at(2) returns. This confuses the application somehow, as open_by_handle_at(2) fails when the previous name_to_handle_at(2) succeeds. The returned errno is also confusing as the requested file is not deleted and already there. It is reasonable to fail name_to_handle_at(2) early in this case, after which the application can fallback to open(2) to access files. Since this issue typically appears when entry_valid_timeout is 0 which is configured by the fuse daemon, the fuse daemon is the right person to explicitly disable the export when required. Also considering FUSE_EXPORT_SUPPORT actually indicates the support for lookups of "." and "..", and there are existing fuse daemons supporting export without FUSE_EXPORT_SUPPORT set, for compatibility, we add a new INIT flag for such purpose. Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-05Merge tag 'v6.8-rc7' into gpio/for-nextBartosz Golaszewski
Linux 6.8-rc7
2024-03-05btrfs: qgroup: validate btrfs_qgroup_inherit parameterQu Wenruo
[BUG] Currently btrfs can create subvolume with an invalid qgroup inherit without triggering any error: # mkfs.btrfs -O quota -f $dev # mount $dev $mnt # btrfs subvolume create -i 2/0 $mnt/subv1 # btrfs qgroup show -prce --sync $mnt Qgroupid Referenced Exclusive Path -------- ---------- --------- ---- 0/5 16.00KiB 16.00KiB <toplevel> 0/256 16.00KiB 16.00KiB subv1 [CAUSE] We only do a very basic size check for btrfs_qgroup_inherit structure, but never really verify if the values are correct. Thus in btrfs_qgroup_inherit() function, we have to skip non-existing qgroups, and never return any error. [FIX] Fix the behavior and introduce extra checks: - Introduce early check for btrfs_qgroup_inherit structure Not only the size, but also all the qgroup ids would be verified. And the timing is very early, so we can return error early. This early check is very important for snapshot creation, as snapshot is delayed to transaction commit. - Drop support for btrfs_qgroup_inherit::num_ref_copies and num_excl_copies Those two members are used to specify to copy refr/excl numbers from other qgroups. This would definitely mark qgroup inconsistent, and btrfs-progs has dropped the support for them for a long time. It's time to drop the support for kernel. - Verify the supported btrfs_qgroup_inherit::flags Just in case we want to add extra flags for btrfs_qgroup_inherit. Now above subvolume creation would fail with -ENOENT other than silently ignore the non-existing qgroup. CC: stable@vger.kernel.org # 6.7+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-05fuse: implement ioctls to manage backing filesAmir Goldstein
FUSE server calls the FUSE_DEV_IOC_BACKING_OPEN ioctl with a backing file descriptor. If the call succeeds, a backing file identifier is returned. A later change will be using this backing file id in a reply to OPEN request with the flag FOPEN_PASSTHROUGH to setup passthrough of file operations on the open FUSE file to the backing file. The FUSE server should call FUSE_DEV_IOC_BACKING_CLOSE ioctl to close the backing file by its id. This can be done at any time, but if an open reply with FOPEN_PASSTHROUGH flag is still in progress, the open may fail if the backing file is closed before the fuse file was opened. Setting up backing files requires a server with CAP_SYS_ADMIN privileges. For the backing file to be successfully setup, the backing file must implement both read_iter and write_iter file operations. The limitation on the level of filesystem stacking allowed for the backing file is enforced before setting up the backing file. Signed-off-by: Alessio Balsini <balsini@android.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-03-03RDMA/hns: Support userspace configuring congestion control algorithm with QP ↵Junxian Huang
granularity Currently, congestion control algorithm is statically configured in FW, and all QPs use the same algorithm(except UD which has a fixed configuration of DCQCN). This is not flexible enough. Support userspace configuring congestion control algorithm with QP granularity while creating QPs. If the algorithm is not specified in userspace, use the default one. Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://lore.kernel.org/r/20240301104845.1141083-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-03-02Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-02-29 We've added 119 non-merge commits during the last 32 day(s) which contain a total of 150 files changed, 3589 insertions(+), 995 deletions(-). The main changes are: 1) Extend the BPF verifier to enable static subprog calls in spin lock critical sections, from Kumar Kartikeya Dwivedi. 2) Fix confusing and incorrect inference of PTR_TO_CTX argument type in BPF global subprogs, from Andrii Nakryiko. 3) Larger batch of riscv BPF JIT improvements and enabling inlining of the bpf_kptr_xchg() for RV64, from Pu Lehui. 4) Allow skeleton users to change the values of the fields in struct_ops maps at runtime, from Kui-Feng Lee. 5) Extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing, from Maxim Mikityanskiy & Eduard Zingerman. 6) Various BPF selftest improvements to fix errors under gcc BPF backend, from Jose E. Marchesi. 7) Avoid module loading failure when the module trying to register a struct_ops has its BTF section stripped, from Geliang Tang. 8) Annotate all kfuncs in .BTF_ids section which eventually allows for automatic kfunc prototype generation from bpftool, from Daniel Xu. 9) Several updates to the instruction-set.rst IETF standardization document, from Dave Thaler. 10) Shrink the size of struct bpf_map resp. bpf_array, from Alexei Starovoitov. 11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer, from Benjamin Tissoires. 12) Fix bpftool to be more portable to musl libc by using POSIX's basename(), from Arnaldo Carvalho de Melo. 13) Add libbpf support to gcc in CORE macro definitions, from Cupertino Miranda. 14) Remove a duplicate type check in perf_event_bpf_event, from Florian Lehner. 15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them with notrace correctly, from Yonghong Song. 16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible array to fix build warnings, from Kees Cook. 17) Fix resolve_btfids cross-compilation to non host-native endianness, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) selftests/bpf: Test if shadow types work correctly. bpftool: Add an example for struct_ops map and shadow type. bpftool: Generated shadow variables for struct_ops maps. libbpf: Convert st_ops->data to shadow type. libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. bpf: Replace bpf_lpm_trie_key 0-length array with flexible array bpf, arm64: use bpf_prog_pack for memory management arm64: patching: implement text_poke API bpf, arm64: support exceptions arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT bpf: add is_async_callback_calling_insn() helper bpf: introduce in_sleepable() helper bpf: allow more maps in sleepable bpf programs selftests/bpf: Test case for lacking CFI stub functions. bpf: Check cfi_stubs before registering a struct_ops type. bpf: Clarify batch lookup/lookup_and_delete semantics bpf, docs: specify which BPF_ABS and BPF_IND fields were zero bpf, docs: Fix typos in instruction-set.rst selftests/bpf: update tcp_custom_syncookie to use scalar packet offset bpf: Shrink size of struct bpf_map/bpf_array. ... ==================== Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-01Merge tag 'sound-6.8-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "The amount of changes wasn't as small as wished, but all reasonably small fixes. There is a PCM core API change, which is for correcting the behavior change we took in 6.8. The rest are device-specific fixes for ASoC AMD, Qualcomm, Cirrus codecs, HD-audio quirks & co" * tag 'sound-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ASoC: amd: yc: Fix non-functional mic on Lenovo 21J2 ASoC: amd: yc: add new YC platform variant (0x63) support ALSA: hda/realtek - ALC285 reduce pop noise from Headphone port ASoC: amd: yc: Add Lenovo ThinkBook 21J0 into DMI quirk table ALSA: hda/realtek: Add special fixup for Lenovo 14IRP8 ASoC: soc-card: Fix missing locking in snd_soc_card_get_kcontrol() ALSA: hda/realtek: tas2781: enable subwoofer volume control ALSA: pcm: clarify and fix default msbits value for all formats ASoC: qcom: Fix uninitialized pointer dmactl ALSA: hda/realtek: fix mute/micmute LED For HP mt440 ALSA: Drop leftover snd-rtctimer stuff from Makefile ALSA: ump: Fix the discard error code from snd_ump_legacy_open() ALSA: hda/realtek: Enable Mute LED on HP 840 G8 (MB 8AB8) ASoC: cs35l56: Must clear HALO_STATE before issuing SYSTEM_RESET ALSA: hda/realtek: Fix top speaker connection on Dell Inspiron 16 Plus 7630 ALSA: firewire-lib: fix to check cycle continuity
2024-03-01Merge tag 'drm-fixes-2024-03-01' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Bunch of fixes, xe, amdgpu, nouveau and tegra all have a few. Then drm/bridge including some drivers/soc fallout fixes. The biggest thing in here is a new unit test for some buddy allocator fixes, otherwise a misc fbcon, ttm unit test and one msm revert. Seems pretty normal for this stage. buddy: - two allocation fixes + unit test fbcon: - font restore syzkaller fix ttm: - kunit test fix bridge: - fix aux-hpd leaks - fix aux-hpd registration - fix use after free in soc/qcom - fix boot on soc/qcom xe: - A couple of tracepoint updates from Priyanka and Lucas - Make sure BINDs are completed before accepting UNBINDs on LR vms - Don't arbitrarily restrict max number of batched binds - Add uapi for dumpable bos (agreed on IRC) - Remove unused uapi flags and a leftover comment - A couple of fixes related to the execlist backend msm: - DP: Revert a change which was causing a HDP regression amdgpu: - Fix potential buffer overflow - Fix power min cap - Suspend/resume fix - SI PM fix - eDP fix nouveau: - fix a misreported VRAM sizing - fix a regression in suspend/resume due to freeing tegra: - host1x reset fix - only remove existing driver if display is possible" * tag 'drm-fixes-2024-03-01' of https://gitlab.freedesktop.org/drm/kernel: (32 commits) drm/nouveau: keep DMA buffers required for suspend/resume nouveau: report byte usage in VRAM usage drm/xe/xe_trace: Add move_lacks_source detail to xe_bo_move trace drm/xe: Deny unbinds if uapi ufence pending drm/xe: Expose user fence from xe_sync_entry drm/xe: Use pointers in trace events drm/xe/xe_bo_move: Enhance xe_bo_move trace drm/xe/mmio: fix build warning for BAR resize on 32-bit drm/xe: get rid of MAX_BINDS drm/xe: Use vmalloc for array of bind allocation in bind IOCTL drm/xe: Don't support execlists in xe_gt_tlb_invalidation layer drm/xe: Fix execlist splat drm/xe/uapi: Remove unused flags drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left over drm/xe: Add uapi for dumpable bos drm/amd/display: Add monitor patch for specific eDP Revert "drm/msm/dp: use drm_bridge_hpd_notify() to report HPD status changes" drm/tests/drm_buddy: add alloc_range_bias test drm/buddy: check range allocation matches alignment drm/buddy: fix range bias ...
2024-03-01pidfd: add pidfsChristian Brauner
This moves pidfds from the anonymous inode infrastructure to a tiny pseudo filesystem. This has been on my todo for quite a while as it will unblock further work that we weren't able to do simply because of the very justified limitations of anonymous inodes. Moving pidfds to a tiny pseudo filesystem allows: * statx() on pidfds becomes useful for the first time. * pidfds can be compared simply via statx() and then comparing inode numbers. * pidfds have unique inode numbers for the system lifetime. * struct pid is now stashed in inode->i_private instead of file->private_data. This means it is now possible to introduce concepts that operate on a process once all file descriptors have been closed. A concrete example is kill-on-last-close. * file->private_data is freed up for per-file options for pidfds. * Each struct pid will refer to a different inode but the same struct pid will refer to the same inode if it's opened multiple times. In contrast to now where each struct pid refers to the same inode. Even if we were to move to anon_inode_create_getfile() which creates new inodes we'd still be associating the same struct pid with multiple different inodes. The tiny pseudo filesystem is not visible anywhere in userspace exactly like e.g., pipefs and sockfs. There's no lookup, there's no complex inode operations, nothing. Dentries and inodes are always deleted when the last pidfd is closed. We allocate a new inode for each struct pid and we reuse that inode for all pidfds. We use iget_locked() to find that inode again based on the inode number which isn't recycled. We allocate a new dentry for each pidfd that uses the same inode. That is similar to anonymous inodes which reuse the same inode for thousands of dentries. For pidfds we're talking way less than that. There usually won't be a lot of concurrent openers of the same struct pid. They can probably often be counted on two hands. I know that systemd does use separate pidfd for the same struct pid for various complex process tracking issues. So I think with that things actually become way simpler. Especially because we don't have to care about lookup. Dentries and inodes continue to be always deleted. The code is entirely optional and fairly small. If it's not selected we fallback to anonymous inodes. Heavily inspired by nsfs which uses a similar stashing mechanism just for namespaces. Link: https://lore.kernel.org/r/20240213-vfs-pidfd_fs-v1-2-f863f58cfce1@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-01Merge tag 'drm-xe-fixes-2024-02-29' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes UAPI Changes: - A couple of tracepoint updates from Priyanka and Lucas. - Make sure BINDs are completed before accepting UNBINDs on LR vms. - Don't arbitrarily restrict max number of batched binds. - Add uapi for dumpable bos (agreed on IRC). - Remove unused uapi flags and a leftover comment. Driver Changes: - A couple of fixes related to the execlist backend. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZeCBg4MA2hd1oggN@fedora
2024-03-01Merge tag 'drm-intel-gt-next-2024-02-28' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: Fixes: - Add some boring kerneldoc (Tvrtko Ursulin) - Check before removing mm notifier (Nirmoy Signed-off-by: Dave Airlie <airlied@redhat.com> From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/Zd889Wvu/ZKZSK4/@tursulin-desk
2024-02-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/mptcp/protocol.c adf1bb78dab5 ("mptcp: fix snd_wnd initialization for passive socket") 9426ce476a70 ("mptcp: annotate lockless access for RX path fields") https://lore.kernel.org/all/20240228103048.19255709@canb.auug.org.au/ Adjacent changes: drivers/dpll/dpll_core.c 0d60d8df6f49 ("dpll: rely on rcu for netdev_dpll_pin()") e7f8df0e81bf ("dpll: move xa_erase() call in to match dpll_pin_alloc() error path order") drivers/net/veth.c 1ce7d306ea63 ("veth: try harder when allocating queue memory") 0bef512012b1 ("net: add netdev_lockdep_set_classes() to virtual drivers") drivers/net/wireless/intel/iwlwifi/mvm/d3.c 8c9bef26e98b ("wifi: iwlwifi: mvm: d3: implement suspend with MLO") 78f65fbf421a ("wifi: iwlwifi: mvm: ensure offloading TID queue exists") net/wireless/nl80211.c f78c1375339a ("wifi: nl80211: reject iftype change with mesh ID change") 414532d8aa89 ("wifi: cfg80211: use IEEE80211_MAX_MESH_ID_LEN appropriately") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-29bpf: Replace bpf_lpm_trie_key 0-length array with flexible arrayKees Cook
Replace deprecated 0-length array in struct bpf_lpm_trie_key with flexible array. Found with GCC 13: ../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=] 207 | *(__be16 *)&key->data[i]); | ^~~~~~~~~~~~~ ../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16' 102 | #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x)) | ^ ../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu' 97 | #define be16_to_cpu __be16_to_cpu | ^~~~~~~~~~~~~ ../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu' 206 | u16 diff = be16_to_cpu(*(__be16 *)&node->data[i] ^ | ^~~~~~~~~~~ In file included from ../include/linux/bpf.h:7: ../include/uapi/linux/bpf.h:82:17: note: while referencing 'data' 82 | __u8 data[0]; /* Arbitrary size */ | ^~~~ And found at run-time under CONFIG_FORTIFY_SOURCE: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49 index 0 is out of range for type '__u8 [*]' Changing struct bpf_lpm_trie_key is difficult since has been used by userspace. For example, in Cilium: struct egress_gw_policy_key { struct bpf_lpm_trie_key lpm_key; __u32 saddr; __u32 daddr; }; While direct references to the "data" member haven't been found, there are static initializers what include the final member. For example, the "{}" here: struct egress_gw_policy_key in_key = { .lpm_key = { 32 + 24, {} }, .saddr = CLIENT_IP, .daddr = EXTERNAL_SVC_IP & 0Xffffff, }; To avoid the build time and run time warnings seen with a 0-sized trailing array for struct bpf_lpm_trie_key, introduce a new struct that correctly uses a flexible array for the trailing bytes, struct bpf_lpm_trie_key_u8. As part of this, include the "header" portion (which is just the "prefixlen" member), so it can be used by anything building a bpf_lpr_trie_key that has trailing members that aren't a u8 flexible array (like the self-test[1]), which is named struct bpf_lpm_trie_key_hdr. Unfortunately, C++ refuses to parse the __struct_group() helper, so it is not possible to define struct bpf_lpm_trie_key_hdr directly in struct bpf_lpm_trie_key_u8, so we must open-code the union directly. Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out, and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment to the UAPI header directing folks to the two new options. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Closes: https://paste.debian.net/hidden/ca500597/ Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1] Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org
2024-02-29Merge tag 'net-6.8-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, WiFi and netfilter. We have one outstanding issue with the stmmac driver, which may be a LOCKDEP false positive, not a blocker. Current release - regressions: - netfilter: nf_tables: re-allow NFPROTO_INET in nft_(match/target)_validate() - eth: ionic: fix error handling in PCI reset code Current release - new code bugs: - eth: stmmac: complete meta data only when enabled, fix null-deref - kunit: fix again checksum tests on big endian CPUs Previous releases - regressions: - veth: try harder when allocating queue memory - Bluetooth: - hci_bcm4377: do not mark valid bd_addr as invalid - hci_event: fix handling of HCI_EV_IO_CAPA_REQUEST Previous releases - always broken: - info leak in __skb_datagram_iter() on netlink socket - mptcp: - map v4 address to v6 when destroying subflow - fix potential wake-up event loss due to sndbuf auto-tuning - fix double-free on socket dismantle - wifi: nl80211: reject iftype change with mesh ID change - fix small out-of-bound read when validating netlink be16/32 types - rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back - ipv6: fix potential "struct net" ref-leak in inet6_rtm_getaddr() - ip_tunnel: prevent perpetual headroom growth with huge number of tunnels on top of each other - mctp: fix skb leaks on error paths of mctp_local_output() - eth: ice: fixes for DPLL state reporting - dpll: rely on rcu for netdev_dpll_pin() to prevent UaF - eth: dpaa: accept phy-interface-type = '10gbase-r' in the device tree" * tag 'net-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (73 commits) dpll: fix build failure due to rcu_dereference_check() on unknown type kunit: Fix again checksum tests on big endian CPUs tls: fix use-after-free on failed backlog decryption tls: separate no-async decryption request handling from async tls: fix peeking with sync+async decryption tls: decrement decrypt_pending if no async completion will be called gtp: fix use-after-free and null-ptr-deref in gtp_newlink() net: hsr: Use correct offset for HSR TLV values in supervisory HSR frames igb: extend PTP timestamp adjustments to i211 rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing back tools: ynl: fix handling of multiple mcast groups selftests: netfilter: add bridge conntrack + multicast test case netfilter: bridge: confirm multicast packets before passing them up the stack netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate() Bluetooth: qca: Fix triggering coredump implementation Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DT Bluetooth: qca: Fix wrong event type for patch config command Bluetooth: Enforce validation on max value of connection interval Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST Bluetooth: mgmt: Fix limited discoverable off timeout ...
2024-02-29drm/xe/uapi: Remove unused flagsFrancois Dugast
Those cases missed in previous uAPI cleanups were mostly accidentally brought in from i915 or created to exercise the possibilities of gpuvm but they are not used by userspace yet, so let's remove them. They can still be brought back later if needed. v2: - Fix XE_VM_FLAG_FAULT_MODE support in xe_lrc.c (Brian Welty) - Leave DRM_XE_VM_BIND_OP_UNMAP_ALL (José Roberto de Souza) - Ensure invalid flag values are rejected (Rodrigo Vivi) v3: Rebase after removal of persistent exec_queues (Francois Dugast) v4: Rodrigo: Rebase after the new dumpable flag. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240222232356.175431-1-rodrigo.vivi@intel.com (cherry picked from commit 84a1ed5e67565b09b8fd22a26754d2897de55ce0) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-29drm/xe/uapi: Remove DRM_XE_VM_BIND_FLAG_ASYNC comment left overJosé Roberto de Souza
This is a comment left over of commit d3d767396a02 ("drm/xe/uapi: Remove sync binds"). Fixes: d3d767396a02 ("drm/xe/uapi: Remove sync binds") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231226172321.61518-1-jose.souza@intel.com (cherry picked from commit f031c3a7af8ea06790dd0a71872c4f0175084baa) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-29drm/xe: Add uapi for dumpable bosMaarten Lankhorst
Add the flag XE_VM_BIND_FLAG_DUMPABLE to notify devcoredump that this mapping should be dumped. This is not hooked up, but the uapi should be ready before merging. It's likely easier to dump the contents of the bo's at devcoredump readout time, so it's better if the bos will stay unmodified after a hang. The NEEDS_CPU_MAPPING flag is removed as requirement. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240221133024.898315-3-maarten.lankhorst@linux.intel.com (cherry picked from commit 76a86b58d2b3de31e88acb487ebfa0c3cc7c41d2) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2024-02-28ublk: add UBLK_CMD_DEL_DEV_ASYNCMing Lei
The current command UBLK_CMD_DEL_DEV won't return until the device is released, this way looks more reliable, but makes userspace more difficult to implement, especially about orders: unmap command buffer(which holds one ublkc reference), ublkc close, io_uring_file_unregister, ublkb close. Add UBLK_CMD_DEL_DEV_ASYNC so that device deletion won't wait release, then userspace needn't worry about the above order. Actually both loop and nbd is deleted in this async way. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240223075539.89945-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-02-28uapi: ioam6: API for netlink multicast eventsJustin Iurman
Add new api to support ioam6 events for generic netlink multicast. A first "trace" event is added to the list of ioam6 events, which will represent an IOAM Pre-allocated Trace Option-Type. It provides another solution to share IOAM data with user space. Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-27uapi: in6: replace temporary label with rfc9486Justin Iurman
Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY IANA allocation for IOAM", while RFC 9486 is available for some time now. Just update the reference. Fixes: 9ee11f0fff20 ("ipv6: ioam: Data plane support for Pre-allocated Trace") Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240226124921.9097-1-justin.iurman@uliege.be Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-27Merge 6.8-rc6 into tty-nextGreg Kroah-Hartman
We need the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-26Merge v6.8-rc6 into drm-nextDaniel Vetter
Thomas Zimmermann asked to backmerge -rc6 for drm-misc branches, there's a few same-area-changed conflicts (xe and amdgpu mostly) that are getting a bit too annoying. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2024-02-26Merge v6.8-rc6 into usb-nextGreg Kroah-Hartman
We need it here for the USB fixes, and it resolves a merge conflict as reported in linux-next in drivers/usb/roles/class.c Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23fuse: introduce FUSE_PASSTHROUGH capabilityAmir Goldstein
FUSE_PASSTHROUGH capability to passthrough FUSE operations to backing files will be made available with kernel config CONFIG_FUSE_PASSTHROUGH. When requesting FUSE_PASSTHROUGH, userspace needs to specify the max_stack_depth that is allowed for FUSE on top of backing files. Introduce the flag FOPEN_PASSTHROUGH and backing_id to fuse_open_out argument that can be used when replying to OPEN request, to setup passthrough of io operations on the fuse inode to a backing file. Introduce a refcounted fuse_backing object that will be used to associate an open backing file with a fuse inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2024-02-23media: rkisp1: Add version enum for i.MX8MP ISPPaul Elder
The NXP i.MX8MP integrates an ISP8000Nano from VeriSilicon, which is a derivative of the ISP found in earlier Rockchip SoCs. It isn't clear at which exact point the two product lines have diverged, and there is no public information regarding the version numbering scheme of the ISP8000Nano. Nonetheless, this ISP is close enough to the V10 found in the RK3399 to be supported by the same driver. Add an entry for the ISP found in the NXP i.MX8MP to the version enum. Given the lack of information on the version numbering scheme, and on whether or not the version in the i.MX8MP is identical to other ISP8000Nano versions or has been customized for the i.MX8MP, depart from the number-based versions and name this new version V_IMX8MP. Update comments for the other versions and for relevant parameters blocks to clearly indicate the size of grids and histogram for the different versions. Signed-off-by: Paul Elder <paul.elder@ideasonboard.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com>