summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)Author
2022-09-01Revert "io_uring: rename IORING_OP_FILES_UPDATE"Pavel Begunkov
This reverts commit 4379d5f15b3fd4224c37841029178aa8082a242e. We removed notification flushing, also cleanup uapi preparation changes to not pollute it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89edc3905350f91e1b6e26d9dbf42ee44fd451a2.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-09-01Revert "io_uring: add zc notification flush requests"Pavel Begunkov
This reverts commit 492dddb4f6e3a5839c27d41ff1fecdbe6c3ab851. Soon we won't have the very notion of notification flushing, so remove notification flushing requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8850334ca56e65b413cb34fd158db81d7b2865a3.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-31usbip: add USBIP_URB_* URB transfer flagsShuah Khan
USBIP driver packs URB transfer flags in network packets that are exchanged between Server (usbip_host) and Client (vhci_hcd). URB_* flags are internal to kernel and could change. Where as USBIP URB flags exchanged in network packets are USBIP user API must not change. Add USBIP_URB* flags to make this an explicit API and change the client and server to map them. Details as follows: Client tx path (USBIP_CMD_SUBMIT): - Maps URB_* to USBIP_URB_* when it sends USBIP_CMD_SUBMIT packet. Server rx path (USBIP_CMD_SUBMIT): - Maps USBIP_URB_* to URB_* when it receives USBIP_CMD_SUBMIT packet. Flags aren't included in USBIP_CMD_UNLINK and USBIP_RET_SUBMIT packets and no special handling is needed for them in the following cases: - Server rx path (USBIP_CMD_UNLINK) - Client rx path & Server tx path (USBIP_RET_SUBMIT) Update protocol documentation to reflect the change. Suggested-by: Hongren Zenithal Zheng <i@zenithal.me> Suggested-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220824002456.94605-1-skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-30net: virtio_net: fix notification coalescing commentsAlvaro Karsz
Fix wording in comments for the notifications coalescing feature. Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220823073947.14774-1-alvaro.karsz@solid-run.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-30netlink: add support for ext_ack missing attributesJakub Kicinski
There is currently no way to report via extack in a structured way that an attribute is missing. This leads to families resorting to string messages. Add a pair of attributes - @offset and @type for machine-readable way of reporting missing attributes. The @offset points to the nest which should have contained the attribute, @type is the expected nla_type. The offset will be skipped if the attribute is missing at the message level rather than inside a nest. User space should be able to figure out which attribute enum (AKA attribute space AKA attribute set) the nest pointed to by @offset is using. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-30media: videodev2.h: drop V4L2_CAP_ASYNCIOHans Verkuil
The V4L2_CAP_ASYNCIO capability was never implemented (and in fact it isn't clear what it was supposed to do in the first place). Drop it from the capabilities list. Keep it in videodev2.h with the other defines under ifndef __KERNEL__ for backwards compatibility. This will free up a capability bit for other future uses. And having an unused and undefined I/O method is just plain confusing. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-08-30media: v4l2-ctrls: Fix typo in VP8 commentDeborah Brouwer
The comment for the VP8 loop filter flags uses the partially wrong name for the flags. Unlike the other VP8 flag names, the loop filter flag names don't have "_FLAG" in them. Change the comment so that it matches the actual flag definitions in the header. Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com> Reviewed-by: Nicolas Dufresne <nicolas.dufresne@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-08-29media: uapi: Add a control for DW100 driverXavier Roumegue
The DW100 driver gets the dewarping mapping as a binary blob from the userspace application through a custom control. The blob format is hardware specific so create a dedicated control for this purpose. Signed-off-by: Xavier Roumegue <xavier.roumegue@oss.nxp.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-08-29media: v4l: uapi: Add user control base for DW100 controlsXavier Roumegue
Add a control base for DW100 driver controls, and reserve 16 controls. Signed-off-by: Xavier Roumegue <xavier.roumegue@oss.nxp.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-08-29xfrm: lwtunnel: add lwtunnel support for xfrm interfaces in collect_md modeEyal Birger
Allow specifying the xfrm interface if_id and link as part of a route metadata using the lwtunnel infrastructure. This allows for example using a single xfrm interface in collect_md mode as the target of multiple routes each specifying a different if_id. With the appropriate changes to iproute2, considering an xfrm device ipsec1 in collect_md mode one can for example add a route specifying an if_id like so: ip route add <SUBNET> dev ipsec1 encap xfrm if_id 1 In which case traffic routed to the device via this route would use if_id in the xfrm interface policy lookup. Or in the context of vrf, one can also specify the "link" property: ip route add <SUBNET> dev ipsec1 encap xfrm if_id 1 link_dev eth15 Note: LWT_XFRM_LINK uses NLA_U32 similar to IFLA_XFRM_LINK even though internally "link" is signed. This is consistent with other _LINK attributes in other devices as well as in bpf and should not have an effect as device indexes can't be negative. Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-29xfrm: interface: support collect metadata modeEyal Birger
This commit adds support for 'collect_md' mode on xfrm interfaces. Each net can have one collect_md device, created by providing the IFLA_XFRM_COLLECT_METADATA flag at creation. This device cannot be altered and has no if_id or link device attributes. On transmit to this device, the if_id is fetched from the attached dst metadata on the skb. If exists, the link property is also fetched from the metadata. The dst metadata type used is METADATA_XFRM which holds these properties. On the receive side, xfrmi_rcv_cb() populates a dst metadata for each packet received and attaches it to the skb. The if_id used in this case is fetched from the xfrm state, and the link is fetched from the incoming device. This information can later be used by upper layers such as tc, ebpf, and ip rules. Because the skb is scrubed in xfrmi_rcv_cb(), the attachment of the dst metadata is postponed until after scrubing. Similarly, xfrm_input() is adapted to avoid dropping metadata dsts by only dropping 'valid' (skb_valid_dst(skb) == true) dsts. Policy matching on packets arriving from collect_md xfrmi devices is done by using the xfrm state existing in the skb's sec_path. The xfrm_if_cb.decode_cb() interface implemented by xfrmi_decode_session() is changed to keep the details of the if_id extraction tucked away in xfrm_interface.c. Reviewed-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2022-08-29perf: Add PERF_BR_NEW_ARCH_[N] map for BRBE on arm64 platformAnshuman Khandual
BRBE captured branch types will overflow perf_branch_entry.type and generic branch types in perf_branch_entry.new_type. So override each available arch specific branch type in the following manner to comprehensively process all reported branch types in BRBE. PERF_BR_ARM64_FIQ PERF_BR_NEW_ARCH_1 PERF_BR_ARM64_DEBUG_HALT PERF_BR_NEW_ARCH_2 PERF_BR_ARM64_DEBUG_EXIT PERF_BR_NEW_ARCH_3 PERF_BR_ARM64_DEBUG_INST PERF_BR_NEW_ARCH_4 PERF_BR_ARM64_DEBUG_DATA PERF_BR_NEW_ARCH_5 Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: James Clark <james.clark@arm.com> Link: https://lkml.kernel.org/r/20220824044822.70230-5-anshuman.khandual@arm.com
2022-08-29perf: Capture branch privilege informationAnshuman Khandual
Platforms like arm64 could capture privilege level information for all the branch records. Hence this adds a new element in the struct branch_entry to record the privilege level information, which could be requested through a new event.attr.branch_sample_type based flag PERF_SAMPLE_BRANCH_PRIV_SAVE. This flag helps user choose whether privilege information is captured. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: James Clark <james.clark@arm.com> Link: https://lkml.kernel.org/r/20220824044822.70230-4-anshuman.khandual@arm.com
2022-08-29perf: Extend branch type classificationAnshuman Khandual
branch_entry.type now has ran out of space to accommodate more branch types classification. This will prevent perf branch stack implementation on arm64 (via BRBE) to capture all available branch types. Extending this bit field i.e branch_entry.type [4 bits] is not an option as it will break user space ABI both for little and big endian perf tools. Extend branch classification with a new field branch_entry.new_type via a new branch type PERF_BR_EXTEND_ABI in branch_entry.type. Perf tools which could decode PERF_BR_EXTEND_ABI, will then parse branch_entry.new_type as well. branch_entry.new_type is a 4 bit field which can hold upto 16 branch types. The first three branch types will hold various generic page faults followed by five architecture specific branch types, which can be overridden by the platform for specific use cases. These architecture specific branch types gets overridden on arm64 platform for BRBE implementation. New generic branch types - PERF_BR_NEW_FAULT_ALGN - PERF_BR_NEW_FAULT_DATA - PERF_BR_NEW_FAULT_INST New arch specific branch types - PERF_BR_NEW_ARCH_1 - PERF_BR_NEW_ARCH_2 - PERF_BR_NEW_ARCH_3 - PERF_BR_NEW_ARCH_4 - PERF_BR_NEW_ARCH_5 Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: James Clark <james.clark@arm.com> Link: https://lkml.kernel.org/r/20220824044822.70230-3-anshuman.khandual@arm.com
2022-08-29perf: Add system error and not in transaction branch typesAnshuman Khandual
This expands generic branch type classification by adding two more entries there in i.e system error and not in transaction. This also updates the x86 implementation to process X86_BR_NO_TX records as appropriate. This changes branch types reported to user space on x86 platform but it should not be a problem. The possible scenarios and impacts are enumerated here. -------------------------------------------------------------------------- | kernel | perf tool | Impact | -------------------------------------------------------------------------- | old | old | Works as before | -------------------------------------------------------------------------- | old | new | PERF_BR_UNKNOWN is processed | -------------------------------------------------------------------------- | new | old | PERF_BR_NO_TX is blocked via old PERF_BR_MAX | -------------------------------------------------------------------------- | new | new | PERF_BR_NO_TX is recognized | -------------------------------------------------------------------------- When PERF_BR_NO_TX is blocked via old PERF_BR_MAX (new kernel with old perf tool) the user space might throw up an warning complaining about an unrecognized branch types being reported, but it's expected. PERF_BR_SERROR & PERF_BR_NO_TX branch types will be used for BRBE implementation on arm64 platform. PERF_BR_NO_TX complements 'abort' and 'in_tx' elements in perf_branch_entry which represent other transaction states for a given branch record. Because this completes the transaction state classification. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: James Clark <james.clark@arm.com> Link: https://lkml.kernel.org/r/20220824044822.70230-2-anshuman.khandual@arm.com
2022-08-26bpf: Fix a few typos in BPF helpers documentationQuentin Monnet
Address a few typos in the documentation for the BPF helper functions. They were reported by Jakub [0], who ran spell checkers on the generated man page [1]. [0] https://lore.kernel.org/linux-man/d22dcd47-023c-8f52-d369-7b5308e6c842@gmail.com/T/#mb02e7d4b7fb61d98fa914c77b581184e9a9537af [1] https://lore.kernel.org/linux-man/eb6a1e41-c48e-ac45-5154-ac57a2c76108@gmail.com/T/#m4a8d1b003616928013ffcd1450437309ab652f9f v3: Do not copy unrelated (and breaking) elements to tools/ header v2: Turn a ',' into a ';' Reported-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220825220806.107143-1-quentin@isovalent.com
2022-08-26openvswitch: allow specifying ifindex of new interfacesAndrey Zhadchenko
CRIU is preserving ifindexes of net devices after restoration. However, current Open vSwitch API does not allow to target ifindex, so we cannot correctly restore OVS configuration. Add new OVS_DP_ATTR_IFINDEX for OVS_DP_CMD_NEW and use it as desired ifindex. Use OVS_VPORT_ATTR_IFINDEX during OVS_VPORT_CMD_NEW to specify new netdev ifindex. Signed-off-by: Andrey Zhadchenko <andrey.zhadchenko@virtuozzo.com> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-27perf/core: Add speculation info to branch entriesSandipan Das
Add a new "spec" bitfield to branch entries for providing speculation information. This will be populated using hints provided by branch sampling features on supported hardware. The following cases are covered: * No branch speculation information is available * Branch is speculative but taken on the wrong path * Branch is non-speculative but taken on the correct path * Branch is speculative and taken on the correct path Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/834088c302faf21c7b665031dd111f424e509a64.1660211399.git.sandipan.das@amd.com
2022-08-26Merge tag 'io_uring-6.0-2022-08-26' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Add missing header file to the MAINTAINERS entry for io_uring (Ammar) - liburing and the kernel ship the same io_uring.h header, but one change we've had for a long time only in liburing is to ensure it's C++ safe. Add extern C around it, so we can more easily sync them in the future (Ammar) - Fix an off-by-one in the sync cancel added in this merge window (me) - Error handling fix for passthrough (Kanchan) - Fix for address saving for async execution for the zc tx support (Pavel) - Fix ordering for TCP zc notifications, so we always have them ordered correctly between "data was sent" and "data was acked". This isn't strictly needed with the notification slots, but we've been pondering disabling the slot support for 6.0 - and if we do, then we do require the ordering to be sane. Regardless of that, it's the sane thing to do in terms of API (Pavel) - Minor cleanup for indentation and lockdep annotation (Pavel) * tag 'io_uring-6.0-2022-08-26' of git://git.kernel.dk/linux-block: io_uring/net: save address for sendzc async execution io_uring: conditional ->async_data allocation io_uring/notif: order notif vs send CQEs io_uring/net: fix indentation io_uring/net: fix zc send link failing io_uring/net: fix must_hold annotation io_uring: fix submission-failure handling for uring-cmd io_uring: fix off-by-one in sync cancelation file check io_uring: uapi: Add `extern "C"` in io_uring.h for liburing MAINTAINERS: Add `include/linux/io_uring_types.h`
2022-08-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel borkmann says: ==================== The following pull-request contains BPF updates for your *net* tree. We've added 11 non-merge commits during the last 14 day(s) which contain a total of 13 files changed, 61 insertions(+), 24 deletions(-). The main changes are: 1) Fix BPF verifier's precision tracking around BPF ring buffer, from Kumar Kartikeya Dwivedi. 2) Fix regression in tunnel key infra when passing FLOWI_FLAG_ANYSRC, from Eyal Birger. 3) Fix insufficient permissions for bpf_sys_bpf() helper, from YiFei Zhu. 4) Fix splat from hitting BUG when purging effective cgroup programs, from Pu Lehui. 5) Fix range tracking for array poke descriptors, from Daniel Borkmann. 6) Fix corrupted packets for XDP_SHARED_UMEM in aligned mode, from Magnus Karlsson. 7) Fix NULL pointer splat in BPF sockmap sk_msg_recvmsg(), from Liu Jian. 8) Add READ_ONCE() to bpf_jit_limit when reading from sysctl, from Kuniyuki Iwashima. 9) Add BPF selftest lru_bug check to s390x deny list, from Daniel Müller. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-26Merge tag 'wireless-next-2022-08-26-v2' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes berg says: ==================== Various updates: * rtw88: operation, locking, warning, and code style fixes * rtw89: small updates * cfg80211/mac80211: more EHT/MLO (802.11be, WiFi 7) work * brcmfmac: a couple of fixes * misc cleanups etc. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-25bpf: Add CGROUP prefix to cgroup_iter_orderHao Luo
bpf_cgroup_iter_order is globally visible but the entries do not have CGROUP prefix. As requested by Andrii, put a CGROUP in the names in bpf_cgroup_iter_order. This patch fixes two previous commits: one introduced the API and the other uses the API in bpf selftest (that is, the selftest cgroup_hierarchical_stats). I tested this patch via the following command: test_progs -t cgroup,iter,btf_dump Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter") Fixes: 88886309d2e8 ("selftests/bpf: add a selftest for cgroup hierarchical stats collection") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220825223936.1865810-1-haoluo@google.com Signed-off-by: Martin KaFai Lau <kafai@fb.com>
2022-08-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c 21234e3a84c7 ("net/mlx5e: Fix use after free in mlx5e_fs_init()") c7eafc5ed068 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer") https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/ https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25bpf: Introduce cgroup iterHao Luo
Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes: - walking a cgroup's descendants in pre-order. - walking a cgroup's descendants in post-order. - walking a cgroup's ancestors. - process only the given cgroup. When attaching cgroup_iter, one can set a cgroup to the iter_link created from attaching. This cgroup is passed as a file descriptor or cgroup id and serves as the starting point of the walk. If no cgroup is specified, the starting point will be the root cgroup v2. For walking descendants, one can specify the order: either pre-order or post-order. For walking ancestors, the walk starts at the specified cgroup and ends at the root. One can also terminate the walk early by returning 1 from the iter program. Note that because walking cgroup hierarchy holds cgroup_mutex, the iter program is called with cgroup_mutex held. Currently only one session is supported, which means, depending on the volume of data bpf program intends to send to user space, the number of cgroups that can be walked is limited. For example, given the current buffer size is 8 * PAGE_SIZE, if the program sends 64B data for each cgroup, assuming PAGE_SIZE is 4kb, the total number of cgroups that can be walked is 512. This is a limitation of cgroup_iter. If the output data is larger than the kernel buffer size, after all data in the kernel buffer is consumed by user space, the subsequent read() syscall will signal EOPNOTSUPP. In order to work around, the user may have to update their program to reduce the volume of data sent to output. For example, skip some uninteresting cgroups. In future, we may extend bpf_iter flags to allow customizing buffer size. Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Hao Luo <haoluo@google.com> Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-25wifi: cfg80211: Add link_id parameter to various key operations for MLOVeerendranath Jakkam
Add support for various key operations on MLD by adding new parameter link_id. Pass the link_id received from userspace to driver for add_key, get_key, del_key, set_default_key, set_default_mgmt_key and set_default_beacon_key to support configuring keys specific to each MLO link. Userspace must not specify link ID for MLO pairwise key since it is common for all the MLO links. Signed-off-by: Veerendranath Jakkam <quic_vjakkam@quicinc.com> Link: https://lore.kernel.org/r/20220730052643.1959111-4-quic_vjakkam@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-08-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2022-08-24 1) Fix a refcount leak in __xfrm_policy_check. From Xin Xiong. 2) Revert "xfrm: update SA curlft.use_time". This violates RFC 2367. From Antony Antony. 3) Fix a comment on XFRMA_LASTUSED. From Antony Antony. 4) x->lastused is not cloned in xfrm_do_migrate. Fix from Antony Antony. 5) Serialize the calls to xfrm_probe_algs. From Herbert Xu. 6) Fix a null pointer dereference of dst->dev on a metadata dst in xfrm_lookup_with_ifid. From Nikolay Aleksandrov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-23net: improve and fix netlink kdocJakub Kicinski
Subsequent patch will render the kdoc from include/uapi/linux/netlink.h into Documentation. We need to fix the warnings. While at it move the comments on struct nlmsghdr to a proper kdoc comment. Link: https://lore.kernel.org/r/20220819200221.422801-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-23bpf: update bpf_{g,s}et_retval documentationStanislav Fomichev
* replace 'syscall' with 'upper layers', still mention that it's being exported via syscall errno * describe what happens in set_retval(-EPERM) + return 1 * describe what happens with bind's 'return 3' Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220823222555.523590-5-sdf@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-23bpf, flow_dissector: Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode for bpf progsShmulik Ladkani
Currently, attaching BPF_PROG_TYPE_FLOW_DISSECTOR programs completely replaces the flow-dissector logic with custom dissection logic. This forces implementors to write programs that handle dissection for any flows expected in the namespace. It makes sense for flow-dissector BPF programs to just augment the dissector with custom logic (e.g. dissecting certain flows or custom protocols), while enjoying the broad capabilities of the standard dissector for any other traffic. Introduce BPF_FLOW_DISSECTOR_CONTINUE retcode. Flow-dissector BPF programs may return this to indicate no dissection was made, and fallback to the standard dissector is requested. Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20220821113519.116765-3-shmulik.ladkani@gmail.com
2022-08-23fs: dlm: remove DLM_LSFL_FS from uapiAlexander Aring
The DLM_LSFL_FS flag is set in lockspaces created directly for a kernel user, as opposed to those lockspaces created for user space applications. The user space libdlm allowed this flag to be set for lockspaces created from user space, but then used by a kernel user. No kernel user has ever used this method, so remove the ability to do it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2022-08-23io_uring: uapi: Add `extern "C"` in io_uring.h for liburingAmmar Faizi
Make it easy for liburing to integrate uapi header with the kernel. Previously, when this header changes, the liburing side can't directly copy this header file due to some small differences. Sync them. Link: https://lore.kernel.org/io-uring/f1feef16-6ea2-0653-238f-4aaee35060b6@kernel.dk Cc: Bart Van Assche <bvanassche@acm.org> Cc: Dylan Yudaken <dylany@fb.com> Cc: Facebook Kernel Team <kernel-team@fb.com> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-22block: sed-opal: Add ioctl to return device statusdougmill@linux.vnet.ibm.com
Provide a mechanism to retrieve basic status information about the device, including the "supported" flag indicating whether SED-OPAL is supported. The information returned is from the various feature descriptors received during the discovery0 step, and so this ioctl does nothing more than perform the discovery0 step and then save the information received. See "struct opal_status" and OPAL_FL_* bits for the status information currently returned. This is necessary to be able to check whether a device is OPAL enabled, set up, locked or unlocked from userspace programs like systemd-cryptsetup and libcryptsetup. Right now we just have to assume the user 'knows' or blindly attempt setup/lock/unlock operations. Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com> Tested-by: Luca Boccassi <bluca@debian.org> Reviewed-by: Scott Bauer <sbauer@plzdonthack.me> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org> Link: https://lore.kernel.org/r/20220816140713.84893-1-luca.boccassi@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-22Remove DECnet support from kernelStephen Hemminger
DECnet is an obsolete network protocol that receives more attention from kernel janitors than users. It belongs in computer protocol history museum not in Linux kernel. It has been "Orphaned" in kernel since 2010. The iproute2 support for DECnet was dropped in 5.0 release. The documentation link on Sourceforge says it is abandoned there as well. Leave the UAPI alone to keep userspace programs compiling. This means that there is still an empty neighbour table for AF_DECNET. The table of /proc/sys/net entries was updated to match current directories and reformatted to be alphabetical. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: David Ahern <dsahern@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-19media: v4l2-ctrls: add change flag for when dimensions changeHans Verkuil
Add a new V4L2_EVENT_CTRL_CH_DIMENSIONS change flag that is issued when the dimensions of an array change as a result of a __v4l2_ctrl_modify_dimensions() call. This will inform userspace that there are new dimensions. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
2022-08-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-18net: macsec: Expose MACSEC_SALT_LEN definition to user spaceEmeel Hakim
Expose MACSEC_SALT_LEN definition to user space to be used in various user space applications such as iproute. Iproute will use this as part of adding macsec extended packet number support. Reviewed-by: Raed Salem <raeds@nvidia.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Link: https://lore.kernel.org/r/20220818153229.4721-1-ehakim@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Andrii Nakryiko says: ==================== bpf-next 2022-08-17 We've added 45 non-merge commits during the last 14 day(s) which contain a total of 61 files changed, 986 insertions(+), 372 deletions(-). The main changes are: 1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt Kanzenbach and Jesper Dangaard Brouer. 2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko. 3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov. 4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires. 5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle unsupported names on old kernels, from Hangbin Liu. 6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton, from Hao Luo. 7) Relax libbpf's requirement for shared libs to be marked executable, from Henqgi Chen. 8) Improve bpf_iter internals handling of error returns, from Hao Luo. 9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard. 10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong. 11) bpftool improvements to handle full BPF program names better, from Manu Bretelle. 12) bpftool fixes around libcap use, from Quentin Monnet. 13) BPF map internals clean ups and improvements around memory allocations, from Yafang Shao. 14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup iterator to work on cgroupv1, from Yosry Ahmed. 15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong. 16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) selftests/bpf: Few fixes for selftests/bpf built in release mode libbpf: Clean up deprecated and legacy aliases libbpf: Streamline bpf_attr and perf_event_attr initialization libbpf: Fix potential NULL dereference when parsing ELF selftests/bpf: Tests libbpf autoattach APIs libbpf: Allows disabling auto attach selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm libbpf: Making bpf_prog_load() ignore name if kernel doesn't support selftests/bpf: Update CI kconfig selftests/bpf: Add connmark read test selftests/bpf: Add existing connection bpf_*_ct_lookup() test bpftool: Clear errno after libcap's checks bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation bpftool: Fix a typo in a comment libbpf: Add names for auxiliary maps bpf: Use bpf_map_area_alloc consistently on bpf map creation bpf: Make __GFP_NOWARN consistent in bpf map creation bpf: Use bpf_map_area_free instread of kvfree bpf: Remove unneeded memset in queue_stack_map creation libbpf: preserve errno across pr_warn/pr_info/pr_debug ... ==================== Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17bpf: Partially revert flexible-array member replacementDaniel Borkmann
Partially revert 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members") given it breaks BPF UAPI. For example, BPF CI run reveals build breakage under LLVM: [...] CLNG-BPF [test_maps] map_ptr_kern.o CLNG-BPF [test_maps] btf__core_reloc_arrays___diff_arr_val_sz.o CLNG-BPF [test_maps] test_bpf_cookie.o progs/map_ptr_kern.c:314:26: error: field 'trie_key' with variable sized type 'struct bpf_lpm_trie_key' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end] struct bpf_lpm_trie_key trie_key; ^ CLNG-BPF [test_maps] btf__core_reloc_type_based___diff.o 1 error generated. make: *** [Makefile:521: /tmp/runner/work/bpf/bpf/tools/testing/selftests/bpf/map_ptr_kern.o] Error 1 make: *** Waiting for unfinished jobs.... [...] Typical usage of the bpf_lpm_trie_key is that the struct gets embedded into a user defined key for the LPM BPF map, from the selftest example: struct bpf_lpm_trie_key { <-- UAPI exported struct __u32 prefixlen; __u8 data[]; }; struct lpm_key { <-- BPF program defined struct struct bpf_lpm_trie_key trie_key; __u32 data; }; Undo this for BPF until a different solution can be found. It's the only flexible- array member case in the UAPI header. This was discovered in BPF CI after Dave reported that the include/uapi/linux/bpf.h header was out of sync with tools/include/uapi/linux/bpf.h after 94dfc73e7cf4. And the subsequent sync attempt failed CI. Fixes: 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with flexible-array members") Reported-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://lore.kernel.org/bpf/22aebc88-da67-f086-e620-dd4a16e2bc69@iogearbox.net
2022-08-16virtio: kerneldocs fixes and enhancementsRicardo Cañuelo
Fix variable names in some kerneldocs, naming in others. Add kerneldocs for struct vring_desc and vring_interrupt. Signed-off-by: Ricardo Cañuelo <ricardo.canuelo@collabora.com> Message-Id: <20220810094004.1250-2-ricardo.canuelo@collabora.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com>
2022-08-15bpf: Clear up confusion in bpf_skb_adjust_room()'s documentationQuentin Monnet
Adding or removing room space _below_ layers 2 or 3, as the description mentions, is ambiguous. This was written with a mental image of the packet with layer 2 at the top, layer 3 under it, and so on. But it has led users to believe that it was on lower layers (before the beginning of the L2 and L3 headers respectively). Let's make it more explicit, and specify between which layers the room space is adjusted. Reported-by: Rumen Telbizov <rumen.telbizov@menlosecurity.com> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220812153727.224500-3-quentin@isovalent.com
2022-08-12Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - A huge patchset supporting vq resize using the new vq reset capability - Features, fixes, and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (88 commits) vdpa/mlx5: Fix possible uninitialized return value vdpa_sim_blk: add support for discard and write-zeroes vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests vdpa_sim_blk: check if sector is 0 for commands other than read or write vdpa_sim: Implement suspend vdpa op vhost-vdpa: uAPI to suspend the device vhost-vdpa: introduce SUSPEND backend feature bit vdpa: Add suspend operation virtio-blk: Avoid use-after-free on suspend/resume virtio_vdpa: support the arg sizes of find_vqs() vhost-vdpa: Call ida_simple_remove() when failed vDPA: fix 'cast to restricted le16' warnings in vdpa.c vDPA: !FEATURES_OK should not block querying device config space vDPA/ifcvf: support userspace to query features and MQ of a management device vDPA/ifcvf: get_config_size should return a value no greater than dev implementation vhost scsi: Allow user to control num virtqueues vhost-scsi: Fix max number of virtqueues vdpa/mlx5: Support different address spaces for control and data vdpa/mlx5: Implement susupend virtqueue callback ...
2022-08-11Merge tag 'net-6.0-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bluetooth, bpf, can and netfilter. A little larger than usual but it's all fixes, no late features. It's large partially because of timing, and partially because of follow ups to stuff that got merged a week or so before the merge window and wasn't as widely tested. Maybe the Bluetooth fixes are a little alarming so we'll address that, but the rest seems okay and not scary. Notably we're including a fix for the netfilter Kconfig [1], your WiFi warning [2] and a bluetooth fix which should unblock syzbot [3]. Current release - regressions: - Bluetooth: - don't try to cancel uninitialized works [3] - L2CAP: fix use-after-free caused by l2cap_chan_put - tls: rx: fix device offload after recent rework - devlink: fix UAF on failed reload and leftover locks in mlxsw Current release - new code bugs: - netfilter: - flowtable: fix incorrect Kconfig dependencies [1] - nf_tables: fix crash when nf_trace is enabled - bpf: - use proper target btf when exporting attach_btf_obj_id - arm64: fixes for bpf trampoline support - Bluetooth: - ISO: unlock on error path in iso_sock_setsockopt() - ISO: fix info leak in iso_sock_getsockopt() - ISO: fix iso_sock_getsockopt for BT_DEFER_SETUP - ISO: fix memory corruption on iso_pinfo.base - ISO: fix not using the correct QoS - hci_conn: fix updating ISO QoS PHY - phy: dp83867: fix get nvmem cell fail Previous releases - regressions: - wifi: cfg80211: fix validating BSS pointers in __cfg80211_connect_result [2] - atm: bring back zatm uAPI after ATM had been removed - properly fix old bug making bonding ARP monitor mode not being able to work with software devices with lockless Tx - tap: fix null-deref on skb->dev in dev_parse_header_protocol - revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP" it helps some devices and breaks others - netfilter: - nf_tables: many fixes rejecting cross-object linking which may lead to UAFs - nf_tables: fix null deref due to zeroed list head - nf_tables: validate variable length element extension - bgmac: fix a BUG triggered by wrong bytes_compl - bcmgenet: indicate MAC is in charge of PHY PM Previous releases - always broken: - bpf: - fix bad pointer deref in bpf_sys_bpf() injected via test infra - disallow non-builtin bpf programs calling the prog_run command - don't reinit map value in prealloc_lru_pop - fix UAFs during the read of map iterator fd - fix invalidity check for values in sk local storage map - reject sleepable program for non-resched map iterator - mptcp: - move subflow cleanup in mptcp_destroy_common() - do not queue data on closed subflows - virtio_net: fix memory leak inside XDP_TX with mergeable - vsock: fix memory leak when multiple threads try to connect() - rework sk_user_data sharing to prevent psock leaks - geneve: fix TOS inheriting for ipv4 - tunnels & drivers: do not use RT_TOS for IPv6 flowlabel - phy: c45 baset1: do not skip aneg configuration if clock role is not specified - rose: avoid overflow when /proc displays timer information - x25: fix call timeouts in blocking connects - can: mcp251x: fix race condition on receive interrupt - can: j1939: - replace user-reachable WARN_ON_ONCE() with netdev_warn_once() - fix memory leak of skbs in j1939_session_destroy() Misc: - docs: bpf: clarify that many things are not uAPI - seg6: initialize induction variable to first valid array index (to silence clang vs objtool warning) - can: ems_usb: fix clang 14's -Wunaligned-access warning" * tag 'net-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (117 commits) net: atm: bring back zatm uAPI dpaa2-eth: trace the allocated address instead of page struct net: add missing kdoc for struct genl_multicast_group::flags nfp: fix use-after-free in area_cache_get() MAINTAINERS: use my korg address for mt7601u mlxsw: minimal: Fix deadlock in ports creation bonding: fix reference count leak in balance-alb mode net: usb: qmi_wwan: Add support for Cinterion MV32 bpf: Shut up kern_sys_bpf warning. net/tls: Use RCU API to access tls_ctx->netdev tls: rx: device: don't try to copy too much on detach tls: rx: device: bound the frag walk net_sched: cls_route: remove from list when handle is 0 selftests: forwarding: Fix failing tests with old libnet net: refactor bpf_sk_reuseport_detach() net: fix refcount bug in sk_psock_get (2) selftests/bpf: Ensure sleepable program is rejected by hash map iter selftests/bpf: Add write tests for sk local storage map iterator selftests/bpf: Add tests for reading a dangling map iter fd bpf: Only allow sleepable program for resched-able iterator ...
2022-08-11net: atm: bring back zatm uAPIJakub Kicinski
Jiri reports that linux-atm does not build without this header. Bring it back. It's completely dead code but we can't break the build for user space :( Reported-by: Jiri Slaby <jirislaby@kernel.org> Fixes: 052e1f01bfae ("net: atm: remove support for ZeitNet ZN122x ATM devices") Link: https://lore.kernel.org/all/8576aef3-37e4-8bae-bab5-08f82a78efd3@kernel.org/ Link: https://lore.kernel.org/r/20220810164547.484378-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-11vhost-vdpa: uAPI to suspend the deviceEugenio Pérez
The ioctl adds support for suspending the device from userspace. This is a must before getting virtqueue indexes (base) for live migration, since the device could modify them after userland gets them. There are individual ways to perform that action for some devices (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no way to perform it for any vhost device (and, in particular, vhost-vdpa). After a successful return of the ioctl call the device must not process more virtqueue descriptors. The device can answer to read or writes of config fields as if it were not suspended. In particular, writing to "queue_enable" with a value of 1 will not make the device start processing buffers of the virtqueue. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-4-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vhost-vdpa: introduce SUSPEND backend feature bitEugenio Pérez
Userland knows if it can suspend the device or not by checking this feature bit. It's only offered if the vdpa driver backend implements the suspend() operation callback, and to offer it or userland to ack it if the backend does not offer that callback is an error. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20220810171512.2343333-3-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11vduse: Support querying information of IOVA regionsXie Yongji
This introduces a new ioctl: VDUSE_IOTLB_GET_INFO to support querying some information of IOVA regions. Now it can be used to query whether the IOVA region supports userspace memory registration. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Message-Id: <20220803045523.23851-6-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11vduse: Support registering userspace memory for IOVA regionsXie Yongji
Introduce two ioctls: VDUSE_IOTLB_REG_UMEM and VDUSE_IOTLB_DEREG_UMEM to support registering and de-registering userspace memory for IOVA regions. Now it only supports registering userspace memory for bounce buffer region in virtio-vdpa case. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220803045523.23851-5-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11net: virtio_net: notifications coalescing supportAlvaro Karsz
New VirtIO network feature: VIRTIO_NET_F_NOTF_COAL. Control a Virtio network device notifications coalescing parameters using the control virtqueue. A device that supports this fetature can receive VIRTIO_NET_CTRL_NOTF_COAL control commands. - VIRTIO_NET_CTRL_NOTF_COAL_TX_SET: Ask the network device to change the following parameters: - tx_usecs: Maximum number of usecs to delay a TX notification. - tx_max_packets: Maximum number of packets to send before a TX notification. - VIRTIO_NET_CTRL_NOTF_COAL_RX_SET: Ask the network device to change the following parameters: - rx_usecs: Maximum number of usecs to delay a RX notification. - rx_max_packets: Maximum number of packets to receive before a RX notification. VirtIO spec. patch: https://lists.oasis-open.org/archives/virtio-comment/202206/msg00100.html Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com> Message-Id: <20220718091102.498774-1-alvaro.karsz@solid-run.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jason Wang <jasowang@redhat.com>
2022-08-11virtio_pci: struct virtio_pci_common_cfg add queue_resetXuan Zhuo
Add queue_reset in virtio_pci_modern_common_cfg. https://github.com/oasis-tcs/virtio-spec/issues/124 https://github.com/oasis-tcs/virtio-spec/issues/139 Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-30-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-11virtio: queue_reset: add VIRTIO_F_RING_RESETXuan Zhuo
Added VIRTIO_F_RING_RESET, it came from here https://github.com/oasis-tcs/virtio-spec/issues/124 https://github.com/oasis-tcs/virtio-spec/issues/139 This feature indicates that the driver can reset a queue individually. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220801063902.129329-28-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>