summaryrefslogtreecommitdiff
path: root/include/uapi/linux
AgeCommit message (Collapse)Author
2024-04-10ethtool: update tsinfo statistics attribute docs with correct typeRahul Rameshbabu
nla_put_uint can either write a u32 or u64 netlink attribute value. The size depends on whether the value can be represented with a u32 or requires a u64. Use a uint annotation in various documentation to represent this. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Link: https://lore.kernel.org/r/20240409232520.237613-2-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-09PCI/DOE: Support discovery version 2Alexey Kardashevskiy
PCIe r6.1, sec 6.30.1.1 defines a "DOE Discovery Version" field in the DOE Discovery Request Data Object Contents (3rd DW) as: 15:8 DOE Discovery Version – must be 02h if the Capability Version in the Data Object Exchange Extended Capability is 02h or greater. Add support for the version on devices with the DOE v2 capability. Link: https://lore.kernel.org/r/20240307022006.3657433-1-aik@amd.com Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
2024-04-08devlink: Support setting max_io_eqsParav Pandit
Many devices send event notifications for the IO queues, such as tx and rx queues, through event queues. Enable a privileged owner, such as a hypervisor PF, to set the number of IO event queues for the VF and SF during the provisioning stage. example: Get maximum IO event queues of the VF device:: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 10 Set maximum IO event queues of the VF device:: $ devlink port function set pci/0000:06:00.0/2 max_io_eqs 32 $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 ipsec_packet disabled max_io_eqs 32 Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-08vhost-vdpa: change ioctl # for VDPA_GET_VRING_SIZEMichael S. Tsirkin
VDPA_GET_VRING_SIZE by mistake uses the already occupied ioctl # 0x80 and we never noticed - it happens to work because the direction and size are different, but confuses tools such as perf which like to look at just the number, and breaks the extra robustness of the ioctl numbering macros. To fix, sort the entries and renumber the ioctl - not too late since it wasn't in any released kernels yet. Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Reported-by: Namhyung Kim <namhyung@kernel.org> Fixes: 1496c47065f9 ("vhost-vdpa: uapi to support reporting per vq size") Cc: "Zhu Lingshan" <lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <41c1c5489688abe5bfef9f7cf15584e3fb872ac5.1712092759.git.mst@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Zhu Lingshan <lingshan.zhu@intel.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2024-04-06net: ethtool: Allow passing a phy index for some commandsMaxime Chevallier
Some netlink commands are target towards ethernet PHYs, to control some of their features. As there's several such commands, add the ability to pass a PHY index in the ethnl request, which will populate the generic ethnl_req_info with the relevant phydev when the command targets a PHY. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-06net: phy: Introduce ethernet link topology representationMaxime Chevallier
Link topologies containing multiple network PHYs attached to the same net_device can be found when using a PHY as a media converter for use with an SFP connector, on which an SFP transceiver containing a PHY can be used. With the current model, the transceiver's PHY can't be used for operations such as cable testing, timestamping, macsec offload, etc. The reason being that most of the logic for these configuration, coming from either ethtool netlink or ioctls tend to use netdev->phydev, which in multi-phy systems will reference the PHY closest to the MAC. Introduce a numbering scheme allowing to enumerate PHY devices that belong to any netdev, which can in turn allow userspace to take more precise decisions with regard to each PHY's configuration. The numbering is maintained per-netdev, in a phy_device_list. The numbering works similarly to a netdevice's ifindex, with identifiers that are only recycled once INT_MAX has been reached. This prevents races that could occur between PHY listing and SFP transceiver removal/insertion. The identifiers are assigned at phy_attach time, as the numbering depends on the netdevice the phy is attached to. The PHY index can be re-used for PHYs that are persistent. Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-05netlink: specs: ethtool: define header-flags as an enumJakub Kicinski
Recent changes added header flags to the spec. Use an enum instead of defines for more seamless codegen. [Jakub: drop the already applied parts and rewrite message] Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-6-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-05ethtool: add interface to read Tx hardware timestamping statisticsRahul Rameshbabu
Multiple network devices that support hardware timestamping appear to have common behavior with regards to timestamp handling. Implement common Tx hardware timestamping statistics in a tx_stats struct_group. Common Rx hardware timestamping statistics can subsequently be implemented in a rx_stats struct_group for ethtool_ts_stats. Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20240403212931.128541-2-rrameshbabu@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/ip_gre.c 17af420545a7 ("erspan: make sure erspan_base_hdr is present in skb->head") 5832c4a77d69 ("ip_tunnel: convert __be16 tunnel flags to bitmaps") https://lore.kernel.org/all/20240402103253.3b54a1cf@canb.auug.org.au/ Adjacent changes: net/ipv6/ip6_fib.c d21d40605bca ("ipv6: Fix infinite recursion in fib6_dump_done().") 5fc68320c1fb ("ipv6: remove RTNL protection from inet6_dump_fib()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-04bpf: Pack struct bpf_fib_lookupAnton Protopopov
The struct bpf_fib_lookup is supposed to be of size 64. A recent commit 59b418c7063d ("bpf: Add a check for struct bpf_fib_lookup size") added a static assertion to check this property so that future changes to the structure will not accidentally break this assumption. As it immediately turned out, on some 32-bit arm systems, when AEABI=n, the total size of the structure was equal to 68, see [1]. This happened because the bpf_fib_lookup structure contains a union of two 16-bit fields: union { __u16 tot_len; __u16 mtu_result; }; which was supposed to compile to a 16-bit-aligned 16-bit field. On the aforementioned setups it was instead both aligned and padded to 32-bits. Declare this inner union as __attribute__((packed, aligned(2))) such that it always is of size 2 and is aligned to 16 bits. [1] https://lore.kernel.org/all/CA+G9fYtsoP51f-oP_Sp5MOq-Ffv8La2RztNpwvE6+R1VtFiLrw@mail.gmail.com/#t Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Fixes: e1850ea9bd9e ("bpf: bpf_fib_lookup return MTU value as output when looked up") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240403123303.1452184-1-aspsk@isovalent.com
2024-04-03Merge tag 'wireless-next-2024-04-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.10 The first "new features" pull request for v6.10 with changes both in stack and in drivers. The big thing in this pull request is that wireless subsystem is now almost free of sparse warnings. There's only one warning left in ath11k which was introduced in v6.9-rc1 and will be fixed via the wireless tree. Realtek drivers continue to improve, now we have support for RTL8922AE and RTL8723CS devices. ath11k also has long waited support for P2P. This time we have a small conflict in iwlwifi, Stephen has an example merge resolution which should help with fixing the conflict: https://lore.kernel.org/all/20240326100945.765b8caf@canb.auug.org.au/ Major changes: rtw89 * RTL8922AE Wi-Fi 7 PCI device support rtw88 * RTL8723CS SDIO device support iwlwifi * don't support puncturing in 5 GHz * support monitor mode on passive channels * BZ-W device support * P2P with HE/EHT support ath11k * P2P support for QCA6390, WCN6855 and QCA2066 * tag 'wireless-next-2024-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (122 commits) wifi: mt76: mt7915: workaround dubious x | !y warning wifi: mwl8k: Avoid -Wflex-array-member-not-at-end warnings wifi: ti: Avoid a hundred -Wflex-array-member-not-at-end warnings wifi: iwlwifi: mvm: fix check in iwl_mvm_sta_fw_id_mask net: rfkill: gpio: Convert to platform remove callback returning void wifi: mac80211: use kvcalloc() for codel vars wifi: iwlwifi: reconfigure TLC during HW restart wifi: iwlwifi: mvm: don't change BA sessions during restart wifi: iwlwifi: mvm: select STA mask only for active links wifi: iwlwifi: mvm: set wider BW OFDMA ignore correctly wifi: iwlwifi: Add support for LARI_CONFIG_CHANGE_CMD cmd v9 wifi: iwlwifi: mvm: Declare HE/EHT capabilities support for P2P interfaces wifi: iwlwifi: mvm: Remove outdated comment wifi: iwlwifi: add support for BZ_W wifi: iwlwifi: Print a specific device name. wifi: iwlwifi: remove wrong CRF_IDs wifi: iwlwifi: remove devices that never came out wifi: iwlwifi: mvm: mark EMLSR disabled in cleanup iterator wifi: iwlwifi: mvm: fix active link counting during recovery wifi: iwlwifi: mvm: assign link STA ID lookups during restart ... ==================== Link: https://lore.kernel.org/r/20240403093625.CF515C433C7@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03net: ethtool: Add impedance mismatch result code to cable testPawel Dembicki
Some PHYs can recognize during a cable test if the impedance in the cable is okay. They can detect reflections caused by impedance discontinuity between a regular 100 Ohm cable and an abnormal part with a higher or lower impedance. This commit introduces a new result code: ETHTOOL_A_CABLE_RESULT_CODE_IMPEDANCE_MISMATCH, which represents the results of a cable test indicating issues with impedance integrity. Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20240402201123.2961909-2-paweldembicki@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-03tee: tstee: Add Trusted Services TEE driverBalint Dobszay
The Trusted Services project provides a framework for developing and deploying device Root of Trust services in FF-A Secure Partitions. The FF-A SPs are accessible through the FF-A driver, but this doesn't provide a user space interface. The goal of this TEE driver is to make Trusted Services SPs accessible for user space clients. All TS SPs have the same FF-A UUID, it identifies the RPC protocol used by TS. A TS SP can host one or more services, a service is identified by its service UUID. The same type of service cannot be present twice in the same SP. During SP boot each service in an SP is assigned an interface ID, this is just a short ID to simplify message addressing. There is 1:1 mapping between TS SPs and TEE devices, i.e. a separate TEE device is registered for each TS SP. This is required since contrary to the generic TEE design where memory is shared with the whole TEE implementation, in case of FF-A, memory is shared with a specific SP. A user space client has to be able to separately share memory with each SP based on its endpoint ID. Acked-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Balint Dobszay <balint.dobszay@arm.com> Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
2024-04-02uapi: team: use header file generated from YAML specHangbin Liu
generated with: $ ./tools/net/ynl/ynl-gen-c.py --mode uapi \ > --spec Documentation/netlink/specs/team.yaml \ > --header -o include/uapi/linux/if_team.h Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20240401031004.1159713-5-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-02bpf: Fix typo in uapi doc commentsDavid Lechner
In a few places in the bpf uapi headers, EOPNOTSUPP is missing a "P" in the doc comments. This adds the missing "P". Signed-off-by: David Lechner <dlechner@baylibre.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240329152900.398260-2-dlechner@baylibre.com
2024-04-02crypto: remove CONFIG_CRYPTO_STATSEric Biggers
Remove support for the "Crypto usage statistics" feature (CONFIG_CRYPTO_STATS). This feature does not appear to have ever been used, and it is harmful because it significantly reduces performance and is a large maintenance burden. Covering each of these points in detail: 1. Feature is not being used Since these generic crypto statistics are only readable using netlink, it's fairly straightforward to look for programs that use them. I'm unable to find any evidence that any such programs exist. For example, Debian Code Search returns no hits except the kernel header and kernel code itself and translations of the kernel header: https://codesearch.debian.net/search?q=CRYPTOCFGA_STAT&literal=1&perpkg=1 The patch series that added this feature in 2018 (https://lore.kernel.org/linux-crypto/1537351855-16618-1-git-send-email-clabbe@baylibre.com/) said "The goal is to have an ifconfig for crypto device." This doesn't appear to have happened. It's not clear that there is real demand for crypto statistics. Just because the kernel provides other types of statistics such as I/O and networking statistics and some people find those useful does not mean that crypto statistics are useful too. Further evidence that programs are not using CONFIG_CRYPTO_STATS is that it was able to be disabled in RHEL and Fedora as a bug fix (https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2947). Even further evidence comes from the fact that there are and have been bugs in how the stats work, but they were never reported. For example, before Linux v6.7 hash stats were double-counted in most cases. There has also never been any documentation for this feature, so it might be hard to use even if someone wanted to. 2. CONFIG_CRYPTO_STATS significantly reduces performance Enabling CONFIG_CRYPTO_STATS significantly reduces the performance of the crypto API, even if no program ever retrieves the statistics. This primarily affects systems with a large number of CPUs. For example, https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2039576 reported that Lustre client encryption performance improved from 21.7GB/s to 48.2GB/s by disabling CONFIG_CRYPTO_STATS. It can be argued that this means that CONFIG_CRYPTO_STATS should be optimized with per-cpu counters similar to many of the networking counters. But no one has done this in 5+ years. This is consistent with the fact that the feature appears to be unused, so there seems to be little interest in improving it as opposed to just disabling it. It can be argued that because CONFIG_CRYPTO_STATS is off by default, performance doesn't matter. But Linux distros tend to error on the side of enabling options. The option is enabled in Ubuntu and Arch Linux, and until recently was enabled in RHEL and Fedora (see above). So, even just having the option available is harmful to users. 3. CONFIG_CRYPTO_STATS is a large maintenance burden There are over 1000 lines of code associated with CONFIG_CRYPTO_STATS, spread among 32 files. It significantly complicates much of the implementation of the crypto API. After the initial submission, many fixes and refactorings have consumed effort of multiple people to keep this feature "working". We should be spending this effort elsewhere. Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-01pfcp: always set pfcp metadataMichal Swiatkowski
In PFCP receive path set metadata needed by flower code to do correct classification based on this metadata. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01ip_tunnel: convert __be16 tunnel flags to bitmapsAlexander Lobakin
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice. Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all TUNNEL_* occurrences to their bitmap counterparts. Use the converters in the places which talk to the userspace, hardware (NFP) or other hosts (GRE header). The rest must explicitly use the new flags only. This must be done at once, otherwise there will be too many conversions throughout the code in the intermediate commits. Finally, disable the old __be16 flags for use in the kernel code (except for the two 'irregular' flags mentioned above), to prevent any accidental (mis)use of them. For the userspace, nothing is changed, only additions were made. Most noticeable bloat-o-meter difference (.text): vmlinux: 307/-1 (306) gre.ko: 62/0 (62) ip_gre.ko: 941/-217 (724) [*] ip_tunnel.ko: 390/-900 (-510) [**] ip_vti.ko: 138/0 (138) ip6_gre.ko: 534/-18 (516) [*] ip6_tunnel.ko: 118/-10 (108) [*] gre_flags_to_tnl_flags() grew, but still is inlined [**] ip_tunnel_find() got uninlined, hence such decrease The average code size increase in non-extreme case is 100-200 bytes per module, mostly due to sizeof(long) > sizeof(__be16), as %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers are able to expand the majority of bitmap_*() calls here into direct operations on scalars. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-01Merge tag 'v6.9-rc2' into media_stageHans Verkuil
Linux 6.9-rc2 This is needed to pull in commit 11763a8598f88 ("fs/9p: fix uaf in in v9fs_stat2inode_dotl"), which fixes the broken virtme. With this fix the media regression tests can be run again without crashing.
2024-03-28bpf: Add support for passing mark with bpf_fib_lookupAnton Protopopov
Extend the bpf_fib_lookup() helper by making it to utilize mark if the BPF_FIB_LOOKUP_MARK flag is set. In order to pass the mark the four bytes of struct bpf_fib_lookup are used, shared with the output-only smac/dmac fields. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: David Ahern <dsahern@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240326101742.17421-2-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-27Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-03-25 We've added 38 non-merge commits during the last 13 day(s) which contain a total of 50 files changed, 867 insertions(+), 274 deletions(-). The main changes are: 1) Add the ability to specify and retrieve BPF cookie also for raw tracepoint programs in order to ease migration from classic to raw tracepoints, from Andrii Nakryiko. 2) Allow the use of bpf_get_{ns_,}current_pid_tgid() helper for all program types and add additional BPF selftests, from Yonghong Song. 3) Several improvements to bpftool and its build, for example, enabling libbpf logs when loading pid_iter in debug mode, from Quentin Monnet. 4) Check the return code of all BPF-related set_memory_*() functions during load and bail out in case they fail, from Christophe Leroy. 5) Avoid a goto in regs_refine_cond_op() such that the verifier can be better integrated into Agni tool which doesn't support backedges yet, from Harishankar Vishwanathan. 6) Add a small BPF trie perf improvement by always inlining longest_prefix_match, from Jesper Dangaard Brouer. 7) Small BPF selftest refactor in bpf_tcp_ca.c to utilize start_server() helper instead of open-coding it, from Geliang Tang. 8) Improve test_tc_tunnel.sh BPF selftest to prevent client connect before the server bind, from Alessandro Carminati. 9) Fix BPF selftest benchmark for older glibc and use syscall(SYS_gettid) instead of gettid(), from Alan Maguire. 10) Implement a backward-compatible method for struct_ops types with additional fields which are not present in older kernels, from Kui-Feng Lee. 11) Add a small helper to check if an instruction is addr_space_cast from as(0) to as(1) and utilize it in x86-64 JIT, from Puranjay Mohan. 12) Small cleanup to remove unnecessary error check in bpf_struct_ops_map_update_elem, from Martin KaFai Lau. 13) Improvements to libbpf fd validity checks for BPF map/programs, from Mykyta Yatsenko. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (38 commits) selftests/bpf: Fix flaky test btf_map_in_map/lookup_update bpf: implement insn_is_cast_user() helper for JITs bpf: Avoid get_kernel_nofault() to fetch kprobe entry IP selftests/bpf: Use start_server in bpf_tcp_ca bpf: Sync uapi bpf.h to tools directory libbpf: Add new sec_def "sk_skb/verdict" selftests/bpf: Mark uprobe trigger functions with nocf_check attribute selftests/bpf: Use syscall(SYS_gettid) instead of gettid() wrapper in bench bpf-next: Avoid goto in regs_refine_cond_op() bpftool: Clean up HOST_CFLAGS, HOST_LDFLAGS for bootstrap bpftool selftests/bpf: scale benchmark counting by using per-CPU counters bpftool: Remove unnecessary source files from bootstrap version bpftool: Enable libbpf logs when loading pid_iter in debug mode selftests/bpf: add raw_tp/tp_btf BPF cookie subtests libbpf: add support for BPF cookie for raw_tp/tp_btf programs bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programs bpf: pass whole link instead of prog when triggering raw tracepoint bpf: flatten bpf_probe_register call chain selftests/bpf: Prevent client connect before server bind in test_tc_tunnel.sh selftests/bpf: Add a sk_msg prog bpf_get_ns_current_pid_tgid() test ... ==================== Link: https://lore.kernel.org/r/20240325233940.7154-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-27drm/amdkfd: range check cp bad op exception interruptsJonathan Kim
Due to a CP interrupt bug, bad packet garbage exception codes are raised. Do a range check so that the debugger and runtime do not receive garbage codes. Update the user api to guard exception code type checking as well. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Tested-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-26statx: stx_subvolKent Overstreet
Add a new statx field for (sub)volume identifiers, as implemented by btrfs and bcachefs. This includes bcachefs support; we'll definitely want btrfs support as well. Link: https://lore.kernel.org/linux-fsdevel/2uvhm6gweyl7iyyp2xpfryvcu2g3padagaeqcbiavjyiis6prl@yjm725bizncq/ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240308022914.196982-1-kent.overstreet@linux.dev Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-03-25wifi: nl80211: cleanup nl80211.h kernel-docJeff Johnson
Fix all of the kernel-doc issues in nl80211.h. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://msgid.link/20240319-kdoc-nl80211-v1-3-549e09d52866@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: nl80211: fix nl80211 uapi comment style issuesJeff Johnson
Currently kernel-doc raises "warning: bad line:" for several comments that have invalid multi-line comment style; they are missing the leading '*'. And checkpatch.pl raises "WARNING: please, no space before tabs" for a large number of comments which have space then tab after the leading '*'. Fix those issues. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://msgid.link/20240319-kdoc-nl80211-v1-2-549e09d52866@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25wifi: nl80211: rename enum plink_actionsJeff Johnson
kernel-doc flagged the following issue: include/uapi/linux/nl80211.h:6081: warning: expecting prototype for enum nl80211_plink_action. Prototype was for enum plink_actions instead This is because the documentation doesn't match the code. Normally the correct fix for such an issue is to modify the documentation to match the code. However, in this case, since the actual name plink_actions is not referenced by any code, rename it to nl80211_plink_action to give it a proper prefix and match the documentation. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://msgid.link/20240319-kdoc-nl80211-v1-1-549e09d52866@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-03-25media: v4l2: Add REMOVE_BUFS ioctlBenjamin Gaignard
VIDIOC_REMOVE_BUFS ioctl allows to remove buffers from a queue. The number of buffers to remove in given by count field of struct v4l2_remove_buffers and the range start at the index specified in the same structure. Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com> Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> [hverkuil: vidioc-remove-bufs.rst: mention no bufs are freed on error]
2024-03-21Merge tag 'tty-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial driver updates from Greg KH: "Here is the big set of TTY/Serial driver updates and cleanups for 6.9-rc1. Included in here are: - more tty cleanups from Jiri - loads of 8250 driver cleanups from Andy - max310x driver updates - samsung serial driver updates - uart_prepare_sysrq_char() updates for many drivers - platform driver remove callback void cleanups - stm32 driver updates - other small tty/serial driver updates All of these have been in linux-next for a long time with no reported issues" * tag 'tty-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits) dt-bindings: serial: stm32: add power-domains property serial: 8250_dw: Replace ACPI device check by a quirk serial: Lock console when calling into driver before registration serial: 8250_uniphier: Switch to use uart_read_port_properties() serial: 8250_tegra: Switch to use uart_read_port_properties() serial: 8250_pxa: Switch to use uart_read_port_properties() serial: 8250_omap: Switch to use uart_read_port_properties() serial: 8250_of: Switch to use uart_read_port_properties() serial: 8250_lpc18xx: Switch to use uart_read_port_properties() serial: 8250_ingenic: Switch to use uart_read_port_properties() serial: 8250_dw: Switch to use uart_read_port_properties() serial: 8250_bcm7271: Switch to use uart_read_port_properties() serial: 8250_bcm2835aux: Switch to use uart_read_port_properties() serial: 8250_aspeed_vuart: Switch to use uart_read_port_properties() serial: port: Introduce a common helper to read properties serial: core: Add UPIO_UNKNOWN constant for unknown port type serial: core: Move struct uart_port::quirks closer to possible values serial: sh-sci: Call sci_serial_{in,out}() directly serial: core: only stop transmit when HW fifo is empty serial: pch: Use uart_prepare_sysrq_char(). ...
2024-03-21Merge tag 'usb-6.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB / Thunderbolt updates from Greg KH: "Here is the big set of USB and Thunderbolt changes for 6.9-rc1. Lots of tiny changes and forward progress to support new hardware and better support for existing devices. Included in here are: - Thunderbolt (i.e. USB4) updates for newer hardware and uses as more people start to use the hardware - default USB authentication mode Kconfig and documentation update to make it more obvious what is going on - USB typec updates and enhancements - usual dwc3 driver updates - usual xhci driver updates - function USB (i.e. gadget) driver updates and additions - new device ids for lots of drivers - loads of other small updates, full details in the shortlog All of these, including a "last minute regression fix" have been in linux-next with no reported issues" * tag 'usb-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (185 commits) usb: usb-acpi: Fix oops due to freeing uninitialized pld pointer usb: gadget: net2272: Use irqflags in the call to net2272_probe_fin usb: gadget: tegra-xudc: Fix USB3 PHY retrieval logic phy: tegra: xusb: Add API to retrieve the port number of phy USB: gadget: pxa27x_udc: Remove unused of_gpio.h usb: gadget/snps_udc_plat: Remove unused of_gpio.h usb: ohci-pxa27x: Remove unused of_gpio.h usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined usb: Clarify expected behavior of dev_bin_attrs_are_visible() xhci: Allow RPM on the USB controller (1022:43f7) by default usb: isp1760: remove SLAB_MEM_SPREAD flag usage usb: misc: onboard_hub: use pointer consistently in the probe function usb: gadget: fsl: Increase size of name buffer for endpoints usb: gadget: fsl: Add of device table to enable module autoloading usb: typec: tcpm: add support to set tcpc connector orientatition usb: typec: tcpci: add generic tcpci fallback compatible dt-bindings: usb: typec-tcpci: add tcpci fallback binding usb: gadget: fsl-udc: Replace custom log wrappers by dev_{err,warn,dbg,vdbg} usb: core: Set connect_type of ports based on DT node dt-bindings: usb: Add downstream facing ports to realtek binding ...
2024-03-19bpf: support BPF cookie in raw tracepoint (raw_tp, tp_btf) programsAndrii Nakryiko
Wire up BPF cookie for raw tracepoint programs (both BTF and non-BTF aware variants). This brings them up to part w.r.t. BPF cookie usage with classic tracepoint and fentry/fexit programs. Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Message-ID: <20240319233852.1977493-4-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-19Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - Per vq sizes in vdpa - Info query for block devices support in vdpa - DMA sync callbacks in vduse - Fixes, cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (35 commits) virtio_net: rename free_old_xmit_skbs to free_old_xmit virtio_net: unify the code for recycling the xmit ptr virtio-net: add cond_resched() to the command waiting loop virtio-net: convert rx mode setting to use workqueue virtio: packed: fix unmap leak for indirect desc table vDPA: report virtio-blk flush info to user space vDPA: report virtio-block read-only info to user space vDPA: report virtio-block write zeroes configuration to user space vDPA: report virtio-block discarding configuration to user space vDPA: report virtio-block topology info to user space vDPA: report virtio-block MQ info to user space vDPA: report virtio-block max segments in a request to user space vDPA: report virtio-block block-size to user space vDPA: report virtio-block max segment size to user space vDPA: report virtio-block capacity to user space virtio: make virtio_bus const vdpa: make vdpa_bus const vDPA/ifcvf: implement vdpa_config_ops.get_vq_num_min vDPA/ifcvf: get_max_vq_size to return max size virtio_vdpa: create vqs with the actual size ...
2024-03-19vDPA: report virtio-blk flush info to user spaceZhu Lingshan
This commit reports whether a virtio-blk device support cache flush command to user space Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-11-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block read-only info to user spaceZhu Lingshan
This commit report read-only information of virtio-blk devices to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-10-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block write zeroes configuration to user spaceZhu Lingshan
This commits reports write zeroes configuration of virtio-block devices to user space, includes: 1)maximum write zeroes sectors size 2)maximum write zeroes segment number Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-9-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block discarding configuration to user spaceZhu Lingshan
This commit reports virtio-blk discarding configuration to user space,includes: 1) the maximum discard sectors 2) maximum number of discard segments for the block driver to use 3) the alignment for splitting a discarding request Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-8-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block topology info to user spaceZhu Lingshan
This commit allows vDPA reporting topology information of virtio-blk devices to user space, includes: 1) the number of logical blocks per physical block 2) offset of first aligned logical block 3) suggested minimum I/O size in blocks 4) optimal (suggested maximum) I/O size in blocks Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-7-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block MQ info to user spaceZhu Lingshan
This commits allows vDPA reporting virtio-block multi-queue configuration to user sapce. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-6-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block max segments in a request to user spaceZhu Lingshan
This commit allows vDPA reporting the maximum number of segments in a request of virtio-block devices to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-5-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block block-size to user spaceZhu Lingshan
This commit allows reporting the block size of a virtio-block device to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-4-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block max segment size to user spaceZhu Lingshan
This commit allows reporting the max size of any single segment of virtio-block devices to user space. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-3-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vDPA: report virtio-block capacity to user spaceZhu Lingshan
This commit allows userspace to query capacity of a virtio-block device. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240218185606.13509-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19vhost-vdpa: uapi to support reporting per vq sizeZhu Lingshan
The size of a virtqueue is a per vq configuration. This commit introduce a new ioctl uAPI to support this flexibility. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Message-Id: <20240202163905.8834-2-lingshan.zhu@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-19virtio: uapi: Drop __packed attribute in linux/virtio_pci.hSuzuki K Poulose
Commit 92792ac752aa ("virtio-pci: Introduce admin command sending function") added "__packed" structures to UAPI header linux/virtio_pci.h. This triggers build failures in the consumer userspace applications without proper "definition" of __packed (e.g., kvmtool build fails). Moreover, the structures are already packed well, and doesn't need explicit packing, similar to the rest of the structures in all virtio_* headers. Remove the __packed attribute. Fixes: 92792ac752aa ("virtio-pci: Introduce admin command sending function") Cc: Feng Liu <feliu@nvidia.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Yishai Hadas <yishaih@nvidia.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Message-Id: <20240125232039.913606-1-suzuki.poulose@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-18Merge tag 'trace-v6.9-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: "Main user visible change: - User events can now have "multi formats" The current user events have a single format. If another event is created with a different format, it will fail to be created. That is, once an event name is used, it cannot be used again with a different format. This can cause issues if a library is using an event and updates its format. An application using the older format will prevent an application using the new library from registering its event. A task could also DOS another application if it knows the event names, and it creates events with different formats. The multi-format event is in a different name space from the single format. Both the event name and its format are the unique identifier. This will allow two different applications to use the same user event name but with different payloads. - Added support to have ftrace_dump_on_oops dump out instances and not just the main top level tracing buffer. Other changes: - Add eventfs_root_inode Only the root inode has a dentry that is static (never goes away) and stores it upon creation. There's no reason that the thousands of other eventfs inodes should have a pointer that never gets set in its descriptor. Create a eventfs_root_inode desciptor that has a eventfs_inode descriptor and a dentry pointer, and only the root inode will use this. - Added WARN_ON()s in eventfs There's some conditionals remaining in eventfs that should never be hit, but instead of removing them, add WARN_ON() around them to make sure that they are never hit. - Have saved_cmdlines allocation also include the map_cmdline_to_pid array The saved_cmdlines structure allocates a large amount of data to hold its mappings. Within it, it has three arrays. Two are already apart of it: map_pid_to_cmdline[] and saved_cmdlines[]. More memory can be saved by also including the map_cmdline_to_pid[] array as well. - Restructure __string() and __assign_str() macros used in TRACE_EVENT() Dynamic strings in TRACE_EVENT() are declared with: __string(name, source) And assigned with: __assign_str(name, source) In the tracepoint callback of the event, the __string() is used to get the size needed to allocate on the ring buffer and __assign_str() is used to copy the string into the ring buffer. There's a helper structure that is created in the TRACE_EVENT() macro logic that will hold the string length and its position in the ring buffer which is created by __string(). There are several trace events that have a function to create the string to save. This function is executed twice. Once for __string() and again for __assign_str(). There's no reason for this. The helper structure could also save the string it used in __string() and simply copy that into __assign_str() (it also already has its length). By using the structure to store the source string for the assignment, it means that the second argument to __assign_str() is no longer needed. It will be removed in the next merge window, but for now add a warning if the source string given to __string() is different than the source string given to __assign_str(), as the source to __assign_str() isn't even used and will be going away. - Added checks to make sure that the source of __string() is also the source of __assign_str() so that it can be safely removed in the next merge window. Included fixes that the above check found. - Other minor clean ups and fixes" * tag 'trace-v6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (34 commits) tracing: Add __string_src() helper to help compilers not to get confused tracing: Use strcmp() in __assign_str() WARN_ON() check tracepoints: Use WARN() and not WARN_ON() for warnings tracing: Use div64_u64() instead of do_div() tracing: Support to dump instance traces by ftrace_dump_on_oops tracing: Remove second parameter to __assign_rel_str() tracing: Add warning if string in __assign_str() does not match __string() tracing: Add __string_len() example tracing: Remove __assign_str_len() ftrace: Fix most kernel-doc warnings tracing: Decrement the snapshot if the snapshot trigger fails to register tracing: Fix snapshot counter going between two tracers that use it tracing: Use EVENT_NULL_STR macro instead of open coding "(null)" tracing: Use ? : shortcut in trace macros tracing: Do not calculate strlen() twice for __string() fields tracing: Rework __assign_str() and __string() to not duplicate getting the string cxl/trace: Properly initialize cxl_poison region name net: hns3: tracing: fix hclgevf trace event strings drm/i915: Add missing ; to __assign_str() macros in tracepoint code NFSD: Fix nfsd_clid_class use of __string_len() macro ...
2024-03-18tracing/user_events: Introduce multi-format eventsBeau Belgrave
Currently user_events supports 1 event with the same name and must have the exact same format when referenced by multiple programs. This opens an opportunity for malicious or poorly thought through programs to create events that others use with different formats. Another scenario is user programs wishing to use the same event name but add more fields later when the software updates. Various versions of a program may be running side-by-side, which is prevented by the current single format requirement. Add a new register flag (USER_EVENT_REG_MULTI_FORMAT) which indicates the user program wishes to use the same user_event name, but may have several different formats of the event. When this flag is used, create the underlying tracepoint backing the user_event with a unique name per-version of the format. It's important that existing ABI users do not get this logic automatically, even if one of the multi format events matches the format. This ensures existing programs that create events and assume the tracepoint name will match exactly continue to work as expected. Add logic to only check multi-format events with other multi-format events and single-format events to only check single-format events during find. Change system name of the multi-format event tracepoint to ensure that multi-format events are isolated completely from single-format events. This prevents single-format names from conflicting with multi-format events if they end with the same suffix as the multi-format events. Add a register_name (reg_name) to the user_event struct which allows for split naming of events. We now have the name that was used to register within user_events as well as the unique name for the tracepoint. Upon registering events ensure matches based on first the reg_name, followed by the fields and format of the event. This allows for multiple events with the same registered name to have different formats. The underlying tracepoint will have a unique name in the format of {reg_name}.{unique_id}. For example, if both "test u32 value" and "test u64 value" are used with the USER_EVENT_REG_MULTI_FORMAT the system would have 2 unique tracepoints. The dynamic_events file would then show the following: u:test u64 count u:test u32 count The actual tracepoint names look like this: test.0 test.1 Both would be under the new user_events_multi system name to prevent the older ABI from being used to squat on multi-formatted events and block their use. Deleting events via "!u:test u64 count" would only delete the first tracepoint that matched that format. When the delete ABI is used all events with the same name will be attempted to be deleted. If per-version deletion is required, user programs should either not use persistent events or delete them via dynamic_events. Link: https://lore.kernel.org/linux-trace-kernel/20240222001807.1463-3-beaub@linux.microsoft.com Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-03-15Merge tag 'powerpc-6.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add AT_HWCAP3 and AT_HWCAP4 aux vector entries for future use by glibc - Add support for recognising the Power11 architected and raw PVRs - Add support for nr_cpus=n on the command line where the boot CPU is >= n - Add ppcxx_allmodconfig targets for all 32-bit sub-arches - Other small features, cleanups and fixes Thanks to Akanksha J N, Brian King, Christophe Leroy, Dawei Li, Geoff Levand, Greg Kroah-Hartman, Jan-Benedict Glaw, Kajol Jain, Kunwu Chan, Li zeming, Madhavan Srinivasan, Masahiro Yamada, Nathan Chancellor, Nicholas Piggin, Peter Bergner, Qiheng Lin, Randy Dunlap, Ricardo B. Marliere, Rob Herring, Sathvika Vasireddy, Shrikanth Hegde, Uwe Kleine-König, Vaibhav Jain, and Wen Xiong. * tag 'powerpc-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (71 commits) powerpc/macio: Make remove callback of macio driver void returned powerpc/83xx: Fix build failure with FPU=n powerpc/64s: Fix get_hugepd_cache_index() build failure powerpc/4xx: Fix warp_gpio_leds build failure powerpc/amigaone: Make several functions static powerpc/embedded6xx: Fix no previous prototype for avr_uart_send() etc. macintosh/adb: make adb_dev_class constant powerpc: xor_vmx: Add '-mhard-float' to CFLAGS powerpc/fsl: Fix mfpmr() asm constraint error powerpc: Remove cpu-as-y completely powerpc/fsl: Modernise mt/mfpmr powerpc/fsl: Fix mfpmr build errors with newer binutils powerpc/64s: Use .machine power4 around dcbt powerpc/64s: Move dcbt/dcbtst sequence into a macro powerpc/mm: Code cleanup for __hash_page_thp powerpc/hv-gpci: Fix the H_GET_PERF_COUNTER_INFO hcall return value checks powerpc/irq: Allow softirq to hardirq stack transition powerpc: Stop using of_root powerpc/machdep: Define 'compatibles' property in ppc_md and use it of: Reimplement of_machine_is_compatible() using of_machine_compatible_match() ...
2024-03-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "S390: - Changes to FPU handling came in via the main s390 pull request - Only deliver to the guest the SCLP events that userspace has requested - More virtual vs physical address fixes (only a cleanup since virtual and physical address spaces are currently the same) - Fix selftests undefined behavior x86: - Fix a restriction that the guest can't program a PMU event whose encoding matches an architectural event that isn't included in the guest CPUID. The enumeration of an architectural event only says that if a CPU supports an architectural event, then the event can be programmed *using the architectural encoding*. The enumeration does NOT say anything about the encoding when the CPU doesn't report support the event *in general*. It might support it, and it might support it using the same encoding that made it into the architectural PMU spec - Fix a variety of bugs in KVM's emulation of RDPMC (more details on individual commits) and add a selftest to verify KVM correctly emulates RDMPC, counter availability, and a variety of other PMC-related behaviors that depend on guest CPUID and therefore are easier to validate with selftests than with custom guests (aka kvm-unit-tests) - Zero out PMU state on AMD if the virtual PMU is disabled, it does not cause any bug but it wastes time in various cases where KVM would check if a PMC event needs to be synthesized - Optimize triggering of emulated events, with a nice ~10% performance improvement in VM-Exit microbenchmarks when a vPMU is exposed to the guest - Tighten the check for "PMI in guest" to reduce false positives if an NMI arrives in the host while KVM is handling an IRQ VM-Exit - Fix a bug where KVM would report stale/bogus exit qualification information when exiting to userspace with an internal error exit code - Add a VMX flag in /proc/cpuinfo to report 5-level EPT support - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for read, e.g. to avoid serializing vCPUs when userspace deletes a memslot - Tear down TDP MMU page tables at 4KiB granularity (used to be 1GiB). KVM doesn't support yielding in the middle of processing a zap, and 1GiB granularity resulted in multi-millisecond lags that are quite impolite for CONFIG_PREEMPT kernels - Allocate write-tracking metadata on-demand to avoid the memory overhead when a kernel is built with i915 virtualization support but the workloads use neither shadow paging nor i915 virtualization - Explicitly initialize a variety of on-stack variables in the emulator that triggered KMSAN false positives - Fix the debugregs ABI for 32-bit KVM - Rework the "force immediate exit" code so that vendor code ultimately decides how and when to force the exit, which allowed some optimization for both Intel and AMD - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if vCPU creation ultimately failed, causing extra unnecessary work - Cleanup the logic for checking if the currently loaded vCPU is in-kernel - Harden against underflowing the active mmu_notifier invalidation count, so that "bad" invalidations (usually due to bugs elsehwere in the kernel) are detected earlier and are less likely to hang the kernel x86 Xen emulation: - Overlay pages can now be cached based on host virtual address, instead of guest physical addresses. This removes the need to reconfigure and invalidate the cache if the guest changes the gpa but the underlying host virtual address remains the same - When possible, use a single host TSC value when computing the deadline for Xen timers in order to improve the accuracy of the timer emulation - Inject pending upcall events when the vCPU software-enables its APIC to fix a bug where an upcall can be lost (and to follow Xen's behavior) - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen events fails, e.g. if the guest has aliased xAPIC IDs RISC-V: - Support exception and interrupt handling in selftests - New self test for RISC-V architectural timer (Sstc extension) - New extension support (Ztso, Zacas) - Support userspace emulation of random number seed CSRs ARM: - Infrastructure for building KVM's trap configuration based on the architectural features (or lack thereof) advertised in the VM's ID registers - Support for mapping vfio-pci BARs as Normal-NC (vaguely similar to x86's WC) at stage-2, improving the performance of interacting with assigned devices that can tolerate it - Conversion of KVM's representation of LPIs to an xarray, utilized to address serialization some of the serialization on the LPI injection path - Support for _architectural_ VHE-only systems, advertised through the absence of FEAT_E2H0 in the CPU's ID register - Miscellaneous cleanups, fixes, and spelling corrections to KVM and selftests LoongArch: - Set reserved bits as zero in CPUCFG - Start SW timer only when vcpu is blocking - Do not restart SW timer when it is expired - Remove unnecessary CSR register saving during enter guest - Misc cleanups and fixes as usual Generic: - Clean up Kconfig by removing CONFIG_HAVE_KVM, which was basically always true on all architectures except MIPS (where Kconfig determines the available depending on CPU capabilities). It is replaced either by an architecture-dependent symbol for MIPS, and IS_ENABLED(CONFIG_KVM) everywhere else - Factor common "select" statements in common code instead of requiring each architecture to specify it - Remove thoroughly obsolete APIs from the uapi headers - Move architecture-dependent stuff to uapi/asm/kvm.h - Always flush the async page fault workqueue when a work item is being removed, especially during vCPU destruction, to ensure that there are no workers running in KVM code when all references to KVM-the-module are gone, i.e. to prevent a very unlikely use-after-free if kvm.ko is unloaded - Grab a reference to the VM's mm_struct in the async #PF worker itself instead of gifting the worker a reference, so that there's no need to remember to *conditionally* clean up after the worker Selftests: - Reduce boilerplate especially when utilize selftest TAP infrastructure - Add basic smoke tests for SEV and SEV-ES, along with a pile of library support for handling private/encrypted/protected memory - Fix benign bugs where tests neglect to close() guest_memfd files" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) selftests: kvm: remove meaningless assignments in Makefiles KVM: riscv: selftests: Add Zacas extension to get-reg-list test RISC-V: KVM: Allow Zacas extension for Guest/VM KVM: riscv: selftests: Add Ztso extension to get-reg-list test RISC-V: KVM: Allow Ztso extension for Guest/VM RISC-V: KVM: Forward SEED CSR access to user space KVM: riscv: selftests: Add sstc timer test KVM: riscv: selftests: Change vcpu_has_ext to a common function KVM: riscv: selftests: Add guest helper to get vcpu id KVM: riscv: selftests: Add exception handling support LoongArch: KVM: Remove unnecessary CSR register saving during enter guest LoongArch: KVM: Do not restart SW timer when it is expired LoongArch: KVM: Start SW timer only when vcpu is blocking LoongArch: KVM: Set reserved bits as zero in CPUCFG KVM: selftests: Explicitly close guest_memfd files in some gmem tests KVM: x86/xen: fix recursive deadlock in timer injection KVM: pfncache: simplify locking and make more self-contained KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled KVM: x86/xen: improve accuracy of Xen timers ...
2024-03-15Merge tag 'media/v6.9-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - DVB budget legacy API was finally documented. It took only 20+ years to get some documentation about it... - hantro driver has gained support for STM32MP25 VDEC/VENC - rkisp1 has gained support for i.MX8MP - atomisp got rid of two items from its todo list. Still 5 items pending for moving it out of staging - lots of driver fixes, cleanups and improvements * tag 'media/v6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (252 commits) media: rcar-isp: Disallow unbind of devices media: usbtv: Remove useless locks in usbtv_video_free() media: mediatek: vcodec: avoid -Wcast-function-type-strict warning media: ttpci: fix two memleaks in budget_av_attach media: go7007: fix a memleak in go7007_load_encoder media: dvb-frontends: avoid stack overflow warnings with clang media: pvrusb2: fix uaf in pvr2_context_set_notify media: usb: s2255: Refactor s2255_get_fx2fw media: ti: j721e-csi2rx: Convert to platform remove callback returning void media: stm32-dcmipp: Convert to platform remove callback returning void media: nxp: imx8-isi: Convert to platform remove callback returning void media: nuvoton: Convert to platform remove callback returning void media: chips-media: wave5: Convert to platform remove callback returning void media: chips-media: wave5: Remove unnecessary semicolons media: i2c: imx290: Fix IMX920 typo media: platform: replace of_graph_get_next_endpoint() media: i2c: replace of_graph_get_next_endpoint() media: ivsc: csi: Make use of sub-device state media: ivsc: csi: Swap SINK and SOURCE pads media: ipu-bridge: Serialise calls to IPU bridge init ...
2024-03-15Merge tag 'fuse-update-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add passthrough mode for regular file I/O. This allows performing read and write (also via memory maps) on a backing file without incurring the overhead of roundtrips to userspace. For now this is only allowed to privileged servers, but this limitation will go away in the future (Amir Goldstein) - Fix interaction of direct I/O mode with memory maps (Bernd Schubert) - Export filesystem tags through sysfs for virtiofs (Stefan Hajnoczi) - Allow resending queued requests for server crash recovery (Zhao Chen) - Misc fixes and cleanups * tag 'fuse-update-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (38 commits) fuse: get rid of ff->readdir.lock fuse: remove unneeded lock which protecting update of congestion_threshold fuse: Fix missing FOLL_PIN for direct-io fuse: remove an unnecessary if statement fuse: Track process write operations in both direct and writethrough modes fuse: Use the high bit of request ID for indicating resend requests fuse: Introduce a new notification type for resend pending requests fuse: add support for explicit export disabling fuse: __kuid_val/__kgid_val helpers in fuse_fill_attr_from_inode() fuse: fix typo for fuse_permission comment fuse: Convert fuse_writepage_locked to take a folio fuse: Remove fuse_writepage virtio_fs: remove duplicate check if queue is broken fuse: use FUSE_ROOT_ID in fuse_get_root_inode() fuse: don't unhash root fuse: fix root lookup with nonzero generation fuse: replace remaining make_bad_inode() with fuse_make_bad() virtiofs: drop __exit from virtio_fs_sysfs_exit() fuse: implement passthrough for mmap fuse: implement splice read/write passthrough ...
2024-03-14Merge tag 'mm-stable-2024-03-13-20-04' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames from hotplugged memory rather than only from main memory. Series "implement "memmap on memory" feature on s390". - More folio conversions from Matthew Wilcox in the series "Convert memcontrol charge moving to use folios" "mm: convert mm counter to take a folio" - Chengming Zhou has optimized zswap's rbtree locking, providing significant reductions in system time and modest but measurable reductions in overall runtimes. The series is "mm/zswap: optimize the scalability of zswap rb-tree". - Chengming Zhou has also provided the series "mm/zswap: optimize zswap lru list" which provides measurable runtime benefits in some swap-intensive situations. - And Chengming Zhou further optimizes zswap in the series "mm/zswap: optimize for dynamic zswap_pools". Measured improvements are modest. - zswap cleanups and simplifications from Yosry Ahmed in the series "mm: zswap: simplify zswap_swapoff()". - In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has contributed several DAX cleanups as well as adding a sysfs tunable to control the memmap_on_memory setting when the dax device is hotplugged as system memory. - Johannes Weiner has added the large series "mm: zswap: cleanups", which does that. - More DAMON work from SeongJae Park in the series "mm/damon: make DAMON debugfs interface deprecation unignorable" "selftests/damon: add more tests for core functionalities and corner cases" "Docs/mm/damon: misc readability improvements" "mm/damon: let DAMOS feeds and tame/auto-tune itself" - In the series "mm/mempolicy: weighted interleave mempolicy and sysfs extension" Rakie Kim has developed a new mempolicy interleaving policy wherein we allocate memory across nodes in a weighted fashion rather than uniformly. This is beneficial in heterogeneous memory environments appearing with CXL. - Christophe Leroy has contributed some cleanup and consolidation work against the ARM pagetable dumping code in the series "mm: ptdump: Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute". - Luis Chamberlain has added some additional xarray selftesting in the series "test_xarray: advanced API multi-index tests". - Muhammad Usama Anjum has reworked the selftest code to make its human-readable output conform to the TAP ("Test Anything Protocol") format. Amongst other things, this opens up the use of third-party tools to parse and process out selftesting results. - Ryan Roberts has added fork()-time PTE batching of THP ptes in the series "mm/memory: optimize fork() with PTE-mapped THP". Mainly targeted at arm64, this significantly speeds up fork() when the process has a large number of pte-mapped folios. - David Hildenbrand also gets in on the THP pte batching game in his series "mm/memory: optimize unmap/zap with PTE-mapped THP". It implements batching during munmap() and other pte teardown situations. The microbenchmark improvements are nice. - And in the series "Transparent Contiguous PTEs for User Mappings" Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte mappings"). Kernel build times on arm64 improved nicely. Ryan's series "Address some contpte nits" provides some followup work. - In the series "mm/hugetlb: Restore the reservation" Breno Leitao has fixed an obscure hugetlb race which was causing unnecessary page faults. He has also added a reproducer under the selftest code. - In the series "selftests/mm: Output cleanups for the compaction test", Mark Brown did what the title claims. - Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring". - Even more zswap material from Nhat Pham. The series "fix and extend zswap kselftests" does as claimed. - In the series "Introduce cpu_dcache_is_aliasing() to fix DAX regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in our handling of DAX on archiecctures which have virtually aliasing data caches. The arm architecture is the main beneficiary. - Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic improvements in worst-case mmap_lock hold times during certain userfaultfd operations. - Some page_owner enhancements and maintenance work from Oscar Salvador in his series "page_owner: print stacks and their outstanding allocations" "page_owner: Fixup and cleanup" - Uladzislau Rezki has contributed some vmalloc scalability improvements in his series "Mitigate a vmap lock contention". It realizes a 12x improvement for a certain microbenchmark. - Some kexec/crash cleanup work from Baoquan He in the series "Split crash out from kexec and clean up related config items". - Some zsmalloc maintenance work from Chengming Zhou in the series "mm/zsmalloc: fix and optimize objects/page migration" "mm/zsmalloc: some cleanup for get/set_zspage_mapping()" - Zi Yan has taught the MM to perform compaction on folios larger than order=0. This a step along the path to implementaton of the merging of large anonymous folios. The series is named "Enable >0 order folio memory compaction". - Christoph Hellwig has done quite a lot of cleanup work in the pagecache writeback code in his series "convert write_cache_pages() to an iterator". - Some modest hugetlb cleanups and speedups in Vishal Moola's series "Handle hugetlb faults under the VMA lock". - Zi Yan has changed the page splitting code so we can split huge pages into sizes other than order-0 to better utilize large folios. The series is named "Split a folio to any lower order folios". - David Hildenbrand has contributed the series "mm: remove total_mapcount()", a cleanup. - Matthew Wilcox has sought to improve the performance of bulk memory freeing in his series "Rearrange batched folio freeing". - Gang Li's series "hugetlb: parallelize hugetlb page init on boot" provides large improvements in bootup times on large machines which are configured to use large numbers of hugetlb pages. - Matthew Wilcox's series "PageFlags cleanups" does that. - Qi Zheng's series "minor fixes and supplement for ptdesc" does that also. S390 is affected. - Cleanups to our pagemap utility functions from Peter Xu in his series "mm/treewide: Replace pXd_large() with pXd_leaf()". - Nico Pache has fixed a few things with our hugepage selftests in his series "selftests/mm: Improve Hugepage Test Handling in MM Selftests". - Also, of course, many singleton patches to many things. Please see the individual changelogs for details. * tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits) mm/zswap: remove the memcpy if acomp is not sleepable crypto: introduce: acomp_is_async to expose if comp drivers might sleep memtest: use {READ,WRITE}_ONCE in memory scanning mm: prohibit the last subpage from reusing the entire large folio mm: recover pud_leaf() definitions in nopmd case selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements selftests/mm: skip uffd hugetlb tests with insufficient hugepages selftests/mm: dont fail testsuite due to a lack of hugepages mm/huge_memory: skip invalid debugfs new_order input for folio split mm/huge_memory: check new folio order when split a folio mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure mm: add an explicit smp_wmb() to UFFDIO_CONTINUE mm: fix list corruption in put_pages_list mm: remove folio from deferred split list before uncharging it filemap: avoid unnecessary major faults in filemap_fault() mm,page_owner: drop unnecessary check mm,page_owner: check for null stack_record before bumping its refcount mm: swap: fix race between free_swap_and_cache() and swapoff() mm/treewide: align up pXd_leaf() retval across archs mm/treewide: drop pXd_large() ...