summaryrefslogtreecommitdiff
path: root/tools/include
AgeCommit message (Collapse)Author
2023-11-01Merge tag 'for-6.7/io_uring-sockopt-2023-10-30' of git://git.kernel.dk/linuxLinus Torvalds
Pull io_uring {get,set}sockopt support from Jens Axboe: "This adds support for using getsockopt and setsockopt via io_uring. The main use cases for this is to enable use of direct descriptors, rather than first instantiating a normal file descriptor, doing the option tweaking needed, then turning it into a direct descriptor. With this support, we can avoid needing a regular file descriptor completely. The net and bpf bits have been signed off on their side" * tag 'for-6.7/io_uring-sockopt-2023-10-30' of git://git.kernel.dk/linux: selftests/bpf/sockopt: Add io_uring support io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT io_uring/cmd: return -EOPNOTSUPP if net is disabled selftests/net: Extract uring helpers to be reusable tools headers: Grab copy of io_uring.h io_uring/cmd: Pass compat mode in issue_flags net/socket: Break down __sys_getsockopt net/socket: Break down __sys_setsockopt bpf: Add sockptr support for setsockopt bpf: Add sockptr support for getsockopt
2023-10-31Merge tag 'net-next-6.7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Support usec resolution of TCP timestamps, enabled selectively by a route attribute. - Defer regular TCP ACK while processing socket backlog, try to send a cumulative ACK at the end. Increase single TCP flow performance on a 200Gbit NIC by 20% (100Gbit -> 120Gbit). - The Fair Queuing (FQ) packet scheduler: - add built-in 3 band prio / WRR scheduling - support bypass if the qdisc is mostly idle (5% speed up for TCP RR) - improve inactive flow reporting - optimize the layout of structures for better cache locality - Support TCP Authentication Option (RFC 5925, TCP-AO), a more modern replacement for the old MD5 option. - Add more retransmission timeout (RTO) related statistics to TCP_INFO. - Support sending fragmented skbs over vsock sockets. - Make sure we send SIGPIPE for vsock sockets if socket was shutdown(). - Add sysctl for ignoring lower limit on lifetime in Router Advertisement PIO, based on an in-progress IETF draft. - Add sysctl to control activation of TCP ping-pong mode. - Add sysctl to make connection timeout in MPTCP configurable. - Support rcvlowat and notsent_lowat on MPTCP sockets, to help apps limit the number of wakeups. - Support netlink GET for MDB (multicast forwarding), allowing user space to request a single MDB entry instead of dumping the entire table. - Support selective FDB flushing in the VXLAN tunnel driver. - Allow limiting learned FDB entries in bridges, prevent OOM attacks. - Allow controlling via configfs netconsole targets which were created via the kernel cmdline at boot, rather than via configfs at runtime. - Support multiple PTP timestamp event queue readers with different filters. - MCTP over I3C. BPF: - Add new veth-like netdevice where BPF program defines the logic of the xmit routine. It can operate in L3 and L2 mode. - Support exceptions - allow asserting conditions which should never be true but are hard for the verifier to infer. With some extra flexibility around handling of the exit / failure: https://lwn.net/Articles/938435/ - Add support for local per-cpu kptr, allow allocating and storing per-cpu objects in maps. Access to those objects operates on the value for the current CPU. This allows to deprecate local one-off implementations of per-CPU storage like BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE maps. - Extend cgroup BPF sockaddr hooks for UNIX sockets. The use case is for systemd to re-implement the LogNamespace feature which allows running multiple instances of systemd-journald to process the logs of different services. - Enable open-coded task_vma iteration, after maple tree conversion made it hard to directly walk VMAs in tracing programs. - Add open-coded task, css_task and css iterator support. One of the use cases is customizable OOM victim selection via BPF. - Allow source address selection with bpf_*_fib_lookup(). - Add ability to pin BPF timer to the current CPU. - Prevent creation of infinite loops by combining tail calls and fentry/fexit programs. - Add missed stats for kprobes to retrieve the number of missed kprobe executions and subsequent executions of BPF programs. - Inherit system settings for CPU security mitigations. - Add BPF v4 CPU instruction support for arm32 and s390x. Changes to common code: - overflow: add DEFINE_FLEX() for on-stack definition of structs with flexible array members. - Process doc update with more guidance for reviewers. Driver API: - Simplify locking in WiFi (cfg80211 and mac80211 layers), use wiphy mutex in most places and remove a lot of smaller locks. - Create a common DPLL configuration API. Allow configuring and querying state of PLL circuits used for clock syntonization, in network time distribution. - Unify fragmented and full page allocation APIs in page pool code. Let drivers be ignorant of PAGE_SIZE. - Rework PHY state machine to avoid races with calls to phy_stop(). - Notify DSA drivers of MAC address changes on user ports, improve correctness of offloads which depend on matching port MAC addresses. - Allow antenna control on injected WiFi frames. - Reduce the number of variants of napi_schedule(). - Simplify error handling when composing devlink health messages. Misc: - A lot of KCSAN data race "fixes", from Eric. - A lot of __counted_by() annotations, from Kees. - A lot of strncpy -> strscpy and printf format fixes. - Replace master/slave terminology with conduit/user in DSA drivers. - Handful of KUnit tests for netdev and WiFi core. Removed: - AppleTalk COPS. - AppleTalk ipddp. - TI AR7 CPMAC Ethernet driver. Drivers: - Ethernet high-speed NICs: - Intel (100G, ice, idpf): - add a driver for the Intel E2000 IPUs - make CRC/FCS stripping configurable - cross-timestamping for E823 devices - basic support for E830 devices - use aux-bus for managing client drivers - i40e: report firmware versions via devlink - nVidia/Mellanox: - support 4-port NICs - increase max number of channels to 256 - optimize / parallelize SF creation flow - Broadcom (bnxt): - enhance NIC temperature reporting - support PAM4 speeds and lane configuration - Marvell OcteonTX2: - PTP pulse-per-second output support - enable hardware timestamping for VFs - Solarflare/AMD: - conntrack NAT offload and offload for tunnels - Wangxun (ngbe/txgbe): - expose HW statistics - Pensando/AMD: - support PCI level reset - narrow down the condition under which skbs are linearized - Netronome/Corigine (nfp): - support CHACHA20-POLY1305 crypto in IPsec offload - Ethernet NICs embedded, slower, virtual: - Synopsys (stmmac): - add Loongson-1 SoC support - enable use of HW queues with no offload capabilities - enable PPS input support on all 5 channels - increase TX coalesce timer to 5ms - RealTek USB (r8152): improve efficiency of Rx by using GRO frags - xen: support SW packet timestamping - add drivers for implementations based on TI's PRUSS (AM64x EVM) - nVidia/Mellanox Ethernet datacenter switches: - avoid poor HW resource use on Spectrum-4 by better block selection for IPv6 multicast forwarding and ordering of blocks in ACL region - Ethernet embedded switches: - Microchip: - support configuring the drive strength for EMI compliance - ksz9477: partial ACL support - ksz9477: HSR offload - ksz9477: Wake on LAN - Realtek: - rtl8366rb: respect device tree config of the CPU port - Ethernet PHYs: - support Broadcom BCM5221 PHYs - TI dp83867: support hardware LED blinking - CAN: - add support for Linux-PHY based CAN transceivers - at91_can: clean up and use rx-offload helpers - WiFi: - MediaTek (mt76): - new sub-driver for mt7925 USB/PCIe devices - HW wireless <> Ethernet bridging in MT7988 chips - mt7603/mt7628 stability improvements - Qualcomm (ath12k): - WCN7850: - enable 320 MHz channels in 6 GHz band - hardware rfkill support - enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to make scan faster - read board data variant name from SMBIOS - QCN9274: mesh support - RealTek (rtw89): - TDMA-based multi-channel concurrency (MCC) - Silicon Labs (wfx): - Remain-On-Channel (ROC) support - Bluetooth: - ISO: many improvements for broadcast support - mark BCM4378/BCM4387 as BROKEN_LE_CODED - add support for QCA2066 - btmtksdio: enable Bluetooth wakeup from suspend" * tag 'net-next-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1816 commits) net: pcs: xpcs: Add 2500BASE-X case in get state for XPCS drivers net: bpf: Use sockopt_lock_sock() in ip_sock_set_tos() net: mana: Use xdp_set_features_flag instead of direct assignment vxlan: Cleanup IFLA_VXLAN_PORT_RANGE entry in vxlan_get_size() iavf: delete the iavf client interface iavf: add a common function for undoing the interrupt scheme iavf: use unregister_netdev iavf: rely on netdev's own registered state iavf: fix the waiting time for initial reset iavf: in iavf_down, don't queue watchdog_task if comms failed iavf: simplify mutex_trylock+sleep loops iavf: fix comments about old bit locks doc/netlink: Update schema to support cmd-cnt-name and cmd-max-name tools: ynl: introduce option to process unknown attributes or types ipvlan: properly track tx_errors netdevsim: Block until all devices are released nfp: using napi_build_skb() to replace build_skb() net: dsa: microchip: ksz9477: Fix spelling mistake "Enery" -> "Energy" net: dsa: microchip: Ensure Stable PME Pin State for Wake-on-LAN net: dsa: microchip: Refactor switch shutdown routine for WoL preparation ...
2023-10-30Merge tag 'nolibc.2023.10.23a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc updates from Paul McKenney: - Add stdarg.h header and a few additional system-call upgrades - Add support for constructors and destructors - Add tests to verify the ability to link multiple .o files against nolibc - Numerous string-function optimizations and improvements - Prevent redundant kernel relinks by avoiding embedding of initramfs into the kernel image - Allow building i386 with multiarch compiler and make ppc64le use qemu-system-ppc64 - Miscellaneous fixups, including addition of -nostdinc for nolibc-test, avoiding -Wstringop-overflow warnings, and avoiding unused parameter warnings for ENOSYS fallbacks * tag 'nolibc.2023.10.23a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: selftests/nolibc: add tests for multi-object linkage selftests/nolibc: use qemu-system-ppc64 for ppc64le tools/nolibc: add support for constructors and destructors tools/nolibc: drop test for getauxval(AT_PAGESZ) tools/nolibc: automatically detect necessity to use pselect6 tools/nolibc: don't define new syscall number tools/nolibc: avoid unused parameter warnings for ENOSYS fallbacks selftests/nolibc: allow building i386 with multiarch compiler selftests/nolibc: don't embed initramfs into kernel image selftests/nolibc: libc-test: avoid -Wstringop-overflow warnings tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function tools/nolibc: x86-64: Use `rep stosb` for `memset()` tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` selftests/nolibc: use -nostdinc for nolibc-test tools/nolibc: add stdarg.h header
2023-10-26Merge tag 'for-netdev' of ↵Jakub Kicinski
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-10-26 We've added 51 non-merge commits during the last 10 day(s) which contain a total of 75 files changed, 5037 insertions(+), 200 deletions(-). The main changes are: 1) Add open-coded task, css_task and css iterator support. One of the use cases is customizable OOM victim selection via BPF, from Chuyi Zhou. 2) Fix BPF verifier's iterator convergence logic to use exact states comparison for convergence checks, from Eduard Zingerman, Andrii Nakryiko and Alexei Starovoitov. 3) Add BPF programmable net device where bpf_mprog defines the logic of its xmit routine. It can operate in L3 and L2 mode, from Daniel Borkmann and Nikolay Aleksandrov. 4) Batch of fixes for BPF per-CPU kptr and re-enable unit_size checking for global per-CPU allocator, from Hou Tao. 5) Fix libbpf which eagerly assumed that SHT_GNU_verdef ELF section was going to be present whenever a binary has SHT_GNU_versym section, from Andrii Nakryiko. 6) Fix BPF ringbuf correctness to fold smp_mb__before_atomic() into atomic_set_release(), from Paul E. McKenney. 7) Add a warning if NAPI callback missed xdp_do_flush() under CONFIG_DEBUG_NET which helps checking if drivers were missing the former, from Sebastian Andrzej Siewior. 8) Fix missed RCU read-lock in bpf_task_under_cgroup() which was throwing a warning under sleepable programs, from Yafang Shao. 9) Avoid unnecessary -EBUSY from htab_lock_bucket by disabling IRQ before checking map_locked, from Song Liu. 10) Make BPF CI linked_list failure test more robust, from Kumar Kartikeya Dwivedi. 11) Enable samples/bpf to be built as PIE in Fedora, from Viktor Malik. 12) Fix xsk starving when multiple xsk sockets were associated with a single xsk_buff_pool, from Albert Huang. 13) Clarify the signed modulo implementation for the BPF ISA standardization document that it uses truncated division, from Dave Thaler. 14) Improve BPF verifier's JEQ/JNE branch taken logic to also consider signed bounds knowledge, from Andrii Nakryiko. 15) Add an option to XDP selftests to use multi-buffer AF_XDP xdp_hw_metadata and mark used XDP programs as capable to use frags, from Larysa Zaremba. 16) Fix bpftool's BTF dumper wrt printing a pointer value and another one to fix struct_ops dump in an array, from Manu Bretelle. * tag 'for-netdev' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (51 commits) netkit: Remove explicit active/peer ptr initialization selftests/bpf: Fix selftests broken by mitigations=off samples/bpf: Allow building with custom bpftool samples/bpf: Fix passing LDFLAGS to libbpf samples/bpf: Allow building with custom CFLAGS/LDFLAGS bpf: Add more WARN_ON_ONCE checks for mismatched alloc and free selftests/bpf: Add selftests for netkit selftests/bpf: Add netlink helper library bpftool: Extend net dump with netkit progs bpftool: Implement link show support for netkit libbpf: Add link-based API for netkit tools: Sync if_link uapi header netkit, bpf: Add bpf programmable net device bpf: Improve JEQ/JNE branch taken logic bpf: Fold smp_mb__before_atomic() into atomic_set_release() bpf: Fix unnecessary -EBUSY from htab_lock_bucket xsk: Avoid starving the xsk further down the list bpf: print full verifier states on infinite loop detection selftests/bpf: test if state loops are detected in a tricky case bpf: correct loop detection for iterators convergence ... ==================== Link: https://lore.kernel.org/r/20231026150509.2824-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/mac80211/rx.c 91535613b609 ("wifi: mac80211: don't drop all unprotected public action frames") 6c02fab72429 ("wifi: mac80211: split ieee80211_drop_unencrypted_mgmt() return value") Adjacent changes: drivers/net/ethernet/apm/xgene/xgene_enet_main.c 61471264c018 ("net: ethernet: apm: Convert to platform remove callback returning void") d2ca43f30611 ("net: xgene: Fix unused xgene_enet_of_match warning for !CONFIG_OF") net/vmw_vsock/virtio_transport.c 64c99d2d6ada ("vsock/virtio: support to send non-linear skb") 53b08c498515 ("vsock/virtio: initialize the_virtio_vsock before using VQs") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-24tools: Sync if_link uapi headerDaniel Borkmann
Sync if_link uapi header to the latest version as we need the refresher in tooling for netkit device. Given it's been a while since the last sync and the diff is fairly big, it has been done as its own commit. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20231024214904.29825-3-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24netkit, bpf: Add bpf programmable net deviceDaniel Borkmann
This work adds a new, minimal BPF-programmable device called "netkit" (former PoC code-name "meta") we recently presented at LSF/MM/BPF. The core idea is that BPF programs are executed within the drivers xmit routine and therefore e.g. in case of containers/Pods moving BPF processing closer to the source. One of the goals was that in case of Pod egress traffic, this allows to move BPF programs from hostns tcx ingress into the device itself, providing earlier drop or forward mechanisms, for example, if the BPF program determines that the skb must be sent out of the node, then a redirect to the physical device can take place directly without going through per-CPU backlog queue. This helps to shift processing for such traffic from softirq to process context, leading to better scheduling decisions/performance (see measurements in the slides). In this initial version, the netkit device ships as a pair, but we plan to extend this further so it can also operate in single device mode. The pair comes with a primary and a peer device. Only the primary device, typically residing in hostns, can manage BPF programs for itself and its peer. The peer device is designated for containers/Pods and cannot attach/detach BPF programs. Upon the device creation, the user can set the default policy to 'pass' or 'drop' for the case when no BPF program is attached. Additionally, the device can be operated in L3 (default) or L2 mode. The management of BPF programs is done via bpf_mprog, so that multi-attach is supported right from the beginning with similar API and dependency controls as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs"). tc BPF compatibility is provided, so that existing programs can be easily migrated. Going forward, we plan to use netkit devices in Cilium as the main device type for connecting Pods. They will be operated in L3 mode in order to simplify a Pod's neighbor management and the peer will operate in default drop mode, so that no traffic is leaving between the time when a Pod is brought up by the CNI plugin and programs attached by the agent. Additionally, the programs we attach via tcx on the physical devices are using bpf_redirect_peer() for inbound traffic into netkit device, hence the latter is also supporting the ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the way out, pushing from netkit peer to phys device directly. Also, BIG TCP is supported on netkit device. For the follow-up work in single device mode, we plan to convert Cilium's cilium_host/_net devices into a single one. An extensive test suite for checking device operations and the BPF program and link management API comes as BPF selftests in this series. Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Stanislav Fomichev <sdf@google.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://github.com/borkmann/iproute2/tree/pr/netkit Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.) Link: https://lore.kernel.org/r/20231024214904.29825-2-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-24Merge tag 'mm-hotfixes-stable-2023-10-24-09-40' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "20 hotfixes. 12 are cc:stable and the remainder address post-6.5 issues or aren't considered necessary for earlier kernel versions" * tag 'mm-hotfixes-stable-2023-10-24-09-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: maple_tree: add GFP_KERNEL to allocations in mas_expected_entries() selftests/mm: include mman header to access MREMAP_DONTUNMAP identifier mailmap: correct email aliasing for Oleksij Rempel mailmap: map Bartosz's old address to the current one mm/damon/sysfs: check DAMOS regions update progress from before_terminate() MAINTAINERS: Ondrej has moved kasan: disable kasan_non_canonical_hook() for HW tags kasan: print the original fault addr when access invalid shadow hugetlbfs: close race between MADV_DONTNEED and page fault hugetlbfs: extend hugetlb_vma_lock to private VMAs hugetlbfs: clear resv_map pointer if mmap fails mm: zswap: fix pool refcount bug around shrink_worker() mm/migrate: fix do_pages_move for compat pointers riscv: fix set_huge_pte_at() for NAPOT mappings when a swap entry is set riscv: handle VM_FAULT_[HWPOISON|HWPOISON_LARGE] faults instead of panicking mmap: fix error paths with dup_anon_vma() mmap: fix vma_iterator in error path of vma_merge() mm: fix vm_brk_flags() to not bail out while holding lock mm/mempolicy: fix set_mempolicy_home_node() previous VMA pointer mm/page_alloc: correct start page when guard page debug is enabled
2023-10-23Merge tag 'urgent/nolibc.2023.10.16a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc fixes from Paul McKenney: - tools/nolibc: i386: Fix a stack misalign bug on _start - MAINTAINERS: nolibc: update tree location - tools/nolibc: mark start_c as weak to avoid linker errors * tag 'urgent/nolibc.2023.10.16a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: tools/nolibc: mark start_c as weak MAINTAINERS: nolibc: update tree location tools/nolibc: i386: Fix a stack misalign bug on _start
2023-10-19selftests/net: Extract uring helpers to be reusableBreno Leitao
Instead of defining basic io_uring functions in the test case, move them to a common directory, so, other tests can use them. This simplify the test code and reuse the common liburing infrastructure. This is basically a copy of what we have in io_uring_zerocopy_tx with some minor improvements to make checkpatch happy. A follow-up test will use the same helpers in a BPF sockopt test. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231016134750.1381153-8-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-19tools headers: Grab copy of io_uring.hBreno Leitao
This file will be used by mini_uring.h and allow tests to run without the need of installing liburing to run the tests. This is needed to run io_uring tests in BPF, such as (tools/testing/selftests/bpf/prog_tests/sockopt.c). Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20231016134750.1381153-7-leitao@debian.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-10-18maple_tree: add GFP_KERNEL to allocations in mas_expected_entries()Liam R. Howlett
Users complained about OOM errors during fork without triggering compaction. This can be fixed by modifying the flags used in mas_expected_entries() so that the compaction will be triggered in low memory situations. Since mas_expected_entries() is only used during fork, the extra argument does not need to be passed through. Additionally, the two test_maple_tree test cases and one benchmark test were altered to use the correct locking type so that allocations would not trigger sleeping and thus fail. Testing was completed with lockdep atomic sleep detection. The additional locking change requires rwsem support additions to the tools/ directory through the use of pthreads pthread_rwlock_t. With this change test_maple_tree works in userspace, as a module, and in-kernel. Users may notice that the system gave up early on attempting to start new processes instead of attempting to reclaim memory. Link: https://lkml.kernel.org/r/20230915093243epcms1p46fa00bbac1ab7b7dca94acb66c44c456@epcms1p4 Link: https://lkml.kernel.org/r/20231012155233.2272446-1-Liam.Howlett@oracle.com Fixes: 54a611b60590 ("Maple Tree: add new data structure") Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by: Peng Zhang <zhangpeng.00@bytedance.com> Cc: <jason.sim@samsung.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-10-16Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-10-16 We've added 90 non-merge commits during the last 25 day(s) which contain a total of 120 files changed, 3519 insertions(+), 895 deletions(-). The main changes are: 1) Add missed stats for kprobes to retrieve the number of missed kprobe executions and subsequent executions of BPF programs, from Jiri Olsa. 2) Add cgroup BPF sockaddr hooks for unix sockets. The use case is for systemd to reimplement the LogNamespace feature which allows running multiple instances of systemd-journald to process the logs of different services, from Daan De Meyer. 3) Implement BPF CPUv4 support for s390x BPF JIT, from Ilya Leoshkevich. 4) Improve BPF verifier log output for scalar registers to better disambiguate their internal state wrt defaults vs min/max values matching, from Andrii Nakryiko. 5) Extend the BPF fib lookup helpers for IPv4/IPv6 to support retrieving the source IP address with a new BPF_FIB_LOOKUP_SRC flag, from Martynas Pumputis. 6) Add support for open-coded task_vma iterator to help with symbolization for BPF-collected user stacks, from Dave Marchevsky. 7) Add libbpf getters for accessing individual BPF ring buffers which is useful for polling them individually, for example, from Martin Kelly. 8) Extend AF_XDP selftests to validate the SHARED_UMEM feature, from Tushar Vyavahare. 9) Improve BPF selftests cross-building support for riscv arch, from Björn Töpel. 10) Add the ability to pin a BPF timer to the same calling CPU, from David Vernet. 11) Fix libbpf's bpf_tracing.h macros for riscv to use the generic implementation of PT_REGS_SYSCALL_REGS() to access syscall arguments, from Alexandre Ghiti. 12) Extend libbpf to support symbol versioning for uprobes, from Hengqi Chen. 13) Fix bpftool's skeleton code generation to guarantee that ELF data is 8 byte aligned, from Ian Rogers. 14) Inherit system-wide cpu_mitigations_off() setting for Spectre v1/v4 security mitigations in BPF verifier, from Yafang Shao. 15) Annotate struct bpf_stack_map with __counted_by attribute to prepare BPF side for upcoming __counted_by compiler support, from Kees Cook. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (90 commits) bpf: Ensure proper register state printing for cond jumps bpf: Disambiguate SCALAR register state output in verifier logs selftests/bpf: Make align selftests more robust selftests/bpf: Improve missed_kprobe_recursion test robustness selftests/bpf: Improve percpu_alloc test robustness selftests/bpf: Add tests for open-coded task_vma iter bpf: Introduce task_vma open-coded iterator kfuncs selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c bpf: Don't explicitly emit BTF for struct btf_iter_num bpf: Change syscall_nr type to int in struct syscall_tp_t net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set bpf: Avoid unnecessary audit log for CPU security mitigations selftests/bpf: Add tests for cgroup unix socket address hooks selftests/bpf: Make sure mount directory exists documentation/bpf: Document cgroup unix socket address hooks bpftool: Add support for cgroup unix socket address hooks libbpf: Add support for cgroup unix socket address hooks bpf: Implement cgroup sockaddr hooks for unix sockets bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf bpf: Propagate modified uaddrlen from cgroup sockaddr programs ... ==================== Link: https://lore.kernel.org/r/20231016204803.30153-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-12tools/nolibc: add support for constructors and destructorsThomas Weißschuh
With the startup code moved to C, implementing support for constructors and deconstructors is fairly easy to implement. Examples for code size impact: text data bss dec hex filename 21837 104 88 22029 560d nolibc-test.before 22135 120 88 22343 5747 nolibc-test.after 21970 104 88 22162 5692 nolibc-test.after-only-crt.h-changes The sections are defined by [0]. [0] https://refspecs.linuxfoundation.org/elf/gabi4+/ch5.dynamic.html Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/lkml/20231007-nolibc-constructors-v2-1-ef84693efbc1@weissschuh.net/
2023-10-12tools/nolibc: automatically detect necessity to use pselect6Thomas Weißschuh
We can automatically detect if pselect6 is needed or not from the kernel headers. This removes the need to manually specify it. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20230917-nolibc-syscall-nr-v2-4-03863d509b9a@weissschuh.net
2023-10-12tools/nolibc: don't define new syscall numberThomas Weißschuh
All symbols created by nolibc are also visible to user code. Syscall constants are expected to come from the kernel headers and should not be made up by nolibc. Refactor the logic to avoid defining syscall numbers. Also the new code is easier to understand. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20230917-nolibc-syscall-nr-v2-3-03863d509b9a@weissschuh.net
2023-10-12tools/nolibc: avoid unused parameter warnings for ENOSYS fallbacksThomas Weißschuh
The ENOSYS fallback code does not use its functions parameters. This can lead to compiler warnings about unused parameters. Explicitly avoid these warnings. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Willy Tarreau <w@1wt.eu> Link: https://lore.kernel.org/r/20230917-nolibc-syscall-nr-v2-2-03863d509b9a@weissschuh.net
2023-10-12tools/nolibc: string: Remove the `_nolibc_memcpy_up()` functionAmmar Faizi
This function is only called by memcpy(), there is no real reason to have this wrapper. Delete this function and move the code to memcpy() directly. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2023-10-12tools/nolibc: string: Remove the `_nolibc_memcpy_down()` functionAmmar Faizi
This nolibc internal function is not used. Delete it. It was probably supposed to handle memmove(), but today the memmove() has its own implementation. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2023-10-12tools/nolibc: x86-64: Use `rep stosb` for `memset()`Ammar Faizi
Simplify memset() on the x86-64 arch. The x86-64 arch has a 'rep stosb' instruction, which can perform memset() using only a single instruction, given: %al = value (just like the second argument of memset()) %rdi = destination %rcx = length Before this patch: ``` 00000000000010c9 <memset>: 10c9: 48 89 f8 mov %rdi,%rax 10cc: 48 85 d2 test %rdx,%rdx 10cf: 74 0e je 10df <memset+0x16> 10d1: 31 c9 xor %ecx,%ecx 10d3: 40 88 34 08 mov %sil,(%rax,%rcx,1) 10d7: 48 ff c1 inc %rcx 10da: 48 39 ca cmp %rcx,%rdx 10dd: 75 f4 jne 10d3 <memset+0xa> 10df: c3 ret ``` After this patch: ``` 0000000000001511 <memset>: 1511: 96 xchg %eax,%esi 1512: 48 89 d1 mov %rdx,%rcx 1515: 57 push %rdi 1516: f3 aa rep stos %al,%es:(%rdi) 1518: 58 pop %rax 1519: c3 ret ``` v2: - Use pushq %rdi / popq %rax (Alviro). - Use xchg %eax, %esi (Willy). Link: https://lore.kernel.org/lkml/ZO9e6h2jjVIMpBJP@1wt.eu Suggested-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Suggested-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2023-10-12tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`Ammar Faizi
Simplify memcpy() and memmove() on the x86-64 arch. The x86-64 arch has a 'rep movsb' instruction, which can perform memcpy() using only a single instruction, given: %rdi = destination %rsi = source %rcx = length Additionally, it can also handle the overlapping case by setting DF=1 (backward copy), which can be used as the memmove() implementation. Before this patch: ``` 00000000000010ab <memmove>: 10ab: 48 89 f8 mov %rdi,%rax 10ae: 31 c9 xor %ecx,%ecx 10b0: 48 39 f7 cmp %rsi,%rdi 10b3: 48 83 d1 ff adc $0xffffffffffffffff,%rcx 10b7: 48 85 d2 test %rdx,%rdx 10ba: 74 25 je 10e1 <memmove+0x36> 10bc: 48 83 c9 01 or $0x1,%rcx 10c0: 48 39 f0 cmp %rsi,%rax 10c3: 48 c7 c7 ff ff ff ff mov $0xffffffffffffffff,%rdi 10ca: 48 0f 43 fa cmovae %rdx,%rdi 10ce: 48 01 cf add %rcx,%rdi 10d1: 44 8a 04 3e mov (%rsi,%rdi,1),%r8b 10d5: 44 88 04 38 mov %r8b,(%rax,%rdi,1) 10d9: 48 01 cf add %rcx,%rdi 10dc: 48 ff ca dec %rdx 10df: 75 f0 jne 10d1 <memmove+0x26> 10e1: c3 ret 00000000000010e2 <memcpy>: 10e2: 48 89 f8 mov %rdi,%rax 10e5: 48 85 d2 test %rdx,%rdx 10e8: 74 12 je 10fc <memcpy+0x1a> 10ea: 31 c9 xor %ecx,%ecx 10ec: 40 8a 3c 0e mov (%rsi,%rcx,1),%dil 10f0: 40 88 3c 08 mov %dil,(%rax,%rcx,1) 10f4: 48 ff c1 inc %rcx 10f7: 48 39 ca cmp %rcx,%rdx 10fa: 75 f0 jne 10ec <memcpy+0xa> 10fc: c3 ret ``` After this patch: ``` // memmove is an alias for memcpy 000000000040133b <memcpy>: 40133b: 48 89 d1 mov %rdx,%rcx 40133e: 48 89 f8 mov %rdi,%rax 401341: 48 89 fa mov %rdi,%rdx 401344: 48 29 f2 sub %rsi,%rdx 401347: 48 39 ca cmp %rcx,%rdx 40134a: 72 03 jb 40134f <memcpy+0x14> 40134c: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 40134e: c3 ret 40134f: 48 8d 7c 0f ff lea -0x1(%rdi,%rcx,1),%rdi 401354: 48 8d 74 0e ff lea -0x1(%rsi,%rcx,1),%rsi 401359: fd std 40135a: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi) 40135c: fc cld 40135d: c3 ret ``` v3: - Make memmove as an alias for memcpy (Willy). - Make the forward copy the likely case (Alviro). v2: - Fix the broken memmove implementation (David). Link: https://lore.kernel.org/lkml/20230902062237.GA23141@1wt.eu Link: https://lore.kernel.org/lkml/5a821292d96a4dbc84c96ccdc6b5b666@AcuMS.aculab.com Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2023-10-12tools/nolibc: add stdarg.h headerThomas Weißschuh
This allows nolic to work with `-nostdinc` avoiding any reliance on system headers. The implementation has been lifted from musl libc 1.2.4. There is already an implementation of stdarg.h in include/linux/stdarg.h but that is GPL licensed and therefore not suitable for nolibc. The used compiler builtins have been validated to be at least available since GCC 4.1.2 and clang 3.0.0. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-10-12tools/nolibc: mark start_c as weakThomas Weißschuh
Otherwise the different instances of _start_c from each compilation unit will lead to linker errors: /usr/bin/ld: /tmp/ccSNvRqs.o: in function `_start_c': nolibc-test-foo.c:(.text.nolibc_memset+0x9): multiple definition of `_start_c'; /tmp/ccG25101.o:nolibc-test.c:(.text+0x1ea3): first defined here Fixes: 17336755150b ("tools/nolibc: add new crt.h with _start_c") Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/lkml/20231012-nolibc-start_c-multiple-v1-1-fbfc73e0283f@weissschuh.net/ Link: https://lore.kernel.org/lkml/20231012-nolibc-linkage-test-v1-1-315e682768b4@weissschuh.net/ Acked-by: Willy Tarreau <w@1wt.eu>
2023-10-12tools/nolibc: i386: Fix a stack misalign bug on _startAmmar Faizi
The ABI mandates that the %esp register must be a multiple of 16 when executing a 'call' instruction. Commit 2ab446336b17 ("tools/nolibc: i386: shrink _start with _start_c") simplified the _start function, but it didn't take care of the %esp alignment, causing SIGSEGV on SSE and AVX programs that use aligned move instruction (e.g., movdqa, movaps, and vmovdqa). The 'and $-16, %esp' aligns the %esp at a multiple of 16. Then 'push %eax' will subtract the %esp by 4; thus, it breaks the 16-byte alignment. Make sure the %esp is correctly aligned after the push by subtracting 12 before the push. Extra: Add 'add $12, %esp' before the 'and $-16, %esp' to avoid over-estimating for particular cases as suggested by Willy. A test program to validate the %esp alignment on _start can be found at: https://lore.kernel.org/lkml/ZOoindMFj1UKqo+s@biznet-home.integral.gnuweeb.org [ Thomas: trim Fixes tag commit id ] Cc: Zhangjin Wu <falcon@tinylab.org> Fixes: 2ab446336b17 ("tools/nolibc: i386: shrink _start with _start_c") Reported-by: Nicholas Rosenberg <inori@vnlx.org> Acked-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
2023-10-11bpf: Implement cgroup sockaddr hooks for unix socketsDaan De Meyer
These hooks allows intercepting connect(), getsockname(), getpeername(), sendmsg() and recvmsg() for unix sockets. The unix socket hooks get write access to the address length because the address length is not fixed when dealing with unix sockets and needs to be modified when a unix socket address is modified by the hook. Because abstract socket unix addresses start with a NUL byte, we cannot recalculate the socket address in kernelspace after running the hook by calculating the length of the unix socket path using strlen(). These hooks can be used when users want to multiplex syscall to a single unix socket to multiple different processes behind the scenes by redirecting the connect() and other syscalls to process specific sockets. We do not implement support for intercepting bind() because when using bind() with unix sockets with a pathname address, this creates an inode in the filesystem which must be cleaned up. If we rewrite the address, the user might try to clean up the wrong file, leaking the socket in the filesystem where it is never cleaned up. Until we figure out a solution for this (and a use case for intercepting bind()), we opt to not allow rewriting the sockaddr in bind() calls. We also implement recvmsg() support for connected streams so that after a connect() that is modified by a sockaddr hook, any corresponding recmvsg() on the connected socket can also be modified to make the connected program think it is connected to the "intended" remote. Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-09bpf: Derive source IP addr via bpf_*_fib_lookup()Martynas Pumputis
Extend the bpf_fib_lookup() helper by making it to return the source IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set. For example, the following snippet can be used to derive the desired source IP address: struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr }; ret = bpf_skb_fib_lookup(skb, p, sizeof(p), BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH); if (ret != BPF_FIB_LKUP_RET_SUCCESS) return TC_ACT_SHOT; /* the p.ipv4_src now contains the source address */ The inability to derive the proper source address may cause malfunctions in BPF-based dataplanes for hosts containing netdevs with more than one routable IP address or for multi-homed hosts. For example, Cilium implements packet masquerading in BPF. If an egressing netdev to which the Cilium's BPF prog is attached has multiple IP addresses, then only one [hardcoded] IP address can be used for masquerading. This breaks connectivity if any other IP address should have been selected instead, for example, when a public and private addresses are attached to the same egress interface. The change was tested with Cilium [1]. Nikolay Aleksandrov helped to figure out the IPv6 addr selection. [1]: https://github.com/cilium/cilium/pull/28283 Signed-off-by: Martynas Pumputis <m@lambda.lt> Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-10-09bpf: Add ability to pin bpf timer to calling CPUDavid Vernet
BPF supports creating high resolution timers using bpf_timer_* helper functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which specifies that the timeout should be interpreted as absolute time. It would also be useful to be able to pin that timer to a core. For example, if you wanted to make a subset of cores run without timer interrupts, and only have the timer be invoked on a single core. This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag. When specified, the HRTIMER_MODE_PINNED flag is passed to hrtimer_start(). A subsequent patch will update selftests to validate. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20231004162339.200702-2-void@manifault.com
2023-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts (or adjacent changes of note). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-09-26Merge tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Namhyung Kim: "Build: - Update header files in the tools/**/include directory to sync with the kernel sources as usual. - Remove unused bpf-prologue files. While it's not strictly a fix, but the functionality was removed in this cycle so better to get rid of the code together. - Other minor build fixes. Misc: - Fix uninitialized memory access in PMU parsing code - Fix segfaults on software event" * tag 'perf-tools-fixes-for-v6.6-1-2023-09-25' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf jevent: fix core dump on software events on s390 perf pmu: Ensure all alias variables are initialized perf jevents metric: Fix type of strcmp_cpuid_str perf trace: Avoid compile error wrt redefining bool perf bpf-prologue: Remove unused file tools headers UAPI: Update tools's copy of drm.h headers tools arch x86: Sync the msr-index.h copy with the kernel sources perf bench sched-seccomp-notify: Use the tools copy of seccomp.h UAPI tools headers UAPI: Copy seccomp.h to be able to build 'perf bench' in older systems tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack syscalls with the kernel sources perf tools: Update copy of libbpf's hashmap.c
2023-09-25bpf: Add missed value to kprobe perf link infoJiri Olsa
Add missed value to kprobe attached through perf link info to hold the stats of missed kprobe handler execution. The kprobe's missed counter gets incremented when kprobe handler is not executed due to another kprobe running on the same cpu. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230920213145.1941596-4-jolsa@kernel.org
2023-09-25bpf: Add missed value to kprobe_multi link infoJiri Olsa
Add missed value to kprobe_multi link info to hold the stats of missed kprobe_multi probe. The missed counter gets incremented when fprobe fails the recursion check or there's no rethook available for return probe. In either case the attached bpf program is not executed. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Song Liu <song@kernel.org> Reviewed-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20230920213145.1941596-3-jolsa@kernel.org
2023-09-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni
Cross-merge networking fixes after downstream PR. No conflicts. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-09-21Merge tag 'net-6.6-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. Current release - regressions: - bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE - netfilter: fix entries val in rule reset audit log - eth: stmmac: fix incorrect rxq|txq_stats reference Previous releases - regressions: - ipv4: fix null-deref in ipv4_link_failure - netfilter: - fix several GC related issues - fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP - eth: team: fix null-ptr-deref when team device type is changed - eth: i40e: fix VF VLAN offloading when port VLAN is configured - eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB Previous releases - always broken: - core: fix ETH_P_1588 flow dissector - mptcp: fix several connection hang-up conditions - bpf: - avoid deadlock when using queue and stack maps from NMI - add override check to kprobe multi link attach - hsr: properly parse HSRv1 supervisor frames. - eth: igc: fix infinite initialization loop with early XDP redirect - eth: octeon_ep: fix tx dma unmap len values in SG - eth: hns3: fix GRE checksum offload issue" * tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits) sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast() igc: Expose tx-usecs coalesce setting to user octeontx2-pf: Do xdp_do_flush() after redirects. bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI net: ena: Flush XDP packets on error. net/handshake: Fix memory leak in __sock_create() and sock_alloc_file() net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev' netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP netfilter: nf_tables: fix memleak when more than 255 elements expired netfilter: nf_tables: disable toggling dormant table state more than once vxlan: Add missing entries to vxlan_get_size() net: rds: Fix possible NULL-pointer dereference team: fix null-ptr-deref when team device type is changed net: bridge: use DEV_STATS_INC() net: hns3: add 5ms delay before clear firmware reset irq source net: hns3: fix fail to delete tc flower rules during reset issue net: hns3: only enable unicast promisc when mac table full net: hns3: fix GRE checksum offload issue net: hns3: add cmdq check for vf periodic service task net: stmmac: fix incorrect rxq|txq_stats reference ...
2023-09-15bpf: Fix BTF_ID symbol generation collision in tools/Nick Desaulniers
Marcus and Satya reported an issue where BTF_ID macro generates same symbol in separate objects and that breaks final vmlinux link. ld.lld: error: ld-temp.o <inline asm>:14577:1: symbol '__BTF_ID__struct__cgroup__624' is already defined This can be triggered under specific configs when __COUNTER__ happens to be the same for the same symbol in two different translation units, which is already quite unlikely to happen. Add __LINE__ number suffix to make BTF_ID symbol more unique, which is not a complete fix, but it would help for now and meanwhile we can work on better solution as suggested by Andrii. Cc: stable@vger.kernel.org Reported-by: Satya Durga Srinivasu Prabhala <quic_satyap@quicinc.com> Reported-by: Marcus Seyfarth <m.seyfarth@gmail.com> Closes: https://github.com/ClangBuiltLinux/linux/issues/1913 Debugged-by: Nathan Chancellor <nathan@kernel.org> Co-developed-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4Bzb5KQ2_LmhN769ifMeSJaWfebccUasQOfQKaOd0nQ51tw@mail.gmail.com/ Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20230915-bpf_collision-v3-2-263fc519c21f@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-09-15bpf: expose information about supported xdp metadata kfuncStanislav Fomichev
Add new xdp-rx-metadata-features member to netdev netlink which exports a bitmask of supported kfuncs. Most of the patch is autogenerated (headers), the only relevant part is netdev.yaml and the changes in netdev-genl.c to marshal into netlink. Example output on veth: $ ip link add veth0 type veth peer name veth1 # ifndex == 12 $ ./tools/net/ynl/samples/netdev 12 Select ifc ($ifindex; or 0 = dump; or -2 ntf check): 12 veth1[12] xdp-features (23): basic redirect rx-sg xdp-rx-metadata-features (3): timestamp hash xdp-zc-max-segs=0 Cc: netdev@vger.kernel.org Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230913171350.369987-3-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-09-14memblock tests: fix warning ‘struct seq_file’ declared inside parameter listMike Rapoport (IBM)
Building memblock tests produces the following warning: cc -I. -I../../include -Wall -O2 -fsanitize=address -fsanitize=undefined -D CONFIG_PHYS_ADDR_T_64BIT -c -o main.o main.c In file included from tests/common.h:9, from tests/basic_api.h:5, from main.c:2: ./linux/memblock.h:601:50: warning: ‘struct seq_file’ declared inside parameter list will not be visible outside of this definition or declaration 601 | static inline void memtest_report_meminfo(struct seq_file *m) { } | ^~~~~~~~ Add declaration of 'struct seq_file' to tools/include/linux/seq_file.h to fix it. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2023-09-14memblock tests: fix warning: "__ALIGN_KERNEL" redefinedMike Rapoport (IBM)
Building memblock tests produces the following warning: cc -I. -I../../include -Wall -O2 -fsanitize=address -fsanitize=undefined -D CONFIG_PHYS_ADDR_T_64BIT -c -o main.o main.c In file included from ../../include/linux/pfn.h:5, from ./linux/memory_hotplug.h:6, from ./linux/init.h:7, from ./linux/memblock.h:11, from tests/common.h:8, from tests/basic_api.h:5, from main.c:2: ../../include/linux/mm.h:14: warning: "__ALIGN_KERNEL" redefined 14 | #define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (typeof(x))(a) - 1) | In file included from ../../include/linux/mm.h:6, from ../../include/linux/pfn.h:5, from ./linux/memory_hotplug.h:6, from ./linux/init.h:7, from ./linux/memblock.h:11, from tests/common.h:8, from tests/basic_api.h:5, from main.c:2: ../../include/uapi/linux/const.h:31: note: this is the location of the previous definition 31 | #define __ALIGN_KERNEL(x, a) __ALIGN_KERNEL_MASK(x, (__typeof__(x))(a) - 1) | Remove definitions of __ALIGN_KERNEL and __ALIGN_KERNEL_MASK from tools/include/linux/mm.h to fix it. Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2023-09-14memblock tests: Fix compilation errors.Rong Tao
This patch fix the follow errors. commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") pass nid parameter to reserve_bootmem_region(), $ make -C tools/testing/memblock/ ... memblock.c: In function ‘memmap_init_reserved_pages’: memblock.c:2111:25: error: too many arguments to function ‘reserve_bootmem_region’ 2111 | reserve_bootmem_region(start, end, nid); | ^~~~~~~~~~~~~~~~~~~~~~ ../../include/linux/mm.h:32:6: note: declared here 32 | void reserve_bootmem_region(phys_addr_t start, phys_addr_t end); | ^~~~~~~~~~~~~~~~~~~~~~ memblock.c:2122:17: error: too many arguments to function ‘reserve_bootmem_region’ 2122 | reserve_bootmem_region(start, end, nid); | ^~~~~~~~~~~~~~~~~~~~~~ commit dcdfdd40fa82 ("mm: Add support for unaccepted memory") call accept_memory() in memblock.c $ make -C tools/testing/memblock/ ... cc -fsanitize=address -fsanitize=undefined main.o memblock.o \ lib/slab.o mmzone.o slab.o tests/alloc_nid_api.o \ tests/alloc_helpers_api.o tests/alloc_api.o tests/basic_api.o \ tests/common.o tests/alloc_exact_nid_api.o -o main /usr/bin/ld: memblock.o: in function `memblock_alloc_range_nid': memblock.c:(.text+0x7ae4): undefined reference to `accept_memory' Signed-off-by: Rong Tao <rongtao@cestc.cn> Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory") Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") Link: https://lore.kernel.org/r/tencent_6F19BC082167F15DF2A8D8BEFE8EF220F60A@qq.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
2023-09-13tools headers UAPI: Update tools's copy of drm.h headersArnaldo Carvalho de Melo
Picking the changes from: ad9ee11fdf113f96 ("drm/doc: document that PRIME import/export is always supported") 2ff4f6d410afa762 ("drm/doc: document drm_event and its types") 9a2eabf48ade4fba ("drm/doc: use proper cross-references for sections") c7a4722971691562 ("drm/syncobj: add IOCTL to register an eventfd") Addressing these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/drm/drm.h include/uapi/drm/drm.h Now 'perf trace' and other code that might use the tools/perf/trace/beauty autogenerated tables will be able to translate this new ioctl code into a string: $ tools/perf/trace/beauty/drm_ioctl.sh > before $ cp include/uapi/drm/drm.h tools/include/uapi/drm/drm.h $ tools/perf/trace/beauty/drm_ioctl.sh > after $ diff -u before after --- before 2023-09-13 08:54:45.170134002 -0300 +++ after 2023-09-13 08:55:06.612712776 -0300 @@ -108,6 +108,7 @@ [0xCC] = "SYNCOBJ_TRANSFER", [0xCD] = "SYNCOBJ_TIMELINE_SIGNAL", [0xCE] = "MODE_GETFB2", + [0xCF] = "SYNCOBJ_EVENTFD", [DRM_COMMAND_BASE + 0x00] = "I915_INIT", [DRM_COMMAND_BASE + 0x01] = "I915_FLUSH", [DRM_COMMAND_BASE + 0x02] = "I915_FLIP", $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Simon Ser <contact@emersion.fr> Link: https://lore.kernel.org/lkml/ZQGkh9qlhpKA%2FSMY@kernel.org/ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-13tools headers UAPI: Copy seccomp.h to be able to build 'perf bench' in older ↵Arnaldo Carvalho de Melo
systems The new 'perf bench' for sched-seccomp-notify uses defines and types not available in older systems where we want to have perf available, so grab a copy of this UAPI from the kernel sources to allow that. This will be checked in the future for drift from the original when we build the perf tool, that will warn when that happens like: make: Entering directory '/var/home/acme/git/perf-tools/tools/perf' BUILD: Doing 'make -j32' parallel build Warning: Kernel ABI header differences: Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andrei Vagin <avagin@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kees Kook <keescook@chromium.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/ZQGhMXtwX7RvV3ya@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-13tools headers UAPI: Sync files changed by new fchmodat2 and map_shadow_stack ↵Arnaldo Carvalho de Melo
syscalls with the kernel sources To pick the changes in these csets: c35559f94ebc3e3b ("x86/shstk: Introduce map_shadow_stack syscall") 78252deb023cf087 ("arch: Register fchmodat2, usually as syscall 452") That add support for this new syscall in tools such as 'perf trace'. For instance, this is now possible: # perf trace -v -e fchmodat*,map_shadow_stack --max-events=4 Using CPUID AuthenticAMD-25-21-0 Reusing "openat" BPF sys_enter augmenter for "fchmodat" event qualifier tracepoint filter: (common_pid != 3499340 && common_pid != 11259) && (id == 268 || id == 452 || id == 453) ^C# And it'll work as with other syscalls, for instance openat: # perf trace -e openat* --max-events=4 0.000 ( 0.015 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/proc/meminfo", flags: RDONLY|CLOEXEC) = 11 0.068 ( 0.019 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.pressure", flags: RDONLY|CLOEXEC) = 11 0.119 ( 0.008 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.current", flags: RDONLY|CLOEXEC) = 11 0.138 ( 0.006 ms): systemd-oomd/1150 openat(dfd: CWD, filename: "/sys/fs/cgroup/user.slice/user-1001.slice/user@1001.service/memory.min", flags: RDONLY|CLOEXEC) = 11 # That is the filter expression attached to the raw_syscalls:sys_{enter,exit} tracepoints. $ find tools/perf/arch/ -name "syscall*tbl" | xargs grep -E fchmodat\|sys_map_shadow_stack tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:258 n64 fchmodat sys_fchmodat tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl:452 n64 fchmodat2 sys_fchmodat2 tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:297 common fchmodat sys_fchmodat tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:452 common fchmodat2 sys_fchmodat2 tools/perf/arch/s390/entry/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat sys_fchmodat tools/perf/arch/s390/entry/syscalls/syscall.tbl:452 common fchmodat2 sys_fchmodat2 sys_fchmodat2 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:268 common fchmodat sys_fchmodat tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:452 common fchmodat2 sys_fchmodat2 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:453 64 map_shadow_stack sys_map_shadow_stack $ $ grep -Ew map_shadow_stack\|fchmodat2 /tmp/build/perf-tools/arch/x86/include/generated/asm/syscalls_64.c [452] = "fchmodat2", [453] = "map_shadow_stack", $ This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl diff -u tools/perf/arch/powerpc/entry/syscalls/syscall.tbl arch/powerpc/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/s390/entry/syscalls/syscall.tbl arch/s390/kernel/syscalls/syscall.tbl diff -u tools/perf/arch/mips/entry/syscalls/syscall_n64.tbl arch/mips/kernel/syscalls/syscall_n64.tbl Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Link: https://lore.kernel.org/lkml/ZP8bE7aXDBu%2Fdrak@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-09-11bpf: Clarify error expectations from bpf_clone_redirectStanislav Fomichev
Commit 151e887d8ff9 ("veth: Fixing transmit return status for dropped packets") exposed the fact that bpf_clone_redirect is capable of returning raw NET_XMIT_XXX return codes. This is in the conflict with its UAPI doc which says the following: "0 on success, or a negative error in case of failure." Update the UAPI to reflect the fact that bpf_clone_redirect can return positive error numbers, but don't explicitly define their meaning. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230911194731.286342-1-sdf@google.com
2023-09-08bpf: Mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecatedYonghong Song
Now 'BPF_MAP_TYPE_CGRP_STORAGE + local percpu ptr' can cover all BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE functionality and more. So mark BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE deprecated. Also make changes in selftests/bpf/test_bpftool_synctypes.py and selftest libbpf_str to fix otherwise test errors. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20230827152837.2003563-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-29Merge tag 'net-next-6.6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Increase size limits for to-be-sent skb frag allocations. This allows tun, tap devices and packet sockets to better cope with large writes operations - Store netdevs in an xarray, to simplify iterating over netdevs - Refactor nexthop selection for multipath routes - Improve sched class lifetime handling - Add backup nexthop ID support for bridge - Implement drop reasons support in openvswitch - Several data races annotations and fixes - Constify the sk parameter of routing functions - Prepend kernel version to netconsole message Protocols: - Implement support for TCP probing the peer being under memory pressure - Remove hard coded limitation on IPv6 specific info placement inside the socket struct - Get rid of sysctl_tcp_adv_win_scale and use an auto-estimated per socket scaling factor - Scaling-up the IPv6 expired route GC via a separated list of expiring routes - In-kernel support for the TLS alert protocol - Better support for UDP reuseport with connected sockets - Add NEXT-C-SID support for SRv6 End.X behavior, reducing the SR header size - Get rid of additional ancillary per MPTCP connection struct socket - Implement support for BPF-based MPTCP packet schedulers - Format MPTCP subtests selftests results in TAP - Several new SMC 2.1 features including unique experimental options, max connections per lgr negotiation, max links per lgr negotiation BPF: - Multi-buffer support in AF_XDP - Add multi uprobe BPF links for attaching multiple uprobes and usdt probes, which is significantly faster and saves extra fds - Implement an fd-based tc BPF attach API (TCX) and BPF link support on top of it - Add SO_REUSEPORT support for TC bpf_sk_assign - Support new instructions from cpu v4 to simplify the generated code and feature completeness, for x86, arm64, riscv64 - Support defragmenting IPv(4|6) packets in BPF - Teach verifier actual bounds of bpf_get_smp_processor_id() and fix perf+libbpf issue related to custom section handling - Introduce bpf map element count and enable it for all program types - Add a BPF hook in sys_socket() to change the protocol ID from IPPROTO_TCP to IPPROTO_MPTCP to cover migration for legacy - Introduce bpf_me_mcache_free_rcu() and fix OOM under stress - Add uprobe support for the bpf_get_func_ip helper - Check skb ownership against full socket - Support for up to 12 arguments in BPF trampoline - Extend link_info for kprobe_multi and perf_event links Netfilter: - Speed-up process exit by aborting ruleset validation if a fatal signal is pending - Allow NLA_POLICY_MASK to be used with BE16/BE32 types Driver API: - Page pool optimizations, to improve data locality and cache usage - Introduce ndo_hwtstamp_get() and ndo_hwtstamp_set() to avoid the need for raw ioctl() handling in drivers - Simplify genetlink dump operations (doit/dumpit) providing them the common information already populated in struct genl_info - Extend and use the yaml devlink specs to [re]generate the split ops - Introduce devlink selective dumps, to allow SF filtering SF based on handle and other attributes - Add yaml netlink spec for netlink-raw families, allow route, link and address related queries via the ynl tool - Remove phylink legacy mode support - Support offload LED blinking to phy - Add devlink port function attributes for IPsec New hardware / drivers: - Ethernet: - Broadcom ASP 2.0 (72165) ethernet controller - MediaTek MT7988 SoC - Texas Instruments AM654 SoC - Texas Instruments IEP driver - Atheros qca8081 phy - Marvell 88Q2110 phy - NXP TJA1120 phy - WiFi: - MediaTek mt7981 support - Can: - Kvaser SmartFusion2 PCI Express devices - Allwinner T113 controllers - Texas Instruments tcan4552/4553 chips - Bluetooth: - Intel Gale Peak - Qualcomm WCN3988 and WCN7850 - NXP AW693 and IW624 - Mediatek MT2925 Drivers: - Ethernet NICs: - nVidia/Mellanox: - mlx5: - support UDP encapsulation in packet offload mode - IPsec packet offload support in eswitch mode - improve aRFS observability by adding new set of counters - extends MACsec offload support to cover RoCE traffic - dynamic completion EQs - mlx4: - convert to use auxiliary bus instead of custom interface logic - Intel - ice: - implement switchdev bridge offload, even for LAG interfaces - implement SRIOV support for LAG interfaces - igc: - add support for multiple in-flight TX timestamps - Broadcom: - bnxt: - use the unified RX page pool buffers for XDP and non-XDP - use the NAPI skb allocation cache - OcteonTX2: - support Round Robin scheduling HTB offload - TC flower offload support for SPI field - Freescale: - add XDP_TX feature support - AMD: - ionic: add support for PCI FLR event - sfc: - basic conntrack offload - introduce eth, ipv4 and ipv6 pedit offloads - ST Microelectronics: - stmmac: maximze PTP timestamping resolution - Virtual NICs: - Microsoft vNIC: - batch ringing RX queue doorbell on receiving packets - add page pool for RX buffers - Virtio vNIC: - add per queue interrupt coalescing support - Google vNIC: - add queue-page-list mode support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add port range matching tc-flower offload - permit enslavement to netdevices with uppers - Ethernet embedded switches: - Marvell (mv88e6xxx): - convert to phylink_pcs - Renesas: - r8A779fx: add speed change support - rzn1: enables vlan support - Ethernet PHYs: - convert mv88e6xxx to phylink_pcs - WiFi: - Qualcomm Wi-Fi 7 (ath12k): - extremely High Throughput (EHT) PHY support - RealTek (rtl8xxxu): - enable AP mode for: RTL8192FU, RTL8710BU (RTL8188GU), RTL8192EU and RTL8723BU - RealTek (rtw89): - Introduce Time Averaged SAR (TAS) support - Connector: - support for event filtering" * tag 'net-next-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1806 commits) net: ethernet: mtk_wed: minor change in wed_{tx,rx}info_show net: ethernet: mtk_wed: add some more info in wed_txinfo_show handler net: stmmac: clarify difference between "interface" and "phy_interface" r8152: add vendor/device ID pair for D-Link DUB-E250 devlink: move devlink_notify_register/unregister() to dev.c devlink: move small_ops definition into netlink.c devlink: move tracepoint definitions into core.c devlink: push linecard related code into separate file devlink: push rate related code into separate file devlink: push trap related code into separate file devlink: use tracepoint_enabled() helper devlink: push region related code into separate file devlink: push param related code into separate file devlink: push resource related code into separate file devlink: push dpipe related code into separate file devlink: move and rename devlink_dpipe_send_and_alloc_skb() helper devlink: push shared buffer related code into separate file devlink: push port related code into separate file devlink: push object register/unregister notifications into separate helpers inet: fix IP_TRANSPARENT error handling ...
2023-08-28Merge tag 'linux-kselftest-nolibc-6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull nolibc updates from Shuah Khan: "Nolibc: - improved portability by removing build errors with -ENOSYS - added syscall6() on MIPS to support pselect6() and mmap() - added setvbuf(), rmdir(), pipe(), pipe2() - add support for ppc/ppc64 - environ is no longer optional - fixed frame pointer issues at -O0 - dropped sys_stat() in favor of sys_statx() - centralized _start_c() to remove lots of asm code - switched size_t to __SIZE_TYPE__ Selftests: - improved status reporting (success/warning/failure counts, path to log file) - various code cleanups (indent, unused variables, ...) - more consistent test numbering - enabled compiler warnings - dropped unreliable chmod_net test - improved reliability (create /dev/zero & /tmp, rely less on /proc) - new tests (brk/sbrk/mmap/munmap) - improved compatibility with musl - new run-nolibc-test target to build and run natively - new run-libc-test target to build and run against native libc - made the cmdline parser more reliable against boolean arguments - dropped dependency on memfd for vfprintf() test - nolibc-test is no longer stripped - added support for extending ARCH via XARCH Other: - add Thomas as co-maintainer" * tag 'linux-kselftest-nolibc-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (103 commits) tools/nolibc: avoid undesired casts in the __sysret() macro tools/nolibc: keep brk(), sbrk(), mmap() away from __sysret() tools/nolibc: silence ppc64 compile warnings selftests/nolibc: libc-test: use HOSTCC instead of CC tools/nolibc: stackprotector.h: make __stack_chk_init static selftests/nolibc: allow report with existing test log selftests/nolibc: add test support for ppc64 selftests/nolibc: add test support for ppc64le selftests/nolibc: add test support for ppc selftests/nolibc: add XARCH and ARCH mapping support tools/nolibc: add support for powerpc64 tools/nolibc: add support for powerpc MAINTAINERS: nolibc: add myself as co-maintainer selftests/nolibc: enable compiler warnings selftests/nolibc: don't strip nolibc-test selftests/nolibc: prevent out of bounds access in expect_vfprintf selftests/nolibc: use correct return type for read() and write() selftests/nolibc: avoid sign-compare warnings selftests/nolibc: avoid unused parameter warnings selftests/nolibc: make functions static if possible ...
2023-08-23tools/nolibc: avoid undesired casts in the __sysret() macroWilly Tarreau
Having __sysret() as an inline function has the unfortunate effect of adding casts and large constants comparisons after the syscall returns that significantly inflate some light code that's otherwise syscall- heavy. Even nolibc-test grew by ~1%. Let's switch back to a macro for this, and use it only with signed arguments. Note that it is also possible to design a slightly more complex macro covering unsigned and pointers but we only have 3 such syscalls so it is pointless, and these were just addressed not to use this macro anymore. Now for the argument (the local variable containing the syscall return value), any negative value is an error, that results in -1 being returned and errno to be assigned the opposite value. This may be revisited again in the future if really needed but for now let's get back to something sane. Fixes: 428905da6ec4 ("tools/nolibc: sys.h: add a syscall return helper") Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/ Link: https://lore.kernel.org/lkml/ZNKOJY+g66nkIyvv@1wt.eu/ Cc: Zhangjin Wu <falcon@tinylab.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Thomas Weißschuh <thomas@t-8ch.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: keep brk(), sbrk(), mmap() away from __sysret()Willy Tarreau
The __sysret() function causes some undesirable casts so we'll revert it. In order to keep it simple it will now only support integer return values like in the past, so we must basically revert the changes that were made to these 3 syscalls which return a pointer so that they simply rely on their own test and the SET_ERRNO() macro. Fixes: 4201cfce15fe ("tools/nolibc: clean up sbrk() routine") Fixes: 924e9539aeaa ("tools/nolibc: clean up mmap() routine") Fixes: d27447bc2e0a ("tools/nolibc: sys.h: apply __sysret() helper") Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/ Link: https://lore.kernel.org/lkml/ZNKOJY+g66nkIyvv@1wt.eu/ Cc: Zhangjin Wu <falcon@tinylab.org> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Thomas Weißschuh <thomas@t-8ch.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: silence ppc64 compile warningsZhangjin Wu
Silence the following warnings reported by the new -Wall -Wextra options with pure assembly code. In file included from sysroot/powerpc/include/stdio.h:13, from nolibc-test.c:13: sysroot/powerpc/include/arch.h: In function '_start': sysroot/powerpc/include/arch.h:192:32: warning: unused variable 'r2' [-Wunused-variable] 192 | register volatile long r2 __asm__ ("r2") = (void *)&TOC - (void *)_start; | ^~ sysroot/powerpc/include/arch.h:187:97: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var] 187 | void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void) | ^~~~~~ Since only elfv2 ABI requires to save the TOC/GOT pointer to r2 register, when using elfv1 ABI, the old C code is simply ignored by the compiler, but the compiler can not ignore the inline assembly code and will introduce build failure or running segfaults. So, let's further only add the new assembly code for elfv2 ABI with the checking of _CALL_ELF == 2. Link: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf Link: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: stackprotector.h: make __stack_chk_init staticZhangjin Wu
This allows to generate smaller text/data/dec size. As the _start_c() function added by crt.h, __stack_chk_init() is called from _start_c() instead of the assembly _start. So, it is able to mark it with static now. Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2023-08-23tools/nolibc: add support for powerpc64Zhangjin Wu
This follows the 64-bit PowerPC ABI [1], refers to the slides: "A new ABI for little-endian PowerPC64 Design & Implementation" [2] and the musl code in arch/powerpc64/crt_arch.h. First, stdu and clrrdi are used instead of stwu and clrrwi for powerpc64. Second, the stack frame size is increased to 32 bytes for powerpc64, 32 bytes is the minimal stack frame size supported described in [2]. Besides, the TOC pointer (GOT pointer) must be saved to r2. This works on both little endian and big endian 64-bit PowerPC. [1]: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf [2]: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf Reviewed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Zhangjin Wu <falcon@tinylab.org> Signed-off-by: Willy Tarreau <w@1wt.eu>