summaryrefslogtreecommitdiff
path: root/tools/include
AgeCommit message (Collapse)Author
2023-03-01bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwrJoanne Koong
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr. The user must pass in a buffer to store the contents of the data slice if a direct pointer to the data cannot be obtained. For skb and xdp type dynptrs, these two APIs are the only way to obtain a data slice. However, for other types of dynptrs, there is no difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data. For skb type dynptrs, the data is copied into the user provided buffer if any of the data is not in the linear portion of the skb. For xdp type dynptrs, the data is copied into the user provided buffer if the data is between xdp frags. If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then the skb will be uncloned (see bpf_unclone_prologue()). Please note that any bpf_dynptr_write() automatically invalidates any prior data slices of the skb dynptr. This is because the skb may be cloned or may need to pull its paged buffer into the head. As such, any bpf_dynptr_write() will automatically have its prior data slices invalidated, even if the write is to data in the skb head of an uncloned skb. Please note as well that any other helper calls that change the underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data slices of the skb dynptr as well, for the same reasons. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add xdp dynptrsJoanne Koong
Add xdp dynptrs, which are dynptrs whose underlying pointer points to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of xdp->data and xdp->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For reads and writes on the dynptr, this includes reading/writing from/to and across fragments. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() should be used. For examples of how xdp dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01bpf: Add skb dynptrsJoanne Koong
Add skb dynptrs, which are dynptrs whose underlying pointer points to a skb. The dynptr acts on skb data. skb dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of skb->data and skb->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For bpf prog types that don't support writes on skb data, the dynptr is read-only (bpf_dynptr_write() will return an error) For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write() interfaces, reading and writing from/to data in the head as well as from/to non-linear paged buffers is supported. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used. For examples of how skb dynptrs can be used, please see the attached selftests. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-27Merge tag 'net-6.3-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from wireless and netfilter. The notable fixes here are the EEE fix which restores boot for many embedded platforms (real and QEMU); WiFi warning suppression and the ICE Kconfig cleanup. Current release - regressions: - phy: multiple fixes for EEE rework - wifi: wext: warn about usage only once - wifi: ath11k: allow system suspend to survive ath11k Current release - new code bugs: - mlx5: Fix memory leak in IPsec RoCE creation - ibmvnic: assign XPS map to correct queue index Previous releases - regressions: - netfilter: ip6t_rpfilter: Fix regression with VRF interfaces - netfilter: ctnetlink: make event listener tracking global - nf_tables: allow to fetch set elements when table has an owner - mlx5: - fix skb leak while fifo resync and push - fix possible ptp queue fifo use-after-free Previous releases - always broken: - sched: fix action bind logic - ptp: vclock: use mutex to fix "sleep on atomic" bug if driver also uses a mutex - netfilter: conntrack: fix rmmod double-free race - netfilter: xt_length: use skb len to match in length_mt6, avoid issues with BIG TCP Misc: - ice: remove unnecessary CONFIG_ICE_GNSS - mlx5e: remove hairpin write debugfs files - sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy" * tag 'net-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (53 commits) tcp: tcp_check_req() can be called from process context net: phy: c45: fix network interface initialization failures on xtensa, arm:cubieboard xen-netback: remove unused variables pending_idx and index net/sched: act_api: move TCA_EXT_WARN_MSG to the correct hierarchy net: dsa: ocelot_ext: remove unnecessary phylink.h include net: mscc: ocelot: fix duplicate driver name error net: dsa: felix: fix internal MDIO controller resource length net: dsa: seville: ignore mscc-miim read errors from Lynx PCS net/sched: act_sample: fix action bind logic net/sched: act_mpls: fix action bind logic net/sched: act_pedit: fix action bind logic wifi: wext: warn about usage only once wifi: mt76: usb: fix use-after-free in mt76u_free_rx_queue qede: avoid uninitialized entries in coal_entry array nfc: fix memory leak of se_io context in nfc_genl_se_io ice: remove unnecessary CONFIG_ICE_GNSS net/sched: cls_api: Move call to tcf_exts_miss_cookie_base_destroy() ibmvnic: Assign XPS map to correct queue index docs: net: fix inaccuracies in msg_zerocopy.rst tools: net: add __pycache__ to gitignore ...
2023-02-24netdev-genl: fix repeated typo oflloading -> offloadingTariq Toukan
Fix a repeated copy/paste typo. Fixes: d3d854fd6a1d ("netdev-genl: create a simple family for netdev stuff") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Acked-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-23Merge tag 'nolibc.2023.02.06a' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull nolibc updates from Paul McKenney: - Add s390 support - Add support for the ARM Thumb1 instruction set - Fix O_* flags definitions for open() and fcntl() - Make errno a weak symbol instead of a static variable - Export environ as a weak symbol - Export _auxv as a weak symbol for auxilliary vector retrieval - Implement getauxval() and getpagesize() - Further improve self tests, including permitting userland testing of the nolibc library * tag 'nolibc.2023.02.06a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (28 commits) selftests/nolibc: Add a "run-user" target to test the program in user land selftests/nolibc: Support "x86_64" for arch name selftests/nolibc: Add `getpagesize(2)` selftest nolibc/sys: Implement `getpagesize(2)` function nolibc/stdlib: Implement `getauxval(3)` function tools/nolibc: add auxiliary vector retrieval for s390 tools/nolibc: add auxiliary vector retrieval for mips tools/nolibc: add auxiliary vector retrieval for riscv tools/nolibc: add auxiliary vector retrieval for arm tools/nolibc: add auxiliary vector retrieval for arm64 tools/nolibc: add auxiliary vector retrieval for x86_64 tools/nolibc: add auxiliary vector retrieval for i386 tools/nolibc: export environ as a weak symbol on s390 tools/nolibc: export environ as a weak symbol on riscv tools/nolibc: export environ as a weak symbol on mips tools/nolibc: export environ as a weak symbol on arm tools/nolibc: export environ as a weak symbol on arm64 tools/nolibc: export environ as a weak symbol on i386 tools/nolibc: export environ as a weak symbol on x86_64 tools/nolibc: make errno a weak symbol instead of a static one ...
2023-02-23Merge branch 'linus' into objtool/core, to pick up Xen dependenciesIngo Molnar
Pick up dependencies - freshly merged upstream via xen-next - before applying dependent objtool changes. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-02-17bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookupMartin KaFai Lau
The bpf_fib_lookup() also looks up the neigh table. This was done before bpf_redirect_neigh() was added. In the use case that does not manage the neigh table and requires bpf_fib_lookup() to lookup a fib to decide if it needs to redirect or not, the bpf prog can depend only on using bpf_redirect_neigh() to lookup the neigh. It also keeps the neigh entries fresh and connected. This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid the double neigh lookup when the bpf prog always call bpf_redirect_neigh() to do the neigh lookup. The params->smac output is skipped together when SKIP_NEIGH is set because bpf_redirect_neigh() will figure out the smac also. Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
2023-02-15selftests/bpf: Fix build error for LoongArchTiezhu Yang
There exists build error when make -C tools/testing/selftests/bpf/ on LoongArch: BINARY test_verifier In file included from test_verifier.c:27: tools/include/uapi/linux/bpf_perf_event.h:14:28: error: field 'regs' has incomplete type 14 | bpf_user_pt_regs_t regs; | ^~~~ make: *** [Makefile:577: tools/testing/selftests/bpf/test_verifier] Error 1 make: Leaving directory 'tools/testing/selftests/bpf' Add missing uapi header for LoongArch to use the following definition: typedef struct user_pt_regs bpf_user_pt_regs_t; Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/1676458867-22052-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13bpf: Add basic bpf_rb_{root,node} supportDave Marchevsky
This patch adds special BPF_RB_{ROOT,NODE} btf_field_types similar to BPF_LIST_{HEAD,NODE}, adds the necessary plumbing to detect the new types, and adds bpf_rb_root_free function for freeing bpf_rb_root in map_values. structs bpf_rb_root and bpf_rb_node are opaque types meant to obscure structs rb_root_cached rb_node, respectively. btf_struct_access will prevent BPF programs from touching these special fields automatically now that they're recognized. btf_check_and_fixup_fields now groups list_head and rb_root together as "graph root" fields and {list,rb}_node as "graph node", and does same ownership cycle checking as before. Note that this function does _not_ prevent ownership type mixups (e.g. rb_root owning list_node) - that's handled by btf_parse_graph_root. After this patch, a bpf program can have a struct bpf_rb_root in a map_value, but not add anything to nor do anything useful with it. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230214004017.2534011-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-11x86/unwind/orc: Add 'signal' field to ORC metadataJosh Poimboeuf
Add a 'signal' field which allows unwind hints to specify whether the instruction pointer should be taken literally (like for most interrupts and exceptions) rather than decremented (like for call stack return addresses) when used to find the next ORC entry. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/d2c5ec4d83a45b513d8fd72fab59f1a8cfa46871.1676068346.git.jpoimboe@kernel.org
2023-02-03bpf: fix typo in header for bpf_perf_prog_read_valueFlorian Lehner
Fix a simple typo in the documentation for bpf_perf_prog_read_value. Signed-off-by: Florian Lehner <dev@der-flo.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/r/20230203121439.25884-1-dev@der-flo.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-02netdev-genl: create a simple family for netdev stuffJakub Kicinski
Add a Netlink spec-compatible family for netdevs. This is a very simple implementation without much thought going into it. It allows us to reap all the benefits of Netlink specs, one can use the generic client to issue the commands: $ ./cli.py --spec netdev.yaml --dump dev_get [{'ifindex': 1, 'xdp-features': set()}, {'ifindex': 2, 'xdp-features': {'basic', 'ndo-xmit', 'redirect'}}, {'ifindex': 3, 'xdp-features': {'rx-sg'}}] the generic python library does not have flags-by-name support, yet, but we also don't have to carry strings in the messages, as user space can get the names from the spec. Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Co-developed-by: Marek Majtyka <alardam@gmail.com> Signed-off-by: Marek Majtyka <alardam@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/327ad9c9868becbe1e601b580c962549c8cd81f2.1675245258.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-02tools/bpf: Use tab instead of white spaces to sync bpf.hTiezhu Yang
Just silence the following build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h' Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Link: https://lore.kernel.org/r/1675319486-27744-2-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-28Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2023-01-28 We've added 124 non-merge commits during the last 22 day(s) which contain a total of 124 files changed, 6386 insertions(+), 1827 deletions(-). The main changes are: 1) Implement XDP hints via kfuncs with initial support for RX hash and timestamp metadata kfuncs, from Stanislav Fomichev and Toke Høiland-Jørgensen. Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk 2) Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case, from Andrii Nakryiko. 3) Significantly reduce the search time for module symbols by livepatch and BPF, from Jiri Olsa and Zhen Lei. 4) Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals, from David Vernet. 5) Fix several issues in the dynptr processing such as stack slot liveness propagation, missing checks for PTR_TO_STACK variable offset, etc, from Kumar Kartikeya Dwivedi. 6) Various performance improvements, fixes, and introduction of more than just one XDP program to XSK selftests, from Magnus Karlsson. 7) Big batch to BPF samples to reduce deprecated functionality, from Daniel T. Lee. 8) Enable struct_ops programs to be sleepable in verifier, from David Vernet. 9) Reduce pr_warn() noise on BTF mismatches when they are expected under the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien. 10) Describe modulo and division by zero behavior of the BPF runtime in BPF's instruction specification document, from Dave Thaler. 11) Several improvements to libbpf API documentation in libbpf.h, from Grant Seltzer. 12) Improve resolve_btfids header dependencies related to subcmd and add proper support for HOSTCC, from Ian Rogers. 13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room() helper along with BPF selftests, from Ziyang Xuan. 14) Simplify the parsing logic of structure parameters for BPF trampoline in the x86-64 JIT compiler, from Pu Lehui. 15) Get BTF working for kernels with CONFIG_RUST enabled by excluding Rust compilation units with pahole, from Martin Rodriguez Reboredo. 16) Get bpf_setsockopt() working for kTLS on top of TCP sockets, from Kui-Feng Lee. 17) Disable stack protection for BPF objects in bpftool given BPF backends don't support it, from Holger Hoffstätte. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits) selftest/bpf: Make crashes more debuggable in test_progs libbpf: Add documentation to map pinning API functions libbpf: Fix malformed documentation formatting selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket. bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt(). bpf/selftests: Verify struct_ops prog sleepable behavior bpf: Pass const struct bpf_prog * to .check_member libbpf: Support sleepable struct_ops.s section bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable selftests/bpf: Fix vmtest static compilation error tools/resolve_btfids: Alter how HOSTCC is forced tools/resolve_btfids: Install subcmd headers bpf/docs: Document the nocast aliasing behavior of ___init bpf/docs: Document how nested trusted fields may be defined bpf/docs: Document cpumask kfuncs in a new file selftests/bpf: Add selftest suite for cpumask kfuncs selftests/bpf: Add nested trust selftests suite bpf: Enable cpumasks to be queried and used as kptrs bpf: Disallow NULLable pointers for trusted kfuncs ... ==================== Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23bpf: Introduce device-bound XDP programsStanislav Fomichev
New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way to associate a netdev with a BPF program at load time. netdevsim checks are dropped in favor of generic check in dev_xdp_attach. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-6-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ipa/ipa_interrupt.c drivers/net/ipa/ipa_interrupt.h 9ec9b2a30853 ("net: ipa: disable ipa interrupt during suspend") 8e461e1f092b ("net: ipa: introduce ipa_interrupt_enable()") d50ed3558719 ("net: ipa: enable IPA interrupt handlers separate from registration") https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/ https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-18tools headers: Syncronize linux/build_bug.h with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in: 07a368b3f55a79d3 ("bug: introduce ASSERT_STRUCT_OFFSET") This cset only introduces a build time assert macro, that may be useful at some point for tooling, for now it silences this perf build warning: Warning: Kernel ABI header at 'tools/include/linux/build_bug.h' differs from latest version at 'include/linux/build_bug.h' diff -u tools/include/linux/build_bug.h include/linux/build_bug.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lore.kernel.org/lkml/Y8f0jqQFYDAOBkHx@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-17tools headers UAPI: Sync linux/kvm.h with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in: b0305c1e0e27ad91 ("KVM: x86/xen: Add KVM_XEN_INVALID_GPA and KVM_XEN_INVALID_GFN to uapi") That just rebuilds perf, as these patches don't add any new KVM ioctl to be harvested for the the 'perf trace' ioctl syscall argument beautifiers. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h' diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lore.kernel.org/lkml/Y7Loj5slB908QSXf@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-15bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()Ziyang Xuan
Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). Main use case is for using cls_bpf on ingress hook to decapsulate IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the new IP header version after decapsulating the outer IP header. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/usb/r8152.c be53771c87f4 ("r8152: add vendor/device ID pair for Microsoft Devkit") ec51fbd1b8a2 ("r8152: add USB device driver for config selection") https://lore.kernel.org/all/20230113113339.658c4723@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-10nolibc/sys: Implement `getpagesize(2)` functionAmmar Faizi
This function returns the page size used by the running kernel. The page size value is taken from the auxiliary vector at 'AT_PAGESZ' key. 'getpagesize(2)' is assumed as a syscall becuase the manpage placement of this function is in entry 2 ('man 2 getpagesize') despite there is no real 'getpagesize(2)' syscall in the Linux syscall table. Define this function in 'sys.h'. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10nolibc/stdlib: Implement `getauxval(3)` functionAmmar Faizi
Previous commits save the address of the auxiliary vector into a global variable @_auxv. This commit creates a new function 'getauxval()' as a helper function to get the auxv value based on the given key. The behavior of this function is identic with the function documented in 'man 3 getauxval'. This function is also needed to implement 'getpagesize()' function that we will wire up in the next patches. Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: add auxiliary vector retrieval for s390Sven Schnelle
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: add auxiliary vector retrieval for mipsWilly Tarreau
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: add auxiliary vector retrieval for riscvWilly Tarreau
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. It was tested on riscv64 only. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: add auxiliary vector retrieval for armWilly Tarreau
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> It was tested in arm, thumb1 and thumb2 modes. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: add auxiliary vector retrieval for arm64Willy Tarreau
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: add auxiliary vector retrieval for x86_64Willy Tarreau
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: add auxiliary vector retrieval for i386Willy Tarreau
In the _start block we now iterate over envp to find the auxiliary vector after the NULL. The pointer is saved into an _auxv variable that is marked as weak so that it's accessible from multiple units. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: export environ as a weak symbol on s390Sven Schnelle
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested on s390 both with environ inherited from _start and extracted from envp. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: export environ as a weak symbol on riscvWilly Tarreau
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested on riscv64 both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: export environ as a weak symbol on mipsWilly Tarreau
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested with mips24kc (BE) both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: export environ as a weak symbol on armWilly Tarreau
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested in arm and thumb1 and thumb2 modes, and for each mode, both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: export environ as a weak symbol on arm64Willy Tarreau
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: export environ as a weak symbol on i386Willy Tarreau
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: export environ as a weak symbol on x86_64Willy Tarreau
The environ is retrieved from the _start code and is easy to store at this moment. Let's declare the variable weak and store the value into it. By not being static it will be visible to all units. By being weak, if some programs already declared it, they will continue to be able to use it. This was tested both with environ inherited from _start and extracted from envp. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: make errno a weak symbol instead of a static oneWilly Tarreau
Till now errno was declared static so that it could be eliminated if unused. While the goal is commendable for tiny executables as it allows to eliminate any data and bss segments when not used, this comes with some limitations, one of which being that the errno symbol seen in different units are not the same. Even though this has never been a real issue given the nature of the programs involved till now, it happens that referencing the same symbol from multiple units can also be achieved using weak symbols, with a difference being that only one of them will be used for all of them. Compared to weak symbols, static basically have no benefit for regular programs since there are always at least a few variables in most of these, so the bss segment cannot be eliminated. E.g: $ size nolibc-test-static-errno text data bss dec hex filename 11531 0 48 11579 2d3b nolibc-test-static-errno Furthermore, the weak symbol doesn't use bss storage at all, resulting in a slightly section: $ size nolibc-test-weak-errno text data bss dec hex filename 11531 0 40 11571 2d33 nolibc-test-weak-errno This patch thus converts errno from static to weak. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: remove local definitions of O_* flags for open/fcntlWilly Tarreau
The historic nolibc code did not include asm/fcntl.h and had to define the various O_RDWR etc macros in each arch-specific file (since such values differ between certain archs). This was found at least once to induce bugs due to wrong definitions. Let's get rid of all of them and include asm/nolibc.h from sys.h instead. This was verified to work properly on all supported architectures. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: support thumb mode with frame pointers on ARMWilly Tarreau
In Thumb mode, register r7 is normally used to store the frame pointer. By default when optimizing at -Os there's no frame pointer so this works fine. But if no optimization is set, then build errors occur, indicating that r7 cannot not be used. It's difficult to cheat because it's the compiler that is complaining, not the assembler, so it's not even possible to report that the register was clobbered. The solution consists in saving and restoring r7 around the syscall, but this slightly inflates the code. The syscall number is passed via r6 which is never used by syscalls. The current patch adds a few macroes which do that only in Thumb mode, and which continue to directly assign the syscall number to register r7 in ARM mode. Now this always builds and works for all modes (tested on Arm, Thumbv1, Thumbv2 modes, at -Os, -O0, -O0 -fomit-frame-pointer). The code is very slightly inflated in thumb-mode without frame-pointers compared to previously (e.g. 7928 vs 7864 bytes for nolibc-test) but at least it's always operational. And it's possible to disable this mechanism by setting NOLIBC_OMIT_FRAME_POINTER. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: enable support for thumb1 mode for ARMWilly Tarreau
Passing -mthumb to the kernel.org arm toolchain failed to build because it defaults to armv5 hence thumb1, which has a fairly limited instruction set compared to thumb2 enabled with armv7 that is much more complete. It's not very difficult to adjust the instructions to also build on thumb1, it only adds a total of 3 instructions, so it's worth doing it at least to ease use by casual testers. It was verified that the adjusted code now builds and works fine for armv5, thumb1, armv7 and thumb2, as long as frame pointers are not used. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-10tools/nolibc: make compiler and assembler agree on the section around _startWilly Tarreau
The out-of-block asm() statement carrying _start does not allow the compiler to know what section the assembly code is being emitted to, and there's no easy way to push/pop the current section and restore it. It sometimes causes issues depending on the include files ordering and compiler optimizations. For example if a variable is declared immediately before the asm() block and another one after, the compiler assumes that the current section is still .bss and doesn't re-emit it, making the second variable appear inside the .text section instead. Forcing .bss at the end of the _start block doesn't work either because at certain optimizations the compiler may reorder blocks and will make some real code appear just after this block. A significant number of solutions were attempted, but many of them were still sensitive to section reordering. In the end, the best way to make sure the compiler and assembler agree on the current section is to place this code inside a function. Here the function is directly called _start and configured not to emit a frame-pointer, hence to have no prologue. If some future architectures would still emit some prologue, another working approach consists in naming the function differently and placing the _start label inside the asm statement. But the current solution is simpler. It was tested with nolibc-test at -O,-O0,-O2,-O3,-Os for arm,arm64,i386, mips,riscv,s390 and x86_64. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-09tools/nolibc: fix the O_* fcntl/open macro definitions for riscvWilly Tarreau
When RISCV port was imported in 5.2, the O_* macros were taken with their octal value and written as-is in hex, resulting in the getdents64() to fail in nolibc-test. Fixes: 582e84f7b779 ("tool headers nolibc: add RISCV support") #5.2 Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-09nolibc: add support for s390Sven Schnelle
Use arch-x86_64 as a template. Not really different, but we have our own mmap syscall which takes a structure instead of discrete arguments. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-09tools/nolibc: prevent gcc from making memset() loop over itselfWilly Tarreau
When building on ARM in thumb mode with gcc-11.3 at -O2 or -O3, nolibc-test segfaults during the select() tests. It turns out that at this level, gcc recognizes an opportunity for using memset() to zero the fd_set, but it miscompiles it because it also recognizes a memset pattern as well, and decides to call memset() from the memset() code: 000122bc <memset>: 122bc: b510 push {r4, lr} 122be: 0004 movs r4, r0 122c0: 2a00 cmp r2, #0 122c2: d003 beq.n 122cc <memset+0x10> 122c4: 23ff movs r3, #255 ; 0xff 122c6: 4019 ands r1, r3 122c8: f7ff fff8 bl 122bc <memset> 122cc: 0020 movs r0, r4 122ce: bd10 pop {r4, pc} Simply placing an empty asm() statement inside the loop suffices to avoid this. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-09tools/nolibc: fix missing includes causing build issues at -O0Willy Tarreau
After the nolibc includes were split to facilitate portability from standard libcs, programs that include only what they need may miss some symbols which are needed by libgcc. This is the case for raise() which is needed by the divide by zero code in some architectures for example. Regardless, being able to include only the apparently needed files is convenient. Instead of trying to move all exported definitions to a single file, since this can change over time, this patch takes another approach consisting in including the nolibc header at the end of all standard include files. This way their types and functions are already known at the moment of inclusion, and including any single one of them is sufficient to bring all the required ones. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-09tools/nolibc: restore mips branch ordering in the _start blockWilly Tarreau
Depending on the compiler used and the optimization options, the sbrk() test was crashing, both on real hardware (mips-24kc) and in qemu. One such example is kernel.org toolchain in version 11.3 optimizing at -Os. Inspecting the sys_brk() call shows the following code: 0040047c <sys_brk>: 40047c: 24020fcd li v0,4045 400480: 27bdffe0 addiu sp,sp,-32 400484: 0000000c syscall 400488: 27bd0020 addiu sp,sp,32 40048c: 10e00001 beqz a3,400494 <sys_brk+0x18> 400490: 00021023 negu v0,v0 400494: 03e00008 jr ra It is obviously wrong, the "negu" instruction is placed in beqz's delayed slot, and worse, there's no nop nor instruction after the return, so the next function's first instruction (addiu sip,sip,-32) will also be executed as part of the delayed slot that follows the return. This is caused by the ".set noreorder" directive in the _start block, that applies to the whole program. The compiler emits code without the delayed slots and relies on the compiler to swap instructions when this option is not set. Removing the option would require to change the startup code in a way that wouldn't make it look like the resulting code, which would not be easy to debug. Instead let's just save the default ordering before changing it, and restore it at the end of the _start block. Now the code is correct: 0040047c <sys_brk>: 40047c: 24020fcd li v0,4045 400480: 27bdffe0 addiu sp,sp,-32 400484: 0000000c syscall 400488: 10e00002 beqz a3,400494 <sys_brk+0x18> 40048c: 27bd0020 addiu sp,sp,32 400490: 00021023 negu v0,v0 400494: 03e00008 jr ra 400498: 00000000 nop Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc") #5.0 Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-09tools/nolibc: Fix S_ISxxx macrosWarner Losh
The mode field has the type encoded as an value in a field, not as a bit mask. Mask the mode with S_IFMT instead of each type to test. Otherwise, false positives are possible: eg S_ISDIR will return true for block devices because S_IFDIR = 0040000 and S_IFBLK = 0060000 since mode is masked with S_IFDIR instead of S_IFMT. These macros now match the similar definitions in tools/include/uapi/linux/stat.h. Signed-off-by: Warner Losh <imp@bsdimp.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-09nolibc: fix fd_set typeSven Schnelle
The kernel uses unsigned long for the fd_set bitmap, but nolibc use u32. This works fine on little endian machines, but fails on big endian. Convert to unsigned long to fix this. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2023-01-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>