Age | Commit message | Author
2023-04-23Merge branch '100GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== This series lowers the CPU usage of the ice driver when using its provided /dev/gnss*. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-23net: mtk_eth_soc: mediatek: fix ppe flow accounting for v1 hardwareFelix Fietkau
Older chips (like MT7622) use a different bit in ib2 to enable hardware counter support. Add macros for both and select the appropriate bit. Fixes: 3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting") Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: David S. Miller <davem@davemloft.net>
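For readers unfamiliar with the ppe code, a minimal sketch of the kind of version-dependent bit selection described above; the macro values and the version-check helper are illustrative placeholders, not the real ib2 layout or driver API.

    #define MTK_FOE_IB2_MIB_CNT     BIT(10) /* v2 hardware (placeholder value) */
    #define MTK_FOE_IB2_MIB_CNT_V1  BIT(15) /* v1 hardware such as MT7622 (placeholder value) */

    /* hypothetical helper: pick the counter-enable bit per hardware version */
    static void mtk_foe_entry_enable_counter(struct mtk_eth *eth, u32 *ib2)
    {
            if (mtk_is_netsys_v2(eth))      /* hypothetical version check */
                    *ib2 |= MTK_FOE_IB2_MIB_CNT;
            else
                    *ib2 |= MTK_FOE_IB2_MIB_CNT_V1;
    }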
2023-04-21Merge tag 'mlx5-updates-2023-04-20' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-04-20 1) Dragos improves RX page pool, and provides some fixes to his previous series: 1.1) Fix releasing page_pool for striding RQ and legacy RQ nonlinear case 1.2) Hook NAPIs to page pools to gain more performance. 2) From Roi, some cleanups to TC and eswitch modules. 3) Maher migrates vnic diagnostic counters reporting from debugfs to a dedicated devlink health reporter. Maher says: =========== net/mlx5: Expose vnic diagnostic counters using devlink Currently, vnic diagnostic counters are exposed through the following debugfs: $ ls /sys/kernel/debug/mlx5/0000:08:00.0/esw/vf_0/vnic_diag/ cq_overrun quota_exceeded_command total_q_under_processor_handle invalid_command send_queue_priority_update_flow nic_receive_steering_discard The current design does not allow the hypervisor to view the diagnostic counters of its VFs, in case the VFs get bound to a VM. In other words, the counters are not exposed for representor interfaces. Furthermore, the debugfs design is inconvenient should more counters need to be reported by the driver in the future. As these counters pertain to vNIC health, it is more appropriate to utilize the devlink health reporter to expose them. Thus, this patchset includes the following changes: * Drop the current vnic diagnostic counters debugfs interface. * Add a vnic devlink health reporter for PFs/VFs core devices, which when diagnosed will dump vnic diagnostic counter values that are queried from FW. * Add a vnic devlink health reporter for the representor interface, which serves the same purpose listed in the previous point, in addition to allowing the hypervisor to view its VFs' diagnostic counters, even when the VFs are bound to external VMs.
Example of devlink health reporter usage is: $devlink health diagnose pci/0000:08:00.0 reporter vnic vNIC env counters: total_error_queues: 0 send_queue_priority_update_flow: 0 comp_eq_overrun: 0 async_eq_overrun: 0 cq_overrun: 0 invalid_command: 0 quota_exceeded_command: 0 nic_receive_steering_discard: 0 =========== 4) SW steering fixes and improvements Yevgeny Kliteynik Says: ======================= These short patch series are just small fixes / improvements for SW steering: - Patch 1: Fix dumping of legacy modify_hdr in debug dump to align to what is expected by parser - Patch 2: Have separate threshold for ICM sync per ICM type - Patch 3: Add more info to the steering debug dump - Linux version and device name - Patch 4: Keep track of number of buddies that are currently in use per domain per buddy type ======================= * tag 'mlx5-updates-2023-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Update op_mode to op_mod for port selection net/mlx5: E-Switch, Remove unused mlx5_esw_offloads_vport_metadata_set() net/mlx5: E-Switch, Remove redundant dev arg from mlx5_esw_vport_alloc() net/mlx5: Include linux/pci.h for pci_msix_can_alloc_dyn() net/mlx5e: RX, Hook NAPIs to page pools net/mlx5e: RX, Fix XDP_TX page release for legacy rq nonlinear case net/mlx5e: RX, Fix releasing page_pool pages twice for striding RQ net/mlx5e: Add vnic devlink health reporter to representors net/mlx5: Add vnic devlink health reporter to PFs/VFs Revert "net/mlx5: Expose vnic diagnostic counters for eswitch managed vports" Revert "net/mlx5: Expose steering dropped packets counter" net/mlx5: DR, Add memory statistics for domain object net/mlx5: DR, Add more info in domain dbg dump net/mlx5: DR, Calculate sync threshold of each pool according to its type net/mlx5: DR, Fix dumping of legacy modify_hdr in debug dump ==================== Link: https://lore.kernel.org/r/20230421013850.349646-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
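As background on the devlink health API used above, here is a minimal sketch of a reporter with a diagnose callback. The devlink calls are the generic ones from include/net/devlink.h; the vnic-flavoured names and the hard-coded counter value are invented for illustration.

    static int vnic_reporter_diagnose(struct devlink_health_reporter *reporter,
                                      struct devlink_fmsg *fmsg,
                                      struct netlink_ext_ack *extack)
    {
            int err;

            err = devlink_fmsg_obj_nest_start(fmsg);
            if (err)
                    return err;
            /* a real implementation would query the counters from FW here */
            err = devlink_fmsg_u64_pair_put(fmsg, "cq_overrun", 0);
            if (err)
                    return err;
            return devlink_fmsg_obj_nest_end(fmsg);
    }

    static const struct devlink_health_reporter_ops vnic_reporter_ops = {
            .name     = "vnic",
            .diagnose = vnic_reporter_diagnose,
    };

    /* registration, e.g. from the probe path; graceful period 0 */
    reporter = devlink_health_reporter_create(devlink, &vnic_reporter_ops, 0, priv);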
2023-04-21Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2023-04-21 We've added 71 non-merge commits during the last 8 day(s) which contain a total of 116 files changed, 13397 insertions(+), 8896 deletions(-). The main changes are: 1) Add a new BPF netfilter program type and minimal support to hook BPF programs to netfilter hooks such as prerouting or forward, from Florian Westphal. 2) Fix race between btf_put and btf_idr walk which caused a deadlock, from Alexei Starovoitov. 3) Second big batch to migrate test_verifier unit tests into test_progs for ease of readability and debugging, from Eduard Zingerman. 4) Add support for refcounted local kptrs to the verifier for allowing shared ownership, useful for adding a node to both the BPF list and rbtree, from Dave Marchevsky. 5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF selftests into libbpf-provided bpf_helpers.h header and improve kfunc handling, from Andrii Nakryiko. 6) Support 64-bit pointers to kfuncs needed for archs like s390x, from Ilya Leoshkevich. 7) Support BPF progs under getsockopt with a NULL optval, from Stanislav Fomichev. 8) Improve verifier u32 scalar equality checking in order to enable LLVM transformations which earlier had to be disabled specifically for BPF backend, from Yonghong Song. 9) Extend bpftool's struct_ops object loading to support links, from Kui-Feng Lee. 10) Add xsk selftest follow-up fixes for hugepage allocated umem, from Magnus Karlsson. 11) Support BPF redirects from tc BPF to ifb devices, from Daniel Borkmann. 12) Add BPF support for integer type when accessing variable length arrays, from Feng Zhou. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits) selftests/bpf: verifier/value_ptr_arith converted to inline assembly selftests/bpf: verifier/value_illegal_alu converted to inline assembly selftests/bpf: verifier/unpriv converted to inline assembly selftests/bpf: verifier/subreg converted to inline assembly selftests/bpf: verifier/spin_lock converted to inline assembly selftests/bpf: verifier/sock converted to inline assembly selftests/bpf: verifier/search_pruning converted to inline assembly selftests/bpf: verifier/runtime_jit converted to inline assembly selftests/bpf: verifier/regalloc converted to inline assembly selftests/bpf: verifier/ref_tracking converted to inline assembly selftests/bpf: verifier/map_ptr_mixing converted to inline assembly selftests/bpf: verifier/map_in_map converted to inline assembly selftests/bpf: verifier/lwt converted to inline assembly selftests/bpf: verifier/loops1 converted to inline assembly selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly selftests/bpf: verifier/direct_packet_access converted to inline assembly selftests/bpf: verifier/d_path converted to inline assembly selftests/bpf: verifier/ctx converted to inline assembly selftests/bpf: verifier/btf_ctx_access converted to inline assembly selftests/bpf: verifier/bpf_get_stack converted to inline assembly ... ==================== Link: https://lore.kernel.org/r/20230421211035.9111-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21net: dst: fix missing initialization of rt_uncachedMaxime Bizon
xfrm_alloc_dst() followed by xfrm4_dst_destroy(), without an xfrm4_fill_dst() call in between, causes the following BUG: BUG: spinlock bad magic on CPU#0, fbxhostapd/732 lock: 0x890b7668, .magic: 890b7668, .owner: <none>/-1, .owner_cpu: 0 CPU: 0 PID: 732 Comm: fbxhostapd Not tainted 6.3.0-rc6-next-20230414-00613-ge8de66369925-dirty #9 Hardware name: Marvell Kirkwood (Flattened Device Tree) unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x28/0x30 dump_stack_lvl from do_raw_spin_lock+0x20/0x80 do_raw_spin_lock from rt_del_uncached_list+0x30/0x64 rt_del_uncached_list from xfrm4_dst_destroy+0x3c/0xbc xfrm4_dst_destroy from dst_destroy+0x5c/0xb0 dst_destroy from rcu_process_callbacks+0xc4/0xec rcu_process_callbacks from __do_softirq+0xb4/0x22c __do_softirq from call_with_stack+0x1c/0x24 call_with_stack from do_softirq+0x60/0x6c do_softirq from __local_bh_enable_ip+0xa0/0xcc Patch "net: dst: Prevent false sharing vs. dst_entry:: __refcnt" moved the rt_uncached and rt_uncached_list fields from the rtable struct to the dst struct, so they are no longer zeroed by memset_after(xdst, 0, u.dst) in xfrm_alloc_dst(). Note that rt_uncached (list_head) was never properly initialized at alloc time, but xfrm[46]_dst_destroy() is written in such a way that it was not an issue thanks to the memset: if (xdst->u.rt.dst.rt_uncached_list) rt_del_uncached_list(&xdst->u.rt); The route code does it the other way around: rt_uncached_list is assumed to be valid IFF the rt_uncached list_head is not empty: void rt_del_uncached_list(struct rtable *rt) { if (!list_empty(&rt->dst.rt_uncached)) { struct uncached_list *ul = rt->dst.rt_uncached_list; spin_lock_bh(&ul->lock); list_del_init(&rt->dst.rt_uncached); spin_unlock_bh(&ul->lock); } } This patch adds mandatory rt_uncached list_head initialization in the generic dst_init(), and adapts the xfrm[46]_dst_destroy() logic to match the rest of the code. Fixes: d288a162dd1c ("net: dst: Prevent false sharing vs. dst_entry:: __refcnt") Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/oe-lkp/202304162125.18b7bcdd-oliver.sang@intel.com Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> CC: Leon Romanovsky <leon@kernel.org> Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Link: https://lore.kernel.org/r/20230420182508.2417582-1-mbizon@freebox.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
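A minimal sketch of the fix direction described above, assuming the field and helper names from the commit text; only the relevant statements are shown, the surrounding function bodies are elided.

    /* in dst_init(): make the list_head valid for every dst, so callers
     * no longer depend on a memset side effect
     */
    INIT_LIST_HEAD(&dst->rt_uncached);

    /* in xfrm4_dst_destroy() (and the IPv6 twin): follow the route code
     * convention instead of testing the rt_uncached_list pointer
     */
    if (!list_empty(&xdst->u.rt.dst.rt_uncached))
            rt_del_uncached_list(&xdst->u.rt);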
2023-04-21net: dsa: qca8k: fix LEDS_CLASS dependencyArnd Bergmann
With LEDS_CLASS=m, a built-in qca8k driver fails to link: arm-linux-gnueabi-ld: drivers/net/dsa/qca/qca8k-leds.o: in function `qca8k_setup_led_ctrl': qca8k-leds.c:(.text+0x1ea): undefined reference to `devm_led_classdev_register_ext' Change the dependency to avoid the broken configuration. Fixes: 1e264f9d2918 ("net: dsa: qca8k: add LEDs basic support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christian Marangi <ansuelsmth@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230420213639.2243388-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21net/handshake: Fix section mismatch in handshake_exitGeert Uytterhoeven
If CONFIG_NET_NS=n (e.g. m68k/defconfig): WARNING: modpost: vmlinux.o: section mismatch in reference: handshake_exit (section: .exit.text) -> handshake_genl_net_ops (section: .init.data) ERROR: modpost: Section mismatches detected. Fix this by dropping the __net_initdata tag from handshake_genl_net_ops. Fixes: 3b3009ea8abb713b ("net/handshake: Create a NETLINK service for handling handshake requests") Reported-by: noreply@ellerman.id.au Closes: http://kisskb.ellerman.id.au/kisskb/buildresult/14912987 Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20230420173723.3773434-1-geert@linux-m68k.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
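A small sketch of the class of fix involved, assuming the usual pernet registration pattern (struct members omitted): __net_initdata expands to __initdata when CONFIG_NET_NS=n, which places the ops in .init.data while handshake_exit() in .exit.text still references it.

    /* before: discarded after init with CONFIG_NET_NS=n, yet referenced at exit time */
    static struct pernet_operations handshake_genl_net_ops __net_initdata = {
            /* ... */
    };

    /* after: ordinary static data, safe to reference from handshake_exit() */
    static struct pernet_operations handshake_genl_net_ops = {
            /* ... */
    };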
2023-04-21net: phy: add basic driver for NXP CBTX PHYVladimir Oltean
The CBTX PHY is a Fast Ethernet PHY integrated into the SJA1110 A/B/C automotive Ethernet switches. It was hoped it would work with the Generic PHY driver, but alas, it doesn't. The most important reason why is that the PHY is powered down by default, and it needs a vendor register to power it on. It has a linear memory map that is accessed over SPI by the SJA1110 switch driver, which exposes a fake MDIO controller. It has the following (and only the following) standard clause 22 registers: 0x0: MII_BMCR 0x1: MII_BMSR 0x2: MII_PHYSID1 0x3: MII_PHYSID2 0x4: MII_ADVERTISE 0x5: MII_LPA 0x6: MII_EXPANSION 0x7: the missing MII_NPAGE for Next Page Transmit Register Every other register is vendor-defined. The register map expands the standard clause 22 5-bit address space of 0x20 registers, however the driver does not need to access the extra registers for now (and hopefully never). If it ever needs to do that, it is possible to implement a fake (software) page switching mechanism between the PHY driver and the SJA1110 MDIO controller driver. Also, Auto-MDIX is turned off by default in hardware, the driver turns it on by default and reports the current status. I've tested this with a VSC8514 link partner and a crossover cable, by forcing the mode on the link partner, and seeing that the CBTX PHY always sees the reverse of the mode forced on the VSC8514 (and that traffic works). The link doesn't come up (as expected) if MDI modes are forced on both ends in the same way (with the cross-over cable, that is). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230418190141.1040562-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
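As an illustration only, a minimal PHY driver registration for such a device could look roughly like the sketch below; the PHY ID value and the cbtx_config_init() helper are hypothetical, not taken from the actual driver.

    #include <linux/module.h>
    #include <linux/phy.h>

    #define PHY_ID_CBTX_SJA1110     0x001bb010      /* placeholder ID */

    static int cbtx_config_init(struct phy_device *phydev)
    {
            /* vendor register writes to power the PHY on and enable
             * Auto-MDIX would go here
             */
            return 0;
    }

    static struct phy_driver cbtx_driver[] = {
    {
            PHY_ID_MATCH_MODEL(PHY_ID_CBTX_SJA1110),
            .name           = "NXP CBTX (SJA1110)",
            .soft_reset     = genphy_soft_reset,
            .config_init    = cbtx_config_init,
            .suspend        = genphy_suspend,
            .resume         = genphy_resume,
    },
    };

    module_phy_driver(cbtx_driver);

    MODULE_LICENSE("GPL");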
2023-04-22netfilter: nf_tables: allow to create netdev chain without devicePablo Neira Ayuso
Relax netdev chain creation to allow for loading the ruleset, then adding/deleting devices at a later stage. Hardware offload does not support this feature yet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: support for deleting devices in an existing netdev chainPablo Neira Ayuso
This patch allows for deleting devices in an existing netdev chain. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: support for adding new devices to an existing netdev chainPablo Neira Ayuso
This patch allows users to add devices to an existing netdev chain. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: rename function to destroy hook listPablo Neira Ayuso
Rename nft_flowtable_hooks_destroy() to nft_hooks_destroy() to prepare for netdev chain device updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: do not send complete notification of deletionsPablo Neira Ayuso
In most cases, table, name and handle are sufficient for userspace to identify an object that has been deleted. Skipping unneeded fields in the netlink attributes in the message saves bandwidth (i.e. fewer chances of hitting ENOBUFS). Rules are an exception: the existing userspace monitor code relies on the rule definition. This exception can be removed by implementing a rule cache in userspace; this is already supported by the tracing infrastructure. Regarding flowtables, incremental deletion of devices is possible. Skipping a full notification allows userspace to differentiate between flowtable removal and incremental removal of devices. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: extended netlink error reporting for netdevicePablo Neira Ayuso
Flowtable and netdev chains are bound to one or several netdevices; extend netlink error reporting to specify the netdevice that triggers the error. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22ipvs: Correct spelling in commentsSimon Horman
Correct some spelling errors flagged by codespell and found by inspection. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22ipvs: Remove {Enter,Leave}FunctionSimon Horman
Remove EnterFunction and LeaveFunction. These debugging macros seem well past their use-by date and have little value these days. Removing them allows some trivial cleanup of the exit paths of some functions, which is also included in this patch. There is likely scope for further cleanup of both debugging and unwind paths, but let's leave that for another day. This is only intended to change debug output, and only when CONFIG_IP_VS_DEBUG is enabled. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22ipvs: Consistently use array_size() in ip_vs_conn_init()Simon Horman
Consistently use array_size() to calculate the size of ip_vs_conn_tab in bytes. Flagged by Coccinelle: WARNING: array_size is already used (line 1498) to compute the same size No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
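A brief illustration of the cleanup may help: array_size() from <linux/overflow.h> computes count times element size with overflow checking, so the allocation and any later size calculation can share the same expression. The snippet below is a plausible sketch, not the exact ip_vs diff.

    ip_vs_conn_tab = vmalloc(array_size(ip_vs_conn_tab_size,
                                        sizeof(*ip_vs_conn_tab)));

    /* report the footprint using the same helper instead of an
     * open-coded multiplication
     */
    pr_info("Connection hash table configured (size=%d, memory=%zuKbytes)\n",
            ip_vs_conn_tab_size,
            array_size(ip_vs_conn_tab_size, sizeof(*ip_vs_conn_tab)) / 1024);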
2023-04-22ipvs: Update width of source for ip_vs_sync_conn_optionsSimon Horman
In ip_vs_sync_conn_v0() a copy is made to struct ip_vs_sync_conn_options. That structure looks like this: struct ip_vs_sync_conn_options { struct ip_vs_seq in_seq; struct ip_vs_seq out_seq; }; The source of the copy is the in_seq field of struct ip_vs_conn, whose type is struct ip_vs_seq. Thus the source is not as wide as the amount of data copied, which is the width of struct ip_vs_sync_conn_options. The copy is safe because the next field is another struct ip_vs_seq. Make use of struct_group() to annotate this. Flagged by gcc-13 as: In file included from ./include/linux/string.h:254, from ./include/linux/bitmap.h:11, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/paravirt.h:17, from ./arch/x86/include/asm/cpuid.h:62, from ./arch/x86/include/asm/processor.h:19, from ./arch/x86/include/asm/timex.h:5, from ./include/linux/timex.h:67, from ./include/linux/time32.h:13, from ./include/linux/time.h:60, from ./include/linux/stat.h:19, from ./include/linux/module.h:13, from net/netfilter/ipvs/ip_vs_sync.c:38: In function 'fortify_memcpy_chk', inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3: ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 529 | __read_overflow2_field(q_size_field, size); | Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
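A hedged sketch of the struct_group() annotation this refers to: the two adjacent sequence structs are wrapped in a named group so that a copy spanning both stays within a single field from the fortifier's point of view. The group name below is illustrative and may not match the actual diff.

    #include <linux/stddef.h>       /* struct_group() */

    struct ip_vs_conn {
            /* ... */
            struct_group(sync_conn_opt,
                    struct ip_vs_seq  in_seq;       /* incoming seq. struct */
                    struct ip_vs_seq  out_seq;      /* outgoing seq. struct */
            );
            /* ... */
    };

    /* copy both sequences through the group member rather than through in_seq */
    memcpy(opt, &cp->sync_conn_opt, sizeof(*opt));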
2023-04-22netfilter: nf_tables: do not store rule in traceinfo structureFlorian Westphal
Pass it as argument instead. This reduces the size of traceinfo to 16 bytes. Total stack usage: nf_tables_core.c:252 nft_do_chain 304 static While it's possible to also pass basechain as an argument, doing so increases nft_do_chain function size. Unlike pktinfo/verdict/rule, the basechain info isn't used in the expression evaluation path. gcc places it on the stack, which results in extra push/pop when it gets passed to the trace helpers as argument rather than as part of the traceinfo structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: do not store verdict in traceinfo structureFlorian Westphal
Just pass it as argument to nft_trace_notify. Stack is reduced by 8 bytes: nf_tables_core.c:256 nft_do_chain 312 static Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: do not store pktinfo in traceinfo structureFlorian Westphal
Pass it as argument. No change in object size. Stack usage decreases by 8 bytes: nf_tables_core.c:254 nft_do_chain 320 static Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
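A hedged sketch of where the three traceinfo entries above converge: pktinfo, verdict and rule travel to the trace helper as arguments instead of being copied into struct nft_traceinfo first. The prototype below is illustrative and may differ in detail from the tree.

    void nft_trace_notify(const struct nft_pktinfo *pkt,
                          const struct nft_verdict *verdict,
                          const struct nft_rule_dp *rule,
                          struct nft_traceinfo *info);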
2023-04-22netfilter: nf_tables: remove unneeded conditionalFlorian Westphal
This helper is inlined, so keep it as small as possible. If the static key is true, there is only a very small chance that info->trace is false: 1. tracing was enabled at this very moment, the static key was updated to active right after nft_do_table was called. 2. tracing was disabled at this very moment. info->trace is already false, the static key is about to be patched to false soon. In both cases, no event will be sent because info->trace is false (checked in the noinline slowpath). info->nf_trace is irrelevant. The nf_trace update is redundant in this case, but this will only happen for a short duration, when the static key flips. text data bss dec hex filename old: 2980 192 32 3204 c84 nf_tables_core.o new: 2964 192 32 3188 c74 nf_tables_core.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: make validation state per tableFlorian Westphal
We only need to validate tables that saw changes in the current transaction. The existing code revalidates all tables, but this isn't needed as cross-table jumps are not allowed (chains have table scope). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: don't write table validation state without mutexFlorian Westphal
The ->cleanup callback needs to be removed; this doesn't work anymore, as the transaction mutex is already released in the ->abort function. Just do it after a successful validation pass; this either happens from the commit or abort phases, where the transaction mutex is held. Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: don't store chain address on jumpFlorian Westphal
Now that the rule trailer/end marker and the rcu head reside in the same structure, we no longer need to save/restore the chain pointer when performing/returning from a jump. We can simply let the trace infra walk the evaluated rule until it hits the end marker and then fetch the chain pointer from there. When the rule is NULL (policy tracing), the chain and basechain pointers are already identical, so just use the basechain. This cuts the size of the jumpstack in half, from 256 to 128 bytes on 64bit; scripts/stackusage says: nf_tables_core.c:251 nft_do_chain 328 static Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: don't store address of last rule on jumpFlorian Westphal
Walk the rule headers until the trailer one (last_bit flag set) instead of stopping at the last_rule address. This avoids the need to store the address when jumping to another chain. This cuts the size of the jumpstack array by one third, on 64bit from 384 to 256 bytes. Still, stack usage is quite large: scripts/stackusage: nf_tables_core.c:258 nft_do_chain 496 static The next patch will also remove the chain pointer. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob markerFlorian Westphal
In order to free the rules in a chain via call_rcu, the rule array used to stash an rcu_head and space for a pointer at its end. When the current nft_rule_dp blob format got added in 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout"), this results in a double-trailer: size (unsigned long) struct nft_rule_dp struct nft_expr ... struct nft_rule_dp struct nft_expr ... struct nft_rule_dp (is_last=1) // Trailer The trailer, struct nft_rule_dp (is_last=1), is not accounted for in size, so it can be located via start_addr + size. Because the rcu_head is stored after 'start+size' as well, the is_last trailer is *aliased* to the rcu_head (struct nft_rules_old). This is harmless, because at this time the nft_do_chain function never evaluates/accesses the trailer; it only checks the address boundary: for (; rule < last_rule; rule = nft_rule_next(rule)) { ... But this way the last_rule address has to be stashed in the jump structure to restore it after returning from a chain. nft_do_chain stack usage has become way too big, so put it on a diet. Without this patch it is impossible to use for (; !rule->is_last; rule = nft_rule_next(rule)) { ... because, on free, the needed update of the rcu_head would clobber the nft_rule_dp is_last bit. Furthermore, also stash the chain pointer in the trailer; this allows the nf_tables_trace infra to recover the original chain structure without a need to place it in the jump struct. After this patch it is trivial to put the jump stack structure on a diet, which is done in the next two patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
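A hedged reconstruction of the merged trailer this commit describes: the is_last end marker, the rcu head used to free the blob and the owning chain pointer share one structure placed after the rule array. Field names follow the commit text and may differ from the tree.

    struct nft_rule_dp_last {
            struct nft_rule_dp end;         /* header with is_last = 1 */
            struct rcu_head h;              /* call_rcu() anchor for freeing the blob */
            const struct nft_chain *chain;  /* lets the trace infra recover the chain */
    };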
2023-04-21selftests/bpf: verifier/value_ptr_arith converted to inline assemblyEduard Zingerman
Test verifier/value_ptr_arith automatically converted to use inline assembly. Test cases "sanitation: alu with different scalars 2" and "sanitation: alu with different scalars 3" are updated to avoid -ENOENT as return value, as __retval() annotation only supports numeric literals. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-25-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/value_illegal_alu converted to inline assemblyEduard Zingerman
Test verifier/value_illegal_alu automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-24-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/unpriv converted to inline assemblyEduard Zingerman
Test verifier/unpriv semi-automatically converted to use inline assembly. The verifier/unpriv.c had to be split into two parts: - the bulk of the tests is in progs/verifier_unpriv.c; - the single test that needs the `struct bpf_perf_event_data` definition is in progs/verifier_unpriv_perf.c. The tests above can't be in a single file because: - the first requires inclusion of the filter.h header (to get access to the BPF_ST_MEM macro; the inline assembler does not support this instruction); - the second requires vmlinux.h, which contains definitions conflicting with filter.h. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-23-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/subreg converted to inline assemblyEduard Zingerman
Test verifier/subreg automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-22-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/spin_lock converted to inline assemblyEduard Zingerman
Test verifier/spin_lock automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-21-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/sock converted to inline assemblyEduard Zingerman
Test verifier/sock automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-20-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/search_pruning converted to inline assemblyEduard Zingerman
Test verifier/search_pruning automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-19-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/runtime_jit converted to inline assemblyEduard Zingerman
Test verifier/runtime_jit automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-18-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/regalloc converted to inline assemblyEduard Zingerman
Test verifier/regalloc automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-17-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/ref_tracking converted to inline assemblyEduard Zingerman
Test verifier/ref_tracking automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-16-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/map_ptr_mixing converted to inline assemblyEduard Zingerman
Test verifier/map_ptr_mixing automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-13-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/map_in_map converted to inline assemblyEduard Zingerman
Test verifier/map_in_map automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-12-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/lwt converted to inline assemblyEduard Zingerman
Test verifier/lwt automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-11-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/loops1 converted to inline assemblyEduard Zingerman
Test verifier/loops1 automatically converted to use inline assembly. There are a few modifications for the converted tests. "tracepoint" programs do not support test execution, so the program type is changed to "xdp" (which supports test execution) for the following tests that have __retval tags: - bounded loop, count to 4 - bounded loop containing forward jump Also, remove the __retval tag for the test: - bounded loop, count from positive unknown to 4 as its return value is a random number. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-10-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/jeq_infer_not_null converted to inline assemblyEduard Zingerman
Test verifier/jeq_infer_not_null automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-9-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/direct_packet_access converted to inline assemblyEduard Zingerman
Test verifier/direct_packet_access automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-8-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/d_path converted to inline assemblyEduard Zingerman
Test verifier/d_path automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/ctx converted to inline assemblyEduard Zingerman
Test verifier/ctx automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/btf_ctx_access converted to inline assemblyEduard Zingerman
Test verifier/btf_ctx_access automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/bpf_get_stack converted to inline assemblyEduard Zingerman
Test verifier/bpf_get_stack automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: verifier/bounds converted to inline assemblyEduard Zingerman
Test verifier/bounds automatically converted to use inline assembly. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21selftests/bpf: Add notion of auxiliary programs for test_loaderEduard Zingerman
In order to express test cases that use the bpf_tail_call() intrinsic, it is necessary to have several programs loaded at a time. This commit adds the __auxiliary annotation to the set of annotations supported by test_loader.c. Programs marked as auxiliary are always loaded but are not treated as a separate test. For example: void dummy_prog1(void); struct { __uint(type, BPF_MAP_TYPE_PROG_ARRAY); __uint(max_entries, 4); __uint(key_size, sizeof(int)); __array(values, void (void)); } prog_map SEC(".maps") = { .values = { [0] = (void *) &dummy_prog1, }, }; SEC("tc") __auxiliary __naked void dummy_prog1(void) { asm volatile ("r0 = 42; exit;"); } SEC("tc") __description("reference tracking: check reference or tail call") __success __retval(0) __naked void check_reference_or_tail_call(void) { asm volatile ( "r2 = %[prog_map] ll;" "r3 = 0;" "call %[bpf_tail_call];" "r0 = 0;" "exit;" : : __imm(bpf_tail_call), __imm_addr(prog_map) : __clobber_all); } Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230421174234.2391278-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21Merge branch 'bpf: add netfilter program type'Alexei Starovoitov
Florian Westphal says: ==================== Changes since last version: - rework test case in last patch wrt. ctx->skb dereference etc (Alexei) - pacify bpf ci tests; the netfilter program type missed string translation in the libbpf helper. This still uses a runtime btf walk rather than extending the btf trace array as Alexei suggested; I would do this later (or someone else can). v1 cover letter: Add minimal support to hook bpf programs to netfilter hooks, e.g. PREROUTING or FORWARD. For this the most relevant parts for registering a netfilter hook via the in-kernel api are exposed to userspace via bpf_link. The new program type is 'tracing style', i.e. there is no context access rewrite done by the verifier, and the function argument (struct bpf_nf_ctx) isn't stable. There is no support for direct packet access; the dynptr api should be used instead. With this it's possible to build a small test program such as: #include "vmlinux.h" extern int bpf_dynptr_from_skb(struct __sk_buff *skb, __u64 flags, struct bpf_dynptr *ptr__uninit) __ksym; extern void *bpf_dynptr_slice(const struct bpf_dynptr *ptr, uint32_t offset, void *buffer, uint32_t buffer__sz) __ksym; SEC("netfilter") int nf_test(struct bpf_nf_ctx *ctx) { struct nf_hook_state *state = ctx->state; struct sk_buff *skb = ctx->skb; const struct iphdr *iph, _iph; const struct tcphdr *th, _th; struct bpf_dynptr ptr; if (bpf_dynptr_from_skb(skb, 0, &ptr)) return NF_DROP; iph = bpf_dynptr_slice(&ptr, 0, &_iph, sizeof(_iph)); if (!iph) return NF_DROP; th = bpf_dynptr_slice(&ptr, iph->ihl << 2, &_th, sizeof(_th)); if (!th) return NF_DROP; bpf_printk("accept %x:%d->%x:%d, hook %d ifin %d\n", iph->saddr, bpf_ntohs(th->source), iph->daddr, bpf_ntohs(th->dest), state->hook, state->in->ifindex); return NF_ACCEPT; } Then, tail /sys/kernel/tracing/trace_pipe. Changes since v3: - uapi: remove 'reserved' struct member, s/prio/priority (Alexei) - add ctx access test cases (Alexei, see last patch) - some arm32 can only handle cmpxchg on u32 (build bot) - Fix kdoc annotations (Simon Horman) - bpftool: prefer p_err, not fprintf (Quentin) - add test cases in separate patch Changes since v2: 1. don't WARN when the user calls 'bpftool link detach' twice. Restrict attachment to ip+ip6 families; let's relax this later in case arp/bridge/netdev are needed too. 2. show netfilter links in 'bpftool net' output as well. Changes since v1: 1. Don't fail to link when CONFIG_NETFILTER=n (build bot) 2. Use test_progs instead of test_verifier (Alexei) Changes since last RFC version: 1. extend 'bpftool link show' to print prio/hooknum etc 2. extend 'nft list hooks' so it can print the bpf program id 3. Add an extra patch to artificially restrict bpf progs with same priority. It's fine from a technical pov but it will cause ordering issues (most recent one comes first). Can be removed later. 4. Add test_run support for netfilter prog type and a small extension to verifier tests to make sure we can't return verdicts like NF_STOLEN. 5. Alter the netfilter part of the bpf_link uapi struct: - add flags/reserved members. Not used here except returning errors when they are nonzero. The plan is to allow the bpf_link users to enable the netfilter defrag or conntrack engine by setting feature flags at link create time in the future. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>