summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2023-04-22ipvs: Remove {Enter,Leave}FunctionSimon Horman
Remove EnterFunction and LeaveFunction. These debugging macros seem well past their use-by date. And seem to have little value these days. Removing them allows some trivial cleanup of some exit paths for some functions. These are also included in this patch. There is likely scope for further cleanup of both debugging and unwind paths. But let's leave that for another day. Only intended to change debug output, and only when CONFIG_IP_VS_DEBUG is enabled. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22ipvs: Consistently use array_size() in ip_vs_conn_init()Simon Horman
Consistently use array_size() to calculate the size of ip_vs_conn_tab in bytes. Flagged by Coccinelle: WARNING: array_size is already used (line 1498) to compute the same size No functional change intended. Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22ipvs: Update width of source for ip_vs_sync_conn_optionsSimon Horman
In ip_vs_sync_conn_v0() copy is made to struct ip_vs_sync_conn_options. That structure looks like this: struct ip_vs_sync_conn_options { struct ip_vs_seq in_seq; struct ip_vs_seq out_seq; }; The source of the copy is the in_seq field of struct ip_vs_conn. Whose type is struct ip_vs_seq. Thus we can see that the source - is not as wide as the amount of data copied, which is the width of struct ip_vs_sync_conn_option. The copy is safe because the next field in is another struct ip_vs_seq. Make use of struct_group() to annotate this. Flagged by gcc-13 as: In file included from ./include/linux/string.h:254, from ./include/linux/bitmap.h:11, from ./include/linux/cpumask.h:12, from ./arch/x86/include/asm/paravirt.h:17, from ./arch/x86/include/asm/cpuid.h:62, from ./arch/x86/include/asm/processor.h:19, from ./arch/x86/include/asm/timex.h:5, from ./include/linux/timex.h:67, from ./include/linux/time32.h:13, from ./include/linux/time.h:60, from ./include/linux/stat.h:19, from ./include/linux/module.h:13, from net/netfilter/ipvs/ip_vs_sync.c:38: In function 'fortify_memcpy_chk', inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3: ./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning] 529 | __read_overflow2_field(q_size_field, size); | Compile tested only. Signed-off-by: Simon Horman <horms@kernel.org> Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: do not store rule in traceinfo structureFlorian Westphal
pass it as argument instead. This reduces size of traceinfo to 16 bytes. Total stack usage: nf_tables_core.c:252 nft_do_chain 304 static While its possible to also pass basechain as argument, doing so increases nft_do_chaininfo function size. Unlike pktinfo/verdict/rule the basechain info isn't used in the expression evaluation path. gcc places it on the stack, which results in extra push/pop when it gets passed to the trace helpers as argument rather than as part of the traceinfo structure. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: do not store verdict in traceinfo structureFlorian Westphal
Just pass it as argument to nft_trace_notify. Stack is reduced by 8 bytes: nf_tables_core.c:256 nft_do_chain 312 static Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: do not store pktinfo in traceinfo structureFlorian Westphal
pass it as argument. No change in object size. stack usage decreases by 8 byte: nf_tables_core.c:254 nft_do_chain 320 static Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: remove unneeded conditionalFlorian Westphal
This helper is inlined, so keep it as small as possible. If the static key is true, there is only a very small chance that info->trace is false: 1. tracing was enabled at this very moment, the static key was updated to active right after nft_do_table was called. 2. tracing was disabled at this very moment. trace->info is already false, the static key is about to be patched to false soon. In both cases, no event will be sent because info->trace is false (checked in noinline slowpath). info->nf_trace is irrelevant. The nf_trace update is redunant in this case, but this will only happen for short duration, when static key flips. text data bss dec hex filename old: 2980 192 32 3204 c84 nf_tables_core.o new: 2964 192 32 3188 c74i nf_tables_core.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: make validation state per tableFlorian Westphal
We only need to validate tables that saw changes in the current transaction. The existing code revalidates all tables, but this isn't needed as cross-table jumps are not allowed (chains have table scope). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: don't write table validation state without mutexFlorian Westphal
The ->cleanup callback needs to be removed, this doesn't work anymore as the transaction mutex is already released in the ->abort function. Just do it after a successful validation pass, this either happens from commit or abort phases where transaction mutex is held. Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: don't store chain address on jumpFlorian Westphal
Now that the rule trailer/end marker and the rcu head reside in the same structure, we no longer need to save/restore the chain pointer when performing/returning from a jump. We can simply let the trace infra walk the evaluated rule until it hits the end marker and then fetch the chain pointer from there. When the rule is NULL (policy tracing), then chain and basechain pointers were already identical, so just use the basechain. This cuts size of jumpstack in half, from 256 to 128 bytes in 64bit, scripts/stackusage says: nf_tables_core.c:251 nft_do_chain 328 static Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: don't store address of last rule on jumpFlorian Westphal
Walk the rule headers until the trailer one (last_bit flag set) instead of stopping at last_rule address. This avoids the need to store the address when jumping to another chain. This cuts size of jumpstack array by one third, on 64bit from 384 to 256 bytes. Still, stack usage is still quite large: scripts/stackusage: nf_tables_core.c:258 nft_do_chain 496 static Next patch will also remove chain pointer. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-22netfilter: nf_tables: merge nft_rules_old structure and end of ruleblob markerFlorian Westphal
In order to free the rules in a chain via call_rcu, the rule array used to stash a rcu_head and space for a pointer at the end of the rule array. When the current nft_rule_dp blob format got added in 2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout"), this results in a double-trailer: size (unsigned long) struct nft_rule_dp struct nft_expr ... struct nft_rule_dp struct nft_expr ... struct nft_rule_dp (is_last=1) // Trailer The trailer, struct nft_rule_dp (is_last=1), is not accounted for in size, so it can be located via start_addr + size. Because the rcu_head is stored after 'start+size' as well this means the is_last trailer is *aliased* to the rcu_head (struct nft_rules_old). This is harmless, because at this time the nft_do_chain function never evaluates/accesses the trailer, it only checks the address boundary: for (; rule < last_rule; rule = nft_rule_next(rule)) { ... But this way the last_rule address has to be stashed in the jump structure to restore it after returning from a chain. nft_do_chain stack usage has become way too big, so put it on a diet. Without this patch is impossible to use for (; !rule->is_last; rule = nft_rule_next(rule)) { ... because on free, the needed update of the rcu_head will clobber the nft_rule_dp is_last bit. Furthermore, also stash the chain pointer in the trailer, this allows to recover the original chain structure from nf_tables_trace infra without a need to place them in the jump struct. After this patch it is trivial to diet the jump stack structure, done in the next two patches. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-04-21bpf: add test_run support for netfilter program typeFlorian Westphal
add glue code so a bpf program can be run using userspace-provided netfilter state and packet/skb. Default is to use ipv4:output hook point, but this can be overridden by userspace. Userspace provided netfilter state is restricted, only hook and protocol families can be overridden and only to ipv4/ipv6. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-7-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21netfilter: disallow bpf hook attachment at same priorityFlorian Westphal
This is just to avoid ordering issues between multiple bpf programs, this could be removed later in case it turns out to be too cautious. bpf prog could still be shared with non-bpf hook, otherwise we'd have to make conntrack hook registration fail just because a bpf program has same priority. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-5-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21netfilter: nfnetlink hook: dump bpf prog idFlorian Westphal
This allows userspace ("nft list hooks") to show which bpf program is attached to which hook. Without this, user only knows bpf prog is attached at prio x, y, z at INPUT and FORWARD, but can't tell which program is where. v4: kdoc fixups (Simon Horman) Link: https://lore.kernel.org/bpf/ZEELzpNCnYJuZyod@corigine.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-4-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21bpf: minimal support for programs hooked into netfilter frameworkFlorian Westphal
This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs that will be invoked via the NF_HOOK() points in the ip stack. Invocation incurs an indirect call. This is not a necessity: Its possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the program invocation with the same method already done for xdp progs. This isn't done here to keep the size of this chunk down. Verifier restricts verdicts to either DROP or ACCEPT. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21bpf: add bpf_link support for BPF_NETFILTER programsFlorian Westphal
Add bpf_link support skeleton. To keep this reviewable, no bpf program can be invoked yet, if a program is attached only a c-stub is called and not the actual bpf program. Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig. Uapi example usage: union bpf_attr attr = { }; attr.link_create.prog_fd = progfd; attr.link_create.attach_type = 0; /* unused */ attr.link_create.netfilter.pf = PF_INET; attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN; attr.link_create.netfilter.priority = -128; err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr)); ... this would attach progfd to ipv4:input hook. Such hook gets removed automatically if the calling program exits. BPF_NETFILTER program invocation is added in followup change. NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it allows to tell userspace which program is attached at the given hook when user runs 'nft hook list' command rather than just the priority and not-very-helpful 'this hook runs a bpf prog but I can't tell which one'. Will also be used to disallow registration of two bpf programs with same priority in a followup patch. v4: arm32 cmpxchg only supports 32bit operand s/prio/priority/ v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if more use cases pop up (arptables, ebtables, netdev ingress/egress etc). Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-2-fw@strlen.de Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-04-21Merge tag 'nf-23-04-21' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Set on IPS_CONFIRMED before change_status() otherwise EBUSY is bogusly hit. This bug was introduced in the 6.3 release cycle. 2) Fix nfnetlink_queue conntrack support: Set/dump timeout accordingly for unconfirmed conntrack entries. Make sure this is done after IPS_CONFIRMED is set on. This is an old bug, it happens since the introduction of this feature. * tag 'nf-23-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: conntrack: fix wrong ct->timeout value netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert() ==================== Link: https://lore.kernel.org/r/20230421105700.325438-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21Merge tag 'wireless-next-2023-04-21' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.4 Most likely the last -next pull request for v6.4. We have changes all over. rtw88 now supports SDIO bus and iwlwifi continues to work on Wi-Fi 7 support. Not much stack changes this time. Major changes: cfg80211/mac80211 - fix some Fine Time Measurement (FTM) frames not being bufferable - flush frames before key removal to avoid potential unencrypted transmission depending on the hardware design iwlwifi - preparation for Wi-Fi 7 EHT and multi-link support rtw88 - SDIO bus support - RTL8822BS, RTL8822CS and RTL8821CS SDIO chipset support rtw89 - framework firmware backwards compatibility brcmfmac - Cypress 43439 SDIO support mt76 - mt7921 P2P support - mt7996 mesh A-MSDU support - mt7996 EHT support - mt7996 coredump support wcn36xx - support for pronto v3 hardware ath11k - PCIe DeviceTree bindings - WCN6750: enable SAR support ath10k - convert DeviceTree bindings to YAML * tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (261 commits) wifi: rtw88: Update spelling in main.h wifi: airo: remove ISA_DMA_API dependency wifi: rtl8xxxu: Simplify setting the initial gain wifi: rtl8xxxu: Add rtl8xxxu_write{8,16,32}_{set,clear} wifi: rtl8xxxu: Don't print the vendor/product/serial wifi: rtw88: Fix memory leak in rtw88_usb wifi: rtw88: call rtw8821c_switch_rf_set() according to chip variant wifi: rtw88: set pkg_type correctly for specific rtw8821c variants wifi: rtw88: rtw8821c: Fix rfe_option field width wifi: rtw88: usb: fix priority queue to endpoint mapping wifi: rtw88: 8822c: add iface combination wifi: rtw88: handle station mode concurrent scan with AP mode wifi: rtw88: prevent scan abort with other VIFs wifi: rtw88: refine reserved page flow for AP mode wifi: rtw88: disallow PS during AP mode wifi: rtw88: 8822c: extend reserved page number wifi: rtw88: add port switch for AP mode wifi: rtw88: add bitmap for dynamic port settings wifi: rtw89: mac: use regular int as return type of DLE buffer request wifi: mac80211: remove return value check of debugfs_create_dir() ... ==================== Link: https://lore.kernel.org/r/20230421104726.800BCC433D2@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-21net/packet: support mergeable feature of virtioJianfeng Tan
Packet sockets, like tap, can be used as the backend for kernel vhost. In packet sockets, virtio net header size is currently hardcoded to be the size of struct virtio_net_hdr, which is 10 bytes; however, it is not always the case: some virtio features, such as mrg_rxbuf, need virtio net header to be 12-byte long. Mergeable buffers, as a virtio feature, is worthy of supporting: packets that are larger than one-mbuf size will be dropped in vhost worker's handle_rx if mrg_rxbuf feature is not used, but large packets cannot be avoided and increasing mbuf's size is not economical. With this virtio feature enabled by virtio-user, packet sockets with hardcoded 10-byte virtio net header will parse mac head incorrectly in packet_snd by taking the last two bytes of virtio net header as part of mac header. This incorrect mac header parsing will cause packet to be dropped due to invalid ether head checking in later under-layer device packet receiving. By adding extra field vnet_hdr_sz with utilizing holes in struct packet_sock to record currently used virtio net header size and supporting extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet sockets can know the exact length of virtio net header that virtio user gives. In packet_snd, tpacket_snd and packet_recvmsg, instead of using hardcoded virtio net header size, it can get the exact vnet_hdr_sz from corresponding packet_sock, and parse mac header correctly based on this information to avoid the packets being mistakenly dropped. Signed-off-by: Jianfeng Tan <henry.tjf@antgroup.com> Co-developed-by: Anqi Shen <amy.saq@antgroup.com> Signed-off-by: Anqi Shen <amy.saq@antgroup.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: Allow setting per-{Port, VLAN} neighbor suppression stateIdo Schimmel
Add a new bridge port attribute that allows user space to enable per-{Port, VLAN} neighbor suppression. Example: # bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]' false # bridge link set dev swp1 neigh_vlan_suppress on # bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]' true # bridge link set dev swp1 neigh_vlan_suppress off # bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]' false Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: vlan: Allow setting VLAN neighbor suppression stateIdo Schimmel
Add a new VLAN attribute that allows user space to set the neighbor suppression state of the port VLAN. Example: # bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]' false # bridge vlan set vid 10 dev swp1 neigh_suppress on # bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]' true # bridge vlan set vid 10 dev swp1 neigh_suppress off # bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]' false # bridge vlan set vid 10 dev br0 neigh_suppress on Error: bridge: Can't set neigh_suppress for non-port vlans. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: Add per-{Port, VLAN} neighbor suppression data path supportIdo Schimmel
When the bridge is not VLAN-aware (i.e., VLAN ID is 0), determine if neighbor suppression is enabled on a given bridge port solely based on the existing 'BR_NEIGH_SUPPRESS' flag. Otherwise, if the bridge is VLAN-aware, first check if per-{Port, VLAN} neighbor suppression is enabled on the given bridge port using the 'BR_NEIGH_VLAN_SUPPRESS' flag. If so, look up the VLAN and check whether it has neighbor suppression enabled based on the per-VLAN 'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED' flag. If the bridge is VLAN-aware, but the bridge port does not have per-{Port, VLAN} neighbor suppression enabled, then fallback to determine neighbor suppression based on the 'BR_NEIGH_SUPPRESS' flag. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: Encapsulate data path neighbor suppression logicIdo Schimmel
Currently, there are various places in the bridge data path that check whether neighbor suppression is enabled on a given bridge port. As a preparation for per-{Port, VLAN} neighbor suppression, encapsulate this logic in a function and pass the VLAN ID of the packet as an argument. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: Take per-{Port, VLAN} neighbor suppression into accountIdo Schimmel
The bridge driver gates the neighbor suppression code behind an internal per-bridge flag called 'BROPT_NEIGH_SUPPRESS_ENABLED'. The flag is set when at least one bridge port has neighbor suppression enabled. As a preparation for per-{Port, VLAN} neighbor suppression, make sure the global flag is also set if per-{Port, VLAN} neighbor suppression is enabled. That is, when the 'BR_NEIGH_VLAN_SUPPRESS' flag is set on at least one bridge port. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: Add internal flags for per-{Port, VLAN} neighbor suppressionIdo Schimmel
Add two internal flags that will be used to enable / disable per-{Port, VLAN} neighbor suppression: 1. 'BR_NEIGH_VLAN_SUPPRESS': A per-port flag used to indicate that per-{Port, VLAN} neighbor suppression is enabled on the bridge port. When set, 'BR_NEIGH_SUPPRESS' has no effect. 2. 'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED': A per-VLAN flag used to indicate that neighbor suppression is enabled on the given VLAN. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: Pass VLAN ID to br_flood()Ido Schimmel
Subsequent patches are going to add per-{Port, VLAN} neighbor suppression, which will require br_flood() to potentially suppress ARP / NS packets on a per-{Port, VLAN} basis. As a preparation, pass the VLAN ID of the packet as another argument to br_flood(). Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21bridge: Reorder neighbor suppression check when floodingIdo Schimmel
The bridge does not flood ARP / NS packets for which a reply was sent to bridge ports that have neighbor suppression enabled. Subsequent patches are going to add per-{Port, VLAN} neighbor suppression, which is going to make it more expensive to check whether neighbor suppression is enabled since a VLAN lookup will be required. Therefore, instead of unnecessarily performing this lookup for every packet, only perform it for ARP / NS packets for which a reply was sent. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21vlan: Add MACsec offload operations for VLAN interfaceEmeel Hakim
Add support for MACsec offload operations for VLAN driver to allow offloading MACsec when VLAN's real device supports Macsec offload by forwarding the offload request to it. Signed-off-by: Emeel Hakim <ehakim@nvidia.com> Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21sctp: delete the nested flexible array hmacXin Long
This patch deletes the flexible-array hmac[] from the structure sctp_authhdr to avoid some sparse warnings: # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/ net/sctp/auth.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h): ./include/linux/sctp.h:735:29: warning: nested flexible array Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21sctp: delete the nested flexible array peer_initXin Long
This patch deletes the flexible-array peer_init[] from the structure sctp_cookie to avoid some sparse warnings: # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/ net/sctp/sm_make_chunk.c: note: in included file (through include/net/sctp/sctp.h): ./include/net/sctp/structs.h:1588:28: warning: nested flexible array ./include/net/sctp/structs.h:343:28: warning: nested flexible array Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21sctp: delete the nested flexible array variableXin Long
This patch deletes the flexible-array variable[] from the structure sctp_sackhdr and sctp_errhdr to avoid some sparse warnings: # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/ net/sctp/sm_statefuns.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h): ./include/linux/sctp.h:451:28: warning: nested flexible array ./include/linux/sctp.h:393:29: warning: nested flexible array Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21sctp: delete the nested flexible array skipXin Long
This patch deletes the flexible-array skip[] from the structure sctp_ifwdtsn/fwdtsn_hdr to avoid some sparse warnings: # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/ net/sctp/stream_interleave.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h): ./include/linux/sctp.h:611:32: warning: nested flexible array ./include/linux/sctp.h:628:33: warning: nested flexible array Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21sctp: delete the nested flexible array paramsXin Long
This patch deletes the flexible-array params[] from the structure sctp_inithdr, sctp_addiphdr and sctp_reconf_chunk to avoid some sparse warnings: # make C=2 CF="-Wflexible-array-nested" M=./net/sctp/ net/sctp/input.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h): ./include/linux/sctp.h:278:29: warning: nested flexible array ./include/linux/sctp.h:675:30: warning: nested flexible array This warning is reported if a structure having a flexible array member is included by other structures. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-04-21xfrm: Fix leak of dev trackerLeon Romanovsky
At the stage of direction checks, the netdev reference tracker is already initialized, but released with wrong *_put() call. Fixes: 919e43fad516 ("xfrm: add an interface to offload policy") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-04-21xfrm: release all offloaded policy memoryLeon Romanovsky
Failure to add offloaded policy will cause to the following error once user will try to reload driver. Unregister_netdevice: waiting for eth3 to become free. Usage count = 2 This was caused by xfrm_dev_policy_add() which increments reference to net_device. That reference was supposed to be decremented in xfrm_dev_policy_free(). However the latter wasn't called. unregister_netdevice: waiting for eth3 to become free. Usage count = 2 leaked reference. xfrm_dev_policy_add+0xff/0x3d0 xfrm_policy_construct+0x352/0x420 xfrm_add_policy+0x179/0x320 xfrm_user_rcv_msg+0x1d2/0x3d0 netlink_rcv_skb+0xe0/0x210 xfrm_netlink_rcv+0x45/0x50 netlink_unicast+0x346/0x490 netlink_sendmsg+0x3b0/0x6c0 sock_sendmsg+0x73/0xc0 sock_write_iter+0x13b/0x1f0 vfs_write+0x528/0x5d0 ksys_write+0x120/0x150 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: 919e43fad516 ("xfrm: add an interface to offload policy") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2023-04-20mac80211: use the new drop reasons infrastructureJohannes Berg
It can be really hard to analyse or debug why packets are going missing in mac80211, so add the needed infrastructure to use use the new per-subsystem drop reasons. We actually use two drop reason subsystems here because of the different handling of frames that are dropped but still go to monitor for old versions of hostapd, and those that are just completely unusable (e.g. crypto failed.) Annotate a few reasons here just to illustrate this, we'll need to go through and annotate more of them later. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: extend drop reasons for multiple subsystemsJohannes Berg
Extend drop reasons to make them usable by subsystems other than core by reserving the high 16 bits for a new subsystem ID, of which 0 of course is used for the existing reasons immediately. To still be able to have string reasons, restructure that code a bit to make the loopup under RCU, the only user of this (right now) is drop_monitor. Link: https://lore.kernel.org/netdev/00659771ed54353f92027702c5bbb84702da62ce.camel@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20ipv6: add icmpv6_error_anycast_as_unicast for ICMPv6Mahesh Bandewar
ICMPv6 error packets are not sent to the anycast destinations and this prevents things like traceroute from working. So create a setting similar to ECHO when dealing with Anycast sources (icmpv6_echo_ignore_anycast). Signed-off-by: Mahesh Bandewar <maheshb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Maciej Żenczykowski <maze@google.com> Link: https://lore.kernel.org/r/20230419013238.2691167-1-maheshb@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20net: ethtool: mm: sanitize some UAPI configurationsVladimir Oltean
The verify-enabled boolean (ETHTOOL_A_MM_VERIFY_ENABLED) was intended to be a sub-setting of tx-enabled (ETHTOOL_A_MM_TX_ENABLED). IOW, MAC Merge TX can be enabled with or without verification, but verification with TX disabled makes no sense. The pmac-enabled boolean (ETHTOOL_A_MM_PMAC_ENABLED) was intended to be a global toggle from an API perspective, whereas tx-enabled just handles the TX direction. IOW, the pMAC can be enabled with or without TX, but it doesn't make sense to enable TX if the pMAC is not enabled. Add two checks which sanitize and reject these invalid cases. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20kill the last remaining user of proc_ns_fget()Al Viro
lookups by descriptor are better off closer to syscall surface... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-04-20net: skbuff: update and rename __kfree_skb_defer()Jakub Kicinski
__kfree_skb_defer() uses the old naming where "defer" meant slab bulk free/alloc APIs. In the meantime we also made __kfree_skb_defer() feed the per-NAPI skb cache, which implies bulk APIs. So take away the 'defer' and add 'napi'. While at it add a drop reason. This only matters on the tx_action path, if the skb has a frag_list. But getting rid of a SKB_DROP_REASON_NOT_SPECIFIED seems like a net benefit so why not. Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230420020005.815854-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20page_pool: unlink from napi during destroyJakub Kicinski
Jesper points out that we must prevent recycling into cache after page_pool_destroy() is called, because page_pool_destroy() is not synchronized with recycling (some pages may still be outstanding when destroy() gets called). I assumed this will not happen because NAPI can't be scheduled if its page pool is being destroyed. But I missed the fact that NAPI may get reused. For instance when user changes ring configuration driver may allocate a new page pool, stop NAPI, swap, start NAPI, and then destroy the old pool. The NAPI is running so old page pool will think it can recycle to the cache, but the consumer at that point is the destroy() path, not NAPI. To avoid extra synchronization let the drivers do "unlinking" during the "swap" stage while NAPI is indeed disabled. Fixes: 8c48eea3adf3 ("page_pool: allow caching from safely localized NAPI") Reported-by: Jesper Dangaard Brouer <jbrouer@redhat.com> Link: https://lore.kernel.org/all/e8df2654-6a5b-3c92-489d-2fe5e444135f@redhat.com/ Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20230419182006.719923-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Adjacent changes: net/mptcp/protocol.h 63740448a32e ("mptcp: fix accept vs worker race") 2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close") ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-20Merge tag 'net-6.3-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter and bpf. There are a few fixes for new code bugs, including the Mellanox one noted in the last networking pull. No known regressions outstanding. Current release - regressions: - sched: clear actions pointer in miss cookie init fail - mptcp: fix accept vs worker race - bpf: fix bpf_arch_text_poke() with new_addr == NULL on s390 - eth: bnxt_en: fix a possible NULL pointer dereference in unload path - eth: veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag Current release - new code bugs: - eth: revert "net/mlx5: Enable management PF initialization" Previous releases - regressions: - netfilter: fix recent physdev match breakage - bpf: fix incorrect verifier pruning due to missing register precision taints - eth: virtio_net: fix overflow inside xdp_linearize_page() - eth: cxgb4: fix use after free bugs caused by circular dependency problem - eth: mlxsw: pci: fix possible crash during initialization Previous releases - always broken: - sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg - netfilter: validate catch-all set elements - bridge: don't notify FDB entries with "master dynamic" - eth: bonding: fix memory leak when changing bond type to ethernet - eth: i40e: fix accessing vsi->active_filters without holding lock Misc: - Mat is back as MPTCP co-maintainer" * tag 'net-6.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits) net: bridge: switchdev: don't notify FDB entries with "master dynamic" Revert "net/mlx5: Enable management PF initialization" MAINTAINERS: Resume MPTCP co-maintainer role mailmap: add entries for Mat Martineau e1000e: Disable TSO on i219-LM card to increase speed bnxt_en: fix free-runnig PHC mode net: dsa: microchip: ksz8795: Correctly handle huge frame configuration bpf: Fix incorrect verifier pruning due to missing register precision taints hamradio: drop ISA_DMA_API dependency mlxsw: pci: Fix possible crash during initialization mptcp: fix accept vs worker race mptcp: stops worker on unaccepted sockets at listener close net: rpl: fix rpl header size calculation net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete() bonding: Fix memory leak when changing bond type to Ethernet veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next() bnxt_en: Fix a possible NULL pointer dereference in unload path bnxt_en: Do not initialize PTP on older P3/P4 chips netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements ...
2023-04-20wifi: mac80211: remove return value check of debugfs_create_dir()Yingsha Xu
Smatch complains that: debugfs_hw_add() warn: 'statsd' is an error pointer or valid Debugfs checks are generally not supposed to be checked for errors and it is not necessary here. Just delete the dead code. Signed-off-by: Yingsha Xu <ysxu@hust.edu.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Link: https://lore.kernel.org/r/20230419104548.30124-1-ysxu@hust.edu.cn Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-04-20net: bridge: switchdev: don't notify FDB entries with "master dynamic"Vladimir Oltean
There is a structural problem in switchdev, where the flag bits in struct switchdev_notifier_fdb_info (added_by_user, is_local etc) only represent a simplified / denatured view of what's in struct net_bridge_fdb_entry :: flags (BR_FDB_ADDED_BY_USER, BR_FDB_LOCAL etc). Each time we want to pass more information about struct net_bridge_fdb_entry :: flags to struct switchdev_notifier_fdb_info (here, BR_FDB_STATIC), we find that FDB entries were already notified to switchdev with no regard to this flag, and thus, switchdev drivers had no indication whether the notified entries were static or not. For example, this command: ip link add br0 type bridge && ip link set swp0 master br0 bridge fdb add dev swp0 00:01:02:03:04:05 master dynamic has never worked as intended with switchdev. It causes a struct net_bridge_fdb_entry to be passed to br_switchdev_fdb_notify() which has a single flag set: BR_FDB_ADDED_BY_USER. This is further passed to the switchdev notifier chain, where interested drivers have no choice but to assume this is a static (does not age) and sticky (does not migrate) FDB entry. So currently, all drivers offload it to hardware as such, as can be seen below ("offload" is set). bridge fdb get 00:01:02:03:04:05 dev swp0 master 00:01:02:03:04:05 dev swp0 offload master br0 The software FDB entry expires $ageing_time centiseconds after the kernel last sees a packet with this MAC SA, and the bridge notifies its deletion as well, so it eventually disappears from hardware too. This is a problem, because it is actually desirable to start offloading "master dynamic" FDB entries correctly - they should expire $ageing_time centiseconds after the *hardware* port last sees a packet with this MAC SA - and this is how the current incorrect behavior was discovered. With an offloaded data plane, it can be expected that software only sees exception path packets, so an otherwise active dynamic FDB entry would be aged out by software sooner than it should. With the change in place, these FDB entries are no longer offloaded: bridge fdb get 00:01:02:03:04:05 dev swp0 master 00:01:02:03:04:05 dev swp0 master br0 and this also constitutes a better way (assuming a backport to stable kernels) for user space to determine whether the kernel has the capability of doing something sane with these or not. As opposed to "master dynamic" FDB entries, on the current behavior of which no one currently depends on (which can be deduced from the lack of kselftests), Ido Schimmel explains that entries with the "extern_learn" flag (BR_FDB_ADDED_BY_EXT_LEARN) should still be notified to switchdev, since the spectrum driver listens to them (and this is kind of okay, because although they are treated identically to "static", they are expected to not age, and to roam). Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Link: https://lore.kernel.org/netdev/20230327115206.jk5q5l753aoelwus@skbuf/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230418155902.898627-1-vladimir.oltean@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-04-19net/handshake: Add Kunit tests for the handshake consumer APIChuck Lever
These verify the API contracts and help exercise lifetime rules for consumer sockets and handshake_req structures. One way to run these tests: ./tools/testing/kunit/kunit.py run --kunitconfig ./net/handshake/.kunitconfig Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19net/handshake: Add a kernel API for requesting a TLSv1.3 handshakeChuck Lever
To enable kernel consumers of TLS to request a TLS handshake, add support to net/handshake/ to request a handshake upcall. This patch also acts as a template for adding handshake upcall support for other kernel transport layer security providers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-04-19net/handshake: Create a NETLINK service for handling handshake requestsChuck Lever
When a kernel consumer needs a transport layer security session, it first needs a handshake to negotiate and establish a session. This negotiation can be done in user space via one of the several existing library implementations, or it can be done in the kernel. No in-kernel handshake implementations yet exist. In their absence, we add a netlink service that can: a. Notify a user space daemon that a handshake is needed. b. Once notified, the daemon calls the kernel back via this netlink service to get the handshake parameters, including an open socket on which to establish the session. c. Once the handshake is complete, the daemon reports the session status and other information via a second netlink operation. This operation marks that it is safe for the kernel to use the open socket and the security session established there. The notification service uses a multicast group. Each handshake mechanism (eg, tlshd) adopts its own group number so that the handshake services are completely independent of one another. The kernel can then tell via netlink_has_listeners() whether a handshake service is active and prepared to handle a handshake request. A new netlink operation, ACCEPT, acts like accept(2) in that it instantiates a file descriptor in the user space daemon's fd table. If this operation is successful, the reply carries the fd number, which can be treated as an open and ready file descriptor. While user space is performing the handshake, the kernel keeps its muddy paws off the open socket. A second new netlink operation, DONE, indicates that the user space daemon is finished with the socket and it is safe for the kernel to use again. The operation also indicates whether a session was established successfully. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>