Age | Commit message (Collapse) | Author |
|
Remove EnterFunction and LeaveFunction.
These debugging macros seem well past their use-by date. And seem to
have little value these days. Removing them allows some trivial cleanup
of some exit paths for some functions. These are also included in this
patch. There is likely scope for further cleanup of both debugging and
unwind paths. But let's leave that for another day.
Only intended to change debug output, and only when CONFIG_IP_VS_DEBUG
is enabled. Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Consistently use array_size() to calculate the size of ip_vs_conn_tab
in bytes.
Flagged by Coccinelle:
WARNING: array_size is already used (line 1498) to compute the same size
No functional change intended.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In ip_vs_sync_conn_v0() copy is made to struct ip_vs_sync_conn_options.
That structure looks like this:
struct ip_vs_sync_conn_options {
struct ip_vs_seq in_seq;
struct ip_vs_seq out_seq;
};
The source of the copy is the in_seq field of struct ip_vs_conn. Whose
type is struct ip_vs_seq. Thus we can see that the source - is not as
wide as the amount of data copied, which is the width of struct
ip_vs_sync_conn_option.
The copy is safe because the next field in is another struct ip_vs_seq.
Make use of struct_group() to annotate this.
Flagged by gcc-13 as:
In file included from ./include/linux/string.h:254,
from ./include/linux/bitmap.h:11,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/paravirt.h:17,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/timex.h:5,
from ./include/linux/timex.h:67,
from ./include/linux/time32.h:13,
from ./include/linux/time.h:60,
from ./include/linux/stat.h:19,
from ./include/linux/module.h:13,
from net/netfilter/ipvs/ip_vs_sync.c:38:
In function 'fortify_memcpy_chk',
inlined from 'ip_vs_sync_conn_v0' at net/netfilter/ipvs/ip_vs_sync.c:606:3:
./include/linux/fortify-string.h:529:25: error: call to '__read_overflow2_field' declared with attribute warning: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror=attribute-warning]
529 | __read_overflow2_field(q_size_field, size);
|
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
pass it as argument instead. This reduces size of traceinfo to
16 bytes. Total stack usage:
nf_tables_core.c:252 nft_do_chain 304 static
While its possible to also pass basechain as argument, doing so
increases nft_do_chaininfo function size.
Unlike pktinfo/verdict/rule the basechain info isn't used in
the expression evaluation path. gcc places it on the stack, which
results in extra push/pop when it gets passed to the trace helpers
as argument rather than as part of the traceinfo structure.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Just pass it as argument to nft_trace_notify. Stack is reduced by 8 bytes:
nf_tables_core.c:256 nft_do_chain 312 static
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
pass it as argument. No change in object size.
stack usage decreases by 8 byte:
nf_tables_core.c:254 nft_do_chain 320 static
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This helper is inlined, so keep it as small as possible.
If the static key is true, there is only a very small chance
that info->trace is false:
1. tracing was enabled at this very moment, the static key was
updated to active right after nft_do_table was called.
2. tracing was disabled at this very moment.
trace->info is already false, the static key is about to
be patched to false soon.
In both cases, no event will be sent because info->trace
is false (checked in noinline slowpath). info->nf_trace is irrelevant.
The nf_trace update is redunant in this case, but this will only
happen for short duration, when static key flips.
text data bss dec hex filename
old: 2980 192 32 3204 c84 nf_tables_core.o
new: 2964 192 32 3188 c74i nf_tables_core.o
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We only need to validate tables that saw changes in the current
transaction.
The existing code revalidates all tables, but this isn't needed as
cross-table jumps are not allowed (chains have table scope).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The ->cleanup callback needs to be removed, this doesn't work anymore as
the transaction mutex is already released in the ->abort function.
Just do it after a successful validation pass, this either happens
from commit or abort phases where transaction mutex is held.
Fixes: f102d66b335a ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Now that the rule trailer/end marker and the rcu head reside in the
same structure, we no longer need to save/restore the chain pointer
when performing/returning from a jump.
We can simply let the trace infra walk the evaluated rule until it
hits the end marker and then fetch the chain pointer from there.
When the rule is NULL (policy tracing), then chain and basechain
pointers were already identical, so just use the basechain.
This cuts size of jumpstack in half, from 256 to 128 bytes in 64bit,
scripts/stackusage says:
nf_tables_core.c:251 nft_do_chain 328 static
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Walk the rule headers until the trailer one (last_bit flag set) instead
of stopping at last_rule address.
This avoids the need to store the address when jumping to another chain.
This cuts size of jumpstack array by one third, on 64bit from
384 to 256 bytes. Still, stack usage is still quite large:
scripts/stackusage:
nf_tables_core.c:258 nft_do_chain 496 static
Next patch will also remove chain pointer.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In order to free the rules in a chain via call_rcu, the rule array used
to stash a rcu_head and space for a pointer at the end of the rule array.
When the current nft_rule_dp blob format got added in
2c865a8a28a1 ("netfilter: nf_tables: add rule blob layout"), this results
in a double-trailer:
size (unsigned long)
struct nft_rule_dp
struct nft_expr
...
struct nft_rule_dp
struct nft_expr
...
struct nft_rule_dp (is_last=1) // Trailer
The trailer, struct nft_rule_dp (is_last=1), is not accounted for in size,
so it can be located via start_addr + size.
Because the rcu_head is stored after 'start+size' as well this means the
is_last trailer is *aliased* to the rcu_head (struct nft_rules_old).
This is harmless, because at this time the nft_do_chain function never
evaluates/accesses the trailer, it only checks the address boundary:
for (; rule < last_rule; rule = nft_rule_next(rule)) {
...
But this way the last_rule address has to be stashed in the jump
structure to restore it after returning from a chain.
nft_do_chain stack usage has become way too big, so put it on a diet.
Without this patch is impossible to use
for (; !rule->is_last; rule = nft_rule_next(rule)) {
... because on free, the needed update of the rcu_head will clobber the
nft_rule_dp is_last bit.
Furthermore, also stash the chain pointer in the trailer, this allows
to recover the original chain structure from nf_tables_trace infra
without a need to place them in the jump struct.
After this patch it is trivial to diet the jump stack structure,
done in the next two patches.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
add glue code so a bpf program can be run using userspace-provided
netfilter state and packet/skb.
Default is to use ipv4:output hook point, but this can be overridden by
userspace. Userspace provided netfilter state is restricted, only hook and
protocol families can be overridden and only to ipv4/ipv6.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-7-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This is just to avoid ordering issues between multiple bpf programs,
this could be removed later in case it turns out to be too cautious.
bpf prog could still be shared with non-bpf hook, otherwise we'd have to
make conntrack hook registration fail just because a bpf program has
same priority.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-5-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This allows userspace ("nft list hooks") to show which bpf program
is attached to which hook.
Without this, user only knows bpf prog is attached at prio
x, y, z at INPUT and FORWARD, but can't tell which program is where.
v4: kdoc fixups (Simon Horman)
Link: https://lore.kernel.org/bpf/ZEELzpNCnYJuZyod@corigine.com/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-4-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs
that will be invoked via the NF_HOOK() points in the ip stack.
Invocation incurs an indirect call. This is not a necessity: Its
possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the
program invocation with the same method already done for xdp progs.
This isn't done here to keep the size of this chunk down.
Verifier restricts verdicts to either DROP or ACCEPT.
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add bpf_link support skeleton. To keep this reviewable, no bpf program
can be invoked yet, if a program is attached only a c-stub is called and
not the actual bpf program.
Defaults to 'y' if both netfilter and bpf syscall are enabled in kconfig.
Uapi example usage:
union bpf_attr attr = { };
attr.link_create.prog_fd = progfd;
attr.link_create.attach_type = 0; /* unused */
attr.link_create.netfilter.pf = PF_INET;
attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN;
attr.link_create.netfilter.priority = -128;
err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
... this would attach progfd to ipv4:input hook.
Such hook gets removed automatically if the calling program exits.
BPF_NETFILTER program invocation is added in followup change.
NF_HOOK_OP_BPF enum will eventually be read from nfnetlink_hook, it
allows to tell userspace which program is attached at the given hook
when user runs 'nft hook list' command rather than just the priority
and not-very-helpful 'this hook runs a bpf prog but I can't tell which
one'.
Will also be used to disallow registration of two bpf programs with
same priority in a followup patch.
v4: arm32 cmpxchg only supports 32bit operand
s/prio/priority/
v3: restrict prog attachment to ip/ip6 for now, lets lift restrictions if
more use cases pop up (arptables, ebtables, netdev ingress/egress etc).
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-2-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Set on IPS_CONFIRMED before change_status() otherwise EBUSY is
bogusly hit. This bug was introduced in the 6.3 release cycle.
2) Fix nfnetlink_queue conntrack support: Set/dump timeout
accordingly for unconfirmed conntrack entries. Make sure this
is done after IPS_CONFIRMED is set on. This is an old bug, it
happens since the introduction of this feature.
* tag 'nf-23-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: conntrack: fix wrong ct->timeout value
netfilter: conntrack: restore IPS_CONFIRMED out of nf_conntrack_hash_check_insert()
====================
Link: https://lore.kernel.org/r/20230421105700.325438-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v6.4
Most likely the last -next pull request for v6.4. We have changes all
over. rtw88 now supports SDIO bus and iwlwifi continues to work on
Wi-Fi 7 support. Not much stack changes this time.
Major changes:
cfg80211/mac80211
- fix some Fine Time Measurement (FTM) frames not being bufferable
- flush frames before key removal to avoid potential unencrypted
transmission depending on the hardware design
iwlwifi
- preparation for Wi-Fi 7 EHT and multi-link support
rtw88
- SDIO bus support
- RTL8822BS, RTL8822CS and RTL8821CS SDIO chipset support
rtw89
- framework firmware backwards compatibility
brcmfmac
- Cypress 43439 SDIO support
mt76
- mt7921 P2P support
- mt7996 mesh A-MSDU support
- mt7996 EHT support
- mt7996 coredump support
wcn36xx
- support for pronto v3 hardware
ath11k
- PCIe DeviceTree bindings
- WCN6750: enable SAR support
ath10k
- convert DeviceTree bindings to YAML
* tag 'wireless-next-2023-04-21' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (261 commits)
wifi: rtw88: Update spelling in main.h
wifi: airo: remove ISA_DMA_API dependency
wifi: rtl8xxxu: Simplify setting the initial gain
wifi: rtl8xxxu: Add rtl8xxxu_write{8,16,32}_{set,clear}
wifi: rtl8xxxu: Don't print the vendor/product/serial
wifi: rtw88: Fix memory leak in rtw88_usb
wifi: rtw88: call rtw8821c_switch_rf_set() according to chip variant
wifi: rtw88: set pkg_type correctly for specific rtw8821c variants
wifi: rtw88: rtw8821c: Fix rfe_option field width
wifi: rtw88: usb: fix priority queue to endpoint mapping
wifi: rtw88: 8822c: add iface combination
wifi: rtw88: handle station mode concurrent scan with AP mode
wifi: rtw88: prevent scan abort with other VIFs
wifi: rtw88: refine reserved page flow for AP mode
wifi: rtw88: disallow PS during AP mode
wifi: rtw88: 8822c: extend reserved page number
wifi: rtw88: add port switch for AP mode
wifi: rtw88: add bitmap for dynamic port settings
wifi: rtw89: mac: use regular int as return type of DLE buffer request
wifi: mac80211: remove return value check of debugfs_create_dir()
...
====================
Link: https://lore.kernel.org/r/20230421104726.800BCC433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Packet sockets, like tap, can be used as the backend for kernel vhost.
In packet sockets, virtio net header size is currently hardcoded to be
the size of struct virtio_net_hdr, which is 10 bytes; however, it is not
always the case: some virtio features, such as mrg_rxbuf, need virtio
net header to be 12-byte long.
Mergeable buffers, as a virtio feature, is worthy of supporting: packets
that are larger than one-mbuf size will be dropped in vhost worker's
handle_rx if mrg_rxbuf feature is not used, but large packets
cannot be avoided and increasing mbuf's size is not economical.
With this virtio feature enabled by virtio-user, packet sockets with
hardcoded 10-byte virtio net header will parse mac head incorrectly in
packet_snd by taking the last two bytes of virtio net header as part of
mac header.
This incorrect mac header parsing will cause packet to be dropped due to
invalid ether head checking in later under-layer device packet receiving.
By adding extra field vnet_hdr_sz with utilizing holes in struct
packet_sock to record currently used virtio net header size and supporting
extra sockopt PACKET_VNET_HDR_SZ to set specified vnet_hdr_sz, packet
sockets can know the exact length of virtio net header that virtio user
gives.
In packet_snd, tpacket_snd and packet_recvmsg, instead of using
hardcoded virtio net header size, it can get the exact vnet_hdr_sz from
corresponding packet_sock, and parse mac header correctly based on this
information to avoid the packets being mistakenly dropped.
Signed-off-by: Jianfeng Tan <henry.tjf@antgroup.com>
Co-developed-by: Anqi Shen <amy.saq@antgroup.com>
Signed-off-by: Anqi Shen <amy.saq@antgroup.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new bridge port attribute that allows user space to enable
per-{Port, VLAN} neighbor suppression. Example:
# bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
false
# bridge link set dev swp1 neigh_vlan_suppress on
# bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
true
# bridge link set dev swp1 neigh_vlan_suppress off
# bridge -d -j -p link show dev swp1 | jq '.[]["neigh_vlan_suppress"]'
false
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new VLAN attribute that allows user space to set the neighbor
suppression state of the port VLAN. Example:
# bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]'
false
# bridge vlan set vid 10 dev swp1 neigh_suppress on
# bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]'
true
# bridge vlan set vid 10 dev swp1 neigh_suppress off
# bridge -d -j -p vlan show dev swp1 vid 10 | jq '.[]["vlans"][]["neigh_suppress"]'
false
# bridge vlan set vid 10 dev br0 neigh_suppress on
Error: bridge: Can't set neigh_suppress for non-port vlans.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the bridge is not VLAN-aware (i.e., VLAN ID is 0), determine if
neighbor suppression is enabled on a given bridge port solely based on
the existing 'BR_NEIGH_SUPPRESS' flag.
Otherwise, if the bridge is VLAN-aware, first check if per-{Port, VLAN}
neighbor suppression is enabled on the given bridge port using the
'BR_NEIGH_VLAN_SUPPRESS' flag. If so, look up the VLAN and check whether
it has neighbor suppression enabled based on the per-VLAN
'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED' flag.
If the bridge is VLAN-aware, but the bridge port does not have
per-{Port, VLAN} neighbor suppression enabled, then fallback to
determine neighbor suppression based on the 'BR_NEIGH_SUPPRESS' flag.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, there are various places in the bridge data path that check
whether neighbor suppression is enabled on a given bridge port.
As a preparation for per-{Port, VLAN} neighbor suppression, encapsulate
this logic in a function and pass the VLAN ID of the packet as an
argument.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bridge driver gates the neighbor suppression code behind an internal
per-bridge flag called 'BROPT_NEIGH_SUPPRESS_ENABLED'. The flag is set
when at least one bridge port has neighbor suppression enabled.
As a preparation for per-{Port, VLAN} neighbor suppression, make sure
the global flag is also set if per-{Port, VLAN} neighbor suppression is
enabled. That is, when the 'BR_NEIGH_VLAN_SUPPRESS' flag is set on at
least one bridge port.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add two internal flags that will be used to enable / disable per-{Port,
VLAN} neighbor suppression:
1. 'BR_NEIGH_VLAN_SUPPRESS': A per-port flag used to indicate that
per-{Port, VLAN} neighbor suppression is enabled on the bridge port.
When set, 'BR_NEIGH_SUPPRESS' has no effect.
2. 'BR_VLFLAG_NEIGH_SUPPRESS_ENABLED': A per-VLAN flag used to indicate
that neighbor suppression is enabled on the given VLAN.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Subsequent patches are going to add per-{Port, VLAN} neighbor
suppression, which will require br_flood() to potentially suppress ARP /
NS packets on a per-{Port, VLAN} basis.
As a preparation, pass the VLAN ID of the packet as another argument to
br_flood().
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The bridge does not flood ARP / NS packets for which a reply was sent to
bridge ports that have neighbor suppression enabled.
Subsequent patches are going to add per-{Port, VLAN} neighbor
suppression, which is going to make it more expensive to check whether
neighbor suppression is enabled since a VLAN lookup will be required.
Therefore, instead of unnecessarily performing this lookup for every
packet, only perform it for ARP / NS packets for which a reply was sent.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for MACsec offload operations for VLAN driver
to allow offloading MACsec when VLAN's real device supports
Macsec offload by forwarding the offload request to it.
Signed-off-by: Emeel Hakim <ehakim@nvidia.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch deletes the flexible-array hmac[] from the structure
sctp_authhdr to avoid some sparse warnings:
# make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
net/sctp/auth.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h):
./include/linux/sctp.h:735:29: warning: nested flexible array
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch deletes the flexible-array peer_init[] from the structure
sctp_cookie to avoid some sparse warnings:
# make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
net/sctp/sm_make_chunk.c: note: in included file (through include/net/sctp/sctp.h):
./include/net/sctp/structs.h:1588:28: warning: nested flexible array
./include/net/sctp/structs.h:343:28: warning: nested flexible array
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch deletes the flexible-array variable[] from the structure
sctp_sackhdr and sctp_errhdr to avoid some sparse warnings:
# make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
net/sctp/sm_statefuns.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h):
./include/linux/sctp.h:451:28: warning: nested flexible array
./include/linux/sctp.h:393:29: warning: nested flexible array
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch deletes the flexible-array skip[] from the structure
sctp_ifwdtsn/fwdtsn_hdr to avoid some sparse warnings:
# make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
net/sctp/stream_interleave.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h):
./include/linux/sctp.h:611:32: warning: nested flexible array
./include/linux/sctp.h:628:33: warning: nested flexible array
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch deletes the flexible-array params[] from the structure
sctp_inithdr, sctp_addiphdr and sctp_reconf_chunk to avoid some
sparse warnings:
# make C=2 CF="-Wflexible-array-nested" M=./net/sctp/
net/sctp/input.c: note: in included file (through include/net/sctp/structs.h, include/net/sctp/sctp.h):
./include/linux/sctp.h:278:29: warning: nested flexible array
./include/linux/sctp.h:675:30: warning: nested flexible array
This warning is reported if a structure having a flexible array
member is included by other structures.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
At the stage of direction checks, the netdev reference tracker is
already initialized, but released with wrong *_put() call.
Fixes: 919e43fad516 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Failure to add offloaded policy will cause to the following
error once user will try to reload driver.
Unregister_netdevice: waiting for eth3 to become free. Usage count = 2
This was caused by xfrm_dev_policy_add() which increments reference
to net_device. That reference was supposed to be decremented
in xfrm_dev_policy_free(). However the latter wasn't called.
unregister_netdevice: waiting for eth3 to become free. Usage count = 2
leaked reference.
xfrm_dev_policy_add+0xff/0x3d0
xfrm_policy_construct+0x352/0x420
xfrm_add_policy+0x179/0x320
xfrm_user_rcv_msg+0x1d2/0x3d0
netlink_rcv_skb+0xe0/0x210
xfrm_netlink_rcv+0x45/0x50
netlink_unicast+0x346/0x490
netlink_sendmsg+0x3b0/0x6c0
sock_sendmsg+0x73/0xc0
sock_write_iter+0x13b/0x1f0
vfs_write+0x528/0x5d0
ksys_write+0x120/0x150
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 919e43fad516 ("xfrm: add an interface to offload policy")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
It can be really hard to analyse or debug why packets are
going missing in mac80211, so add the needed infrastructure
to use use the new per-subsystem drop reasons.
We actually use two drop reason subsystems here because of
the different handling of frames that are dropped but still
go to monitor for old versions of hostapd, and those that
are just completely unusable (e.g. crypto failed.)
Annotate a few reasons here just to illustrate this, we'll
need to go through and annotate more of them later.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Extend drop reasons to make them usable by subsystems
other than core by reserving the high 16 bits for a
new subsystem ID, of which 0 of course is used for the
existing reasons immediately.
To still be able to have string reasons, restructure
that code a bit to make the loopup under RCU, the only
user of this (right now) is drop_monitor.
Link: https://lore.kernel.org/netdev/00659771ed54353f92027702c5bbb84702da62ce.camel@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ICMPv6 error packets are not sent to the anycast destinations and this
prevents things like traceroute from working. So create a setting similar
to ECHO when dealing with Anycast sources (icmpv6_echo_ignore_anycast).
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Link: https://lore.kernel.org/r/20230419013238.2691167-1-maheshb@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The verify-enabled boolean (ETHTOOL_A_MM_VERIFY_ENABLED) was intended to
be a sub-setting of tx-enabled (ETHTOOL_A_MM_TX_ENABLED). IOW, MAC Merge
TX can be enabled with or without verification, but verification with TX
disabled makes no sense.
The pmac-enabled boolean (ETHTOOL_A_MM_PMAC_ENABLED) was intended to be
a global toggle from an API perspective, whereas tx-enabled just handles
the TX direction. IOW, the pMAC can be enabled with or without TX, but
it doesn't make sense to enable TX if the pMAC is not enabled.
Add two checks which sanitize and reject these invalid cases.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
lookups by descriptor are better off closer to syscall surface...
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
__kfree_skb_defer() uses the old naming where "defer" meant
slab bulk free/alloc APIs. In the meantime we also made
__kfree_skb_defer() feed the per-NAPI skb cache, which
implies bulk APIs. So take away the 'defer' and add 'napi'.
While at it add a drop reason. This only matters on the
tx_action path, if the skb has a frag_list. But getting
rid of a SKB_DROP_REASON_NOT_SPECIFIED seems like a net
benefit so why not.
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230420020005.815854-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Jesper points out that we must prevent recycling into cache
after page_pool_destroy() is called, because page_pool_destroy()
is not synchronized with recycling (some pages may still be
outstanding when destroy() gets called).
I assumed this will not happen because NAPI can't be scheduled
if its page pool is being destroyed. But I missed the fact that
NAPI may get reused. For instance when user changes ring configuration
driver may allocate a new page pool, stop NAPI, swap, start NAPI,
and then destroy the old pool. The NAPI is running so old page
pool will think it can recycle to the cache, but the consumer
at that point is the destroy() path, not NAPI.
To avoid extra synchronization let the drivers do "unlinking"
during the "swap" stage while NAPI is indeed disabled.
Fixes: 8c48eea3adf3 ("page_pool: allow caching from safely localized NAPI")
Reported-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
Link: https://lore.kernel.org/all/e8df2654-6a5b-3c92-489d-2fe5e444135f@redhat.com/
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/r/20230419182006.719923-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adjacent changes:
net/mptcp/protocol.h
63740448a32e ("mptcp: fix accept vs worker race")
2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close")
ddb1a072f858 ("mptcp: move first subflow allocation at mpc access time")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and bpf.
There are a few fixes for new code bugs, including the Mellanox one
noted in the last networking pull. No known regressions outstanding.
Current release - regressions:
- sched: clear actions pointer in miss cookie init fail
- mptcp: fix accept vs worker race
- bpf: fix bpf_arch_text_poke() with new_addr == NULL on s390
- eth: bnxt_en: fix a possible NULL pointer dereference in unload
path
- eth: veth: take into account peer device for
NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
Current release - new code bugs:
- eth: revert "net/mlx5: Enable management PF initialization"
Previous releases - regressions:
- netfilter: fix recent physdev match breakage
- bpf: fix incorrect verifier pruning due to missing register
precision taints
- eth: virtio_net: fix overflow inside xdp_linearize_page()
- eth: cxgb4: fix use after free bugs caused by circular dependency
problem
- eth: mlxsw: pci: fix possible crash during initialization
Previous releases - always broken:
- sched: sch_qfq: prevent slab-out-of-bounds in qfq_activate_agg
- netfilter: validate catch-all set elements
- bridge: don't notify FDB entries with "master dynamic"
- eth: bonding: fix memory leak when changing bond type to ethernet
- eth: i40e: fix accessing vsi->active_filters without holding lock
Misc:
- Mat is back as MPTCP co-maintainer"
* tag 'net-6.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (33 commits)
net: bridge: switchdev: don't notify FDB entries with "master dynamic"
Revert "net/mlx5: Enable management PF initialization"
MAINTAINERS: Resume MPTCP co-maintainer role
mailmap: add entries for Mat Martineau
e1000e: Disable TSO on i219-LM card to increase speed
bnxt_en: fix free-runnig PHC mode
net: dsa: microchip: ksz8795: Correctly handle huge frame configuration
bpf: Fix incorrect verifier pruning due to missing register precision taints
hamradio: drop ISA_DMA_API dependency
mlxsw: pci: Fix possible crash during initialization
mptcp: fix accept vs worker race
mptcp: stops worker on unaccepted sockets at listener close
net: rpl: fix rpl header size calculation
net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()
bonding: Fix memory leak when changing bond type to Ethernet
veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features flag
mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()
bnxt_en: Fix a possible NULL pointer dereference in unload path
bnxt_en: Do not initialize PTP on older P3/P4 chips
netfilter: nf_tables: tighten netlink attribute requirements for catch-all elements
...
|
|
Smatch complains that:
debugfs_hw_add() warn: 'statsd' is an error pointer or valid
Debugfs checks are generally not supposed to be checked for errors
and it is not necessary here.
Just delete the dead code.
Signed-off-by: Yingsha Xu <ysxu@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Link: https://lore.kernel.org/r/20230419104548.30124-1-ysxu@hust.edu.cn
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
There is a structural problem in switchdev, where the flag bits in
struct switchdev_notifier_fdb_info (added_by_user, is_local etc) only
represent a simplified / denatured view of what's in struct
net_bridge_fdb_entry :: flags (BR_FDB_ADDED_BY_USER, BR_FDB_LOCAL etc).
Each time we want to pass more information about struct
net_bridge_fdb_entry :: flags to struct switchdev_notifier_fdb_info
(here, BR_FDB_STATIC), we find that FDB entries were already notified to
switchdev with no regard to this flag, and thus, switchdev drivers had
no indication whether the notified entries were static or not.
For example, this command:
ip link add br0 type bridge && ip link set swp0 master br0
bridge fdb add dev swp0 00:01:02:03:04:05 master dynamic
has never worked as intended with switchdev. It causes a struct
net_bridge_fdb_entry to be passed to br_switchdev_fdb_notify() which has
a single flag set: BR_FDB_ADDED_BY_USER.
This is further passed to the switchdev notifier chain, where interested
drivers have no choice but to assume this is a static (does not age) and
sticky (does not migrate) FDB entry. So currently, all drivers offload
it to hardware as such, as can be seen below ("offload" is set).
bridge fdb get 00:01:02:03:04:05 dev swp0 master
00:01:02:03:04:05 dev swp0 offload master br0
The software FDB entry expires $ageing_time centiseconds after the
kernel last sees a packet with this MAC SA, and the bridge notifies its
deletion as well, so it eventually disappears from hardware too.
This is a problem, because it is actually desirable to start offloading
"master dynamic" FDB entries correctly - they should expire $ageing_time
centiseconds after the *hardware* port last sees a packet with this
MAC SA - and this is how the current incorrect behavior was discovered.
With an offloaded data plane, it can be expected that software only sees
exception path packets, so an otherwise active dynamic FDB entry would
be aged out by software sooner than it should.
With the change in place, these FDB entries are no longer offloaded:
bridge fdb get 00:01:02:03:04:05 dev swp0 master
00:01:02:03:04:05 dev swp0 master br0
and this also constitutes a better way (assuming a backport to stable
kernels) for user space to determine whether the kernel has the
capability of doing something sane with these or not.
As opposed to "master dynamic" FDB entries, on the current behavior of
which no one currently depends on (which can be deduced from the lack of
kselftests), Ido Schimmel explains that entries with the "extern_learn"
flag (BR_FDB_ADDED_BY_EXT_LEARN) should still be notified to switchdev,
since the spectrum driver listens to them (and this is kind of okay,
because although they are treated identically to "static", they are
expected to not age, and to roam).
Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del")
Link: https://lore.kernel.org/netdev/20230327115206.jk5q5l753aoelwus@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20230418155902.898627-1-vladimir.oltean@nxp.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
These verify the API contracts and help exercise lifetime rules for
consumer sockets and handshake_req structures.
One way to run these tests:
./tools/testing/kunit/kunit.py run --kunitconfig ./net/handshake/.kunitconfig
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To enable kernel consumers of TLS to request a TLS handshake, add
support to net/handshake/ to request a handshake upcall.
This patch also acts as a template for adding handshake upcall
support for other kernel transport layer security providers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When a kernel consumer needs a transport layer security session, it
first needs a handshake to negotiate and establish a session. This
negotiation can be done in user space via one of the several
existing library implementations, or it can be done in the kernel.
No in-kernel handshake implementations yet exist. In their absence,
we add a netlink service that can:
a. Notify a user space daemon that a handshake is needed.
b. Once notified, the daemon calls the kernel back via this
netlink service to get the handshake parameters, including an
open socket on which to establish the session.
c. Once the handshake is complete, the daemon reports the
session status and other information via a second netlink
operation. This operation marks that it is safe for the
kernel to use the open socket and the security session
established there.
The notification service uses a multicast group. Each handshake
mechanism (eg, tlshd) adopts its own group number so that the
handshake services are completely independent of one another. The
kernel can then tell via netlink_has_listeners() whether a handshake
service is active and prepared to handle a handshake request.
A new netlink operation, ACCEPT, acts like accept(2) in that it
instantiates a file descriptor in the user space daemon's fd table.
If this operation is successful, the reply carries the fd number,
which can be treated as an open and ready file descriptor.
While user space is performing the handshake, the kernel keeps its
muddy paws off the open socket. A second new netlink operation,
DONE, indicates that the user space daemon is finished with the
socket and it is safe for the kernel to use again. The operation
also indicates whether a session was established successfully.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|