Age | Commit message (Collapse) | Author |
|
there is a problem with the CWR flag set in an incoming ACK segment
and it leads to the situation when the ECE flag is latched forever
the following packetdrill script shows what happens:
// Stack receives incoming segments with CE set
+0.1 <[ect0] . 11001:12001(1000) ack 1001 win 65535
+0.0 <[ce] . 12001:13001(1000) ack 1001 win 65535
+0.0 <[ect0] P. 13001:14001(1000) ack 1001 win 65535
// Stack repsonds with ECN ECHO
+0.0 >[noecn] . 1001:1001(0) ack 12001
+0.0 >[noecn] E. 1001:1001(0) ack 13001
+0.0 >[noecn] E. 1001:1001(0) ack 14001
// Write a packet
+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0] PE. 1001:2001(1000) ack 14001
// Pure ACK received
+0.01 <[noecn] W. 14001:14001(0) ack 2001 win 65535
// Since CWR was sent, this packet should NOT have ECE set
+0.1 write(3, ..., 1000) = 1000
+0.0 >[ect0] P. 2001:3001(1000) ack 14001
// but Linux will still keep ECE latched here, with packetdrill
// flagging a missing ECE flag, expecting
// >[ect0] PE. 2001:3001(1000) ack 14001
// in the script
In the situation above we will continue to send ECN ECHO packets
and trigger the peer to reduce the congestion window. To avoid that
we can check CWR on pure ACKs received.
v3:
- Add a sequence check to avoid sending an ACK to an ACK
v2:
- Adjusted the comment
- move CWR check before checking for unacknowledged packets
Signed-off-by: Denis Kirjanov <denis.kirjanov@suse.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Without this patch, eapol frames cannot be received in mesh
mode, when 802.1X should be used. Initially only a MGTK is
defined, which is found and set as rx->key, when there are
no other keys set. ieee80211_drop_unencrypted would then
drop these eapol frames, as they are data frames without
encryption and there exists some rx->key.
Fix this by differentiating between mesh eapol frames and
other data frames with existing rx->key. Allow mesh mesh
eapol frames only if they are for our vif address.
With this patch in-place, ieee80211_rx_h_mesh_fwding continues
after the ieee80211_drop_unencrypted check and notices, that
these eapol frames have to be delivered locally, as they should.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200625104214.50319-1-markus.theil@tu-ilmenau.de
[small code cleanups]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When using 802.1X over mesh networks, at first an ordinary
mesh peering is established, then the 802.1X EAPOL dialog
happens, afterwards an authenticated mesh peering exchange
(AMPE) happens, finally the peering is complete and we can
set the STA authorized flag.
As 802.1X is an intermediate step here and key material is
not yet exchanged for stations we have to skip mesh path lookup
for these EAPOL frames. Otherwise the already configure mesh
group encryption key would be used to send a mesh path request
which no one can decipher, because we didn't already establish
key material on both peers, like with SAE and directly using AMPE.
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Link: https://lore.kernel.org/r/20200617082637.22670-2-markus.theil@tu-ilmenau.de
[remove pointless braces, remove unnecessary local variable,
the list can only process one such frame (or its fragments)]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Broadcast pkts like arp are getting dropped in 'ieee80211_8023_xmit'.
Fix this by replacing is_valid_ether_addr api with is_zero_ether_addr.
Fixes: 50ff477a8639 ("mac80211: add 802.11 encapsulation offloading support")
Signed-off-by: Seevalamuthu Mariappan <seevalam@codeaurora.org>
Link: https://lore.kernel.org/r/1591697754-4975-1-git-send-email-seevalam@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Doing mod_timer() conditionaly is easier than conditionally unlocking
and jumping around...
Signed-off-by: Pavel Machek (CIP) <pavel@denx.de>
Acked-by: Linus Lüssing <ll@simonwunderlich.de>
Link: https://lore.kernel.org/r/20200604214157.GA9737@amd
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The initial control port tx status patch assumed, that
we have IEEE 802.11 frames, but actually ethernet frames
are stored in the ack skb. Fix this by checking for the
correct ethertype and skb protocol 802.3.
Also allow tx status reports for ETH_P_PREAUTH, as preauth
frames can also be send over the nl80211 control port.
Fixes: a7528198add8 ("mac80211: support control port TX status reporting")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Markus Theil <markus.theil@tu-ilmenau.de>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/20200622123542.173695-1-markus.theil@tu-ilmenau.de
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Hardware device may include more than one police entry. Specifying the
action's index make it possible for several tc filters to share the same
police action when installing the filters.
Propagate this index to device drivers through the flow offload
intermediate representation, so that drivers could share a single
hardware policer between multiple filters.
v1->v2 changes:
- Update the commit message suggest by Ido Schimmel <idosch@idosch.org>
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Current police offloading support the 'burst'' and 'rate_bytes_ps'. Some
hardware own the capability to limit the frame size. If the frame size
larger than the setting, the frame would be dropped. For the police
action itself already accept the 'mtu' parameter in tc command. But not
extend to tc flower offloading. So extend 'mtu' to tc flower offloading.
Signed-off-by: Po Liu <Po.Liu@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The helper is used in tracing programs to cast a socket
pointer to a udp6_sock pointer.
The return value could be NULL if the casting is illegal.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20200623230815.3988481-1-yhs@fb.com
|
|
The bpf iterator for udp is implemented. Both udp4 and udp6
sockets will be traversed. It is up to bpf program to
filter for udp4 or udp6 only, or both families of sockets.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230813.3988404-1-yhs@fb.com
|
|
Similar to tcp_iter_state, a new field bpf_seq_afinfo is
added to udp_iter_state to provide bpf udp iterator
afinfo.
This does not change /proc/net/{udp, udp6} behavior. But
it enables bpf iterator to avoid get afinfo from PDE_DATA
and iterate through all udp and udp6 sockets in one pass.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230812.3988347-1-yhs@fb.com
|
|
Three more helpers are added to cast a sock_common pointer to
an tcp_sock, tcp_timewait_sock or a tcp_request_sock for
tracing programs.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230811.3988277-1-yhs@fb.com
|
|
The helper is used in tracing programs to cast a socket
pointer to a tcp6_sock pointer.
The return value could be NULL if the casting is illegal.
A new helper return type RET_PTR_TO_BTF_ID_OR_NULL is added
so the verifier is able to deduce proper return types for the helper.
Different from the previous BTF_ID based helpers,
the bpf_skc_to_tcp6_sock() argument can be several possible
btf_ids. More specifically, all possible socket data structures
with sock_common appearing in the first in the memory layout.
This patch only added socket types related to tcp and udp.
All possible argument btf_id and return value btf_id
for helper bpf_skc_to_tcp6_sock() are pre-calculcated and
cached. In the future, it is even possible to precompute
these btf_id's at kernel build time.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230809.3988195-1-yhs@fb.com
|
|
The bpf iterator for tcp is implemented. Both tcp4 and tcp6
sockets will be traversed. It is up to bpf program to
filter for tcp4 or tcp6 only, or both families of sockets.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230805.3987959-1-yhs@fb.com
|
|
A new field bpf_seq_afinfo is added to tcp_iter_state
to provide bpf tcp iterator afinfo. There are two
reasons on why we did this.
First, the current way to get afinfo from PDE_DATA
does not work for bpf iterator as its seq_file
inode does not conform to /proc/net/{tcp,tcp6}
inode structures. More specifically, anonymous
bpf iterator will use an anonymous inode which
is shared in the system and we cannot change inode
private data structure at all.
Second, bpf iterator for tcp/tcp6 wants to
traverse all tcp and tcp6 sockets in one pass
and bpf program can control whether they want
to skip one sk_family or not. Having a different
afinfo with family AF_UNSPEC make it easier
to understand in the code.
This patch does not change /proc/net/{tcp,tcp6} behavior
as the bpf_seq_afinfo will be NULL for these two proc files.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230804.3987829-1-yhs@fb.com
|
|
Using new helpers ip6t_unregister_table_pre_exit() and
ip6t_unregister_table_exit().
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
helpers.
The pre_exit will un-register the underlying hook and .exit will do
the table freeing. The netns core does an unconditional synchronize_rcu
after the pre_exit hooks insuring no packets are in flight that have
picked up the pointer before completing the un-register.
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Using new helpers ipt_unregister_table_pre_exit() and
ipt_unregister_table_exit().
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
helpers.
The pre_exit will un-register the underlying hook and .exit will do the
table freeing. The netns core does an unconditional synchronize_rcu after
the pre_exit hooks insuring no packets are in flight that have picked up
the pointer before completing the un-register.
Fixes: b9e69e127397 ("netfilter: xtables: don't hook tables by default")
Signed-off-by: David Wilder <dwilder@us.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The user tool modinfo is used to get information on kernel modules, including a
description where it is available.
This patch adds a brief MODULE_DESCRIPTION to netfilter kernel modules
(descriptions taken from Kconfig file or code comments)
Signed-off-by: Rob Gill <rrobgill@protonmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When using ip_set with counters and comment, traffic causes the kernel
to panic on 32-bit ARM:
Alignment trap: not handling instruction e1b82f9f at [<bf01b0dc>]
Unhandled fault: alignment exception (0x221) at 0xea08133c
PC is at ip_set_match_extensions+0xe0/0x224 [ip_set]
The problem occurs when we try to update the 64-bit counters - the
faulting address above is not 64-bit aligned. The problem occurs
due to the way elements are allocated, for example:
set->dsize = ip_set_elem_len(set, tb, 0, 0);
map = ip_set_alloc(sizeof(*map) + elements * set->dsize);
If the element has a requirement for a member to be 64-bit aligned,
and set->dsize is not a multiple of 8, but is a multiple of four,
then every odd numbered elements will be misaligned - and hitting
an atomic64_add() on that element will cause the kernel to panic.
ip_set_elem_len() must return a size that is rounded to the maximum
alignment of any extension field stored in the element. This change
ensures that is the case.
Fixes: 95ad1f4a9358 ("netfilter: ipset: Fix extension alignment")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The driver for Marvell switches puts all ports in IGMP snooping mode
which results in all IGMP/MLD frames that ingress on the ports to be
forwarded to the CPU only.
The bridge code in the kernel can then interpret these frames and act
upon them, for instance by updating the mdb in the switch to reflect
multicast memberships of stations connected to the ports. However,
the IGMP/MLD frames must then also be forwarded to other ports of the
bridge so external IGMP queriers can track membership reports, and
external multicast clients can receive query reports from foreign IGMP
queriers.
Currently, this is impossible as the EDSA tagger sets offload_fwd_mark
on the skb when it unwraps the tagged frames, and that will make the
switchdev layer prevent the skb from egressing on any other port of
the same switch.
To fix that, look at the To_CPU code in the DSA header and make
forwarding of the frame possible for trapped IGMP packets.
Introduce some #defines for the frame types to make the code a bit more
comprehensive.
This was tested on a Marvell 88E6352 variant.
Signed-off-by: Daniel Mack <daniel@zonque.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we modify or create a new fdb entry sometimes we want to avoid
refreshing its activity in order to track it properly. One example is
when a mac is received from EVPN multi-homing peer by FRR, which doesn't
want to change local activity accounting. It makes it static and sets a
flag to track its activity.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds the ability to notify about activity of any entries
(static, permanent or ext_learn). EVPN multihoming peers need it to
properly and efficiently handle mac sync (peer active/locally active).
We add a new NFEA_ACTIVITY_NOTIFY attribute which is used to dump the
current activity state and to control if static entries should be monitored
at all. We use 2 bits - one to activate fdb entry tracking (disabled by
default) and the second to denote that an entry is inactive. We need
the second bit in order to avoid multiple notifications of inactivity.
Obviously this makes no difference for dynamic entries since at the time
of inactivity they get deleted, while the tracked non-dynamic entries get
the inactive bit set and get a notification.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add an attribute to NDA which will contain all future fdb-specific
attributes in order to avoid polluting the NDA namespace with e.g.
bridge or vxlan specific attributes. The attribute is called
NDA_FDB_EXT_ATTRS and the structure would look like:
[NDA_FDB_EXT_ATTRS] = {
[NFEA_xxx]
}
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can just pass ndm as an argument instead of its fields separately.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
execute_check_pkt_len
ovs connection tracking module performs de-fragmentation on incoming
fragmented traffic. Take info account if traffic has been de-fragmented
in execute_check_pkt_len action otherwise we will perform the wrong
nested action considering the original packet size. This issue typically
occurs if ovs-vswitchd adds a rule in the pipeline that requires connection
tracking (e.g. OVN stateful ACLs) before execute_check_pkt_len action.
Moreover take into account GSO fragment size for GSO packet in
execute_check_pkt_len routine
Fixes: 4d5ec89fc8d14 ("net: openvswitch: Add a new action check_pkt_len")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support of SO_KEEPALIVE flag and TCP related options
to bpf_setsockopt() routine. This is helpful if we want to enable or tune
TCP keepalive for applications which don't do it in the userspace code.
v3:
- update kernel-doc in uapi (Nikita Vetoshkin <nekto0n@yandex-team.ru>)
v4:
- update kernel-doc in tools too (Alexei Starovoitov)
- add test to selftests (Alexei Starovoitov)
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200620153052.9439-3-zeil@yandex-team.ru
|
|
This is preparation for usage in bpf_setsockopt.
v2:
- remove redundant EXPORT_SYMBOL (Alexei Starovoitov)
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200620153052.9439-2-zeil@yandex-team.ru
|
|
This is preparation for usage in bpf_setsockopt.
Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200620153052.9439-1-zeil@yandex-team.ru
|
|
Clearing the sock TX queue in sk_set_socket() might cause unexpected
out-of-order transmit when called from sock_orphan(), as outstanding
packets can pick a different TX queue and bypass the ones already queued.
This is undesired in general. More specifically, it breaks the in-order
scheduling property guarantee for device-offloaded TLS sockets.
Remove the call to sk_tx_queue_clear() in sk_set_socket(), and add it
explicitly only where needed.
Fixes: e022f0b4a03f ("net: Introduce sk_tx_queue_mapping")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dev cannot be NULL here since its already being accessed
before. Remove the redundant null check.
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb cannot be NULL here since its already being accessed
before: sock_net(skb->sk). Remove the redundant null check.
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes. Also, remove unnecessary
function ipv6_rpl_srh_alloc_size() and replace kzalloc() with kcalloc(),
which has a 2-factor argument form for multiplication.
This code was detected with the help of Coccinelle and, audited and
fixed manually.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A last minute change put the TDR cable test parameters into a nest.
The validation is not sufficient, resulting in an oops if the nest is
missing. Set default values first, then update them if the nest is
provided.
Fixes: f2bc8ad31a7f ("net: ethtool: Allow PHY cable test TDR data to configured")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This removes following warnings :
CC net/ipv4/udp_offload.o
net/ipv4/udp_offload.c:504:17: warning: no previous prototype for 'udp4_gro_receive' [-Wmissing-prototypes]
504 | struct sk_buff *udp4_gro_receive(struct list_head *head, struct sk_buff *skb)
| ^~~~~~~~~~~~~~~~
net/ipv4/udp_offload.c:584:29: warning: no previous prototype for 'udp4_gro_complete' [-Wmissing-prototypes]
584 | INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
| ^~~~~~~~~~~~~~~~~
CHECK net/ipv6/udp_offload.c
net/ipv6/udp_offload.c:115:16: warning: symbol 'udp6_gro_receive' was not declared. Should it be static?
net/ipv6/udp_offload.c:148:29: warning: symbol 'udp6_gro_complete' was not declared. Should it be static?
CC net/ipv6/udp_offload.o
net/ipv6/udp_offload.c:115:17: warning: no previous prototype for 'udp6_gro_receive' [-Wmissing-prototypes]
115 | struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb)
| ^~~~~~~~~~~~~~~~
net/ipv6/udp_offload.c:148:29: warning: no previous prototype for 'udp6_gro_complete' [-Wmissing-prototypes]
148 | INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
| ^~~~~~~~~~~~~~~~~
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch removes following (C=1 W=1) warnings for CONFIG_RETPOLINE=y :
net/ipv4/tcp_offload.c:306:16: warning: symbol 'tcp4_gro_receive' was not declared. Should it be static?
net/ipv4/tcp_offload.c:306:17: warning: no previous prototype for 'tcp4_gro_receive' [-Wmissing-prototypes]
net/ipv4/tcp_offload.c:319:29: warning: symbol 'tcp4_gro_complete' was not declared. Should it be static?
net/ipv4/tcp_offload.c:319:29: warning: no previous prototype for 'tcp4_gro_complete' [-Wmissing-prototypes]
CHECK net/ipv6/tcpv6_offload.c
net/ipv6/tcpv6_offload.c:16:16: warning: symbol 'tcp6_gro_receive' was not declared. Should it be static?
net/ipv6/tcpv6_offload.c:29:29: warning: symbol 'tcp6_gro_complete' was not declared. Should it be static?
CC net/ipv6/tcpv6_offload.o
net/ipv6/tcpv6_offload.c:16:17: warning: no previous prototype for 'tcp6_gro_receive' [-Wmissing-prototypes]
16 | struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb)
| ^~~~~~~~~~~~~~~~
net/ipv6/tcpv6_offload.c:29:29: warning: no previous prototype for 'tcp6_gro_complete' [-Wmissing-prototypes]
29 | INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff)
| ^~~~~~~~~~~~~~~~~
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Declare ipv4_specific once, in tcp.h were it belongs.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ipv6_specific should be declared in tcp include files,
not mptcp.
This removes the following warning :
CHECK net/ipv6/tcp_ipv6.c
net/ipv6/tcp_ipv6.c:78:42: warning: symbol 'ipv6_specific' was not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rather than requiring every hw crypto capable NIC driver to do a check for
slave_dev being set, set real_dev in the xfrm layer and xso init time, and
then override it in the bonding driver as needed. Then NIC drivers can
always use real_dev, and at the same time, we eliminate the use of a
variable name that probably shouldn't have been used in the first place,
particularly given recent current events.
CC: Boris Pismenny <borisp@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Leon Romanovsky <leon@kernel.org>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It was reported that a considerable amount of cycles were spent on the
expensive indirect calls on fib6_rule_lookup. This patch introduces an
inline helper called pol_route_func that uses the indirect_call_wrappers
to avoid the indirect calls.
This patch saves around 50ns per call.
Performance was measured on the receiver by checking the amount of
syncookies that server was able to generate under a synflood load.
Traffic was generated using trafgen[1] which was pushing around 1Mpps on
a single queue. Receiver was using only one rx queue which help to
create a bottle neck and make the experiment rx-bounded.
These are the syncookies generated over 10s from the different runs:
Whithout the patch:
TcpExtSyncookiesSent 3553749 0.0
TcpExtSyncookiesSent 3550895 0.0
TcpExtSyncookiesSent 3553845 0.0
TcpExtSyncookiesSent 3541050 0.0
TcpExtSyncookiesSent 3539921 0.0
TcpExtSyncookiesSent 3557659 0.0
TcpExtSyncookiesSent 3526812 0.0
TcpExtSyncookiesSent 3536121 0.0
TcpExtSyncookiesSent 3529963 0.0
TcpExtSyncookiesSent 3536319 0.0
With the patch:
TcpExtSyncookiesSent 3611786 0.0
TcpExtSyncookiesSent 3596682 0.0
TcpExtSyncookiesSent 3606878 0.0
TcpExtSyncookiesSent 3599564 0.0
TcpExtSyncookiesSent 3601304 0.0
TcpExtSyncookiesSent 3609249 0.0
TcpExtSyncookiesSent 3617437 0.0
TcpExtSyncookiesSent 3608765 0.0
TcpExtSyncookiesSent 3620205 0.0
TcpExtSyncookiesSent 3601895 0.0
Without the patch the average is 354263 pkt/s or 2822 ns/pkt and with
the patch the average is 360738 pkt/s or 2772 ns/pkt which gives an
estimate of 50 ns per packet.
[1] http://netsniff-ng.org/
Changelog since v1:
- Change ordering in the ICW (Paolo Abeni)
Cc: Luigi Rizzo <lrizzo@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit e585f2363637 ("udp: Changes to udp_offload to support remote
checksum offload") added new GSO type and a corresponding netdev
feature, but missed Ethtool's 'netdev_features_strings' table.
Give it a name so it will be exposed to userspace and become available
for manual configuration.
v3:
- decouple from "netdev_features_strings[] cleanup" series;
- no functional changes.
v2:
- don't split the "Fixes:" tag across lines;
- no functional changes.
Fixes: e585f2363637 ("udp: Changes to udp_offload to support remote checksum offload")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds specific checks for primary(0x0) and secondary(0x1) when
setting the port role. For any other value the function
'br_mrp_set_port_role' will return -EINVAL.
Fixes: 20f6a05ef63594 ("bridge: mrp: Rework the MRP netlink interface")
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In RFC 8684, we don't need to send sndr_key in SYN package anymore, so drop
it.
Fixes: cc7972ea1932 ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
arg cannot be NULL since its already being dereferenced
before. Remove the redundant NULL check.
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix check in ethtool_rx_flow_rule_create
Fixes: eca4205f9ec3 ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When an interface is being deleted, "/proc/net/dev_snmp6/<interface name>"
is deleted.
The function for this is addrconf_ifdown() in the addrconf_notify() and
it is called by notification, which is NETDEV_UNREGISTER.
But, if NETDEV_CHANGEMTU is triggered after NETDEV_UNREGISTER,
this proc file will be created again.
This recreated proc file will be deleted by netdev_wati_allrefs().
Before netdev_wait_allrefs() is called, creating a new HSR interface
routine can be executed and It tries to create a proc file but it will
find an un-deleted proc file.
At this point, it warns about it.
To avoid this situation, it can use ->dellink() instead of
->ndo_uninit() to release resources because ->dellink() is called
before NETDEV_UNREGISTER.
So, a proc file will not be recreated.
Test commands
ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link set dummy0 mtu 1300
#SHELL1
while :
do
ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1
done
#SHELL2
while :
do
ip link del hsr0
done
Splat looks like:
[ 9888.980852][ T2752] proc_dir_entry 'dev_snmp6/hsr0' already registered
[ 9888.981797][ C2] WARNING: CPU: 2 PID: 2752 at fs/proc/generic.c:372 proc_register+0x2d5/0x430
[ 9888.981798][ C2] Modules linked in: hsr dummy veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6x
[ 9888.981814][ C2] CPU: 2 PID: 2752 Comm: ip Tainted: G W 5.8.0-rc1+ #616
[ 9888.981815][ C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 9888.981816][ C2] RIP: 0010:proc_register+0x2d5/0x430
[ 9888.981818][ C2] Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 65 01 00 00 49 8b b5 e0 00 00 00 48 89 ea 40
[ 9888.981819][ C2] RSP: 0018:ffff8880628dedf0 EFLAGS: 00010286
[ 9888.981821][ C2] RAX: dffffc0000000008 RBX: ffff888028c69170 RCX: ffffffffaae09a62
[ 9888.981822][ C2] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f75ac
[ 9888.981823][ C2] RBP: ffff888028c693f4 R08: ffffed100d9401bd R09: ffffed100d9401bd
[ 9888.981824][ C2] R10: ffffffffaddf406f R11: 0000000000000001 R12: ffff888028c69308
[ 9888.981825][ C2] R13: ffff8880663584c8 R14: dffffc0000000000 R15: ffffed100518d27e
[ 9888.981827][ C2] FS: 00007f3876b3b0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[ 9888.981828][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9888.981829][ C2] CR2: 00007f387601a8c0 CR3: 000000004101a002 CR4: 00000000000606e0
[ 9888.981830][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9888.981831][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9888.981832][ C2] Call Trace:
[ 9888.981833][ C2] ? snmp6_seq_show+0x180/0x180
[ 9888.981834][ C2] proc_create_single_data+0x7c/0xa0
[ 9888.981835][ C2] snmp6_register_dev+0xb0/0x130
[ 9888.981836][ C2] ipv6_add_dev+0x4b7/0xf60
[ 9888.981837][ C2] addrconf_notify+0x684/0x1ca0
[ 9888.981838][ C2] ? __mutex_unlock_slowpath+0xd0/0x670
[ 9888.981839][ C2] ? kasan_unpoison_shadow+0x30/0x40
[ 9888.981840][ C2] ? wait_for_completion+0x250/0x250
[ 9888.981841][ C2] ? inet6_ifinfo_notify+0x100/0x100
[ 9888.981842][ C2] ? dropmon_net_event+0x227/0x410
[ 9888.981843][ C2] ? notifier_call_chain+0x90/0x160
[ 9888.981844][ C2] ? inet6_ifinfo_notify+0x100/0x100
[ 9888.981845][ C2] notifier_call_chain+0x90/0x160
[ 9888.981846][ C2] register_netdevice+0xbe5/0x1070
[ ... ]
Reported-by: syzbot+1d51c8b74efa4c44adeb@syzkaller.appspotmail.com
Fixes: e0a4b99773d3 ("hsr: use upper/lower device infrastructure")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A netdevice may be marked as detached because the parent is
runtime-suspended and not accessible whilst interface or link is down.
An example are PCI network devices that go into PCI D3hot, see e.g.
__igc_shutdown() or rtl8169_net_suspend().
If netdevice is down and marked as detached we can only open it if
we runtime-resume it before __dev_open() calls netif_device_present().
Therefore, if netdevice is detached, try to runtime-resume the parent
and only return with an error if it's still detached.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Board serial number is a serial number, often available in PCI
*Vital Product Data*.
Also, update devlink-info.rst documentation file.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|