Age | Commit message (Collapse) | Author |
|
kernel test robot says:
net/mptcp/syncookies.c: In function 'mptcp_join_cookie_init':
include/linux/kernel.h:47:38: warning: division by zero [-Wdiv-by-zero]
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
I forgot that spinock_t size is 0 on UP, so ARRAY_SIZE cannot be used.
Fixes: 9466a1ccebbe54 ("mptcp: enable JOIN requests even if cookies are in use")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When vxlan interface is deleted, all fdbs are deleted by vxlan_flush().
vxlan_flush() flushes fdbs but it doesn't delete fdb, which contains
all-zeros-mac because it is deleted by vxlan_uninit().
But vxlan_uninit() deletes only the fdb, which contains both all-zeros-mac
and default vni.
So, the fdb, which contains both all-zeros-mac and non-default vni
will not be deleted.
Test commands:
ip link add vxlan0 type vxlan dstport 4789 external
ip link set vxlan0 up
bridge fdb add to 00:00:00:00:00:00 dst 172.0.0.1 dev vxlan0 via lo \
src_vni 10000 self permanent
ip link del vxlan0
kmemleak reports as follows:
unreferenced object 0xffff9486b25ced88 (size 96):
comm "bridge", pid 2151, jiffies 4294701712 (age 35506.901s)
hex dump (first 32 bytes):
02 00 00 00 ac 00 00 01 40 00 09 b1 86 94 ff ff ........@.......
46 02 00 00 00 00 00 00 a7 03 00 00 12 b5 6a 6b F.............jk
backtrace:
[<00000000c10cf651>] vxlan_fdb_append.part.51+0x3c/0xf0 [vxlan]
[<000000006b31a8d9>] vxlan_fdb_create+0x184/0x1a0 [vxlan]
[<0000000049399045>] vxlan_fdb_update+0x12f/0x220 [vxlan]
[<0000000090b1ef00>] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
[<0000000056633c2c>] rtnl_fdb_add+0x187/0x270
[<00000000dd5dfb6b>] rtnetlink_rcv_msg+0x264/0x490
[<00000000fc44dd54>] netlink_rcv_skb+0x4a/0x110
[<00000000dff433e7>] netlink_unicast+0x18e/0x250
[<00000000b87fb421>] netlink_sendmsg+0x2e9/0x400
[<000000002ed55153>] ____sys_sendmsg+0x237/0x260
[<00000000faa51c66>] ___sys_sendmsg+0x88/0xd0
[<000000006c3982f1>] __sys_sendmsg+0x4e/0x80
[<00000000a8f875d2>] do_syscall_64+0x56/0xe0
[<000000003610eefa>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
unreferenced object 0xffff9486b1c40080 (size 128):
comm "bridge", pid 2157, jiffies 4294701754 (age 35506.866s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 f8 dc 42 b2 86 94 ff ff ..........B.....
6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
backtrace:
[<00000000a2981b60>] vxlan_fdb_create+0x67/0x1a0 [vxlan]
[<0000000049399045>] vxlan_fdb_update+0x12f/0x220 [vxlan]
[<0000000090b1ef00>] vxlan_fdb_add+0x12a/0x1b0 [vxlan]
[<0000000056633c2c>] rtnl_fdb_add+0x187/0x270
[<00000000dd5dfb6b>] rtnetlink_rcv_msg+0x264/0x490
[<00000000fc44dd54>] netlink_rcv_skb+0x4a/0x110
[<00000000dff433e7>] netlink_unicast+0x18e/0x250
[<00000000b87fb421>] netlink_sendmsg+0x2e9/0x400
[<000000002ed55153>] ____sys_sendmsg+0x237/0x260
[<00000000faa51c66>] ___sys_sendmsg+0x88/0xd0
[<000000006c3982f1>] __sys_sendmsg+0x4e/0x80
[<00000000a8f875d2>] do_syscall_64+0x56/0xe0
[<000000003610eefa>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It turns out that on commit 41d707b7332f ("fib: fix fib_rules_ops
indirect calls wrappers") I forgot to include the case when
CONFIG_IP_MULTIPLE_TABLES is not set.
Fixes: 41d707b7332f ("fib: fix fib_rules_ops indirect calls wrappers")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Brian Vazquez <brianvv@google.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fixes these errors:
net/ipv4/syncookies.c: In function 'tcp_get_cookie_sock':
net/ipv4/syncookies.c:216:19: error: 'struct tcp_request_sock' has no
member named 'drop_req'
216 | if (tcp_rsk(req)->drop_req) {
| ^~
net/ipv4/syncookies.c: In function 'cookie_tcp_reqsk_alloc':
net/ipv4/syncookies.c:289:27: warning: unused variable 'treq'
[-Wunused-variable]
289 | struct tcp_request_sock *treq;
| ^~~~
make[3]: *** [scripts/Makefile.build:280: net/ipv4/syncookies.o] Error 1
make[3]: *** Waiting for unfinished jobs....
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fix from Linus Walleij:
"A single last minute pin control fix to the Qualcomm driver fixing
missing dual edge PCH interrupts"
* tag 'pinctrl-v5.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: qcom: Handle broken/missing PDC dual edge IRQs on sc7180
|
|
This is a collection of minor fixes including typos, white space, and
style. No functional changes.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
|
|
The profile ID map lock should be held till the caller completes
all references of that profile entries.
The current code releases the lock right after the match search.
This caused a driver issue when the profile map entries were
referenced after it was freed in other thread after the lock was
released earlier.
Signed-off-by: Victor Raj <victor.raj@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Update the PTYPE lookup table to reflect values that can be set by the
hardware.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
|
|
Disable VLAN pruning when entering promiscuous mode, and re-enable it
when exiting.
Without this VLAN-over-bridge topologies created on the device won't be
functional unless rx-vlan-filter is explicitly disabled with ethtool.
Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In the ice_init_hw_tbls, if the devm_kcalloc for es->written fails, catch
that error and bail out gracefully, instead of continuing with a NULL
pointer.
Fixes: 32d63fa1e9f3 ("ice: Initialize DDP package structures")
Signed-off-by: Surabhi Boob <surabhi.boob@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This is a port of commit 248de22e638f ("i40e/i40evf: Account for frags
split over multiple descriptors in check linearize")
As part of testing workloads (read/write) using larger IO size (128K)
tx_timeout is observed and whenever it happens, it was due to
tx_linearize.
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently VFs are only allowed to get 16, 4, and 1 queue pair by
default, which require 17, 5, and 2 MSI-X vectors respectively. This
is because each VF needs a MSI-X per data queue and a MSI-X for its
other interrupt. The calculation is based on the number of VFs created,
MSI-X available, and queue pairs available at the time of VF creation.
Unfortunately the values above exclude 2 queue pairs when only 3 MSI-X
are available to each VF based on resource constraints. The current
calculation would default to 2 MSI-X and 1 queue pair. This is a waste
of resources, so fix this by allowing 2 queue pairs per VF when there
are between 2 and 5 MSI-X available per VF.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This fix has been added to address memory leak issues resulting from
triggering a sudden driver reset which does not allow us to follow our
normal removal flows for SW XLT entries for advanced features.
- Adding call to destroy flow profile locks when clearing SW XLT tables.
- Extraction sequence entries were not correctly cleared previously
which could cause ownership conflicts for repeated reset-replay calls.
Fixes: 31ad4e4ee1e4 ("ice: Allocate flow profile")
Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Display and count some useful hot-path statistics. The usefulness is as
follows:
- tx_restart: use to determine if the transmit ring size is too small or
if the transmit interrupt rate is too low.
- rx_gro_dropped: use to count drops from GRO layer, which previously were
completely uncounted when occurring.
- tx_busy: use to determine when the driver is miscounting number of
descriptors needed for an skb.
- tx_timeout: as our other drivers, count the number of times we've reset
due to timeout because the kernel only prints a warning once per netdev.
Several of these were already counted but not displayed.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The page reuse statistic wasn't even being displayed to the user, even
though the driver counted it. Don't waste the struct space and hot-path
cycles since the driver doesn't display it.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Replacing flow profile locks with RSS profile locks in the function to
remove all RSS rules for a given VSI. This is to align the locks used
for RSS rule addition to VSI and removal during VSI teardown to avoid
a race condition owing to several iterations of the above operations.
In function to get RSS rules for given VSI and protocol header replacing
the pointer reference of the RSS entry with a copy of hash value to
ensure thread safety.
Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
set_rss_lut can fail due to incorrect vsi_id mask. vsi_id is 10 bit
but mask was 0x1FF whereas it should be 0x3FF.
For vsi_num >= 512, FW set_rss_lut can fail with return code
EACCESS (VSI ownership issue) because software was providing
incorrect vsi_num (dropping 10th bit due to incorrect mask) for
set_rss_lut admin command
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The grst_delay variable in ice_check_reset contains the maximum time
(in 100 msec units) that the driver will wait for a reset event to
transition to the Device Active state. The value is the sum of three
separate components:
1) The maximum time it may take for the firmware to process its
outstanding command before handling the reset request.
2) The value in RSTCTL.GRSTDEL (the delay firmware inserts between first
seeing the driver reset request and the actual hardware assertion).
3) The maximum expected reset processing time in hardware.
Referring to this total time as "grst_delay" is misleading and
potentially confusing to someone checking the code and cross-referencing
the hardware specification.
Fix this by renaming the variable to "grst_timeout", which is more
descriptive of its actual use.
Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In certain configurations without power management support, the
following warnings happen:
drivers/net/ethernet/intel/ice/ice_main.c:4214:12: warning:
'ice_resume' defined but not used [-Wunused-function]
4214 | static int ice_resume(struct device *dev)
| ^~~~~~~~~~
drivers/net/ethernet/intel/ice/ice_main.c:4150:12: warning:
'ice_suspend' defined but not used [-Wunused-function]
4150 | static int ice_suspend(struct device *dev)
| ^~~~~~~~~~~
Mark these functions as __maybe_unused to make it clear to the
compiler that this is going to happen based on the configuration,
which is the standard for these types of functions.
Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
We have a number of changes
* code cleanups and fixups as usual
* AQL & internal TXQ improvements from Felix
* some mesh 802.1X support bits
* some injection improvements from Mathy of KRACK
fame, so we'll see what this results in ;-)
* some more initial S1G supports bits, this time
(some of?) the userspace APIs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
netdev protodown is a mechanism that allows protocols to
hold an interface down. It was initially introduced in
the kernel to hold links down by a multihoming protocol.
There was also an attempt to introduce protodown
reason at the time but was rejected. protodown and protodown reason
is supported by almost every switching and routing platform.
It was ok for a while to live without a protodown reason.
But, its become more critical now given more than
one protocol may need to keep a link down on a system
at the same time. eg: vrrp peer node, port security,
multihoming protocol. Its common for Network operators and
protocol developers to look for such a reason on a networking
box (Its also known as errDisable by most networking operators)
This patch adds support for link protodown reason
attribute. There are two ways to maintain protodown
reasons.
(a) enumerate every possible reason code in kernel
- A protocol developer has to make a request and
have that appear in a certain kernel version
(b) provide the bits in the kernel, and allow user-space
(sysadmin or NOS distributions) to manage the bit-to-reasonname
map.
- This makes extending reason codes easier (kind of like
the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/)
This patch takes approach (b).
a few things about the patch:
- It treats the protodown reason bits as counter to indicate
active protodown users
- Since protodown attribute is already an exposed UAPI,
the reason is not enforced on a protodown set. Its a no-op
if not used.
the patch follows the below algorithm:
- presence of reason bits set indicates protodown
is in use
- user can set protodown and protodown reason in a
single or multiple setlink operations
- setlink operation to clear protodown, will return -EBUSY
if there are active protodown reason bits
- reason is not included in link dumps if not used
example with patched iproute2:
$cat /etc/iproute2/protodown_reasons.d/r.conf
0 mlag
1 evpn
2 vrrp
3 psecurity
$ip link set dev vxlan0 protodown on protodown_reason vrrp on
$ip link set dev vxlan0 protodown_reason mlag on
$ip link show
14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp>
$ip link set dev vxlan0 protodown_reason mlag off
$ip link set dev vxlan0 protodown off protodown_reason vrrp off
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2020-07-31
The following pull-request contains BPF updates for your *net* tree.
We've added 5 non-merge commits during the last 21 day(s) which contain
a total of 5 files changed, 126 insertions(+), 18 deletions(-).
The main changes are:
1) Fix a map element leak in HASH_OF_MAPS map type, from Andrii Nakryiko.
2) Fix a NULL pointer dereference in __btf_resolve_helper_id() when no
btf_vmlinux is available, from Peilin Ye.
3) Init pos variable in __bpfilter_process_sockopt(), from Christoph Hellwig.
4) Fix a cgroup sockopt verifier test by specifying expected attach type,
from Jean-Philippe Brucker.
Note that when net gets merged into net-next later on, there is a small
merge conflict in kernel/bpf/btf.c between commit 5b801dfb7feb ("bpf: Fix
NULL pointer dereference in __btf_resolve_helper_id()") from the bpf tree
and commit 138b9a0511c7 ("bpf: Remove btf_id helpers resolving") from the
net-next tree.
Resolve as follows: remove the old hunk with the __btf_resolve_helper_id()
function. Change the btf_resolve_helper_id() so it actually tests for a
NULL btf_vmlinux and bails out:
int btf_resolve_helper_id(struct bpf_verifier_log *log,
const struct bpf_func_proto *fn, int arg)
{
int id;
if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID || !btf_vmlinux)
return -EINVAL;
id = fn->btf_id[arg];
if (!id || id > btf_vmlinux->nr_types)
return -EINVAL;
return id;
}
Let me know if you run into any others issues (CC'ing Jiri Olsa so he's in
the loop with regards to merge conflict resolution).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We expecte prog_p to be protected by rcu, so adding the rcu annotation
to fix the following sparse warning:
drivers/net/tun.c:3003:36: warning: incorrect type in argument 2 (different address spaces)
drivers/net/tun.c:3003:36: expected struct tun_prog [noderef] __rcu **prog_p
drivers/net/tun.c:3003:36: got struct tun_prog **prog_p
drivers/net/tun.c:3292:42: warning: incorrect type in argument 2 (different address spaces)
drivers/net/tun.c:3292:42: expected struct tun_prog **prog_p
drivers/net/tun.c:3292:42: got struct tun_prog [noderef] __rcu **
drivers/net/tun.c:3296:42: warning: incorrect type in argument 2 (different address spaces)
drivers/net/tun.c:3296:42: expected struct tun_prog **prog_p
drivers/net/tun.c:3296:42: got struct tun_prog [noderef] __rcu **
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2020-07-31
1) Fix policy matching with mark and mask on userspace interfaces.
From Xin Long.
2) Several fixes for the new ESP in TCP encapsulation.
From Sabrina Dubroca.
3) Fix crash when the hold queue is used. The assumption that
xdst->path and dst->child are not a NULL pointer only if dst->xfrm
is not a NULL pointer is true with the exception of using the
hold queue. Fix this by checking for hold queue usage before
dereferencing xdst->path or dst->child.
4) Validate pfkey_dump parameter before sending them.
From Mark Salyzyn.
5) Fix the location of the transport header with ESP in UDPv6
encapsulation. From Sabrina Dubroca.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2020-07-30
This small patchset introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.18:
('net/mlx5e: fix bpf_prog reference count leaks in mlx5e_alloc_rq')
For -stable v5.7:
('net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS that reports
the earliest departure time(EDT) of the timestamped skb. By tracking EDT
values of the skb from different timestamps, we can observe when and how
much the value changed. This allows to measure the precise delay
injected on the sender host e.g. by a bpf-base throttler.
Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Tony Nguyen says:
====================
1GbE Intel Wired LAN Driver Updates 2020-07-30
This series contains updates to e100, e1000, e1000e, igb, igbvf, ixgbe,
ixgbevf, iavf, and driver documentation.
Vaibhav Gupta converts legacy .suspend() and .resume() to generic PM
callbacks for e100, igbvf, ixgbe, ixgbevf, and iavf.
Suraj Upadhyay replaces 1 byte memsets with assignments for e1000,
e1000e, igb, and ixgbe.
Alexander Klimov replaces http links with https.
Miaohe Lin replaces uses of memset to clear MAC addresses with
eth_zero_addr().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Florian Westphal says:
====================
mptcp: add syncookie support
Changes in v2:
- first patch renames req->ts_cookie to req->syncookie instead of
removing ts_cookie member.
- patch to add 'want_cookie' arg to init_req() functions has been dropped.
All users of that arg were changed to check 'req->syncookie' instead.
v1 cover letter:
When syn-cookies are used the SYN?ACK never contains a MPTCP option,
because the code path that creates a request socket based on a valid
cookie ACK lacks the needed changes to construct MPTCP request sockets.
After this series, if SYN carries MP_CAPABLE option, the option is not
cleared anymore and request socket will be reconstructed using the
MP_CAPABLE option data that is re-sent with the ACK.
This means that no additional state gets encoded into the syn cookie or
the TCP timestamp.
There are two caveats for SYN-Cookies with MPTCP:
1. When syn-cookies are used, the server-generated key is not stored.
The drawback is that the next connection request that comes in before
the cookie-ACK has a small chance that it will generate the same local_key.
If this happens, the cookie ACK that comes in second will (re)compute the
token hash and then detects that this is already in use.
Unlike normal case, where the server will pick a new key value and then
re-tries, we can't do that because we already committed to the key value
(it was sent to peer already).
Im this case, MPTCP cannot be used and late TCP fallback happens.
2). SYN packets with a MP_JOIN requests cannot be handled without storing
state. This is because the SYN contains a nonce value that is needed to
verify the HMAC of the MP_JOIN ACK that completes the three-way
handshake. Also, a local nonce is generated and used in the cookie
SYN/ACK.
There are only 2 ways to solve this:
a) Do not support JOINs when cookies are in effect.
b) Store the nonces somewhere.
The approach chosen here is b).
Patch 8 adds a fixed-size (1024 entries) state table to store the
information required to validate the MP_JOIN ACK and re-build the
request socket.
State gets stored when syn-cookies are active and the token in the JOIN
request referred to an established MPTCP connection that can also accept
a new subflow.
State is restored if the ACK cookie is valid, an MP_JOIN option is present
and the state slot contains valid data from a previous SYN.
After the request socket has been re-build, normal HMAC check is done just
as without syn cookies.
Largely identical to last RFC, except patch #8 which follows Paolos
suggestion to use a private table storage area rather than keeping
request sockets around. This also means I dropped the patch to remove
const qualifier from sk_listener pointers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Also add test cases with MP_JOIN when tcp_syncookies sysctl is 2 (i.e.,
syncookies are always-on).
While at it, also print the test number and add the test number
to the pcap files that can be generated optionally.
This makes it easier to match the pcap to the test case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
check we can establish connections also when syn cookies are in use.
Check that
MPTcpExtMPCapableSYNRX and MPTcpExtMPCapableACKRX increase for each
MPTCP test.
Check TcpExtSyncookiesSent and TcpExtSyncookiesRecv increase in netns2.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
JOIN requests do not work in syncookie mode -- for HMAC validation, the
peers nonce and the mptcp token (to obtain the desired connection socket
the join is for) are required, but this information is only present in the
initial syn.
So either we need to drop all JOIN requests once a listening socket enters
syncookie mode, or we need to store enough state to reconstruct the request
socket later.
This adds a state table (1024 entries) to store the data present in the
MP_JOIN syn request and the random nonce used for the cookie syn/ack.
When a MP_JOIN ACK passed cookie validation, the table is consulted
to rebuild the request socket from it.
An alternate approach would be to "cancel" syn-cookie mode and force
MP_JOIN to always use a syn queue entry.
However, doing so brings the backlog over the configured queue limit.
v2: use req->syncookie, not (removed) want_cookie arg
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If SYN packet contains MP_CAPABLE option, keep it enabled.
Syncokie validation and cookie-based socket creation is changed to
instantiate an mptcp request sockets if the ACK contains an MPTCP
connection request.
Rather than extend both cookie_v4/6_check, add a common helper to create
the (mp)tcp request socket.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Will be used to initialize the mptcp request socket when a MP_CAPABLE
request was handled in syncookie mode, i.e. when a TCP ACK containing a
MP_CAPABLE option is a valid syncookie value.
Normally (non-cookie case), MPTCP will generate a unique 32 bit connection
ID and stores it in the MPTCP token storage to be able to retrieve the
mptcp socket for subflow joining.
In syncookie case, we do not want to store any state, so just generate the
unique ID and use it in the reply.
This means there is a small window where another connection could generate
the same token.
When Cookie ACK comes back, we check that the token has not been registered
in the mean time. If it was, the connection needs to fall back to TCP.
Changes in v2:
- use req->syncookie instead of passing 'want_cookie' arg to ->init_req()
(Eric Dumazet)
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syncookie code path needs to create an mptcp request sock.
Prepare for this and add mptcp prefix plus needed export of ops struct.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When syncookie support is added, we will need to add a variant of
subflow_init_req() helper. It will do almost same thing except
that it will not compute/add a token to the mptcp token tree.
To avoid excess copy&paste, this commit splits away part of the
code into a new helper, __subflow_init_req, that can then be re-used
from the 'no insert' function added in a followup change.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Once syncookie support is added, no state will be stored anymore when the
syn/ack is generated in syncookie mode.
When the ACK comes back, the generated key will be taken from the TCP ACK,
the token is re-generated and inserted into the token tree.
This means we can't retry with a new key when the token is already taken
in the syncookie case.
Therefore, move the retry logic to the caller to prepare for syncookie
support in mptcp.
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Nowadays output function has a 'synack_type' argument that tells us when
the syn/ack is emitted via syncookies.
The request already tells us when timestamps are supported, so check
both to detect special timestamp for tcp option encoding is needed.
We could remove cookie_ts altogether, but a followup patch would
otherwise need to adjust function signatures to pass 'want_cookie' to
mptcp core.
This way, the 'existing' bit can be used.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
rds_notify_queue_get() is potentially copying uninitialized kernel stack
memory to userspace since the compiler may leave a 4-byte hole at the end
of `cmsg`.
In 2016 we tried to fix this issue by doing `= { 0 };` on `cmsg`, which
unfortunately does not always initialize that 4-byte hole. Fix it by using
memset() instead.
Cc: stable@vger.kernel.org
Fixes: f037590fff30 ("rds: fix a leak of kernel memory")
Fixes: bdbe6fbc6a2f ("RDS: recv.c")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2020-07-30
This series contains updates to the e1000e and igb drivers.
Aaron Ma allows PHY initialization to continue if ULP disable failed for
e1000e.
Francesco Ruggeri fixes race conditions in igb reset that could cause panics.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make use of the struct_size() helper, in multiple places, instead
of an open-coded version in order to avoid any potential type
mistakes and protect against potential integer overflows.
Also, remove unnecessary object identifier size.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert to %pM instead of using custom code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert to %pM instead of using custom code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Convert to %pM instead of using custom code.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Julian Wiedmann says:
====================
s390/qeth: updates 2020-07-30
please apply the following patch series for qeth to netdev's net-next tree.
This primarily brings some modernization to the RX path, laying the
groundwork for smarter RX refill policies.
Some of the patches are tagged as fixes, but really target only rare /
theoretical issues. So given where we are in the release cycle and that we
touch the main RX path, taking them through net-next seems more appropriate.
====================
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The (misplaced) comment doesn't make any sense, enforcing an
uninitialized RX buffer won't help with IRQ reduction.
So make the best use of all available RX buffers.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Discard events that don't contain any entries. This shouldn't happen,
but subsequent code relies on being able to use entry 0. So better
be safe than accessing garbage.
Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Running a RX refill outside of NAPI context is inherently racy, even
though the worker is only started for an entirely idle RX ring.
>From the moment that the worker has replenished parts of the RX ring,
the HW can use those RX buffers, raise an IRQ and cause our NAPI code to
run concurrently to the RX refill worker.
Instead let the worker schedule our NAPI instance, and refill the RX
ring from there. Keeping accurate count of how many buffers still need
to be refilled also removes some quirky arithmetic from the low-level
code.
Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When preparing a buffer for RX refill, tolerate that it already has a
pool_entry attached. Otherwise we could easily leak such a pool_entry
when re-driving the RX refill after an error (from eg. do_qdio()).
This needs some minor adjustment in the code that drains RX buffer(s)
prior to RX refill and during teardown, so that ->pool_entry is NULLed
accordingly.
Fixes: 4a71df50047f ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we don't care about vlan depth, we could pass NULL instead of the
address of a unused local variable to skb_network_protocol() as a param.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Exchange the positions of the err_tbl_init and err_register labels in
ct_init_module function.
Fixes: c34b961a2492 ("net/sched: act_ct: Create nf flow table per zone")
Signed-off-by: liujian <liujian56@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|