Age | Commit message (Collapse) | Author |
|
Commit 36a50a989ee8 ("tipc: fix infinite loop when dumping link monitor
summary") intended to fix a problem with user tool looping when max
number of bearers are enabled.
Unfortunately, the wrong version of the commit was posted, so the
problem was not solved at all.
This commit adds the missing part.
Fixes: 36a50a989ee8 ("tipc: fix infinite loop when dumping link monitor summary")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A vti6 interface can carry IPv4 as well, so it makes no sense to
enforce a minimum MTU of IPV6_MIN_MTU.
If the user sets an MTU below IPV6_MIN_MTU, IPv6 will be
disabled on the interface, courtesy of addrconf_notify().
Reported-by: Xin Long <lucien.xin@gmail.com>
Fixes: b96f9afee4eb ("ipv4/6: use core net MTU range checking")
Fixes: c6741fbed6dc ("vti6: Properly adjust vti6 MTU from MTU of lower device")
Fixes: 53c81e95df17 ("ip6_vti: adjust vti mtu according to mtu of lower device")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-04-27
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add extensive BPF helper description into include/uapi/linux/bpf.h
and a new script bpf_helpers_doc.py which allows for generating a
man page out of it. Thus, every helper in BPF now comes with proper
function signature, detailed description and return code explanation,
from Quentin.
2) Migrate the BPF collect metadata tunnel tests from BPF samples over
to the BPF selftests and further extend them with v6 vxlan, geneve
and ipip tests, simplify the ipip tests, improve documentation and
convert to bpf_ntoh*() / bpf_hton*() api, from William.
3) Currently, helpers that expect ARG_PTR_TO_MAP_{KEY,VALUE} can only
access stack and packet memory. Extend this to allow such helpers
to also use map values, which enabled use cases where value from
a first lookup can be directly used as a key for a second lookup,
from Paul.
4) Add a new helper bpf_skb_get_xfrm_state() for tc BPF programs in
order to retrieve XFRM state information containing SPI, peer
address and reqid values, from Eyal.
5) Various optimizations in nfp driver's BPF JIT in order to turn ADD
and SUB instructions with negative immediate into the opposite
operation with a positive immediate such that nfp can better fit
small immediates into instructions. Savings in instruction count
up to 4% have been observed, from Jakub.
6) Add the BPF prog's gpl_compatible flag to struct bpf_prog_info
and add support for dumping this through bpftool, from Jiri.
7) Move the BPF sockmap samples over into BPF selftests instead since
sockmap was rather a series of tests than sample anyway and this way
this can be run from automated bots, from John.
8) Follow-up fix for bpf_adjust_tail() helper in order to make it work
with generic XDP, from Nikita.
9) Some follow-up cleanups to BTF, namely, removing unused defines from
BTF uapi header and renaming 'name' struct btf_* members into name_off
to make it more clear they are offsets into string section, from Martin.
10) Remove test_sock_addr from TEST_GEN_PROGS in BPF selftests since
not run directly but invoked from test_sock_addr.sh, from Yonghong.
11) Remove redundant ret assignment in sample BPF loader, from Wang.
12) Add couple of missing files to BPF selftest's gitignore, from Anders.
There are two trivial merge conflicts while pulling:
1) Remove samples/sockmap/Makefile since all sockmap tests have been
moved to selftests.
2) Add both hunks from tools/testing/selftests/bpf/.gitignore to the
file since git should ignore all of them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After processing the transaction log, the remaining entries of the log
need to be released.
However, in some cases no entries remain, e.g. because the transaction
did not remove anything.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
ebtables uses find_match() rather than find_request_match in one case
(see bcf4934288402be3464110109a4dae3bd6fb3e93,
"netfilter: ebtables: Fix extension lookup with identical name"), so
extend the check on name length to those functions too.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Dominique Martinet reported a TCP hang problem when simultaneous open was used.
The problem is that the tcp_conntracks state table is not smart enough
to handle the case. The state table could be fixed by introducing a new state,
but that would require more lines of code compared to this patch, due to the
required backward compatibility with ctnetlink.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Reported-by: Dominique Martinet <asmadeus@codewreck.org>
Tested-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Similarly, tbl->entries is not initialized after kmalloc(),
therefore causes an uninit-value warning in ip_vs_lblc_check_expire(),
as reported by syzbot.
Reported-by: <syzbot+3e9695f147fb529aa9bc@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
tbl->entries is not initialized after kmalloc(), therefore
causes an uninit-value warning in ip_vs_lblc_check_expire()
as reported by syzbot.
Reported-by: <syzbot+3dfdea57819073a04f21@syzkaller.appspotmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next
Simon Horman says:
====================
IPVS Updates for v4.18
please consider these IPVS enhancements for v4.18.
* Whitepace cleanup
* Add Maglev hashing algorithm as a IPVS scheduler
Inju Song says "Implements the Google's Maglev hashing algorithm as a
IPVS scheduler. Basically it provides consistent hashing but offers some
special features about disruption and load balancing.
1) minimal disruption: when the set of destinations changes,
a connection will likely be sent to the same destination
as it was before.
2) load balancing: each destination will receive an almost
equal number of connections.
Seel also: [3.4 Consistent Hasing] in
https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-eisenbud.pdf
"
* Fix to correct implementation of Knuth's multiplicative hashing
which is used in sh/dh/lblc/lblcr algorithms. Instead the
implementation provided by the hash_32() macro is used.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
before:
text data bss dec hex filename
5056 844 0 5900 170c net/netfilter/nft_exthdr.ko
102456 2316 401 105173 19ad5 net/netfilter/nf_tables.ko
after:
106410 2392 401 109203 1aa93 net/netfilter/nf_tables.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
before:
text data bss dec hex filename
2657 844 0 3501 dad net/netfilter/nft_rt.ko
100826 2240 401 103467 1942b net/netfilter/nf_tables.ko
after:
2657 844 0 3501 dad net/netfilter/nft_rt.ko
102456 2316 401 105173 19ad5 net/netfilter/nf_tables.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
size net/netfilter/nft_meta.ko
text data bss dec hex filename
5826 936 1 6763 1a6b net/netfilter/nft_meta.ko
96407 2064 400 98871 18237 net/netfilter/nf_tables.ko
after:
100826 2240 401 103467 1942b net/netfilter/nf_tables.ko
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When bpf_adjust_tail was introduced for generic xdp, it changed skb's tail
pointer, so it was pointing to the new "end of the packet". However skb's
len field wasn't properly modified, so on the wire ethernet frame had
original (or even bigger, if adjust_head was used) size. This diff is
fixing this.
Fixes: 198d83bb3 (" bpf: make generic xdp compatible w/ bpf_xdp_adjust_tail")
Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
Virtual devices such as tunnels and bonding can handle large packets.
Only segment packets when reaching a physical or loopback device.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Allow specifying segment size in the send call.
The new control message performs the same function as socket option
UDP_SEGMENT while avoiding the extra system call.
[ Export udp_cmsg_send for ipv6. -DaveM ]
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When sending large datagrams that are later segmented, store data in
page frags to avoid copying from linear in skb_segment.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb_segment by default transfers allocated wmem from the gso skb
to the tail of the segment list. This underreports real truesize
of the list, especially if the tail might be dropped.
Similar to tcp_gso_segment, update wmem_alloc with the aggregate
list truesize and make each segment responsible for its own
share by setting skb->destructor.
Clear gso_skb->destructor prior to calling skb_segment to skip
the default assignment to tail.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Support generic segmentation offload for udp datagrams. Callers can
concatenate and send at once the payload of multiple datagrams with
the same destination.
To set segment size, the caller sets socket option UDP_SEGMENT to the
length of each discrete payload. This value must be smaller than or
equal to the relevant MTU.
A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
per send call basis.
Total byte length may then exceed MTU. If not an exact multiple of
segment size, the last segment will be shorter.
The implementation adds a gso_size field to the udp socket, ip(v6)
cmsg cookie and inet_cork structure to be able to set the value at
setsockopt or cmsg time and to work with both lockless and corked
paths.
Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.
tcp tso
3197 MB/s 54232 msg/s 54232 calls/s
6,457,754,262 cycles
tcp gso
1765 MB/s 29939 msg/s 29939 calls/s
11,203,021,806 cycles
tcp without tso/gso *
739 MB/s 12548 msg/s 12548 calls/s
11,205,483,630 cycles
udp
876 MB/s 14873 msg/s 624666 calls/s
11,205,777,429 cycles
udp gso
2139 MB/s 36282 msg/s 36282 calls/s
11,204,374,561 cycles
[*] after reverting commit 0a6b2a1dc2a2
("tcp: switch to GSO being always on")
Measured total system cycles ('-a') for one core while pinning both
the network receive path and benchmark process to that core:
perf stat -a -C 12 -e cycles \
./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4
Note the reduction in calls/s with GSO. Bytes per syscall drops
increases from 1470 to 61818.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement generic segmentation offload support for udp datagrams. A
follow-up patch adds support to the protocol stack to generate such
packets.
UDP GSO is not UFO. UFO fragments a single large datagram. GSO splits
a large payload into a number of discrete UDP datagrams.
The implementation adds a GSO type SKB_UDP_GSO_L4 to differentiate it
from UFO (SKB_UDP_GSO).
IPPROTO_UDPLITE is excluded, as that protocol has no gso handler
registered.
[ Export __udp_gso_segment for ipv6. -DaveM ]
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
UDP segmentation offload needs access to inet_cork in the udp layer.
Pass the struct to ip(6)_make_skb instead of allocating it on the
stack in that function itself.
This patch is a noop otherwise.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ceph_con_workfn() validates con->state before calling try_read() and
then try_write(). However, try_read() temporarily releases con->mutex,
notably in process_message() and ceph_con_in_msg_alloc(), opening the
window for ceph_con_close() to sneak in, close the connection and
release con->sock. When try_write() is called on the assumption that
con->state is still valid (i.e. not STANDBY or CLOSED), a NULL sock
gets passed to the networking stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
IP: selinux_socket_sendmsg+0x5/0x20
Make sure con->state is valid at the top of try_write() and add an
explicit BUG_ON for this, similar to try_read().
Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/23706
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
In the quest to remove all stack VLA usage removed from the kernel[1],
just use XFRM_MAX_DEPTH as already done for the "class" array. In one
case, it'll do this loop up to 5, the other caller up to 6.
[1] https://lkml.org/lkml/2018/3/7/621
Co-developed-by: Andreas Christoforou <andreaschristofo@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Merging net into net-next to help the bpf folks avoid
some really ugly merge conflicts.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf 2018-04-25
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix to clear the percpu metadata_dst that could otherwise carry
stale ip_tunnel_info, from William.
2) Fix that reduces the number of passes in x64 JIT with regards to
dead code sanitation to avoid risk of prog rejection, from Gianluca.
3) Several fixes of sockmap programs, besides others, fixing a double
page_put() in error path, missing refcount hold for pinned sockmap,
adding required -target bpf for clang in sample Makefile, from John.
4) Fix to disable preemption in __BPF_PROG_RUN_ARRAY() paths, from Roman.
5) Fix tools/bpf/ Makefile with regards to a lex/yacc build error
seen on older gcc-5, from John.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The function rds_ib_setup_qp is calling rds_ib_get_client_data and
should correspondingly call rds_ib_dev_put. This call was lost in
the non-error path with the introduction of error handling done in
commit 3b12f73a5c29 ("rds: ib: add error handle")
Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The internal CLC socket should exist till the SMC-socket is released.
Function tcp_listen_worker() releases the internal CLC socket of a
listen socket, if an smc_close_active() is called. This function
is called for the final release(), but it is called for shutdown
SHUT_RDWR as well. This opens a door for protection faults, if
socket calls using the internal CLC socket are called for a
shutdown listen socket.
With the changes of
commit 3d502067599f ("net/smc: simplify wait when closing listen socket")
there is no need anymore to release the internal CLC socket in
function tcp_listen_worker((). It is sufficient to release it in
smc_release().
Fixes: 127f49705823 ("net/smc: release clcsock from tcp_listen_worker")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Reported-by: syzbot+9045fc589fcd196ef522@syzkaller.appspotmail.com
Reported-by: syzbot+28a2c86cf19c81d871fa@syzkaller.appspotmail.com
Reported-by: syzbot+9605e6cace1b5efd4a0a@syzkaller.appspotmail.com
Reported-by: syzbot+cf9012c597c8379d535c@syzkaller.appspotmail.com
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After Commit 4f0087812648 ("sctp: apply rhashtable api to send/recv
path"), there's no place using sctp_assoc_is_match, so remove it.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.
Fix this by returning 'netdev_tx_t' in this driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
[sven@narfation.org: fixed alignment]
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
Move the check on FRA_L3MDEV attribute to helper to improve the
readability of fib_nl2rule. Update the extack messages to be
clear when the configuration option is disabled versus an invalid
value has been passed.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It's currently written as:
if (!tchunk->tsn_gap_acked) { [1]
tchunk->tsn_gap_acked = 1;
...
}
if (TSN_lte(tsn, sack_ctsn)) {
if (!tchunk->tsn_gap_acked) {
/* SFR-CACC processing */
...
}
}
Which causes the SFR-CACC processing on ack reception to never process,
as tchunk->tsn_gap_acked is always true by then. Block [1] was
moved to that position by the commit marked below.
This patch fixes it by doing SFR-CACC processing earlier, before
tsn_gap_acked is set to true.
Fixes: 31b02e154940 ("sctp: Failover transmitted list on transport delete")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sctp_make_sack() make changes to the asoc and this cast is just
bypassing the const attribute. As there is no need to have the const
there, just remove it and fix the violation.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch extends NTF_EXT_LEARNED support to the neighbour system.
Example use-case: An Ethernet VPN implementation (eg in FRR routing suite)
can use this flag to add dynamic reachable external neigh entires
learned via control plane. The use of neigh NTF_EXT_LEARNED in this
patch is consistent with its use with bridge and vxlan fdb entries.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The addrconf_ifdown() evaluates keep_addr_on_down state twice. There
is no need to do it.
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ECMP (equal-cost multipath) hashes are typically computed on the packets'
5-tuple(src IP, dst IP, src port, dst port, L4 proto).
For encapsulated packets, the L4 data is not readily available and ECMP
hashing will often revert to (src IP, dst IP). This will lead to traffic
polarization on a single ECMP path, causing congestion and waste of network
capacity.
In IPv6, the 20-bit flow label field is also used as part of the ECMP hash.
In the lack of L4 data, the hashing will be on (src IP, dst IP, flow
label). Having a non-zero flow label is thus important for proper traffic
load balancing when L4 data is unavailable (i.e., when packets are
encapsulated).
Currently, the seg6_do_srh_encap() function extracts the original packet's
flow label and set it as the outer IPv6 flow label. There are two issues
with this behaviour:
a) There is no guarantee that the inner flow label is set by the source.
b) If the original packet is not IPv6, the flow label will be set to
zero (e.g., IPv4 or L2 encap).
This patch adds a function, named seg6_make_flowlabel(), that computes a
flow label from a given skb. It supports IPv6, IPv4 and L2 payloads, and
leverages the per namespace 'seg6_flowlabel" sysctl value.
The currently support behaviours are as follows:
-1 set flowlabel to zero.
0 copy flowlabel from Inner paceket in case of Inner IPv6
(Set flowlabel to 0 in case IPv4/L2)
1 Compute the flowlabel using seg6_make_flowlabel()
This patch has been tested for IPv6, IPv4, and L2 traffic.
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Acked-by: David Lebrun <dlebrun@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The percpu metadata_dst might carry the stale ip_tunnel_info
and cause incorrect behavior. When mixing tests using ipv4/ipv6
bpf vxlan and geneve tunnel, the ipv6 tunnel info incorrectly uses
ipv4's src ip addr as its ipv6 src address, because the previous
tunnel info does not clean up. The patch zeros the fields in
ip_tunnel_info.
Signed-off-by: William Tu <u9012063@gmail.com>
Reported-by: Yifeng Sun <pkusunyifeng@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
|
|
This commit introduces a helper which allows fetching xfrm state
parameters by eBPF programs attached to TC.
Prototype:
bpf_skb_get_xfrm_state(skb, index, xfrm_state, size, flags)
skb: pointer to skb
index: the index in the skb xfrm_state secpath array
xfrm_state: pointer to 'struct bpf_xfrm_state'
size: size of 'struct bpf_xfrm_state'
flags: reserved for future extensions
The helper returns 0 on success. Non zero if no xfrm state at the index
is found - or non exists at all.
struct bpf_xfrm_state currently includes the SPI, peer IPv4/IPv6
address and the reqid; it can be further extended by adding elements to
its end - indicating the populated fields by the 'size' argument -
keeping backwards compatibility.
Typical usage:
struct bpf_xfrm_state x = {};
bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0);
...
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
rt6_remove_exception_rt() is called under rcu_read_lock() only.
We lock rt6_exception_lock a bit later, so we do not hold
rt6_exception_lock yet.
Fixes: 8a14e46f1402 ("net/ipv6: Fix missing rcu dereferences on from")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: David Ahern <dsahern@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A duplicated null check on sgout is redundant as it is known to be
already true because of the identical earlier check. Remove it.
Detected by cppcheck:
net/tls/tls_sw.c:696: (warning) Identical inner 'if' condition is always
true.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Distributed filesystems are most effective when the server and client
clocks are synchronised. Embedded devices often use NFS for their
root filesystem but typically do not contain an RTC, so the clocks of
the NFS server and the embedded device will be out-of-sync when the root
filesystem is mounted (and may not be synchronised until late in the
boot process).
Extend ipconfig with the ability to export IP addresses of NTP servers
it discovers to /proc/net/ipconfig/ntp_servers. They can be supplied as
follows:
- If ipconfig is configured manually via the "ip=" or "nfsaddrs="
kernel command line parameters, one NTP server can be specified in
the new "<ntp0-ip>" parameter.
- If ipconfig is autoconfigured via DHCP, request DHCP option 42 in
the DHCPDISCOVER message, and record the IP addresses of up to three
NTP servers sent by the responding DHCP server in the subsequent
DHCPOFFER message.
ipconfig will only write the NTP server IP addresses it discovers to
/proc/net/ipconfig/ntp_servers, one per line (in the order received from
the DHCP server, if DHCP autoconfiguration is used); making use of these
NTP servers is the responsibility of a user space process (e.g. an
initrd/initram script that invokes an NTP client before mounting an NFS
root filesystem).
Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To allow ipconfig to report IP configuration details to user space
processes without cluttering /proc/net, create a new subdirectory
/proc/net/ipconfig. All files containing IP configuration details should
be written to this directory.
Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ic_nameservers, which stores the list of name servers discovered by
ipconfig, is initialised (i.e. has all of its elements set to NONE, or
0xffffffff) by ic_nameservers_predef() in the following scenarios:
- before the "ip=" and "nfsaddrs=" kernel command line parameters are
parsed (in ip_auto_config_setup());
- before autoconfiguring via DHCP or BOOTP (in ic_bootp_init()), in
order to clear any values that may have been set after parsing "ip="
or "nfsaddrs=" and are no longer needed.
This means that ic_nameservers_predef() is not called when neither "ip="
nor "nfsaddrs=" is specified on the kernel command line. In this
scenario, every element in ic_nameservers remains set to 0x00000000,
which is indistinguishable from ANY and causes pnp_seq_show() to write
the following (bogus) information to /proc/net/pnp:
#MANUAL
nameserver 0.0.0.0
nameserver 0.0.0.0
nameserver 0.0.0.0
This is potentially problematic for systems that blindly link
/etc/resolv.conf to /proc/net/pnp.
Ensure that ic_nameservers is also initialised when neither "ip=" nor
"nfsaddrs=" are specified by calling ic_nameservers_predef() in
ip_auto_config(), but only when ip_auto_config_setup() was not called
earlier. This causes the following to be written to /proc/net/pnp, and
is consistent with what gets written when ipconfig is configured
manually but no name servers are specified on the kernel command line:
#MANUAL
Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When ipconfig is autoconfigured via BOOTP, the request packet
initialised by ic_bootp_init_ext() always allocates 8 bytes for the name
server option, limiting the BOOTP server to responding with at most 2
name servers even though ipconfig in fact supports an arbitrary number
of name servers (as defined by CONF_NAMESERVERS_MAX, which is currently
3).
Only request name servers in the request packet if CONF_NAMESERVERS_MAX
is positive (to comply with [1, §3.8]), and allocate enough space in the
packet for CONF_NAMESERVERS_MAX name servers to indicate the maximum
number we can accept in response.
[1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions":
https://tools.ietf.org/rfc/rfc2132.txt
Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When ipconfig is autoconfigured via BOOTP, the request packet
initialised by ic_bootp_init_ext() allocates 8 bytes for tag 5 ("Name
Server" [1, §3.7]), but tag 5 in the response isn't processed by
ic_do_bootp_ext(). Instead, allocate the 8 bytes to tag 6 ("Domain Name
Server" [1, §3.8]), which is processed by ic_do_bootp_ext(), and appears
to have been the intended tag to request.
This won't cause any breakage for existing users, as tag 5 responses
provided by BOOTP servers weren't being processed anyway.
[1] RFC 2132, "DHCP Options and BOOTP Vendor Extensions":
https://tools.ietf.org/rfc/rfc2132.txt
Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 5e953778a2aab04929a5e7b69f53dc26e39b079e ("ipconfig: add
nameserver IPs to kernel-parameter ip=") adds the IP addresses of
discovered name servers to the summary printed by ipconfig when
configuration is complete. It appears the intention in ip_auto_config()
was to print the name servers on a new line (especially given the
spacing and lack of comma before "nameserver0="), but they're actually
printed on the same line as the NFS root filesystem configuration
summary:
[ 0.686186] IP-Config: Complete:
[ 0.686226] device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1
[ 0.686328] host=test, domain=example.com, nis-domain=(none)
[ 0.686386] bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath= nameserver0=10.0.0.1
This makes it harder to read and parse ipconfig's output. Instead, print
the name servers on a separate line:
[ 0.791250] IP-Config: Complete:
[ 0.791289] device=eth0, hwaddr=xx:xx:xx:xx:xx:xx, ipaddr=10.0.0.2, mask=255.255.255.0, gw=10.0.0.1
[ 0.791407] host=test, domain=example.com, nis-domain=(none)
[ 0.791475] bootserver=10.0.0.1, rootserver=10.0.0.1, rootpath=
[ 0.791476] nameserver0=10.0.0.1
Signed-off-by: Chris Novakovic <chris@chrisn.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
RETPOLINE made calls to tp->af_specific->md5_lookup() quite expensive,
given they have no result.
We can omit the calls for sockets that have no md5 keys.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Updates to the bitfields in struct packet_sock are not atomic.
Serialize these read-modify-write cycles.
Move po->running into a separate variable. Its writes are protected by
po->bind_lock (except for one startup case at packet_create). Also
replace a textual precondition warning with lockdep annotation.
All others are set only in packet_setsockopt. Serialize these
updates by holding the socket lock. Analogous to other field updates,
also hold the lock when testing whether a ring is active (pg_vec).
Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg")
Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
Reported-by: Byoungyoung Lee <byoungyoung@purdue.edu>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit <c6849a3ac17e> ("net: init sk_cookie for inet socket")
Per discussion with Eric, when update sock_net(sk)->cookie_gen, the
whole cache cache line will be invalidated, as this cache line is shared
with all cpus, that may cause great performace hit.
Bellow is the data form Eric.
"Performance is reduced from ~5 Mpps to ~3.8 Mpps with 16 RX queues on
my host" when running synflood test.
Have to revert it to prevent from cache line false sharing.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If we go without an established session for a while, backoff delay will
climb to 30 seconds. The keepalive timeout is also 30 seconds, so it's
pretty easily hit after a prolonged hunting for a monitor: we don't get
a chance to send out a keepalive in time, which means we never get back
a keepalive ack in time, cutting an established session and attempting
to connect to a different monitor every 30 seconds:
[Sun Apr 1 23:37:05 2018] libceph: mon0 10.80.20.99:6789 session established
[Sun Apr 1 23:37:36 2018] libceph: mon0 10.80.20.99:6789 session lost, hunting for new mon
[Sun Apr 1 23:37:36 2018] libceph: mon2 10.80.20.103:6789 session established
[Sun Apr 1 23:38:07 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
[Sun Apr 1 23:38:07 2018] libceph: mon1 10.80.20.100:6789 session established
[Sun Apr 1 23:38:37 2018] libceph: mon1 10.80.20.100:6789 session lost, hunting for new mon
[Sun Apr 1 23:38:37 2018] libceph: mon2 10.80.20.103:6789 session established
[Sun Apr 1 23:39:08 2018] libceph: mon2 10.80.20.103:6789 session lost, hunting for new mon
The regular keepalive interval is 10 seconds. After ->hunting is
cleared in finish_hunting(), call __schedule_delayed() to ensure we
send out a keepalive after 10 seconds.
Cc: stable@vger.kernel.org # 4.7+
Link: http://tracker.ceph.com/issues/23537
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
This means that if we do some backoff, then authenticate, and are
healthy for an extended period of time, a subsequent failure won't
leave us starting our hunting sequence with a large backoff.
Mirrors ceph.git commit d466bc6e66abba9b464b0b69687cf45c9dccf383.
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|