Age | Commit message (Collapse) | Author |
|
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Note that sysctl_tcp_thin_dupack was not used, I deleted it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
pull-request: mac80211 2017-10-25
Here are:
* follow-up fixes for the WoWLAN security issue, to fix a
partial TKIP key material problem and to use crypto_memneq()
* a change for better enforcement of FQ's memory limit
* a disconnect/connect handling fix, and
* a user rate mask validation fix
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 0da4af00b2ed ("ipv6: only update __use and lastusetime
once per jiffy at most"), updating the dst lastuse field is an
unlikely action: it happens at most once per jiffy, out of
potentially millions of calls per second.
Mark explicitly the code as such, and let the compiler generate
better code.
Note: gcc 7.2 and several older versions do actually generate
different - better - code when the unlikely() hint is in place,
avoid jump in the fast path and keeping better code locality.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The SMC protocol [1] relies on the use of a new TCP experimental
option [2, 3]. With this option, SMC capabilities are exchanged
between peers during the TCP three way handshake. This patch adds
support for this experimental option to TCP.
References:
[1] SMC-R Informational RFC: http://www.rfc-editor.org/info/rfc7609
[2] Shared Use of TCP Experimental Options RFC 6994:
https://tools.ietf.org/rfc/rfc6994.txt
[3] IANA ExID SMCR:
http://www.iana.org/assignments/tcp-parameters/tcp-parameters.xhtml#tcp-exids
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In my first attempt to fix the lockdep splat, I forgot we could
enter inet_csk_route_req() with a freshly allocated request socket,
for which refcount has not yet been elevated, due to complex
SLAB_TYPESAFE_BY_RCU rules.
We either are in rcu_read_lock() section _or_ we own a refcount on the
request.
Correct RCU verb to use here is rcu_dereference_check(), although it is
not possible to prove we actually own a reference on a shared
refcount :/
In v2, I added ireq_opt_deref() helper and use in three places, to fix other
possible splats.
[ 49.844590] lockdep_rcu_suspicious+0xea/0xf3
[ 49.846487] inet_csk_route_req+0x53/0x14d
[ 49.848334] tcp_v4_route_req+0xe/0x10
[ 49.850174] tcp_conn_request+0x31c/0x6a0
[ 49.851992] ? __lock_acquire+0x614/0x822
[ 49.854015] tcp_v4_conn_request+0x5a/0x79
[ 49.855957] ? tcp_v4_conn_request+0x5a/0x79
[ 49.858052] tcp_rcv_state_process+0x98/0xdcc
[ 49.859990] ? sk_filter_trim_cap+0x2f6/0x307
[ 49.862085] tcp_v4_do_rcv+0xfc/0x145
[ 49.864055] ? tcp_v4_do_rcv+0xfc/0x145
[ 49.866173] tcp_v4_rcv+0x5ab/0xaf9
[ 49.868029] ip_local_deliver_finish+0x1af/0x2e7
[ 49.870064] ip_local_deliver+0x1b2/0x1c5
[ 49.871775] ? inet_del_offload+0x45/0x45
[ 49.873916] ip_rcv_finish+0x3f7/0x471
[ 49.875476] ip_rcv+0x3f1/0x42f
[ 49.876991] ? ip_local_deliver_finish+0x2e7/0x2e7
[ 49.878791] __netif_receive_skb_core+0x6d3/0x950
[ 49.880701] ? process_backlog+0x7e/0x216
[ 49.882589] __netif_receive_skb+0x1d/0x5e
[ 49.884122] process_backlog+0x10c/0x216
[ 49.885812] net_rx_action+0x147/0x3df
Fixes: a6ca7abe53633 ("tcp/dccp: fix lockdep splat in inet_csk_route_req()")
Fixes: c92e8c02fe66 ("tcp/dccp: fix ireq->opt races")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <fengguang.wu@intel.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
READ_ONCE()/WRITE_ONCE()
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.
However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.
It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts netlink and netfilter code and comments to use
{READ,WRITE}_ONCE() consistently.
----
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Florian Westphal <fw@strlen.de>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-7-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "Reshetova, Elena" <elena.reshetova@intel.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After the patch 'rtnetlink: bring NETDEV_CHANGELOWERSTATE event
process back to rtnetlink_event', bond_lower_state_changed would
generate NETDEV_CHANGEUPPER event which would send a notification
to userspace in rtnetlink_event.
There's no need to call rtmsg_ifinfo to send the notification
any more. So this patch is to remove it from these places after
bond_lower_state_changed.
Besides, after this, rtmsg_ifinfo is not needed to be exported.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sock lock may be taken in the message timer function which is a
problem since timers run in BH. Instead of timers use delayed_work.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: bbb03029a899 ("strparser: Generalize strparser")
Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
previous patches removed all writes to them.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
not needed/used anymore.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We currently pass down the l4 protocol to the conntrack ->packet()
function, but the only user of this is the debug info decision.
Same information can be derived from struct nf_conn.
Add a wrapper for the previous patch that extracs the information
from nf_conn and passes it to nf_l4proto_log_invalid().
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We currently pass down the l4 protocol to the conntrack ->packet()
function, but the only user of this is the debug info decision.
Same information can be derived from struct nf_conn.
As a first step, add and use a new log function for this, similar to
nf_ct_helper_log().
Add __cold annotation -- invalid packets should be infrequent so
gcc can consider all call paths that lead to such a function as
unlikely.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We already allow to enable TFO without a cookie by using the
fastopen-sysctl and setting it to TFO_SERVER_COOKIE_NOT_REQD (or
TFO_CLIENT_NO_COOKIE).
This is safe to do in certain environments where we know that there
isn't a malicous host (aka., data-centers) or when the
application-protocol already provides an authentication mechanism in the
first flight of data.
A server however might be providing multiple services or talking to both
sides (public Internet and data-center). So, this server would want to
enable cookie-less TFO for certain services and/or for connections that
go to the data-center.
This patch exposes a socket-option and a per-route attribute to enable such
fine-grained configurations.
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Mark hlist node in sk rcu iterator as protected by the rcu.
hlist_next_rcu accomplishes this and silences the warnings
sparse throws.
Found with make C=1 net/ipv4/udp.o on linux-next tag
next-20171009.
Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Bring IPv6 in par with IPv4 :
- Use net_hash_mix() to spread addresses a bit more.
- Use 256 slots hash table instead of 16
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There were quite a few overlapping sets of changes here.
Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.
Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly. If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.
In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().
Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.
The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These helpers are no longer in use by drivers, so remove them.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is no longer used by the drivers, so remove it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend the tc_setup_cb_call entrypoint function originally used only for
action egress devices callbacks to call per-block callbacks as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce infrastructure that allows drivers to register callbacks that
are called whenever tc would offload inserted rule for a specific block.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use previously introduced extended variants of block get and put
functions. This allows to specify a binder types specific to clsact
ingress/egress which is useful for drivers to distinguish who actually
got the block.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce new type of ndo_setup_tc message to propage binding/unbinding
of a block to driver. Call this ndo whenever qdisc gets/puts a block.
Alongside with this, there's need to propagate binder type from qdisc
code down to the notifier. So introduce extended variants of
block_get/put in order to pass this info.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller found another bug in DCCP/TCP stacks [1]
For the reasons explained in commit ce1050089c96 ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
ireq->opt unless we own the request sock.
Note the opt field is renamed to ireq_opt to ease grep games.
[1]
BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295
CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x25b/0x340 mm/kasan/report.c:409
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
__tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x40c341
RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000
Allocated by task 3295:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
__do_kmalloc mm/slab.c:3725 [inline]
__kmalloc+0x162/0x760 mm/slab.c:3734
kmalloc include/linux/slab.h:498 [inline]
tcp_v4_save_options include/net/tcp.h:1962 [inline]
tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
Freed by task 3306:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kfree+0xca/0x250 mm/slab.c:3820
inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
__sk_destruct+0xfd/0x910 net/core/sock.c:1560
sk_destruct+0x47/0x80 net/core/sock.c:1595
__sk_free+0x57/0x230 net/core/sock.c:1603
sk_free+0x2a/0x40 net/core/sock.c:1614
sock_put include/net/sock.h:1652 [inline]
inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
dst_input include/net/dst.h:464 [inline]
ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
NF_HOOK include/linux/netfilter.h:249 [inline]
ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
__netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
__netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
netif_receive_skb+0xae/0x390 net/core/dev.c:4611
tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
call_write_iter include/linux/fs.h:1770 [inline]
new_sync_write fs/read_write.c:468 [inline]
__vfs_write+0x68a/0x970 fs/read_write.c:481
vfs_write+0x18f/0x510 fs/read_write.c:543
SYSC_write fs/read_write.c:588 [inline]
SyS_write+0xef/0x220 fs/read_write.c:580
entry_SYSCALL_64_fastpath+0x1f/0xbe
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
New socket option TCP_FASTOPEN_KEY to allow different keys per
listener. The listener by default uses the global key until the
socket option is set. The key is a 16 bytes long binary data. This
option has no effect on regular non-listener TCP sockets.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add extack to in_validator_info and in6_validator_info. Update the one
user of each, ipvlan, to return an error message for failures.
Only manual configuration of an address is plumbed in the IPv6 code path.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SK_SKB BPF programs are run from the socket/tcp context but early in
the stack before much of the TCP metadata is needed in tcp_skb_cb. So
we can use some unused fields to place BPF metadata needed for SK_SKB
programs when implementing the redirect function.
This allows us to drop the preempt disable logic. It does however
require an API change so sk_redirect_map() has been updated to
additionally provide ctx_ptr to skb. Note, we do however continue to
disable/enable preemption around actual BPF program running to account
for map updates.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
David Howells says:
====================
rxrpc: Add bits for kernel services
Here are some patches that add a few things for kernel services to use:
(1) Allow service upgrade to be requested and allow the resultant actual
service ID to be obtained.
(2) Allow the RTT time of a call to be obtained.
(3) Allow a kernel service to find out if a call is still alive on a
server between transmitting a request and getting the reply.
(4) Allow data transmission to ignore signals if transmission progress is
being made in reasonable time. This is also usable by userspace by
passing MSG_WAITALL to sendmsg()[*].
[*] I'm not sure this is the right interface for this or whether a sockopt
should be used instead.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Alexander Aring <alex.aring@gmail.com>
Cc: Stefan Schmidt <stefan@osg.samsung.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Florian Westphal <fw@strlen.de>
Cc: linux-wpan@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com> # for ieee802154
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Cc: dccp@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: linux-decnet-user@lists.sourceforge.net
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The dsa_port structure is part of DSA core data and must only be updated
by the later. It is OK and sometimes necessary for the DSA drivers to
access this data, but this has to be read only.
For that purpose, add a dsa_to_port() helper which returns a const
pointer to a dsa_port structure which must be used by DSA drivers from
now on instead of digging into ds->ports[] themselves.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The dsa_port structure has a "netdev" member, which can be used for
either the master device, or the slave device, depending on its type.
It is true that today, CPU port are not exposed to userspace, thus the
port's netdev member can be used to point to its master interface.
But it is still slightly confusing, so split it into more explicit
"master" and "slave" members inside an anonymous union.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Provide a couple of functions to allow cleaner handling of signals in a
kernel service. They are:
(1) rxrpc_kernel_get_rtt()
This allows the kernel service to find out the RTT time for a call, so
as to better judge how large a timeout to employ.
Note, though, that whilst this returns a value in nanoseconds, the
timeouts can only actually be in jiffies.
(2) rxrpc_kernel_check_life()
This returns a number that is updated when ACKs are received from the
peer (notably including PING RESPONSE ACKs which we can elicit by
sending PING ACKs to see if the call still exists on the server).
The caller should compare the numbers of two calls to see if the call
is still alive.
These can be used to provide an extending timeout rather than returning
immediately in the case that a signal occurs that would otherwise abort an
RPC operation. The timeout would be extended if the server is still
responsive and the call is still apparently alive on the server.
For most operations this isn't that necessary - but for FS.StoreData it is:
OpenAFS writes the data to storage as it comes in without making a backup,
so if we immediately abort it when partially complete on a CTRL+C, say, we
have no idea of the state of the file after the abort.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Provide support for a kernel service to make use of the service upgrade
facility. This involves:
(1) Pass an upgrade request flag to rxrpc_kernel_begin_call().
(2) Make rxrpc_kernel_recv_data() return the call's current service ID so
that the caller can detect service upgrade and see what the service
was upgraded to.
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
The fq structure would fail to properly enforce the memory limit in the case
where the packet being enqueued was bigger than the packet being removed to
bring the memory usage down. So keep dropping packets until the memory usage is
back below the limit. Also, fix the statistics for memory limit violations.
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
In order to not dirty the cacheline too often, we try to only update
dst->__use and dst->lastusetime at most once per jiffy.
As dst->lastusetime is only used by ipv6 garbage collector, it should
be good enough time resolution.
And __use is only used in ipv6_route_seq_show() to show how many times a
dst has been used. And as __use is not atomic_t right now, it does not
show the precise number of usage times anyway. So we think it should be
OK to only update it at most once per jiffy.
According to my latest syn flood test on a machine with intel Xeon 6th
gen processor and 2 10G mlx nics bonded together, each with 8 rx queues
on 2 NUMA nodes:
With this patch, the packet process rate increases from ~3.49Mpps to
~3.75Mpps with a 7% increase rate.
Note: dst_use() is being renamed to dst_hold_and_use() to better specify
the purpose of the function.
Signed-off-by: Wei Wang <weiwan@google.com>
Acked-by: Eric Dumazet <edumazet@googl.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use tcf_block_q helper to get q pointer to be used for direct call of
sch_tree_lock/unlock instead of tcf_tree_lock/unlock.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Whenever the block->q is set, it can be used instead of tp->q as it
contains the same value. When it is not set, which can't happen now but
it might happen with the follow-up shared blocks introduction, the class
is not set in the result. That would lead to a class lookup instead
of direct class pointer use for classful qdiscs. However, it is not
planned to support classful qdisqs sharing filter blocks, so that may
never happen.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These helpers allows to get a q and netdev pointers
for given block easily.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|