Age | Commit message (Collapse) | Author |
|
The linux/net.h provides the net_ratelimited_function. It should
therefore be included directly before it is used.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
commit bc6245e5efd7 ("bug: split BUILD_BUG stuff out into
<linux/build_bug.h>") added a new header for BUILD_BUG_ON. It should
therefore be included instead of linux/bug.h
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
The linux/gfp.h provides the GFP_ATOMIC and GFP_KERNEL define. It should
therefore be included instead of linux/fs.h.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
The last remaining file without license notice and/or SPDX license
identifier under net/batman-adv/ is the Kconfig. It should have been
licensed under the same conditions as the rest of batman-adv and the
Makefile which uses the CONFIG_* variables from Kconfig.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
The "Linux kernel licensing rules" require that each file has a SPDX
license identifier as first line (and sometimes as second line).
The FSFE REUSE practices [1] would also require the same tags but have no
restrictions on the placement in the source file. Using the "Linux kernel
licensing rules" is therefore also fulfilling the FSFE REUSE practices
requirements at the same time.
[1] https://reuse.software/practices/
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2017-12-15
1) Currently we can add or update socket policies, but
not clear them. Support clearing of socket policies
too. From Lorenzo Colitti.
2) Add documentation for the xfrm device offload api.
From Shannon Nelson.
3) Fix IPsec extended sequence numbers (ESN) for
IPsec offloading. From Yossef Efraim.
4) xfrm_dev_state_add function returns success even for
unsupported options, fix this to fail in such cases.
From Yossef Efraim.
5) Remove a redundant xfrm_state assignment.
From Aviv Heller.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simon Wunderlich says:
====================
Here are some batman-adv bugfixes:
- Initialize the fragment headers, by Sven Eckelmann
- Fix a NULL check in BATMAN V, by Sven Eckelmann
- Fix kernel doc for the time_setup() change, by Sven Eckelmann
- Use the right lock in BATMAN IV OGM Update, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to let MT7530 switch can recognize well those egress packets
having both special tag and VLAN tag, the information about the special
tag should be carried on the existing VLAN tag. On the other hand, it's
unnecessary for extra handling for ingress packets when VLAN tag is
present since it is able to put the VLAN tag after the special tag and
then follow the existing way to parse.
Signed-off-by: Sean Wang <sean.wang@mediatek.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In netif_receive_generic_xdp(), it is necessary to linearize all
nonlinear skb. However, in current implementation, skb with
troom <= 0 are not linearized. This patch fixes this by calling
skb_linearize() for all nonlinear skb.
Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access")
Signed-off-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info
- Remove the call to groups_sort from set_groups
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Only the retransmit timer currently refreshes tcp_mstamp
We should do the same for delayed acks and keepalives.
Even if RFC 7323 does not request it, this is consistent to what linux
did in the past, when TS values were based on jiffies.
Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Mike Maloney <maloney@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Mike Maloney <maloney@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When ms timestamp is used, current logic uses 1us in
tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us.
This could cause rcv_rtt underestimation.
Fix it by always using a min value of 1ms if ms timestamp is used.
Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prior to this patch, active Fast Open is paused on a specific
destination IP address if the previous connections to the
IP address have experienced recurring timeouts . But recent
experiments by Microsoft (https://goo.gl/cykmn7) and Mozilla
browsers indicate the isssue is often caused by broken middle-boxes
sitting close to the client. Therefore it is much better user
experience if Fast Open is disabled out-right globally to avoid
experiencing further timeouts on connections toward other
destinations.
This patch changes the destination-IP disablement to global
disablement if a connection experiencing recurring timeouts
or aborts due to timeout. Repeated incidents would still
exponentially increase the pause time, starting from an hour.
This is extremely conservative but an unfortunate compromise to
minimize bad experience due to broken middle-boxes.
Reported-by: Dragana Damjanovic <ddamjanovic@mozilla.com>
Reported-by: Patrick McManus <mcmanus@ducksong.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Wei Wang <weiwan@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
skb_warn_bad_offload warns when packets enter the GSO stack that
require skb_checksum_help or vice versa. Do not warn on arbitrary
bad packets. Packet sockets can craft many. Syzkaller was able to
demonstrate another one with eth_type games.
In particular, suppress the warning when segmentation returns an
error, which is for reasons other than checksum offload.
See also commit 36c92474498a ("net: WARN if skb_checksum_help() is
called on skb requiring segmentation") for context on this warning.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before this patch the bridge used a fixed 256 element hash table which
was fine for small use cases (in my tests it starts to degrade
above 1000 entries), but it wasn't enough for medium or large
scale deployments. Modern setups have thousands of participants in a
single bridge, even only enabling vlans and adding a few thousand vlan
entries will cause a few thousand fdbs to be automatically inserted per
participating port. So we need to scale the fdb table considerably to
cope with modern workloads, and this patch converts it to use a
rhashtable for its operations thus improving the bridge scalability.
Tests show the following results (10 runs each), at up to 1000 entries
rhashtable is ~3% slower, at 2000 rhashtable is 30% faster, at 3000 it
is 2 times faster and at 30000 it is 50 times faster.
Obviously this happens because of the properties of the two constructs
and is expected, rhashtable keeps pretty much a constant time even with
10000000 entries (tested), while the fixed hash table struggles
considerably even above 10000.
As a side effect this also reduces the net_bridge struct size from 3248
bytes to 1344 bytes. Also note that the key struct is 8 bytes.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
First, rename __inet_twsk_hashdance() to inet_twsk_hashdance()
Then, remove one inet_twsk_put() by setting tw_refcnt to 3 instead
of 4, but adding a fat warning that we do not have the right to access
tw anymore after inet_twsk_hashdance()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This macro deduplicates a lot of similar code across the hci_debugfs.c
module. Targeting to be moved to seq_file.h eventually.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The follow patchset contains Netfilter fixes for your net tree,
they are:
1) Fix compilation warning in x_tables with clang due to useless
redundant reassignment, from Colin Ian King.
2) Add bugtrap to net_exit to catch uninitialized lists, patch
from Vasily Averin.
3) Fix out of bounds memory reads in H323 conntrack helper, this
comes with an initial patch to remove replace the obscure
CHECK_BOUND macro as a dependency. From Eric Sesterhenn.
4) Reduce retransmission timeout when window is 0 in TCP conntrack,
from Florian Westphal.
6) ctnetlink clamp timeout to INT_MAX if timeout is too large,
otherwise timeout wraps around and it results in killing the
entry that is being added immediately.
7) Missing CAP_NET_ADMIN checks in cthelper and xt_osf, due to
no netns support. From Kevin Cernekee.
8) Missing maximum number of instructions checks in xt_bpf, patch
from Jann Horn.
9) With no CONFIG_PROC_FS ipt_CLUSTERIP compilation breaks,
patch from Arnd Bergmann.
10) Missing netlink attribute policy in nftables exthdr, from
Florian Westphal.
11) Enable conntrack with IPv6 MASQUERADE rules, as a357b3f80bc8
should have done in first place, from Konstantin Khlebnikov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch enables tail loss probe in cwnd reduction (CWR) state
to detect potential losses. Prior to this patch, since the sender
uses PRR to determine the cwnd in CWR state, the combination of
CWR+PRR plus tcp_tso_should_defer() could cause unnecessary stalls
upon losses: PRR makes cwnd so gentle that tcp_tso_should_defer()
defers sending wait for more ACKs. The ACKs may not come due to
packet losses.
Disallowing TLP when there is unused cwnd had the primary effect
of disallowing TLP when there is TSO deferral, Nagle deferral,
or we hit the rwin limit. Because basically every application
write() or incoming ACK will cause us to run tcp_write_xmit()
to see if we can send more, and then if we sent something we call
tcp_schedule_loss_probe() to see if we should schedule a TLP. At
that point, there are a few common reasons why some cwnd budget
could still be unused:
(a) rwin limit
(b) nagle check
(c) TSO deferral
(d) TSQ
For (d), after the next packet tx completion the TSQ mechanism
will allow us to send more packets, so we don't really need a
TLP (in practice it shouldn't matter whether we schedule one
or not). But for (a), (b), (c) the sender won't send any more
packets until it gets another ACK. But if the whole flight was
lost, or all the ACKs were lost, then we won't get any more ACKs,
and ideally we should schedule and send a TLP to get more feedback.
In particular for a long time we have wanted some kind of timer for
TSO deferral, and at least this would give us some kind of timer
Reported-by: Steve Ibanez <sibanez@stanford.edu>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Nandita Dukkipati <nanditad@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since we now hold RTNL lock in tc_action_net_exit(), it is good to
batch them to speedup tc action dismantle.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Closing a multicast socket after the final IPv4 address is deleted
from an interface can generate a membership report that uses the
source IP from a different interface. The following test script, run
from an isolated netns, reproduces the issue:
#!/bin/bash
ip link add dummy0 type dummy
ip link add dummy1 type dummy
ip link set dummy0 up
ip link set dummy1 up
ip addr add 10.1.1.1/24 dev dummy0
ip addr add 192.168.99.99/24 dev dummy1
tcpdump -U -i dummy0 &
socat EXEC:"sleep 2" \
UDP4-DATAGRAM:239.101.1.68:8889,ip-add-membership=239.0.1.68:10.1.1.1 &
sleep 1
ip addr del 10.1.1.1/24 dev dummy0
sleep 5
kill %tcpdump
RFC 3376 specifies that the report must be sent with a valid IP source
address from the destination subnet, or from address 0.0.0.0. Add an
extra check to make sure this is the case.
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In the function tipc_sk_mcast_rcv() we call refcount_dec(&skb->users)
on received sk_buffers. Since the reference counter might hit zero at
this point, we have a potential memory leak.
We fix this by replacing refcount_dec() with kfree_skb().
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IPv4 stack reacts to changes to small MTU, by disabling itself under
RTNL.
But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in igmp code where it is
assumed the mtu is suitable.
Fix this by reading device mtu once and checking IPv4 minimal MTU.
This patch adds missing IPV4_MIN_MTU define, to not abuse
ETH_MIN_MTU anymore.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller reported crashes in IPv6 stack [1]
Xin Long found that lo MTU was set to silly values.
IPv6 stack reacts to changes to small MTU, by disabling itself under
RTNL.
But there is a window where threads not using RTNL can see a wrong
device mtu. This can lead to surprises, in mld code where it is assumed
the mtu is suitable.
Fix this by reading device mtu once and checking IPv6 minimal MTU.
[1]
skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20
head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100
RSP: 0018:ffff8801db307508 EFLAGS: 00010286
RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000
RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95
RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020
R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540
FS: 0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
skb_over_panic net/core/skbuff.c:109 [inline]
skb_put+0x181/0x1c0 net/core/skbuff.c:1694
add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695
add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817
mld_send_cr net/ipv6/mcast.c:1903 [inline]
mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448
call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320
expire_timers kernel/time/timer.c:1357 [inline]
__run_timers+0x7e1/0xb60 kernel/time/timer.c:1660
run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
__do_softirq+0x29d/0xbb2 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1d3/0x210 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:540 [inline]
smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
So far, kernel did not allow to advertise when there was a connection
established. With this patch kernel does allow it if controller
supports it.
If controller supports non-connectable advertising when connected, then
only non-connectable advertising instances will be advertised.
Signed-off-by: Łukasz Rymanowski <lukasz.rymanowski@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
There is already __hci_req_disable_advertising() function for disabling,
so use it.
Signed-off-by: Jaganath Kanakkassery <jaganathx.kanakkassery@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
Instead of open coding byte-by-byte printing, re-use %*ph specifier.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
* Improve jump targets so that a bit of exception handling can be better
reused at the end of this function.
* Adjust five condition checks.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
|
|
The MD5-key that belongs to a connection is identified by the peer's
IP-address. When we are in tcp_v4(6)_reqsk_send_ack(), we are replying
to an incoming segment from tcp_check_req() that failed the seq-number
checks.
Thus, to find the correct key, we need to use the skb's saddr and not
the daddr.
This bug seems to have been there since quite a while, but probably got
unnoticed because the consequences are not catastrophic. We will call
tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer,
thus the connection doesn't really fail.
Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().")
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Back in linux-3.13 (commit b0983d3c9b13 ("tcp: fix dynamic right sizing"))
I addressed the pressing issues we had with receiver autotuning.
But DRS suffers from extra latencies caused by rcv_rtt_est.rtt_us
drifts. One common problem happens during slow start, since the
apparent RTT measured by the receiver can be inflated by ~50%,
at the end of one packet train.
Also, a single drop can delay read() calls by one RTT, meaning
tcp_rcv_space_adjust() can be called one RTT too late.
By replacing the tri-modal heuristic with a continuous function,
we can offset the effects of not growing 'at the optimal time'.
The curve of the function matches prior behavior if the space
increased by 25% and 50% exactly.
Cost of added multiply/divide is small, considering a TCP flow
typically would run this part of the code few times in its life.
I tested this patch with 100 ms RTT / 1% loss link, 100 runs
of (netperf -l 5), and got an average throughput of 4600 Mbit
instead of 1700 Mbit.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When using large tcp_rmem[2] values (I did tests with 500 MB),
I noticed overflows while computing rcvwin.
Lets fix this before the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While rcvbuf is properly clamped by tcp_rmem[2], rcvwin
is left to a potentially too big value.
It has no serious effect, since :
1) tcp_grow_window() has very strict checks.
2) window_clamp can be mangled by user space to any value anyway.
tcp_init_buffer_space() and companions use tcp_full_space(),
we use tcp_win_from_space() to avoid reloading sk->sk_rcvbuf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We need to put the policies when re-using the pcpu xdst entry, else
this leaks the reference.
Fixes: ec30d78c14a813db39a647b6a348b428 ("xfrm: add xdst pcpu cache")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Now in sctp_setsockopt_reset_streams, it only does the check
optlen < sizeof(*params) for optlen. But it's not enough, as
params->srs_number_streams should also match optlen.
If the streams in params->srs_stream_list are less than stream
nums in params->srs_number_streams, later when dereferencing
the stream list, it could cause a slab-out-of-bounds crash, as
reported by syzbot.
This patch is to fix it by also checking the stream numbers in
sctp_setsockopt_reset_streams to make sure at least it's not
greater than the streams in the list.
Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
inet->hdrincl is racy, and could lead to uninitialized stack pointer
usage, so its value should be read only once.
Fixes: c008ba5bdc9f ("ipv4: Avoid reading user iov twice after raw_probe_proto_opt")
Signed-off-by: Mohamed Ghannam <simo.ghannam@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
'fixes.2017.12.11a', 'srbd.2017.12.05a' and 'torture.2017.12.11a' into HEAD
cond_resched.2017.12.04a: Convert cond_resched_rcu_qs() to cond_resched()
dyntick.2017.11.28a: Make RCU dynticks handle interrupts from NMI
fixes.2017.12.11a: Miscellaneous fixes
srbd.2017.12.05a: Remove now-redundant smp_read_barrier_depends()
torture.2017.12.11a: Torture-testing update
|
|
Currently, a nlmon link inside a child namespace can observe systemwide
netlink activity. Filter the traffic so that nlmon can only sniff
netlink messages from its own netns.
Test case:
vpnns -- bash -c "ip link add nlmon0 type nlmon; \
ip link set nlmon0 up; \
tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" &
sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \
spi 0x1 mode transport \
auth sha1 0x6162633132330000000000000000000000000000 \
enc aes 0x00000000000000000000000000000000
grep --binary abc123 /tmp/nlmon.pcap
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Unordered idata process is more complicated than unordered data:
- It has to add mid into sctp_stream_out to save the next mid value,
which is separated from ordered idata's.
- To support pd for unordered idata, another mid and pd_mode need to
be added to save the message id and pd state in sctp_stream_in.
- To make unordered idata reasm easier, it adds a new event queue
to save frags for idata.
The patch mostly adds the samilar reasm functions for unordered idata
as ordered idata's, and also adjusts some other codes on assign_mid,
abort_pd and ulpevent_data for idata.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
abort_pd is added as a member of sctp_stream_interleave, used to abort
partial delivery for data or idata, called in sctp_cmd_assoc_failed.
Since stream interleave allows to do partial delivery for each stream
at the same time, sctp_intl_abort_pd for idata would be very different
from the old function sctp_ulpq_abort_pd for data.
Note that sctp_ulpevent_make_pdapi will support per stream in this
patch by adding pdapi_stream and pdapi_seq in sctp_pdapi_event, as
described in section 6.1.7 of RFC6458.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
start_pd is added as a member of sctp_stream_interleave, used to
do partial_delivery for data or idata when datalen >= asoc->rwnd
in sctp_eat_data. The codes have been done in last patches, but
they need to be extracted into start_pd, so that it could be used
for SCTP_CMD_PART_DELIVER cmd as well.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
renege_events is added as a member of sctp_stream_interleave, used to
renege some old data or idata in reasm or lobby queue properly to free
some memory for the new data when there's memory stress.
It defines sctp_renege_events for idata, and leaves sctp_ulpq_renege
as it is for data.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
enqueue_event is added as a member of sctp_stream_interleave, used to
enqueue either data, idata or notification events into user socket rx
queue.
It replaces sctp_ulpq_tail_event used in the other places with
enqueue_event.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ulpevent_data is added as a member of sctp_stream_interleave, used to
do the most process in ulpq, including to convert data or idata chunk
to event, reasm them in reasm queue and put them in lobby queue in
right order, and deliver them up to user sk rx queue.
This procedure is described in section 2.2.3 of RFC8260.
It adds most functions for idata here to do the similar process as
the old functions for data. But since the details are very different
between them, the old functions can not be reused for idata.
event->ssn and event->ppid settings are moved to ulpevent_data from
sctp_ulpevent_make_rcvmsg, so that sctp_ulpevent_make_rcvmsg could
work for both data and idata.
Note that mid is added in sctp_ulpevent for idata, __packed has to
be used for defining sctp_ulpevent, or it would exceeds the skb cb
that saves a sctp_ulpevent variable for ulp layer process.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
validate_data is added as a member of sctp_stream_interleave, used
to validate ssn/chunk type for data or mid (message id)/chunk type
for idata, called in sctp_eat_data.
If this check fails, an abort packet will be sent, as said in
section 2.2.3 of RFC8260.
It also adds the process for idata in rx path. As Marcelo pointed
out, there's no need to add event table for idata, but just share
chunk_event_table with data's. It would drop data chunk for idata
and drop idata chunk for data by calling validate_data in
sctp_eat_data.
As last patch did, it also replaces sizeof(struct sctp_data_chunk)
with sctp_datachk_len for rx path.
After this patch, the idata can be accepted and delivered to ulp
layer.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
assign_number is added as a member of sctp_stream_interleave, used
to assign ssn for data or mid (message id) for idata, called in
sctp_packet_append_data. sctp_chunk_assign_ssn is left as it is,
and sctp_chunk_assign_mid is added for sctp_stream_interleave_1.
This procedure is described in section 2.2.2 of RFC8260.
All sizeof(struct sctp_data_chunk) in tx path is replaced with
sctp_datachk_len, to make it right for idata as well. And also
adjust sctp_chunk_is_data for SCTP_CID_I_DATA.
After this patch, idata can be built and sent in tx path.
Note that if sp strm_interleave is set, it has to wait_connect in
sctp_sendmsg, as asoc intl_enable need to be known after 4 shake-
hands, to decide if it should use data or idata later. data and
idata can't be mixed to send in one asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To avoid hundreds of checks for the different process on I-DATA chunk,
struct sctp_stream_interleave is defined as a group of functions used
to replace the codes in some place where it needs to do different job
according to if the asoc intl_enabled is set.
With these ops, it only needs to initialize asoc->stream.si with
sctp_stream_interleave_0 for normal data if asoc intl_enable is 0,
or sctp_stream_interleave_1 for idata if asoc intl_enable is set in
sctp_stream_init.
After that, the members in asoc->stream.si can be used directly in
some special places without checking asoc intl_enable.
make_datafrag is the first member for sctp_stream_interleave, it's
used to make data or idata frags, called in sctp_datamsg_from_user.
The old function sctp_make_datafrag_empty needs to be adjust some
to fit in this ops.
Note that as idata and data chunks have different length, it also
defines data_chunk_len for sctp_stream_interleave to describe the
chunk size.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sctp_idatahdr and sctp_idata_chunk are used to define and parse
I-DATA chunk format, and sctp_make_idata is a function to build
the chunk.
The I-DATA Chunk Format is defined in section 2.1 of RFC8260.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
asoc intl_enable will be set when local sp strm_interleave is set
and there's I-DATA chunk in init and init_ack extensions, as said
in section 2.2.1 of RFC8260.
asoc intl_enable indicates all data will be sent as I-DATA chunks.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds intl_enable in asoc and netns, and strm_interleave in
sctp_sock to indicate if stream interleave is enabled and supported.
netns intl_enable would be set via procfs, but that is not added yet
until all stream interleave codes are completely implemented; asoc
intl_enable will be set when doing 4-shakehands.
sp strm_interleave can be set by sockopt SCTP_INTERLEAVING_SUPPORTED
which is also added in this patch. This socket option is defined in
section 4.3.1 of RFC8260.
Note that strm_interleave can only be set by sockopt when both netns
intl_enable and sp frag_interleave are set.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|