Age | Commit message (Collapse) | Author |
|
Previously if an active TCP open has SYN timeout, it always undo the
cwnd upon receiving the SYNACK. This is because tcp_clean_rtx_queue
would reset tp->retrans_stamp when SYN is acked, which fools then
tcp_try_undo_loss and tcp_packet_delayed. Addressing this issue is
required to properly support undo for spurious SYN timeout.
Fixing this is tricky -- for active TCP open tp->retrans_stamp
records the time when the handshake starts, not the first
retransmission time as the name may suggest. The simplest fix is
for tcp_packet_delayed to ensure it is valid before comparing with
other timestamp.
One side effect of this change is active TCP Fast Open that incurred
SYN timeout. Upon receiving a SYN-ACK that only acknowledged
the SYN, it would immediately retransmit unacknowledged data in
tcp_ack() because the data is marked lost after SYN timeout. But
the retransmission would have an incorrect ack sequence number since
rcv_nxt has not been updated yet tcp_rcv_synsent_state_process(), the
retransmission needs to properly handed by tcp_rcv_fastopen_synack()
like before.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
update_chksum() accesses nskb->sk before it has been set
by complete_skb(), move the init up.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Packet sockets in datagram mode take a destination address. Verify its
length before passing to dev_hard_header.
Prior to 2.6.14-rc3, the send code ignored sll_halen. This is
established behavior. Directly compare msg_namelen to dev->addr_len.
Change v1->v2: initialize addr in all paths
Fixes: 6b8d95f1795c4 ("packet: validate address length if non-zero")
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Packet send checks that msg_name is at least sizeof sockaddr_ll.
Packet recv must return at least this length, so that its output
can be passed unmodified to packet send.
This ceased to be true since adding support for lladdr longer than
sll_addr. Since, the return value uses true address length.
Always return at least sizeof sockaddr_ll, even if address length
is shorter. Zero the padding bytes.
Change v1->v2: do not overwrite zeroed padding again. use copy_len.
Fixes: 0fb375fb9b93 ("[AF_PACKET]: Allow for > 8 byte hardware addresses.")
Suggested-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The devlink health reporters create/destroy and user commands currently
use the devlink->lock as a locking mechanism. Different reporters have
different rules in the driver and are being created/destroyed during
different stages of driver load/unload/running. So during execution of a
reporter recover the flow can go through another reporter's destroy and
create. Such flow leads to deadlock trying to lock a mutex already
held.
With the new locking mechanism the different reporters share mutex lock
only to protect access to shared reporters list.
Added refcount per reporter, to protect the reporters from destroy while
being used.
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ying triggered a call trace when doing an asconf testing:
BUG: scheduling while atomic: swapper/12/0/0x10000100
Call Trace:
<IRQ> [<ffffffffa4375904>] dump_stack+0x19/0x1b
[<ffffffffa436fcaf>] __schedule_bug+0x64/0x72
[<ffffffffa437b93a>] __schedule+0x9ba/0xa00
[<ffffffffa3cd5326>] __cond_resched+0x26/0x30
[<ffffffffa437bc4a>] _cond_resched+0x3a/0x50
[<ffffffffa3e22be8>] kmem_cache_alloc_node+0x38/0x200
[<ffffffffa423512d>] __alloc_skb+0x5d/0x2d0
[<ffffffffc0995320>] sctp_packet_transmit+0x610/0xa20 [sctp]
[<ffffffffc098510e>] sctp_outq_flush+0x2ce/0xc00 [sctp]
[<ffffffffc098646c>] sctp_outq_uncork+0x1c/0x20 [sctp]
[<ffffffffc0977338>] sctp_cmd_interpreter.isra.22+0xc8/0x1460 [sctp]
[<ffffffffc0976ad1>] sctp_do_sm+0xe1/0x350 [sctp]
[<ffffffffc099443d>] sctp_primitive_ASCONF+0x3d/0x50 [sctp]
[<ffffffffc0977384>] sctp_cmd_interpreter.isra.22+0x114/0x1460 [sctp]
[<ffffffffc0976ad1>] sctp_do_sm+0xe1/0x350 [sctp]
[<ffffffffc097b3a4>] sctp_assoc_bh_rcv+0xf4/0x1b0 [sctp]
[<ffffffffc09840f1>] sctp_inq_push+0x51/0x70 [sctp]
[<ffffffffc099732b>] sctp_rcv+0xa8b/0xbd0 [sctp]
As it shows, the first sctp_do_sm() running under atomic context (NET_RX
softirq) invoked sctp_primitive_ASCONF() that uses GFP_KERNEL flag later,
and this flag is supposed to be used in non-atomic context only. Besides,
sctp_do_sm() was called recursively, which is not expected.
Vlad tried to fix this recursive call in Commit c0786693404c ("sctp: Fix
oops when sending queued ASCONF chunks") by introducing a new command
SCTP_CMD_SEND_NEXT_ASCONF. But it didn't work as this command is still
used in the first sctp_do_sm() call, and sctp_primitive_ASCONF() will
be called in this command again.
To avoid calling sctp_do_sm() recursively, we send the next queued ASCONF
not by sctp_primitive_ASCONF(), but by sctp_sf_do_prm_asconf() in the 1st
sctp_do_sm() directly.
Reported-by: Ying Xu <yinxu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We had many syzbot reports that seem to be caused by use-after-free
of struct fib6_info.
ip6_dst_destroy(), fib6_drop_pcpu_from() and rt6_remove_exception()
are writers vs rt->from, and use non consistent synchronization among
themselves.
Switching to xchg() will solve the issues with no possible
lockdep issues.
BUG: KASAN: user-memory-access in atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:294 [inline]
BUG: KASAN: user-memory-access in fib6_info_release include/net/ip6_fib.h:292 [inline]
BUG: KASAN: user-memory-access in fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
BUG: KASAN: user-memory-access in fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
Write of size 4 at addr 0000000000ffffb4 by task syz-executor.1/7649
CPU: 0 PID: 7649 Comm: syz-executor.1 Not tainted 5.1.0-rc6+ #183
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
kasan_report.cold+0x5/0x40 mm/kasan/report.c:321
check_memory_region_inline mm/kasan/generic.c:185 [inline]
check_memory_region+0x123/0x190 mm/kasan/generic.c:191
kasan_check_write+0x14/0x20 mm/kasan/common.c:108
atomic_dec_and_test include/asm-generic/atomic-instrumented.h:747 [inline]
fib6_info_release include/net/ip6_fib.h:294 [inline]
fib6_info_release include/net/ip6_fib.h:292 [inline]
fib6_drop_pcpu_from net/ipv6/ip6_fib.c:927 [inline]
fib6_purge_rt+0x4f6/0x670 net/ipv6/ip6_fib.c:960
fib6_del_route net/ipv6/ip6_fib.c:1813 [inline]
fib6_del+0xac2/0x10a0 net/ipv6/ip6_fib.c:1844
fib6_clean_node+0x3a8/0x590 net/ipv6/ip6_fib.c:2006
fib6_walk_continue+0x495/0x900 net/ipv6/ip6_fib.c:1928
fib6_walk+0x9d/0x100 net/ipv6/ip6_fib.c:1976
fib6_clean_tree+0xe0/0x120 net/ipv6/ip6_fib.c:2055
__fib6_clean_all+0x118/0x2a0 net/ipv6/ip6_fib.c:2071
fib6_clean_all+0x2b/0x40 net/ipv6/ip6_fib.c:2082
rt6_sync_down_dev+0x134/0x150 net/ipv6/route.c:4057
rt6_disable_ip+0x27/0x5f0 net/ipv6/route.c:4062
addrconf_ifdown+0xa2/0x1220 net/ipv6/addrconf.c:3705
addrconf_notify+0x19a/0x2260 net/ipv6/addrconf.c:3630
notifier_call_chain+0xc7/0x240 kernel/notifier.c:93
__raw_notifier_call_chain kernel/notifier.c:394 [inline]
raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:401
call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1753
call_netdevice_notifiers_extack net/core/dev.c:1765 [inline]
call_netdevice_notifiers net/core/dev.c:1779 [inline]
dev_close_many+0x33f/0x6f0 net/core/dev.c:1522
rollback_registered_many+0x43b/0xfd0 net/core/dev.c:8177
rollback_registered+0x109/0x1d0 net/core/dev.c:8242
unregister_netdevice_queue net/core/dev.c:9289 [inline]
unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9282
unregister_netdevice include/linux/netdevice.h:2658 [inline]
__tun_detach+0xd5b/0x1000 drivers/net/tun.c:727
tun_detach drivers/net/tun.c:744 [inline]
tun_chr_close+0xe0/0x180 drivers/net/tun.c:3443
__fput+0x2e5/0x8d0 fs/file_table.c:278
____fput+0x16/0x20 fs/file_table.c:309
task_work_run+0x14a/0x1c0 kernel/task_work.c:113
exit_task_work include/linux/task_work.h:22 [inline]
do_exit+0x90a/0x2fa0 kernel/exit.c:876
do_group_exit+0x135/0x370 kernel/exit.c:980
__do_sys_exit_group kernel/exit.c:991 [inline]
__se_sys_exit_group kernel/exit.c:989 [inline]
__x64_sys_exit_group+0x44/0x50 kernel/exit.c:989
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458da9
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffeafc2a6a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 000000000000001c RCX: 0000000000458da9
RDX: 0000000000412a80 RSI: 0000000000a54ef0 RDI: 0000000000000043
RBP: 00000000004be552 R08: 000000000000000c R09: 000000000004c0d1
R10: 0000000002341940 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00007ffeafc2a7f0 R14: 000000000004c065 R15: 00007ffeafc2a800
Fixes: a68886a69180 ("net/ipv6: Make from in rt6_info rcu protected")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Wei Wang <weiwan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that all drivers can be probed using more traditional methods,
remove the legacy probe code.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This hides the need to perform a two-phase transaction and construct a
switchdev_obj_port_vlan struct.
Call graph (including a function that will be introduced in a follow-up
patch) looks like this now (same for the *_vlan_del function):
dsa_slave_vlan_rx_add_vid dsa_port_setup_8021q_tagging
| |
| |
| +-------------+
| |
v v
dsa_port_vid_add dsa_slave_port_obj_add
| |
+-------+ +-------+
| |
v v
dsa_port_vlan_add
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Even if VLAN filtering is global, DSA will call this callback once per
each port. Drivers should not have to compare the global state with the
requested change. So let DSA do it.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The current behavior is not as obvious as one would assume (which is
that, if the driver set vlan_filtering_is_global = 1, then checking any
dp->vlan_filtering would yield the same result). Only the ports which
are actively enslaved into a bridge would have vlan_filtering set.
This makes it tricky for drivers to check what the global state is.
So fix this and make the struct dsa_switch hold this global setting.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When ports are standalone (after they left the bridge), they should have
no VLAN filtering semantics (they should pass all traffic to the CPU).
Currently this is not true for switchdev drivers, because the bridge
"forgets" to unset that.
Normally one would think that doing this at the bridge layer would be a
better idea, i.e. call br_vlan_filter_toggle() from br_del_if(), similar
to how nbp_vlan_init() is called from br_add_if().
However what complicates that approach, and makes this one preferable,
is the fact that for the bridge core, vlan_filtering is a per-bridge
setting, whereas for switchdev/DSA it is per-port. Also there are
switches where the setting is per the entire device, and unsetting
vlan_filtering one by one, for each leaving port, would not be possible
from the bridge core without a certain level of awareness. So do this in
DSA and let drivers be unaware of it.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On some switches, the action of whether to parse VLAN frame headers and use
that information for ingress admission is configurable, but not per
port. Such is the case for the Broadcom BCM53xx and the NXP SJA1105
families, for example. In that case, DSA can prevent the bridge core
from trying to apply different VLAN filtering settings on net devices
that belong to the same switch.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Suggested-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This allows drivers to query the VLAN setting imposed by the bridge
driver directly from DSA, instead of keeping their own state based on
the .port_vlan_filtering callback.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before taking a refcount on a rcu protected structure,
we need to make sure the refcount is not zero.
syzbot reported :
refcount_t: increment on 0; use-after-free.
WARNING: CPU: 1 PID: 23533 at lib/refcount.c:156 refcount_inc_checked lib/refcount.c:156 [inline]
WARNING: CPU: 1 PID: 23533 at lib/refcount.c:156 refcount_inc_checked+0x61/0x70 lib/refcount.c:154
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 23533 Comm: syz-executor.2 Not tainted 5.1.0-rc7+ #93
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
panic+0x2cb/0x65c kernel/panic.c:214
__warn.cold+0x20/0x45 kernel/panic.c:571
report_bug+0x263/0x2b0 lib/bug.c:186
fixup_bug arch/x86/kernel/traps.c:179 [inline]
fixup_bug arch/x86/kernel/traps.c:174 [inline]
do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
RIP: 0010:refcount_inc_checked lib/refcount.c:156 [inline]
RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:154
Code: 1d 98 2b 2a 06 31 ff 89 de e8 db 2c 40 fe 84 db 75 dd e8 92 2b 40 fe 48 c7 c7 20 7a a1 87 c6 05 78 2b 2a 06 01 e8 7d d9 12 fe <0f> 0b eb c1 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41
RSP: 0018:ffff888069f0fba8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000f353 RSI: ffffffff815afcb6 RDI: ffffed100d3e1f67
RBP: ffff888069f0fbb8 R08: ffff88809b1845c0 R09: ffffed1015d23ef1
R10: ffffed1015d23ef0 R11: ffff8880ae91f787 R12: ffff8880a8f26968
R13: 0000000000000004 R14: dffffc0000000000 R15: ffff8880a49a6440
l2tp_tunnel_inc_refcount net/l2tp/l2tp_core.h:240 [inline]
l2tp_tunnel_get+0x250/0x580 net/l2tp/l2tp_core.c:173
pppol2tp_connect+0xc00/0x1c70 net/l2tp/l2tp_ppp.c:702
__sys_connect+0x266/0x330 net/socket.c:1808
__do_sys_connect net/socket.c:1819 [inline]
__se_sys_connect net/socket.c:1816 [inline]
__x64_sys_connect+0x73/0xb0 net/socket.c:1816
Fixes: 54652eb12c1b ("l2tp: hold tunnel while looking up sessions in l2tp_netlink")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If register_snap_client fails in atalk_init,
error code should be set, otherwise it will
triggers NULL pointer dereference while unloading
module.
Fixes: 9804501fa122 ("appletalk: Fix potential NULL pointer dereference in unregister_snap_client")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In rxrpc_destroy_all_calls(), there are two phases: (1) make sure the
->calls list is empty, emitting error messages if not, and (2) wait for the
RCU cleanup to happen on outstanding calls (ie. ->nr_calls becomes 0).
To avoid taking the call_lock, the function prechecks ->calls and if empty,
it returns to avoid taking the lock - this is wrong, however: it still
needs to go and do the second phase and wait for ->nr_calls to become 0.
Without this, the rxrpc_net struct may get deallocated before we get to the
RCU cleanup for the last calls. This can lead to:
Slab corruption (Not tainted): kmalloc-16k start=ffff88802b178000, len=16384
050: 6b 6b 6b 6b 6b 6b 6b 6b 61 6b 6b 6b 6b 6b 6b 6b kkkkkkkkakkkkkkk
Note the "61" at offset 0x58. This corresponds to the ->nr_calls member of
struct rxrpc_net (which is >9k in size, and thus allocated out of the 16k
slab).
Fix this by flipping the condition on the if-statement, putting the locked
section inside the if-body and dropping the return from there. The
function will then always go on to wait for the RCU cleanup on outstanding
calls.
Fixes: 2baec2c3f854 ("rxrpc: Support network namespacing")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2019-04-30
1) A lot of work to remove indirections from the xfrm code.
From Florian Westphal.
2) Support ESP offload in combination with gso partial.
From Boris Pismenny.
3) Remove some duplicated code from vti4.
From Jeremy Sowden.
Please note that there is merge conflict
between commit:
8742dc86d0c7 ("xfrm4: Fix uninitialized memory read in _decode_session4")
from the ipsec tree and commit:
c53ac41e3720 ("xfrm: remove decode_session indirection from afinfo_policy")
from the ipsec-next tree. The merge conflict will appear
when those trees get merged during the merge window.
The conflict can be solved as it is done in linux-next:
https://lkml.org/lkml/2019/4/25/1207
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2019-04-30
1) Fix an out-of-bound array accesses in __xfrm_policy_unlink.
From YueHaibing.
2) Reset the secpath on failure in the ESP GRO handlers
to avoid dereferencing an invalid pointer on error.
From Myungho Jung.
3) Add and revert a patch that tried to add rcu annotations
to netns_xfrm. From Su Yanjun.
4) Wait for rcu callbacks before freeing xfrm6_tunnel_spi_kmem.
From Su Yanjun.
5) Fix forgotten vti4 ipip tunnel deregistration.
From Jeremy Sowden:
6) Remove some duplicated log messages in vti4.
From Jeremy Sowden.
7) Don't use IPSEC_PROTO_ANY when flushing states because
this will flush only IPsec portocol speciffic states.
IPPROTO_ROUTING states may remain in the lists when
doing net exit. Fix this by replacing IPSEC_PROTO_ANY
with zero. From Cong Wang.
8) Add length check for UDP encapsulation to fix "Oversized IP packet"
warnings on receive side. From Sabrina Dubroca.
9) Fix xfrm interface lookup when the interface is associated to
a vrf layer 3 master device. From Martin Willi.
10) Reload header pointers after pskb_may_pull() in _decode_session4(),
otherwise we may read from uninitialized memory.
11) Update the documentation about xfrm[46]_gc_thresh, it
is not used anymore after the flowcache removal.
From Nicolas Dichtel.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a spelling mistake in the module description. Fix this.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The 'id' key returns the unique id of the conntrack entry as returned
by nf_ct_get_id().
Signed-off-by: Brett Mastbergen <bmastbergen@untangle.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This improves the original commit 17c357efe5ec ("openvswitch: load
NAT helper") where it unconditionally tries to load the module for
every flow using NAT, so not efficient when loading multiple flows.
It also doesn't hold any references to the NAT module while the
flow is active.
This change fixes those problems. It will try to load the module
only if it's not present. It grabs a reference to the NAT module
and holds it while the flow is active. Finally, an error message
shows up if either actions above fails.
Fixes: 17c357efe5ec ("openvswitch: load NAT helper")
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Register amanda, ftp, irc, sip and tftp NAT helpers.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The API allows a conntrack helper to indicate its corresponding
NAT helper which then can be loaded and reference counted.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Each NAT helper creates a module alias which follows a pattern.
Use macros for consistency.
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We use the zero and one to limit the boolean options setting.
After this patch we only set 0 or 1 to boolean options for nf
conntrack sysctl.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This file clearly uses modular infrastructure but does not call
out the inclusion of <linux/module.h> explicitly. We add that
include explicitly here, so we can tidy up some header usage
elsewhere w/o causing build breakage.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
The nf_tables.h header is used in a lot of files, but it turns out
that there is only one actual user of nft_expr_clone().
Hence we relocate that function to be with the one consumer of it
and avoid having to process it with CPP for all the other files.
This will also enable a reduction in the other headers that the
nf_tables.h itself has to include just to be stand-alone, hence
a pending further significant reduction in the CPP content that
needs to get processed for each netfilter file.
Note that the explicit "inline" has been dropped as part of this
relocation. In similar changes to this, I believe Dave has asked
this be done, so we free up gcc to make the choice of whether to
inline or not.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
syzbot was able to catch a use-after-free read in pid_nr_ns() [1]
ip6fl_seq_show() seems to use RCU protection, dereferencing fl->owner.pid
but fl_free() releases fl->owner.pid before rcu grace period is started.
[1]
BUG: KASAN: use-after-free in pid_nr_ns+0x128/0x140 kernel/pid.c:407
Read of size 4 at addr ffff888094012a04 by task syz-executor.0/18087
CPU: 0 PID: 18087 Comm: syz-executor.0 Not tainted 5.1.0-rc6+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
__asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
pid_nr_ns+0x128/0x140 kernel/pid.c:407
ip6fl_seq_show+0x2f8/0x4f0 net/ipv6/ip6_flowlabel.c:794
seq_read+0xad3/0x1130 fs/seq_file.c:268
proc_reg_read+0x1fe/0x2c0 fs/proc/inode.c:227
do_loop_readv_writev fs/read_write.c:701 [inline]
do_loop_readv_writev fs/read_write.c:688 [inline]
do_iter_read+0x4a9/0x660 fs/read_write.c:922
vfs_readv+0xf0/0x160 fs/read_write.c:984
kernel_readv fs/splice.c:358 [inline]
default_file_splice_read+0x475/0x890 fs/splice.c:413
do_splice_to+0x12a/0x190 fs/splice.c:876
splice_direct_to_actor+0x2d2/0x970 fs/splice.c:953
do_splice_direct+0x1da/0x2a0 fs/splice.c:1062
do_sendfile+0x597/0xd00 fs/read_write.c:1443
__do_sys_sendfile64 fs/read_write.c:1498 [inline]
__se_sys_sendfile64 fs/read_write.c:1490 [inline]
__x64_sys_sendfile64+0x15a/0x220 fs/read_write.c:1490
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458da9
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f300d24bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 0000000000458da9
RDX: 00000000200000c0 RSI: 0000000000000008 RDI: 0000000000000007
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 000000000000005a R11: 0000000000000246 R12: 00007f300d24c6d4
R13: 00000000004c5fa3 R14: 00000000004da748 R15: 00000000ffffffff
Allocated by task 17543:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_kmalloc mm/kasan/common.c:497 [inline]
__kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
slab_post_alloc_hook mm/slab.h:437 [inline]
slab_alloc mm/slab.c:3393 [inline]
kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
alloc_pid+0x55/0x8f0 kernel/pid.c:168
copy_process.part.0+0x3b08/0x7980 kernel/fork.c:1932
copy_process kernel/fork.c:1709 [inline]
_do_fork+0x257/0xfd0 kernel/fork.c:2226
__do_sys_clone kernel/fork.c:2333 [inline]
__se_sys_clone kernel/fork.c:2327 [inline]
__x64_sys_clone+0xbf/0x150 kernel/fork.c:2327
do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Freed by task 7789:
save_stack+0x45/0xd0 mm/kasan/common.c:75
set_track mm/kasan/common.c:87 [inline]
__kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
__cache_free mm/slab.c:3499 [inline]
kmem_cache_free+0x86/0x260 mm/slab.c:3765
put_pid.part.0+0x111/0x150 kernel/pid.c:111
put_pid+0x20/0x30 kernel/pid.c:105
fl_free+0xbe/0xe0 net/ipv6/ip6_flowlabel.c:102
ip6_fl_gc+0x295/0x3e0 net/ipv6/ip6_flowlabel.c:152
call_timer_fn+0x190/0x720 kernel/time/timer.c:1325
expire_timers kernel/time/timer.c:1362 [inline]
__run_timers kernel/time/timer.c:1681 [inline]
__run_timers kernel/time/timer.c:1649 [inline]
run_timer_softirq+0x652/0x1700 kernel/time/timer.c:1694
__do_softirq+0x266/0x95a kernel/softirq.c:293
The buggy address belongs to the object at ffff888094012a00
which belongs to the cache pid_2 of size 88
The buggy address is located 4 bytes inside of
88-byte region [ffff888094012a00, ffff888094012a58)
The buggy address belongs to the page:
page:ffffea0002500480 count:1 mapcount:0 mapping:ffff88809a483080 index:0xffff888094012980
flags: 0x1fffc0000000200(slab)
raw: 01fffc0000000200 ffffea00018a3508 ffffea0002524a88 ffff88809a483080
raw: ffff888094012980 ffff888094012000 000000010000001b 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888094012900: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
ffff888094012980: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
>ffff888094012a00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
^
ffff888094012a80: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
ffff888094012b00: fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc fc
Fixes: 4f82f45730c6 ("net ip6 flowlabel: Make owner a union of struct pid * and kuid_t")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sending dest unreach
When there is no route to an IPv6 dest addr, skb_dst(skb) points
to loopback dev in the case of that the IP6CB(skb)->iif is
enslaved to a vrf. This causes Ip6InNoRoutes to be incremented on the
loopback dev. This also causes the lookup to fail on icmpv6_send() and
the dest unreachable to not sent and Ip6OutNoRoutes gets incremented on
the loopback dev.
To reproduce:
* Gateway configuration:
ip link add dev vrf_258 type vrf table 258
ip link set dev enp0s9 master vrf_258
ip addr add 66:1/64 dev enp0s9
ip -6 route add unreachable default metric 8192 table 258
sysctl -w net.ipv6.conf.all.forwarding=1
sysctl -w net.ipv6.conf.enp0s9.forwarding=1
* Sender configuration:
ip addr add 66::2/64 dev enp0s9
ip -6 route add default via 66::1
and ping 67::1 for example from the sender.
Fix this by counting on the original netdev and reset the skb dst to
force a fresh lookup.
v2: Fix typo of destination address in the repro steps.
v3: Simplify the loopback check (per David Ahern) and use reverse
Christmas tree format (per David Miller).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Richard and Bruno both reported that my commit added a bug,
and Bruno was able to determine the problem came when a segment
wih a FIN packet was coalesced to a prior one in tcp backlog queue.
It turns out the header prediction in tcp_rcv_established()
looks back to TCP headers in the packet, not in the metadata
(aka TCP_SKB_CB(skb)->tcp_flags)
The fast path in tcp_rcv_established() is not supposed to
handle a FIN flag (it does not call tcp_fin())
Therefore we need to make sure to propagate the FIN flag,
so that the coalesced packet does not go through the fast path,
the same than a GRO packet carrying a FIN flag.
While we are at it, make sure we do not coalesce packets with
RST or SYN, or if they do not have ACK set.
Many thanks to Richard and Bruno for pinpointing the bad commit,
and to Richard for providing a first version of the fix.
Fixes: 4f693b55c3d2 ("tcp: implement coalescing on backlog queue")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Reported-by: Bruno Prémont <bonbons@sysophe.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A request for a flowlabel fails in process or user exclusive mode must
fail if the caller pid or uid does not match. Invert the test.
Previously, the test was unsafe wrt PID recycling, but indeed tested
for inequality: fl1->owner != fl->owner
Fixes: 4f82f45730c68 ("net ip6 flowlabel: Make owner a union of struct pid* and kuid_t")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
* fix use-after-free in mac80211 TXQs
* fix RX STBC byte order
* fix debugfs rename crashing due to ERR_PTR()
* fix missing regulatory notification
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make the CONFIG symbols tristate and add help text.
The broadcom and Microchip KSZ tag drivers support two different
tagging protocols in one driver. Add a configuration option for the
drivers, and then options to select the protocol.
Create a submenu for the tagging drivers.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2:
tab/space cleanup
Help text wording
NET_DSA_TAG_BRCM_COMMON and NET_DSA_TAG_KZS_COMMON hidden
v3:
More tabification
Punctuation
v4:
trailler->trailer
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It is possible that the driver is compiled with both
CONFIG_NET_DSA_TAG_BRCM and CONFIG_NET_DSA_TAG_BRCM_PREPEND disabled.
This results in warnings about unused symbols. Add some conditional
compilation to avoid this.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2
Reorder patch to before tag drivers can be modules
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that tag drivers dynamically register, we don't need the static
table. Remove it. This also means the tag driver structures can be
made static.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Implement the _get and _put functions to make use of the list of tag
drivers. Also, trigger the loading of the module, based on the alias
information. The _get function takes a reference on the tag driver, so
it cannot be unloaded, and the _put function releases the reference.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2:
Make tag_driver_register void
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When a DSA switch driver is unloaded, the lock on the tag driver
should be released so the module can be unloaded. Add the needed calls,
but leave the actual release code as a stub.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dsa_resolve_tag_protocol() is used to find the tagging driver needed
by a switch driver. When the tagging drivers become modules, it will
be necassary to take a reference on the module to prevent it being
unloaded. So rename this function to _get() to indicate it has some
locking properties.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The none tagger is special in that it does not live in a tag_*.c file,
but is within the core. Register/unregister when DSA is
loaded/unloaded.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Let the tag drivers register themselves with the DSA core, keeping
them in a linked list.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
v2
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A DSA tag driver module will need to register the tag protocols it
implements with the DSA core. Add macros containing this boiler plate.
The registration/unregistration code is currently just a stub. A Later
patch will add the real implementation.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2
Fix indent of #endif
Rewrite to move list pointer into a new structure
v3
Move kdoc next to macro
Fix THIS_MODULE indentation
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order that we can match the tagging protocol a switch driver
request to the tagger, we need to know what protocol the tagger
supports. Add this information to the ops structure.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2
More tag protocol to end of structure to keep hot members at the beginning.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All the tag drivers are some variant of GPL. Add a MODULE_LICENSE()
indicating this, so the drivers can later be compiled as modules.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When the tag drivers become modules, we will need to dynamically load
them based on what the switch drivers need. Add aliases to map between
the TAG protocol and the driver.
In order to do this, we need the tag protocol number as something
which the C pre-processor can stringinfy. Only the compiler knows the
value of an enum, CPP cannot use them. So add #defines.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rather than keep a list to map a tagger ops to a name, place the name
into the ops structure. This removes the hard coded list, a step
towards making the taggers more dynamic.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
v2:
Move name to end of structure, keeping the hot entries at the beginning.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add an SPDX header, and remove the license boilerplate text.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-04-28
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Introduce BPF socket local storage map so that BPF programs can store
private data they associate with a socket (instead of e.g. separate hash
table), from Martin.
2) Add support for bpftool to dump BTF types. This is done through a new
`bpftool btf dump` sub-command, from Andrii.
3) Enable BPF-based flow dissector for skb-less eth_get_headlen() calls which
was currently not supported since skb was used to lookup netns, from Stanislav.
4) Add an opt-in interface for tracepoints to expose a writable context
for attached BPF programs, used here for NBD sockets, from Matt.
5) BPF xadd related arm64 JIT fixes and scalability improvements, from Daniel.
6) Change the skb->protocol for bpf_skb_adjust_room() helper in order to
support tunnels such as sit. Add selftests as well, from Willem.
7) Various smaller misc fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, the UDP GRO code path does bad things on some edge
conditions - Aggregation can happen even on packet with different
lengths.
Fix the above by rewriting the 'complete' condition for GRO
packets. While at it, note explicitly that we allow merging the
first packet per burst below gso_size.
Reported-by: Sean Tong <seantong114@gmail.com>
Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|