Age | Commit message (Collapse) | Author |
|
Similarly to what was done with commit a52956dfc503 ("net sched actions:
fix refcnt leak in skbmod"), fix the error path of tcf_vlan_init() to avoid
refcnt leaks when wrong value of TCA_VLAN_PUSH_VLAN_PROTOCOL is given.
Fixes: 5026c9b1bafc ("net sched: vlan action fix late binding")
CC: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently NOLOCK qdiscs pay a measurable overhead to atomically
manipulate the __QDISC_STATE_RUNNING. Such bit is flipped twice per
packet in the uncontended scenario with packet rate below the
line rate: on packed dequeue and on the next, failing dequeue attempt.
This changeset moves the bit manipulation into the qdisc_run_{begin,end}
helpers, so that the bit is now flipped only once per packet, with
measurable performance improvement in the uncontended scenario.
This also allows simplifying the qdisc teardown code path - since
qdisc_is_running() is now effective for each qdisc type - and avoid a
possible race between qdisc_run() and dev_deactivate_many(), as now
the some_qdisc_is_busy() can properly detect NOLOCK qdiscs being busy
dequeuing packets.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller found a reliable way to crash the host, hitting a BUG()
in __tcp_retransmit_skb()
Malicous MSG_FASTOPEN is the root cause. We need to purge write queue
in tcp_connect_init() at the point we init snd_una/write_seq.
This patch also replaces the BUG() by a less intrusive WARN_ON_ONCE()
kernel BUG at net/ipv4/tcp_output.c:2837!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837
RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206
RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49
RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005
RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2
R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad
R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80
FS: 0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<IRQ>
tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923
tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488
tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573
tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593
call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
expire_timers kernel/time/timer.c:1363 [inline]
__run_timers+0x79e/0xc50 kernel/time/timer.c:1666
run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
__do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
invoke_softirq kernel/softirq.c:365 [inline]
irq_exit+0x1d1/0x200 kernel/softirq.c:405
exiting_irq arch/x86/include/asm/apic.h:525 [inline]
smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Avoid to run the processing in smc_lgr_terminate() more than once,
remember when the link group termination is triggered.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Drop incoming messages when the link is flagged as inactive.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before smc_lgr_free() is called the link must be set inactive by calling
smc_llc_link_inactive().
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Always set a reason_code when smc_conn_create() returns an error code.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
SMC handles deferred work in tasklets. As tasklets cannot sleep this
can result in rare EBUSY conditions, so defer this work in a work queue.
The high level api functions do not defer work because they can sleep
until the llc send is actually completed.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move the llc layer specific initialization and cleanup out of smc_core.c
into smc_llc.c (smc_llc_link_init and smc_llc_link_clear). Move all
initialization of a link into the new init function.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Make smc_llc_send_test_link() static and remove it from the header file.
And to send a test_link response set the response flag and send the
message back as-is, without using smc_llc_send_test_link(). And because
smc_llc_send_test_link() must no longer send responses, remove the
response flag handling from the function.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove an unneeded (void *) cast from the calls to
smc_llc_send_message(). No functional changes.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Register new rmb buffers with the remote peer by exchanging a
confirm_rkey llc message.
Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If TCP_NODELAY is set or TCP_CORK is reset, setsockopt triggers the
tx worker. This does not make sense, if the SMC socket switched to
the TCP fallback when the connection is created. This patch adds
the additional check for the fallback case.
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
And switch to proc_create_single_data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
And remove proc boilerplate code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls. Switch to use
proc_create_seq where applicable.
Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
And use proc private data directly instead of doing a detour
through seq->private and private state.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
And remove proc boilerplate code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
And use proc private data directly instead of doing a detour
through seq->private.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
And use proc private data directly instead of doing a detour
through seq->private.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Variant of proc_create_data that directly take a seq_file show
callback and deals with network namespaces in ->open and ->release.
All callers of proc_create + single_open_net converted over, and
single_{open,release}_net are removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Variants of proc_create{,_data} that directly take a struct seq_operations
and deal with network namespaces in ->open and ->release. All callers of
proc_create + seq_open_net converted over, and seq_{open,release}_net are
removed entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Just use the address family from the proc private data instead of copying
it into per-file data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Remove a couple indirections to make the code look like most other
protocols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The code should be using the pid namespace from the procfs mount
instead of trying to look it up during open.
Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Pass the hashtable to the proc private data instead of copying
it into the per-file private data.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Remove the pointless ping_seq_afinfo indirection and make the code look
like most other protocols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Avoid most of the afinfo indirections and just call the proc helpers
directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Remove a couple indirections to make the code look like most other
protocols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduces the boilerplate code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Variant of proc_create_data that directly take a struct seq_operations
argument + a private state size and drastically reduces the boilerplate
code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Variants of proc_create{,_data} that directly take a struct seq_operations
argument and drastically reduces the boilerplate code in the callers.
All trivial callers converted over.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Sockmap is currently backed by an array and enforces keys to be
four bytes. This works well for many use cases and was originally
modeled after devmap which also uses four bytes keys. However,
this has become limiting in larger use cases where a hash would
be more appropriate. For example users may want to use the 5-tuple
of the socket as the lookup key.
To support this add hash support.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch only refactors the existing sockmap code. This will allow
much of the psock initialization code path and bpf helper codes to
work for both sockmap bpf map types that are backed by an array, the
currently supported type, and the new hash backed bpf map type
sockhash.
Most the fallout comes from three changes,
- Pushing bpf programs into an independent structure so we
can use it from the htab struct in the next patch.
- Generalizing helpers to use void *key instead of the hardcoded
u32.
- Instead of passing map/key through the metadata we now do
the lookup inline. This avoids storing the key in the metadata
which will be useful when keys can be longer than 4 bytes. We
rename the sk pointers to sk_redir at this point as well to
avoid any confusion between the current sk pointer and the
redirect pointer sk_redir.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
A collection of fixups from previous patches, left for later to not
introduce unnecessary changes while moving code around.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pre-compute these so the compiler won't reload them (due to
no-strict-aliasing).
Changes since v2:
- Do not replace a return with a break in sctp_outq_flush_data
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With this struct we avoid passing lots of variables around and taking care
of updating the current transport/packet.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove an inner one, which tended to be error prone due to the cascading
and it can be replaced by a simple if ().
Rework the outer one so that the actual flush code is not inside it. Now
we first validate if we can or cannot send data, return if not, and then
the flush code.
Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Retransmissions may be triggered when in user context, so lets make use
of gfp.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To the new sctp_outq_flush_transports.
Comment on Nagle is outdated and removed. Nagle is performed earlier, while
checking if the chunk fits the packet: if the outq length is not enough to
fill the packet, it returns SCTP_XMIT_DELAY.
So by when it gets to sctp_outq_flush_transports, it has to go through all
enlisted transports.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
To the new sctp_outq_flush_data. Again, smaller functions and with well
defined objectives.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch renames current sctp_outq_flush_rtx to __sctp_outq_flush_rtx
and create a new sctp_outq_flush_rtx, with the code that was on
sctp_outq_flush. Again, the idea is to have functions with small and
defined objectives.
Yes, there is an open-coded path selection in the now sctp_outq_flush_rtx.
That is kept as is for now because it may be very different when we
implement retransmission path selection algorithms for CMT-SCTP.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Named sctp_outq_flush_ctrl and, with that, keep the contexts contained.
One small fix embedded is the reset of one_packet at every iteration.
This allows bundling of some control chunks in case they were preceeded by
another control chunk that cannot be bundled.
Other than this, it has the same behavior.
Changes since v2:
- Fixed panic reported by kbuild test robot if building with
only up to this patch applied, due to bad parameter to
sctp_outq_select_transport and by not initializing packet after
calling sctp_outq_flush_ctrl.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We had two spots doing such complex operation and they were very close to
each other, a bit more tailored to here or there.
This patch unifies these under the same function,
sctp_outq_select_transport, which knows how to handle control chunks and
original transmissions (but not retransmissions).
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Factor out the code for generating singletons. It's used only once, but
helps to keep the context contained.
The const variables are to ease the reading of subsequent calls in there.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Recognizing that the audit context is an internal audit value, use an
access function to retrieve the audit context pointer for the task
rather than reaching directly into the task struct to get it.
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
[PM: merge fuzz in auditsc.c and selinuxfs.c, checkpatch.pl fixes]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
It's possible to crash the kernel in several different ways by sending
messages to the SMC_PNETID generic netlink family that are missing the
expected attributes:
- Missing SMC_PNETID_NAME => null pointer dereference when comparing
names.
- Missing SMC_PNETID_ETHNAME => null pointer dereference accessing
smc_pnetentry::ndev.
- Missing SMC_PNETID_IBNAME => null pointer dereference accessing
smc_pnetentry::smcibdev.
- Missing SMC_PNETID_IBPORT => out of bounds array access to
smc_ib_device::pattr[-1].
Fix it by validating that all expected attributes are present and that
SMC_PNETID_IBPORT is nonzero.
Reported-by: syzbot+5cd61039dc9b8bfa6e47@syzkaller.appspotmail.com
Fixes: 6812baabf24d ("smc: establish pnet table management")
Cc: <stable@vger.kernel.org> # v4.11+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Otherwise we will leak a reference to the network namespace.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
syzbot found a way to trigger an infinitie loop by overflowing
@offset variable that has been forced to use u16 for some very
obscure reason in the past.
We probably want to look at NEXTHDR_FRAGMENT handling which looks
wrong, in a separate patch.
In net-next, we shall try to use skb_header_pointer() instead of
pskb_may_pull().
watchdog: BUG: soft lockup - CPU#1 stuck for 134s! [syz-executor738:4553]
Modules linked in:
irq event stamp: 13885653
hardirqs last enabled at (13885652): [<ffffffff878009d5>] restore_regs_and_return_to_kernel+0x0/0x2b
hardirqs last disabled at (13885653): [<ffffffff87800905>] interrupt_entry+0xb5/0xf0 arch/x86/entry/entry_64.S:625
softirqs last enabled at (13614028): [<ffffffff84df0809>] tun_napi_alloc_frags drivers/net/tun.c:1478 [inline]
softirqs last enabled at (13614028): [<ffffffff84df0809>] tun_get_user+0x1dd9/0x4290 drivers/net/tun.c:1825
softirqs last disabled at (13614032): [<ffffffff84df1b6f>] tun_get_user+0x313f/0x4290 drivers/net/tun.c:1942
CPU: 1 PID: 4553 Comm: syz-executor738 Not tainted 4.17.0-rc3+ #40
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:check_kcov_mode kernel/kcov.c:67 [inline]
RIP: 0010:__sanitizer_cov_trace_pc+0x20/0x50 kernel/kcov.c:101
RSP: 0018:ffff8801d8cfe250 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffff8801d88a8080 RBX: ffff8801d7389e40 RCX: 0000000000000006
RDX: 0000000000000000 RSI: ffffffff868da4ad RDI: ffff8801c8a53277
RBP: ffff8801d8cfe250 R08: ffff8801d88a8080 R09: ffff8801d8cfe3e8
R10: ffffed003b19fc87 R11: ffff8801d8cfe43f R12: ffff8801c8a5327f
R13: 0000000000000000 R14: ffff8801c8a4e5fe R15: ffff8801d8cfe3e8
FS: 0000000000d88940(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 00000001acab3000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
_decode_session6+0xc1d/0x14f0 net/ipv6/xfrm6_policy.c:150
__xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:2368
xfrm_decode_session_reverse include/net/xfrm.h:1213 [inline]
icmpv6_route_lookup+0x395/0x6e0 net/ipv6/icmp.c:372
icmp6_send+0x1982/0x2da0 net/ipv6/icmp.c:551
icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
ip6_input_finish+0x14e1/0x1a30 net/ipv6/ip6_input.c:305
NF_HOOK include/linux/netfilter.h:288 [inline]
ip6_input+0xe1/0x5e0 net/ipv6/ip6_input.c:327
dst_input include/net/dst.h:450 [inline]
ip6_rcv_finish+0x29c/0xa10 net/ipv6/ip6_input.c:71
NF_HOOK include/linux/netfilter.h:288 [inline]
ipv6_rcv+0xeb8/0x2040 net/ipv6/ip6_input.c:208
__netif_receive_skb_core+0x2468/0x3650 net/core/dev.c:4646
__netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4711
netif_receive_skb_internal+0x126/0x7b0 net/core/dev.c:4785
napi_frags_finish net/core/dev.c:5226 [inline]
napi_gro_frags+0x631/0xc40 net/core/dev.c:5299
tun_get_user+0x3168/0x4290 drivers/net/tun.c:1951
tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:1996
call_write_iter include/linux/fs.h:1784 [inline]
do_iter_readv_writev+0x859/0xa50 fs/read_write.c:680
do_iter_write+0x185/0x5f0 fs/read_write.c:959
vfs_writev+0x1c7/0x330 fs/read_write.c:1004
do_writev+0x112/0x2f0 fs/read_write.c:1039
__do_sys_writev fs/read_write.c:1112 [inline]
__se_sys_writev fs/read_write.c:1109 [inline]
__x64_sys_writev+0x75/0xb0 fs/read_write.c:1109
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: syzbot+0053c8...@syzkaller.appspotmail.com
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Acked-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
|