Age | Commit message (Collapse) | Author |
|
dev->name is a char array of IFNAMSIZ elements, hence can never be
null, so the null pointer check is redundant. Remove it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter updates for your net tree,
they are:
1) Dump only conntrack that belong to this namespace via /proc file.
This is some fallout from the conversion to single conntrack table
for all netns, patch from Liping Zhang.
2) Missing MODULE_ALIAS_NF_LOGGER() for the ARP family that prevents
module autoloading, also from Liping Zhang.
3) Report overquota event to the right netnamespace, again from Liping.
4) Fix tproxy listener sk refcount that leads to crash, from
Eric Dumazet.
5) Fix racy refcounting on object deletion from nfnetlink and rule
removal both for nfacct and cttimeout, from Liping Zhang.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This is required to iterate over the hash table in cttimeout, ctnetlink
and nf_conntrack_ipv4.
>> ERROR: "nf_conntrack_htable_size" [net/netfilter/nfnetlink_cttimeout.ko] undefined!
ERROR: "nf_conntrack_htable_size" [net/netfilter/nf_conntrack_netlink.ko] undefined!
ERROR: "nf_conntrack_htable_size" [net/ipv4/netfilter/nf_conntrack_ipv4.ko] undefined!
Fixes: adf0516845bcd0 ("netfilter: remove ip_conntrack* sysctl compat code")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
In general, when we want to delete a netns, cttimeout_net_exit will
be called before ipt_unregister_table, i.e. before ctnl_timeout_put.
But after call kfree_rcu in cttimeout_net_exit, we will still decrease
the timeout object's refcnt in ctnl_timeout_put, this is incorrect,
and will cause a use after free error.
It is easy to reproduce this problem:
# while : ; do
ip netns add xxx
ip netns exec xxx nfct add timeout testx inet icmp timeout 200
ip netns exec xxx iptables -t raw -p icmp -I OUTPUT -j CT --timeout testx
ip netns del xxx
done
=======================================================================
BUG kmalloc-96 (Tainted: G B E ): Poison overwritten
-----------------------------------------------------------------------
INFO: 0xffff88002b5161e8-0xffff88002b5161e8. First byte 0x6a instead of
0x6b
INFO: Allocated in cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
age=104 cpu=0 pid=3330
___slab_alloc+0x4da/0x540
__slab_alloc+0x20/0x40
__kmalloc+0x1c8/0x240
cttimeout_new_timeout+0xd4/0x240 [nfnetlink_cttimeout]
nfnetlink_rcv_msg+0x21a/0x230 [nfnetlink]
[ ... ]
So only when the refcnt decreased to 0, we call kfree_rcu to free the
timeout object. And like nfnetlink_acct do, use atomic_cmpxchg to
avoid race between ctnl_timeout_try_del and ctnl_timeout_put.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Suppose that we input the following commands at first:
# nfacct add test
# iptables -A INPUT -m nfacct --nfacct-name test
And now "test" acct's refcnt is 2, but later when we try to delete the
"test" nfacct and the related iptables rule at the same time, race maybe
happen:
CPU0 CPU1
nfnl_acct_try_del nfnl_acct_put
atomic_dec_and_test //ref=1,testfail -
- atomic_dec_and_test //ref=0,testok
- kfree_rcu
atomic_inc //ref=1 -
So after the rcu grace period, nf_acct will be freed but it is still linked
in the nfnl_acct_list, and we can access it later, then oops will happen.
Convert atomic_dec_and_test and atomic_inc combinaiton to one atomic
operation atomic_cmpxchg here to fix this problem.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Minor overlapping changes for both merge conflicts.
Resolution work done by Stephen Rothwell was used
as a reference.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
Fietkau.
2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.
3) Fix some tg3 ethtool logic bugs, and one that would cause no
interrupts to be generated when rx-coalescing is set to 0. From
Satish Baddipadige and Siva Reddy Kallam.
4) QLCNIC mailbox corruption and napi budget handling fix from Manish
Chopra.
5) Fix fib_trie logic when walking the trie during /proc/net/route
output than can access a stale node pointer. From David Forster.
6) Several sctp_diag fixes from Phil Sutter.
7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.
8) Checksum fixup fixes in bpf from Daniel Borkmann.
9) Memork leaks in nfnetlink, from Liping Zhang.
10) Use after free in rxrpc, from David Howells.
11) Use after free in new skb_array code of macvtap driver, from Jason
Wang.
12) Calipso resource leak, from Colin Ian King.
13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.
14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.
15) Fix lockdep splats in macsec, from Sabrina Dubroca.
16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
handling.
17) Various tc-action bug fixes, from CONG Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net_sched: allow flushing tc police actions
net_sched: unify the init logic for act_police
net_sched: convert tcf_exts from list to pointer array
net_sched: move tc offload macros to pkt_cls.h
net_sched: fix a typo in tc_for_each_action()
net_sched: remove an unnecessary list_del()
net_sched: remove the leftover cleanup_a()
mlxsw: spectrum: Allow packets to be trapped from any PG
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
mlxsw: spectrum: Add missing rollbacks in error path
mlxsw: reg: Fix missing op field fill-up
mlxsw: spectrum: Trap loop-backed packets
mlxsw: spectrum: Add missing packet traps
mlxsw: spectrum: Mark port as active before registering it
mlxsw: spectrum: Create PVID vPort before registering netdevice
mlxsw: spectrum: Remove redundant errors from the code
mlxsw: spectrum: Don't return upon error in removal path
i40e: check for and deal with non-contiguous TCs
ixgbe: Re-enable ability to toggle VLAN filtering
ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
...
|
|
Adapt KCM to use the stream parser. This mostly involves removing
the RX handling and setting up the strparser using the interface.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch introduces a utility for parsing application layer protocol
messages in a TCP stream. This is a generalization of the mechanism
implemented of Kernel Connection Multiplexor.
The API includes a context structure, a set of callbacks, utility
functions, and a data ready function.
A stream parser instance is defined by a strparse structure that
is bound to a TCP socket. The function to initialize the structure
is:
int strp_init(struct strparser *strp, struct sock *csk,
struct strp_callbacks *cb);
csk is the TCP socket being bound to and cb are the parser callbacks.
The upper layer calls strp_tcp_data_ready when data is ready on the lower
socket for strparser to process. This should be called from a data_ready
callback that is set on the socket:
void strp_tcp_data_ready(struct strparser *strp);
A parser is bound to a TCP socket by setting data_ready function to
strp_tcp_data_ready so that all receive indications on the socket
go through the parser. This is assumes that sk_user_data is set to
the strparser structure.
There are four callbacks.
- parse_msg is called to parse the message (returns length or error).
- rcv_msg is called when a complete message has been received
- read_sock_done is called when data_ready function exits
- abort_parser is called to abort the parser
The input to parse_msg is an skbuff which contains next message under
construction. The backend processing of parse_msg will parse the
application layer protocol headers to determine the length of
the message in the stream. The possible return values are:
>0 : indicates length of successfully parsed message
0 : indicates more data must be received to parse the message
-ESTRPIPE : current message should not be processed by the
kernel, return control of the socket to userspace which
can proceed to read the messages itself
other < 0 : Error is parsing, give control back to userspace
assuming that synchronzation is lost and the stream
is unrecoverable (application expected to close TCP socket)
In the case of error return (< 0) strparse will stop the parser
and report and error to userspace. The application must deal
with the error. To handle the error the strparser is unbound
from the TCP socket. If the error indicates that the stream
TCP socket is at recoverable point (ESTRPIPE) then the application
can read the TCP socket to process the stream. Once the application
has dealt with the exceptions in the stream, it may again bind the
socket to a strparser to continue data operations.
Note that ENODATA may be returned to the application. In this case
parse_msg returned -ESTRPIPE, however strparser was unable to maintain
synchronization of the stream (i.e. some of the message in question
was already read by the parser).
strp_pause and strp_unpause are used to provide flow control. For
instance, if rcv_msg is called but the upper layer can't immediately
consume the message it can hold the message and pause strparser.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
While commit 9c706a49d660 ("net: ipconfig: fix use after free") avoids
the use after free, the resulting code still ends up calling both the
ic_setup_if() and ic_setup_routes() after calling ic_close_devs(), and
access to the device is still required.
Move the call to ic_close_devs() to the very end of the function.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The act_police uses its own code to walk the
action hashtable, which leads to that we could
not flush standalone tc police actions, so just
switch to tcf_generic_walker() like other actions.
(Joint work from Roman and Cong.)
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jamal reported a crash when we create a police action
with a specific index, this is because the init logic
is not correct, we should always create one for this
case. Just unify the logic with other tc actions.
Fixes: a03e6fe56971 ("act_police: fix a crash during removal")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As pointed out by Jamal, an action could be shared by
multiple filters, so we can't use list to chain them
any more after we get rid of the original tc_action.
Instead, we could just save pointers to these actions
in tcf_exts, since they are refcount'ed, so convert
the list to an array of pointers.
The "ugly" part is the action API still accepts list
as a parameter, I just introduce a helper function to
convert the array of pointers to a list, instead of
relying on the C99 feature to iterate the array.
Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This list_del() for tc action is not needed actually,
because we only use this list to chain bulk operations,
therefore should not be carried for latter operations.
Fixes: ec0595cc4495 ("net_sched: get rid of struct tcf_common")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After refactoring tc_action into tcf_common, we no
longer need to cleanup temporary "actions" in list,
they are permanently stored in the hashtable.
Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
pull request for net-next: batman-adv 2016-08-16
This feature patchset is all about adding netlink support, which should
supersede our debugfs configuration interface in the long run. It is
especially necessary when batman-adv should be used in different
namespaces, since debugfs can not differentiate between those.
More specifically, the following changes are included:
- Two fixes for namespace handling by Andrew Lunn, checking also the
namespaces for parent interfaces, and supress debugfs entries
for non-default netns
- Implement various netlink commands for the new interface, by
Matthias Schiffer, Andrew Lunn, Sven Eckelmann and Simon Wunderlich
(13 patches):
* routing algorithm list
* hardif list
* translation tables (local and global)
* TTVN for the translation tables
* originator and neighbor tables for B.A.T.M.A.N. IV
and B.A.T.M.A.N. V
* gateway dump functionality for B.A.T.M.A.N. IV
and B.A.T.M.A.N. V
* Bridge Loop Avoidance claims, and corresponding BLA group
* Bridge Loop Avoidance backbone tables
- Finally, mark batman-adv as netns compatible, by Andrew Lunn (1 patch)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since commit 64b87639c9cb ("netfilter: conntrack: fix race between
nf_conntrack proc read and hash resize") introduce the
nf_conntrack_get_ht, so there's no need to check nf_conntrack_generation
again and again to get the hash table and hash size. And convert
nf_conntrack_get_ht to inline function here.
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
inet_lookup_listener() and inet6_lookup_listener() no longer
take a reference on the found listener.
This minimal patch adds back the refcounting, but we might do
this differently in net-next later.
Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Reported-and-tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We should report the over quota message to the right net namespace
instead of the init netns.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Otherwise, if nfnetlink_log.ko is not loaded, we cannot add rules
to log packets to the userspace when we specify it with arp family,
such as:
# nft add rule arp filter input log group 0
<cmdline>:1:1-37: Error: Could not process rule: No such file or
directory
add rule arp filter input log group 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We should skip the conntracks that belong to a different namespace,
otherwise other unrelated netns's conntrack entries will be dumped via
/proc/net/nf_conntrack.
Fixes: 56d52d4892d0 ("netfilter: conntrack: use a single hashtable for all namespaces")
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull virtio/vhost fixes from Michael Tsirkin:
- test fixes
- a vsock fix
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
tools/virtio: add dma stubs
vhost/test: fix after swiotlb changes
vhost/vsock: drop space available check for TX vq
ringtest: test build fix
|
|
tipc_msg_create() can return a NULL skb and if so, we shouldn't try to
call tipc_node_xmit_skb() on it.
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ #19
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000
RIP: 0010:[<ffffffff830bb46b>] [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140
RSP: 0018:ffff8800595bfce8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0
RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580
RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000
R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f
R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000
FS: 00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0
DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
Stack:
0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208
ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018
0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611
Call Trace:
[<ffffffff830c84a3>] tipc_shutdown+0x553/0x880
[<ffffffff825b4a3b>] SyS_shutdown+0x14b/0x170
[<ffffffff8100334c>] do_syscall_64+0x19c/0x410
[<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 <80> 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00
RIP [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140
RSP <ffff8800595bfce8>
---[ end trace 57b0484e351e71f1 ]---
I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure
userspace is equipped to handle that. Anyway, this is better than a GPF
and looks somewhat consistent with other tipc_msg_create() callers.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move exporting of switchdev_port_same_parent_id to be right
below it and not elsewhere.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ensure that the inner_protocol is set on transmit so that GSO segmentation,
which relies on that field, works correctly.
This is achieved by setting the inner_protocol in gre_build_header rather
than each caller of that function. It ensures that the inner_protocol is
set when gre_fb_xmit() is used to transmit GRE which was not previously the
case.
I have observed this is not the case when OvS transmits GRE using
lwtunnel metadata (which it always does).
Fixes: 38720352412a ("gre: Use inner_proto to obtain inner header protocol")
Cc: Pravin Shelar <pshelar@ovn.org>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
ping_v6_sendmsg does not set flowi6_oif in response to
sin6_scope_id or sk_bound_dev_if, so it is not possible to use
these APIs to ping an IPv6 address on a different interface.
Instead, it sets flowi6_iif, which is incorrect but harmless.
Stop setting flowi6_iif, and support various ways of setting oif
in the same priority order used by udpv6_sendmsg.
Tested: https://android-review.googlesource.com/#/c/254470/
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If net namespace is attached to a user namespace let's make container's
root owner of sysctls affecting said network namespace instead of global
root.
This also allows us to clean up net_ctl_permissions() because we do not
need to fudge permissions anymore for the container's owner since it now
owns the objects in question.
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When CONFIG_NET_NS is disabled, registering pernet operations causes
init() to be called immediately with init_net as an argument. Unfortunately
this leads to some pernet ops, such as proc_net_ns_init() to be called too
early, when init_net namespace has not been fully initialized. This causes
issues when we want to change pernet ops to use more data from the net
namespace in question, for example reference user namespace that owns our
network namespace.
To fix this we could either play game of musical chairs and rearrange init
order, or we could do the same as when CONFIG_NET_NS is enabled, and
postpone calling pernet ops->init() until namespace is set up properly.
Note that we can not simply undo commit ed160e839d2e ("[NET]: Cleanup
pernet operation without CONFIG_NET_NS") and use the same implementations
for __register_pernet_operations() and __unregister_pernet_operations(),
because many pernet ops are marked as __net_initdata and will be discarded,
which wreaks havoc on our ops lists. Here we rely on the fact that we only
use lists until init_net is fully initialized, which happens much earlier
than discarding __net_initdata sections.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Remove unnecessary use of enable/disable callback notifications
and the incorrect more space available check.
The virtio_transport_tx_work handles when the TX virtqueue
has more buffers available.
Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:
eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
However, this doesn't prevent a warning on a configuration such as:
eth0 <--- macvlan0 <--- vlan0
eth1 <--- vlan1 <--- macvlan1
In this case, all the locks end up with a nesting subclass of 1, so
lockdep thinks that there is still a deadlock:
- in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
take (vlan_netdev_xmit_lock_key, 1)
- in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
take (macvlan_netdev_addr_lock_key, 1)
By removing the linktype check in dev_get_nest_level() and always
incrementing the nesting depth, lockdep considers this configuration
valid.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If IPv6 is disabled when the option is set to keep IPv6
addresses on link down, userspace is unaware of this as
there is no such indication via netlink. The solution is to
remove the IPv6 addresses in this case, which results in
netlink messages indicating removal of addresses in the
usual manner. This fix also makes the behavior consistent
with the case of having IPv6 disabled first, which stops
IPv6 addresses from being added.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional")
Signed-off-by: Mike Manning <mmanning@brocade.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Not much for -next so far, but here it goes:
* send more nl80211 events for interfaces
* remove useless network/transport offset mangling code
* validate beacon intervals identically for all interface types
* use driver rate estimates for mesh
* fix a compiler type/signedness warning
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
sctp_transport_seq_start() does not currently clear iter->start_fail on
success, but relies on it being zero when it is allocated (by
seq_open_net()).
This can be a problem in the following sequence:
open() // allocates iter (and implicitly sets iter->start_fail = 0)
read()
- iter->start() // fails and sets iter->start_fail = 1
- iter->stop() // doesn't call sctp_transport_walk_stop() (correct)
read() again
- iter->start() // succeeds, but doesn't change iter->start_fail
- iter->stop() // doesn't call sctp_transport_walk_stop() (wrong)
We should initialize sctp_ht_iter::start_fail to zero if ->start()
succeeds, otherwise it's possible that we leave an old value of 1 there,
which will cause ->stop() to not call sctp_transport_walk_stop(), which
causes all sorts of problems like not calling rcu_read_unlock() (and
preempt_enable()), eventually leading to more warnings like this:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 16551, name: trinity-c2
Preemption disabled at:[<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
[<ffffffff81149abb>] preempt_count_add+0x1fb/0x280
[<ffffffff83295892>] _raw_spin_lock+0x12/0x40
[<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
[<ffffffff82ec665f>] sctp_transport_walk_start+0x2f/0x60
[<ffffffff82edda1d>] sctp_transport_seq_start+0x4d/0x150
[<ffffffff81439e50>] traverse+0x170/0x850
[<ffffffff8143aeec>] seq_read+0x7cc/0x1180
[<ffffffff814f996c>] proc_reg_read+0xbc/0x180
[<ffffffff813d0384>] do_loop_readv_writev+0x134/0x210
[<ffffffff813d2a95>] do_readv_writev+0x565/0x660
[<ffffffff813d6857>] vfs_readv+0x67/0xa0
[<ffffffff813d6c16>] do_preadv+0x126/0x170
[<ffffffff813d710c>] SyS_preadv+0xc/0x10
[<ffffffff8100334c>] do_syscall_64+0x19c/0x410
[<ffffffff83296225>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff
Notice that this is a subtly different stacktrace from the one in commit
5fc382d875 ("net/sctp: terminate rhashtable walk correctly").
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If iriap_register_lsap() fails to allocate memory, self->lsap is
set to NULL. However, none of the callers handle the failure and
irlmp_connect_request() will happily dereference it:
iriap_register_lsap: Unable to allocated LSAP!
================================================================================
UBSAN: Undefined behaviour in net/irda/irlmp.c:378:2
member access within null pointer of type 'struct lsap_cb'
CPU: 1 PID: 15403 Comm: trinity-c0 Not tainted 4.8.0-rc1+ #81
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org
04/01/2014
0000000000000000 ffff88010c7e78a8 ffffffff82344f40 0000000041b58ab3
ffffffff84f98000 ffffffff82344e94 ffff88010c7e78d0 ffff88010c7e7880
ffff88010630ad00 ffffffff84a5fae0 ffffffff84d3f5c0 000000000000017a
Call Trace:
[<ffffffff82344f40>] dump_stack+0xac/0xfc
[<ffffffff8242f5a8>] ubsan_epilogue+0xd/0x8a
[<ffffffff824302bf>] __ubsan_handle_type_mismatch+0x157/0x411
[<ffffffff83b7bdbc>] irlmp_connect_request+0x7ac/0x970
[<ffffffff83b77cc0>] iriap_connect_request+0xa0/0x160
[<ffffffff83b77f48>] state_s_disconnect+0x88/0xd0
[<ffffffff83b78904>] iriap_do_client_event+0x94/0x120
[<ffffffff83b77710>] iriap_getvaluebyclass_request+0x3e0/0x6d0
[<ffffffff83ba6ebb>] irda_find_lsap_sel+0x1eb/0x630
[<ffffffff83ba90c8>] irda_connect+0x828/0x12d0
[<ffffffff833c0dfb>] SYSC_connect+0x22b/0x340
[<ffffffff833c7e09>] SyS_connect+0x9/0x10
[<ffffffff81007bd3>] do_syscall_64+0x1b3/0x4b0
[<ffffffff845f946a>] entry_SYSCALL64_slow_path+0x25/0x25
================================================================================
The bug seems to have been around since forever.
There's more problems with missing error checks in iriap_init() (and
indeed all of irda_init()), but that's a bigger problem that needs
very careful review and testing. This patch will fix the most serious
bug (as it's easily reached from unprivileged userspace).
I have tested my patch with a reproducer.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix the bpf_try_make_writable() helper and all call sites we have in BPF,
it's currently defect with regards to skbs when the write_len spans into
non-linear parts, no matter if cloned or not.
There are multiple issues at once. First, using skb_store_bits() is not
correct since even if we have a cloned skb, page frags can still be shared.
To really make them private, we need to pull them in via __pskb_pull_tail()
first, which also gets us a private head via pskb_expand_head() implicitly.
This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace(),
bpf_l4_csum_replace(). Really, the only thing reasonable and working here
is to call skb_ensure_writable() before any write operation. Meaning, via
pskb_may_pull() it makes sure that parts we want to access are pulled in and
if not does so plus unclones the skb implicitly. If our write_len still fits
the headlen and we're cloned and our header of the clone is not writable,
then we need to make a private copy via pskb_expand_head(). skb_store_bits()
is a bit misleading and only safe to store into non-linear data in different
contexts such as 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can
corrupt kernel memory").
For above BPF helper functions, it means after fixed bpf_try_make_writable(),
we've pulled in enough, so that we operate always based on skb->data. Thus,
the call to skb_header_pointer() and skb_store_bits() becomes superfluous.
In bpf_skb_store_bytes(), the len check is unnecessary too since it can
only pass in maximum of BPF stack size, so adding offset is guaranteed to
never overflow. Also bpf_l3/4_csum_replace() helpers must test for proper
offset alignment since they use __sum16 pointer for writing resulting csum.
The remaining helpers that change skb data not discussed here yet are
bpf_skb_vlan_push(), bpf_skb_vlan_pop() and bpf_skb_change_proto(). The
vlan helpers internally call either skb_ensure_writable() (pop case) and
skb_cow_head() (push case, for head expansion), respectively. Similarly,
bpf_skb_proto_xlat() takes care to not mangle page frags.
Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
Fixes: 3697649ff29e ("bpf: try harder on clones when writing into skb")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, if calipso_genopt fails then the error exit path
does not free the ipv6_opt_hdr new causing a memory leak. Fix
this by kfree'ing new on the error exit path.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This backward compatibility has been around for more than ten years,
since Yasuyuki Kozakai introduced IPv6 in conntrack. These days, we have
alternate /proc/net/nf_conntrack* entries, the ctnetlink interface and
the conntrack utility got adopted by many people in the user community
according to what I observed on the netfilter user mailing list.
So let's get rid of this.
Note that nf_conntrack_htable_size and unsigned int nf_conntrack_max do
not need to be exported as symbol anymore.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.
Tejun says:
So, I think in_cgroup should mean that the object is in that
particular cgroup while under_cgroup in the subhierarchy of that
cgroup. Let's rename the other subhierarchy test to under too. I
think that'd be a lot less confusing going forward.
[...]
It's more intuitive and gives us the room to implement the real
"in" test if ever necessary in the future.
Since this touches uapi bits, we need to change this as long as v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.
Fixes: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto")
Reference: http://patchwork.ozlabs.org/patch/658500/
Suggested-by: Sargun Dhillon <sargun@sargun.me>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
Fixes the following sparse warning:
net/ipv6/sit.c:1129:6: warning:
symbol 'ipip6_valid_ip_proto' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.open-mesh.org/linux-merge
Simon Wunderlich says:
====================
This feature patchset includes the following changes (mostly
chronological order):
- bump version strings, by Simon Wunderlich
- kerneldoc clean up, by Sven Eckelmann
- enable RTNL automatic loading and according documentation
changes, by Sven Eckelmann (2 patches)
- fix/improve interface removal and associated locking, by
Sven Eckelmann (3 patches)
- clean up unused variables, by Linus Luessing
- implement Gateway selection code for B.A.T.M.A.N. V by
Antonio Quartulli (4 patches)
- rewrite TQ comparison by Markus Pargmann
- fix Cocinelle warnings on bool vs integers (by Fenguang Wu/Intels
kbuild test robot) and bitwise arithmetic operations (by Linus
Luessing)
- rewrite packet creation for forwarding for readability and to avoid
reference count mistakes, by Linus Luessing
- use kmem_cache for translation table, which results in more efficient
storing of translation table entries, by Sven Eckelmann
- rewrite/clarify reference handling for send_skb_unicast, by Sven
Eckelmann
- fix debug messages when updating routes, by Sven Eckelmann
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user
needs multiple different security services (e.g. krb5i and krb5p).
- Stable patch to fix a regression introduced by the use of
SO_REUSEPORT, and that prevented the use of multiple different NFS
versions to the same server.
- TCP socket reconnection timer fixes.
- Patch from Neil to disable the use of IPv6 temporary addresses"
* tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Cap the transport reconnection timer at 1/2 lease period
NFSv4: Cleanup the setting of the nfs4 lease period
SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
SUNRPC: Fix reconnection timeouts
NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
SUNRPC: disable the use of IPv6 temporary addresses.
SUNRPC: allow for upcalls for same uid but different gss service
SUNRPC: Fix up socket autodisconnect
SUNRPC: Handle EADDRNOTAVAIL on connection failures
|
|
This patch adds a new hash expression, this provides jhash support but
this can be extended to support for other hash functions. The modulus
and seed already comes embedded into this new expression.
Use case example:
... meta mark set hash ip saddr mod 10
Signed-off-by: Laura Garcia Liebana <nevola@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
After earlier patches conversions all spots acquire the writer lock and
we can now convert this to a normal spinlock.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
It doesn't seem that important.
We now get inconsistent view of the counters, but those are stale anyway
right after we drop the lock.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Don't acquire the readlock anymore and rely on rcu alone.
In case writer on other CPU changed policy at the wrong moment (after we
obtained sk policy pointer but before we could obtain the reference)
just repeat the lookup.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
side effect: no longer disables BH (should be fine).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
If we don't hold the policy lock anymore the refcnt might
already be 0, i.e. policy struct is about to be free'd.
Switch to atomic_inc_not_zero to avoid this.
On removal policies are already unlinked from the tables (lists)
before the last _put occurs so we are not supposed to find the same
'dead' entry on the next loop, so its safe to just repeat the lookup.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Once xfrm_policy_lookup_bytype doesn't grab xfrm_policy_lock anymore its
possible for a hash resize to occur in parallel.
Use sequence counter to block lookup in case a resize is in
progress and to also re-lookup in case hash table was altered
in the mean time (might cause use to not find the best-match).
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
Since commit 56f047305dd4b6b617
("xfrm: add rcu grace period in xfrm_policy_destroy()") xfrm policy
objects are already free'd via rcu.
In order to make more places lockless (i.e. use rcu_read_lock instead of
grabbing read-side of policy rwlock) we only need to:
- use rcu_assign_pointer to store address of new hash table backend memory
- add rcu barrier so that freeing of old memory is delayed (expansion
and free happens from system workqueue, so synchronize_rcu is fine)
- use rcu_dereference to fetch current address of the hash table.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|
|
This is required once we allow lockless readers.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
|