summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-07-30tipc: use temporary, non-protected skb queue for bundle receptionJon Paul Maloy
Currently, when we extract small messages from a message bundle, or when many messages have accumulated in the link arrival queue, those messages are added one by one to the lock protected link input queue. This may increase contention with the reader of that queue, in the function tipc_sk_rcv(). This commit introduces a temporary, unprotected input queue in tipc_link_rcv() for such cases. Only when the arrival queue has been emptied, and the function is ready to return, does it splice the whole temporary queue into the real input queue. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: remove implicit message delivery in node_unlock()Jon Paul Maloy
After the most recent changes, all access calls to a link which may entail addition of messages to the link's input queue are postpended by an explicit call to tipc_sk_rcv(), using a reference to the correct queue. This means that the potentially hazardous implicit delivery, using tipc_node_unlock() in combination with a binary flag and a cached queue pointer, now has become redundant. This commit removes this implicit delivery mechanism both for regular data messages and for binding table update messages. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: make resetting of links non-atomicJon Paul Maloy
In order to facilitate future improvements to the locking structure, we want to make resetting and establishing of links non-atomic. I.e., the functions tipc_node_link_up() and tipc_node_link_down() should be called from outside the node lock context, and grab/release the node lock themselves. This requires that we can freeze the link state from the moment it is set to RESETTING or PEER_RESET in one lock context until it is set to RESET or ESTABLISHING in a later context. The recently introduced link FSM makes this possible, so we are now ready to introduce the above change. This commit implements this. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: move received discovery data evaluation inside node.cJon Paul Maloy
The node lock is currently grabbed and and released in the function tipc_disc_rcv() in the file discover.c. As a preparation for the next commits, we need to move this node lock handling, along with the code area it is covering, to node.c. This commit introduces this change. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: merge link->exec_mode and link->state into one FSMJon Paul Maloy
Until now, we have been handling link failover and synchronization by using an additional link state variable, "exec_mode". This variable is not independent of the link FSM state, something causing a risk of inconsistencies, apart from the fact that it clutters the code. The conditions are now in place to define a new link FSM that covers all existing use cases, including failover and synchronization, and eliminate the "exec_mode" field altogether. The FSM must also support non-atomic resetting of links, which will be introduced later. The new link FSM is shown below, with 7 states and 8 events. Only events leading to state change are shown as edges. +------------------------------------+ |RESET_EVT | | | | +--------------+ | +-----------------| SYNCHING |-----------------+ | |FAILURE_EVT +--------------+ PEER_RESET_EVT| | | A | | | | | | | | | | | | | | |SYNCH_ |SYNCH_ | | | |BEGIN_EVT |END_EVT | | | | | | | V | V V | +-------------+ +--------------+ +------------+ | | RESETTING |<---------| ESTABLISHED |--------->| PEER_RESET | | +-------------+ FAILURE_ +--------------+ PEER_ +------------+ | | EVT | A RESET_EVT | | | | | | | | | | | | | +--------------+ | | | RESET_EVT| |RESET_EVT |ESTABLISH_EVT | | | | | | | | | | | | V V | | | +-------------+ +--------------+ RESET_EVT| +--->| RESET |--------->| ESTABLISHING |<----------------+ +-------------+ PEER_ +--------------+ | A RESET_EVT | | | | | | | |FAILOVER_ |FAILOVER_ |FAILOVER_ |BEGIN_EVT |END_EVT |BEGIN_EVT | | | V | | +-------------+ | | FAILINGOVER |<----------------+ +-------------+ These changes are fully backwards compatible. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: move protocol message sending away from link FSMJon Paul Maloy
The implementation of the link FSM currently takes decisions about and sends out link protocol messages. This is unnecessary, since such actions are not the result of any link state change, and are even decided based on non-FSM state information ("silent_intv_cnt"). We now move the sending of unicast link protocol messages to the function tipc_link_timeout(), and the initial broadcast synchronization message to tipc_node_link_up(). The latter is done because a link instance should not need to know whether it is the first or second link to a destination. Such information is now restricted to and handled by the link aggregation layer in node.c Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: move link synch and failover to link aggregation levelJon Paul Maloy
Link failover and synchronization have until now been handled by the links themselves, forcing them to have knowledge about and to access parallel links in order to make the two algorithms work correctly. In this commit, we move the control part of this functionality to the link aggregation level in node.c, which is the right location for this. As a result, the two algorithms become easier to follow, and the link implementation becomes simpler. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: extend node FSMJon Paul Maloy
In the next commit, we will move link synch/failover orchestration to the link aggregation level. In order to do this, we first need to extend the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER, plus four new events to enter and leave those states. This commit introduces this change, without yet making use of it. The node FSM now looks as follows: +-----------------------------------------+ | PEER_DOWN_EVT| | | +------------------------+----------------+ | |SELF_DOWN_EVT | | | | | | | | +-----------+ +-----------+ | | |NODE_ | |NODE_ | | | +----------|FAILINGOVER|<---------|SYNCHING |------------+ | | |SELF_ +-----------+ FAILOVER_+-----------+ PEER_ | | | |DOWN_EVT | A BEGIN_EVT A | DOWN_EVT| | | | | | | | | | | | | | | | | | | | |FAILOVER_|FAILOVER_ |SYNCH_ |SYNCH_ | | | | |END_EVT |BEGIN_EVT |BEGIN_EVT|END_EVT | | | | | | | | | | | | | | | | | | | | | +--------------+ | | | | | +------->| SELF_UP_ |<-------+ | | | | +----------------| PEER_UP |------------------+ | | | | |SELF_DOWN_EVT +--------------+ PEER_DOWN_EVT| | | | | | A A | | | | | | | | | | | | | | PEER_UP_EVT| |SELF_UP_EVT | | | | | | | | | | | V V V | | V V V +------------+ +-----------+ +-----------+ +------------+ |SELF_DOWN_ | |SELF_UP_ | |PEER_UP_ | |PEER_DOWN | |PEER_LEAVING|<------|PEER_COMING| |SELF_COMING|------>|SELF_LEAVING| +------------+ SELF_ +-----------+ +-----------+ PEER_ +------------+ | DOWN_EVT A A DOWN_EVT | | | | | | | | | | SELF_UP_EVT| |PEER_UP_EVT | | | | | | | | | |PEER_DOWN_EVT +--------------+ SELF_DOWN_EVT| +------------------->| SELF_DOWN_ |<--------------------+ | PEER_DOWN | +--------------+ Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: reverse call order for link_reset()->node_link_down()Jon Paul Maloy
In many cases the call order when a link is reset goes as follows: tipc_node_xx()->tipc_link_reset()->tipc_node_link_down() This is not the right order if we want the node to be in control, so in this commit we change the order to: tipc_node_xx()->tipc_node_link_down()->tipc_link_reset() The fact that tipc_link_reset() now is called from only one location with a well-defined state will also facilitate later simplifications of tipc_link_reset() and the link FSM. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: move all link_reset() calls to link aggregation levelJon Paul Maloy
In line with our effort to let the node level have full control over its links, we want to move all link reset calls from link.c to node.c. Some of the calls can be moved by simply moving the calling function, when this is the right thing to do. For the remaining calls we use the now established technique of returning a TIPC_LINK_DOWN_EVT flag from tipc_link_rcv(), whereafter we perform the reset call when the call returns. This change serves as a preparation for the coming commits. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30tipc: eliminate function tipc_link_activate()Jon Paul Maloy
The function tipc_link_activate() is redundant, since it mostly performs settings that have already been done in a preceding tipc_link_reset(). There are three exceptions to this: - The actual state change to TIPC_LINK_WORKING. This should anyway be done in the FSM, and not in a separate function. - Registration of the link with the bearer. This should be done by the node, since we don't want the link to have any knowledge about its specific bearer. - Call to tipc_node_link_up() for user access registration. With the new role distribution between link aggregation and link level this becomes the wrong call order; tipc_node_link_up() should instead be called directly as a result of a TIPC_LINK_UP event, hence by the node itself. This commit implements those changes. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-07-30 Here's a set of Bluetooth & 802.15.4 patches intended for the 4.3 kernel. - Cleanups & fixes to mac802154 - Refactoring of Intel Bluetooth HCI driver - Various coding style fixes to Bluetooth HCI drivers - Support for Intel Lightning Peak Bluetooth devices - Generic class code in interface descriptor in btusb to match more HW - Refactoring of Bluetooth HS code together with a new config option - Support for BCM4330B1 Broadcom UART controller Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30net: sk_clone_lock() should only do get_net() if the parent is not a kernel ↵Sowmini Varadhan
socket The newsk returned by sk_clone_lock should hold a get_net() reference if, and only if, the parent is not a kernel socket (making this similar to sk_alloc()). E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock sets up the syn_recv newsk from sk_clone_lock. When the parent (listen) socket is a kernel socket (defined in sk_alloc() as having sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt and should not hold a get_net() reference. Fixes: 26abe14379f8 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Acked-by: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30net/ipv6: add sysctl option accept_ra_min_hop_limitHangbin Liu
Commit 6fd99094de2b ("ipv6: Don't reduce hop limit for an interface") disabled accept hop limit from RA if it is smaller than the current hop limit for security stuff. But this behavior kind of break the RFC definition. RFC 4861, 6.3.4. Processing Received Router Advertisements A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time, and Retrans Timer) may contain a value denoting that it is unspecified. In such cases, the parameter should be ignored and the host should continue using whatever value it is already using. If the received Cur Hop Limit value is non-zero, the host SHOULD set its CurHopLimit variable to the received value. So add sysctl option accept_ra_min_hop_limit to let user choose the minimum hop limit value they can accept from RA. And set default to 1 to meet RFC standards. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30net: sched: fix refcount imbalance in actionsDaniel Borkmann
Since commit 55334a5db5cd ("net_sched: act: refuse to remove bound action outside"), we end up with a wrong reference count for a tc action. Test case 1: FOO="1,6 0 0 4294967295," BAR="1,6 0 0 4294967294," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \ action bpf bytecode "$FOO" tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 1 bind 1 tc actions replace action bpf bytecode "$BAR" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe index 1 ref 2 bind 1 tc actions replace action bpf bytecode "$FOO" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 3 bind 1 Test case 2: FOO="1,6 0 0 4294967295," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 1 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 2 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 3 bind 1 What happens is that in tcf_hash_check(), we check tcf_common for a given index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've found an existing action. Now there are the following cases: 1) We do a late binding of an action. In that case, we leave the tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init() handler. This is correctly handeled. 2) We replace the given action, or we try to add one without replacing and find out that the action at a specific index already exists (thus, we go out with error in that case). In case of 2), we have to undo the reference count increase from tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to do so because of the 'tcfc_bindcnt > 0' check which bails out early with an -EPERM error. Now, while commit 55334a5db5cd prevents 'tc actions del action ...' on an already classifier-bound action to drop the reference count (which could then become negative, wrap around etc), this restriction only accounts for invocations outside a specific action's ->init() handler. One possible solution would be to add a flag thus we possibly trigger the -EPERM ony in situations where it is indeed relevant. After the patch, above test cases have correct reference count again. Fixes: 55334a5db5cd ("net_sched: act: refuse to remove bound action outside") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-30Bluetooth: 6lowpan: Fix possible raceAlexander Aring
This patch fix a possible race after calling register_netdev. After calling netdev_register it could be possible that netdev_ops callbacks use the uninitialized private data of lowpan_dev. By moving the initialization of this data before netdev_register we can be sure that initialized private data is be used after netdev_register. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30mac802154: Fix memory corruption with global deferred transmit state.Lennert Buytenhek
When transmitting a packet via a mac802154 driver that can sleep in its transmit function, mac802154 defers the call to the driver's transmit function to a per-device workqueue. However, mac802154 uses a single global work_struct for this, which means that if you have more than one registered mac802154 interface in the system, and you transmit on more than one of them at the same time, you'll very easily cause memory corruption. This patch moves the deferred transmit processing state from global variables to struct ieee802154_local, and this seems to fix the memory corruption issue. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30netfilter: nf_conntrack: checking for IS_ERR() instead of NULLDan Carpenter
We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc() so the error handling needs to changed to check for NULL instead of IS_ERR(). Fixes: 0838aa7fcfcd ('netfilter: fix netns dependencies with conntrack templates') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-30Bluetooth: cmtp: Do not use list_for_each_safe when not neededChristophe JAILLET
There is no need to use the safe version of list_for_each here. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30netfilter: bridge: do not initialize statics to 0 or NULLBernhard Thaler
Fix checkpatch.pl "ERROR: do not initialise statics to 0 or NULL" for all statics explicitly initialized to 0. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-30netfilter: bridge: reduce nf_bridge_info to 32 bytes againFlorian Westphal
We can use union for most of the temporary cruft (original ipv4/ipv6 address, source mac, physoutdev) since they're used during different stages of br netfilter traversal. Also get rid of the last two ->mask users. Shrinks struct from 48 to 32 on 64bit arch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-30Bluetooth: Move create/accept phy link completed callback to amp.cArron Wang
To avoid amp module hooks from hci_event.c Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30Bluetooth: Move amp assoc read/write completed callback to amp.cArron Wang
To avoid amp module hooks from hci_event.c Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30Bluetooth: Move get info completed callback to a2mp.cArron Wang
To avoid a2mp module hooks from hci_event.c and send getinfo response operation only required by a2mp module, we can move this callback to a2mp.c Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30Bluetooth: Move high speed specific event under BT_HS optionArron Wang
Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30Bluetooth: Add BT_HS config optionArron Wang
Move A2MP Module under BT_HS config option and allow the user have flexible option to choose the feature only they need a2mp_discover_amp() & a2mp_channel_create() are a2mp module entry point for master and slave, and this is dynamic invoked depends on the userspace or remote request, then we defined their implementation depends on BT_HS config Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-07-30netfilter: nf_ct_sctp: minimal multihoming supportMichal Kubeček
Currently nf_conntrack_proto_sctp module handles only packets between primary addresses used to establish the connection. Any packets between secondary addresses are classified as invalid so that usual firewall configurations drop them. Allowing HEARTBEAT and HEARTBEAT-ACK chunks to establish a new conntrack would allow traffic between secondary addresses to pass through. A more sophisticated solution based on the addresses advertised in the initial handshake (and possibly also later dynamic address addition and removal) would be much harder to implement. Moreover, in general we cannot assume to always see the initial handshake as it can be routed through a different path. The patch adds two new conntrack states: SCTP_CONNTRACK_HEARTBEAT_SENT - a HEARTBEAT chunk seen but not acked SCTP_CONNTRACK_HEARTBEAT_ACKED - a HEARTBEAT acked by HEARTBEAT-ACK State transition rules: - HEARTBEAT_SENT responds to usual chunks the same way as NONE (so that the behaviour changes as little as possible) - HEARTBEAT_ACKED responds to usual chunks the same way as ESTABLISHED does, except the resulting state is HEARTBEAT_ACKED rather than ESTABLISHED - previously existing states except NONE are preserved when HEARTBEAT or HEARTBEAT-ACK is seen - NONE (in the initial direction) changes to HEARTBEAT_SENT on HEARTBEAT and to CLOSED on HEARTBEAT-ACK - HEARTBEAT_SENT changes to HEARTBEAT_ACKED on HEARTBEAT-ACK in the reply direction - HEARTBEAT_SENT and HEARTBEAT_ACKED are preserved on HEARTBEAT and HEARTBEAT-ACK otherwise Normally, vtag is set from the INIT chunk for the reply direction and from the INIT-ACK chunk for the originating direction (i.e. each of these defines vtag value for the opposite direction). For secondary conntracks, we can't rely on seeing INIT/INIT-ACK and even if we have seen them, we would need to connect two different conntracks. Therefore simplified logic is applied: vtag of first packet in each direction (HEARTBEAT in the originating and HEARTBEAT-ACK in reply direction) is saved and all following packets in that direction are compared with this saved value. While INIT and INIT-ACK define vtag for the opposite direction, vtags extracted from HEARTBEAT and HEARTBEAT-ACK are always for their direction. Default timeout values for new states are HEARTBEAT_SENT: 30 seconds (default hb_interval) HEARTBEAT_ACKED: 210 seconds (hb_interval * path_max_retry + max_rto) (We cannot expect to see the shutdown sequence so that, unlike ESTABLISHED, the HEARTBEAT_ACKED timeout shouldn't be too long.) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-07-30netfilter: nf_conntrack: silence warning on falling back to vmalloc()Pablo Neira Ayuso
Since 88eab472ec21 ("netfilter: conntrack: adjust nf_conntrack_buckets default value"), the hashtable can easily hit this warning. We got reports from users that are getting this message in a quite spamming fashion, so better silence this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
2015-07-29act_bpf: fix memory leaks when replacing bpf programsDaniel Borkmann
We currently trigger multiple memory leaks when replacing bpf actions, besides others: comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s) hex dump (first 32 bytes): 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ 18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00 ...m............ backtrace: [<ffffffff817e623e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8120a22d>] __vmalloc_node_range+0x1bd/0x2c0 [<ffffffff8120a37a>] __vmalloc+0x4a/0x50 [<ffffffff811a8d0a>] bpf_prog_alloc+0x3a/0xa0 [<ffffffff816c0684>] bpf_prog_create+0x44/0xa0 [<ffffffffa09ba4eb>] tcf_bpf_init+0x28b/0x3c0 [act_bpf] [<ffffffff816d7001>] tcf_action_init_1+0x191/0x1b0 [<ffffffff816d70a2>] tcf_action_init+0x82/0xf0 [<ffffffff816d4d12>] tcf_exts_validate+0xb2/0xc0 [<ffffffffa09b5838>] cls_bpf_modify_existing+0x98/0x340 [cls_bpf] [<ffffffffa09b5cd6>] cls_bpf_change+0x1a6/0x274 [cls_bpf] [<ffffffff816d56e5>] tc_ctl_tfilter+0x335/0x910 [<ffffffff816b9145>] rtnetlink_rcv_msg+0x95/0x240 [<ffffffff816df34f>] netlink_rcv_skb+0xaf/0xc0 [<ffffffff816b909e>] rtnetlink_rcv+0x2e/0x40 [<ffffffff816deaaf>] netlink_unicast+0xef/0x1b0 Issue is that the old content from tcf_bpf is allocated and needs to be released when we replace it. We seem to do that since the beginning of act_bpf on the filter and insns, later on the name as well. Example test case, after patch: # FOO="1,6 0 0 4294967295," # BAR="1,6 0 0 4294967294," # tc actions add action bpf bytecode "$FOO" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 2 ref 1 bind 0 # tc actions replace action bpf bytecode "$BAR" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe index 2 ref 1 bind 0 # tc actions replace action bpf bytecode "$FOO" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 2 ref 1 bind 0 # tc actions del action bpf index 2 [...] # echo "scan" > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak | grep "comm \"tc\"" | wc -l 0 Fixes: d23b8ad8ab23 ("tc: add BPF based action") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29openvswitch: Re-add CONFIG_OPENVSWITCH_VXLANThomas Graf
This readds the config option CONFIG_OPENVSWITCH_VXLAN to avoid a hard dependency of OVS on VXLAN. It moves the VXLAN config compat code to vport-vxlan.c and allows compliation as a module. Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device") Fixes: 2661371ace96 ("openvswitch: fix compilation when vxlan is a module") Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29ipv6: flush nd cache on IFF_NOARP changeEric Dumazet
This patch is the IPv6 equivalent of commit 6c8b4e3ff81b ("arp: flush arp cache on IFF_NOARP change") Without it, we keep buggy neighbours in the cache, with destination MAC address equal to our own MAC address. Tested: tcpdump -i eth0 -s 0 ip6 -n -e & ip link set dev eth0 arp off ping6 remote // sends buggy frames ip link set dev eth0 arp on ping6 remote // should work once kernel is patched Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mario Fanelli <mariofanelli@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29net: pktgen: Remove unused 'allocated_skbs' fieldBogdan Hamciuc
Field pktgen_dev.allocated_skbs had been written to, but never read from. The number of allocated skbs can be deduced anyway, from the total number of sent packets and the 'clone_skb' param. Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29net: pktgen: Observe needed_headroom of the deviceBogdan Hamciuc
Allocate enough space so as not to force the outgoing net device to do skb_realloc_headroom(). Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29lwtunnel: Make lwtun_encaps[] staticThomas Graf
Any external user should use the registration API instead of accessing this directly. Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29net: Set sk_txhash from a random numberTom Herbert
This patch creates sk_set_txhash and eliminates protocol specific inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a random number instead of performing flow dissection. sk_set_txash is also allowed to be called multiple times for the same socket, we'll need this when redoing the hash for negative routing advice. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29tipc: fix bug in broadcast synch message create functionJon Maloy
In commit d999297c3dbbe7fdd832f7fa4ec84301e170b3e6 ("tipc: reduce locking scope during packet reception") we introduced a new function tipc_build_bcast_sync_msg(), which carries initial synchronization data between two nodes at first contact and at re-contact. In this function, we missed to add synchronization data, with the effect that the broadcast link endpoints will fail to synchronize correctly at re-contact between a running and a restarted node. All other cases work as intended. With this commit, we fix this bug. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29packet: remove handling of tx_ring from prb_shutdown_retire_blk_timer()Tobias Klauser
Follow e8e85cc5eb57 ("packet: remove handling of tx_ring") and remove the tx_ring parameter from prb_shutdown_retire_blk_timer() as it is only called with tx_ring = 0. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29bridge: mdb: fix delmdb state in the notificationNikolay Aleksandrov
Since mdb states were introduced when deleting an entry the state was left as it was set in the delete request from the user which leads to the following output when doing a monitor (for example): $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 permanent $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 temp ^^^ Note the "temp" state in the delete notification which is wrong since the entry was permanent, the state in a delete is always reported as "temp" regardless of the real state of the entry. After this patch: $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 permanent $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 permanent There's one important note to make here that the state is actually not matched when doing a delete, so one can delete a permanent entry by stating "temp" in the end of the command, I've chosen this fix in order not to break user-space tools which rely on this (incorrect) behaviour. So to give an example after this patch and using the wrong state: $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 permanent $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 temp (monitor) dev br0 port eth3 grp 239.0.0.1 permanent Note the state of the entry that got deleted is correct in the notification. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: ccb1c31a7a87 ("bridge: add flags to distinguish permanent mdb entires") Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29bridge: mcast: give fast leave precedence over multicast router and querierSatish Ashok
When fast leave is configured on a bridge port and an IGMP leave is received for a group, the group is not deleted immediately if there is a router detected or if multicast querier is configured. Ideally the group should be deleted immediately when fast leave is configured. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29bridge: Fix network header pointer for vlan tagged packetsToshiaki Makita
There are several devices that can receive vlan tagged packets with CHECKSUM_PARTIAL like tap, possibly veth and xennet. When (multiple) vlan tagged packets with CHECKSUM_PARTIAL are forwarded by bridge to a device with the IP_CSUM feature, they end up with checksum error because before entering bridge, the network header is set to ETH_HLEN (not including vlan header length) in __netif_receive_skb_core(), get_rps_cpu(), or drivers' rx functions, and nobody fixes the pointer later. Since the network header is exepected to be ETH_HLEN in flow-dissection and hash-calculation in RPS in rx path, and since the header pointer fix is needed only in tx path, set the appropriate network header on forwarding packets. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-29packet: tpacket_snd(): fix signed/unsigned comparisonAlexander Drozdov
tpacket_fill_skb() can return a negative value (-errno) which is stored in tp_len variable. In that case the following condition will be (but shouldn't be) true: tp_len > dev->mtu + dev->hard_header_len as dev->mtu and dev->hard_header_len are both unsigned. That may lead to just returning an incorrect EMSGSIZE errno to the user. Fixes: 52f1454f629fa ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case") Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-28arp: filter NOARP neighbours for SIOCGARPEric Dumazet
When arp is off on a device, and ioctl(SIOCGARP) is queried, a buggy answer is given with MAC address of the device, instead of the mac address of the destination/gateway. We filter out NUD_NOARP neighbours for /proc/net/arp, we must do the same for SIOCGARP ioctl. Tested: lpaa23:~# ./arp 10.246.7.190 MAC=00:01:e8:22:cb:1d // correct answer lpaa23:~# ip link set dev eth0 arp off lpaa23:~# cat /proc/net/arp # check arp table is now 'empty' IP address HW type Flags HW address Mask Device lpaa23:~# ./arp 10.246.7.190 MAC=00:1a:11:c3:0d:7f // buggy answer before patch (this is eth0 mac) After patch : lpaa23:~# ip link set dev eth0 arp off lpaa23:~# ./arp 10.246.7.190 ioctl(SIOCGARP) failed: No such device or address Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Vytautas Valancius <valas@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-28net/ipv4: suppress NETDEV_UP notification on address lifetime updateDavid Ward
This notification causes the FIB to be updated, which is not needed because the address already exists, and more importantly it may undo intentional changes that were made to the FIB after the address was originally added. (As a point of comparison, when an address becomes deprecated because its preferred lifetime expired, a notification on this chain is not generated.) The motivation for this commit is fixing an incompatibility between DHCP clients which set and update the address lifetime according to the lease, and a commercial VPN client which replaces kernel routes in a way that outbound traffic is sent only through the tunnel (and disconnects if any further route changes are detected via netlink). Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-28bridge: stp: when using userspace stp stop kernel hello and hold timersNikolay Aleksandrov
These should be handled only by the respective STP which is in control. They become problematic for devices with limited resources with many ports because the hold_timer is per port and fires each second and the hello timer fires each 2 seconds even though it's global. While in user-space STP mode these timers are completely unnecessary so it's better to keep them off. Also ensure that when the bridge is up these timers are started only when running with kernel STP. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-28Merge tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable patches: - Fix a situation where the client uses the wrong (zero) stateid. - Fix a memory leak in nfs_do_recoalesce Bugfixes: - Plug a memory leak when ->prepare_layoutcommit fails - Fix an Oops in the NFSv4 open code - Fix a backchannel deadlock - Fix a livelock in sunrpc when sendmsg fails due to low memory availability - Don't revalidate the mapping if both size and change attr are up to date - Ensure we don't miss a file extension when doing pNFS - Several fixes to handle NFSv4.1 sequence operation status bits correctly - Several pNFS layout return bugfixes" * tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) nfs: Fix an oops caused by using other thread's stack space in ASYNC mode nfs: plug memory leak when ->prepare_layoutcommit fails SUNRPC: Report TCP errors to the caller sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. NFSv4.2: handle NFS-specific llseek errors NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce() NFS: Fix a memory leak in nfs_do_recoalesce NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised NFS: Don't revalidate the mapping if both size and change attr are up to date NFSv4/pnfs: Ensure we don't miss a file extension NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked SUNRPC: xprt_complete_bc_request must also decrement the free slot count SUNRPC: Fix a backchannel deadlock pNFS: Don't throw out valid layout segments pNFS: pnfs_roc_drain() fix a race with open pNFS: Fix races between return-on-close and layoutreturn. pNFS: pnfs_roc_drain should return 'true' when sleeping pNFS: Layoutreturn must invalidate all existing layout segments. ...
2015-07-27packet: missing dev_put() in packet_do_bind()Lars Westerhoff
When binding a PF_PACKET socket, the use count of the bound interface is always increased with dev_hold in dev_get_by_{index,name}. However, when rebound with the same protocol and device as in the previous bind the use count of the interface was not decreased. Ultimately, this caused the deletion of the interface to fail with the following message: unregister_netdevice: waiting for dummy0 to become free. Usage count = 1 This patch moves the dev_put out of the conditional part that was only executed when either the protocol or device changed on a bind. Fixes: 902fefb82ef7 ('packet: improve socket create/bind latency in some cases') Signed-off-by: Lars Westerhoff <lars.westerhoff@newtec.eu> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27SUNRPC: Report TCP errors to the callerTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-27fib_trie: Drop unnecessary calls to leaf_pull_suffixAlexander Duyck
It was reported that update_suffix was taking a long time on systems where a large number of leaves were attached to a single node. As it turns out fib_table_flush was calling update_suffix for each leaf that didn't have all of the aliases stripped from it. As a result, on this large node removing one leaf would result in us calling update_suffix for every other leaf on the node. The fix is to just remove the calls to leaf_pull_suffix since they are redundant as we already have a call in resize that will go through and update the suffix length for the node before we exit out of fib_table_flush or fib_table_flush_external. Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable.NeilBrown
The networking layer does not reliably report the distinction between a non-block write failing because: 1/ the queue is too full already and 2/ a memory allocation attempt failed. The distinction is important because in the first case it is appropriate to retry as soon as the socket reports that it is writable, and in the second case a small delay is required as the socket will most likely report as writable but kmalloc could still fail. sk_stream_wait_memory() exhibits this distinction nicely, setting 'vm_wait' if a small wait is needed. However in the non-blocking case it always returns -EAGAIN no matter the cause of the failure. This -EAGAIN call get all the way to sunrpc. The sunrpc layer expects EAGAIN to indicate the first cause, and ENOBUFS to indicate the second. Various documentation suggests that this is not unreasonable, but does not guarantee the desired error codes. The result of getting -EAGAIN when -ENOBUFS is expected is that the send is tried again in a tight loop and soft lockups are reported. so: add tests after calls to xs_sendpages() to translate -EAGAIN into -ENOBUFS if the socket is writable. This cannot happen inside xs_sendpages() as the test for "is socket writable" is different between TCP and UDP. With this change, the tight loop retrying xs_sendpages() becomes a loop which only retries every 250ms, and so will not trigger a soft-lockup warning. It is possible that the write did fail because the queue was too full and by the time xs_sendpages() completed, the queue was writable again. In this case an extra 250ms delay is inserted that isn't really needed. This circumstance suggests a degree of congestion so a delay is not necessarily a bad thing, and it can only cause a single 250ms delay, not a series of them. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-07-27lwtunnel: use kfree_skb() instead of vanilla kfree()Dan Carpenter
kfree_skb() is correct here. Fixes: ffce41962ef6 ('lwtunnel: support dst output redirect function') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>