summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-02-28net: Verify permission to dest_net in newlinkEric W. Biederman
When applicable verify that the caller has permision to create a network device in another network namespace. This check is already present when moving a network device between network namespaces in setlink so all that is needed is to duplicate that check in newlink. This change almost backports cleanly, but there are context conflicts as the code that follows was added in v4.0-rc1 Fixes: b51642f6d77b net: Enable a userns root rtnl calls that are safe for unprivilged users Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28tcp: tso: allow CA_CWR state in tcp_tso_should_defer()Eric Dumazet
Another TCP issue is triggered by ECN. Under pressure, receiver gets ECN marks, and send back ACK packets with ECE TCP flag. Senders enter CA_CWR state. In this state, tcp_tso_should_defer() is short cut : if (icsk->icsk_ca_state != TCP_CA_Open) goto send_now; This means that about all ACK packets we receive are triggering a partial send, and because cwnd is kept small, we can only send a small amount of data for each incoming ACK, which in return generate more ACK packets. Allowing CA_Open and CA_CWR states to enable TSO defer in tcp_tso_should_defer() brings performance back : TSO autodefer has more chance to defer under pressure. This patch increases TSO and LRO/GRO efficiency back to normal levels, and does not impact overall ECN behavior. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28tcp: tso: restore IW10 after TSO autosizingEric Dumazet
With sysctl_tcp_min_tso_segs being 4, it is very possible that tcp_tso_should_defer() decides not sending last 2 MSS of initial window of 10 packets. This also applies if autosizing decides to send X MSS per GSO packet, and cwnd is not a multiple of X. This patch implements an heuristic based on age of first skb in write queue : If it was sent very recently (less than half srtt), we can predict that no ACK packet will come in less than half rtt, so deferring might cause an under utilization of our window. This is visible on initial send (IW10) on web servers, but more generally on some RPC, as the last part of the message might need an extra RTT to get delivered. Tested: Ran following packetdrill test // A simple server-side test that sends exactly an initial window (IW10) // worth of packets. `sysctl -e -q net.ipv4.tcp_min_tso_segs=4` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.1 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.1 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 +0 write(4, ..., 14600) = 14600 +0 > . 1:5841(5840) ack 1 win 457 +0 > . 5841:11681(5840) ack 1 win 457 // Following packet should be sent right now. +0 > P. 11681:14601(2920) ack 1 win 457 +.1 < . 1:1(0) ack 14601 win 257 +0 close(4) = 0 +0 > F. 14601:14601(0) ack 1 +.1 < F. 1:1(0) ack 14602 win 257 +0 > . 14602:14602(0) ack 2 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-28tcp: tso: remove tp->tso_deferredEric Dumazet
TSO relies on ability to defer sending a small amount of packets. Heuristic is to wait for future ACKS in hope to send more packets at once. Current algorithm uses a per socket tso_deferred field as a pseudo timer. This pseudo timer relies on future ACK, but there is no guarantee we receive them in time. Fix would be to use a real timer, but cost of such timer is probably too expensive for typical cases. This patch changes the logic to test the time of last transmit, because we should not add bursts of more than 1ms for any given flow. We've used this patch for about two years at Google, before FQ/pacing as it would reduce a fair amount of bursts. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27tipc: make media address offset a common defineErik Hugne
With the exception of infiniband media which does not use media offsets, the media address is always located at offset 4 in the media info field as defined by the protocol, so we move the definition to the generic bearer.h Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27tipc: rename media/msg related definitionsErik Hugne
The TIPC_MEDIA_ADDR_SIZE and TIPC_MEDIA_ADDR_OFFSET names are misleading, as they actually define the size and offset of the whole media info field and not the address part. This patch does not have any functional changes. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27tipc: purge links when bearer is disabledErik Hugne
If a bearer is disabled by manual intervention, all links over that bearer should be purged, indicated with the 'shutting_down' flag. Otherwise tipc will get confused if a new bearer is enabled using a different media type. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27tipc: fix nullpointer bug when subscribing to eventsErik Hugne
If a subscription request is sent to a topology server connection, and any error occurs (malformed request, oom or limit reached) while processing this request, TIPC should terminate the subscriber connection. While doing so, it tries to access fields in an already freed (or never allocated) subscription element leading to a nullpointer exception. We fix this by removing the subscr_terminate function and terminate the connection immediately upon any subscription failure. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27tipc: only create header copy for name distr messagesErik Hugne
The TIPC name distributor pushes topology updates to the cluster neighbors. Currently this is done in a unicast manner, and the skb holding the update is cloned for each cluster member. This is unnecessary, as we only modify the destnode field in the header so we change it to do pskb_copy instead. Signed-off-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27fib_trie: Remove leaf_infoAlexander Duyck
At this point the leaf_info hash is redundant. By adding the suffix length to the fib_alias hash list we no longer have need of leaf_info as we can determine the prefix length from fa_slen. So we can compress things by dropping the leaf_info structure from fib_trie and instead directly connect the leaves to the fib_alias hash list. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27fib_trie: Add slen to fib aliasAlexander Duyck
Make use of an empty spot in the alias to store the suffix length so that we don't need to pull that information from the leaf_info structure. This patch also makes a slight change to the user statistics. Instead of incrementing semantic_match_miss once per leaf_info miss we now just increment it once per leaf if a match was not found. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27fib_trie: Replace plen with slen in leaf_infoAlexander Duyck
This replaces the prefix length variable in the leaf_info structure with a suffix length value, or host identifier length in bits. By doing this it makes it easier to sort out since the tnodes and leaf are carrying this value as well since it is compatible with the ->pos field in tnodes. I also cleaned up one spot that had some list manipulation that could be simplified. I basically updated it so that we just use hlist_add_head_rcu instead of calling hlist_add_before_rcu on the first node in the list. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27fib_trie: Convert fib_alias to hlist from listAlexander Duyck
There isn't any advantage to having it as a list and by making it an hlist we make the fib_alias more compatible with the list_info in terms of the type of list used. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27multicast: Extend ip address command to enable multicast group join/leave onMadhu Challa
Joining multicast group on ethernet level via "ip maddr" command would not work if we have an Ethernet switch that does igmp snooping since the switch would not replicate multicast packets on ports that did not have IGMP reports for the multicast addresses. Linux vxlan interfaces created via "ip link add vxlan" have the group option that enables then to do the required join. By extending ip address command with option "autojoin" we can get similar functionality for openvswitch vxlan interfaces as well as other tunneling mechanisms that need to receive multicast traffic. The kernel code is structured similar to how the vxlan driver does a group join / leave. example: ip address add 224.1.1.10/24 dev eth5 autojoin ip address del 224.1.1.10/24 dev eth5 Signed-off-by: Madhu Challa <challa@noironetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27igmp v6: add __ipv6_sock_mc_join and __ipv6_sock_mc_dropMadhu Challa
Based on the igmp v4 changes from Eric Dumazet. 959d10f6bbf6("igmp: add __ip_mc_{join|leave}_group()") These changes are needed to perform igmp v6 join/leave while RTNL is held. Make ipv6_sock_mc_join and ipv6_sock_mc_drop wrappers around __ipv6_sock_mc_join and __ipv6_sock_mc_drop to avoid proliferation of work queues. Signed-off-by: Madhu Challa <challa@noironetworks.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27rhashtable: remove indirection for grow/shrink decision functionsDaniel Borkmann
Currently, all real users of rhashtable default their grow and shrink decision functions to rht_grow_above_75() and rht_shrink_below_30(), so that there's currently no need to have this explicitly selectable. It can/should be generic and private inside rhashtable until a real use case pops up. Since we can make this private, we'll save us this additional indirection layer and can improve insertion/deletion time as well. Reference: http://patchwork.ozlabs.org/patch/443040/ Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27udp: In udp_flow_src_port use random hash value if skb_get_hash failsTom Herbert
In the unlikely event that skb_get_hash is unable to deduce a hash in udp_flow_src_port we use a consistent random value instead. This is specified in GRE/UDP draft section 3.2.1: https://tools.ietf.org/html/draft-ietf-tsvwg-gre-in-udp-encap-04 Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-27Bluetooth: make hci_test_bit's addr constJiri Slaby
gcc5 warns about passing a const array to hci_test_bit which takes a non-const pointer: net/bluetooth/hci_sock.c: In function ‘hci_sock_sendmsg’: net/bluetooth/hci_sock.c:955:8: warning: passing argument 2 of ‘hci_test_bit’ discards ‘const’ qualifier from pointer target type [-Wdiscarded-array-qualifiers] &hci_sec_filter.ocf_mask[ogf])) && ^ net/bluetooth/hci_sock.c:49:19: note: expected ‘void *’ but argument is of type ‘const __u32 (*)[4] {aka const unsigned int (*)[4]}’ static inline int hci_test_bit(int nr, void *addr) ^ So make 'addr' 'const void *'. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: Gustavo Padovan <gustavo@padovan.org> Cc: Johan Hedberg <johan.hedberg@gmail.com>
2015-02-27Bluetooth: Update New CSRK event to match latest specificationJohan Hedberg
The 'master' parameter of the New CSRK event was recently renamed to 'type', with the old values kept for backwards compatibility as unauthenticated local/remote keys. This patch updates the code to take into account the two new (authenticated) values and ensures they get used based on the security level of the connection that the respective keys get distributed over. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-02-26sunrpc: integer underflow in rsc_parse()Dan Carpenter
If we call groups_alloc() with invalid values then it's might lead to memory corruption. For example, with a negative value then we might not allocate enough for sizeof(struct group_info). (We're doing this in the caller for consistency with other callers of groups_alloc(). The other alternative might be to move the check out of all the callers into groups_alloc().) Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Simo Sorce <simo@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-02-26mac80211: Send EAPOL frames at lowest rateJouni Malinen
The current minstrel_ht rate control behavior is somewhat optimistic in trying to find optimum TX rate. While this is usually fine for normal Data frames, there are cases where a more conservative set of retry parameters would be beneficial to make the connection more robust. EAPOL frames are critical to the authentication and especially the EAPOL-Key message 4/4 (the last message in the 4-way handshake) is important to get through to the AP. If that message is lost, the only recovery mechanism in many cases is to reassociate with the AP and start from scratch. This can often be avoided by trying to send the frame with more conservative rate and/or with more link layer retries. In most cases, minstrel_ht is currently using the initial EAPOL-Key frames for probing higher rates and this results in only five link layer transmission attempts (one at high(ish) MCS and four at MCS0). While this works with most APs, it looks like there are some deployed APs that may have issues with the EAPOL frames using HT MCS immediately after association. Similarly, there may be issues in cases where the signal strength or radio environment is not good enough to be able to get frames through even at couple of MCS 0 tries. The best approach for this would likely to be to reduce the TX rate for the last rate (3rd rate parameter in the set) to a low basic rate (say, 6 Mbps on 5 GHz and 2 or 5.5 Mbps on 2.4 GHz), but doing that cleanly requires some more effort. For now, we can start with a simple one-liner that forces the minimum rate to be used for EAPOL frames similarly how the TX rate is selected for the IEEE 802.11 Management frames. This does result in a small extra latency added to the cases where the AP would be able to receive the higher rate, but taken into account how small number of EAPOL frames are used, this is likely to be insignificant. A future optimization in the minstrel_ht design can also allow this patch to be reverted to get back to the more optimized initial TX rate. It should also be noted that many drivers that do not use minstrel as the rate control algorithm are already doing similar workarounds by forcing the lowest TX rate to be used for EAPOL frames. Cc: stable@vger.kernel.org Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-26bridge: fix link notification skb size calculation to include vlan rangesRoopa Prabhu
my previous patch skipped vlan range optimizations during skb size calculations for simplicity. This incremental patch considers vlan ranges during skb size calculations. This leads to a bit of code duplication in the fill and size calculation functions. But, I could not find a prettier way to do this. will take any suggestions. Previously, I had reused the existing br_get_link_af_size size calculation function to calculate skb size for notifications. Reusing it this time around creates some change in behaviour issues for the usual .get_link_af_size callback. This patch adds a new br_get_link_af_size_filtered() function to base the size calculation on the incoming filter flag and include vlan ranges. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-25net: dsa: Introduce dsa_is_port_initializedGuenter Roeck
To avoid race conditions when using the ds->ports[] array, we need to check if the accessed port has been initialized. Introduce and use helper function dsa_is_port_initialized for that purpose and use it where needed. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-25net: dsa: integrate with SWITCHDEV for HW bridgingFlorian Fainelli
In order to support bridging offloads in DSA switch drivers, select NET_SWITCHDEV to get access to the port_stp_update and parent_get_id NDOs that we are required to implement. To facilitate the integratation at the DSA driver level, we implement 3 types of operations: - port_join_bridge - port_leave_bridge - port_stp_update DSA will resolve which switch ports that are currently bridge port members as some Switch hardware/drivers need to know about that to limit the register programming to just the relevant registers (especially for slow MDIO buses). We also take care of setting the correct STP state when slave network devices are brought up/down while being bridge members. Finally, when a port is leaving the bridge, we make sure we set in BR_STATE_FORWARDING state, otherwise the bridge layer would leave it disabled as a result of having left the bridge. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-25net: dsa: Ensure that port array elements are initialized before being usedGuenter Roeck
A network device notifier can be called for one or more of the created slave devices before all slave devices have been registered. This can result in a mismatch between ds->phys_port_mask and the registered devices by the time the call is made, and it can result in a slave device being added to a bridge before its entry in ds->ports[] has been initialized. Rework the initialization code to initialize entries in ds->ports[] in dsa_slave_create. With this change, dsa_slave_create no longer needs to return slave_dev but can return an error code instead. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-25ipvs: allow rescheduling of new connections when port reuse is detectedMarcelo Ricardo Leitner
Currently, when TCP/SCTP port reusing happens, IPVS will find the old entry and use it for the new one, behaving like a forced persistence. But if you consider a cluster with a heavy load of small connections, such reuse will happen often and may lead to a not optimal load balancing and might prevent a new node from getting a fair load. This patch introduces a new sysctl, conn_reuse_mode, that allows controlling how to proceed when port reuse is detected. The default value will allow rescheduling of new connections only if the old entry was in TIME_WAIT state for TCP or CLOSED for SCTP. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-24ipv6: remove dead debug code from ip6_tunnel.cIan Morris
The IP6_TNL_TRACE macro is no longer used anywhere in the code so remove definition. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-24af_packet: don't pass empty blocks for PACKET_V3Alexander Drozdov
Before da413eec729d ("packet: Fixed TPACKET V3 to signal poll when block is closed rather than every packet") poll listening for an af_packet socket was not signaled if there was no packets to process. After the patch poll is signaled evety time when block retire timer expires. That happens because af_packet closes the current block on timeout even if the block is empty. Passing empty blocks to the user not only wastes CPU but also wastes ring buffer space increasing probability of packets dropping on small timeouts. Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Cc: Dan Collins <dan@dcollins.co.nz> Cc: Willem de Bruijn <willemb@google.com> Cc: Guy Harris <guy@alum.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-24rtnetlink: avoid 0 sized arraysSasha Levin
Arrays (when not in a struct) "shall have a value greater than zero". GCC complains when it's not the case here. Fixes: ba7d49b1f0 ("rtnetlink: provide api for getting and setting slave info") Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-24mac80211/minstrel: fix !x!=0 confusionJiri Slaby
Commit 06d961a8e210 ("mac80211/minstrel: use the new rate control API") inverted the condition 'if (msr->sample_limit != 0)' to 'if (!msr->sample_limit != 0)'. But it is confusing both to people and compilers (gcc5): net/mac80211/rc80211_minstrel.c: In function 'minstrel_get_rate': net/mac80211/rc80211_minstrel.c:376:26: warning: logical not is only applied to the left hand side of comparison if (!msr->sample_limit != 0) ^ Let there be only 'if (!msr->sample_limit)'. Fixes: 06d961a8e210 ("mac80211/minstrel: use the new rate control API") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-24Merge https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvsPablo Neira Ayuso
Simon Horman says: ==================== Second Round of IPVS Fixes for v3.20 This patch resolves some memory leaks in connection synchronisation code that date back to v2.6.39. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-02-24cfg80211: calls nl80211_exit on errorJunjie Mao
nl80211_exit should be called in cfg80211_init if nl80211_init succeeds but regulatory_init or create_singlethread_workqueue fails. Signed-off-by: Junjie Mao <junjie_mao@yeah.net> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-24cfg80211: fix n_reg_rules to match world_regdomJason Abele
There are currently 8 rules in the world_regdom, but only the first 6 are applied due to an incorrect value for n_reg_rules. This causes channels 149-165 and 60GHz to be disabled. Signed-off-by: Jason Abele <jason@aether.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-24nl80211: fix memory leak in monitor flags parsingJohannes Berg
If monitor flags parsing results in active monitor but that isn't supported, the already allocated message is leaked. Fix this by moving the allocation after this check. Reported-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-24nl80211: use loop index as type for net detect frequency resultsSamuel Tan
We currently add nested members of the NL80211_ATTR_SCAN_FREQUENCIES as NLA_U32 attributes of type NL80211_ATTR_WIPHY_FREQ in cfg80211_net_detect_results. However, since there can be an arbitrary number of frequency results, we should use the loop index of the loop used to add the frequency results to NL80211_ATTR_SCAN_FREQUENCIES as the type (i.e. nla_type) for each result attribute, rather than a fixed type. This change is in line with how nested members are added to NL80211_ATTR_SCAN_FREQUENCIES in the functions nl80211_send_wowlan_nd and nl80211_add_scan_req. Signed-off-by: Samuel Tan <samueltan@chromium.org> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-24mac80211: clear sdata->radar_requiredEliad Peller
If ieee80211_vif_use_channel() fails, we have to clear sdata->radar_required (which we might have just set). Failing to do it results in stale radar_required field which prevents starting new scan requests. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Eliad Peller <eliad@wizery.com> [use false instead of 0] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-02-23Merge tag 'nfs-rdma-for-4.0-3' of git://git.linux-nfs.org/projects/anna/nfs-rdmaTrond Myklebust
NFS: RDMA Client Sparse Fix #2 This patch fixes another sparse fix found by Dan Carpenter's tool. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-for-4.0-3' of git://git.linux-nfs.org/projects/anna/nfs-rdma: xprtrdma: Store RDMA credits in unsigned variables
2015-02-23ipv6: addrconf: validate new MTU before applying itMarcelo Leitner
Currently we don't check if the new MTU is valid or not and this allows one to configure a smaller than minimum allowed by RFCs or even bigger than interface own MTU, which is a problem as it may lead to packet drops. If you have a daemon like NetworkManager running, this may be exploited by remote attackers by forging RA packets with an invalid MTU, possibly leading to a DoS. (NetworkManager currently only validates for values too small, but not for too big ones.) The fix is just to make sure the new value is valid. That is, between IPV6_MIN_MTU and interface's MTU. Note that similar check is already performed at ndisc_router_discovery(), for when kernel itself parses the RA. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-23net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msgCatalin Marinas
With commit a7526eb5d06b (net: Unbreak compat_sys_{send,recv}msg), the MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points, changing the kernel compat behaviour from the one before the commit it was trying to fix (1be374a0518a, net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg). On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native 32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by the kernel). However, on a 64-bit kernel, the compat ABI is different with commit a7526eb5d06b. This patch changes the compat_sys_{send,recv}msg behaviour to the one prior to commit 1be374a0518a. The problem was found running 32-bit LTP (sendmsg01) binary on an arm64 kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg() but the general rule is not to break user ABI (even when the user behaviour is not entirely sane). Fixes: a7526eb5d06b (net: Unbreak compat_sys_{send,recv}msg) Cc: Andy Lutomirski <luto@amacapital.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-23irda: replace current->state by set_current_state()Fabian Frederick
Use helper functions to access current->state. Direct assignments are prone to races and therefore buggy. current->state = TASK_RUNNING can be replaced by __set_current_state() Thanks to Peter Zijlstra for the exact definition of the problem. Suggested-By: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-23xprtrdma: Store RDMA credits in unsigned variablesChuck Lever
Dan Carpenter's static checker pointed out: net/sunrpc/xprtrdma/rpc_rdma.c:879 rpcrdma_reply_handler() warn: can 'credits' be negative? "credits" is defined as an int. The credits value comes from the server as a 32-bit unsigned integer. A malicious or broken server can plant a large unsigned integer in that field which would result in an underflow in the following logic, potentially triggering a deadlock of the mount point by blocking the client from issuing more RPC requests. net/sunrpc/xprtrdma/rpc_rdma.c: 876 credits = be32_to_cpu(headerp->rm_credit); 877 if (credits == 0) 878 credits = 1; /* don't deadlock */ 879 else if (credits > r_xprt->rx_buf.rb_max_requests) 880 credits = r_xprt->rx_buf.rb_max_requests; 881 882 cwnd = xprt->cwnd; 883 xprt->cwnd = credits << RPC_CWNDSHIFT; 884 if (xprt->cwnd > cwnd) 885 xprt_release_rqst_cong(rqst->rq_task); Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: eba8ff660b2d ("xprtrdma: Move credit update to RPC . . .") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-02-23decnet: Fix obvious o/0 typoRasmus Villemoes
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-22tcp: fix tcp_should_expand_sndbuf() to use tcp_packets_in_flight()Neal Cardwell
tcp_should_expand_sndbuf() does not expand the send buffer if we have filled the congestion window. However, it should use tcp_packets_in_flight() instead of tp->packets_out to make this check. Testing has established that the difference matters a lot if there are many SACKed packets, causing a needless performance shortfall. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-22bridge: add vlan info to bridge setlink and dellink notification messagesRoopa Prabhu
vlan add/deletes are not notified to userspace today. This patch adds vlan info to bridge newlink/dellink notifications generated from the bridge driver. Notifications use the RTEXT_FILTER_BRVLAN_COMPRESSED flag to compress vlans into ranges whereever applicable. The size calculations does not take ranges into account for simplicity. This has the potential for allocating a larger skb than required. There is an existing inconsistency with bridge NEWLINK and DELLINK change notifications. Both generate NEWLINK notifications. Since its always a NEWLINK notification, this patch includes all vlans the port belongs to in the notification. The NEWLINK and DELLINK request messages however only include the vlans to be added and deleted. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-22net: pktgen: disable xmit_clone on virtual devicesEric Dumazet
Trying to use burst capability (aka xmit_more) on a virtual device like bonding is not supported. For example, skb might be queued multiple times on a qdisc, with various list corruptions. Fixes: 38b2cf2982dc ("net: pktgen: packet bursting via skb->xmit_more") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-22net: __aligned(size) is preferred over __attribute__((aligned(size)))Ameen Ali
Signed-off-by: Ameen Ali <AmeenAli023@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-22batman-adv: Fix use of seq_has_overflowed()Joe Perches
net-next commit 6d91147d183c ("batman-adv: Remove uses of return value of seq_printf") incorrectly changed the overflow occurred return from -1 to 1. Change it back so that the test of batadv_write_buffer_text's return value in batadv_gw_client_seq_print_text works properly. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-22ipvs: add missing ip_vs_pe_put in sync codeJulian Anastasov
ip_vs_conn_fill_param_sync() gets in param.pe a module reference for persistence engine from __ip_vs_pe_getbyname() but forgets to put it. Problem occurs in backup for sync protocol v1 (2.6.39). Also, pe_data usually comes in sync messages for connection templates and ip_vs_conn_new() copies the pointer only in this case. Make sure pe_data is not leaked if it comes unexpectedly for normal connections. Leak can happen only if bogus messages are sent to backup server. Fixes: fe5e7a1efb66 ("IPVS: Backup, Adding Version 1 receive capability") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
2015-02-22net: Remove state argument from skb_find_text()Bojan Prtvar
Although it is clear that textsearch state is intentionally passed to skb_find_text() as uninitialized argument, it was never used by the callers. Therefore, we can simplify skb_find_text() by making it local variable. Signed-off-by: Bojan Prtvar <prtvar.b@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-22netfilter: nf_tables: fix addition/deletion of elements from commit/abortPablo Neira Ayuso
We have several problems in this path: 1) There is a use-after-free when removing individual elements from the commit path. 2) We have to uninit() the data part of the element from the abort path to avoid a chain refcount leak. 3) We have to check for set->flags to see if there's a mapping, instead of the element flags. 4) We have to check for !(flags & NFT_SET_ELEM_INTERVAL_END) to skip elements that are part of the interval that have no data part, so they don't need to be uninit(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>