summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2014-07-27inet: frag: remove periodic secret rebuild timerFlorian Westphal
merge functionality into the eviction workqueue. Instead of rebuilding every n seconds, take advantage of the upper hash chain length limit. If we hit it, mark table for rebuild and schedule workqueue. To prevent frequent rebuilds when we're completely overloaded, don't rebuild more than once every 5 seconds. ipfrag_secret_interval sysctl is now obsolete and has been marked as deprecated, it still can be changed so scripts won't be broken but it won't have any effect. A comment is left above each unused secret_timer variable to avoid confusion. Joint work with Nikolay Aleksandrov. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: remove lru listFlorian Westphal
no longer used. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: don't account number of fragment queuesFlorian Westphal
The 'nqueues' counter is protected by the lru list lock, once thats removed this needs to be converted to atomic counter. Given this isn't used for anything except for reporting it to userspace via /proc, just remove it. We still report the memory currently used by fragment reassembly queues. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: move eviction of queues to work queueFlorian Westphal
When the high_thresh limit is reached we try to toss the 'oldest' incomplete fragment queues until memory limits are below the low_thresh value. This happens in softirq/packet processing context. This has two drawbacks: 1) processors might evict a queue that was about to be completed by another cpu, because they will compete wrt. resource usage and resource reclaim. 2) LRU list maintenance is expensive. But when constantly overloaded, even the 'least recently used' element is recent, so removing 'lru' queue first is not 'fairer' than removing any other fragment queue. This moves eviction out of the fast path: When the low threshold is reached, a work queue is scheduled which then iterates over the table and removes the queues that exceed the memory limits of the namespace. It sets a new flag called INET_FRAG_EVICTED on the evicted queues so the proper counters will get incremented when the queue is forcefully expired. When the high threshold is reached, no more fragment queues are created until we're below the limit again. The LRU list is now unused and will be removed in a followup patch. Joint work with Nikolay Aleksandrov. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: move evictor calls into frag_find functionFlorian Westphal
First step to move eviction handling into a work queue. We lose two spots that accounted evicted fragments in MIB counters. Accounting will be restored since the upcoming work-queue evictor invokes the frag queue timer callbacks instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: remove hash size assumptions from callersFlorian Westphal
hide actual hash size from individual users: The _find function will now fold the given hash value into the required range. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: constify match, hashfn and constructor argumentsFlorian Westphal
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24ipv6: remove obsolete comment in ip6_append_data()Li RongQing
After 11878b40e[net-timestamp: SOCK_RAW and PING timestamping], this comment becomes obsolete since the codes check not only UDP socket, but also RAW sock; and the codes are clear, not need the comments Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24net: do not name the pointer to struct net_device netWANG Cong
"net" is normally for struct net*, pointer to struct net_device should be named to either "dev" or "ndev" etc. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24net: filter: rename 'struct sock_filter_int' into 'struct bpf_insn'Alexei Starovoitov
eBPF is used by socket filtering, seccomp and soon by tracing and exposed to userspace, therefore 'sock_filter_int' name is not accurate. Rename it to 'bpf_insn' Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24net_sched: remove exceptional & on function nameHimangi Saraogi
In this file, function names are otherwise used as pointers without &. A simplified version of the Coccinelle semantic patch that makes this change is as follows: // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24neigh: remove exceptional & on function nameHimangi Saraogi
In this file, function names are otherwise used as pointers without &. A simplified version of the Coccinelle semantic patch that makes this change is as follows: // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24igmp: remove exceptional & on function nameHimangi Saraogi
In this file, function names are otherwise used as pointers without &. A simplified version of the Coccinelle semantic patch that makes this change is as follows: // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-24openvswitch: Add skb_clone NULL check for the sampling action.Andy Zhou
Fix a bug where skb_clone() NULL check is missing in sample action implementation. Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-24openvswitch: Sample action without side effectsSimon Horman
The sample action is rather generic, allowing arbitrary actions to be executed based on a probability. However its use, within the Open vSwitch code-base is limited: only a single user-space action is ever nested. A consequence of the current implementation of sample actions is that depending on weather the sample action executed (due to its probability) any side-effects of nested actions may or may not be present before executing subsequent actions. This has the potential to complicate verification of valid actions by the (kernel) datapath. And indeed adding support for push and pop MPLS actions inside sample actions is one case where such case. In order to allow all supported actions to be continue to be nested inside sample actions without the potential need for complex verification code this patch changes the implementation of the sample action in the kernel datapath so that sample actions are more like a function call and any side effects of nested actions are not present when executing subsequent actions. With the above in mind the motivation for this change is twofold: * To contain side-effects the sample action in the hope of making it easier to deal with in the future and; * To avoid some rather complex verification code introduced in the MPLS datapath patch. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-24openvswitch: Avoid memory corruption in queue_userspace_packet()Andy Zhou
In queue_userspace_packet(), the ovs_nla_put_flow return value is not checked. This is fine as long as key_attr_size() returns the correct value. In case it does not, the current code may corrupt buffer memory. Add a run time assertion catch this case to avoid silent failure. Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-24openvswitch: Enable tunnel GSO for OVS bridge.Pravin B Shelar
Following patch enables all available tunnel GSO features for OVS bridge device so that ovs can use hardware offloads available to underling device. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Andy Zhou <azhou@nicira.com>
2014-07-24openvswitch: Allow each vport to have an array of 'port_id's.Alex Wang
In order to allow handlers directly read upcalls from datapath, we need to support per-handler netlink socket for each vport in datapath. This commit makes this happen. Also, it is guaranteed to be backward compatible with previous branch. Signed-off-by: Alex Wang <alexw@nicira.com> Acked-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-07-23net: filter: split filter.c into two filesAlexei Starovoitov
BPF is used in several kernel components. This split creates logical boundary between generic eBPF core and the rest kernel/bpf/core.c: eBPF interpreter net/core/filter.c: classic->eBPF converter, classic verifiers, socket filters This patch only moves functions. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23ipv4: Make IP_MULTICAST_ALL and IP_MSFILTER work on raw socketsQuentin Armitage
Currently, although IP_MULTICAST_ALL and IP_MSFILTER ioctl calls succeed on raw sockets, there is no code to implement the functionality on received packets; it is only implemented for UDP sockets. The raw(7) man page states: "In addition, all ip(7) IPPROTO_IP socket options valid for datagram sockets are supported", which implies these ioctls should work on raw sockets. To fix this, add a call to ip_mc_sf_allow on raw sockets. This should not break any existing code, since the current position of not calling ip_mc_sf_filter makes it behave as if neither the IP_MULTICAST_ALL nor the IP_MSFILTER ioctl had been called. Adding the call to ip_mc_sf_allow will therefore maintain the current behaviour so long as IP_MULTICAST_ALL and IP_MSFILTER ioctls are not called. Any code that currently is calling IP_MULTICAST_ALL or IP_MSFILTER ioctls on raw sockets presumably is wanting the filter to be applied, although no filtering will currently be occurring. Signed-off-by: Quentin Armitage <quentin@armitage.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23sock: remove skb argument from sk_rcvqueues_fullSorin Dumitru
It hasn't been used since commit 0fd7bac(net: relax rcvbuf limits). Signed-off-by: Sorin Dumitru <sorin@returnze.ro> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22net: sctp: Rename SCTP_XMIT_NAGLE_DELAY to SCTP_XMIT_DELAYDavid Laight
MSG_MORE and 'corking' a socket would require that the transmit of a data chunk be delayed. Rename the return value to be less specific. Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22net: sctp: Open out the check for NagleDavid Laight
The check for Nagle contains 6 separate checks all of which must be true before a data packet is delayed. Separate out each into its own 'if (test) return SCTP_XMIT_OK' so that the reasons can be individually described. Also return directly with SCTP_XMIT_RWND_FULL. Delete the now-unused 'retval' variable and 'finish' label from sctp_packet_can_append_data(). Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/infiniband/hw/cxgb4/device.c The cxgb4 conflict was simply overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21af_iucv: avoid path quiesce of severed path in shutdown()Ursula Braun
An af_iucv stress test showed -EPIPE results for sendmsg() calls. They are caused by quiescing a path even though it has been already severed by peer. For IUCV transport shutdown() consists of 2 steps: (1) sending the shutdown message to peer (2) quiescing the iucv path If the iucv path between these 2 steps is severed due to peer closing the path, the quiesce step is no longer needed. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Antonio Quartulli says: ==================== pull request [net]: batman-adv 20140721 here you have two fixes that we have been testing for quite some time (this is why they arrived a bit late in the rc cycle). Patch 1) ensures that BLA packets get dropped and not forwarded to the mesh even if they reach batman-adv within QinQ frames. Forwarding them into the mesh means messing up with the TT database of other nodes which can generate all kind of unexpected behaviours during route computation. Patch 2) avoids a couple of race conditions triggered upon fast VLAN deletion-addition. Such race conditions are pretty dangerous because they not only create inconsistencies in the TT database of the nodes in the network, but such scenario is also unrecoverable (unless nodes are rebooted). ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21ipv4: fix buffer overflow in ip_options_compile()Eric Dumazet
There is a benign buffer overflow in ip_options_compile spotted by AddressSanitizer[1] : Its benign because we always can access one extra byte in skb->head (because header is followed by struct skb_shared_info), and in this case this byte is not even used. [28504.910798] ================================================================== [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile [28504.913170] Read of size 1 by thread T15843: [28504.914026] [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0 [28504.915394] [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120 [28504.916843] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.918175] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.919490] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.920835] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.922208] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.923459] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.924722] [28504.925106] Allocated by thread T15843: [28504.925815] [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120 [28504.926884] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.927975] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.929175] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.930400] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.931677] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.932851] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.934018] [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right [28504.934377] of 40-byte region [ffff880026382800, ffff880026382828) [28504.937144] [28504.937474] Memory state around the buggy address: [28504.938430] ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr [28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.945573] ^ [28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.099804] Legend: [28505.100269] f - 8 freed bytes [28505.100884] r - 8 redzone bytes [28505.101649] . - 8 allocated bytes [28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes [28505.103637] ================================================================== [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21batman-adv: fix TT VLAN inconsistency on VLAN re-addAntonio Quartulli
When a VLAN interface (on top of batX) is removed and re-added within a short timeframe TT does not have enough time to properly cleanup. This creates an internal TT state mismatch as the newly created softif_vlan will be initialized from scratch with a TT client count of zero (even if TT entries for this VLAN still exist). The resulting TT messages are bogus due to the counter / tt client listing mismatch, thus creating inconsistencies on every node in the network To fix this issue destroy_vlan() has to not free the VLAN object immediately but it has to be kept alive until all the TT entries for this VLAN have been removed. destroy_vlan() still removes the sysfs folder so that the user has the feeling that everything went fine. If the same VLAN is re-added before the old object is free'd, then the latter is resurrected and re-used. Implement such behaviour by increasing the reference counter of a softif_vlan object every time a new local TT entry for such VLAN is created and remove the object from the list only when all the TT entries have been destroyed. Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
2014-07-21batman-adv: drop QinQ claim frames in bridge loop avoidanceSimon Wunderlich
Since bridge loop avoidance only supports untagged or simple 802.1q tagged VLAN claim frames, claim frames with stacked VLAN headers (QinQ) should be detected and dropped. Transporting the over the mesh may cause problems on the receivers, or create bogus entries in the local tt tables. Reported-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
2014-07-20dns_resolver: Null-terminate the right stringBen Hutchings
*_result[len] is parsed as *(_result[len]) which is not at all what we want to touch here. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 84a7c0b1db1c ("dns_resolver: assure that dns_query() result is null-terminated") Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20tipc: fix sparse non static symbol warningsWei Yongjun
Fixes the following sparse warnings: net/tipc/socket.c:545:5: warning: symbol 'tipc_sk_proto_rcv' was not declared. Should it be static? net/tipc/socket.c:2015:5: warning: symbol 'tipc_ioctl' was not declared. Should it be static? Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20net/rxrpc/ar-key.c: drop negativity check on unsigned valueAndrey Utkin
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=80611 Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains updates for your net-next tree, they are: 1) Use kvfree() helper function from x_tables, from Eric Dumazet. 2) Remove extra timer from the conntrack ecache extension, use a workqueue instead to redeliver lost events to userspace instead, from Florian Westphal. 3) Removal of the ulog targets for ebtables and iptables. The nflog infrastructure superseded this almost 9 years ago, time to get rid of this code. 4) Replace the list of loggers by an array now that we can only have two possible non-overlapping logger flavours, ie. kernel ring buffer and netlink logging. 5) Move Eric Dumazet's log buffer code to nf_log to reuse it from all of the supported per-family loggers. 6) Consolidate nf_log_packet() as an unified interface for packet logging. After this patch, if the struct nf_loginfo is available, it explicitly selects the logger that is used. 7) Move ip and ip6 logging code from xt_LOG to the corresponding per-family loggers. Thus, x_tables and nf_tables share the same code for packet logging. 8) Add generic ARP packet logger, which is used by nf_tables. The format aims to be consistent with the output of xt_LOG. 9) Add generic bridge packet logger. Again, this is used by nf_tables and it routes the packets to the real family loggers. As a result, we get consistent logging format for the bridge family. The ebt_log logging code has been intentionally left in place not to break backward compatibility since the logging output differs from xt_LOG. 10) Update nft_log to explicitly request the required family logger when needed. 11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families. Allowing selection between netlink and kernel buffer ring logging. 12) Several fixes coming after the netfilter core logging changes spotted by robots. 13) Use IS_ENABLED() macros whenever possible in the netfilter tree, from Duan Jiong. 14) Removal of a couple of unnecessary branch before kfree, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20net_sched: avoid generating same handle for u32 filtersCong Wang
When kernel generates a handle for a u32 filter, it tries to start from the max in the bucket. So when we have a filter with the max (fff) handle, it will cause kernel always generates the same handle for new filters. This can be shown by the following command: tc qdisc add dev eth0 ingress tc filter add dev eth0 parent ffff: protocol ip pref 770 handle 800::fff u32 match ip protocol 1 0xff tc filter add dev eth0 parent ffff: protocol ip pref 770 u32 match ip protocol 1 0xff ... we will get some u32 filters with same handle: # tc filter show dev eth0 parent ffff: filter protocol ip pref 770 u32 filter protocol ip pref 770 u32 fh 800: ht divisor 1 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0 match 00010000/00ff0000 at 8 handles should be unique. This patch fixes it by looking up a bitmap, so that can guarantee the handle is as unique as possible. For compatibility, we still start from 0x800. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20net: print a notification on device renameVeaceslav Falico
Currently it's done silently (from the kernel part), and thus it might be hard to track the renames from logs. Add a simple netdev_info() to notify the rename, but only in case the previous name was valid. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Vlad Yasevich <vyasevic@redhat.com> CC: stephen hemminger <stephen@networkplumber.org> CC: Jerry Chu <hkchu@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20net: print net_device reg_state in netdev_* unless it's registeredVeaceslav Falico
This way we'll always know in what status the device is, unless it's running normally (i.e. NETDEV_REGISTERED). Also, emit a warning once in case of a bad reg_state. CC: "David S. Miller" <davem@davemloft.net> CC: Jason Baron <jbaron@akamai.com> CC: Eric Dumazet <edumazet@google.com> CC: Vlad Yasevich <vyasevic@redhat.com> CC: stephen hemminger <stephen@networkplumber.org> CC: Jerry Chu <hkchu@google.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Joe Perches <joe@perches.com> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20net_sched: hold tcf_lock in netdevice notifierCong Wang
We modify mirred action (m->tcfm_dev) in netdev event, we need to prevent on-going mirred actions from reading freed m->tcfm_dev. So we need to acquire this spin lock. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17Update setapp/getapp prototypes in dcbnl_rtnl_ops to return int instead of u8Anish Bhatt
v2: fixed issue with checking return of dcbnl_rtnl_ops->getapp() Signed-off-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-17net_sched: cancel nest attribute on failure in tcf_exts_dump()Cong Wang
Like other places, we need to cancel the nest attribute after we start. Fortunately the netlink message will not be sent on failure, so it's not a big problem at all. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16openvswitch: make generic netlink group conststephen hemminger
Generic netlink tables can be const. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.David Held
Many multicast sources can have the same port which can result in a very large list when hashing by port only. Hash by address and port instead if this is the case. This makes multicast more similar to unicast. On a 24-core machine receiving from 500 multicast sockets on the same port, before this patch 80% of system CPU was used up by spin locking and only ~25% of packets were successfully delivered. With this patch, all packets are delivered and kernel overhead is ~8% system CPU on spinlocks. Signed-off-by: David Held <drheld@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16udp: Simplify __udp*_lib_mcast_deliver.David Held
Switch to using sk_nulls_for_each which shortens the code and makes it easier to update. Signed-off-by: David Held <drheld@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16netlink: remove bool varibleVarka Bhadram
This patch removes the bool variable 'pass'. If the swith case exist return true or return false. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16rtnetlink: Drop unnecessary return value from ndo_dflt_fdb_delAlexander Duyck
This change cleans up ndo_dflt_fdb_del to drop the ENOTSUPP return value since that isn't actually returned anywhere in the code. As a result we are able to drop a few lines by just defaulting this to -EINVAL. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16net: remove open-coded skb_cow_head.françois romieu
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16tipc: ensure sequential message delivery across dual bearersJon Paul Maloy
When we run broadcast packets over dual bearers/interfaces, the current transmission code is flipping bearers between each sent packet, with the purpose of leveraging the double bandwidth available. The receiving bclink is resequencing the packets if needed, so all messages are delivered upwards from the broadcast link in the correct order, even if they may arrive in concurrent interrupts. However, at the moment of delivery upwards to the socket, we release all spinlocks (bclink_lock, node_lock), so it is still possible that arriving messages bypass each other before they reach the socket queue. We fix this by applying the same technique we are using for unicast traffic. We use a link selector (i.e., the last bit of sending port number) to ensure that messages from the same sender socket always are sent over the same bearer. This guarantees sequential delivery between socket pairs, which is sufficient to satisfy the protocol spec, as well as all known user requirements. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16tipc: rename temporarily named functionsJon Paul Maloy
After the previous commit, we can now give the functions with temporary names, such as tipc_link_xmit2(), tipc_msg_build2() etc., their proper names. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16tipc: remove unreferenced functionsJon Paul Maloy
We can now remove a number of functions which have become obsolete and unreferenced through this commit series. There are no functional changes in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16tipc: start using the new multicast functionsJon Paul Maloy
In this commit, we convert the socket multicast send function to directly call the new multicast/broadcast function (tipc_bclink_xmit2()) introduced in the previous commit. We do this instead of letting the call go via the now obsolete tipc_port_mcast_xmit(), hence saving a call level and some code complexity. We also remove the initial destination lookup at the message sending side, and replace that with an unconditional lookup at the receiving side, including on the sending node itself. This makes the destination lookup and message transfer more uniform than before. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16tipc: add new functions for multicast and broadcast distributionJon Paul Maloy
We add a new broadcast link transmit function in bclink.c and a new receive function in socket.c. The purpose is to move the branching between external and internal destination down to the link layer, just as we have done with unicast in earlier commits. We also make use of the new link-independent fragmentation support that was introduced in an earlier commit series. This gives a shorter and simpler code path, and makes it possible to obtain copy-free buffer delivery to all node local destination sockets. The new transmission code is added in parallel with the existing one, and will be used by the socket multicast send function in the next commit in this series. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>