summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2013-09-04tcp: Change return value of tcp_rcv_established()Vijay Subramanian
tcp_rcv_established() returns only one value namely 0. We change the return value to void (as suggested by David Miller). After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in response to SYNs. We can remove the check and processing on the return value of tcp_rcv_established(). We also fix jtcp_rcv_established() in tcp_probe.c to match that of tcp_rcv_established(). Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04tunnels: harmonize cleanup done on skb on rx pathNicolas Dichtel
The goal of this patch is to harmonize cleanup done on a skbuff on rx path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04tunnels: harmonize cleanup done on skb on xmit pathNicolas Dichtel
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path. Before this patch, behaviors were different depending of the tunnel type. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04skb: allow skb_scrub_packet() to be used by tunnelsNicolas Dichtel
This function was only used when a packet was sent to another netns. Now, it can also be used after tunnel encapsulation or decapsulation. Only skb_orphan() should not be done when a packet is not crossing netns. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04vxlan: remove net arg from vxlan[6]_xmit_skb()Nicolas Dichtel
This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-04iptunnels: remove net arg from iptunnel_xmit()Nicolas Dichtel
This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03llc: Use normal etherdevice.h testsJoe Perches
Convert the llc_<foo> static inlines to the equivalents from etherdevice.h and remove the llc_<foo> static inline functions. llc_mac_null -> is_zero_ether_addr llc_mac_multicast -> is_multicast_ether_addr llc_mac_match -> ether_addr_equal Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03net: fix comment typo for __skb_alloc_pages()Florian Fainelli
The name of the function in the comment is __skb_alloc_page() while we are actually commenting __skb_alloc_pages(). Fix this typo and make it a valid kernel doc comment. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-nextDavid S. Miller
Marc Kleine-Budde says: ==================== this is a pull request for net-next. There are two patches from Gerhard Sittig, which improves the clock handling on mpc5121. Oliver Hartkopp provides a patch that adds a per rule limitation of frame hops. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03Merge branch 'for-davem' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next John W. Linville says: ==================== Please accept this batch of updates intended for the 3.12 stream. For the mac80211 bits, Johannes says this: "This time I have various improvements all over the place: IBSS, mesh, testmode, AP client powersave handling, one of the rare rfkill patches and some code cleanup." Also for mac80211: "And I also have some more changes for -next, just a few small fixes and improvements, nothing really stands out." And for iwlwifi: "This time I have some powersave work (notably uAPSD support), CQM offloads, support for a new firmware API and various code cleanups." Regarding the Bluetooth bits, Gustavo says: "Patches to 3.12, here we have: * implementation of a proper tty_port for RFCOMM devices, this fixes some issues people were seeing lately in the kernel. * Add voice_setting option for SCO, it is used for SCO Codec selection * bugfixes, small improvements and clean ups" For the NFC bits, Samuel says: "With this one we have: - A few pn533 improvements and minor fixes. Testing our pn533 driver against Google's NCI stack triggered a few issues that we fixed now. We also added Tx fragmentation support to this driver. - More NFC secure element handling. We added a GET_SE netlink command for getting all the discovered secure elements, and we defined 2 additional secure element netlink event (transaction and connectivity). We also fixed a couple of typos and copy-paste bugs from the secure element handling code. - Firmware download support for the pn544 driver. This chipset can enter a special mode where it's waiting for firmware blobs to replace the already flashed one. We now support that mode." With repect to the ath tree, Kalle says: "New features in ath10k are rx/tx checsumming in hw and survey scan implemented by Michal. Also he made fixes to different areas of the driver, most notable being fixing the case when using two streams and reducing the number of interface combinations to avoid firmware crashes. Bartosz did a clean related to how we handle SoC power save in PCI layer. For ath6kl Mohammed and Vasanth sent each a patch to fix two infrequent crashes." I also pulled the wireless tree into wireless-next to support a request from Johannes. On top of all that, there are the usual sort of driver updates. The mwifiex, brcmfmac, brcmsmac, ath9k, and rt2x00 drivers all get some attention, as does the bcma bus and a few other random bits here and there. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-03net: etherdevice: add address inherit helperBjørn Mork
Some etherdevices inherit their address from a parent or master device. The addr_assign_type should be updated along with the address in these cases. Adding a helper function to simplify this. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-02net: make snmp_mib_free static inlineCong Wang
Fengguang reported: net/built-in.o: In function `in6_dev_finish_destroy': (.text+0x4ca7d): undefined reference to `snmp_mib_free' this is due to snmp_mib_free() is defined when CONFIG_INET is enabled, but in6_dev_finish_destroy() is now moved to core kernel. I think snmp_mib_free() is small enough to be inlined, so just make it static inline. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31vxlan: add ipv6 proxy supportCong Wang
This patch adds the IPv6 version of "arp_reduce", ndisc_send_na() will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31vxlan: add ipv6 route short circuit supportCong Wang
route short circuit only has IPv4 part, this patch adds the IPv6 part. nd_tbl will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31vxlan: add ipv6 supportCong Wang
This patch adds IPv6 support to vxlan device, as the new version RFC already mentions it: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03 Cc: David Stevens <dlstevens@us.ibm.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31ipv6: export a stub for IPv6 symbols used by vxlanCong Wang
In case IPv6 is compiled as a module, introduce a stub for ipv6_sock_mc_join and ipv6_sock_mc_drop etc.. It will be used by vxlan module. Suggested by Ben. This is an ugly but easy solution for now. Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31ipv6: move ip6_dst_hoplimit() into core kernelCong Wang
It will be used by vxlan, and may not be inlined. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31qdisc: make args to qdisc_create_default conststephen hemminger
Fixes warnings introduced by the qdisc default patch. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-31qdisc: allow setting default queuing disciplinestephen hemminger
By default, the pfifo_fast queue discipline has been used by default for all devices. But we have better choices now. This patch allow setting the default queueing discipline with sysctl. This allows easy use of better queueing disciplines on all devices without having to use tc qdisc scripts. It is intended to allow an easy path for distributions to make fq_codel or sfq the default qdisc. This patch also makes pfifo_fast more of a first class qdisc, since it is now possible to manually override the default and explicitly use pfifo_fast. The behavior for systems who do not use the sysctl is unchanged, they still get pfifo_fast Also removes leftover random # in sysctl net core. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-30driver:net:stmmac: Disable DMA store and forward mode if platform data ↵Sonic Zhang
force_thresh_dma_mode is set. Some synopsys ip implementation doesn't support DMA store and forward mode, such as BF60x. So, set force_thresh_dma_mode to use DMA thresholds only. Update document and devicetree as well. Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29drivers:net: Convert dma_alloc_coherent(...__GFP_ZERO) to dma_zalloc_coherentJoe Perches
__GFP_ZERO is an uncommon flag and perhaps is better not used. static inline dma_zalloc_coherent exists so convert the uses of dma_alloc_coherent with __GFP_ZERO to the more common kernel style with zalloc. Remove memset from the static inline dma_zalloc_coherent and add just one use of __GFP_ZERO instead. Trivially reduces the size of the existing uses of dma_zalloc_coherent. Realign arguments as appropriate. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29pkt_sched: fq: Fair Queue packet schedulerEric Dumazet
- Uses perfect flow match (not stochastic hash like SFQ/FQ_codel) - Uses the new_flow/old_flow separation from FQ_codel - New flows get an initial credit allowing IW10 without added delay. - Special FIFO queue for high prio packets (no need for PRIO + FQ) - Uses a hash table of RB trees to locate the flows at enqueue() time - Smart on demand gc (at enqueue() time, RB tree lookup evicts old unused flows) - Dynamic memory allocations. - Designed to allow millions of concurrent flows per Qdisc. - Small memory footprint : ~8K per Qdisc, and 104 bytes per flow. - Single high resolution timer for throttled flows (if any). - One RB tree to link throttled flows. - Ability to have a max rate per flow. We might add a socket option to add per socket limitation. Attempts have been made to add TCP pacing in TCP stack, but this seems to add complex code to an already complex stack. TCP pacing is welcomed for flows having idle times, as the cwnd permits TCP stack to queue a possibly large number of packets. This removes the 'slow start after idle' choice, hitting badly large BDP flows, and applications delivering chunks of data as video streams. Nicely spaced packets : Here interface is 10Gbit, but flow bottleneck is ~20Mbit cwin is big, yet FQ avoids the typical bursts generated by TCP (as in netperf TCP_RR -- -r 100000,100000) 15:01:23.545279 IP A > B: . 78193:81089(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.545394 IP B > A: . ack 81089 win 3668 <nop,nop,timestamp 11597985 1115> 15:01:23.546488 IP A > B: . 81089:83985(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.546565 IP B > A: . ack 83985 win 3668 <nop,nop,timestamp 11597986 1115> 15:01:23.547713 IP A > B: . 83985:86881(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.547778 IP B > A: . ack 86881 win 3668 <nop,nop,timestamp 11597987 1115> 15:01:23.548911 IP A > B: . 86881:89777(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.548949 IP B > A: . ack 89777 win 3668 <nop,nop,timestamp 11597988 1115> 15:01:23.550116 IP A > B: . 89777:92673(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.550182 IP B > A: . ack 92673 win 3668 <nop,nop,timestamp 11597989 1115> 15:01:23.551333 IP A > B: . 92673:95569(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.551406 IP B > A: . ack 95569 win 3668 <nop,nop,timestamp 11597991 1115> 15:01:23.552539 IP A > B: . 95569:98465(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.552576 IP B > A: . ack 98465 win 3668 <nop,nop,timestamp 11597992 1115> 15:01:23.553756 IP A > B: . 98465:99913(1448) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554138 IP A > B: P 99913:100001(88) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554204 IP B > A: . ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.554234 IP B > A: . 65248:68144(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.555620 IP B > A: . 68144:71040(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.557005 IP B > A: . 71040:73936(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.558390 IP B > A: . 73936:76832(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.559773 IP B > A: . 76832:79728(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.561158 IP B > A: . 79728:82624(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.562543 IP B > A: . 82624:85520(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.563928 IP B > A: . 85520:88416(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.565313 IP B > A: . 88416:91312(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.566698 IP B > A: . 91312:94208(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.568083 IP B > A: . 94208:97104(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.569467 IP B > A: . 97104:100000(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.570852 IP B > A: . 100000:102896(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.572237 IP B > A: . 102896:105792(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.573639 IP B > A: . 105792:108688(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.575024 IP B > A: . 108688:111584(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.576408 IP B > A: . 111584:114480(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.577793 IP B > A: . 114480:117376(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> TCP timestamps show that most packets from B were queued in the same ms timeframe (TSval 1159799{3,4}), but FQ managed to send them right in time to avoid a big burst. In slow start or steady state, very few packets are throttled [1] FQ gets a bunch of tunables as : limit : max number of packets on whole Qdisc (default 10000) flow_limit : max number of packets per flow (default 100) quantum : the credit per RR round (default is 2 MTU) initial_quantum : initial credit for new flows (default is 10 MTU) maxrate : max per flow rate (default : unlimited) buckets : number of RB trees (default : 1024) in hash table. (consumes 8 bytes per bucket) [no]pacing : disable/enable pacing (default is enable) All of them can be changed on a live qdisc. $ tc qd add dev eth0 root fq help Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ] [ quantum BYTES ] [ initial_quantum BYTES ] [ maxrate RATE ] [ buckets NUMBER ] [ [no]pacing ] $ tc -s -d qd qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 256 quantum 3028 initial_quantum 15140 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14) backlog 0b 0p requeues 14 511 flows, 511 inactive, 0 throttled 110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit [1] Except if initial srtt is overestimated, as if using cached srtt in tcp metrics. We'll provide a fix for this issue. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29can: gw: add a per rule limitation of frame hopsOliver Hartkopp
Usually the received CAN frames can be processed/routed as much as 'max_hops' times (which is given at module load time of the can-gw module). Introduce a new configuration option to reduce the number of possible hops for a specific gateway rule to a value smaller then max_hops. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2013-08-29net: packet: add randomized fanout schedulerDaniel Borkmann
We currently allow for different fanout scheduling policies in pf_packet such as scheduling by skb's rxhash, round-robin, by cpu, and rollover. Also allow for a random, equidistributed selection of the socket from the fanout process group. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29net: add netdev_for_each_upper_dev_rcu()Veaceslav Falico
The new macro netdev_for_each_upper_dev_rcu(dev, upper, iter) iterates through the dev->upper_dev_list starting from the first element, using the netdev_upper_get_next_dev_rcu(dev, &iter). Must be called under RCU read lock. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29net: add lower_dev_list to net_device and make a full meshVeaceslav Falico
This patch adds lower_dev_list list_head to net_device, which is the same as upper_dev_list, only for lower devices, and begins to use it in the same way as the upper list. It also changes the way the whole adjacent device lists work - now they contain *all* of upper/lower devices, not only the first level. The first level devices are distinguished by the bool neighbour field in netdev_adjacent, also added by this patch. There are cases when a device can be added several times to the adjacent list, the simplest would be: /---- eth0.10 ---\ eth0- --- bond0 \---- eth0.20 ---/ where both bond0 and eth0 'see' each other in the adjacent lists two times. To avoid duplication of netdev_adjacent structures ref_nr is being kept as the number of times the device was added to the list. The 'full view' is achieved by adding, on link creation, all of the upper_dev's upper_dev_list devices as upper devices to all of the lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice versa. On unlink they are removed using the same logic. I've tested it with thousands vlans/bonds/bridges, everything works ok and no observable lags even on a huge number of interfaces. Memory footprint for 128 devices interconnected with each other via both upper and lower (which is impossible, but for the comparison) lists would be: 128*128*2*sizeof(netdev_adjacent) = 1.5MB but in the real world we usualy have at most several devices with slaves and a lot of vlans, so the footprint will be much lower. CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29tcp: TSO packets automatic sizingEric Dumazet
After hearing many people over past years complaining against TSO being bursty or even buggy, we are proud to present automatic sizing of TSO packets. One part of the problem is that tcp_tso_should_defer() uses an heuristic relying on upcoming ACKS instead of a timer, but more generally, having big TSO packets makes little sense for low rates, as it tends to create micro bursts on the network, and general consensus is to reduce the buffering amount. This patch introduces a per socket sk_pacing_rate, that approximates the current sending rate, and allows us to size the TSO packets so that we try to send one packet every ms. This field could be set by other transports. Patch has no impact for high speed flows, where having large TSO packets makes sense to reach line rate. For other flows, this helps better packet scheduling and ACK clocking. This patch increases performance of TCP flows in lossy environments. A new sysctl (tcp_min_tso_segs) is added, to specify the minimal size of a TSO packet (default being 2). A follow-up patch will provide a new packet scheduler (FQ), using sk_pacing_rate as an input to perform optional per flow pacing. This explains why we chose to set sk_pacing_rate to twice the current rate, allowing 'slow start' ramp up. sk_pacing_rate = 2 * cwnd * mss / srtt v2: Neal Cardwell reported a suspect deferring of last two segments on initial write of 10 MSS, I had to change tcp_tso_should_defer() to take into account tp->xmit_size_goal_segs Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Van Jacobson <vanj@google.com> Cc: Tom Herbert <therbert@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29ipv6: drop fragmented ndisc packets by default (RFC 6980)Hannes Frederic Sowa
This patch implements RFC6980: Drop fragmented ndisc packets by default. If a fragmented ndisc packet is received the user is informed that it is possible to disable the check. Cc: Fernando Gont <fernando@gont.com.ar> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29net: sctp: reorder sctp_globals to reduce cacheline usageDaniel Borkmann
Reduce cacheline usage from 2 to 1 cacheline for sctp_globals structure. By reordering elements, we can close gaps and simply achieve the following: Current situation: /* size: 80, cachelines: 2, members: 10 */ /* sum members: 57, holes: 4, sum holes: 16 */ /* padding: 7 */ /* last cacheline: 16 bytes */ Afterwards: /* size: 64, cachelines: 1, members: 10 */ /* padding: 7 */ Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-29Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c
2013-08-28Merge branch 'for-john' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
2013-08-28Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c net/mac80211/ibss.c
2013-08-27Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch Jesse Gross says: ==================== A number of significant new features and optimizations for net-next/3.12. Highlights are: * "Megaflows", an optimization that allows userspace to specify which flow fields were used to compute the results of the flow lookup. This allows for a major reduction in flow setups (the major performance bottleneck in Open vSwitch) without reducing flexibility. * Converting netlink dump operations to use RCU, allowing for additional parallelism in userspace. * Matching and modifying SCTP protocol fields. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-28net: syncookies: export cookie_v6_init_sequence/cookie_v6_checkPatrick McHardy
Extract the local TCP stack independant parts of tcp_v6_init_sequence() and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY target. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28netfilter: add SYNPROXY core/targetPatrick McHardy
Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy core with common functions and an address family specific target. The SYNPROXY receives the connection request from the client, responds with a SYN/ACK containing a SYN cookie and announcing a zero window and checks whether the final ACK from the client contains a valid cookie. It then establishes a connection to the original destination and, if successful, sends a window update to the client with the window size announced by the server. Support for timestamps, SACK, window scaling and MSS options can be statically configured as target parameters if the features of the server are known. If timestamps are used, the timestamp value sent back to the client in the SYN/ACK will be different from the real timestamp of the server. In order to now break PAWS, the timestamps are translated in the direction server->client. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28net: syncookies: export cookie_v4_init_sequence/cookie_v4_checkPatrick McHardy
Extract the local TCP stack independant parts of tcp_v4_init_sequence() and cookie_v4_check() and export them for use by the upcoming SYNPROXY target. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-28netfilter: nf_conntrack: make sequence number adjustments usuable without NATPatrick McHardy
Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a seperate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-08-26openvswitch: Add SCTP supportJoe Stringer
This patch adds support for rewriting SCTP src,dst ports similar to the functionality already available for TCP/UDP. Rewriting SCTP ports is expensive due to double-recalculation of the SCTP checksums; this is performed to ensure that packets traversing OVS with invalid checksums will continue to the destination with any checksum corruption intact. Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-26bcma: add bcma_core_pci_power_save()Hauke Mehrtens
This enables or disables power saving on the PCIe bus when the wifi is in operation or not. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-08-26bcma: do not export bcma_core_pci_extend_L1timer()Hauke Mehrtens
This is not called any more, do not export it. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-08-26bcma: add method to power up and down the PCIe core by wifi driverHauke Mehrtens
The wifi driver should tell the PCIe core that it is now in operation so that some workarounds can be applied and the power state is changed. This should replace the call to bcma_core_pci_extend_L1timer by the brcmsmac driver. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2013-08-23net: Add NEXTHDR_SCTP to ipv6.hJoe Stringer
Signed-off-by: Joe Stringer <joe@wand.net.nz> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23openvswitch: Mega flow implementationAndy Zhou
Add wildcarded flow support in kernel datapath. Wildcarded flow can improve OVS flow set up performance by avoid sending matching new flows to the user space program. The exact performance boost will largely dependent on wildcarded flow hit rate. In case all new flows hits wildcard flows, the flow set up rate is within 5% of that of linux bridge module. Pravin has made significant contributions to this patch. Including API clean ups and bug fixes. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2013-08-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Merge networking fixes from David Miller: 1) Revert Johannes Berg's genetlink locking fix, because it causes regressions. Johannes and Pravin Shelar are working on fixing things properly. 2) Do not drop ipv6 ICMP messages without a redirected header option, they are legal. From Duan Jiong. 3) Missing error return propagation in probing of via-ircc driver. From Alexey Khoroshilov. 4) Do not clear out broadcast/multicast/unicast/WOL bits in r8169 when initializing, from Peter Wu. 5) realtek phy driver programs wrong interrupt status bit, from Giuseppe CAVALLARO. 6) Fix statistics regression in AF_PACKET code, from Willem de Bruijn. 7) Bridge code uses wrong bitmap length, from Toshiaki Makita. 8) SFC driver uses wrong indexes to look up MAC filters, from Ben Hutchings. 9) Don't pass stack buffers into usb control operations in hso driver, from Daniel Gimpelevich. 10) Multiple ipv6 fragmentation headers in one packet is illegal and such packets should be dropped, from Hannes Frederic Sowa. 11) When TCP sockets are "repaired" as part of checkpoint/restart, the timestamp field of SKBs need to be refreshed otherwise RTOs can be wildly off. From Andrey Vagin. 12) Fix memcpy args (uses 'address of pointer' instead of 'pointer') in hostp driver. From Dan Carpenter. 13) nl80211hdr_put() doesn't return an ERR_PTR, but some code believes it does. From Dan Carpenter. 14) Fix regression in wireless SME disconnects, from Johannes Berg. 15) Don't use a stack buffer for DMA in zd1201 USB wireless driver, from Jussi Kivilinna. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits) ipv4: expose IPV4_DEVCONF ipv6: handle Redirect ICMP Message with no Redirected Header option be2net: fix disabling TX in be_close() Revert "genetlink: fix family dump race" hso: Fix stack corruption on some architectures hso: Earlier catch of error condition sfc: Fix lookup of default RX MAC filters when steered using ethtool bridge: Use the correct bit length for bitmap functions in the VLAN code packet: restore packet statistics tp_packets to include drops net: phy: rtl8211: fix interrupt on status link change r8169: remember WOL preferences on driver load via-ircc: don't return zero if via_ircc_open() failed macvtap: Ignore tap features when VNET_HDR is off macvtap: Correctly set tap features when IFF_VNET_HDR is disabled. macvtap: simplify usage of tap_features tcp: set timestamps for restored skb-s bnx2x: set VF DMAE when first function has 0 supported VFs bnx2x: Protect against VFs' ndos when SR-IOV is disabled bnx2x: prevent VF benign attentions bnx2x: Consider DCBX remote error ...
2013-08-23cfg80211: add flags to cfg80211_rx_mgmt()Vladimir Kondratiev
Add flags intended to report various auxiliary information and introduce the NL80211_RXMGMT_FLAG_ANSWERED flag to report that the frame was already answered by the device. Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com> [REPLIED->ANSWERED, reword commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-08-22fs_enet: cleanup clock API useGerhard Sittig
make the Freescale ethernet driver get, prepare and enable the FEC clock during probe(); disable and unprepare the clock upon remove(), put is done by the devm approach; hold a reference to the clock over the period of use. clock lookup is non-fatal as not all platforms provide clock specs in their device tree; failure to enable specified clocks is fatal. Signed-off-by: Gerhard Sittig <gsi@denx.de> Signed-off-by: Anatolij Gustschin <agust@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22ipv4: expose IPV4_DEVCONFstephen hemminger
IP sends device configuration (see inet_fill_link_af) as an array in the netlink information, but the indices in that array are not exposed to userspace through any current santized header file. It was available back in 2.6.32 (in /usr/include/linux/sysctl.h) but was broken by: commit 02291680ffba92e5b5865bc0c5e7d1f3056b80ec Author: Eric W. Biederman <ebiederm@xmission.com> Date: Sun Feb 14 03:25:51 2010 +0000 net ipv4: Decouple ipv4 interface parameters from binary sysctl numbers Eric was solving the sysctl problem but then the indices were re-exposed by a later addition of devconf support for IPV4 commit 9f0f7272ac9506f4c8c05cc597b7e376b0b9f3e4 Author: Thomas Graf <tgraf@infradead.org> Date: Tue Nov 16 04:32:48 2010 +0000 ipv4: AF_INET link address family Putting them in /usr/include/linux/ip.h seemed the logical match for the DEVCONF_ definitions for IPV6 in /usr/include/linux/ip6.h Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-08-22ipv6: handle Redirect ICMP Message with no Redirected Header optionDuan Jiong
rfc 4861 says the Redirected Header option is optional, so the kernel should not drop the Redirect Message that has no Redirected Header option. In this patch, the function ip6_redirect_no_header() is introduced to deal with that condition. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
2013-08-22Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Some constifications, from Mathias Krause. 2) Catch bugs if a hold timer is still active when xfrm_policy_destroy() is called, from Fan Du. 3) Remove a redundant address family checking, from Fan Du. 4) Make xfrm_state timer monotonic to be independent of system clock changes, from Fan Du. 5) Remove an outdated comment on returning -EREMOTE in the xfrm_lookup(), from Rami Rosen. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>