summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2012-05-04Merge tag 'v3.4-rc5' into nextJames Morris
Linux 3.4-rc5 Merge to pull in prerequisite change for Smack: 86812bb0de1a3758dc6c7aa01a763158a7c0638a Requested by Casey.
2012-05-03openvswitch: Release rtnl_lock if ovs_vport_cmd_build_info() failed.Ansis Atteka
This patch fixes a possible lock-up bug where rtnl_lock might not get released. Signed-off-by: Ansis Atteka <aatteka@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-05-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Transfer padding was wrong for full-speed USB in ASIX driver, fix from Ingo van Lil. 2) Propagate the negative packet offset fix into the PowerPC BPF JIT. From Jan Seiffert. 3) dl2k driver's private ioctls were letting unprivileged tasks make MII writes and other ugly bits like that. Fix from Jeff Mahoney. 4) Fix TX VLAN and RX packet drops in ucc_geth, from Joakim Tjernlund. 5) OOPS and network namespace fixes in IPVS from Hans Schillstrom and Julian Anastasov. 6) Fix races and sleeping in locked context bugs in drop_monitor, from Neil Horman. 7) Fix link status indication in smsc95xx driver, from Paolo Pisati. 8) Fix bridge netfilter OOPS, from Peter Huang. 9) L2TP sendmsg can return on error conditions with the socket lock held, oops. Fix from Sasha Levin. 10) udp_diag should return meaningful values for socket memory usage, from Shan Wei. 11) Eric Dumazet is so awesome he gets his own section: Socket memory cgroup code (I never should have applied those patches, grumble...) made erroneous changes to sk_sockets_allocated_read_positive(). It was changed to use percpu_counter_sum_positive (which requires BH disabling) instead of percpu_counter_read_positive (which does not). Revert back to avoid crashes and lockdep warnings. Adjust the default tcp_adv_win_scale and tcp_rmem[2] values to fix throughput regressions. This is necessary as a result of our more precise skb->truesize tracking. Fix SKB leak in netem packet scheduler. 12) New device IDs for various bluetooth devices, from Manoj Iyer, AceLan Kao, and Steven Harms. 13) Fix command completion race in ipw2200, from Stanislav Yakovlev. 14) Fix rtlwifi oops on unload, from Larry Finger. 15) Fix hard_mtu when adjusting hard_header_len in smsc95xx driver. From Stephane Fillod. 16) ehea driver registers it's IRQ before all the necessary state is setup, resulting in crashes. Fix from Thadeu Lima de Souza Cascardo. 17) Fix PHY connection failures in davinci_emac driver, from Anatolij Gustschin. 18) Missing break; in switch statement in bluetooth's hci_cmd_complete_evt(). Fix from Szymon Janc. 19) Fix queue programming in iwlwifi, from Johannes Berg. 20) Interrupt throttling defaults not being actually programmed into the hardware, fix from Jeff Kirsher and Ying Cai. 21) TLAN driver SKB encoding in descriptor busted on 64-bit, fix from Benjamin Poirier. 22) Fix blind status block RX producer pointer deref in TG3 driver, from Matt Carlson. 23) Promisc and multicast are busted on ehea, fixes from Thadeu Lima de Souza Cascardo. 24) Fix crashes in 6lowpan, from Alexander Smirnov. 25) tcp_complete_cwr() needs to be careful to not rewind the CWND to ssthresh if ssthresh has the "infinite" value. Fix from Yuchung Cheng. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits) sungem: Fix WakeOnLan tcp: change tcp_adv_win_scale and tcp_rmem[2] net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg drop_monitor: prevent init path from scheduling on the wrong cpu usbnet: fix failure handling in usbnet_probe usbnet: fix leak of transfer buffer of dev->interrupt ucc_geth: Add 16 bytes to max TX frame for VLANs net: ucc_geth, increase no. of HW RX descriptors netem: fix possible skb leak sky2: fix receive length error in mixed non-VLAN/VLAN traffic sky2: propogate rx hash when packet is copied net: fix two typos in skbuff.h cxgb3: Don't call cxgb_vlan_mode until q locks are initialized ixgbe: fix calling skb_put on nonlinear skb assertion bug ixgbe: Fix a memory leak in IEEE DCB igbvf: fix the bug when initializing the igbvf smsc75xx: enable mac to detect speed/duplex from phy smsc75xx: declare smsc75xx's MII as GMII capable smsc75xx: fix phy interrupt acknowledge smsc75xx: fix phy init reset loop ...
2012-05-03skb: Add skb_head_is_locked helper functionAlexander Duyck
This patch adds support for a skb_head_is_locked helper function. It is meant to be used any time we are considering transferring the head from skb->head to a paged frag. If the head is locked it means we cannot remove the head from the skb so it must be copied or we must take the skb as a whole. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03net: Fix truesize accounting in skb_gro_receive()Eric Dumazet
GRO is very optimistic in skb truesize estimates, only taking into account the used part of fragments. Be conservative, and use more precise computation, so that bloated GRO skbs can be collapsed eventually. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03auth_gss: the list of pseudoflavors not being parsed correctlySteve Dickson
gss_mech_list_pseudoflavors() parses a list of registered mechanisms. On that list contains a list of pseudo flavors which was not being parsed correctly, causing only the first pseudo flavor to be found. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-05-03userns: Replace user_ns_map_uid and user_ns_map_gid with from_kuid and from_kgidEric W. Biederman
These function are no longer needed replace them with their more useful equivalents. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03userns: Convert group_info values from gid_t to kgid_t.Eric W. Biederman
As a first step to converting struct cred to be all kuid_t and kgid_t values convert the group values stored in group_info to always be kgid_t values. Unless user namespaces are used this change should have no effect. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-05-03tcp: move stats merge to the end of tcp_try_coalesceAlexander Duyck
This change cleans up the last bits of tcp_try_coalesce so that we only need one goto which jumps to the end of the function. The idea is to make the code more readable by putting things in a linear order so that we start execution at the top of the function, and end it at the bottom. I also made a slight tweak to the code for handling frags when we are a clone. Instead of making it an if (clone) loop else nr_frags = 0 I changed the logic so that if (!clone) we just set the number of frags to 0 which disables the for loop anyway. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03tcp: Move code related to head frag in tcp_try_coalesceAlexander Duyck
This change reorders the code related to the use of an skb->head_frag so it is placed before we check the rest of the frags. This allows the code to read more linearly instead of like some sort of loop. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03tcp: Fix truesize accounting in tcp_try_coalesceAlexander Duyck
This patch addresses several issues in the way we were tracking the truesize in tcp_try_coalesce. First it was using ksize which prevents us from having a 0 sized head frag and getting a usable result. To resolve that this patch uses the end pointer which is set based off either ksize, or the frag_size supplied in build_skb. This allows us to compute the original truesize of the entire buffer and remove that value leaving us with just what was added as pages. The second issue was the use of skb->len if there is a mergeable head frag. We should only need to remove the size of an data aligned sk_buff from our current skb->truesize to compute the delta for a buffer with a reused head. By using skb->len the value of truesize was being artificially reduced which means that head frags could use more memory than buffers using standard allocations. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03net: Add missing linux/prefetch.h include to net/core/sock.cDavid S. Miller
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03net: Stop decapitating clones that have a head_fragAlexander Duyck
This change is meant ot prevent stealing the skb->head to use as a page in the event that the skb->head was cloned. This allows the other clones to track each other via shinfo->dataref. Without this we break down to two methods for tracking the reference count, one being dataref, the other being the page count. As a result it becomes difficult to track how many references there are to skb->head. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02net: implement tcp coalescing in tcp_queue_rcv()Eric Dumazet
Extend tcp coalescing implementing it from tcp_queue_rcv(), the main receiver function when application is not blocked in recvmsg(). Function tcp_queue_rcv() is moved a bit to allow its call from tcp_data_queue() This gives good results especially if GRO could not kick, and if skb head is a fragment. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02net: take care of cloned skbs in tcp_try_coalesce()Eric Dumazet
Before stealing fragments or skb head, we must make sure skbs are not cloned. Alexander was worried about destination skb being cloned : In bridge setups, a driver could be fooled if skb->data_len would not match skb nr_frags. If source skb is cloned, we must take references on pages instead. Bug happened using tcpdump (if not using mmap()) Introduce kfree_skb_partial() helper to cleanup code. Reported-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02tcp: change tcp_adv_win_scale and tcp_rmem[2]Eric Dumazet
tcp_adv_win_scale default value is 2, meaning we expect a good citizen skb to have skb->len / skb->truesize ratio of 75% (3/4) In 2.6 kernels we (mis)accounted for typical MSS=1460 frame : 1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392. So these skbs were considered as not bloated. With recent truesize fixes, a typical MSS=1460 frame truesize is now the more precise : 2048 + 256 = 2304. But 2304 * 3/4 = 1728. So these skb are not good citizen anymore, because 1460 < 1728 (GRO can escape this problem because it build skbs with a too low truesize.) This also means tcp advertises a too optimistic window for a given allocated rcvspace : When receiving frames, sk_rmem_alloc can hit sk_rcvbuf limit and we call tcp_prune_queue()/tcp_collapse() too often, especially when application is slow to drain its receive queue or in case of losses (netperf is fast, scp is slow). This is a major latency source. We should adjust the len/truesize ratio to 50% instead of 75% This patch : 1) changes tcp_adv_win_scale default to 1 instead of 2 2) increase tcp_rmem[2] limit from 4MB to 6MB to take into account better truesize tracking and to allow autotuning tcp receive window to reach same value than before. Note that same amount of kernel memory is consumed compared to 2.6 kernels. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsgSasha Levin
l2tp_ip_sendmsg could return without releasing socket lock, making it all the way to userspace, and generating the following warning: [ 130.891594] ================================================ [ 130.894569] [ BUG: lock held when returning to user space! ] [ 130.897257] 3.4.0-rc5-next-20120501-sasha #104 Tainted: G W [ 130.900336] ------------------------------------------------ [ 130.902996] trinity/8384 is leaving the kernel with locks still held! [ 130.906106] 1 lock held by trinity/8384: [ 130.907924] #0: (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff82b9503f>] l2tp_ip_sendmsg+0x2f/0x550 Introduced by commit 2f16270 ("l2tp: Fix locking in l2tp_ip.c"). Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02drop_monitor: prevent init path from scheduling on the wrong cpuNeil Horman
I just noticed after some recent updates, that the init path for the drop monitor protocol has a minor error. drop monitor maintains a per cpu structure, that gets initalized from a single cpu. Normally this is fine, as the protocol isn't in use yet, but I recently made a change that causes a failed skb allocation to reschedule itself . Given the current code, the implication is that this workqueue reschedule will take place on the wrong cpu. If drop monitor is used early during the boot process, its possible that two cpus will access a single per-cpu structure in parallel, possibly leading to data corruption. This patch fixes the situation, by storing the cpu number that a given instance of this per-cpu data should be accessed from. In the case of a need for a reschedule, the cpu stored in the struct is assigned the rescheule, rather than the currently executing cpu Tested successfully by myself. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02tcp: early retransmit: delayed fast retransmitYuchung Cheng
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2). Delays the fast retransmit by an interval of RTT/4. We borrow the RTO timer to implement the delay. If we receive another ACK or send a new packet, the timer is cancelled and restored to original RTO value offset by time elapsed. When the delayed-ER timer fires, we enter fast recovery and perform fast retransmit. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02tcp: early retransmitYuchung Cheng
This patch implements RFC 5827 early retransmit (ER) for TCP. It reduces DUPACK threshold (dupthresh) if outstanding packets are less than 4 to recover losses by fast recovery instead of timeout. While the algorithm is simple, small but frequent network reordering makes this feature dangerous: the connection repeatedly enter false recovery and degrade performance. Therefore we implement a mitigation suggested in the appendix of the RFC that delays entering fast recovery by a small interval, i.e., RTT/4. Currently ER is conservative and is disabled for the rest of the connection after the first reordering event. A large scale web server experiment on the performance impact of ER is summarized in section 6 of the paper "Proportional Rate Reduction for TCP”, IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf Note that Linux has a similar feature called THIN_DUPACK. The differences are THIN_DUPACK do not mitigate reorderings and is only used after slow start. Currently ER is disabled if THIN_DUPACK is enabled. I would be happy to merge THIN_DUPACK feature with ER if people think it's a good idea. ER is enabled by sysctl_tcp_early_retrans: 0: Disables ER 1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4. 2: (Default) reduce dupthresh like mode 1. In addition, delay entering fast recovery by RTT/4. Note: mode 2 is implemented in the third part of this patch series. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02tcp: early retransmit: tcp_enter_recovery()Yuchung Cheng
This a prepartion patch that refactors the code to enter recovery into a new function tcp_enter_recovery(). It's needed to implement the delayed fast retransmit in ER. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01Merge branch 'master' of ↵John W. Linville
git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2012-05-01netem: fix possible skb leakEric Dumazet
skb_checksum_help(skb) can return an error, we must free skb in this case. qdisc_drop(skb, sch) can also be feeded with a NULL skb (if skb_unshare() failed), so lets use this generic helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01netem: add ECN capabilityEric Dumazet
Add ECN (Explicit Congestion Notification) marking capability to netem tc qdisc add dev eth0 root netem drop 0.5 ecn Instead of dropping packets, try to ECN mark them. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01net: add a prefetch in socket backlog processingEric Dumazet
TCP or UDP stacks have big enough latencies that prefetching next pointer is worth it. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01l2tp: let iproute2 create L2TPv3 IP tunnels using IPv6James Chapman
The netlink API lets users create unmanaged L2TPv3 tunnels using iproute2. Until now, a request to create an unmanaged L2TPv3 IP encapsulation tunnel over IPv6 would be rejected with EPROTONOSUPPORT. Now that l2tp_ip6 implements sockets for L2TP IP encapsulation over IPv6, we can add support for that tunnel type. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01l2tp: introduce L2TPv3 IP encapsulation support for IPv6Chris Elston
L2TPv3 defines an IP encapsulation packet format where data is carried directly over IP (no UDP). The kernel already has support for L2TP IP encapsulation over IPv4 (l2tp_ip). This patch introduces support for L2TP IP encapsulation over IPv6. The implementation is derived from ipv6/raw and ipv4/l2tp_ip. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01ipv6: Export ipv6 functions for use by other protocolsChris Elston
For implementing other protocols on top of IPv6, such as L2TPv3's IP encapsulation over ipv6, we'd like to call some IPv6 functions which are not currently exported. This patch exports them. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnelsChris Elston
This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using the netlink API. We already support unmanaged L2TPv3 tunnels over IPv4. A patch to iproute2 to make use of this feature will be submitted separately. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01l2tp: show IPv6 addresses in l2tp debugfs fileChris Elston
If an L2TP tunnel uses IPv6, make sure the l2tp debugfs file shows the IPv6 address correctly. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01l2tp: pppol2tp_connect() handles ipv6 sockaddr variantsJames Chapman
Userspace uses connect() to associate a pppol2tp socket with a tunnel socket. This needs to allow the caller to supply the new IPv6 sockaddr_pppol2tp structures if IPv6 is used. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01l2tp: remove unused stats from l2tp_ip socketJames Chapman
The l2tp_ip socket currently maintains packet/byte stats in its private socket structure. But these counters aren't exposed to userspace and so serve no purpose. The counters were also smp-unsafe. So this patch just gets rid of the stats. While here, change a couple of internal __u32 variables to u32. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01l2tp: Use ip4_datagram_connect() in l2tp_ip_connect()James Chapman
Cleanup the l2tp_ip code to make use of an existing ipv4 support function. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01l2tp: fix locking of 64-bit counters for smpJames Chapman
L2TP uses 64-bit counters but since these are not updated atomically, we need to make them safe for smp. This patch addresses that. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01HID: Create a common generic driverHenrik Rydberg
Move the hid drivers of the bus drivers to a common generic hid driver, and make it a proper module. This ought to simplify device handling moving forward. Cc: Gustavo Padovan <gustavo@padovan.org> Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-05-01HID: Create a generic device groupHenrik Rydberg
Devices that do not have a special driver are handled by the generic driver. This patch does the same thing using device groups; Instead of forcing a particular driver, the appropriate driver is picked up by udev. As a consequence, one can now move a device from generic to specific handling by a simple rebind. By adding a new device id to the generic driver, the same thing can be done in reverse. Signed-off-by: Henrik Rydberg <rydberg@euromail.se> Acked-by: Benjamin Tissoires <benjamin.tissoires@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-04-30Merge branch 'tipc_net-next' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
2012-04-30net: makes skb_splice_bits() aware of skb->head_fragEric Dumazet
__skb_splice_bits() can check if skb to be spliced has its skb->head mapped to a page fragment, instead of a kmalloc() area. If so we can avoid a copy of the skb head and get a reference on underlying page. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30tcp: makes tcp_try_coalesce aware of skb->head_fragEric Dumazet
TCP coalesce can check if skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We had to disable coalescing in this case, for performance reasons. We 'upgrade' skb->head as a fragment in itself. This reduces number of cache misses when user makes its copies, since a less sk_buff are fetched. This makes receive and ofo queues shorter and thus reduce cache line misses in TCP stack. This is a followup of patch "net: allow skb->head to be a page fragment" Tested with tg3 nic, with GRO on or off. We can see "TCPRcvCoalesce" counter being incremented. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30net: make GRO aware of skb->head_fragEric Dumazet
GRO can check if skb to be merged has its skb->head mapped to a page fragment, instead of a kmalloc() area. We 'upgrade' skb->head as a fragment in itself This avoids the frag_list fallback, and permits to build true GRO skb (one sk_buff and up to 16 fragments), using less memory. This reduces number of cache misses when user makes its copy, since a single sk_buff is fetched. This is a followup of patch "net: allow skb->head to be a page fragment" Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30net: allow skb->head to be a page fragmentEric Dumazet
skb->head is currently allocated from kmalloc(). This is convenient but has the drawback the data cannot be converted to a page fragment if needed. We have three spots were it hurts : 1) GRO aggregation When a linear skb must be appended to another skb, GRO uses the frag_list fallback, very inefficient since we keep all struct sk_buff around. So drivers enabling GRO but delivering linear skbs to network stack aren't enabling full GRO power. 2) splice(socket -> pipe). We must copy the linear part to a page fragment. This kind of defeats splice() purpose (zero copy claim) 3) TCP coalescing. Recently introduced, this permits to group several contiguous segments into a single skb. This shortens queue lengths and save kernel memory, and greatly reduce probabilities of TCP collapses. This coalescing doesnt work on linear skbs (or we would need to copy data, this would be too slow) Given all these issues, the following patch introduces the possibility of having skb->head be a fragment in itself. We use a new skb flag, skb->head_frag to carry this information. build_skb() is changed to accept a frag_size argument. Drivers willing to provide a page fragment instead of kmalloc() data will set a non zero value, set to the fragment size. Then, on situations we need to convert the skb head to a frag in itself, we can check if skb->head_frag is set and avoid the copies or various fallbacks we have. This means drivers currently using frags could be updated to avoid the current skb->head allocation and reduce their memory footprint (aka skb truesize). (thats 512 or 1024 bytes saved per skb). This also makes bpf/netfilter faster since the 'first frag' will be part of skb linear part, no need to copy data. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Maciej Żenczykowski <maze@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Matt Carlson <mcarlson@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30tipc: compress out gratuitous extra carriage returnsPaul Gortmaker
Some of the comment blocks are floating in limbo between two functions, or between blocks of code. Delete the extra line feeds between any comment and its associated following block of code, to be consistent with the majority of the rest of the kernel. Also delete trailing newlines at EOF and fix a couple trivial typos in existing comments. This is a 100% cosmetic change with no runtime impact. We get rid of over 500 lines of non-code, and being blank line deletes, they won't even show up as noise in git blame. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-04-30mac80211: fix AP mode EAP tx for VLAN stationsFelix Fietkau
EAP frames for stations in an AP VLAN are sent on the main AP interface to avoid race conditions wrt. moving stations. For that to work properly, sta_info_get_bss must be used instead of sta_info_get when sending EAP packets. Previously this was only done for cooked monitor injected packets, so this patch adds a check for tx->skb->protocol to the same place. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: stable@vger.kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-30tcp: fix infinite cwnd in tcp_complete_cwr()Yuchung Cheng
When the cwnd reduction is done, ssthresh may be infinite if TCP enters CWR via ECN or F-RTO. If cwnd is not undone, i.e., undo_marker is set, tcp_complete_cwr() falsely set cwnd to the infinite ssthresh value. The correct operation is to keep cwnd intact because it has been updated in ECN or F-RTO. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30bridge: Fix fatal typo in setup of multicast_querier_expiredHerbert Xu
Unfortunately it seems that I didn't properly test the case of an expired external querier in the recent multicast bridge series. The setup of the timer in that case is completely broken and leads to a NULL-pointer dereference. This patch fixes it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30Merge branch 'master' of git://1984.lsi.us.es/netDavid S. Miller
2012-04-30l2tp: Add missing net/net/ip6_checksum.h include.David S. Miller
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30SUNRPC: RPC client must use the current utsname hostname stringTrond Myklebust
Now that the rpc client is namespace aware, it needs to use the utsname of the process that created it instead of using the init_utsname. Both rpc_new_client and rpc_clone_client need to be fixed. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
2012-04-30netfilter: xt_CT: fix wrong checking in the timeout assignment pathPablo Neira Ayuso
The current checking always succeeded. We have to check the first character of the string to check that it's empty, thus, skipping the timeout path. This fixes the use of the CT target without the timeout option. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-04-30ipvs: kernel oops - do_ip_vs_get_ctlHans Schillstrom
Change order of init so netns init is ready when register ioctl and netlink. Ver2 Whitespace fixes and __init added. Reported-by: "Ryan O'Hara" <rohara@redhat.com> Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Simon Horman <horms@verge.net.au>