Age / Commit message / Author
2012-07-19ixgbe: Drop references to deprecated pci_ DMA api and instead use dma_ APIAlexander Duyck
The networking side of the code had already been updated to use dma_ calls instead of the old pci_ calls. However it looks like the FCoE code was never updated. This change goes through and moves everything from the pci APIs to the dma APIs. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19ixgbe: Fix memory leak when SR-IOV VFs are direct assignedAlexander Duyck
The VF driver had a memory leak that would occur if VFs were assigned to a guest. The amount of leak would vary with the number of VFs but could max out at about 14K per PF. To reproduce the leak all you would need to do is enable all the VFs on the first PF. Then start a loop of loading and unloading the driver with max_vfs=63 for the first port. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19ixgbe: Use VMDq offset to indicate the default poolAlexander Duyck
This change makes it so that we can use the VMDq ring feature offset value to determine the default pool instead of using num_vfs. The reason for this change is to avoid issues should we fail to allocate vfinfo but have pre-existing VFs. What should happen in this case is that num_vfs will go to 0, but the VMDq offset will contain the location of the first PF pool. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Tested-by: Sibai Li <Sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-07-19Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull last minute Ceph fixes from Sage Weil: "The important one fixes a bug in the socket failure handling behavior that was turned up in some recent failure injection testing. The other two are minor bug fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: endian bug in rbd_req_cb()
  rbd: Fix ceph_snap_context size calculation
  libceph: fix messenger retry
2012-07-19mm: frontswap: split out function to clear a page outSasha Levin
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19clk: fix compile for OF && !COMMON_CLKRob Herring
With commit 766e6a4ec602d0c107 (clk: add DT clock binding support), compiling with OF && !COMMON_CLK is broken. Reported-by: Alexandre Pereira da Silva <aletes.xgr@gmail.com> Reported-by: Prashant Gaikwad <pgaikwad@nvidia.com> Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Mike Turquette <mturquette@linaro.org>
2012-07-19clk: fix clk_get on of_clk_get_by_name return checkShawn Guo
Commit 766e6a4 (clk: add DT clock binding support) plugs the device tree clk lookup of_clk_get_by_name into clk_get, and falls back on the non-DT lookup clk_get_sys if the DT lookup fails. The return check on of_clk_get_by_name takes (clk != NULL) as a successful DT lookup, but that is not the case. For any system that does not define a clk lookup in the device tree, ERR_PTR(-ENOENT) will be returned and, consequently, all the client drivers calling clk_get in their probe functions will fail to probe with error code -ENOENT. Fix the issue by checking the of_clk_get_by_name return value with !IS_ERR(clk), and update of_clk_get and of_clk_get_by_name for the !CONFIG_OF build correspondingly. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Tested-by: Marek Vasut <marex@denx.de> Tested-by: Lauri Hintsala <lauri.hintsala@bluegiga.com> Signed-off-by: Mike Turquette <mturquette@linaro.org>
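The difference between the two return checks can be sketched in userspace C. This is only an illustration: `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` below are simplified stand-ins for the kernel helpers, and `dt_lookup()`, `sys_lookup()`, `clk_get_buggy()` and `clk_get_fixed()` are hypothetical names, not the functions touched by the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO addresses. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct clk { const char *name; };

static struct clk fallback_clk = { "sys-lookup" };

/* Hypothetical DT lookup on a system with no clocks in the device tree. */
static struct clk *dt_lookup(void) { return ERR_PTR(-ENOENT); }

/* Hypothetical non-DT lookup that succeeds. */
static struct clk *sys_lookup(void) { return &fallback_clk; }

/* Buggy check: ERR_PTR(-ENOENT) is non-NULL, so it is returned as a clock. */
static struct clk *clk_get_buggy(void)
{
    struct clk *clk = dt_lookup();

    if (clk != NULL)
        return clk;
    return sys_lookup();
}

/* Fixed check: only a non-error pointer counts as a successful DT lookup. */
static struct clk *clk_get_fixed(void)
{
    struct clk *clk = dt_lookup();

    if (!IS_ERR(clk))
        return clk;
    return sys_lookup();
}
```

With these stand-ins, `clk_get_buggy()` hands the error pointer back to the caller, while `clk_get_fixed()` falls through to the non-DT lookup.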
2012-07-19Merge branch 'net' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
2012-07-19ipv4: Fix again the time difference calculationJulian Anastasov
Fix again the diff value in rt_bind_exception after collision of two latest patches, my original commit actually fixed the same problem. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19xen: populate correct number of pages when across mem boundary (v2)zhenzhong.duan
When populating pages across a mem boundary at bootup, the page count populated isn't correct. This is because memory populated into a non-mem region is ignored. The pfn range is also wrongly aligned when the mem boundary isn't page aligned.

For a dom0 booted with dom_mem=3368952K(0xcd9ff000-4k) dmesg diff is:

 [ 0.000000] Freeing 9e-100 pfn range: 98 pages freed
 [ 0.000000] 1-1 mapping on 9e->100
 [ 0.000000] 1-1 mapping on cd9ff->100000
 [ 0.000000] Released 98 pages of unused memory
 [ 0.000000] Set 206435 page(s) to 1-1 mapping
-[ 0.000000] Populating cd9fe-cda00 pfn range: 1 pages added
+[ 0.000000] Populating cd9fe-cd9ff pfn range: 1 pages added
+[ 0.000000] Populating 100000-100061 pfn range: 97 pages added
 [ 0.000000] BIOS-provided physical RAM map:
 [ 0.000000] Xen: 0000000000000000 - 000000000009e000 (usable)
 [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
 [ 0.000000] Xen: 0000000000100000 - 00000000cd9ff000 (usable)
 [ 0.000000] Xen: 00000000cd9ffc00 - 00000000cda53c00 (ACPI NVS)
 ...
 [ 0.000000] Xen: 0000000100000000 - 0000000100061000 (usable)
 [ 0.000000] Xen: 0000000100061000 - 000000012c000000 (unusable)
 ...
 [ 0.000000] MEMBLOCK configuration:
 ...
-[ 0.000000] reserved[0x4] [0x000000cd9ff000-0x000000cd9ffbff], 0xc00 bytes
-[ 0.000000] reserved[0x5] [0x00000100000000-0x00000100060fff], 0x61000 bytes

Related xen memory layout:

(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009ec00 (usable)
(XEN) 00000000000f0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 00000000cd9ffc00 (usable)

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> [v2: If xen_do_chunk fail(populate), abort this chunk and any others] Suggested by David, thanks. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen PVonHVM: move shared_info to MMIO before kexecOlaf Hering
Currently kexec in a PVonHVM guest fails with a triple fault because the new kernel overwrites the shared info page. The exact failure depends on the size of the kernel image. This patch moves the pfn from RAM into MMIO space before the kexec boot. The pfn containing the shared_info is located somewhere in RAM. This will cause trouble if the current kernel is doing a kexec boot into a new kernel. The new kernel (and its startup code) cannot know where the pfn is, so it cannot reserve the page. The hypervisor will continue to update the pfn, and as a result memory corruption occurs in the new kernel. One way to work around this issue is to allocate a page in the xen-platform pci device's BAR memory range. But pci init is done very late and the shared_info page is already in use very early to read the pvclock. So moving the pfn from RAM to MMIO is racy because some code paths on other vcpus could access the pfn during the small window when the old pfn is moved to the new pfn. There is even a small window where the old pfn is not backed by a mfn, and during that time all reads return -1. Because it is not known upfront where the MMIO region is located, it cannot be used right from the start in xen_hvm_init_shared_info. To minimise trouble the move of the pfn is done shortly before kexec. This does not eliminate the race because all vcpus are still online when the syscore_ops are called. But hopefully there is no work pending at this point in time. Also the syscore_op is run last, which reduces the risk further. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen: simplify init_hvm_pv_infoOlaf Hering
init_hvm_pv_info is called only in PVonHVM context, move it into ifdef. init_hvm_pv_info does not fail, make it a void function. remove arguments from init_hvm_pv_info because they are not used by the caller. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen: remove cast from HYPERVISOR_shared_info assignmentOlaf Hering
Both have type struct shared_info so no cast is needed. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen: enable platform-pci only in a Xen guestOlaf Hering
While debugging kexec issues in a PVonHVM guest I modified xen_hvm_platform() to return false to disable all PV drivers. This caused a crash in platform_pci_init() because it expects certain data structures to be initialized properly. To avoid such a crash make sure the driver is initialized only if running in a Xen guest. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/pv-on-hvm kexec: shutdown watches from old kernelOlaf Hering
Add xs_reset_watches function to shutdown watches from old kernel after kexec boot. The old kernel does not unregister all watches in the shutdown path. They are still active, the double registration can not be detected by the new kernel. When the watches fire, unexpected events will arrive and the xenwatch thread will crash (jumps to NULL). An orderly reboot of a hvm guest will destroy the entire guest with all its resources (including the watches) before it is rebuilt from scratch, so the missing unregister is not an issue in that case. With this change the xenstored is instructed to wipe all active watches for the guest. However, a patch for xenstored is required so that it accepts the XS_RESET_WATCHES request from a client (see changeset 23839:42a45baf037d in xen-unstable.hg). Without the patch for xenstored the registration of watches will fail and some features of a PVonHVM guest are not available. The guest is still able to boot, but repeated kexec boots will fail. Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/x86: avoid updating TLS descriptors if they haven't changedDavid Vrabel
When switching tasks in a Xen PV guest, avoid updating the TLS descriptors if they haven't changed. This improves the speed of context switches by almost 10% as much of the time the descriptors are the same or only one is different. The descriptors written into the GDT by Xen are modified from the values passed in the update_descriptor hypercall so we keep shadow copies of the three TLS descriptors to compare against.

lmbench3 test     Before  After  Improvement
--------------------------------------------
lat_ctx -s 32 24  7.19    6.52   9%
lat_pipe          12.56   11.66  7%

Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
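The shadow-copy idea can be sketched as follows. This is a userspace illustration under stated assumptions: `struct desc` is a simplified 8-byte descriptor, `fake_update_descriptor()` is a hypothetical stand-in for the update_descriptor hypercall, and `load_tls()` is an illustrative name rather than the function changed by the patch.

```c
#include <stdint.h>

#define TLS_ENTRIES 3   /* the commit message mentions three TLS descriptors */

/* Simplified 8-byte descriptor (assumption: two 32-bit halves). */
struct desc { uint32_t a, b; };

/* Shadow copies of the values last passed to the hypervisor. */
static struct desc shadow[TLS_ENTRIES];

/* Counts simulated hypercalls so the saving is observable. */
static unsigned hypercalls;

static int desc_equal(const struct desc *d1, const struct desc *d2)
{
    return d1->a == d2->a && d1->b == d2->b;
}

/* Hypothetical stand-in for the update_descriptor hypercall. */
static void fake_update_descriptor(unsigned idx, const struct desc *d)
{
    (void)idx;
    (void)d;
    hypercalls++;
}

/* Update only the TLS slots whose descriptor differs from the shadow copy. */
static void load_tls(const struct desc tls[TLS_ENTRIES])
{
    for (unsigned i = 0; i < TLS_ENTRIES; i++) {
        if (desc_equal(&shadow[i], &tls[i]))
            continue;               /* unchanged: skip the expensive call */
        shadow[i] = tls[i];
        fake_update_descriptor(i, &tls[i]);
    }
}
```

The comparison is made against the shadow copy rather than the GDT contents because, as the message notes, Xen modifies the values it writes into the GDT.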
2012-07-19xen/x86: add desc_equal() to compare GDT descriptorsDavid Vrabel
Signed-off-by: David Vrabel <david.vrabel@citrix.com> [v1: Moving it to the Xen file] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mm: zero PTEs for non-present MFNs in the initial page tableDavid Vrabel
When constructing the initial page tables, if the MFN for a usable PFN is missing in the p2m then that frame is initially ballooned out. In this case, zero the PTE (as in decrease_reservation() in drivers/xen/balloon.c). This is obviously safer than having a valid PTE with an MFN of INVALID_P2M_ENTRY (~0). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mm: do direct hypercall in xen_set_pte() if batching is unavailableDavid Vrabel
In xen_set_pte() if batching is unavailable (because the caller is in an interrupt context such as handling a page fault) it would fall back to using native_set_pte() and trapping and emulating the PTE write. On 32-bit guests this requires two traps for each PTE write (one for each dword of the PTE). Instead, do one mmu_update hypercall directly. During construction of the initial page tables, continue to use native_set_pte() because most of the PTEs being set are in writable and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and using a hypercall for this is very expensive. This significantly improves page fault performance in 32-bit PV guests.

lmbench3 test  Before   After     Improvement
----------------------------------------------
lat_pagefault  3.18 us  2.32 us   27%
lat_proc fork  356 us   313.3 us  11%

Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/hvc: Fix up checks when the info is allocated.Konrad Rzeszutek Wilk
Coverity would complain about this - even though it looks OK. CID 401957 Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/acpi: Fix potential memory leak.Konrad Rzeszutek Wilk
Coverity points out that we do not free in one case the pr_backup - and sure enough we forgot. Found by Coverity (CID 401970) Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mce: add .poll method for mcelog device driverLiu, Jinsong
If a driver leaves its poll method NULL, the device is assumed to be both readable and writable without blocking. This patch adds a .poll method to the xen mcelog device driver, so that when mcelog uses system calls like ppoll or select, it blocks when no data is available and avoids spinning on the CPU. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mce: schedule a workqueue to avoid sleep in atomic contextLiu, Jinsong
copy_to_user might sleep, and it prints a stack trace if it is executed in an atomic spinlock context, like this:

(XEN) CMCI: send CMCI to DOM0 through virq
BUG: sleeping function called from invalid context at /home/konradinux/kernel.h:199
in_atomic(): 1, irqs_disabled(): 0, pid: 4581, name: mcelog
Pid: 4581, comm: mcelog Tainted: G O 3.5.0-rc1upstream-00003-g149000b-dirty #1
[<ffffffff8109ad9a>] __might_sleep+0xda/0x100
[<ffffffff81329b0b>] xen_mce_chrdev_read+0xab/0x140
[<ffffffff81148945>] vfs_read+0xc5/0x190
[<ffffffff81148b0c>] sys_read+0x4c/0x90
[<ffffffff815bd039>] system_call_fastpath+0x16

This patch schedules a workqueue for the IRQ handler to poll the data, and uses a mutex instead of a spinlock, so copy_to_user can no longer sleep in atomic context. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/pcpu: Xen physical cpus online/offline sys interfaceLiu, Jinsong
This patch provides a sys interface for onlining/offlining Xen physical cpus. Users can use it for their own purposes, such as power saving: offlining some cpus under a light workload saves power greatly. The basic workflow is: the user onlines/offlines a cpu via the sys interface, which issues a hypercall to Xen to carry it out; once done, Xen injects a virq back to dom0, and dom0 then syncs the cpu status. Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mce: Register native mce handler as vMCE bounce back pointLiu, Jinsong
When Xen hypervisor inject vMCE to guest, use native mce handler to handle it Signed-off-by: Ke, Liping <liping.ke@intel.com> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19x86, MCE, AMD: Adjust initcall sequence for xenLiu, Jinsong
There are 3 funcs which need to be _initcalled in a logical sequence:
1. xen_late_init_mcelog
2. mcheck_init_device
3. threshold_init_device

xen_late_init_mcelog must register xen_mce_chrdev_device before native mce_chrdev_device registration if running under the xen platform; mcheck_init_device should be inited before threshold_init_device to initialize mce_device, otherwise a NULL ptr dereference will cause a panic. So we use the following _initcalls:
1. device_initcall(xen_late_init_mcelog);
2. device_initcall_sync(mcheck_init_device);
3. late_initcall(threshold_init_device);

When running under xen, the initcall order is 1,2,3; on baremetal, we skip 1 and do only 2 and 3. Acked-and-tested-by: Borislav Petkov <bp@amd64.org> Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19xen/mce: Add mcelog support for Xen platformLiu, Jinsong
When an MCA error occurs, it is handled by the Xen hypervisor first, and then the error information is sent to the initial domain for logging. This patch gets the error information from the Xen hypervisor and converts the Xen format error into the Linux format mcelog. This logic is basically self-contained, not touching other kernel components. By using tools like the mcelog tool, users can read specific error information, as they do under native Linux. To test, follow the directions outlined in Documentation/acpi/apei/einj.txt Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Ke, Liping <liping.ke@intel.com> Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2012-07-19IB/qib: checkpatch fixesMike Marciniszyn
Eliminate some simple_strto* usage. checkpatch also noted pr_ conversions, which have been done as recommended. The pr_fmt() define is used to shorten line length. Other multi-line string warnings are also eliminated. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19IB/qib: Add congestion control agent implementationMike Marciniszyn
Add a congestion control agent in the driver that handles gets and sets from the congestion control manager in the fabric for the Performance Scale Messaging (PSM) library. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19IB/qib: Reduce sdma_lock contentionMike Marciniszyn
Profiling has shown that sdma_lock is proving a bottleneck for performance. The situations include:
- RDMA reads when krcvqs > 1
- post sends from multiple threads

For RDMA reads, the current global qib_wq mechanism runs on all CPUs and contends for the sdma_lock when multiple RDMA read requests are fielded on different CPUs. For post sends, the direct call to qib_do_send() from multiple threads causes the contention. Since the sdma mechanism is per port, this fix converts the existing workqueue to a per port single thread workqueue to reduce the lock contention in the RDMA read case, and for any other case where the QP is scheduled via the workqueue mechanism from more than 1 CPU. For the post send case, this patch modifies the post send code to test for a non empty sdma engine. If the sdma is not idle, the (now single thread) workqueue will be used to trigger the send engine instead of the direct call to qib_do_send(). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19IB/qib: Fix an incorrect log messageBetty Dall
There is a cut-and-paste typo in the function qib_pci_slot_reset() where it prints that the "link_reset" function is called rather than the "slot_reset" function. This makes the message misleading. Signed-off-by: Betty Dall <betty.dall@hp.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
2012-07-19net-tcp: Fast Open client - cookie-less modeYuchung Cheng
In trusted networks, e.g., intranet, data-center, the client does not need to use Fast Open cookie to mitigate DoS attacks. In cookie-less mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless of cookie availability. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19net-tcp: Fast Open client - detecting SYN-data dropsYuchung Cheng
On paths with firewalls dropping SYNs with data or experimental TCP options, Fast Open connections will experience SYN timeouts and bad performance. The solution is to track such incidents in the cookie cache and disable Fast Open temporarily. Since only the original SYN includes data and/or the Fast Open option, the SYN-ACK has some tell-tale sign (tcp_rcv_fastopen_synack()) to detect such drops. If a path has recurring Fast Open SYN drops, Fast Open is disabled for 2^(recurring_losses) minutes, starting from four minutes up to roughly one and a half days. sendmsg with the MSG_FASTOPEN flag will succeed but it behaves as connect() then write(). Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
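The backoff arithmetic can be illustrated with a small sketch. The function name and both constants are assumptions chosen to match the figures in the message: the threshold of 2 gives the four-minute starting point (2^2), and the cap of 11 gives 2048 minutes, about 1.4 days. The real limits in the patch may differ.

```c
#include <stdint.h>

/* Assumption: blocking starts at the second recurring loss (2^2 = 4 min). */
#define FASTOPEN_MIN_LOSSES 2
/* Assumption: exponent capped at 11 (2^11 = 2048 min, about 1.4 days). */
#define FASTOPEN_MAX_SHIFT 11

/* Minutes for which Fast Open stays disabled after the given loss count. */
static uint32_t fastopen_disable_minutes(unsigned recurring_losses)
{
    if (recurring_losses < FASTOPEN_MIN_LOSSES)
        return 0;                       /* not yet considered recurring */
    if (recurring_losses > FASTOPEN_MAX_SHIFT)
        recurring_losses = FASTOPEN_MAX_SHIFT;
    return (uint32_t)1 << recurring_losses;
}
```

Under these assumptions the disable period doubles with each further loss: 4, 8, 16 minutes and so on, until it stops growing at 2048 minutes.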
2012-07-19net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)Yuchung Cheng
sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2) and write(2). The application should replace connect() with it to send data in the opening SYN packet. For a blocking socket, sendmsg() blocks until all the data are buffered locally and the handshake is completed, like a connect() call. It returns an errno similar to connect() if the TCP handshake fails. For a non-blocking socket, it returns the number of bytes queued (and transmitted in the SYN-data packet) if a cookie is available. If a cookie is not available, it transmits a data-less SYN packet with the Fast Open cookie request option and returns -EINPROGRESS like connect(). Using MSG_FASTOPEN on a connecting or connected socket will result in an errno similar to that of repeated connect() calls. Therefore the application should only use this flag on new sockets. The buffer size of sendmsg() is independent of the MSS of the connection. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
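From the application side, the replacement of connect() plus write() might look like the sketch below. `fastopen_send()` is a hypothetical wrapper name; the fallback definition of `MSG_FASTOPEN` is only there for C libraries whose headers predate the flag, and assumes the value used by the Linux headers.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000 /* assumption: value from the Linux headers */
#endif

/*
 * Send the first bytes of a connection on a new, unconnected TCP socket.
 * This takes the place of connect() followed by write(): the destination
 * address is passed to sendto() and the data may travel in the SYN.
 */
static ssize_t fastopen_send(int fd, const struct sockaddr_in *dst,
                             const void *buf, size_t len)
{
    return sendto(fd, buf, len, MSG_FASTOPEN,
                  (const struct sockaddr *)dst, sizeof(*dst));
}
```

A caller would create the socket with `socket(AF_INET, SOCK_STREAM, 0)` and pass it straight to `fastopen_send()` without calling connect() first. On a non-blocking socket without a cached cookie, userspace would see the EINPROGRESS case as a return value of -1 with errno set, as with connect().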
2012-07-19net-tcp: Fast Open client - receiving SYN-ACKYuchung Cheng
On receiving the SYN-ACK after SYN-data, the client needs to a) update the cached MSS and cookie (if included in SYN-ACK) b) retransmit the data not yet acknowledged by the SYN-ACK in the final ACK of the handshake. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19net-tcp: Fast Open client - sending SYN-dataYuchung Cheng
This patch implements sending SYN-data in tcp_connect(). The data is from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch). The length of the cookie in tcp_fastopen_req, init'd to 0, controls the type of the SYN. If the cookie is not cached (len == 0), the host sends a data-less SYN with the Fast Open cookie request option to solicit a cookie from the remote. If the cookie is available (len > 0), the host sends a SYN-data with the Fast Open cookie option. If the cookie length is negative, the SYN will not include any Fast Open option (for fall back operations). To deal with middleboxes that may drop SYNs with data or experimental TCP options, the SYN-data is only sent once. SYN retransmits do not include data or Fast Open options. The connection will fall back to the regular TCP handshake. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
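The three cases selected by the cookie length can be written out as a small decision function. The enum and function names are hypothetical and only mirror the description above; they are not taken from the patch.

```c
/* Kinds of SYN described in the commit message (names are illustrative). */
enum syn_kind {
    SYN_COOKIE_REQUEST,   /* data-less SYN asking the peer for a cookie */
    SYN_DATA_WITH_COOKIE, /* SYN carrying data plus the cached cookie */
    SYN_PLAIN             /* regular SYN without any Fast Open option */
};

/* Map the cookie length to the kind of SYN to send. */
static enum syn_kind classify_syn(int cookie_len)
{
    if (cookie_len < 0)
        return SYN_PLAIN;             /* fall back to regular TCP */
    if (cookie_len == 0)
        return SYN_COOKIE_REQUEST;    /* no cookie cached yet */
    return SYN_DATA_WITH_COOKIE;      /* cookie available */
}
```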
2012-07-19net-tcp: Fast Open client - cookie cacheYuchung Cheng
With help from Eric Dumazet, add Fast Open metrics in tcp metrics cache. The basic ones are MSS and the cookies. Later patch will cache more to handle unfriendly middleboxes. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19net-tcp: Fast Open baseYuchung Cheng
This patch implements the common code for both the client and server.

1. TCP Fast Open option processing. Since Fast Open does not have an option number assigned by IANA yet, it shares the experimental option code 254 by implementing draft-ietf-tcpm-experimental-options with a 16-bit magic number 0xF989. This enables global experiments without clashing with the scarce (2) experimental options available for TCP. When the draft status becomes standard (maybe), the client should switch to the new option number assigned while the server supports both numbers for the transition.
2. The new sysctl tcp_fastopen
3. A placeholder init function

Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
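A sketch of how such a shared experimental option could be laid out on the wire: one byte of option kind (254), one byte of total length, the 16-bit magic number 0xF989 in network byte order, then the cookie bytes. The macro and function names below are illustrative assumptions, and the layout follows the general shape of the experimental-options draft rather than code from this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TCPOPT_EXP 254                /* shared experimental option kind */
#define TCPOPT_FASTOPEN_MAGIC 0xF989  /* 16-bit magic from the commit message */
#define FASTOPEN_OPT_BASE_LEN 4       /* kind + length + 2 magic bytes */

/*
 * Write a Fast Open experimental option into out[].
 * A cookie_len of 0 yields a bare cookie request.
 * Returns the number of bytes written, or 0 if the option does not fit.
 */
static size_t fastopen_write_option(uint8_t *out, size_t out_len,
                                    const uint8_t *cookie, size_t cookie_len)
{
    size_t total = FASTOPEN_OPT_BASE_LEN + cookie_len;

    if (total > out_len || total > 255)
        return 0;

    out[0] = TCPOPT_EXP;
    out[1] = (uint8_t)total;                    /* length covers the whole option */
    out[2] = TCPOPT_FASTOPEN_MAGIC >> 8;        /* magic, high byte first */
    out[3] = TCPOPT_FASTOPEN_MAGIC & 0xFF;
    if (cookie_len)
        memcpy(out + FASTOPEN_OPT_BASE_LEN, cookie, cookie_len);
    return total;
}
```

A receiver that sees kind 254 would check the two magic bytes before treating the rest as a Fast Open cookie, which is what lets several experiments share the same option kind.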
2012-07-19mlx4_en: map entire pages to increase throughputThadeu Lima de Souza Cascardo
In its receive path, mlx4_en driver maps each page chunk that it pushes to the hardware and unmaps it when pushing it up the stack. This limits throughput to about 3Gbps on a Power7 8-core machine. One solution is to map the entire allocated page at once. However, this requires that we keep track of every page fragment we give to a descriptor. We also need to work with the discipline that all fragments will be released (in the sense that it will not be reused by the driver anymore) in the order they are allocated to the driver. This requires that we don't reuse any fragments, every single one of them must be reallocated. We do that by releasing all the fragments that are processed and only after finished processing the descriptors, we start the refill. We also must somehow guarantee that we either refill all fragments in a descriptor or none at all, without resorting to giving up a page fragment that we would have already given. Otherwise, we would break the discipline of only releasing the fragments in the order they were allocated. This has passed page allocation fault injections (restricted to the driver by using required-start and required-end) and device hotplug while 16 TCP streams were able to deliver more than 9Gbps. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19sfc: initialize dynamic sysfs attributes for lockdepMichal Schmidt
Dynamically allocated sysfs attributes must be initialized using sysfs_attr_init(), otherwise lockdep complains: BUG: key <address> not in .data! Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19bridge: update documentation referencesstephen hemminger
Update the references to bridge utilities and web pages to current locations Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19net: e100: ucode is optional in some casesBjørn Mork
Commit 9ac32e1b (firmware: convert e100 driver to request_firmware()) did a straight conversion of the in-driver ucode to external files. This introduced the possibility of the driver failing to enable an interface due to missing ucode. There was no evaluation of the importance of the ucode at the time. Based on comments in earlier versions of this driver, and in the source code for the FreeBSD fxp driver, we can assume that the ucode implements the "CPU Cycle Saver" feature on supported adapters. Although generally wanted, this is an optional feature. The ucode source is not available, preventing it from being included in free distributions. This creates unnecessary problems for the end users. Doing a network install based on a free distribution installer requires the user to download and insert the ucode into the installer. Making the ucode optional when possible improves the user experience and driver usability. The ucode for some adapters includes a bugfix, making it essential. We continue to fail for these adapters unless the ucode is available. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19asix: AX88172A driver depends on phylibChristian Riesch
Since commit 16626b0cc3d5afe250850f96759b241f8a403b52 the asix driver depends on the phylib. Select phylib when the asix driver is selected. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: kernel-janitors@vger.kernel.org Signed-off-by: Christian Riesch <christian.riesch@omicron.at> Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19asix: Add support for programming the EEPROMChristian Riesch
This patch adds the asix_set_eeprom() function to provide support for programming the configuration EEPROM via ethtool. Signed-off-by: Christian Riesch <christian.riesch@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19asix: Rework reading from EEPROMChristian Riesch
The current code for reading the EEPROM via ethtool in the asix driver has a few issues. It cannot handle odd length values (accesses must be aligned at 16 bit boundaries) and interprets the offset provided by ethtool as a 16 bit word offset instead of as a byte offset. The new code for asix_get_eeprom() introduced by this patch is modeled after the code in drivers/net/ethernet/atheros/atl1e/atl1e_ethtool.c and provides read access to the entire EEPROM with arbitrary offsets and lengths. Signed-off-by: Christian Riesch <christian.riesch@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
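One common way to serve arbitrary byte offsets and lengths from a device that can only be read in 16-bit words is to read every word the request touches into a temporary buffer and copy out the requested slice. The sketch below shows that technique against a fake in-memory EEPROM. The array, the helper names and the little-endian byte order within a word are all assumptions for illustration; this is not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EEPROM_WORDS 8
#define EEPROM_BYTES (EEPROM_WORDS * 2)

/* Fake EEPROM contents: byte i holds 0x11 * i (low byte of each word first). */
static const uint16_t fake_eeprom[EEPROM_WORDS] = {
    0x1100, 0x3322, 0x5544, 0x7766, 0x9988, 0xBBAA, 0xDDCC, 0xFFEE
};

/* Hypothetical word-granular read, the only access the device offers. */
static int read_word(size_t idx, uint16_t *val)
{
    if (idx >= EEPROM_WORDS)
        return -1;
    *val = fake_eeprom[idx];
    return 0;
}

/* Read len bytes starting at byte offset; returns 0 on success, -1 on error. */
static int eeprom_read_bytes(size_t offset, size_t len, uint8_t *out)
{
    uint8_t tmp[EEPROM_BYTES];
    size_t first, last, w;

    if (len == 0 || offset >= EEPROM_BYTES || len > EEPROM_BYTES - offset)
        return -1;

    first = offset >> 1;              /* first word touched by the request */
    last = (offset + len - 1) >> 1;   /* last word touched by the request */

    for (w = first; w <= last; w++) {
        uint16_t v;

        if (read_word(w, &v))
            return -1;
        tmp[(w - first) * 2] = v & 0xFF;   /* assumption: low byte first */
        tmp[(w - first) * 2 + 1] = v >> 8;
    }

    /* Skip the leading byte when the request starts at an odd offset. */
    memcpy(out, tmp + (offset & 1), len);
    return 0;
}
```

For example, a 3-byte read at byte offset 1 touches words 0 and 1, buffers four bytes, and copies out the middle three.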
2012-07-19net: stmmac: Add ip version to dts bindingsDinh Nguyen
Because there are multiple variants to the stmmac/dwmac driver, the dts bindings should be updated to include version of the IP used. Signed-off-by: Dinh Nguyen <dinguyen@altera.com> Acked-by: Stefan Roese <sr@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19cxgb3: Set vlan_feature on net_devicebrenohl@br.ibm.com
The cxgb3 interface performs poorly when VLAN is set. On my current setup, a PowerLinux 7R2, I am able to get around 7 Gbps on a TCP_STREAM (8 instances, 4k message). With this patch, I am able to reach 9.5 Gbps. Signed-off-by: Breno Leitao <brenohl@br.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19ipx: move peII functionsstephen hemminger
The Ethernet II wrapper is only used by IPX protocol, may have once been used by Appletalk but not currently. Therefore it makes sense to move it to the IPX dust bin and drop the exports. Build tested only. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19net: Fix warnings in dst_ops.hDavid S. Miller
include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list Signed-off-by: David S. Miller <davem@davemloft.net>