summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2015-10-04bridge: netlink: add support for multicast_querierNikolay Aleksandrov
Add IFLA_BR_MCAST_QUERIER to allow setting/getting br->multicast_querier via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: add support for multicast_query_use_ifaddrNikolay Aleksandrov
Add IFLA_BR_MCAST_QUERY_USE_IFADDR to allow setting/getting br->multicast_query_use_ifaddr via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: add support for multicast_snoopingNikolay Aleksandrov
Add IFLA_BR_MCAST_SNOOPING to allow enabling/disabling multicast snooping via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: add support for multicast_routerNikolay Aleksandrov
Add IFLA_BR_MCAST_ROUTER to allow setting and retrieving br->multicast_router when igmp snooping is enabled. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: add fdb flushNikolay Aleksandrov
Simple attribute that flushes the bridge's fdb. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: add group_addr supportNikolay Aleksandrov
Add IFLA_BR_GROUP_ADDR attribute to allow setting and retrieving the group_addr via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: export all timersNikolay Aleksandrov
Export the following bridge timers (also exported via sysfs): IFLA_BR_HELLO_TIMER, IFLA_BR_TCN_TIMER, IFLA_BR_TOPOLOGY_CHANGE_TIMER, IFLA_BR_GC_TIMER via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: export topology_change and topology_change_detectedNikolay Aleksandrov
Add IFLA_BR_TOPOLOGY_CHANGE and IFLA_BR_TOPOLOGY_CHANGE_DETECTED and export them via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: export root path costNikolay Aleksandrov
Add IFLA_BR_ROOT_PATH_COST and export it via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: export root portNikolay Aleksandrov
Add IFLA_BR_ROOT_PORT and export it via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: export bridge idNikolay Aleksandrov
Add IFLA_BR_BRIDGE_ID and export br->bridge_id via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: export root idNikolay Aleksandrov
Add IFLA_BR_ROOT_ID and export br->designated_root via netlink. For this purpose add struct ifla_bridge_id that would represent struct bridge_id. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04bridge: netlink: add group_fwd_mask supportNikolay Aleksandrov
Add IFLA_BR_GROUP_FWD_MASK attribute to allow setting and retrieving the group_fwd_mask via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-04netfilter: nfnetlink_queue: get rid of nfnetlink_queue_ct.cPablo Neira Ayuso
The original intention was to avoid dependencies between nfnetlink_queue and conntrack without ifdef pollution. However, we can achieve this by moving the conntrack dependent code into ctnetlink and keep some glue code to access the nfq_ct indirection from nfqueue. After this patch, the nfq_ct indirection is always compiled in the netfilter core to avoid polluting nfqueue with ifdefs. Thus, if nf_conntrack is not compiled this results in only 8-bytes of memory waste in x86_64. This patch also adds ctnetlink_nfqueue_seqadj() to avoid that the nf_conn structure layout if exposed to nf_queue, which creates another dependency with nf_conntrack at compilation time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-04stm class: Introduce an abstraction for System Trace Module devicesAlexander Shishkin
A System Trace Module (STM) is a device exporting data in System Trace Protocol (STP) format as defined by MIPI STP standards. Examples of such devices are Intel(R) Trace Hub and Coresight STM. This abstraction provides a unified interface for software trace sources to send their data over an STM device to a debug host. In order to do that, such a trace source needs to be assigned a pair of master/channel identifiers that all the data from this source will be tagged with. The STP decoder on the debug host side will use these master/channel tags to distinguish different trace streams from one another inside one STP stream. This abstraction provides a configfs-based policy management mechanism for dynamic allocation of these master/channel pairs based on trace source-supplied string identifier. It has the flexibility of being defined at runtime and at the same time (provided that the policy definition is aligned with the decoding end) consistency. For userspace trace sources, this abstraction provides write()-based and mmap()-based (if the underlying stm device allows this) output mechanism. For kernel-side trace sources, we provide "stm_source" device class that can be connected to an stm device at run time. Cc: linux-api@vger.kernel.org Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04tty/serial: at91: move ATMEL_MAX_UARTAlexandre Belloni
Move ATMEL_MAX_UART from platform_data/atmel.h to atmel_serial.c as this is the only file using it and it is common practise from tty/serial drivers to define it directly in the driver file. Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04Merge branch 'strscpy' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull strscpy string copy function implementation from Chris Metcalf. Chris sent this during the merge window, but I waffled back and forth on the pull request, which is why it's going in only now. The new "strscpy()" function is definitely easier to use and more secure than either strncpy() or strlcpy(), both of which are horrible nasty interfaces that have serious and irredeemable problems. strncpy() has a useless return value, and doesn't NUL-terminate an overlong result. To make matters worse, it pads a short result with zeroes, which is a performance disaster if you have big buffers. strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking the insane NUL padding, but having a differently broken return value which returns the original length of the source string. Which means that it will read characters past the count from the source buffer, and you have to trust the source to be properly terminated. It also makes error handling fragile, since the test for overflow is unnecessarily subtle. strscpy() avoids both these problems, guaranteeing the NUL termination (but not excessive padding) if the destination size wasn't zero, and making the overflow condition very obvious by returning -E2BIG. It also doesn't read past the size of the source, and can thus be used for untrusted source data too. So why did I waffle about this for so long? Every time we introduce a new-and-improved interface, people start doing these interminable series of trivial conversion patches. And every time that happens, somebody does some silly mistake, and the conversion patch to the improved interface actually makes things worse. Because the patch is mindnumbing and trivial, nobody has the attention span to look at it carefully, and it's usually done over large swatches of source code which means that not every conversion gets tested. So I'm pulling the strscpy() support because it *is* a better interface. But I will refuse to pull mindless conversion patches. Use this in places where it makes sense, but don't do trivial patches to fix things that aren't actually known to be broken. * 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tile: use global strscpy() rather than private copy string: provide strscpy() Make asm/word-at-a-time.h available on all architectures
2015-10-04misc: mic: SCIF RMA header file and IOCTL changesSudeep Dutt
This patch updates the SCIF header file and IOCTL interface with the changes required to support RMAs. APIs added include the ability to pin pages and register those pages with SCIF. SCIF kernel clients can also add references to remote registered pages and access them via the CPU. The user space IOCTL interface has been updated to enable SCIF registration, RDMA/CPU copies and fence APIs for RDMA synchronization. Reviewed-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04misc: mic: Update MIC host daemon with COSM changesAshutosh Dixit
This patch updates the MIC host daemon to work with corresponding changes in COSM. Other MIC daemon fixes, cleanups and enhancements as are also rolled into this patch. Changes to MIC sysfs ABI which go into effect with this patch are also documented. Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04misc: mic: Remove COSM functionality from the MIC card driverAshutosh Dixit
Since card side COSM functionality, to trigger MIC device shutdowns and communicate shutdown status to the host, is now moved into a separate COSM client driver, this patch removes this functionality from the base MIC card driver. The mic_bus driver is also updated to use the device index provided by COSM rather than maintain its own device index. Reviewed-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04misc: mic: Add support for kernel mode SCIF clientsAshutosh Dixit
Add support for registration/de-registration of kernel mode SCIF clients. SCIF clients are probed with new and existing SCIF peer devices. Similarly the client remove method is called when SCIF peer devices are removed. Changes to SCIF peer device framework necessitated by supporting kernel mode SCIF clients are also included in this patch. Reviewed-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04misc: mic: SCIF pollAshutosh Dixit
SCIF poll allows both user and kernel mode clients to wait on events on a SCIF endpoint. These events include availability of space or data in the SCIF ring buffer, availability of connection requests on a listening endpoint and completion of connections when using async connects. Reviewed-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04dma: Add support to program MIC x100 status descriptiorsSiva Yerramreddy
The MIC X100 DMA engine has a special status descriptor which writes an 8 byte value to a destination location. This is used to signal completion of all DMA descriptors prior to the status descriptor. This patch add a new DMA engine API which enables updating a destination address with an 8 byte immediate data value. Reviewed-by: Nikhil Rao <nikhil.rao@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Lawrynowicz, Jacek <jacek.lawrynowicz@intel.com> Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com> Signed-off-by: Siva Yerramreddy <yshivakrishna@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04iommu: iova: Move iova cache management to the iova librarySakari Ailus
This is necessary to separate intel-iommu from the iova library. Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04kobject: explain what kobject's sd field isUlf Magnusson
(More) unclear, especially name-wise, after sysfs_dirent became kernfs_node. Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04debugfs: Pass bool pointer to debugfs_create_bool()Viresh Kumar
Its a bit odd that debugfs_create_bool() takes 'u32 *' as an argument, when all it needs is a boolean pointer. It would be better to update this API to make it accept 'bool *' instead, as that will make it more consistent and often more convenient. Over that bool takes just a byte. That required updates to all user sites as well, in the same commit updating the API. regmap core was also using debugfs_{read|write}_file_bool(), directly and variable types were updated for that to be bool as well. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Mark Brown <broonie@kernel.org> Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04usb: renesas_usbhs: fix build warning if 64-bit architectureYoshihiro Shimoda
This patch fixes the following warning if 64-bit architecture environment: ./drivers/usb/renesas_usbhs/common.c:496:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] dparam->type = of_id ? (u32)of_id->data : 0; Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04usb: define HCD_USB31 speed option for hosts that support USB 3.1 featuresMathias Nyman
Hosts that support USB 3.1 Enhaned SuperSpeed can set their speed to HCD_USB31 to let usb core and host drivers know that the controller supports new USB 3.1 features. make sure usb core handle HCD_USB31 hosts correctly, for now similar to HCD_USB3. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04usb: store the new usb 3.1 SuperSpeedPlus device capability descriptorMathias Nyman
If a device supports usb 3.1 SupeerSpeedPlus Gen2 speeds it povides a SuperSpeedPlus device capability descriptor as a part of its BOS descriptor. If we find one while parsing the BOS then save it togeter with the other device capabilities found in the BOS Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04usb: Add USB 3.1 SuperSpeedPlus device capability descriptorMathias Nyman
USB 3.1 SuperSpeedPlus device capability descriptor is returned as part of the bos descriptor for devices that support SuperSpeedPlus protocol. The descriptor contains more detailed information about the supported speeds of the device. More details about the descriptor can be found in the USB 3.1 specification section 9.6.2.5 Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-03tcp/dccp: add SLAB_DESTROY_BY_RCU flag for request socketsEric Dumazet
Before letting request sockets being put in TCP/DCCP regular ehash table, we need to add either : - SLAB_DESTROY_BY_RCU flag to their kmem_cache - add RCU grace period before freeing them. Since we carefully respected the SLAB_DESTROY_BY_RCU protocol like ESTABLISH and TIMEWAIT sockets, use it here. req_prot_init() being only used by TCP and DCCP, I did not add a new slab_flags into their rsk_prot, but reuse prot->slab_flags Since all reqsk_alloc() users are correctly dealing with a failure, add the __GFP_NOWARN flag to avoid traces under pressure. Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "Bunch of fixes all over the place, all pretty small: amdgpu, i915, exynos, one qxl and one vmwgfx. There is also a bunch of mst fixes, I left some cleanups in the series as I didn't think it was worth splitting up the tested series" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits) drm/dp/mst: add some defines for logical/physical ports drm/dp/mst: drop cancel work sync in the mstb destroy path (v2) drm/dp/mst: split connector registration into two parts (v2) drm/dp/mst: update the link_address_sent before sending the link address (v3) drm/dp/mst: fixup handling hotplug on port removal. drm/dp/mst: don't pass port into the path builder function drm/radeon: drop radeon_fb_helper_set_par drm: handle cursor_set2 in restore_fbdev_mode drm/exynos: Staticize local function in exynos_drm_gem.c drm/exynos: fimd: actually disable dp clock drm/exynos: dp: remove suspend/resume functions drm/qxl: recreate the primary surface when the bo is not primary drm/amdgpu: only print meaningful VM faults drm/amdgpu/cgs: remove import_gpu_mem drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2 drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2 drm/vmwgfx: Fix a command submission hang regression drm/exynos: remove unused mode_fixup() code drm/exynos: remove decon_mode_fixup() drm/exynos: remove fimd_mode_fixup() ...
2015-10-03phylib: Add phy_set_max_speed helperSimon Horman
Add a helper to allow ethernet drivers to limit the speed of a phy (that they are attached to). This mainly involves factoring out the business-end of of_set_phy_supported() and exporting a new symbol. This code seems to be open coded in several places, in several different variants. It is is envisaged that this will be used in situations where setting the "max-speed" property in DT is not appropriate, e.g. because the maximum speed is not a property of the phy hardware. Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03sched, bpf: add helper for retrieving routing realmsDaniel Borkmann
Using routing realms as part of the classifier is quite useful, it can be viewed as a tag for one or multiple routing entries (think of an analogy to net_cls cgroup for processes), set by user space routing daemons or via iproute2 as an indicator for traffic classifiers and later on processed in the eBPF program. Unlike actions, the classifier can inspect device flags and enable netif_keep_dst() if necessary. tc actions don't have that possibility, but in case people know what they are doing, it can be used from there as well (e.g. via devs that must keep dsts by design anyway). If a realm is set, the handler returns the non-zero realm. User space can set the full 32bit realm for the dst. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03ebpf: migrate bpf_prog's flags to bitfieldDaniel Borkmann
As we need to add further flags to the bpf_prog structure, lets migrate both bools to a bitfield representation. The size of the base structure (excluding insns) remains unchanged at 40 bytes. Add also tags for the kmemchecker, so that it doesn't throw false positives. Even in case gcc would generate suboptimal code, it's not being accessed in performance critical paths. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03switchdev: push object ID back to object structureJiri Pirko
Suggested-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03switchdev: bring back switchdev_obj and use it as a generic object paramJiri Pirko
Replace "void *obj" with a generic structure. Introduce couple of helpers along that. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03switchdev: rename switchdev_obj_fdb to switchdev_obj_port_fdbJiri Pirko
Make the struct name in sync with object id name. Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03switchdev: rename switchdev_obj_vlan to switchdev_obj_port_vlanJiri Pirko
Make the struct name in sync with object id name. Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03switchdev: rename SWITCHDEV_ATTR_* enum values to SWITCHDEV_ATTR_ID_*Jiri Pirko
To be aligned with obj. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03switchdev: rename SWITCHDEV_OBJ_* enum values to SWITCHDEV_OBJ_ID_*Jiri Pirko
Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03tcp: remove max_qlen_logEric Dumazet
This control variable was set at first listen(fd, backlog) call, but not updated if application tried to increase or decrease backlog. It made sense at the time listener had a non resizeable hash table. Also rounding to powers of two was not very friendly. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03tcp/dccp: remove struct listen_sockEric Dumazet
It is enough to check listener sk_state, no need for an extra condition. max_qlen_log can be moved into struct request_sock_queue We can remove syn_wait_lock and the alignment it enforced. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03tcp: attach SYNACK messages to request sockets instead of listenerEric Dumazet
If a listen backlog is very big (to avoid syncookies), then the listener sk->sk_wmem_alloc is the main source of false sharing, as we need to touch it twice per SYNACK re-transmit and TX completion. (One SYN packet takes listener lock once, but up to 6 SYNACK are generated) By attaching the skb to the request socket, we remove this source of contention. Tested: listen(fd, 10485760); // single listener (no SO_REUSEPORT) 16 RX/TX queue NIC Sustain a SYNFLOOD attack of ~320,000 SYN per second, Sending ~1,400,000 SYNACK per second. Perf profiles now show listener spinlock being next bottleneck. 20.29% [kernel] [k] queued_spin_lock_slowpath 10.06% [kernel] [k] __inet_lookup_established 5.12% [kernel] [k] reqsk_timer_handler 3.22% [kernel] [k] get_next_timer_interrupt 3.00% [kernel] [k] tcp_make_synack 2.77% [kernel] [k] ipt_do_table 2.70% [kernel] [k] run_timer_softirq 2.50% [kernel] [k] ip_finish_output 2.04% [kernel] [k] cascade Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03ipv6: remove obsolete inet6 functionsEric Dumazet
inet6_csk_search_req() and inet6_csk_reqsk_queue_hash_add() no longer exist. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03tcp/dccp: shrink struct listen_sockEric Dumazet
We no longer use hash_rnd, nr_table_entries and syn_table[] For a listener with a backlog of 10 millions sockets, this saves 80 MBytes of vmalloced memory. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03tcp/dccp: install syn_recv requests into ehash tableEric Dumazet
In this patch, we insert request sockets into TCP/DCCP regular ehash table (where ESTABLISHED and TIMEWAIT sockets are) instead of using the per listener hash table. ACK packets find SYN_RECV pseudo sockets without having to find and lock the listener. In nominal conditions, this halves pressure on listener lock. Note that this will allow for SO_REUSEPORT refinements, so that we can select a listener using cpu/numa affinities instead of the prior 'consistent hash', since only SYN packets will apply this selection logic. We will shrink listen_sock in the following patch to ease code review. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ying Cai <ycai@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03tcp/dccp: remove inet_csk_reqsk_queue_added() timeout argumentEric Dumazet
This is no longer used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03tcp: get_openreq[46]() changesEric Dumazet
When request sockets are no longer in a per listener hash table but on regular TCP ehash, we need to access listener uid through req->rsk_listener get_openreq6() also gets a const for its request socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-03tcp/dccp: init sk_prot and call sk_node_init() in reqsk_alloc()Eric Dumazet
We plan to use generic functions to insert request sockets into ehash table. sk_prot needs to be set (to retrieve sk_prot->h.hashinfo) sk_node needs to be cleared. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>