summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-02-11gre: Use GSO flags to determine csum need instead of GRE flagsAlexander Duyck
This patch updates the gre checksum path to follow something much closer to the UDP checksum path. By doing this we can avoid needing to do as much header inspection and can just make use of the fields we were already reading in the sk_buff structure. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: Move skb_has_shared_frag check out of GRE code and into segmentationAlexander Duyck
The call skb_has_shared_frag is used in the GRE path and skb_checksum_help to verify that no frags can be modified by an external entity. This check really doesn't belong in the GRE path but in the skb_segment function itself. This way any protocol that might be segmented will be performing this check before attempting to offload a checksum to software. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: Store checksum result for offloaded GSO checksumsAlexander Duyck
This patch makes it so that we can offload the checksums for a packet up to a certain point and then begin computing the checksums via software. Setting this up is fairly straight forward as all we need to do is reset the values stored in csum and csum_start for the GSO context block. One complication for this is remote checksum offload. In order to allow the inner checksums to be offloaded while computing the outer checksum manually we needed to have some way of indicating that the offload wasn't real. In order to do that I replaced CHECKSUM_PARTIAL with CHECKSUM_UNNECESSARY in the case of us computing checksums for the outer header while skipping computing checksums for the inner headers. We clean up the ip_summed flag and set it to either CHECKSUM_PARTIAL or CHECKSUM_NONE once we hand the packet off to the next lower level. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: Update remote checksum segmentation to support use of GSO checksumAlexander Duyck
This patch addresses two main issues. First in the case of remote checksum offload we were avoiding dealing with scatter-gather issues. As a result it would be possible to assemble a series of frames that used frags instead of being linearized as they should have if remote checksum offload was enabled. Second I have updated the code so that we now let GSO take care of doing the checksum on the data itself and drop the special case that was added for remote checksum offload. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: Move GSO csum into SKB_GSO_CBAlexander Duyck
This patch moves the checksum maintained by GSO out of skb->csum and into the GSO context block in order to allow for us to work on outer checksums while maintaining the inner checksum offsets in the case of the inner checksum being offloaded, while the outer checksums will be computed. While updating the code I also did a minor cleanu-up on gso_make_checksum. The change is mostly to make it so that we store the values and compute the checksum instead of computing the checksum and then storing the values we needed to update. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: Drop unecessary enc_features variable from tunnel segmentation functionsAlexander Duyck
The enc_features variable isn't necessary since features isn't used anywhere after we create enc_features so instead just use a destructive AND on features itself and save ourselves the variable declaration. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11ipv6: add option to drop unsolicited neighbor advertisementsJohannes Berg
In certain 802.11 wireless deployments, there will be NA proxies that use knowledge of the network to correctly answer requests. To prevent unsolicitd advertisements on the shared medium from being a problem, on such deployments wireless needs to drop them. Enable this by providing an option called "drop_unsolicited_na". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11ipv6: add option to drop unicast encapsulated in L2 multicastJohannes Berg
In order to solve a problem with 802.11, the so-called hole-196 attack, add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if enabled, causes the stack to drop IPv6 unicast packets encapsulated in link-layer multi- or broadcast frames. Such frames can (as an attack) be created by any member of the same wireless network and transmitted as valid encrypted frames since the symmetric key for broadcast frames is shared between all stations. Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11ipv4: add option to drop gratuitous ARP packetsJohannes Berg
In certain 802.11 wireless deployments, there will be ARP proxies that use knowledge of the network to correctly answer requests. To prevent gratuitous ARP frames on the shared medium from being a problem, on such deployments wireless needs to drop them. Enable this by providing an option called "drop_gratuitous_arp". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11ipv4: add option to drop unicast encapsulated in L2 multicastJohannes Berg
In order to solve a problem with 802.11, the so-called hole-196 attack, add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if enabled, causes the stack to drop IPv4 unicast packets encapsulated in link-layer multi- or broadcast frames. Such frames can (as an attack) be created by any member of the same wireless network and transmitted as valid encrypted frames since the symmetric key for broadcast frames is shared between all stations. Additionally, enabling this option provides compliance with a SHOULD clause of RFC 1122. Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11net: Add support for filtering link dump by master device and kindDavid Ahern
Add support for filtering link dumps by master device and kind, similar to the filtering implemented for neighbor dumps. Each net_device that exists adds between 1196 bytes (eth) and 1556 bytes (bridge) to the link dump. As the number of interfaces increases so does the amount of data pushed to user space for a link list. If the user only wants to see a list of specific devices (e.g., interfaces enslaved to a specific bridge or a list of VRFs) most of that data is thrown away. Passing the filters to the kernel to have only relevant data returned makes the dump more efficient. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11soreuseport: fast reuseport TCP socket selectionCraig Gallek
This change extends the fast SO_REUSEPORT socket lookup implemented for UDP to TCP. Listener sockets with SO_REUSEPORT and the same receive address are additionally added to an array for faster random access. This means that only a single socket from the group must be found in the listener list before any socket in the group can be used to receive a packet. Previously, every socket in the group needed to be considered before handing off the incoming packet. This feature also exposes the ability to use a BPF program when selecting a socket from a reuseport group. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11soreuseport: Prep for fast reuseport TCP socket selectionCraig Gallek
Both of the lines in this patch probably should have been included in the initial implementation of this code for generic socket support, but weren't technically necessary since only UDP sockets were supported. First, the sk_reuseport_cb points to a structure which assumes each socket in the group has this pointer assigned at the same time it's added to the array in the structure. The sk_clone_lock function breaks this assumption. Since a child socket shouldn't implicitly be in a reuseport group, the simple fix is to clear the field in the clone. Second, the SO_ATTACH_REUSEPORT_xBPF socket options require that SO_REUSEPORT also be set first. For UDP sockets, this is easily enforced at bind-time since that process both puts the socket in the appropriate receive hlist and updates the reuseport structures. Since these operations can happen at two different times for TCP sockets (bind and listen) it must be explicitly checked to enforce the use of SO_REUSEPORT with SO_ATTACH_REUSEPORT_xBPF in the setsockopt call. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11inet: refactor inet[6]_lookup functions to take skbCraig Gallek
This is a preliminary step to allow fast socket lookup of SO_REUSEPORT groups. Doing so with a BPF filter will require access to the skb in question. This change plumbs the skb (and offset to payload data) through the call stack to the listening socket lookup implementations where it will be used in a following patch. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11inet: create IPv6-equivalent inet_hash functionCraig Gallek
In order to support fast lookups for TCP sockets with SO_REUSEPORT, the function that adds sockets to the listening hash set needs to be able to check receive address equality. Since this equality check is different for IPv4 and IPv6, we will need two different socket hashing functions. This patch adds inet6_hash identical to the existing inet_hash function and updates the appropriate references. A following patch will differentiate the two by passing different comparison functions to __inet_hash. Additionally, in order to use the IPv6 address equality function from inet6_hashtables (which is compiled as a built-in object when IPv6 is enabled) it also needs to be in a built-in object file as well. This moves ipv6_rcv_saddr_equal into inet_hashtables to accomplish this. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-11sock: struct proto hash function may errorCraig Gallek
In order to support fast reuseport lookups in TCP, the hash function defined in struct proto must be capable of returning an error code. This patch changes the function signature of all related hash functions to return an integer and handles or propagates this return value at all call sites. Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-10batman-adv: Convert batadv_tt_common_entry to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_orig_node to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_orig_node_vlan to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_hard_iface to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_neigh_node to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_orig_ifinfo to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_neigh_ifinfo to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_tt_orig_list_entry to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_tvlv_handler to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_tvlv_container to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_dat_entry to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_nc_path to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_nc_node to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_bla_claim to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_bla_backbone_gw to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_softif_vlan to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_gw_node to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Convert batadv_hardif_neigh_node to krefSven Eckelmann
batman-adv uses a self-written reference implementation which is just based on atomic_t. This is less obvious when reading the code than kref and therefore increases the change that the reference counting will be missed. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Add lockdep assert for container_list_lockSven Eckelmann
The batadv_tvlv_container* functions state in their kernel-doc that they require tvlv.container_list_lock. Add an assert to automatically detect when this might have been ignored by the caller. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: add seqno maximum age and protection start flag parametersSimon Wunderlich
To allow future use of the window protected function with different maximum sequence numbers, add a parameter to set this value which was previously hardcoded. Another parameter added for future use is a flag to return whether the protection window has started. While at it, also fix the kerneldoc. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10batman-adv: Drop reference to netdevice on last referenceSven Eckelmann
The references to the network device should be dropped inside the release function for batadv_hard_iface similar to what is done with the batman-adv internal datastructures. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
2016-02-10vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devicesDavid Wragg
Prior to 4.3, openvswitch tunnel vports (vxlan, gre and geneve) could transmit vxlan packets of any size, constrained only by the ability to send out the resulting packets. 4.3 introduced netdevs corresponding to tunnel vports. These netdevs have an MTU, which limits the size of a packet that can be successfully encapsulated. The default MTU values are low (1500 or less), which is awkwardly small in the context of physical networks supporting jumbo frames, and leads to a conspicuous change in behaviour for userspace. Instead, set the MTU on openvswitch-created netdevs to be the relevant maximum (i.e. the maximum IP packet size minus any relevant overhead), effectively restoring the behaviour prior to 4.3. Signed-off-by: David Wragg <david@weave.works> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09flow_dissector: Fix unaligned access in __skb_flow_dissector when used by ↵Alexander Duyck
eth_get_headlen This patch fixes an issue with unaligned accesses when using eth_get_headlen on a page that was DMA aligned instead of being IP aligned. The fact is when trying to check the length we don't need to be looking at the flow label so we can reorder the checks to first check if we are supposed to gather the flow label and then make the call to actually get it. v2: Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can cause us to check for the flow label. Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09packet: tpacket_snd gso and checksum offloadWillem de Bruijn
Support socket option PACKET_VNET_HDR together with PACKET_TX_RING. When enabled, a struct virtio_net_hdr is expected to precede the data in the ring. The vnet option must be set before the ring is created. The implementation reuses the existing skb_copy_bits code that is used when dev->hard_header_len is non-zero. Move this ll_header check to before the skb alloc and combine it with a test for vnet_hdr->hdr_len. Allocate and copy the max of the two. Verified with test program at github.com/wdebruij/kerneltools/blob/master/tests/psock_txring_vnet.c Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09packet: parse tpacket header before skb allocWillem de Bruijn
GSO packet headers must be stored in the linear skb segment. Move tpacket header parsing before sock_alloc_send_skb. The GSO follow-on patch will later increase the skb linear argument to sock_alloc_send_skb if needed for large packets. The header parsing code does not require an allocated skb, so is safe to move. Later pass to tpacket_fill_skb the computed data start and length. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09packet: vnet_hdr support for tpacket_rcvWillem de Bruijn
Support socket option PACKET_VNET_HDR together with PACKET_RX_RING. When enabled, a struct virtio_net_hdr will precede the data in the packet ring slots. Verified with test program at github.com/wdebruij/kerneltools/blob/master/tests/psock_rxring_vnet.c pkt: 1454269209.798420 len=5066 vnet: gso_type=tcpv4 gso_size=1448 hlen=66 ecn=off csum: start=34 off=16 eth: proto=0x800 ip: src=<masked> dst=<masked> proto=6 len=5052 Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09packet: move vnet_hdr code to helper functionsWillem de Bruijn
packet_snd and packet_rcv support virtio net headers for GSO. Move this logic into helper functions to be able to reuse it in tpacket_snd and tpacket_rcv. This is a straighforward code move with one exception. Instead of creating and passing a separate gso_type variable, reuse vnet_hdr.gso_type after conversion from virtio to kernel gso type. Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09sctp: translate network order to host order when users get a hmacidXin Long
Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") corrected the hmacid byte-order when setting a hmacid. but the same issue also exists on getting a hmacid. We fix it by changing hmacids to host order when users get them with getsockopt. Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09bridge: mdb: Passing the port-group pointer to br_mdb moduleElad Raz
Passing the port-group to br_mdb in order to allow direct access to the structure. br_mdb will later use the structure to reflect HW reflection status via "state" variable. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09bridge: mdb: Separate br_mdb_entry->state from net_bridge_port_group->stateElad Raz
Change net_bridge_port_group 'state' member to 'flags' and define new set of flags internal to the kernel. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09net:Add sysctl_max_skb_fragsHans Westgaard Ry
Devices may have limits on the number of fragments in an skb they support. Current codebase uses a constant as maximum for number of fragments one skb can hold and use. When enabling scatter/gather and running traffic with many small messages the codebase uses the maximum number of fragments and may thereby violate the max for certain devices. The patch introduces a global variable as max number of fragments. Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-09tcp: do not drop syn_recv on all icmp reportsEric Dumazet
Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets were leading to RST. This is of course incorrect. A specific list of ICMP messages should be able to drop a SYN_RECV. For instance, a REDIRECT on SYN_RECV shall be ignored, as we do not hold a dst per SYN_RECV pseudo request. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751 Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Reported-by: Petr Novopashenniy <pety@rusnet.ru> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-08ipv6: fix a lockdep splatEric Dumazet
Silence lockdep false positive about rcu_dereference() being used in the wrong context. First one should use rcu_dereference_protected() as we own the spinlock. Second one should be a normal assignation, as no barrier is needed. Fixes: 18367681a10bd ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.") Reported-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-08unix: correctly track in-flight fds in sending process user_structHannes Frederic Sowa
The commit referenced in the Fixes tag incorrectly accounted the number of in-flight fds over a unix domain socket to the original opener of the file-descriptor. This allows another process to arbitrary deplete the original file-openers resource limit for the maximum of open files. Instead the sending processes and its struct cred should be credited. To do so, we add a reference counted struct user_struct pointer to the scm_fp_list and use it to account for the number of inflight unix fds. Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets") Reported-by: David Herrmann <dh.herrmann@gmail.com> Cc: David Herrmann <dh.herrmann@gmail.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>