linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2013-10-22	netfilter: ipset: The unnamed union initialization may lead to compilation error	Jozsef Kadlecsik
	The unnamed union should be possible to be initialized directly, but unfortunately it's not so: /usr/src/ipset/kernel/net/netfilter/ipset/ip_set_hash_netnet.c: In function ?hash_netnet4_kadt?: /usr/src/ipset/kernel/net/netfilter/ipset/ip_set_hash_netnet.c:141: error: unknown field ?cidr? specified in initializer Reported-by: Husnu Demir <hdemir@metu.edu.tr> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-22	netfilter: ipset: Use netlink callback dump args only	Jozsef Kadlecsik
	Instead of cb->data, use callback dump args only and introduce symbolic names instead of plain numbers at accessing the argument members. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-22	netfilter: x_tables: fix ordering of jumpstack allocation and table update	Will Deacon
	During kernel stability testing on an SMP ARMv7 system, Yalin Wang reported the following panic from the netfilter code: 1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454 [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c) [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104) [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8) [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c) [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158) [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc) [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c) [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50) [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298) [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8) [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30) Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103) ---[ end trace da227214a82491bd ]--- Kernel panic - not syncing: Fatal exception in interrupt This comes about because CPU1 is executing xt_replace_table in response to a setsockopt syscall, resulting in: ret = xt_jumpstack_alloc(newinfo); --> newinfo->jumpstack = kzalloc(size, GFP_KERNEL); [...] table->private = newinfo; newinfo->initial_entries = private->initial_entries; Meanwhile, CPU0 is handling the network receive path and ends up in ipt_do_table, resulting in: private = table->private; [...] jumpstack = (struct ipt_entry **)private->jumpstack[cpu]; On weakly ordered memory architectures, the writes to table->private and newinfo->jumpstack from CPU1 can be observed out of order by CPU0. Furthermore, on architectures which don't respect ordering of address dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered. This patch adds an smp_wmb() before the assignment to table->private (which is essentially publishing newinfo) to ensure that all writes to newinfo will be observed before plugging it into the table structure. A dependent-read barrier is also added on the consumer sides, to ensure the same ordering requirements are also respected there. Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com> Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-10-21	tcp: initialize passive-side sk_pacing_rate after 3WHS	Neal Cardwell
	For passive TCP connections, upon receiving the ACK that completes the 3WHS, make sure we set our pacing rate after we get our first RTT sample. On passive TCP connections, when we receive the ACK completing the 3WHS we do not take an RTT sample in tcp_ack(), but rather in tcp_synack_rtt_meas(). So upon receiving the ACK that completes the 3WHS, tcp_ack() leaves sk_pacing_rate at its initial value. Originally the initial sk_pacing_rate value was 0, so passive-side connections defaulted to sysctl_tcp_min_tso_segs (2 segs) in skbuffs made in the first RTT. With a default initial cwnd of 10 packets, this happened to be correct for RTTs 5ms or bigger, so it was hard to see problems in WAN or emulated WAN testing. Since 7eec4174ff ("pkt_sched: fq: fix non TCP flows pacing"), the initial sk_pacing_rate is 0xffffffff. So after that change, passive TCP connections were keeping this value (and using large numbers of segments per skbuff) until receiving an ACK for data. Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	ipv6: probe routes asynchronous in rt6_probe	Hannes Frederic Sowa
	Routes need to be probed asynchronous otherwise the call stack gets exhausted when the kernel attemps to deliver another skb inline, like e.g. xt_TEE does, and we probe at the same time. We update neigh->updated still at once, otherwise we would send to many probes. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	ipv6: sit: add GSO/TSO support	Eric Dumazet
	Now ipv6_gso_segment() is stackable, its relatively easy to implement GSO/TSO support for SIT tunnels Performance results, when segmentation is done after tunnel device (as no NIC is yet enabled for TSO SIT support) : Before patch : lpq84:~# ./netperf -H 2002:af6:1153:: -Cc MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6 Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 3168.31 4.81 4.64 2.988 2.877 After patch : lpq84:~# ./netperf -H 2002:af6:1153:: -Cc MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6 Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 5525.00 7.76 5.17 2.763 1.840 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	ipv6: gso: make ipv6_gso_segment() stackable	Eric Dumazet
	In order to support GSO on SIT tunnels, we need to make inet_gso_segment() stackable. It should not assume network header starts right after mac header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	ipv4: Allow unprivileged users to use per net sysctls	Eric W. Biederman
	Allow unprivileged users to use: /proc/sys/net/ipv4/icmp_echo_ignore_all /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts /proc/sys/net/ipv4/icmp_ignore_bogus_error_response /proc/sys/net/ipv4/icmp_errors_use_inbound_ifaddr /proc/sys/net/ipv4/icmp_ratelimit /proc/sys/net/ipv4/icmp_ratemask /proc/sys/net/ipv4/ping_group_range /proc/sys/net/ipv4/tcp_ecn /proc/sys/net/ipv4/ip_local_ports_range These are occassionally handy and after a quick review I don't see any problems with unprivileged users using them. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	ipv4: Use math to point per net sysctls into the appropriate struct net.	Eric W. Biederman
	Simplify maintenance of ipv4_net_table by using math to point the per net sysctls into the appropriate struct net, instead of manually reassinging all of the variables into hard coded table slots. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	tcp_memcontrol: Kill struct tcp_memcontrol	Eric W. Biederman
	Replace the pointers in struct cg_proto with actual data fields and kill struct tcp_memcontrol as it is not fully redundant. This removes a confusing, unnecessary layer of abstraction. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	tcp_memcontrol: Remove the per netns control.	Eric W. Biederman
	The code that is implemented is per memory cgroup not per netns, and having per netns bits is just confusing. Remove the per netns bits to make it easier to see what is really going on. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	tcp_memcontrol: Remove setting cgroup settings via sysctl	Eric W. Biederman
	The code is broken and does not constrain sysctl_tcp_mem as tcp_update_limit does. With the result that it allows the cgroup tcp memory limits to be bypassed. The semantics are broken as the settings are not per netns and are in a per netns table, and instead looks at current. Since the code is broken in both design and implementation and does not implement the functionality for which it was written remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	tcp_memcontrol: Remove tcp_max_memory	Eric W. Biederman
	This function is never called. Remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	netfilter: nf_conntrack: fix rt6i_gateway checks for H.323 helper	Julian Anastasov
	Now when rt6_nexthop() can return nexthop address we can use it for proper nexthop comparison of directly connected destinations. For more information refer to commit bbb5823cf742a7 ("netfilter: nf_conntrack: fix rt_gateway checks for H.323 helper"). Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	ipv6: fill rt6i_gateway with nexthop address	Julian Anastasov
	Make sure rt6i_gateway contains nexthop information in all routes returned from lookup or when routes are directly attached to skb for generated ICMP packets. The effect of this patch should be a faster version of rt6_nexthop() and the consideration of local addresses as nexthop. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-21	Bluetooth: Remove sk member from struct l2cap_chan	Gustavo Padovan
	There is no access to chan->sk in L2CAP core now. This change marks the end of the task of splitting L2CAP between Core and Socket, thus sk is now gone from struct l2cap_chan. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21	Bluetooth: Use bt_cb(skb)->chan to send raw data back	Gustavo Padovan
	Instead of accessing skb->sk in L2CAP core we now compare the channel a skb belongs to and not send it back if the channel is same. This change removes another struct socket usage from L2CAP core. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21	Bluetooth: Add L2CAP channel to skb private data	Gustavo Padovan
	Adding the channel to the skb private data makes possible to us know which channel the skb we have came from. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21	Bluetooth: Remove parent socket usage from l2cap_core.c	Gustavo Padovan
	The parent socket is not used inside the L2CAP core anymore. We only lock it to indirect access through the new_connection() callback. The hold of the socket lock was moved to the new_connection() callback. Inside L2CAP core the channel lock is now used in l2cap_le_conn_ready() and l2cap_conn_ready() to protect the execution of these two functions during the handling of new incoming connections. This change remove the socket lock usage from L2CAP core while keeping the code safe against race conditions. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21	Bluetooth: Remove socket lock from l2cap_state_change()	Gustavo Padovan
	This simplify and make safer the state change handling inside l2cap_core.c. we got rid of __l2cap_state_change(). And l2cap_state_change() doesn't lock the socket anymore, instead the socket is locked inside the ops callback for state change in l2cap_sock.c. It makes the code safer because in some we were using a unlocked version, and now we are calls to l2cap_state_change(), when dealing with sockets, use the locked version. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21	Bluetooth: Hold socket in defer callback in L2CAP socket	Gustavo Padovan
	In both places that we use the defer callback the socket lock is held for a indirect sk access inside __l2cap_change_state() and chan->ops->defer(), all the rest of the code between lock_sock() and release_sock() is already protected by the channel lock and won't be affected by this change. We now use l2cap_change_state(), the locked version of the change state function, and the defer callback does the locking itself now. This does not affect other uses of the defer callback. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21	Bluetooth: Do not access chan->sk directly	Gustavo Padovan
	In the process of removing socket usage from L2CAP we now access the L2CAP socket from the data member of struct l2cap_chan. For the L2CAP socket user the data member points to the L2CAP socket. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21	Bluetooth: Remove not used struct sock	Gustavo Padovan
	It is a leftover from the recent effort of remove sk usage from L2CAP core. Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21	Bluetooth: Fix enabling fast connectable on LE-only controllers	Johan Hedberg
	The current "fast connectable" feature is BR/EDR-only, so add a proper check for BR/EDR support before proceeding with the associated HCI commands. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-21	cfg80211: update dfs_state_entered upon dfs_state change	Michal Kazior
	The timestamp wasn't updated after transitioning to the NL80211_DFS_USABLE state after NOP time. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-21	cfg80211: fix DFS channel recovery timeout	Michal Kazior
	The timeout was not properly converted from msecs to jiffies. As a result channel transition to NL80211_DFS_USABLE was delayed depending on CONFIG_HZ configuration, e.g. HZ=100 would delay the NOP from 30 minutes to 300 minutes. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-21	cfg80211: fix scheduled scan pointer access	Johannes Berg
	Since rdev->sched_scan_req is dereferenced outside the lock protecting it, this might be done at the wrong time, causing crashes. Move the dereference to where it should be - inside the RTNL locked section. Cc: stable@vger.kernel.org [3.8+] Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-10-21	xfrm: Don't queue retransmitted packets if the original is still on the host	Steffen Klassert
	It does not make sense to queue retransmitted packets if the original packet is still in some queue of this host. So add a check to xdst_queue_output() and drop the packet if the original packet is not yet sent. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Eric Dumazet <edumazet@google.com>
2013-10-21	xfrm: use vmalloc_node() for percpu scratches	Eric Dumazet
	scratches are per cpu, we can use vmalloc_node() for proper NUMA affinity. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-10-20	Bluetooth: Update Set Discoverable to support LE	Johan Hedberg
	This patch updates the Set Discoverable management command to also be applicable for LE. In particular this affects the advertising flags where we can say "general discoverable" or "limited discoverable". Since the device flags may not be up-to-date when the advertising data is written this patch introduces a get_adv_discov_flags() helper function which also looks at any pending mgmt commands (a pending set_discoverable would be the exception when the flags are not yet correct). The patch also adds HCI_DISCOVERABLE flag clearing to the mgmt_discoverable_timeout function, since the code was previously relying on the mgmt_discoverable callback to handle this, which is only called for the BR/EDR-only HCI_Write_Scan_Enable command. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20	Bluetooth: Move HCI_LIMITED_DISCOVERABLE changes to a general place	Johan Hedberg
	We'll soon be introducing also LE support for the Set Discoverable management command, so move the HCI_LIMITED_DISCOVERABLE flag clearing and setting out from the if-branch that is only used for a BR/EDR specific HCI command. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20	Bluetooth: Fix sending write_scan_enable when BR/EDR is disabled	Johan Hedberg
	We should only send the HCI_Write_Scan_Enable command from mgmt_set_powered_failed() when BR/EDR support is enabled. This is particularly important when the discoverable setting is also tied to LE. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20	Bluetooth: Move mgmt_pending_find to avoid forward declarations	Johan Hedberg
	We will soon need this function for updating the advertising data, so move it higher up in mgmt.c to avoid a forward declaration. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20	Bluetooth: Fix updating settings when there are no HCI commands to send	Johan Hedberg
	It is possible that the Set Connectable management command doesn't cause any HCI commands to send (such as when BR/EDR is disabled). We can't just send a response to user space in this case but must also update the necessary device flags and settings. This patch fixes the issue by using the recently introduced set_connectable_update_settings function. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20	Bluetooth: Refactor set_connectable settings update to separate function	Johan Hedberg
	We will need to directly update the device flags and notify user space of the new settings not just when we're powered off but also if it turns out that there are no HCI commands to send (which can happen in particular when BR/EDR is disabled). Since this is a considerable amount of code, refactor it to a separate function so it can be reused for the "no HCI commands to send" case. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20	Bluetooth: Add missing check for BREDR_ENABLED flag in update_class()	Johan Hedberg
	We shouldn't be sending the HCI_Write_Class_Of_Device command when BR/EDR is disabled since this is a BR/EDR-only command. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20	Bluetooth: Check for flag instead of features in update_adv_data()	Johan Hedberg
	It's better to check for the device flag instead of device features so that we avoid unnecessary HCI commands when the feature is supported but disabled (i.e. the flag is unset). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-20	Bluetooth: Check for flag instead of features in update_scan_rsp_data()	Johan Hedberg
	It's better to check for the device flag instead of device features so that we avoid unnecessary HCI commands when the feature is supported but disabled (i.e. the flag is unset). Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2013-10-19	Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge	David S. Miller
	Antonio Quartulli says: ==================== this is another batch intended for net-next/linux-3.13. This pull request is a bit bigger than usual, but 6 patches are very small (three of them are about email updates).. Patch 1 is fixing a previous merge conflict resolution that went wrong (I realised that only now while checking other patches..). Patches from 2 to 4 that are updating our emails in all the proper files (Documentation/, headers and MAINTAINERS). Patches 5, 6 and 7 are bringing a big improvement to the TranslationTable component: it is now able to group non-mesh clients based on the VLAN they belong to. In this way a lot a new enhancements are now possible thanks to the fact that each batman-adv behaviour can be applied on a per VLAN basis. And, of course, in patches from 8 to 12 you have some of the enhancements I was talking about: - make the batman-Gateway selection VLAN dependent - make DAT (Distributed ARP Table) group ARP entries on a VLAN basis (this allows DAT to work even when the admin decided to use the same IP subnet on different VLANs) - make the AP-Isolation behaviour switchable on each VLAN independently - export VLAN specific attributes via sysfs. Switches like the AP-Isolation are now exported once per VLAN (backward compatibility of the sysfs interface has been preserved) Patches 13 and 14 are small code cleanups. Patch 15 is a minor improvement in the TT locking mechanism. Patches 16 and 17 are other enhancements to the TT component. Those allow a node to parse a "non-mesh client announcement message" and accept only those TT entries belonging to certain VLANs. Patch 18 exploits this parse&accept mechanism to make the Bridge Loop Avoidance component reject only TT entries connected to the VLAN where it is operating. Previous to this change, BLA was rejecting all the entries coming from any other Backbone node, regardless of the VLAN (for more details about how the Bridge Loop Avoidance works please check [1]). [1] http://www.open-mesh.org/projects/batman-adv/wiki/Bridge-loop-avoidance-II ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	net: switch net_secret key generation to net_get_random_once	Hannes Frederic Sowa
	Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	tcp: switch tcp_fastopen key generation to net_get_random_once	Hannes Frederic Sowa
	Changed key initialization of tcp_fastopen cookies to net_get_random_once. If the user sets a custom key net_get_random_once must be called at least once to ensure we don't overwrite the user provided key when the first cookie is generated later on. Cc: Yuchung Cheng <ycheng@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	inet: convert inet_ehash_secret and ipv6_hash_secret to net_get_random_once	Hannes Frederic Sowa
	Initialize the ehash and ipv6_hash_secrets with net_get_random_once. Each compilation unit gets its own secret now: ipv4/inet_hashtables.o ipv4/udp.o ipv6/inet6_hashtables.o ipv6/udp.o rds/connection.o The functions still get inlined into the hashing functions. In the fast path we have at most two (needed in ipv6) if (unlikely(...)). Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	inet: split syncookie keys for ipv4 and ipv6 and initialize with ↵	Hannes Frederic Sowa
	net_get_random_once This patch splits the secret key for syncookies for ipv4 and ipv6 and initializes them with net_get_random_once. This change was the reason I did this series. I think the initialization of the syncookie_secret is way to early. Cc: Florian Westphal <fw@strlen.de> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	net: introduce new macro net_get_random_once	Hannes Frederic Sowa
	net_get_random_once is a new macro which handles the initialization of secret keys. It is possible to call it in the fast path. Only the initialization depends on the spinlock and is rather slow. Otherwise it should get used just before the key is used to delay the entropy extration as late as possible to get better randomness. It returns true if the key got initialized. The usage of static_keys for net_get_random_once is a bit uncommon so it needs some further explanation why this actually works: === In the simple non-HAVE_JUMP_LABEL case we actually have === no constrains to use static_key_(true\|false) on keys initialized with STATIC_KEY_INIT_(FALSE\|TRUE). So this path just expands in favor of the likely case that the initialization is already done. The key is initialized like this: ___done_key = { .enabled = ATOMIC_INIT(0) } The check if (!static_key_true(&___done_key)) \ expands into (pseudo code) if (!likely(___done_key > 0)) , so we take the fast path as soon as ___done_key is increased from the helper function. === If HAVE_JUMP_LABELs are available this depends === on patching of jumps into the prepared NOPs, which is done in jump_label_init at boot-up time (from start_kernel). It is forbidden and dangerous to use net_get_random_once in functions which are called before that! At compilation time NOPs are generated at the call sites of net_get_random_once. E.g. net/ipv6/inet6_hashtable.c:inet6_ehashfn (we need to call net_get_random_once two times in inet6_ehashfn, so two NOPs): 71: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 76: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) Both will be patched to the actual jumps to the end of the function to call __net_get_random_once at boot time as explained above. arch_static_branch is optimized and inlined for false as return value and actually also returns false in case the NOP is placed in the instruction stream. So in the fast case we get a "return false". But because we initialize ___done_key with (enabled != (entries & 1)) this call-site will get patched up at boot thus returning true. The final check looks like this: if (!static_key_true(&___done_key)) \ ___ret = __net_get_random_once(buf, \ expands to if (!!static_key_false(&___done_key)) \ ___ret = __net_get_random_once(buf, \ So we get true at boot time and as soon as static_key_slow_inc is called on the key it will invert the logic and return false for the fast path. static_key_slow_inc will change the branch because it got initialized with .enabled == 0. After static_key_slow_inc is called on the key the branch is replaced with a nop again. === Misc: === The helper defers the increment into a workqueue so we don't have problems calling this code from atomic sections. A seperate boolean (___done) guards the case where we enter net_get_random_once again before the increment happend. Cc: Ingo Molnar <mingo@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	ipv6: split inet6_ehashfn to hash functions per compilation unit	Hannes Frederic Sowa
	This patch splits the inet6_ehashfn into separate ones in ipv6/inet6_hashtables.o and ipv6/udp.o to ease the introduction of seperate secrets keys later. Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	ipv4: split inet_ehashfn to hash functions per compilation unit	Hannes Frederic Sowa
	This duplicates a bit of code but let's us easily introduce separate secret keys later. The separate compilation units are ipv4/inet_hashtabbles.o, ipv4/udp.o and rds/connection.o. Cc: Eric Dumazet <edumazet@google.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	ipip: add GSO/TSO support	Eric Dumazet
	Now inet_gso_segment() is stackable, its relatively easy to implement GSO/TSO support for IPIP Performance results, when segmentation is done after tunnel device (as no NIC is yet enabled for TSO IPIP support) : Before patch : lpq83:~# ./netperf -H 7.7.9.84 -Cc MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 3357.88 5.09 3.70 2.983 2.167 After patch : lpq83:~# ./netperf -H 7.7.9.84 -Cc MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 16384 16384 10.00 7710.19 4.52 6.62 1.152 1.687 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	ipv4: gso: make inet_gso_segment() stackable	Eric Dumazet
	In order to support GSO on IPIP, we need to make inet_gso_segment() stackable. It should not assume network header starts right after mac header. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	ipv4: generalize gre_handle_offloads	Eric Dumazet
	This patch makes gre_handle_offloads() more generic and rename it to iptunnel_handle_offloads() This will be used to add GSO/TSO support to IPIP tunnels. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-19	net: generalize skb_segment()	Eric Dumazet
	While implementing GSO/TSO support for IPIP, I found skb_segment() was assuming network header was immediately following mac header. Its not really true in the case inet_gso_segment() is stacked : By the time tcp_gso_segment() is called, network header points to the inner IP header. Let's instead assume nothing and pick the current offsets found in original skb, we have skb_headers_offset_update() helper for that. Also move the csum_start update inside skb_headers_offset_update() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>