summaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)Author
2014-08-02ipv4: remove nested rcu_read_lock/unlockDuan Jiong
ip_local_deliver_finish() already have a rcu_read_lock/unlock, so the rcu_read_lock/unlock is unnecessary. See the stack below: ip_local_deliver_finish | | ->icmp_rcv | | ->icmp_socket_deliver Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-02Merge branch 'next' of git://git.infradead.org/users/pcmoore/selinux into nextJames Morris
2014-08-01netlabel: shorter names for the NetLabel catmap funcs/structsPaul Moore
Historically the NetLabel LSM secattr catmap functions and data structures have had very long names which makes a mess of the NetLabel code and anyone who uses NetLabel. This patch renames the catmap functions and structures from "*_secattr_catmap_*" to just "*_catmap_*" which improves things greatly. There are no substantial code or logic changes in this patch. Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-08-01netlabel: fix the horribly broken catmap functionsPaul Moore
The NetLabel secattr catmap functions, and the SELinux import/export glue routines, were broken in many horrible ways and the SELinux glue code fiddled with the NetLabel catmap structures in ways that we probably shouldn't allow. At some point this "worked", but that was likely due to a bit of dumb luck and sub-par testing (both inflicted by yours truly). This patch corrects these problems by basically gutting the code in favor of something less obtuse and restoring the NetLabel abstractions in the SELinux catmap glue code. Everything is working now, and if it decides to break itself in the future this code will be much easier to debug than the code it replaces. One noteworthy side effect of the changes is that it is no longer necessary to allocate a NetLabel catmap before calling one of the NetLabel APIs to set a bit in the catmap. NetLabel will automatically allocate the catmap nodes when needed, resulting in less allocations when the lowest bit is greater than 255 and less code in the LSMs. Cc: stable@vger.kernel.org Reported-by: Christian Evans <frodox@zoho.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-08-01netlabel: fix a problem when setting bits below the previously lowest bitPaul Moore
The NetLabel category (catmap) functions have a problem in that they assume categories will be set in an increasing manner, e.g. the next category set will always be larger than the last. Unfortunately, this is not a valid assumption and could result in problems when attempting to set categories less than the startbit in the lowest catmap node. In some cases kernel panics and other nasties can result. This patch corrects the problem by checking for this and allocating a new catmap node instance and placing it at the front of the list. Cc: stable@vger.kernel.org Reported-by: Christian Evans <frodox@zoho.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>
2014-07-31net: use inet6_iif instead of IP6CB()->iifDuan Jiong
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORSDuan Jiong
When dealing with ICMPv[46] Error Message, function icmp_socket_deliver() and icmpv6_notify() do some valid checks on packet's length, but then some protocols check packet's length redaudantly. So remove those duplicated statements, and increase counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in function icmp_socket_deliver() and icmpv6_notify() respectively. In addition, add missed counter in udp6/udplite6 when socket is NULL. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains netfilter updates for net-next, they are: 1) Add the reject expression for the nf_tables bridge family, this allows us to send explicit reject (TCP RST / ICMP dest unrech) to the packets matching a rule. 2) Simplify and consolidate the nf_tables set dumping logic. This uses netlink control->data to filter out depending on the request. 3) Perform garbage collection in xt_hashlimit using a workqueue instead of a timer, which is problematic when many entries are in place in the tables, from Eric Dumazet. 4) Remove leftover code from the removed ulog target support, from Paul Bolle. 5) Dump unmodified flags in the netfilter packet accounting when resetting counters, so userspace knows that a counter was in overquota situation, from Alexey Perevalov. 6) Fix wrong usage of the bitwise functions in nfnetlink_acct, also from Alexey. 7) Fix a crash when adding new set element with an empty NFTA_SET_ELEM_LIST attribute. This patchset also includes a couple of cleanups for xt_LED from Duan Jiong and for nf_conntrack_ipv4 (using coccinelle) from Himangi Saraogi. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31tcp: don't require root to read tcp_metricsBanerjee, Debabrata
commit d23ff7016 (tcp: add generic netlink support for tcp_metrics) introduced netlink support for the new tcp_metrics, however it restricted getting of tcp_metrics to root user only. This is a change from how these values could have been fetched when in the old route cache. Unless there's a legitimate reason to restrict the reading of these values it would be better if normal users could fetch them. Cc: Julian Anastasov <ja@ssi.bg> Cc: linux-kernel@vger.kernel.org Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2014-07-30 This is the last pull request for ipsec-next before I'll be off for two weeks starting on friday. David, can you please take urgent ipsec patches directly into net/net-next during this time? 1) Error handling simplifications for vti and vti6. From Mathias Krause. 2) Remove a duplicate semicolon after a return statement. From Christoph Paasch. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30tcp: Fix integer-overflow in TCP vegasChristoph Paasch
In vegas we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. Then, we need to do do_div to allow this to be used on 32-bit arches. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Doug Leith <doug.leith@nuim.ie> Fixes: 8d3a564da34e (tcp: tcp_vegas cong avoid fix) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30tcp: Fix integer-overflows in TCP venoChristoph Paasch
In veno we do a multiplication of the cwnd and the rtt. This may overflow and thus their result is stored in a u64. However, we first need to cast the cwnd so that actually 64-bit arithmetic is done. A first attempt at fixing 76f1017757aa0 ([TCP]: TCP Veno congestion control) was made by 159131149c2 (tcp: Overflow bug in Vegas), but it failed to add the required cast in tcp_veno_cong_avoid(). Fixes: 76f1017757aa0 ([TCP]: TCP Veno congestion control) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30ip_tunnel(ipv4): fix tunnels with "local any remote $remote_ip"Dmitry Popov
Ipv4 tunnels created with "local any remote $ip" didn't work properly since 7d442fab0 (ipv4: Cache dst in tunnels). 99% of packets sent via those tunnels had src addr = 0.0.0.0. That was because only dst_entry was cached, although fl4.saddr has to be cached too. Every time ip_tunnel_xmit used cached dst_entry (tunnel_rtable_get returned non-NULL), fl4.saddr was initialized with tnl_params->saddr (= 0 in our case), and wasn't changed until iptunnel_xmit(). This patch adds saddr to ip_tunnel->dst_cache, fixing this issue. Reported-by: Sergey Popov <pinkbyte@gentoo.org> Signed-off-by: Dmitry Popov <ixaphire@qrator.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-29ipv4: clean up cast warning in do_ip_getsockoptKaroly Kemeny
Sparse warns because of implicit pointer cast. v2: subject line correction, space between "void" and "*" Signed-off-by: Karoly Kemeny <karoly.kemeny@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-29net/udp_offload: Use IS_ERR_OR_NULLHimangi Saraogi
This patch introduces the use of the macro IS_ERR_OR_NULL in place of tests for NULL and IS_ERR. The following Coccinelle semantic patch was used for making the change: @@ expression e; @@ - e == NULL || IS_ERR(e) + IS_ERR_OR_NULL(e) || ... Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-29net/ipv4: Use IS_ERR_OR_NULLHimangi Saraogi
This patch introduces the use of the macro IS_ERR_OR_NULL in place of tests for NULL and IS_ERR. The following Coccinelle semantic patch was used for making the change: @@ expression e; @@ - e == NULL || IS_ERR(e) + IS_ERR_OR_NULL(e) || ... Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-29ipv4: fail early when creating netdev named all or defaultWANG Cong
We create a proc dir for each network device, this will cause conflicts when the devices have name "all" or "default". Rather than emitting an ugly kernel warning, we could just fail earlier by checking the device name. Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-28ip: make IP identifiers less predictableEric Dumazet
In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and Jedidiah describe ways exploiting linux IP identifier generation to infer whether two machines are exchanging packets. With commit 73f156a6e8c1 ("inetpeer: get rid of ip_id_count"), we changed IP id generation, but this does not really prevent this side-channel technique. This patch adds a random amount of perturbation so that IP identifiers for a given destination [1] are no longer monotonically increasing after an idle period. Note that prandom_u32_max(1) returns 0, so if generator is used at most once per jiffy, this patch inserts no hole in the ID suite and do not increase collision probability. This is jiffies based, so in the worst case (HZ=1000), the id can rollover after ~65 seconds of idle time, which should be fine. We also change the hash used in __ip_select_ident() to not only hash on daddr, but also saddr and protocol, so that ICMP probes can not be used to infer information for other protocols. For IPv6, adds saddr into the hash as well, but not nexthdr. If I ping the patched target, we can see ID are now hard to predict. 21:57:11.008086 IP (...) A > target: ICMP echo request, seq 1, length 64 21:57:11.010752 IP (... id 2081 ...) target > A: ICMP echo reply, seq 1, length 64 21:57:12.013133 IP (...) A > target: ICMP echo request, seq 2, length 64 21:57:12.015737 IP (... id 3039 ...) target > A: ICMP echo reply, seq 2, length 64 21:57:13.016580 IP (...) A > target: ICMP echo request, seq 3, length 64 21:57:13.019251 IP (... id 3437 ...) target > A: ICMP echo reply, seq 3, length 64 [1] TCP sessions uses a per flow ID generator not changed by this patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu> Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu> Cc: Willy Tarreau <w@1wt.eu> Cc: Hannes Frederic Sowa <hannes@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: set limits and make init_net's high_thresh limit globalNikolay Aleksandrov
This patch makes init_net's high_thresh limit to be the maximum for all namespaces, thus introducing a global memory limit threshold equal to the sum of the individual high_thresh limits which are capped. It also introduces some sane minimums for low_thresh as it shouldn't be able to drop below 0 (or > high_thresh in the unsigned case), and overall low_thresh should not ever be above high_thresh, so we make the following relations for a namespace: init_net: high_thresh - max(not capped), min(init_net low_thresh) low_thresh - max(init_net high_thresh), min (0) all other namespaces: high_thresh = max(init_net high_thresh), min(namespace's low_thresh) low_thresh = max(namespace's high_thresh), min(0) The major issue with having low_thresh > high_thresh is that we'll schedule eviction but never evict anything and thus rely only on the timers. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: use seqlock for hash rebuildFlorian Westphal
rehash is rare operation, don't force readers to take the read-side rwlock. Instead, we only have to detect the (rare) case where the secret was altered while we are trying to insert a new inetfrag queue into the table. If it was changed, drop the bucket lock and recompute the hash to get the 'new' chain bucket that we have to insert into. Joint work with Nikolay Aleksandrov. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: remove periodic secret rebuild timerFlorian Westphal
merge functionality into the eviction workqueue. Instead of rebuilding every n seconds, take advantage of the upper hash chain length limit. If we hit it, mark table for rebuild and schedule workqueue. To prevent frequent rebuilds when we're completely overloaded, don't rebuild more than once every 5 seconds. ipfrag_secret_interval sysctl is now obsolete and has been marked as deprecated, it still can be changed so scripts won't be broken but it won't have any effect. A comment is left above each unused secret_timer variable to avoid confusion. Joint work with Nikolay Aleksandrov. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: remove lru listFlorian Westphal
no longer used. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: don't account number of fragment queuesFlorian Westphal
The 'nqueues' counter is protected by the lru list lock, once thats removed this needs to be converted to atomic counter. Given this isn't used for anything except for reporting it to userspace via /proc, just remove it. We still report the memory currently used by fragment reassembly queues. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: move eviction of queues to work queueFlorian Westphal
When the high_thresh limit is reached we try to toss the 'oldest' incomplete fragment queues until memory limits are below the low_thresh value. This happens in softirq/packet processing context. This has two drawbacks: 1) processors might evict a queue that was about to be completed by another cpu, because they will compete wrt. resource usage and resource reclaim. 2) LRU list maintenance is expensive. But when constantly overloaded, even the 'least recently used' element is recent, so removing 'lru' queue first is not 'fairer' than removing any other fragment queue. This moves eviction out of the fast path: When the low threshold is reached, a work queue is scheduled which then iterates over the table and removes the queues that exceed the memory limits of the namespace. It sets a new flag called INET_FRAG_EVICTED on the evicted queues so the proper counters will get incremented when the queue is forcefully expired. When the high threshold is reached, no more fragment queues are created until we're below the limit again. The LRU list is now unused and will be removed in a followup patch. Joint work with Nikolay Aleksandrov. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: move evictor calls into frag_find functionFlorian Westphal
First step to move eviction handling into a work queue. We lose two spots that accounted evicted fragments in MIB counters. Accounting will be restored since the upcoming work-queue evictor invokes the frag queue timer callbacks instead. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: remove hash size assumptions from callersFlorian Westphal
hide actual hash size from individual users: The _find function will now fold the given hash value into the required range. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-27inet: frag: constify match, hashfn and constructor argumentsFlorian Westphal
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-25netfilter: kill remnants of ulog targetsPaul Bolle
The ulog targets were recently killed. A few references to the Kconfig macros CONFIG_IP_NF_TARGET_ULOG and CONFIG_BRIDGE_EBT_ULOG were left untouched. Kill these too. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-07-25netfilter: nf_conntrack: remove exceptional & on function nameHimangi Saraogi
In this file, function names are otherwise used as pointers without &. A simplified version of the Coccinelle semantic patch that makes this change is as follows: // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-07-24igmp: remove exceptional & on function nameHimangi Saraogi
In this file, function names are otherwise used as pointers without &. A simplified version of the Coccinelle semantic patch that makes this change is as follows: // <smpl> @r@ identifier f; @@ f(...) { ... } @@ identifier r.f; @@ - &f + f // </smpl> Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23ipv4: Make IP_MULTICAST_ALL and IP_MSFILTER work on raw socketsQuentin Armitage
Currently, although IP_MULTICAST_ALL and IP_MSFILTER ioctl calls succeed on raw sockets, there is no code to implement the functionality on received packets; it is only implemented for UDP sockets. The raw(7) man page states: "In addition, all ip(7) IPPROTO_IP socket options valid for datagram sockets are supported", which implies these ioctls should work on raw sockets. To fix this, add a call to ip_mc_sf_allow on raw sockets. This should not break any existing code, since the current position of not calling ip_mc_sf_filter makes it behave as if neither the IP_MULTICAST_ALL nor the IP_MSFILTER ioctl had been called. Adding the call to ip_mc_sf_allow will therefore maintain the current behaviour so long as IP_MULTICAST_ALL and IP_MSFILTER ioctls are not called. Any code that currently is calling IP_MULTICAST_ALL or IP_MSFILTER ioctls on raw sockets presumably is wanting the filter to be applied, although no filtering will currently be occurring. Signed-off-by: Quentin Armitage <quentin@armitage.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-23sock: remove skb argument from sk_rcvqueues_fullSorin Dumitru
It hasn't been used since commit 0fd7bac(net: relax rcvbuf limits). Signed-off-by: Sorin Dumitru <sorin@returnze.ro> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/infiniband/hw/cxgb4/device.c The cxgb4 conflict was simply overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-21ipv4: fix buffer overflow in ip_options_compile()Eric Dumazet
There is a benign buffer overflow in ip_options_compile spotted by AddressSanitizer[1] : Its benign because we always can access one extra byte in skb->head (because header is followed by struct skb_shared_info), and in this case this byte is not even used. [28504.910798] ================================================================== [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile [28504.913170] Read of size 1 by thread T15843: [28504.914026] [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0 [28504.915394] [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120 [28504.916843] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.918175] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.919490] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.920835] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.922208] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.923459] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.924722] [28504.925106] Allocated by thread T15843: [28504.925815] [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120 [28504.926884] [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630 [28504.927975] [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0 [28504.929175] [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90 [28504.930400] [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70 [28504.931677] [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140 [28504.932851] [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b [28504.934018] [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right [28504.934377] of 40-byte region [ffff880026382800, ffff880026382828) [28504.937144] [28504.937474] Memory state around the buggy address: [28504.938430] ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr [28504.939884] ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.941294] ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.942504] ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.943483] ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr [28504.945573] ^ [28504.946277] ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.094949] ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.096114] ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.097116] ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.098472] ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr [28505.099804] Legend: [28505.100269] f - 8 freed bytes [28505.100884] r - 8 redzone bytes [28505.101649] . - 8 allocated bytes [28505.102406] x=1..7 - x allocated bytes + (8-x) redzone bytes [28505.103637] ================================================================== [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains updates for your net-next tree, they are: 1) Use kvfree() helper function from x_tables, from Eric Dumazet. 2) Remove extra timer from the conntrack ecache extension, use a workqueue instead to redeliver lost events to userspace instead, from Florian Westphal. 3) Removal of the ulog targets for ebtables and iptables. The nflog infrastructure superseded this almost 9 years ago, time to get rid of this code. 4) Replace the list of loggers by an array now that we can only have two possible non-overlapping logger flavours, ie. kernel ring buffer and netlink logging. 5) Move Eric Dumazet's log buffer code to nf_log to reuse it from all of the supported per-family loggers. 6) Consolidate nf_log_packet() as an unified interface for packet logging. After this patch, if the struct nf_loginfo is available, it explicitly selects the logger that is used. 7) Move ip and ip6 logging code from xt_LOG to the corresponding per-family loggers. Thus, x_tables and nf_tables share the same code for packet logging. 8) Add generic ARP packet logger, which is used by nf_tables. The format aims to be consistent with the output of xt_LOG. 9) Add generic bridge packet logger. Again, this is used by nf_tables and it routes the packets to the real family loggers. As a result, we get consistent logging format for the bridge family. The ebt_log logging code has been intentionally left in place not to break backward compatibility since the logging output differs from xt_LOG. 10) Update nft_log to explicitly request the required family logger when needed. 11) Finish nft_log so it supports arp, ip, ip6, bridge and inet families. Allowing selection between netlink and kernel buffer ring logging. 12) Several fixes coming after the netfilter core logging changes spotted by robots. 13) Use IS_ENABLED() macros whenever possible in the netfilter tree, from Duan Jiong. 14) Removal of a couple of unnecessary branch before kfree, from Fabian Frederick. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.David Held
Many multicast sources can have the same port which can result in a very large list when hashing by port only. Hash by address and port instead if this is the case. This makes multicast more similar to unicast. On a 24-core machine receiving from 500 multicast sockets on the same port, before this patch 80% of system CPU was used up by spin locking and only ~25% of packets were successfully delivered. With this patch, all packets are delivered and kernel overhead is ~8% system CPU on spinlocks. Signed-off-by: David Held <drheld@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16udp: Simplify __udp*_lib_mcast_deliver.David Held
Switch to using sk_nulls_for_each which shortens the code and makes it easier to update. Signed-off-by: David Held <drheld@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16net-gre-gro: Fix a bug that breaks the forwarding pathJerry Chu
Fixed a bug that was introduced by my GRE-GRO patch (bf5a755f5e9186406bbf50f4087100af5bd68e40 net-gre-gro: Add GRE support to the GRO stack) that breaks the forwarding path because various GSO related fields were not set. The bug will cause on the egress path either the GSO code to fail, or a GRE-TSO capable (NETIF_F_GSO_GRE) NICs to choke. The following fix has been tested for both cases. Signed-off-by: H.K. Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15net-timestamp: SOCK_RAW and PING timestampingWillem de Bruijn
Add SO_TIMESTAMPING to sockets of type PF_INET[6]/SOCK_RAW: Add the necessary sock_tx_timestamp calls to the datapath for RAW sockets (ping sockets already had these calls). Fix the IP output path to pass the timestamp flags on the first fragment also for these sockets. The existing code relies on transhdrlen != 0 to indicate a first fragment. For these sockets, that assumption does not hold. This fixes http://bugzilla.kernel.org/show_bug.cgi?id=77221 Tested SOCK_RAW on IPv4 and IPv6, not PING. Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15tcp: Remove unnecessary arg from tcp_enter_cwr and tcp_init_cwnd_reductionChristoph Paasch
Since Yuchung's 9b44190dc11 (tcp: refactor F-RTO), tcp_enter_cwr is always called with set_ssthresh = 1. Thus, we can remove this argument from tcp_enter_cwr. Further, as we remove this one, tcp_init_cwnd_reduction is then always called with set_ssthresh = true, and so we can get rid of this argument as well. Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15net: set name_assign_type in alloc_netdev()Tom Gundersen
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert all users to pass NET_NAME_UNKNOWN. Coccinelle patch: @@ expression sizeof_priv, name, setup, txqs, rxqs, count; @@ ( -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs) +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs) | -alloc_netdev_mq(sizeof_priv, name, setup, count) +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count) | -alloc_netdev(sizeof_priv, name, setup) +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup) ) v9: move comments here from the wrong commit Signed-off-by: Tom Gundersen <teg@jklm.no> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes()Tejun Heo
Currently, cftypes added by cgroup_add_cftypes() are used for both the unified default hierarchy and legacy ones and subsystems can mark each file with either CFTYPE_ONLY_ON_DFL or CFTYPE_INSANE if it has to appear only on one of them. This is quite hairy and error-prone. Also, we may end up exposing interface files to the default hierarchy without thinking it through. cgroup_subsys will grow two separate cftype addition functions and apply each only on the hierarchies of the matching type. This will allow organizing cftypes in a lot clearer way and encourage subsystems to scrutinize the interface which is being exposed in the new default hierarchy. In preparation, this patch adds cgroup_add_legacy_cftypes() which currently is a simple wrapper around cgroup_add_cftypes() and replaces all cgroup_add_cftypes() usages with it. While at it, this patch drops a completely spurious return from __hugetlb_cgroup_file_init(). This patch doesn't introduce any functional differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2014-07-14udp: Move udp_tunnel_segment into udp_offload.cTom Herbert
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-14udp: Add udp_sock_create for UDP tunnels to open listener socketTom Herbert
Added udp_tunnel.c which can contain some common functions for UDP tunnels. The first function in this is udp_sock_create which is used to open the listener port for a UDP tunnel. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-11ipv4: remove the unnecessary variable in udp_mcast_nextLi RongQing
Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-11GRE: enable offloads for GREAmritha Nambiar
To get offloads to work with Generic Routing Encapsulation (GRE), the outer transport header has to be reset after skb_push is done. This patch has the support for this fix and hence GRE offloading. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Tested-By: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-09ipconfig: Only bootp paths should reference ic_dev_xid.David S. Miller
It is only tested, and declared, in the bootp code. So, in ic_dynamic() guard it's setting with IPCONFIG_BOOTP. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-09ipconfig: move ic_dev_xid under IPCONFIG_BOOTPFabian Frederick
ic_dev_xid is only used in __init ic_bootp_recv under IPCONFIG_BOOTP and __init ic_dynamic under IPCONFIG_DYNAMIC(which is itself defined with the same IPCONFIG_BOOTP) This patch fixes the following warning when IPCONFIG_BOOTP is not set: >> net/ipv4/ipconfig.c:146:15: warning: 'ic_dev_xid' defined but not used [-Wunused-variable] static __be32 ic_dev_xid; /* Device under configuration */ Reported-by: Fengguang Wu <fengguang.wu@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: netdev@vger.kernel.org Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>