summaryrefslogtreecommitdiff
path: root/net/ipv4/xfrm4_input.c
AgeCommit message (Collapse)Author
2015-09-17netfilter: Pass net into okfnEric W. Biederman
This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17netfilter: Pass struct net into the netfilter hooksEric W. Biederman
Pass a network namespace parameter into the netfilter hooks. At the call site of the netfilter hooks the path a packet is taking through the network stack is well known which allows the network namespace to be easily and reliabily. This allows the replacement of magic code like "dev_net(state->in?:state->out)" that appears at the start of most netfilter hooks with "state->net". In almost all cases the network namespace passed in is derived from the first network device passed in, guaranteeing those paths will not see any changes in practice. The exceptions are: xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm) ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp) ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp) ipv4/raw.c:raw_send_hdrinc() sock_net(sk) ipv6/ip6_output.c:ip6_xmit() sock_net(sk) ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev) ipv6/raw.c:raw6_send_hdrinc() sock_net(sk) br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev In all cases these exceptions seem to be a better expression for the network namespace the packet is being processed in then the historic "dev_net(in?in:out)". I am documenting them in case something odd pops up and someone starts trying to track down what happened. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-07netfilter: Pass socket pointer down through okfn().David Miller
On the output paths in particular, we have to sometimes deal with two socket contexts. First, and usually skb->sk, is the local socket that generated the frame. And second, is potentially the socket used to control a tunneling socket, such as one the encapsulates using UDP. We do not want to disassociate skb->sk when encapsulating in order to fix this, because that would break socket memory accounting. The most extreme case where this can cause huge problems is an AF_PACKET socket transmitting over a vxlan device. We hit code paths doing checks that assume they are dealing with an ipv4 socket, but are actually operating upon the AF_PACKET one. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-04-03ipv4: coding style: comparison for equality with NULLIan Morris
The ipv4 code uses a mixture of coding styles. In some instances check for NULL pointer is done as x == NULL and sometimes as !x. !x is preferred according to checkpatch and this patch makes the code consistent by adopting the latter form. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-25xfrm4: Add IPsec protocol multiplexerSteffen Klassert
This patch add an IPsec protocol multiplexer. With this it is possible to add alternative protocol handlers as needed for IPsec virtual tunnel interfaces. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2013-02-15net: Add skb_unclone() helper function.Pravin B Shelar
This function will be used in next GRE_GSO patch. This patch does not change any functionality. Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com>
2012-07-26ipv4: Fix input route performance regression.David S. Miller
With the routing cache removal we lost the "noref" code paths on input, and this can kill some routing workloads. Reinstate the noref path when we hit a cached route in the FIB nexthops. With help from Eric Dumazet. Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-20ipv4: Kill ip_route_input_noref().David Miller
The "noref" argument to ip_route_input_common() is now always ignored because we do not cache routes, and in that case we must always grab a reference to the resulting 'dst'. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27Revert "ipv4: tcp: dont cache unconfirmed intput dst"David S. Miller
This reverts commit c074da2810c118b3812f32d6754bd9ead2f169e7. This change has several unwanted side effects: 1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll thus never create a real cached route. 2) All TCP traffic will use DST_NOCACHE and never use the routing cache at all. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-27ipv4: tcp: dont cache unconfirmed intput dstEric Dumazet
DDOS synflood attacks hit badly IP route cache. On typical machines, this cache is allowed to hold up to 8 Millions dst entries, 256 bytes for each, for a total of 2GB of memory. rt_garbage_collect() triggers and tries to cleanup things. Eventually route cache is disabled but machine is under fire and might OOM and crash. This patch exploits the new TCP early demux, to set a nocache boolean in case incoming TCP frame is for a not yet ESTABLISHED or TIMEWAIT socket. This 'nocache' boolean is then used in case dst entry is not found in route cache, to create an unhashed dst entry (DST_NOCACHE) SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache output dst for syncookies), so after this patch, a machine is able to absorb a DDOS synflood attack without polluting its IP route cache. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-07-12net/ipv4: EXPORT_SYMBOL cleanupsEric Dumazet
CodingStyle cleanups EXPORT_SYMBOL should immediately follow the symbol declaration. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17net: Use ip_route_input_noref() in input pathEric Dumazet
Use ip_route_input_noref() in ip fast path, to avoid two atomic ops per incoming packet. Note: loopback is excluded from this optimization in ip_rcv_finish() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-20Merge branch 'master' of /repos/git/net-next-2.6Patrick McHardy
Conflicts: Documentation/feature-removal-schedule.txt net/ipv6/netfilter/ip6t_REJECT.c net/netfilter/xt_limit.c Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-25netfilter: ipv4: use NFPROTO values for NF_HOOK invocationJan Engelhardt
The semantic patch that was used: // <smpl> @@ @@ (NF_HOOK |NF_HOOK_COND |nf_hook )( -PF_INET, +NFPROTO_IPV4, ...) // </smpl> Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
2009-06-03net: skb->dst accessorsEric Dumazet
Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-26ipsec: Remove useless ret variableHerbert Xu
This patch removes a useless ret variable from the IPv4 ESP/UDP decapsulation code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Fix transport-mode async resume on intput without netfilterHerbert Xu
When netfilter is off the transport-mode async resumption doesn't work because we don't push back the IP header. This patch fixes that by moving most of the code outside of ifdef NETFILTER since the only part that's not common is the short-circuit in the protocol handler. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Use the correct family for input state lookupHerbert Xu
When merging the input paths of IPsec I accidentally left a hard-coded AF_INET for the state lookup call. This broke IPv6 obviously. This patch fixes by getting the input callers to specify the family through skb->cb. Credit goes to Kazunori Miyazawa for diagnosing this and providing an initial patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[NETFILTER]: Introduce NF_INET_ hook valuesPatrick McHardy
The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Add async resume support on inputHerbert Xu
This patch adds support for async resumptions on input. To do so, the transform would return -EINPROGRESS and subsequently invoke the function xfrm_input_resume to resume processing. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Remove nhoff from xfrm_inputHerbert Xu
The nhoff field isn't actually necessary in xfrm_input. For tunnel mode transforms we now throw away the output IP header so it makes no sense to fill in the nexthdr field. For transport mode we can now let the function transport_finish do the setting and it knows where the nexthdr field is. The only other thing that needs the nexthdr field to be set is the header extraction code. However, we can simply move the protocol extraction out of the generic header extraction. We want to minimise the amount of info we have to carry around between transforms as this simplifies the resumption process for async crypto. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Merge most of the input pathHerbert Xu
As part of the work on asynchronous cryptographic operations, we need to be able to resume from the spot where they occur. As such, it helps if we isolate them to one spot. This patch moves most of the remaining family-specific processing into the common input code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2008-01-28[IPSEC]: Separate inner/outer mode processing on inputHerbert Xu
With inter-family transforms the inner mode differs from the outer mode. Attempting to handle both sides from the same function means that it needs to handle both IPv4 and IPv6 which creates duplication and confusion. This patch separates the two parts on the input path so that each function deals with one family only. In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut moves the pertinent fields from the IPv4/IPv6 IP headers into a neutral format stored in skb->cb. This is then used by the inner mode input functions to modify the inner IP header. In this way the input function no longer has to know about the outer address family. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17[IPSEC]: Rename mode to outer_mode and add inner_modeHerbert Xu
This patch adds a new field to xfrm states called inner_mode. The existing mode object is renamed to outer_mode. This is the first part of an attempt to fix inter-family transforms. As it is we always use the outer family when determining which mode to use. As a result we may end up shoving IPv4 packets into netfilter6 and vice versa. What we really want is to use the inner family for the first part of outbound processing and the outer family for the second part. For inbound processing we'd use the opposite pairing. I've also added a check to prevent silly combinations such as transport mode with inter-family transforms. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17[IPSEC]: Add missing BEET checksHerbert Xu
Currently BEET mode does not reinject the packet back into the stack like tunnel mode does. Since BEET should behave just like tunnel mode this is incorrect. This patch fixes this by introducing a flags field to xfrm_mode that tells the IPsec code whether it should terminate and reinject the packet back into the stack. It then sets the flag for BEET and tunnel mode. I've also added a number of missing BEET checks elsewhere where we check whether a given mode is a tunnel or not. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-17[IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_inputHerbert Xu
This patch moves the tunnel parsing for IPv4 out of xfrm4_input and into xfrm4_tunnel. This change is in line with what IPv6 does and will allow us to merge the two input functions. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-10-10[IPSEC]: Move IP protocol setting from transforms into xfrm4_input.cHerbert Xu
This patch makes the IPv4 x->type->input functions return the next protocol instead of setting it directly. This is identical to how we do things in IPv6 and will help us merge common code on the input path. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10[UDP]: Cleanup UDP encapsulation codeJames Chapman
This cleanup fell out after adding L2TP support where a new encap_rcv funcptr was added to struct udp_sock. Have XFRM use the new encap_rcv funcptr, which allows us to move the XFRM encap code from udp.c into xfrm4_input.c. Make xfrm4_rcv_encap() static since it is no longer called externally. Signed-off-by: James Chapman <jchapman@katalix.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-31[IPSEC]: Fix panic when using inter address familiy IPsec on loopback.Kazunori MIYAZAWA
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce ip_hdr(), remove skb->nh.iphArnaldo Carvalho de Melo
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-25[SK_BUFF]: Introduce skb_network_header()Arnaldo Carvalho de Melo
For the places where we need a pointer to the network header, it is still legal to touch skb->nh.raw directly if just adding to, subtracting from or setting it to another layer header. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13[IPSEC]: Changing API of xfrm4_tunnel_register.Kazunori MIYAZAWA
This patch changes xfrm4_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10[NET] IPV4: Fix whitespace errors.YOSHIFUJI Hideaki
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-28[XFRM]: xfrm_parse_spi() annotationsAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-09-22[XFRM]: Add XFRM_MODE_xxx for future use.Masahide NAKAMURA
Transformation mode is used as either IPsec transport or tunnel. It is required to add two more items, route optimization and inbound trigger for Mobile IPv6. Based on MIPL2 kernel patch. This patch was also written by: Ville Nuorvala <vnuorval@tcs.hut.fi> Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-06-17[IPSEC] xfrm: Abstract out encapsulation modesHerbert Xu
This patch adds the structure xfrm_mode. It is meant to represent the operations carried out by transport/tunnel modes. By doing this we allow additional encapsulation modes to be added without clogging up the xfrm_input/xfrm_output paths. Candidate modes include 4-to-6 tunnel mode, 6-to-4 tunnel mode, and BEET modes. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09[INET]: Move no-tunnel ICMP error to tunnel4/tunnel6Herbert Xu
This patch moves the sending of ICMP messages when there are no IPv4/IPv6 tunnels present to tunnel4/tunnel6 respectively. Please note that for now if xfrm4_tunnel/xfrm6_tunnel is loaded then no ICMP messages will ever be sent. This is similar to how we handle AH/ESP/IPCOMP. This move fixes the bug where we always send an ICMP message when there is no ip6_tunnel device present for a given packet even if it is later handled by IPsec. It also causes ICMP messages to be sent when no IPIP tunnel is present. I've decided to use the "port unreachable" ICMP message over the current value of "address unreachable" (and "protocol unreachable" by GRE) because it is not ambiguous unlike the other ones which can be triggered by other conditions. There seems to be no standard specifying what value must be used so this change should be OK. In fact we should change GRE to use this value as well. Incidentally, this patch also fixes a fairly serious bug in xfrm6_tunnel where we don't check whether the embedded IPv6 header is present before dereferencing it for the inside source address. This patch is inspired by a previous patch by Hugo Santos <hsantos@av.it.pt>. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-09[IPSEC]: Check x->encap before dereferencing itHerbert Xu
We need to dereference x->encap before dereferencing it for encap_type. If it's absent then the encap_type is zero. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-01[IPSEC]: Kill unused decap state structureHerbert Xu
This patch removes the *_decap_state structures which were previously used to share state between input/post_input. This is no longer needed. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-04-01[IPSEC]: Kill unused decap state argumentHerbert Xu
This patch removes the decap_state argument from the xfrm input hook. Previously this function allowed the input hook to share state with the post_input hook. The latter has since been removed. The only purpose for it now is to check the encap type. However, it is easier and better to move the encap type check to the generic xfrm_rcv function. This allows us to get rid of the decap state argument altogether. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2006-01-07[IPV4/6]: Netfilter IPsec input hooksPatrick McHardy
When the innermost transform uses transport mode the decapsulated packet is not visible to netfilter. Pass the packet through the PRE_ROUTING and LOCAL_IN hooks again before handing it to upper layer protocols to make netfilter-visibility symetrical to the output path. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2005-04-16Linux-2.6.12-rc2Linus Torvalds
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!