summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-02-14bridge: fdb: converge fdb_delete_by functions into oneNikolay Aleksandrov
We can simplify the logic of entries pointing to the bridge by converging the fdb_delete_by functions, this would allow us to use the same function for both cases since the fdb's dst is set to NULL if it is pointing to the bridge thus we can always check for a port match. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14bridge: fdb: add proper lock checks in searching functionsNikolay Aleksandrov
In order to avoid new errors add checks to br_fdb_find and fdb_find_rcu functions. The first requires hash_lock, the second obviously RCU. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14bridge: fdb: converge fdb searching functions into oneNikolay Aleksandrov
Before this patch we had 3 different fdb searching functions which was confusing. This patch reduces all of them to one - fdb_find_rcu(), and two flavors: br_fdb_find() which requires hash_lock and br_fdb_find_rcu which requires RCU. This makes it clear what needs to be used, we also remove two abusers of __br_fdb_get which called it under hash_lock and replace them with br_fdb_find(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-14sched: Fix accidental removal of errout gotoJiri Pirko
Bring back the goto that was removed by accident. Reported-by: Colin Ian King <colin.king@canonical.com> Fixes: 40c81b25b16c ("sched: check negative err value to safe one level of indent") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-13net: busy-poll: remove LL_FLUSH_FAILED and LL_FLUSH_BUSYEric Dumazet
Commit 79e7fff47b7b ("net: remove support for per driver ndo_busy_poll()") made them obsolete. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree, most relevantly they are: 1) Extend nft_exthdr to allow to match TCP options bitfields, from Manuel Messner. 2) Allow to check if IPv6 extension header is present in nf_tables, from Phil Sutter. 3) Allow to set and match conntrack zone in nf_tables, patches from Florian Westphal. 4) Several patches for the nf_tables set infrastructure, this includes cleanup and preparatory patches to add the new bitmap set type. 5) Add optional ruleset generation ID check to nf_tables and allow to delete rules that got no public handle yet via NFTA_RULE_ID. These patches add the missing kernel infrastructure to support rule deletion by description from userspace. 6) Missing NFT_SET_OBJECT flag to select the right backend when sets stores an object map. 7) A couple of cleanups for the expectation and SIP helper, from Gao feng. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-12netfilter: nf_tables: honor NFT_SET_OBJECT in set backend selectionPablo Neira Ayuso
Check for NFT_SET_OBJECT feature flag, otherwise we may end up selecting the wrong set backend. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12netfilter: nf_tables: add NFTA_RULE_ID attributePablo Neira Ayuso
This new attribute allows us to uniquely identify a rule in transaction. Robots may trigger an insertion followed by deletion in a batch, in that scenario we still don't have a public rule handle that we can use to delete the rule. This is similar to the NFTA_SET_ID attribute that allows us to refer to an anonymous set from a batch. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12netfilter: nf_tables: add check_genid to the nfnetlink subsystemPablo Neira Ayuso
This patch implements the check generation id as provided by nfnetlink. This allows us to reject ruleset updates against stale baseline, so userspace can retry update with a fresh ruleset cache. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12netfilter: nfnetlink: allow to check for generation IDPablo Neira Ayuso
This patch allows userspace to specify the generation ID that has been used to build an incremental batch update. If userspace specifies the generation ID in the batch message as attribute, then nfnetlink compares it to the current generation ID so you make sure that you work against the right baseline. Otherwise, bail out with ERESTART so userspace knows that its changeset is stale and needs to respin. Userspace can do this transparently at the cost of taking slightly more time to refresh caches and rework the changeset. This check is optional, if there is no NFNL_BATCH_GENID attribute in the batch begin message, then no check is performed. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12netfilter: nfnetlink: add nfnetlink_rcv_skb_batch()Pablo Neira Ayuso
Add new nfnetlink_rcv_skb_batch() to wrap initial nfnetlink batch handling. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12netfilter: nfnetlink: get rid of u_intX_t typesPablo Neira Ayuso
Use uX types instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12netfilter: nf_ct_expect: nf_ct_expect_insert() returns voidGao Feng
Because nf_ct_expect_insert() always succeeds now, its return value can be just void instead of int. And remove code that checks for its return value. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-12netfilter: nf_ct_sip: Use mod_timer_pending()Gao Feng
timer_del() followed by timer_add() can be replaced by mod_timer_pending(). Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-02-11net_sched: fix error recovery at qdisc creationEric Dumazet
Dmitry reported uses after free in qdisc code [1] The problem here is that ops->init() can return an error. qdisc_create_dflt() then call ops->destroy(), while qdisc_create() does _not_ call it. Four qdisc chose to call their own ops->destroy(), assuming their caller would not. This patch makes sure qdisc_create() calls ops->destroy() and fixes the four qdisc to avoid double free. [1] BUG: KASAN: use-after-free in mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 at addr ffff8801d415d440 Read of size 8 by task syz-executor2/5030 CPU: 0 PID: 5030 Comm: syz-executor2 Not tainted 4.3.5-smp-DEV #119 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 0000000000000046 ffff8801b435b870 ffffffff81bbbed4 ffff8801db000400 ffff8801d415d440 ffff8801d415dc40 ffff8801c4988510 ffff8801b435b898 ffffffff816682b1 ffff8801b435b928 ffff8801d415d440 ffff8801c49880c0 Call Trace: [<ffffffff81bbbed4>] __dump_stack lib/dump_stack.c:15 [inline] [<ffffffff81bbbed4>] dump_stack+0x6c/0x98 lib/dump_stack.c:51 [<ffffffff816682b1>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158 [<ffffffff81668524>] print_address_description mm/kasan/report.c:196 [inline] [<ffffffff81668524>] kasan_report_error+0x1b4/0x4b0 mm/kasan/report.c:285 [<ffffffff81668953>] kasan_report mm/kasan/report.c:305 [inline] [<ffffffff81668953>] __asan_report_load8_noabort+0x43/0x50 mm/kasan/report.c:326 [<ffffffff82527b02>] mq_destroy+0x242/0x290 net/sched/sch_mq.c:33 [<ffffffff82524bdd>] qdisc_destroy+0x12d/0x290 net/sched/sch_generic.c:953 [<ffffffff82524e30>] qdisc_create_dflt+0xf0/0x120 net/sched/sch_generic.c:848 [<ffffffff8252550d>] attach_default_qdiscs net/sched/sch_generic.c:1029 [inline] [<ffffffff8252550d>] dev_activate+0x6ad/0x880 net/sched/sch_generic.c:1064 [<ffffffff824b1db1>] __dev_open+0x221/0x320 net/core/dev.c:1403 [<ffffffff824b24ce>] __dev_change_flags+0x15e/0x3e0 net/core/dev.c:6858 [<ffffffff824b27de>] dev_change_flags+0x8e/0x140 net/core/dev.c:6926 [<ffffffff824f5bf6>] dev_ifsioc+0x446/0x890 net/core/dev_ioctl.c:260 [<ffffffff824f61fa>] dev_ioctl+0x1ba/0xb80 net/core/dev_ioctl.c:546 [<ffffffff82430509>] sock_do_ioctl+0x99/0xb0 net/socket.c:879 [<ffffffff82430d30>] sock_ioctl+0x2a0/0x390 net/socket.c:958 [<ffffffff816f3b68>] vfs_ioctl fs/ioctl.c:44 [inline] [<ffffffff816f3b68>] do_vfs_ioctl+0x8a8/0xe50 fs/ioctl.c:611 [<ffffffff816f41a4>] SYSC_ioctl fs/ioctl.c:626 [inline] [<ffffffff816f41a4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:617 [<ffffffff8123e357>] entry_SYSCALL_64_fastpath+0x12/0x17 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11net: rename dst_neigh_output back to neigh_outputJulian Anastasov
After the dst->pending_confirm flag was removed, we do not need anymore to provide dst arg to dst_neigh_output. So, rename it to neigh_output as before commit 5110effee8fd ("net: Do delayed neigh confirmation."). Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2017-02-10l2tp: do not use udp_ioctl()Eric Dumazet
udp_ioctl(), as its name suggests, is used by UDP protocols, but is also used by L2TP :( L2TP should use its own handler, because it really does not look the same. SIOCINQ for instance should not assume UDP checksum or headers. Thanks to Andrey and syzkaller team for providing the report and a nice reproducer. While crashes only happen on recent kernels (after commit 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue")), this probably needs to be backported to older kernels. Fixes: 7c13f97ffde6 ("udp: do fwd memory scheduling on dequeue") Fixes: 85584672012e ("udp: Fix udp_poll() and ioctl()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10devlink: allow to fillup eswitch attrs even if mode_get op does not existJiri Pirko
Even when mode_get op is not present, other eswitch attrs need to be filled-up. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10devlink: use nla_put_failure goto label instead of outJiri Pirko
Be aligned with the rest of the code and use label named nla_put_failure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10devlink: rename devlink_eswitch_fill to devlink_nl_eswitch_fillJiri Pirko
Be aligned with the rest of the file and name the helper function accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10devlink: fix the name of eswitch commandsJiri Pirko
The eswitch_[gs]et command is supposed to be similar to port_[gs]et command - for multiple eswitch attributes. However, when it was introduced by 08f4b5918b2d ("net/devlink: Add E-Switch mode control") it was wrongly named with the word "mode" in it. So fix this now, make the oririnal enum value existing but obsolete. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10Merge tag 'mac80211-next-for-davem-2017-02-09' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Some more updates: * use shash in mac80211 crypto code where applicable * some documentation fixes * pass RSSI levels up in change notifications * remove unused rfkill-regulator * various other cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net: cgroups: fix build errors when linux/phy*.h is removed from net/dsa.hRussell King
net/core/netprio_cgroup.c:303:16: error: expected declaration specifiers or '...' before string constant MODULE_LICENSE("GPL v2"); ^~~~~~~~ Add linux/module.h to fix this. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net: sunrpc: fix build errors when linux/phy*.h is removed from net/dsa.hRussell King
Removing linux/phy.h from net/dsa.h reveals a build error in the sunrpc code: net/sunrpc/xprtrdma/svc_rdma_backchannel.c: In function 'xprt_rdma_bc_put': net/sunrpc/xprtrdma/svc_rdma_backchannel.c:277:2: error: implicit declaration of function 'module_put' [-Werror=implicit-function-declaration] net/sunrpc/xprtrdma/svc_rdma_backchannel.c: In function 'xprt_setup_rdma_bc': net/sunrpc/xprtrdma/svc_rdma_backchannel.c:348:7: error: implicit declaration of function 'try_module_get' [-Werror=implicit-function-declaration] Fix this by adding linux/module.h to svc_rdma_backchannel.c Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net: Fix checkpatch, Missing a blank line after declarationstcharding
This patch fixes multiple occurrences of checkpatch WARNING: Missing a blank line after declarations. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net: Fix checkpatch block comments warningstcharding
Fix multiple occurrences of checkpatch warning. WARNING: Block comments use * on subsequent lines. Also make comment blocks more uniform. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net: Fix checkpatch whitespace errorstcharding
This patch fixes two trivial whitespace errors. Brace should be on the previous line and trailing statements should be on next line. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net: Fix checkpatch WARNING: please, no space before tabstcharding
This patch fixes multiple occurrences of space before tabs warnings. More lines of code were moved than required to keep kernel-doc comments uniform. Signed-off-by: Tobin C. Harding <me@tobin.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net/act_pedit: Introduce 'add' operationAmir Vadai
This command could be useful to inc/dec fields. For example, to forward any TCP packet and decrease its TTL: $ tc filter add dev enp0s9 protocol ip parent ffff: \ flower ip_proto tcp \ action pedit munge ip ttl add 0xff pipe \ action mirred egress redirect dev veth0 In the example above, adding 0xff to this u8 field is actually decreasing it by one, since the operation is masked. Signed-off-by: Amir Vadai <amir@vadai.me> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10net/act_pedit: Support using offset relative to the conventional network headersAmir Vadai
Extend pedit to enable the user setting offset relative to network headers. This change would enable to work with more complex header schemes (vs the simple IPv4 case) where setting a fixed offset relative to the network header is not enough. After this patch, the action has information about the exact header type and field inside this header. This information could be used later on for hardware offloading of pedit. Backward compatibility was being kept: 1. Old kernel <-> new userspace 2. New kernel <-> old userspace 3. add rule using new userspace <-> dump using old userspace 4. add rule using old userspace <-> dump using new userspace When using the extended api, new netlink attributes are being used. This way, operation will fail in (1) and (3) - and no malformed rule be added or dumped. Of course, new user space that doesn't need the new functionality can use the old netlink attributes and operation will succeed. Since action can support both api's, (2) should work, and it is easy to write the new user space to have (4) work. The action is having a strict check that only header types and commands it can handle are accepted. This way future additions will be much easier. Usage example: $ tc filter add dev enp0s9 protocol ip parent ffff: \ flower \ ip_proto tcp \ dst_port 80 \ action pedit munge tcp dport set 8080 pipe \ action mirred egress redirect dev veth0 Will forward tcp port whose original dest port is 80, while modifying the destination port to 8080. Signed-off-by: Amir Vadai <amir@vadai.me> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10switchdev: bridge: Offload mc router portsNogah Frankel
Offload the mc router ports list, whenever it is being changed. It is done because in some cases mc packets needs to be flooded to all the ports in this list. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10bridge: mcast: Merge the mc router ports deletions to one functionNogah Frankel
There are three places where a port gets deleted from the mc router port list. This patch join the actual deletion to one function. It will be helpful for later patch that will offload changes in the mc router ports list. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10switchdev: bridge: Offload multicast disabledNogah Frankel
Offload multicast disabled flag, for more accurate mc flood behavior: When it is on, the mdb should be ignored. When it is off, unregistered mc packets should be flooded to mc router ports. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: check negative err value to safe one level of indentJiri Pirko
As it is more common, check err for !0. That allows to safe one level of indentation and makes the code easier to read. Also, make 'next' variable global in function as it is used twice. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: add missing curly braces in else branch in tc_ctl_tfilterJiri Pirko
Curly braces need to be there, for stylistic reasons. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: move err set right before goto errout in tc_ctl_tfilterJiri Pirko
This makes the reader to know right away what is the error value. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: push TC filter protocol creation into a separate functionJiri Pirko
Make the long function tc_ctl_tfilter a little bit shorter and easier to read. Also make the creation of filter proto symmetric to destruction. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: move tcf_proto_destroy and tcf_destroy_chain helpers into cls_apiJiri Pirko
Creation is done in this file, move destruction to be at the same place. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10sched: rename tcf_destroy to tcf_destroy_protoJiri Pirko
This function destroys TC filter protocol, not TC filter. So name it accordingly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10ipv4: fib: Add events for FIB replace and appendIdo Schimmel
The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND} flags to signal routes being replaced or appended. Instead of using netlink flags for in-kernel notifications we can simply introduce two new events in the FIB notification chain. This has the added advantage of making the API cleaner, thereby making it clear that these events should be supported by listeners of the notification chain. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10ipv4: fib: Send notification before deleting FIB aliasIdo Schimmel
When a FIB alias is replaced following NLM_F_REPLACE, the ENTRY_ADD notification is sent after the reference on the previous FIB info was dropped. This is problematic as potential listeners might need to access it in their notification blocks. Solve this by sending the notification prior to the deletion of the replaced FIB alias. This is consistent with ENTRY_DEL notifications. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10ipv4: fib: Send deletion notification with actual FIB alias typeIdo Schimmel
When a FIB alias is removed, a notification is sent using the type passed from user space - can be RTN_UNSPEC - instead of the actual type of the removed alias. This is problematic for listeners of the FIB notification chain, as several FIB aliases can exist with matching parameters, but the type. Solve this by passing the actual type of the removed FIB alias. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-10ipv4: fib: Only flush FIB aliases belonging to currently flushed tableIdo Schimmel
In case the MAIN table is flushed and its trie is shared with the LOCAL table, then we might be flushing FIB aliases belonging to the latter. This can lead to FIB_ENTRY_DEL notifications sent with the wrong table ID. The above doesn't affect current listeners, as the table ID is ignored during entry deletion, but this will change later in the patchset. When flushing a particular table, skip any aliases belonging to a different one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Patrick McHardy <kaber@trash.net> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Pack struct sw_flow_key.Jarno Rajahalme
struct sw_flow_key has two 16-bit holes. Move the most matched conntrack match fields there. In some typical cases this reduces the size of the key that needs to be hashed into half and into one cache line. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Add force commit.Jarno Rajahalme
Stateful network admission policy may allow connections to one direction and reject connections initiated in the other direction. After policy change it is possible that for a new connection an overlapping conntrack entry already exists, where the original direction of the existing connection is opposed to the new connection's initial packet. Most importantly, conntrack state relating to the current packet gets the "reply" designation based on whether the original direction tuple or the reply direction tuple matched. If this "directionality" is wrong w.r.t. to the stateful network admission policy it may happen that packets in neither direction are correctly admitted. This patch adds a new "force commit" option to the OVS conntrack action that checks the original direction of an existing conntrack entry. If that direction is opposed to the current packet, the existing conntrack entry is deleted and a new one is subsequently created in the correct direction. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Add original direction conntrack tuple to sw_flow_key.Jarno Rajahalme
Add the fields of the conntrack original direction 5-tuple to struct sw_flow_key. The new fields are initially marked as non-existent, and are populated whenever a conntrack action is executed and either finds or generates a conntrack entry. This means that these fields exist for all packets that were not rejected by conntrack as untrackable. The original tuple fields in the sw_flow_key are filled from the original direction tuple of the conntrack entry relating to the current packet, or from the original direction tuple of the master conntrack entry, if the current conntrack entry has a master. Generally, expected connections of connections having an assigned helper (e.g., FTP), have a master conntrack entry. The main purpose of the new conntrack original tuple fields is to allow matching on them for policy decision purposes, with the premise that the admissibility of tracked connections reply packets (as well as original direction packets), and both direction packets of any related connections may be based on ACL rules applying to the master connection's original direction 5-tuple. This also makes it easier to make policy decisions when the actual packet headers might have been transformed by NAT, as the original direction 5-tuple represents the packet headers before any such transformation. When using the original direction 5-tuple the admissibility of return and/or related packets need not be based on the mere existence of a conntrack entry, allowing separation of admission policy from the established conntrack state. While existence of a conntrack entry is required for admission of the return or related packets, policy changes can render connections that were initially admitted to be rejected or dropped afterwards. If the admission of the return and related packets was based on mere conntrack state (e.g., connection being in an established state), a policy change that would make the connection rejected or dropped would need to find and delete all conntrack entries affected by such a change. When using the original direction 5-tuple matching the affected conntrack entries can be allowed to time out instead, as the established state of the connection would not need to be the basis for packet admission any more. It should be noted that the directionality of related connections may be the same or different than that of the master connection, and neither the original direction 5-tuple nor the conntrack state bits carry this information. If needed, the directionality of the master connection can be stored in master's conntrack mark or labels, which are automatically inherited by the expected related connections. The fact that neither ARP nor ND packets are trackable by conntrack allows mutual exclusion between ARP/ND and the new conntrack original tuple fields. Hence, the IP addresses are overlaid in union with ARP and ND fields. This allows the sw_flow_key to not grow much due to this patch, but it also means that we must be careful to never use the new key fields with ARP or ND packets. ARP is easy to distinguish and keep mutually exclusive based on the ethernet type, but ND being an ICMPv6 protocol requires a bit more attention. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Inherit master's labels.Jarno Rajahalme
We avoid calling into nf_conntrack_in() for expected connections, as that would remove the expectation that we want to stick around until we are ready to commit the connection. Instead, we do a lookup in the expectation table directly. However, after a successful expectation lookup we have set the flow key label field from the master connection, whereas nf_conntrack_in() does not do this. This leads to master's labels being inherited after an expectation lookup, but those labels not being inherited after the corresponding conntrack action with a commit flag. This patch resolves the problem by changing the commit code path to also inherit the master's labels to the expected connection. Resolving this conflict in favor of inheriting the labels allows more information be passed from the master connection to related connections, which would otherwise be much harder if the 32 bits in the connmark are not enough. Labels can still be set explicitly, so this change only affects the default values of the labels in presense of a master connection. Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Refactor labels initialization.Jarno Rajahalme
Refactoring conntrack labels initialization makes changes in later patches easier to review. Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-09openvswitch: Simplify labels length logic.Jarno Rajahalme
Since 23014011ba42 ("netfilter: conntrack: support a fixed size of 128 distinct labels"), the size of conntrack labels extension has fixed to 128 bits, so we do not need to check for labels sizes shorter than 128 at run-time. This patch simplifies labels length logic accordingly, but allows the conntrack labels size to be increased in the future without breaking the build. In the event of conntrack labels increasing in size OVS would still be able to deal with the 128 first label bits. Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>