summaryrefslogtreecommitdiff
path: root/net/sched
AgeCommit message (Collapse)Author
2019-02-25net: sched: pie: change default value of pie_params->tupdateMohit P. Tahiliani
RFC 8033 suggests a default value of 15 milliseconds for the update interval. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: pie: change default value of pie_params->targetMohit P. Tahiliani
RFC 8033 suggests a default value of 15 milliseconds for the target queue delay. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: pie: change value of QUEUE_THRESHOLDMohit P. Tahiliani
RFC 8033 recommends a value of 16384 bytes for the queue threshold. Signed-off-by: Mohit P. Tahiliani <tahiliani@nitk.edu.in> Signed-off-by: Dhaval Khandla <dhavaljkhandla26@gmail.com> Signed-off-by: Hrishikesh Hiraskar <hrishihiraskar@gmail.com> Signed-off-by: Manish Kumar B <bmanish15597@gmail.com> Signed-off-by: Sachin D. Patil <sdp.sachin@gmail.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Acked-by: Dave Taht <dave.taht@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: don't release block->lock when dumping chainsVlad Buslov
Function tc_dump_chain() obtains and releases block->lock on each iteration of its inner loop that dumps all chains on block. Outputting chain template info is fast operation so locking/unlocking mutex multiple times is an overhead when lock is highly contested. Modify tc_dump_chain() to only obtain block->lock once and dump all chains without releasing it. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: set dedicated tcf_walker flag when tp is emptyVlad Buslov
Using tcf_walker->stop flag to determine when tcf_walker->fn() was called at least once is unreliable. Some classifiers set 'stop' flag on error before calling walker callback, other classifiers used to call it with NULL filter pointer when empty. In order to prevent further regressions, extend tcf_walker structure with dedicated 'nonempty' flag. Set this flag in tcf_walker->fn() implementation that is used to check if classifier has filters configured. Fixes: 8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-25net: sched: act_tunnel_key: fix NULL pointer dereference during initVlad Buslov
Metadata pointer is only initialized for action TCA_TUNNEL_KEY_ACT_SET, but it is unconditionally dereferenced in tunnel_key_init() error handler. Verify that metadata pointer is not NULL before dereferencing it in tunnel_key_init error handling code. Fixes: ee28bb56ac5b ("net/sched: fix memory leak in act_tunnel_key_init()") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net/sched: act_tunnel_key: Add dst_cache supportwenxu
The metadata_dst is not init the dst_cache which make the ip_md_tunnel_xmit can't use the dst_cache. It will lookup route table every packets. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net/sched: act_skbedit: fix refcount leak when replace failsDavide Caratti
when act_skbedit was converted to use RCU in the data plane, we added an error path, but we forgot to drop the action refcount in case of failure during a 'replace' operation: # tc actions add action skbedit ptype otherhost pass index 100 # tc action show action skbedit total acts 1 action order 0: skbedit ptype otherhost pass index 100 ref 1 bind 0 # tc actions replace action skbedit ptype otherhost drop index 100 RTNETLINK answers: Cannot allocate memory We have an error talking to the kernel # tc action show action skbedit total acts 1 action order 0: skbedit ptype otherhost pass index 100 ref 2 bind 0 Ensure we call tcf_idr_release(), in case 'params_new' allocation failed, also when the action is being replaced. Fixes: c749cdda9089 ("net/sched: act_skbedit: don't use spinlock in the data path") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-24net/sched: act_ipt: fix refcount leak when replace failsDavide Caratti
After commit 4e8ddd7f1758 ("net: sched: don't release reference on action overwrite"), the error path of all actions was converted to drop refcount also when the action was being overwritten. But we forgot act_ipt_init(), in case allocation of 'tname' was not successful: # tc action add action xt -j LOG --log-prefix hello index 100 tablename: mangle hook: NF_IP_POST_ROUTING target: LOG level warning prefix "hello" index 100 # tc action show action xt total acts 1 action order 0: tablename: mangle hook: NF_IP_POST_ROUTING target LOG level warning prefix "hello" index 100 ref 1 bind 0 # tc action replace action xt -j LOG --log-prefix world index 100 tablename: mangle hook: NF_IP_POST_ROUTING target: LOG level warning prefix "world" index 100 RTNETLINK answers: Cannot allocate memory We have an error talking to the kernel # tc action show action xt total acts 1 action order 0: tablename: mangle hook: NF_IP_POST_ROUTING target LOG level warning prefix "hello" index 100 ref 2 bind 0 Ensure we call tcf_idr_release(), in case 'tname' allocation failed, also when the action is being replaced. Fixes: 4e8ddd7f1758 ("net: sched: don't release reference on action overwrite") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-22net_sched: initialize net pointer inside tcf_exts_init()Cong Wang
For tcindex filter, it is too late to initialize the net pointer in tcf_exts_validate(), as tcf_exts_get_net() requires a non-NULL net pointer. We can just move its initialization into tcf_exts_init(), which just requires an additional parameter. This makes the code in tcindex_alloc_perfect_hash() prettier. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21net: sched: potential NULL dereference in tcf_block_find()Dan Carpenter
The error code isn't set on this path so it would result in returning ERR_PTR(0) and a NULL dereference in the caller. Fixes: 18d3eefb17cf ("net: sched: refactor tcf_block_find() into standalone functions") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20net_sched: fix a memory leak in cls_tcindexCong Wang
(cherry picked from commit 033b228e7f26b29ae37f8bfa1bc6b209a5365e9f) When tcindex_destroy() destroys all the filter results in the perfect hash table, it invokes the walker to delete each of them. However, results with class==0 are skipped in either tcindex_walk() or tcindex_delete(), which causes a memory leak reported by kmemleak. This patch fixes it by skipping the walker and directly deleting these filter results so we don't miss any filter result. As a result of this change, we have to initialize exts->net properly in tcindex_alloc_perfect_hash(). For net-next, we need to consider whether we should initialize ->net in tcf_exts_init() instead, before that just directly test CONFIG_NET_CLS_ACT=y. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20net_sched: fix a race condition in tcindex_destroy()Cong Wang
(cherry picked from commit 8015d93ebd27484418d4952284fd02172fa4b0b2) tcindex_destroy() invokes tcindex_destroy_element() via a walker to delete each filter result in its perfect hash table, and tcindex_destroy_element() calls tcindex_delete() which schedules tcf RCU works to do the final deletion work. Unfortunately this races with the RCU callback __tcindex_destroy(), which could lead to use-after-free as reported by Adrian. Fix this by migrating this RCU callback to tcf RCU work too, as that workqueue is ordered, we will not have use-after-free. Note, we don't need to hold netns refcnt because we don't call tcf_exts_destroy() here. Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter") Reported-by: Adrian <bugs@abtelecom.ro> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18net: sched: using kfree_rcu() to simplify the codeWei Yongjun
The callback function of call_rcu() just calls a kfree(), so we can use kfree_rcu() instead of call_rcu() + callback function. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17net: sched: sch_api: set an error msg when qdisc_alloc_handle() failsIvan Vecera
This patch sets an error message in extack when the number of qdisc handles exceeds the maximum. Also the error-code ENOSPC is more appropriate than ENOMEM in this situation. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reported-by: Li Shuang <shuali@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17net: sched: cgroup: verify that filter is not NULL during walkVlad Buslov
Check that filter is not NULL before passing it to tcf_walker->fn() callback in cls_cgroup_walk(). This can happen when cls_cgroup_change() failed to set first filter. Fixes: ed76f5edccc9 ("net: sched: protect filter_chain list with filter_chain_lock mutex") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17net: sched: matchall: verify that filter is not NULL in mall_walk()Vlad Buslov
Check that filter is not NULL before passing it to tcf_walker->fn() callback. This can happen when mall_change() failed to offload filter to hardware. Fixes: ed76f5edccc9 ("net: sched: protect filter_chain list with filter_chain_lock mutex") Reported-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17net: sched: route: don't set arg->stop in route4_walk() when emptyVlad Buslov
Some classifiers set arg->stop in their implementation of tp->walk() API when empty. Most of classifiers do not adhere to that convention. Do not set arg->stop in route4_walk() to unify tp->walk() behavior among classifier implementations. Fixes: ed76f5edccc9 ("net: sched: protect filter_chain list with filter_chain_lock mutex") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-17net: sched: fw: don't set arg->stop in fw_walk() when emptyVlad Buslov
Some classifiers set arg->stop in their implementation of tp->walk() API when empty. Most of classifiers do not adhere to that convention. Do not set arg->stop in fw_walk() to unify tp->walk() behavior among classifier implementations. Fixes: ed76f5edccc9 ("net: sched: protect filter_chain list with filter_chain_lock mutex") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The netfilter conflicts were rather simple overlapping changes. However, the cls_tcindex.c stuff was a bit more complex. On the 'net' side, Cong is fixing several races and memory leaks. Whilst on the 'net-next' side we have Vlad adding the rtnl-ness support. What I've decided to do, in order to resolve this, is revert the conversion over to using a workqueue that Cong did, bringing us back to pure RCU. I did it this way because I believe that either Cong's races don't apply with have Vlad did things, or Cong will have to implement the race fix slightly differently. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: sched: remove duplicated include from cls_api.cYueHaibing
Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-13net: sched: flower: only return error from hw offload if skip_swVlad Buslov
Recently introduced tc_setup_flow_action() can fail when parsing tcf_exts on some unsupported action commands. However, this should not affect the case when user did not explicitly request hw offload by setting skip_sw flag. Modify tc_setup_flow_action() callers to only propagate the error if skip_sw flag is set for filter that is being offloaded, and set extack error message in that case. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Fixes: 3a7b68617de7 ("cls_api: add translator to flow_action representation") Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net_sched: fix two more memory leaks in cls_tcindexCong Wang
struct tcindex_filter_result contains two parts: struct tcf_exts and struct tcf_result. For the local variable 'cr', its exts part is never used but initialized without being released properly on success path. So just completely remove the exts part to fix this leak. For the local variable 'new_filter_result', it is never properly released if not used by 'r' on success path. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net_sched: fix a memory leak in cls_tcindexCong Wang
When tcindex_destroy() destroys all the filter results in the perfect hash table, it invokes the walker to delete each of them. However, results with class==0 are skipped in either tcindex_walk() or tcindex_delete(), which causes a memory leak reported by kmemleak. This patch fixes it by skipping the walker and directly deleting these filter results so we don't miss any filter result. As a result of this change, we have to initialize exts->net properly in tcindex_alloc_perfect_hash(). For net-next, we need to consider whether we should initialize ->net in tcf_exts_init() instead, before that just directly test CONFIG_NET_CLS_ACT=y. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net_sched: fix a race condition in tcindex_destroy()Cong Wang
tcindex_destroy() invokes tcindex_destroy_element() via a walker to delete each filter result in its perfect hash table, and tcindex_destroy_element() calls tcindex_delete() which schedules tcf RCU works to do the final deletion work. Unfortunately this races with the RCU callback __tcindex_destroy(), which could lead to use-after-free as reported by Adrian. Fix this by migrating this RCU callback to tcf RCU work too, as that workqueue is ordered, we will not have use-after-free. Note, we don't need to hold netns refcnt because we don't call tcf_exts_destroy() here. Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter") Reported-by: Adrian <bugs@abtelecom.ro> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: unlock rules update APIVlad Buslov
Register netlink protocol handlers for message types RTM_NEWTFILTER, RTM_DELTFILTER, RTM_GETTFILTER as unlocked. Set rtnl_held variable that tracks rtnl mutex state to be false by default. Introduce tcf_proto_is_unlocked() helper that is used to check tcf_proto_ops->flag to determine if ops can be called without taking rtnl lock. Manually lookup Qdisc, class and block in rule update handlers. Verify that both Qdisc ops and proto ops are unlocked before using any of their callbacks, and obtain rtnl lock otherwise. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: refactor tcf_block_find() into standalone functionsVlad Buslov
Refactor tcf_block_find() code into three standalone functions: - __tcf_qdisc_find() to lookup Qdisc and increment its reference counter. - __tcf_qdisc_cl_find() to lookup class. - __tcf_block_find() to lookup block and increment its reference counter. This change is necessary to allow netlink tc rule update handlers to call these functions directly in order to conditionally take rtnl lock according to Qdisc class ops flags before calling any of class ops functions. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: extend proto ops to support unlocked classifiersVlad Buslov
Add 'rtnl_held' flag to tcf proto change, delete, destroy, dump, walk functions to track rtnl lock status. Extend users of these function in cls API to propagate rtnl lock status to them. This allows classifiers to obtain rtnl lock when necessary and to pass rtnl lock status to extensions and driver offload callbacks. Add flags field to tcf proto ops. Add flag value to indicate that classifier doesn't require rtnl lock. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: extend proto ops with 'put' callbackVlad Buslov
Add optional tp->ops->put() API to be implemented for filter reference counting. This new function is called by cls API to release filter reference for filters returned by tp->ops->change() or tp->ops->get() functions. Implement tfilter_put() helper to call tp->ops->put() only for classifiers that implement it. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: track rtnl lock status when validating extensionsVlad Buslov
Actions API is already updated to not rely on rtnl lock for synchronization. However, it need to be provided with rtnl status when called from classifiers API in order to be able to correctly release the lock when loading kernel module. Extend extension validation function with 'rtnl_held' flag which is passed to actions API. Add new 'rtnl_held' parameter to tcf_exts_validate() in cls API. No classifier is currently updated to support unlocked execution, so pass hardcoded 'true' flag parameter value. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: prevent insertion of new classifiers during chain flushVlad Buslov
Extend tcf_chain with 'flushing' flag. Use the flag to prevent insertion of new classifier instances when chain flushing is in progress in order to prevent resource leak when tcf_proto is created by unlocked users concurrently. Return EAGAIN error from tcf_chain_tp_insert_unique() to restart tc_new_tfilter() and lookup the chain/proto again. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: refactor tp insert/delete for concurrent executionVlad Buslov
Implement unique insertion function to atomically attach tcf_proto to chain after verifying that no other tcf proto with specified priority exists. Implement delete function that verifies that tp is actually empty before deleting it. Use these functions to refactor cls API to account for concurrent tp and rule update instead of relying on rtnl lock. Add new 'deleting' flag to tcf proto. Use it to restart search when iterating over tp's on chain to prevent accessing potentially inval tp->next pointer. Extend tcf proto with spinlock that is intended to be used to protect its data from concurrent modification instead of relying on rtnl mutex. Use it to protect 'deleting' flag. Add lockdep macros to validate that lock is held when accessing protected fields. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: traverse classifiers in chain with tcf_get_next_proto()Vlad Buslov
All users of chain->filters_chain rely on rtnl lock and assume that no new classifier instances are added when traversing the list. Use tcf_get_next_proto() to traverse filters list without relying on rtnl mutex. This function iterates over classifiers by taking reference to current iterator classifier only and doesn't assume external synchronization of filters list. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: introduce reference counting for tcf_protoVlad Buslov
In order to remove dependency on rtnl lock and allow concurrent tcf_proto modification, extend tcf_proto with reference counter. Implement helper get/put functions for tcf proto and use them to modify cls API to always take reference to tcf_proto while using it. Only release reference to parent chain after releasing last reference to tp. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: protect filter_chain list with filter_chain_lock mutexVlad Buslov
Extend tcf_chain with new filter_chain_lock mutex. Always lock the chain when accessing filter_chain list, instead of relying on rtnl lock. Dereference filter_chain with tcf_chain_dereference() lockdep macro to verify that all users of chain_list have the lock taken. Rearrange tp insert/remove code in tc_new_tfilter/tc_del_tfilter to execute all necessary code while holding chain lock in order to prevent invalidation of chain_info structure by potential concurrent change. This also serializes calls to tcf_chain0_head_change(), which allows head change callbacks to rely on filter_chain_lock for synchronization instead of rtnl mutex. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: protect chain template accesses with block lockVlad Buslov
When cls API is called without protection of rtnl lock, parallel modification of chain is possible, which means that chain template can be changed concurrently in certain circumstances. For example, when chain is 'deleted' by new user-space chain API, the chain might continue to be used if it is referenced by actions, and can be 're-created' again by user. In such case same chain structure is reused and its template is changed. To protect from described scenario, cache chain template while holding block lock. Introduce standalone tc_chain_notify_delete() function that works with cached template values, instead of chains themselves. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: traverse chains in block with tcf_get_next_chain()Vlad Buslov
All users of block->chain_list rely on rtnl lock and assume that no new chains are added when traversing the list. Use tcf_get_next_chain() to traverse chain list without relying on rtnl mutex. This function iterates over chains by taking reference to current iterator chain only and doesn't assume external synchronization of chain list. Don't take reference to all chains in block when flushing and use tcf_get_next_chain() to safely iterate over chain list instead. Remove tcf_block_put_all_chains() that is no longer used. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: protect block->chain0 with block->lockVlad Buslov
In order to remove dependency on rtnl lock, use block->lock to protect chain0 struct from concurrent modification. Rearrange code in chain0 callback add and del functions to only access chain0 when block->lock is held. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: refactor tc_ctl_chain() to use block->lockVlad Buslov
In order to remove dependency on rtnl lock, modify chain API to use block->lock to protect chain from concurrent modification. Rearrange tc_ctl_chain() code to call tcf_chain_hold() while holding block->lock to prevent concurrent chain removal. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: protect chain->explicitly_created with block->lockVlad Buslov
In order to remove dependency on rtnl lock, protect tcf_chain->explicitly_created flag with block->lock. Consolidate code that checks and resets 'explicitly_created' flag into __tcf_chain_put() to execute it atomically with rest of code that puts chain reference. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-12net: sched: protect block state with mutexVlad Buslov
Currently, tcf_block doesn't use any synchronization mechanisms to protect critical sections that manage lifetime of its chains. block->chain_list and multiple variables in tcf_chain that control its lifetime assume external synchronization provided by global rtnl lock. Converting chain reference counting to atomic reference counters is not possible because cls API uses multiple counters and flags to control chain lifetime, so all of them must be synchronized in chain get/put code. Use single per-block lock to protect block data and manage lifetime of all chains on the block. Always take block->lock when accessing chain_list. Chain get and put modify chain lifetime-management data and parent block's chain_list, so take the lock in these functions. Verify block->lock state with assertions in functions that expect to be called with the lock taken and are called from multiple places. Take block->lock when accessing filter_chain_list. In order to allow parallel update of rules on single block, move all calls to classifiers outside of critical sections protected by new block->lock. Rearrange chain get and put functions code to only access protected chain data while holding block lock: - Rearrange code to only access chain reference counter and chain action reference counter while holding block lock. - Extract code that requires block->lock from tcf_chain_destroy() into standalone tcf_chain_destroy() function that is called by __tcf_chain_put() in same critical section that changes chain reference counters. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-11Documentation: bring operstate documentation up-to-dateJouke Witteveen
Netlink has moved from bitmasks to group numbers long ago. Signed-off-by: Jouke Witteveen <j.witteveen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-10net: Change TCA_ACT_* to TCA_ID_* to match that of TCA_ID_POLICEEli Cohen
Modify the kernel users of the TCA_ACT_* macros to use TCA_ID_*. For example, use TCA_ID_GACT instead of TCA_ACT_GACT. This will align with TCA_ID_POLICE and also differentiates these identifier, used in struct tc_action_ops type field, from other macros starting with TCA_ACT_. To make things clearer, we name the enum defining the TCA_ID_* identifiers and also change the "type" field of struct tc_action to id. Signed-off-by: Eli Cohen <eli@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-10net: Move all TC actions identifiers to one placeEli Cohen
Move all the TC identifiers to one place, to the same enum that defines the identifier of police action. This makes it easier choose numbers for new actions since they are now defined in one place. We preserve the original values for binary compatibility. New IDs should be added inside the enum. Signed-off-by: Eli Cohen <eli@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08net/sched: use struct_size() helperGustavo A. R. Silva
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; struct boo entry[]; }; size = sizeof(struct foo) + count * sizeof(struct boo); instance = alloc(size, GFP_KERNEL) Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: size = struct_size(instance, entry, count); instance = alloc(size, GFP_KERNEL) This code was detected with the help of Coccinelle. Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06cls_flower: don't expose TC actions to drivers anymorePablo Neira Ayuso
Now that drivers have been converted to use the flow action infrastructure, remove this field from the tc_cls_flower_offload structure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add statistics retrieval infrastructure and use itPablo Neira Ayuso
This patch provides the flow_stats structure that acts as container for tc_cls_flower_offload, then we can use to restore the statistics on the existing TC actions. Hence, tcf_exts_stats_update() is not used from drivers anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06cls_api: add translator to flow_action representationPablo Neira Ayuso
This patch implements a new function to translate from native TC action to the new flow_action representation. Moreover, this patch also updates cls_flower to use this new function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06flow_offload: add flow action infrastructurePablo Neira Ayuso
This new infrastructure defines the nic actions that you can perform from existing network drivers. This infrastructure allows us to avoid a direct dependency with the native software TC action representation. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>