summaryrefslogtreecommitdiff
path: root/net/netfilter/nf_tables_api.c
AgeCommit message (Collapse)Author
2019-09-25netfilter: nf_tables: bogus EBUSY when deleting flowtable after flushLaura Garcia Liebana
The deletion of a flowtable after a flush in the same transaction results in EBUSY. This patch adds an activation and deactivation of flowtables in order to update the _use_ counter. Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-20netfilter: nf_tables: allow lookups in dynamic setsFlorian Westphal
This un-breaks lookups in sets that have the 'dynamic' flag set. Given this active example configuration: table filter { set set1 { type ipv4_addr size 64 flags dynamic,timeout timeout 1m } chain input { type filter hook input priority 0; policy accept; } } ... this works: nft add rule ip filter input add @set1 { ip saddr } -> whenever rule is triggered, the source ip address is inserted into the set (if it did not exist). This won't work: nft add rule ip filter input ip saddr @set1 counter Error: Could not process rule: Operation not supported In other words, we can add entries to the set, but then can't make matching decision based on that set. That is just wrong -- all set backends support lookups (else they would not be very useful). The failure comes from an explicit rejection in nft_lookup.c. Looking at the history, it seems like NFT_SET_EVAL used to mean 'set contains expressions' (aka. "is a meter"), for instance something like nft add rule ip filter input meter example { ip saddr limit rate 10/second } or nft add rule ip filter input meter example { ip saddr counter } The actual meaning of NFT_SET_EVAL however, is 'set can be updated from the packet path'. 'meters' and packet-path insertions into sets, such as 'add @set { ip saddr }' use exactly the same kernel code (nft_dynset.c) and thus require a set backend that provides the ->update() function. The only set that provides this also is the only one that has the NFT_SET_EVAL feature flag. Removing the wrong check makes the above example work. While at it, also fix the flag check during set instantiation to allow supported combinations only. Fixes: 8aeff920dcc9b3f ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-20netfilter: nf_tables: add NFT_CHAIN_POLICY_UNSET and use itPablo Neira Ayuso
Default policy is defined as a unsigned 8-bit field, do not use a negative value to leave it unset, use this new NFT_CHAIN_POLICY_UNSET instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-13netfilter: nf_tables_offload: remove rules when the device unregisterswenxu
If the net_device unregisters, clean up the offload rules before the chain is destroy. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-10netfilter: nft_{fwd,dup}_netdev: add offload supportPablo Neira Ayuso
This patch adds support for packet mirroring and redirection. The nft_fwd_dup_netdev_offload() function configures the flow_action object for the fwd and the dup actions. Extend nft_flow_rule_destroy() to release the net_device object when the flow_rule object is released, since nft_fwd_dup_netdev_offload() bumps the net_device reference counter. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: wenxu <wenxu@ucloud.cn>
2019-09-08netfilter: nf_tables_offload: move indirect flow_block callback logic to corePablo Neira Ayuso
Add nft_offload_init() and nft_offload_exit() function to deal with the init and the exit path of the offload infrastructure. Rename nft_indr_block_get_and_ing_cmd() to nft_indr_block_cb(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-08netfilter: nf_tables: Fix an Oops in nf_tables_updobj() error handlingDan Carpenter
The "newobj" is an error pointer so we can't pass it to kfree(). It doesn't need to be freed so we can remove that and I also renamed the error label. Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-05netfilter: nf_tables: fix possible null-pointer dereference in object updateFernando Fernandez Mancera
Not all objects have an update operation. If the object type doesn't implement an update operation and the user tries to update it will hit EOPNOTSUPP. Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-09-03netfilter: nf_tables: Introduce stateful object update operationFernando Fernandez Mancera
This patch adds the infrastructure needed for the stateful object update support. Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Merge conflict of mlx5 resolved using instructions in merge commit 9566e650bf7fdf58384bb06df634f7531ca3a97e. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-18netfilter: nf_tables: map basechain priority to hardware priorityPablo Neira Ayuso
This patch adds initial support for offloading basechains using the priority range from 1 to 65535. This is restricting the netfilter priority range to 16-bit integer since this is what most drivers assume so far from tc. It should be possible to extend this range of supported priorities later on once drivers are updated to support for 32-bit integer priorities. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-08-09netfilter: nf_tables: use-after-free in failing rule with bound setPablo Neira Ayuso
If a rule that has already a bound anonymous set fails to be added, the preparation phase releases the rule and the bound set. However, the transaction object from the abort path still has a reference to the set object that is stale, leading to a use-after-free when checking for the set->bound field. Add a new field to the transaction that specifies if the set is bound, so the abort path can skip releasing it since the rule command owns it and it takes care of releasing it. After this update, the set->bound field is removed. [ 24.649883] Unable to handle kernel paging request at virtual address 0000000000040434 [ 24.657858] Mem abort info: [ 24.660686] ESR = 0x96000004 [ 24.663769] Exception class = DABT (current EL), IL = 32 bits [ 24.669725] SET = 0, FnV = 0 [ 24.672804] EA = 0, S1PTW = 0 [ 24.675975] Data abort info: [ 24.678880] ISV = 0, ISS = 0x00000004 [ 24.682743] CM = 0, WnR = 0 [ 24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000 [ 24.692207] [0000000000040434] pgd=0000000000000000 [ 24.697119] Internal error: Oops: 96000004 [#1] SMP [...] [ 24.889414] Call trace: [ 24.891870] __nf_tables_abort+0x3f0/0x7a0 [ 24.895984] nf_tables_abort+0x20/0x40 [ 24.899750] nfnetlink_rcv_batch+0x17c/0x588 [ 24.904037] nfnetlink_rcv+0x13c/0x190 [ 24.907803] netlink_unicast+0x18c/0x208 [ 24.911742] netlink_sendmsg+0x1b0/0x350 [ 24.915682] sock_sendmsg+0x4c/0x68 [ 24.919185] ___sys_sendmsg+0x288/0x2c8 [ 24.923037] __sys_sendmsg+0x7c/0xd0 [ 24.926628] __arm64_sys_sendmsg+0x2c/0x38 [ 24.930744] el0_svc_common.constprop.0+0x94/0x158 [ 24.935556] el0_svc_handler+0x34/0x90 [ 24.939322] el0_svc+0x8/0xc [ 24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863) [ 24.948336] ---[ end trace cebbb9dcbed3b56f ]--- Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-08-08netfilter: nf_tables_offload: support indr block callwenxu
nftable support indr-block call. It makes nftable an offload vlan and tunnel device. nft add table netdev firewall nft add chain netdev firewall aclout { type filter hook ingress offload device mlx_pf0vf0 priority - 300 \; } nft add rule netdev firewall aclout ip daddr 10.0.0.1 fwd to vlan0 nft add chain netdev firewall aclin { type filter hook ingress device vlan0 priority - 300 \; } nft add rule netdev firewall aclin ip daddr 10.0.0.7 fwd to mlx_pf0vf0 Signed-off-by: wenxu <wenxu@ucloud.cn> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-19net: flow_offload: add flow_block structure and use itPablo Neira Ayuso
This object stores the flow block callbacks that are attached to this block. Update flow_block_cb_lookup() to take this new object. This patch restores the block sharing feature. Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-16netfilter: nf_tables: don't fail when updating base chain policyFlorian Westphal
The following nftables test case fails on nf-next: tests/shell/run-tests.sh tests/shell/testcases/transactions/0011chain_0 The test case contains: add chain x y { type filter hook input priority 0; } add chain x y { policy drop; }" The new test if (chain->flags ^ flags) return -EOPNOTSUPP; triggers here, because chain->flags has NFT_BASE_CHAIN set, but flags is 0 because no flag attribute was present in the policy update. Just fetch the current flag settings of a pre-existing chain in case userspace did not provide any. Fixes: c9626a2cbdb20 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-09netfilter: nf_tables: add hardware offload supportPablo Neira Ayuso
This patch adds hardware offload support for nftables through the existing netdev_ops->ndo_setup_tc() interface, the TC_SETUP_CLSFLOWER classifier and the flow rule API. This hardware offload support is available for the NFPROTO_NETDEV family and the ingress hook. Each nftables expression has a new ->offload interface, that is used to populate the flow rule object that is attached to the transaction object. There is a new per-table NFT_TABLE_F_HW flag, that is set on to offload an entire table, including all of its chains. This patch supports for basic metadata (layer 3 and 4 protocol numbers), 5-tuple payload matching and the accept/drop actions; this also includes basechain hardware offload only. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-06netfilter: nf_tables: force module load in case select_ops() returns -EAGAINPablo Neira Ayuso
nft_meta needs to pull in the nft_meta_bridge module in case that this is a bridge family rule from the select_ops() path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-05netfilter: nf_tables: __nft_expr_type_get() selects specific family typePablo Neira Ayuso
In case that there are two types, prefer the family specify extension. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-05netfilter: nf_tables: add nft_expr_type_request_module()Pablo Neira Ayuso
This helper function makes sure the family specific extension is loaded. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextPablo Neira Ayuso
Resolve conflict between d2912cb15bdd ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500") removing the GPL disclaimer and fe03d4745675 ("Update my email address") which updates Jozsef Kadlecsik's email. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor SPDX change conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-19netfilter: nf_tables: enable set expiration time for set elementsLaura Garcia Liebana
Currently, the expiration of every element in a set or map is a read-only parameter generated at kernel side. This change will permit to set a certain expiration date per element that will be required, for example, during stateful replication among several nodes. This patch handles the NFTA_SET_ELEM_EXPIRATION in order to configure the expiration parameter per element, or will use the timeout in the case that the expiration is not set. Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset container Netfilter/IPVS update for net-next: 1) Add UDP tunnel support for ICMP errors in IPVS. Julian Anastasov says: This patchset is a followup to the commit that adds UDP/GUE tunnel: "ipvs: allow tunneling with gue encapsulation". What we do is to put tunnel real servers in hash table (patch 1), add function to lookup tunnels (patch 2) and use it to strip the embedded tunnel headers from ICMP errors (patch 3). 2) Extend xt_owner to match for supplementary groups, from Lukasz Pawelczyk. 3) Remove unused oif field in flow_offload_tuple object, from Taehee Yoo. 4) Release basechain counters from workqueue to skip synchronize_rcu() call. From Florian Westphal. 5) Replace skb_make_writable() by skb_ensure_writable(). Patchset from Florian Westphal. 6) Checksum support for gue encapsulation in IPVS, from Jacky Hu. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-31netfilter: nf_tables: free base chain counters from workerFlorian Westphal
No need to use synchronize_rcu() here, just swap the two pointers and have the release occur from work queue after commit has completed. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-20netfilter: nf_tables: fix oops during rule dumpFlorian Westphal
We can oops in nf_tables_fill_rule_info(). Its not possible to fetch previous element in rcu-protected lists when deletions are not prevented somehow: list_del_rcu poisons the ->prev pointer value. Before rcu-conversion this was safe as dump operations did hold nfnetlink mutex. Pass previous rule as argument, obtained by keeping a pointer to the previous rule during traversal. Fixes: d9adf22a291883 ("netfilter: nf_tables: use call_rcu in netlink dumps") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Postpone chain policy update to drop after transaction is complete, from Florian Westphal. 2) Add entry to flowtable after confirmation to fix UDP flows with packets going in one single direction. 3) Reference count leak in dst object, from Taehee Yoo. 4) Check for TTL field in flowtable datapath, from Taehee Yoo. 5) Fix h323 conntrack helper due to incorrect boundary check, from Jakub Jankowski. 6) Fix incorrect rcu dereference when fetching basechain stats, from Florian Westphal. 7) Missing error check when adding new entries to flowtable, from Taehee Yoo. 8) Use version field in nfnetlink message to honor the nfgen_family field, from Kristian Evensen. 9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6, from Subash Abhinov Kasiviswanathan. 10) Prevent dying entries from being added to the flowtable, from Taehee Yoo. 11) Don't hit WARN_ON() with malformed blob in ebtables with trailing data after last rule, reported by syzbot, patch from Florian Westphal. 12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel code. 13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian Westphal. This batch comes with a conflict that can be fixed with this patch: diff --cc include/uapi/linux/netfilter/nf_tables.h index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address) * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address) * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address) - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack + * @NFT_CT_ID: conntrack id */ enum nft_ct_keys { NFT_CT_STATE, @@@ -991,6 -993,8 +992,7 @@@ NFT_CT_DST_IP, NFT_CT_SRC_IP6, NFT_CT_DST_IP6, - NFT_CT_TIMEOUT, + NFT_CT_ID, __NFT_CT_MAX }; #define NFT_CT_MAX (__NFT_CT_MAX - 1) That replaces the unused NFT_CT_TIMEOUT definition by NFT_CT_ID. If you prefer, I can also solve this conflict here, just let me know. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-06netfilter: nf_tables: fix base chain stat rcu_dereference usageFlorian Westphal
Following splat gets triggered when nfnetlink monitor is running while xtables-nft selftests are running: net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage! other info that might help us debug this: 1 lock held by xtables-nft-mul/27006: #0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50 Call Trace: nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0 nf_tables_chain_notify+0xf8/0x1a0 nf_tables_commit+0x165c/0x1740 nf_tables_fill_chain_info() can be called both from dumps (rcu read locked) or from the transaction path if a userspace process subscribed to nftables notifications. In the 'table dump' case, rcu_access_pointer() cannot be used: We do not hold transaction mutex so the pointer can be NULLed right after the check. Just unconditionally fetch the value, then have the helper return immediately if its NULL. In the notification case we don't hold the rcu read lock, but updates are prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep aware of this. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-30netfilter: nf_tables: delay chain policy update until transaction is completeFlorian Westphal
When we process a long ruleset of the form chain input { type filter hook input priority filter; policy drop; ... } Then the base chain gets registered early on, we then continue to process/validate the next messages coming in the same transaction. Problem is that if the base chain policy is 'drop', it will take effect immediately, which causes all traffic to get blocked until the transaction completes or is aborted. Fix this by deferring the policy until the transaction has been processed and all of the rules have been flagged as active. Reported-by: Jann Haber <jann.haber@selfnet.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-27netlink: make validation more configurable for future strictnessJohannes Berg
We currently have two levels of strict validation: 1) liberal (default) - undefined (type >= max) & NLA_UNSPEC attributes accepted - attribute length >= expected accepted - garbage at end of message accepted 2) strict (opt-in) - NLA_UNSPEC attributes accepted - attribute length >= expected accepted Split out parsing strictness into four different options: * TRAILING - check that there's no trailing data after parsing attributes (in message or nested) * MAXTYPE - reject attrs > max known type * UNSPEC - reject attributes with NLA_UNSPEC policy entries * STRICT_ATTRS - strictly validate attribute size The default for future things should be *everything*. The current *_strict() is a combination of TRAILING and MAXTYPE, and is renamed to _deprecated_strict(). The current regular parsing has none of this, and is renamed to *_parse_deprecated(). Additionally it allows us to selectively set one of the new flags even on old policies. Notably, the UNSPEC flag could be useful in this case, since it can be arranged (by filling in the policy) to not be an incompatible userspace ABI change, but would then going forward prevent forgetting attribute entries. Similar can apply to the POLICY flag. We end up with the following renames: * nla_parse -> nla_parse_deprecated * nla_parse_strict -> nla_parse_deprecated_strict * nlmsg_parse -> nlmsg_parse_deprecated * nlmsg_parse_strict -> nlmsg_parse_deprecated_strict * nla_parse_nested -> nla_parse_nested_deprecated * nla_validate_nested -> nla_validate_nested_deprecated Using spatch, of course: @@ expression TB, MAX, HEAD, LEN, POL, EXT; @@ -nla_parse(TB, MAX, HEAD, LEN, POL, EXT) +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression NLH, HDRLEN, TB, MAX, POL, EXT; @@ -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT) +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT) @@ expression TB, MAX, NLA, POL, EXT; @@ -nla_parse_nested(TB, MAX, NLA, POL, EXT) +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT) @@ expression START, MAX, POL, EXT; @@ -nla_validate_nested(START, MAX, POL, EXT) +nla_validate_nested_deprecated(START, MAX, POL, EXT) @@ expression NLH, HDRLEN, MAX, POL, EXT; @@ -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT) +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT) For this patch, don't actually add the strict, non-renamed versions yet so that it breaks compile if I get it wrong. Also, while at it, make nla_validate and nla_parse go down to a common __nla_validate_parse() function to avoid code duplication. Ultimately, this allows us to have very strict validation for every new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the next patch, while existing things will continue to work as is. In effect then, this adds fully strict validation for any new command. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-15netfilter: nf_tables: prevent shift wrap in nft_chain_parse_hook()Dan Carpenter
I believe that "hook->num" can be up to UINT_MAX. Shifting more than 31 bits would is undefined in C but in practice it would lead to shift wrapping. That would lead to an array overflow in nf_tables_addchain(): ops->hook = hook.type->hooks[ops->hooknum]; Fixes: fe19c04ca137 ("netfilter: nf_tables: remove nhooks field from struct nft_af_info") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: make two functions staticFlorian Westphal
They have no external callers anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nf_tables: merge route type into coreFlorian Westphal
very little code, so it really doesn't make sense to have extra modules or even a kconfig knob for this. Merge them and make functionality available unconditionally. The merge makes inet family route support trivial, so add it as well here. Before: text data bss dec hex filename 835 832 0 1667 683 nft_chain_route_ipv4.ko 870 832 0 1702 6a6 nft_chain_route_ipv6.ko 111568 2556 529 114653 1bfdd nf_tables.ko After: text data bss dec hex filename 113133 2556 529 116218 1c5fa nf_tables.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08netfilter: nf_tables: remove unused parameter ctxColin Ian King
Function nf_tables_set_desc_parse parameter ctx is not being used so remove it as it is redundant. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-08time: Introduce jiffies64_to_msecs()Li RongQing
there is a similar helper in net/netfilter/nf_tables_api.c, this maybe become a common request someday, so move it to time.c Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-04-07rhashtable: use bit_spin_locks to protect hash bucket.NeilBrown
This patch changes rhashtables to use a bit_spin_lock on BIT(1) of the bucket pointer to lock the hash chain for that bucket. The benefits of a bit spin_lock are: - no need to allocate a separate array of locks. - no need to have a configuration option to guide the choice of the size of this array - locking cost is often a single test-and-set in a cache line that will have to be loaded anyway. When inserting at, or removing from, the head of the chain, the unlock is free - writing the new address in the bucket head implicitly clears the lock bit. For __rhashtable_insert_fast() we ensure this always happens when adding a new key. - even when lockings costs 2 updates (lock and unlock), they are in a cacheline that needs to be read anyway. The cost of using a bit spin_lock is a little bit of code complexity, which I think is quite manageable. Bit spin_locks are sometimes inappropriate because they are not fair - if multiple CPUs repeatedly contend of the same lock, one CPU can easily be starved. This is not a credible situation with rhashtable. Multiple CPUs may want to repeatedly add or remove objects, but they will typically do so at different buckets, so they will attempt to acquire different locks. As we have more bit-locks than we previously had spinlocks (by at least a factor of two) we can expect slightly less contention to go with the slightly better cache behavior and reduced memory consumption. To enhance type checking, a new struct is introduced to represent the pointer plus lock-bit that is stored in the bucket-table. This is "struct rhash_lock_head" and is empty. A pointer to this needs to be cast to either an unsigned lock, or a "struct rhash_head *" to be useful. Variables of this type are most often called "bkt". Previously "pprev" would sometimes point to a bucket, and sometimes a ->next pointer in an rhash_head. As these are now different types, pprev is NULL when it would have pointed to the bucket. In that case, 'blk' is used, together with correct locking protocol. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20netfilter: nf_tables: add missing ->release_ops() in error path of newrule()Taehee Yoo
->release_ops() callback releases resources and this is used in error path. If nf_tables_newrule() fails after ->select_ops(), it should release resources. but it can not call ->destroy() because that should be called after ->init(). At this point, ->release_ops() should be used for releasing resources. Test commands: modprobe -rv xt_tcpudp iptables-nft -I INPUT -m tcp <-- error command lsmod Result: Module Size Used by xt_tcpudp 20480 2 <-- it should be 0 Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-11netfilter: nf_tables: return immediately on empty commitFlorian Westphal
When running 'nft flush ruleset' while no rules exist, we will increment the generation counter and announce a new genid to userspace, yet nothing had changed in the first place. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-11netfilter: nf_tables: use-after-free in dynamic operationsPablo Neira Ayuso
Smatch reports: net/netfilter/nf_tables_api.c:2167 nf_tables_expr_destroy() error: dereferencing freed memory 'expr->ops' net/netfilter/nf_tables_api.c 2162 static void nf_tables_expr_destroy(const struct nft_ctx *ctx, 2163 struct nft_expr *expr) 2164 { 2165 if (expr->ops->destroy) 2166 expr->ops->destroy(ctx, expr); ^^^^ --> 2167 module_put(expr->ops->type->owner); ^^^^^^^^^ 2168 } Smatch says there are three functions which free expr->ops. Fixes: b8e204006340 ("netfilter: nft_compat: use .release_ops and remove list of extension") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-11netfilter: nf_tables: bogus EBUSY when deleting set after flushPablo Neira Ayuso
Set deletion after flush coming in the same batch results in EBUSY. Add set use counter to track the number of references to this set from rules. We cannot rely on the list of bindings for this since such list is still populated from the preparation phase. Reported-by: Václav Zindulka <vaclav.zindulka@tlapnet.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-08netfilter: nf_tables: fix set double-free in abort pathPablo Neira Ayuso
The abort path can cause a double-free of an anonymous set. Added-and-to-be-aborted rule looks like this: udp dport { 137, 138 } drop The to-be-aborted transaction list looks like this: newset newsetelem newsetelem rule This gets walked in reverse order, so first pass disables the rule, the set elements, then the set. After synchronize_rcu(), we then destroy those in same order: rule, set element, set element, newset. Problem is that the anonymous set has already been bound to the rule, so the rule (lookup expression destructor) already frees the set, when then cause use-after-free when trying to delete the elements from this set, then try to free the set again when handling the newset expression. Rule releases the bound set in first place from the abort path, this causes the use-after-free on set element removal when undoing the new element transactions. To handle this, skip new element transaction if set is bound from the abort path. This is still causes the use-after-free on set element removal. To handle this, remove transaction from the list when the set is already bound. Joint work with Florian Westphal. Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path") Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1325 Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27netfilter: nft_compat: use .release_ops and remove list of extensionPablo Neira Ayuso
Add .release_ops, that is called in case of error at a later stage in the expression initialization path, ie. .select_ops() has been already set up operations and that needs to be undone. This allows us to unwind .select_ops from the error path, ie. release the dynamic operations for this extension. Moreover, allocate one single operation instead of recycling them, this comes at the cost of consuming a bit more memory per rule, but it simplifies the infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Two easily resolvable overlapping change conflicts, one in TCP and one in the eBPF verifier. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for you net-next tree: 1) Missing NFTA_RULE_POSITION_ID netlink attribute validation, from Phil Sutter. 2) Restrict matching on tunnel metadata to rx/tx path, from wenxu. 3) Avoid indirect calls for IPV6=y, from Florian Westphal. 4) Add two indirections to prepare merger of IPV4 and IPV6 nat modules, from Florian Westphal. 5) Broken indentation in ctnetlink, from Colin Ian King. 6) Patches to use struct_size() from netfilter and IPVS, from Gustavo A. R. Silva. 7) Display kernel splat only once in case of racing to confirm conntrack from bridge plus nfqueue setups, from Chieh-Min Wang. 8) Skip checksum validation for layer 4 protocols that don't need it, patch from Alin Nastac. 9) Sparse warning due to symbol that should be static in CLUSTERIP, from Wei Yongjun. 10) Add new toggle to disable SDP payload translation when media endpoint is reachable though the same interface as the signalling peer, from Alin Nastac. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-15netfilter: nf_tables: fix flush after rule deletion in the same batchPablo Neira Ayuso
Flush after rule deletion bogusly hits -ENOENT. Skip rules that have been already from nft_delrule_by_chain() which is always called from the flush path. Fixes: cf9dc09d0949 ("netfilter: nf_tables: fix missing rules flushing per table") Reported-by: Phil Sutter <phil@nwl.cc> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
An ipvlan bug fix in 'net' conflicted with the abstraction away of the IPV6 specific support in 'net-next'. Similarly, a bug fix for mlx5 in 'net' conflicted with the flow action conversion in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04netfilter: nf_tables: unbind set in rule from commit pathPablo Neira Ayuso
Anonymous sets that are bound to rules from the same transaction trigger a kernel splat from the abort path due to double set list removal and double free. This patch updates the logic to search for the transaction that is responsible for creating the set and disable the set list removal and release, given the rule is now responsible for this. Lookup is reverse since the transaction that adds the set is likely to be at the tail of the list. Moreover, this patch adds the unbind step to deliver the event from the commit path. This should not be done from the worker thread, since we have no guarantees of in-order delivery to the listener. This patch removes the assumption that both activate and deactivate callbacks need to be provided. Fixes: cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase") Reported-by: Mikhail Morfikov <mmorfikov@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-01-29netfilter: nf_tables: add NFTA_RULE_POSITION_ID to nla_policyFlorian Westphal
Fixes: 75dd48e2e420a ("netfilter: nf_tables: Support RULE_ID reference in new rule") Reported-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>