summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-06-03netfilter: nf_tables: pass ctx to nf_tables_expr_destroy()Pablo Neira Ayuso
nft_set_elem_destroy() can be called from call_rcu context. Annotate netns and table in set object so we can populate the context object. Moreover, pass context object to nf_tables_set_elem_destroy() from the commit phase, since it is already available from there. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: nf_conncount: expose connection list interfacePablo Neira Ayuso
This patch provides an interface to maintain the list of connections and the lookup function to obtain the number of connections in the list. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: nf_tables: pass context to object destroy indirectionPablo Neira Ayuso
The new connlimit object needs this to properly deal with conntrack dependencies. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: Libify xt_TPROXYMáté Eckl
The extracted functions will likely be usefull to implement tproxy support in nf_tables. Extrancted functions: - nf_tproxy_sk_is_transparent - nf_tproxy_laddr4 - nf_tproxy_handle_time_wait4 - nf_tproxy_get_sock_v4 - nf_tproxy_laddr6 - nf_tproxy_handle_time_wait6 - nf_tproxy_get_sock_v6 (nf_)tproxy_handle_time_wait6 also needed some refactor as its current implementation was xtables-specific. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-03netfilter: Decrease code duplication regarding transparent socket optionMáté Eckl
There is a function in include/net/netfilter/nf_socket.h to decide if a socket has IP(V6)_TRANSPARENT socket option set or not. However this does the same as inet_sk_transparent() in include/net/tcp.h include/net/tcp.h:1733 /* This helper checks if socket has IP_TRANSPARENT set */ static inline bool inet_sk_transparent(const struct sock *sk) { switch (sk->sk_state) { case TCP_TIME_WAIT: return inet_twsk(sk)->tw_transparent; case TCP_NEW_SYN_RECV: return inet_rsk(inet_reqsk(sk))->no_srccheck; } return inet_sk(sk)->transparent; } tproxy_sk_is_transparent has also been refactored to use this function instead of reimplementing it. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Just three small last minute regressions that were found in the last week. The Broadcom fix is a bit big for rc7, but since it is fixing driver crash regressions that were merged via netdev into rc1, I am sending it. - bnxt netdev changes merged this cycle caused the bnxt RDMA driver to crash under certain situations - Arnd found (several, unfortunately) kconfig problems with the patches adding INFINIBAND_ADDR_TRANS. Reverting this last part, will fix it more fully outside -rc. - Subtle change in error code for a uapi function caused breakage in userspace. This was bug was subtly introduced cycle" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: IB/core: Fix error code for invalid GID entry IB: Revert "remove redundant INFINIBAND kconfig dependencies" RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes
2018-06-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree, the most relevant things in this batch are: 1) Compile masquerade infrastructure into NAT module, from Florian Westphal. Same thing with the redirection support. 2) Abort transaction if early initialization of the commit phase fails. Also from Florian. 3) Get rid of synchronize_rcu() by using rule array in nf_tables, from Florian. 4) Abort nf_tables batch if fatal signal is pending, from Florian. 5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless. From Florian Westphal. 6) Support to match transparent sockets from nf_tables, from Máté Eckl. 7) Audit support for nf_tables, from Phil Sutter. 8) Validate chain dependencies from commit phase, fall back to fine grain validation only in case of errors. 9) Attach dst to skbuff from netfilter flowtable packet path, from Jason A. Donenfeld. 10) Use artificial maximum attribute cap to remove VLA from nfnetlink. Patch from Kees Cook. 11) Add extension to allow to forward packets through neighbour layer. 12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov. 13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-02ipvs: register conntrack hooks for ftpJulian Anastasov
ip_vs_ftp requires conntrack modules for mangling of FTP command responses in passive mode. Make sure the conntrack hooks are registered when real servers use NAT method in FTP virtual service. The hooks will be registered while the service is present. Fixes: 0c66dc1ea3f0 ("netfilter: conntrack: register hooks in netns when needed by ruleset") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01xprtrdma: Remove transfertypes arrayChuck Lever
Clean up: This array was used in a dprintk that was replaced by a trace point in commit ab03eff58eb5 ("xprtrdma: Add trace points in RPC Call transmit paths"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-06-01ip6_tunnel: remove magic mtu value 0xFFF8Nicolas Dichtel
I don't know where this value comes from (probably a copy and paste and paste and paste ...). Let's use standard values which are a bit greater. Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01xprtrdma: Add trace_xprtrdma_dma_map(mr)Chuck Lever
Matches trace_xprtrdma_dma_unmap(mr). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-06-01xprtrdma: Wait on empty sendctx queueChuck Lever
Currently, when the sendctx queue is exhausted during marshaling, the RPC/RDMA transport places the RPC task on the delayq, which forces a wait for HZ >> 2 before the marshal and send is retried. With this change, the transport now places such an RPC task on the pending queue, and wakes it just as soon as more sendctxs become available. This typically takes less than a millisecond, and the write_space waking mechanism is less deadlock-prone. Moreover, the waiting RPC task is holding the transport's write lock, which blocks the transport from sending RPCs. Therefore faster recovery from sendctx queue exhaustion is desirable. Cf. commit 5804891455d5 ("xprtrdma: ->send_request returns -EAGAIN when there are no free MRs"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-06-01xprtrdma: Move common wait_for_buffer_space call to parent functionChuck Lever
Clean up: The logic to wait for write space is common to a bunch of the encoding helper functions. Lift it out and put it in the tail of rpcrdma_marshal_req(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-06-01xprtrdma: Return -ENOBUFS when no pages are availableChuck Lever
The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the transport never calls xprt_write_space() when more pages become available. -ENOBUFS will trigger the correct "delay briefly and call again" logic. Fixes: 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-06-01ip_tunnel: restore binding to ifaces with a large mtuNicolas Dichtel
After commit f6cc9c054e77, the following conf is broken (note that the default loopback mtu is 65536, ie IP_MAX_MTU + 1): $ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo add tunnel "gre0" failed: Invalid argument $ ip l a type dummy $ ip l s dummy1 up $ ip l s dummy1 mtu 65535 $ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1 add tunnel "gre0" failed: Invalid argument dev_set_mtu() doesn't allow to set a mtu which is too large. First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove the magic value 0xFFF8 and use IP_MAX_MTU instead. 0xFFF8 seems to be there for ages, I don't know why this value was used. With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU: $ ip l s dummy1 mtu 66000 After that patch, it's also possible to bind an ip tunnel on that kind of interface. CC: Petr Machata <petrm@mellanox.com> CC: Ido Schimmel <idosch@mellanox.com> Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-05-31 1) Avoid possible overflow of the offset variable in _decode_session6(), this fixes an infinite lookp there. From Eric Dumazet. 2) We may use an error pointer in the error path of xfrm_bundle_create(). Fix this by returning this pointer directly to the caller. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01net: sched: split tc_ctl_tfilter into three handlersVlad Buslov
tc_ctl_tfilter handles three netlink message types: RTM_NEWTFILTER, RTM_DELTFILTER, RTM_GETTFILTER. However, implementation of this function involves a lot of branching on specific message type because most of the code is message-specific. This significantly complicates adding new functionality and doesn't provide much benefit of code reuse. Split tc_ctl_tfilter to three standalone functions that handle filter new, delete and get requests. The only truly protocol independent part of tc_ctl_tfilter is code that looks up queue, class, and block. Refactor this code to standalone tcf_block_find function that is used by all three new handlers. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01rtnetlink: Fix null-ptr-deref in rtnl_newlinkPrashant Bhole
In rtnl_newlink(), NULL check is performed on m_ops however member of ops is accessed. Fixed by accessing member of m_ops instead of ops. [ 345.432629] BUG: KASAN: null-ptr-deref in rtnl_newlink+0x400/0x1110 [ 345.432629] Read of size 4 at addr 0000000000000088 by task ip/986 [ 345.432629] [ 345.432629] CPU: 1 PID: 986 Comm: ip Not tainted 4.17.0-rc6+ #9 [ 345.432629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 345.432629] Call Trace: [ 345.432629] dump_stack+0xc6/0x150 [ 345.432629] ? dump_stack_print_info.cold.0+0x1b/0x1b [ 345.432629] ? kasan_report+0xb4/0x410 [ 345.432629] kasan_report.cold.4+0x8f/0x91 [ 345.432629] ? rtnl_newlink+0x400/0x1110 [ 345.432629] rtnl_newlink+0x400/0x1110 [...] Fixes: ccf8dbcd062a ("rtnetlink: Remove VLA usage") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01kcm: Fix use-after-free caused by clonned socketsKirill Tkhai
(resend for properly queueing in patchwork) kcm_clone() creates kernel socket, which does not take net counter. Thus, the net may die before the socket is completely destructed, i.e. kcm_exit_net() is executed before kcm_done(). Reported-by: syzbot+5f1a04e374a635efc426@syzkaller.appspotmail.com Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-01ipvs: add ipv6 support to ftpJulian Anastasov
Add support for FTP commands with extended format (RFC 2428): - FTP EPRT: IPv4 and IPv6, active mode, similar to PORT - FTP EPSV: IPv4 and IPv6, passive mode, similar to PASV. EPSV response usually contains only port but we allow real server to provide different address We restrict control and data connection to be from same address family. Allow the "(" and ")" to be optional in PASV response. Also, add ipvsh argument to the pkt_in/pkt_out handlers to better access the payload after transport header. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01ipvs: add full ipv6 support to nfctJulian Anastasov
Prepare NFCT to support IPv6 for FTP: - Do not restrict the expectation callback to PF_INET - Split the debug messages, so that the 160-byte limitation in IP_VS_DBG_BUF is not exceeded when printing many IPv6 addresses. This means no more than 3 addresses in one message, i.e. 1 tuple with 2 addresses or 1 connection with 3 addresses. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nft_fwd_netdev: allow to forward packets via neighbour layerPablo Neira Ayuso
This allows us to forward packets from the netdev family via neighbour layer, so you don't need an explicit link-layer destination when using this expression from rules. The ttl/hop_limit field is decremented. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nf_tables: check msg_type before nft_trans_set(trans)Alexey Kodanev
The patch moves the "trans->msg_type == NFT_MSG_NEWSET" check before using nft_trans_set(trans). Otherwise we can get out of bounds read. For example, KASAN reported the one when running 0001_cache_handling_0 nft test. In this case "trans->msg_type" was NFT_MSG_NEWTABLE: [75517.177808] BUG: KASAN: slab-out-of-bounds in nft_set_lookup_global+0x22f/0x270 [nf_tables] [75517.279094] Read of size 8 at addr ffff881bdb643fc8 by task nft/7356 ... [75517.375605] CPU: 26 PID: 7356 Comm: nft Tainted: G E 4.17.0-rc7.1.x86_64 #1 [75517.489587] Hardware name: Oracle Corporation SUN SERVER X4-2 [75517.618129] Call Trace: [75517.648821] dump_stack+0xd1/0x13b [75517.691040] ? show_regs_print_info+0x5/0x5 [75517.742519] ? kmsg_dump_rewind_nolock+0xf5/0xf5 [75517.799300] ? lock_acquire+0x143/0x310 [75517.846738] print_address_description+0x85/0x3a0 [75517.904547] kasan_report+0x18d/0x4b0 [75517.949892] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.019153] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.088420] ? nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.157689] nft_set_lookup_global+0x22f/0x270 [nf_tables] [75518.224869] nf_tables_newsetelem+0x1a5/0x5d0 [nf_tables] [75518.291024] ? nft_add_set_elem+0x2280/0x2280 [nf_tables] [75518.357154] ? nla_parse+0x1a5/0x300 [75518.401455] ? kasan_kmalloc+0xa6/0xd0 [75518.447842] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink] [75518.507743] ? nfnetlink_rcv+0x7a5/0x1bdf [nfnetlink] [75518.569745] ? nfnl_err_reset+0x3c0/0x3c0 [nfnetlink] [75518.631711] ? lock_acquire+0x143/0x310 [75518.679133] ? netlink_deliver_tap+0x9b/0x1070 [75518.733840] ? kasan_unpoison_shadow+0x31/0x40 [75518.788542] netlink_unicast+0x45d/0x680 [75518.837111] ? __isolate_free_page+0x890/0x890 [75518.891913] ? netlink_attachskb+0x6b0/0x6b0 [75518.944542] netlink_sendmsg+0x6fa/0xd30 [75518.993107] ? netlink_unicast+0x680/0x680 [75519.043758] ? netlink_unicast+0x680/0x680 [75519.094402] sock_sendmsg+0xd9/0x160 [75519.138810] ___sys_sendmsg+0x64d/0x980 [75519.186234] ? copy_msghdr_from_user+0x350/0x350 [75519.243118] ? lock_downgrade+0x650/0x650 [75519.292738] ? do_raw_spin_unlock+0x5d/0x250 [75519.345456] ? _raw_spin_unlock+0x24/0x30 [75519.395065] ? __handle_mm_fault+0xbde/0x3410 [75519.448830] ? sock_setsockopt+0x3d2/0x1940 [75519.500516] ? __lock_acquire.isra.25+0xdc/0x19d0 [75519.558448] ? lock_downgrade+0x650/0x650 [75519.608057] ? __audit_syscall_entry+0x317/0x720 [75519.664960] ? __fget_light+0x58/0x250 [75519.711325] ? __sys_sendmsg+0xde/0x170 [75519.758850] __sys_sendmsg+0xde/0x170 [75519.804193] ? __ia32_sys_shutdown+0x90/0x90 [75519.856725] ? syscall_trace_enter+0x897/0x10e0 [75519.912354] ? trace_event_raw_event_sys_enter+0x920/0x920 [75519.979432] ? __audit_syscall_entry+0x720/0x720 [75520.036118] do_syscall_64+0xa3/0x3d0 [75520.081248] ? prepare_exit_to_usermode+0x47/0x1d0 [75520.139904] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75520.201680] RIP: 0033:0x7fc153320ba0 [75520.245772] RSP: 002b:00007ffe294c3638 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [75520.337708] RAX: ffffffffffffffda RBX: 00007ffe294c4820 RCX: 00007fc153320ba0 [75520.424547] RDX: 0000000000000000 RSI: 00007ffe294c46b0 RDI: 0000000000000003 [75520.511386] RBP: 00007ffe294c47b0 R08: 0000000000000004 R09: 0000000002114090 [75520.598225] R10: 00007ffe294c30a0 R11: 0000000000000246 R12: 00007ffe294c3660 [75520.684961] R13: 0000000000000001 R14: 00007ffe294c3650 R15: 0000000000000001 [75520.790946] Allocated by task 7356: [75520.833994] kasan_kmalloc+0xa6/0xd0 [75520.878088] __kmalloc+0x189/0x450 [75520.920107] nft_trans_alloc_gfp+0x20/0x190 [nf_tables] [75520.983961] nf_tables_newtable+0xcd0/0x1bd0 [nf_tables] [75521.048857] nfnetlink_rcv+0xc43/0x1bdf [nfnetlink] [75521.108655] netlink_unicast+0x45d/0x680 [75521.157013] netlink_sendmsg+0x6fa/0xd30 [75521.205271] sock_sendmsg+0xd9/0x160 [75521.249365] ___sys_sendmsg+0x64d/0x980 [75521.296686] __sys_sendmsg+0xde/0x170 [75521.341822] do_syscall_64+0xa3/0x3d0 [75521.386957] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75521.467867] Freed by task 23454: [75521.507804] __kasan_slab_free+0x132/0x180 [75521.558137] kfree+0x14d/0x4d0 [75521.596005] free_rt_sched_group+0x153/0x280 [75521.648410] sched_autogroup_create_attach+0x19a/0x520 [75521.711330] ksys_setsid+0x2ba/0x400 [75521.755529] __ia32_sys_setsid+0xa/0x10 [75521.802850] do_syscall_64+0xa3/0x3d0 [75521.848090] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [75521.929000] The buggy address belongs to the object at ffff881bdb643f80 which belongs to the cache kmalloc-96 of size 96 [75522.079797] The buggy address is located 72 bytes inside of 96-byte region [ffff881bdb643f80, ffff881bdb643fe0) [75522.221234] The buggy address belongs to the page: [75522.280100] page:ffffea006f6d90c0 count:1 mapcount:0 mapping:0000000000000000 index:0x0 [75522.377443] flags: 0x2fffff80000100(slab) [75522.426956] raw: 002fffff80000100 0000000000000000 0000000000000000 0000000180200020 [75522.521275] raw: ffffea006e6fafc0 0000000c0000000c ffff881bf180f400 0000000000000000 [75522.615601] page dumped because: kasan: bad access detected Fixes: 37a9cc525525 ("netfilter: nf_tables: add generation mask to sets") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: xt_CT: Reject the non-null terminated string from user spaceGao Feng
The helper and timeout strings are from user-space, we need to make sure they are null terminated. If not, evil user could make kernel read the unexpected memory, even print it when fail to find by the following codes. pr_info_ratelimited("No such helper \"%s\"\n", helper_name); Signed-off-by: Gao Feng <gfree.wind@vip.163.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nfnetlink: Remove VLA usageKees Cook
In the quest to remove all stack VLA usage from the kernel[1], this allocates the maximum size expected for all possible attrs and adds sanity-checks at both registration and usage to make sure nothing gets out of sync. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nf_flow_table: attach dst to skbsJason A. Donenfeld
Some drivers, such as vxlan and wireguard, use the skb's dst in order to determine things like PMTU. They therefore loose functionality when flow offloading is enabled. So, we ensure the skb has it before xmit'ing it in the offloading path. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nf_tables: fix chain dependency validationPablo Neira Ayuso
The following ruleset: add table ip filter add chain ip filter input { type filter hook input priority 4; } add chain ip filter ap add rule ip filter input jump ap add rule ip filter ap masquerade results in a panic, because the masquerade extension should be rejected from the filter chain. The existing validation is missing a chain dependency check when the rule is added to the non-base chain. This patch fixes the problem by walking down the rules from the basechains, searching for either immediate or lookup expressions, then jumping to non-base chains and again walking down the rules to perform the expression validation, so we make sure the full ruleset graph is validated. This is done only once from the commit phase, in case of problem, we abort the transaction and perform fine grain validation for error reporting. This patch requires 003087911af2 ("netfilter: nfnetlink: allow commit to fail") to achieve this behaviour. This patch also adds a cleanup callback to nfnl batch interface to reset the validate state from the exit path. As a result of this patch, nf_tables_check_loops() doesn't use ->validate to check for loops, instead it just checks for immediate expressions. Reported-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nf_tables: Add audit support to log statementPhil Sutter
This extends log statement to support the behaviour achieved with AUDIT target in iptables. Audit logging is enabled via a pseudo log level 8. In this case any other settings like log prefix are ignored since audit log format is fixed. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: nf_tables: add support for native socket matchingMáté Eckl
Now it can only match the transparent flag of an ip/ipv6 socket. Signed-off-by: Máté Eckl <ecklm94@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-01netfilter: fix ptr_ret.cocci warningskbuild test robot
net/netfilter/nft_numgen.c:117:1-3: WARNING: PTR_ERR_OR_ZERO can be used net/netfilter/nft_hash.c:180:1-3: WARNING: PTR_ERR_OR_ZERO can be used net/netfilter/nft_hash.c:223:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci Fixes: b9ccc07e3f31 ("netfilter: nft_hash: add map lookups for hashing operations") Fixes: d734a2888922 ("netfilter: nft_numgen: add map lookups for numgen statements") CC: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-05-31net-sysfs: Fix memory leak in XPS configurationAlexander Duyck
This patch reorders the error cases in showing the XPS configuration so that we hold off on memory allocation until after we have verified that we can support XPS on a given ring. Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31rtnetlink: Remove VLA usageKees Cook
In the quest to remove all stack VLA usage from the kernel[1], this allocates the maximum size expected for all possible types and adds sanity-checks at both registration and usage to make sure nothing gets out of sync. This matches the proposed VLA solution for nfnetlink[2]. The values chosen here were based on finding assignments for .maxtype and .slave_maxtype and manually counting the enums: slave_maxtype (max 33): IFLA_BRPORT_MAX 33 IFLA_BOND_SLAVE_MAX 9 maxtype (max 45): IFLA_BOND_MAX 28 IFLA_BR_MAX 45 __IFLA_CAIF_HSI_MAX 8 IFLA_CAIF_MAX 4 IFLA_CAN_MAX 16 IFLA_GENEVE_MAX 12 IFLA_GRE_MAX 25 IFLA_GTP_MAX 5 IFLA_HSR_MAX 7 IFLA_IPOIB_MAX 4 IFLA_IPTUN_MAX 21 IFLA_IPVLAN_MAX 3 IFLA_MACSEC_MAX 15 IFLA_MACVLAN_MAX 7 IFLA_PPP_MAX 2 __IFLA_RMNET_MAX 4 IFLA_VLAN_MAX 6 IFLA_VRF_MAX 2 IFLA_VTI_MAX 7 IFLA_VXLAN_MAX 28 VETH_INFO_MAX 2 VXCAN_INFO_MAX 2 This additionally changes maxtype and slave_maxtype fields to unsigned, since they're only ever using positive values. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com [2] https://patchwork.kernel.org/patch/10439647/ Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31net/ncsi: Fix array size in dumpit handlerSamuel Mendoza-Jonas
With CONFIG_CC_STACKPROTECTOR enabled the kernel panics as below when parsing a NCSI_CMD_PKG_INFO command: [ 150.149711] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08 [ 150.149711] [ 150.159919] CPU: 0 PID: 1301 Comm: ncsi-netlink Not tainted 4.13.16-468cbec6d2c91239332cb91b1f0a73aafcb6f0c6 #1 [ 150.170004] Hardware name: Generic DT based system [ 150.174852] [<80109930>] (unwind_backtrace) from [<80106bc4>] (show_stack+0x20/0x24) [ 150.182641] [<80106bc4>] (show_stack) from [<805d36e4>] (dump_stack+0x20/0x28) [ 150.189888] [<805d36e4>] (dump_stack) from [<801163ac>] (panic+0xdc/0x278) [ 150.196780] [<801163ac>] (panic) from [<801162cc>] (__stack_chk_fail+0x20/0x24) [ 150.204111] [<801162cc>] (__stack_chk_fail) from [<805cff08>] (ncsi_pkg_info_all_nl+0x244/0x258) [ 150.212912] [<805cff08>] (ncsi_pkg_info_all_nl) from [<804f939c>] (genl_lock_dumpit+0x3c/0x54) [ 150.221535] [<804f939c>] (genl_lock_dumpit) from [<804f873c>] (netlink_dump+0xf8/0x284) [ 150.229550] [<804f873c>] (netlink_dump) from [<804f8d44>] (__netlink_dump_start+0x124/0x17c) [ 150.237992] [<804f8d44>] (__netlink_dump_start) from [<804f9880>] (genl_rcv_msg+0x1c8/0x3d4) [ 150.246440] [<804f9880>] (genl_rcv_msg) from [<804f9174>] (netlink_rcv_skb+0xd8/0x134) [ 150.254361] [<804f9174>] (netlink_rcv_skb) from [<804f96a4>] (genl_rcv+0x30/0x44) [ 150.261850] [<804f96a4>] (genl_rcv) from [<804f7790>] (netlink_unicast+0x198/0x234) [ 150.269511] [<804f7790>] (netlink_unicast) from [<804f7ffc>] (netlink_sendmsg+0x368/0x3b0) [ 150.277783] [<804f7ffc>] (netlink_sendmsg) from [<804abea4>] (sock_sendmsg+0x24/0x34) [ 150.285625] [<804abea4>] (sock_sendmsg) from [<804ac1dc>] (___sys_sendmsg+0x244/0x260) [ 150.293556] [<804ac1dc>] (___sys_sendmsg) from [<804ad98c>] (__sys_sendmsg+0x5c/0x9c) [ 150.301400] [<804ad98c>] (__sys_sendmsg) from [<804ad9e4>] (SyS_sendmsg+0x18/0x1c) [ 150.308984] [<804ad9e4>] (SyS_sendmsg) from [<80102640>] (ret_fast_syscall+0x0/0x3c) [ 150.316743] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: 805cff08 This turns out to be because the attrs array in ncsi_pkg_info_all_nl() is initialised to a length of NCSI_ATTR_MAX which is the maximum attribute number, not the number of attributes. Fixes: 955dc68cb9b2 ("net/ncsi: Add generic netlink family") Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31cls_flower: Fix incorrect idr release when failing to modify rulePaul Blakey
When we fail to modify a rule, we incorrectly release the idr handle of the unmodified old rule. Fix that by checking if we need to release it. Fixes: fe2502e49b58 ("net_sched: remove cls_flower idr on failure") Reported-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31net: bridge: Notify about bridge VLANsPetr Machata
A driver might need to react to changes in settings of brentry VLANs. Therefore send switchdev port notifications for these as well. Reuse SWITCHDEV_OBJ_ID_PORT_VLAN for this purpose. Listeners should use netif_is_bridge_master() on orig_dev to determine whether the notification is about a bridge port or a bridge. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31dsa: port: Ignore bridge VLAN eventsPetr Machata
A follow-up patch enables emitting VLAN notifications for the bridge CPU port in addition to the existing slave port notifications. These notifications have orig_dev set to the bridge in question. Because there's no specific support for these VLANs, just ignore the notifications to maintain the current behavior. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31net: bridge: Extract br_vlan_add_existing()Petr Machata
Extract the code that deals with adding a preexisting VLAN to bridge CPU port to a separate function. A follow-up patch introduces a need to roll back operations in this block due to an error, and this split will make the error-handling code clearer. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31net: bridge: Extract boilerplate around switchdev_port_obj_*()Petr Machata
A call to switchdev_port_obj_add() or switchdev_port_obj_del() involves initializing a struct switchdev_obj_port_vlan, a piece of code that repeats on each call site almost verbatim. While in the current codebase there is just one duplicated add call, the follow-up patches add more of both add and del calls. Thus to remove the duplication, extract the repetition into named functions and reuse. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31net: remove bypassed check in sch_direct_xmit()Song Liu
Checking netif_xmit_frozen_or_stopped() at the end of sch_direct_xmit() is being bypassed. This is because "ret" from sch_direct_xmit() will be either NETDEV_TX_OK or NETDEV_TX_BUSY, and only ret == NETDEV_TX_OK == 0 will reach the condition: if (ret && netif_xmit_frozen_or_stopped(txq)) return false; This patch cleans up the code by removing the whole condition. For more discussion about this, please refer to https://marc.info/?t=152727195700008 Signed-off-by: Song Liu <songliubraving@fb.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31tcp: minor optimization around tcp_hdr() usage in receive pathYafang Shao
This is additional to the commit ea1627c20c34 ("tcp: minor optimizations around tcp_hdr() usage"). At this point, skb->data is same with tcp_hdr() as tcp header has not been pulled yet. So use the less expensive one to get the tcp header. Remove the third parameter of tcp_rcv_established() and put it into the function body. Furthermore, the local variables are listed as a reverse christmas tree :) Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-31xfrm Fix potential error pointer dereference in xfrm_bundle_create.Steffen Klassert
We may derference an invalid pointer in the error path of xfrm_bundle_create(). Fix this by returning this error pointer directly instead of assigning it to xdst0. Fixes: 45b018beddb6 ("ipsec: Create and use new helpers for dst child access.") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-05-30bpf: Change bpf_fib_lookup to return -EAFNOSUPPORT for unsupported address ↵David Ahern
families Update bpf_fib_lookup to return -EAFNOSUPPORT for unsupported address families. Allows userspace to probe for support as more are added (e.g., AF_MPLS). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-30Bluetooth: Re-use kstrtobool_from_user()Andy Shevchenko
Re-use kstrtobool_from_user() instead of open coded variant. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2018-05-29bpf: Verify flags in bpf_fib_lookupDavid Ahern
Verify flags argument contains only known flags. Allows programs to probe for support as more are added. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-29bpf: hide the unused 'off' variableYueHaibing
The local variable is only used while CONFIG_IPV6 enabled net/core/filter.c: In function ‘sk_msg_convert_ctx_access’: net/core/filter.c:6489:6: warning: unused variable ‘off’ [-Wunused-variable] int off; ^ This puts it into #ifdef. Fixes: 303def35f64e ("bpf: allow sk_msg programs to read sock fields") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-05-29bpfilter: fix a build errYueHaibing
gcc-7.3.0 report following err: HOSTCC net/bpfilter/main.o In file included from net/bpfilter/main.c:9:0: ./include/uapi/linux/bpf.h:12:10: fatal error: linux/bpf_common.h: No such file or directory #include <linux/bpf_common.h> remove it by adding a include path. Fixes: d2ba09c17a06 ("net: add skeleton of bpfilter kernel module") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29net/ipv6: Add support for specifying metric of connected routesDavid Ahern
Add support for IFA_RT_PRIORITY to ipv6 addresses. If the metric is changed on an existing address then the new route is inserted before removing the old one. Since the metric is one of the route keys, the prefix route can not be atomically replaced. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29net/ipv4: Add support for specifying metric of connected routesDavid Ahern
Add support for IFA_RT_PRIORITY to ipv4 addresses. If the metric is changed on an existing address then the new route is inserted before removing the old one. Since the metric is one of the route keys, the prefix route can not be replaced. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29net/ipv6: Pass ifa6_config struct to inet6_addr_modifyDavid Ahern
Update inet6_addr_modify to take ifa6_config argument versus a parameter list. This is an argument move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-05-29net/ipv6: Pass ifa6_config struct to inet6_addr_addDavid Ahern
Move the creation of struct ifa6_config up to callers of inet6_addr_add. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>