summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-01-16net: delete /proc THIS_MODULE referencesAlexey Dobriyan
/proc has been ignoring struct file_operations::owner field for 10 years. Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where inode->i_fop is initialized with proxy struct file_operations for regular files: - if (de->proc_fops) - inode->i_fop = de->proc_fops; + if (de->proc_fops) { + if (S_ISREG(inode->i_mode)) + inode->i_fop = &proc_reg_file_ops; + else + inode->i_fop = de->proc_fops; + } VFS stopped pinning module at this point. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16tipc: fix race condition at topology server receiveJon Maloy
We have identified a race condition during reception of socket events and messages in the topology server. - The function tipc_close_conn() is releasing the corresponding struct tipc_subscriber instance without considering that there may still be items in the receive work queue. When those are scheduled, in the function tipc_receive_from_work(), they are using the subscriber pointer stored in struct tipc_conn, without first checking if this is valid or not. This will sometimes lead to crashes, as the next call of tipc_conn_recvmsg() will access the now deleted item. We fix this by making the usage of this pointer conditional on whether the connection is active or not. I.e., we check the condition test_bit(CF_CONNECTED) before making the call tipc_conn_recvmsg(). - Since the two functions may be running on different cores, the condition test described above is not enough. tipc_close_conn() may come in between and delete the subscriber item after the condition test is done, but before tipc_conn_recv_msg() is finished. This happens less frequently than the problem described above, but leads to the same symptoms. We fix this by using the existing sk_callback_lock for mutual exclusion in the two functions. In addition, we have to move a call to tipc_conn_terminate() outside the mentioned lock to avoid deadlock. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16Merge tag 'mac80211-for-davem-2018-01-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== More fixes: * hwsim: - properly flush deletion works at module unload - validate # of channels passed from userspace * cfg80211: - fix RCU locking regression - initialize on-stack channel data for nl80211 event - check dev_set_name() return value ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16sctp: do not allow the v4 socket to bind a v4mapped v6 addressXin Long
The check in sctp_sockaddr_af is not robust enough to forbid binding a v4mapped v6 addr on a v4 socket. The worse thing is that v4 socket's bind_verify would not convert this v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4 socket bound a v6 addr. This patch is to fix it by doing the common sa.sa_family check first, then AF_INET check for v4mapped v6 addrs. Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.") Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbufXin Long
After commit cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep"), it may change to lock another sk if the asoc has been peeled off in sctp_wait_for_sndbuf. However, the asoc's new sk could be already closed elsewhere, as it's in the sendmsg context of the old sk that can't avoid the new sk's closing. If the sk's last one refcnt is held by this asoc, later on after putting this asoc, the new sk will be freed, while under it's own lock. This patch is to revert that commit, but fix the old issue by returning error under the old sk's lock. Fixes: cea0cc80a677 ("sctp: use the right sk after waking up from wait_buf sleep") Reported-by: syzbot+ac6ea7baa4432811eb50@syzkaller.appspotmail.com Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16sctp: reinit stream if stream outcnt has been change by sinit in sendmsgXin Long
After introducing sctp_stream structure, sctp uses stream->outcnt as the out stream nums instead of c.sinit_num_ostreams. However when users use sinit in cmsg, it only updates c.sinit_num_ostreams in sctp_sendmsg. At that moment, stream->outcnt is still using previous value. If it's value is not updated, the sinit_num_ostreams of sinit could not really work. This patch is to fix it by updating stream->outcnt and reiniting stream if stream outcnt has been change by sinit in sendmsg. Fixes: a83863174a61 ("sctp: prepare asoc stream for stream reconf") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16devlink: Add relation between dpipe and resourceArkadi Sharshevsky
The hardware processes which are modeled via dpipe commonly use some internal hardware resources. Such relation can improve the understanding of hardware limitations. The number of resource's unit consumed per table's entry are also provided for each table. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16devlink: Add support for reloadArkadi Sharshevsky
Add support for performing driver hot reload. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16devlink: Add support for resource abstractionArkadi Sharshevsky
Add support for hardware resource abstraction over devlink. Each resource is identified via id, furthermore it contains information regarding its size and its related sub resources. Each resource can also provide its current occupancy. In some cases the sizes of some resources can be changed, yet for those changes to take place a hot driver reload may be needed. The reload capability will be introduced in the next patch. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16devlink: Add per devlink instance lockArkadi Sharshevsky
This is a preparation before introducing resources and hot reload support. Currently there are two global lock where one protects all devlink access, and the second one protects devlink port access. This patch adds per devlink instance lock which protects the internal members which are the sb/dpipe/ resource/ports. By introducing this lock the global devlink port lock can be discarded. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16xprtrdma: Introduce rpcrdma_mw_unmap_and_putChuck Lever
Clean up: Code review suggested that a common bit of code can be placed into a helper function, and this gives us fewer places to stick an "I DMA unmapped something" trace point. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Remove usage of "mw"Chuck Lever
Clean up: struct rpcrdma_mw was named after Memory Windows, but xprtrdma no longer supports a Memory Window registration mode. Rename rpcrdma_mw and its fields to reduce confusion and make the code more sensible to read. Renaming "mw" was suggested by Tom Talpey, the author of the original xprtrdma implementation. It's a good idea, but I haven't done this until now because it's a huge diffstat for no benefit other than code readability. However, I'm about to introduce static trace points that expose a few of xprtrdma's internal data structures. They should make sense in the trace report, and it's reasonable to treat trace points as a kernel API contract which might be difficult to change later. While I'm churning things up, two additional changes: - rename variables unhelpfully called "r" to "mr", to improve code clarity, and - rename the MR-related helper functions using the form "rpcrdma_mr_<verb>", to be consistent with other areas of the code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Replace all usage of "frmr" with "frwr"Chuck Lever
Clean up: Over time, the industry has adopted the term "frwr" instead of "frmr". The term "frwr" is now more widely recognized. For the past couple of years I've attempted to add new code using "frwr" , but there still remains plenty of older code that still uses "frmr". Replace all usage of "frmr" to avoid confusion. While we're churning code, rename variables unhelpfully called "f" to "frwr", to improve code clarity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Don't clear RPC_BC_PA_IN_USE on pre-allocated rpc_rqst'sChuck Lever
No need for the overhead of atomically setting and clearing this bit flag for every use of a pre-allocated backchannel rpc_rqst. These are a distinct pool of rpc_rqsts that are used only for callback operations, so it is safe to simply leave the bit set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Split xprt_rdma_send_requestChuck Lever
Clean up. @rqst is set up differently for backchannel Replies. For example, rqst->rq_task and task->tk_client are both NULL. So it is easier to understand and maintain this code path if it is separated. Also, we can get rid of the confusing rl_connect_cookie hack in rpcrdma_bc_receive_call. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: buf_free not called for CB repliesChuck Lever
Since commit 5a6d1db45569 ("SUNRPC: Add a transport-specific private field in rpc_rqst"), the rpc_rqst's for RPC-over-RDMA backchannel operations leave rq_buffer set to NULL. xprt_release does not invoke ->op->buf_free when rq_buffer is NULL. The RPCRDMA_REQ_F_BACKCHANNEL check in xprt_rdma_free is therefore redundant because xprt_rdma_free is not invoked for backchannel requests. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Move unmap-safe logic to rpcrdma_marshal_reqChuck Lever
Clean up. This logic is related to marshaling the request, and I'd like to keep everything that touches req->rl_registered close together, for CPU cache efficiency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Support IPv6 in xprt_rdma_set_portChuck Lever
Clean up a harmless oversight. xprtrdma's ->set_port method has never properly supported IPv6. This issue has never been a problem because NFS/RDMA mounts have always required "port=20049", thus so far, rpcbind is not invoked for these mounts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Remove another sockaddr_storage field (cdata::addr)Chuck Lever
Save more space in struct rpcrdma_xprt by removing the redundant "addr" field from struct rpcrdma_create_data_internal. Wherever we have rpcrdma_xprt, we also have the rpc_xprt, which has a sockaddr_storage field with the same content. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Initialize the xprt address string array earlierChuck Lever
This makes the address strings available for debugging messages in earlier stages of transport set up. The first benefit is to get rid of the single-use rep_remote_addr field, saving 128+ bytes in struct rpcrdma_ep. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Remove unused padding variablesChuck Lever
Clean up. Remove fields that should have been removed by commit b3221d6a53c4 ("xprtrdma: Remove logic that constructs RDMA_MSGP type calls"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Remove ri_reminv_expectedChuck Lever
Clean up. Commit b5f0afbea4f2 ("xprtrdma: Per-connection pad optimization") should have removed this. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Per-mode handling for Remote InvalidationChuck Lever
Refactoring change: Remote Invalidation is particular to the memory registration mode that is use. Use a callout instead of a generic function to handle Remote Invalidation. This gets rid of the 8-byte flags field in struct rpcrdma_mw, of which only a single bit flag has been allocated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Eliminate unnecessary lock cycle in xprt_rdma_send_requestChuck Lever
The rpcrdma_req is not shared yet, and its associated Send hasn't been posted, thus RMW should be safe. There's no need for the expense of a lock cycle here. Fixes: 0ba6f37012db ("xprtrdma: Refactor rpcrdma_deferred_completion") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Fix backchannel allocation of extra rpcrdma_repsChuck Lever
The backchannel code uses rpcrdma_recv_buffer_put to add new reps to the free rep list. This also decrements rb_recv_count, which spoofs the receive overrun logic in rpcrdma_buffer_get_rep. Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)") replaced the original open-coded list_add with a call to rpcrdma_recv_buffer_put(), but then a year later, commit 05c974669ece ("xprtrdma: Fix receive buffer accounting") added rep accounting to rpcrdma_recv_buffer_put. It was an oversight to let the backchannel continue to use this function. The fix this, let's combine the "add to free list" logic with rpcrdma_create_rep. Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in rpcrdma_buffer_create and then allocate additional rpcrdma_reps in rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel set-up is sufficient. Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16xprtrdma: Fix buffer leak after transport set up failureChuck Lever
This leak has been around forever, and is exceptionally rare. EINVAL causes mount to fail with "an incorrect mount option was specified" although it's not likely that one of the mount options is incorrect. Instead, return ENODEV in this case, as this appears to be an issue with system or device configuration rather than a specific mount option. Some obsolete comments are also removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2018-01-16netfilter: nf_defrag: move NF_CONNTRACK bits into #ifdefArnd Bergmann
We cannot access the skb->_nfct field when CONFIG_NF_CONNTRACK is disabled: net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag': net/ipv4/netfilter/nf_defrag_ipv4.c:83:9: error: 'struct sk_buff' has no member named '_nfct' net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'ipv6_defrag': net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68:9: error: 'struct sk_buff' has no member named '_nfct' Both functions already have an #ifdef for this, so let's move the check in there. Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16netfilter: nf_defrag: mark xt_table structures 'const' againArnd Bergmann
As a side-effect of adding the module option, we now get a section mismatch warning: WARNING: net/ipv4/netfilter/iptable_raw.o(.data+0x1c): Section mismatch in reference from the variable packet_raw to the function .init.text:iptable_raw_table_init() The variable packet_raw references the function __init iptable_raw_table_init() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console Apparently it's ok to link to a __net_init function from .rodata but not from .data. We can address this by rearranging the logic so that the structure is read-only again. Instead of writing to the .priority field later, we have an extra copies of the structure with that flag. An added advantage is that that we don't have writable function pointers with this approach. Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460Subash Abhinov Kasiviswanathan
ipv6_defrag pulls network headers before fragment header. In case of an error, the netfilter layer is currently dropping these packets. This results in failure of some IPv6 standards tests which passed on older kernels due to the netfilter framework using cloning. The test case run here is a check for ICMPv6 error message replies when some invalid IPv6 fragments are sent. This specific test case is listed in https://www.ipv6ready.org/docs/Core_Conformance_Latest.pdf in the Extension Header Processing Order section. A packet with unrecognized option Type 11 is sent and the test expects an ICMP error in line with RFC2460 section 4.2 - 11 - discard the packet and, only if the packet's Destination Address was not a multicast address, send an ICMP Parameter Problem, Code 2, message to the packet's Source Address, pointing to the unrecognized Option Type. Since netfilter layer now drops all invalid IPv6 frag packets, we no longer see the ICMP error message and fail the test case. To fix this, save the transport header. If defrag is unable to process the packet due to RFC2460, restore the transport header and allow packet to be processed by stack. There is no change for other packet processing paths. Tested by confirming that stack sends an ICMP error when it receives these packets. Also tested that fragmented ICMP pings succeed. v1->v2: Instead of cloning always, save the transport_header and restore it in case of this specific error. Update the title and commit message accordingly. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16netfilter: x_tables: don't return garbage pointer on modprobe failureFlorian Westphal
request_module may return a positive error result from modprobe, if we cast this to ERR_PTR this returns a garbage result (it passes IS_ERR checks). Fix it by ignoring modprobe return values entirely, just retry the table lookup instead. Reported-by: syzbot+980925dbfbc7f93bc2ef@syzkaller.appspotmail.com Fixes: 03d13b6868a2 ("netfilter: xtables: add and use xt_request_find_table_lock") Fixes: 20651cefd25f ("netfilter: x_tables: unbreak module auto loading") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16netfilter: nf_tables: flow_offload depends on flow_tableArnd Bergmann
Without CONFIG_NF_FLOW_TABLE, the new nft_flow_offload module produces a link error: net/netfilter/nft_flow_offload.o: In function `nft_flow_offload_iterate_cleanup': nft_flow_offload.c:(.text+0xb0): undefined reference to `nf_flow_table_iterate' net/netfilter/nft_flow_offload.o: In function `flow_offload_iterate_cleanup': nft_flow_offload.c:(.text+0x160): undefined reference to `flow_offload_dead' net/netfilter/nft_flow_offload.o: In function `nft_flow_offload_eval': nft_flow_offload.c:(.text+0xc4c): undefined reference to `flow_offload_alloc' nft_flow_offload.c:(.text+0xc64): undefined reference to `flow_offload_add' nft_flow_offload.c:(.text+0xc94): undefined reference to `flow_offload_free' This adds a Kconfig dependency for it. Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-15Merge tag 'linux-can-next-for-4.16-20180105' of ↵David S. Miller
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2017-12-01,Re: pull-request: can-next this is a pull request of 7 patches for net-next/master. All patches are by me. Patch 6 is for the "can_raw" protocol and add error checking to the bind() function. All other patches clean up the coding style and remove unused parameters in various CAN drivers and infrastructure. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net: Restrict unwhitelisted proto caches to size 0Kees Cook
Now that protocols have been annotated (the copy of icsk_ca_ops->name is of an ops field from outside the slab cache): $ git grep 'copy_.*_user.*sk.*->' caif/caif_socket.c: copy_from_user(&cf_sk->conn_req.param.data, ov, ol)) { ipv4/raw.c: if (copy_from_user(&raw_sk(sk)->filter, optval, optlen)) ipv4/raw.c: copy_to_user(optval, &raw_sk(sk)->filter, len)) ipv4/tcp.c: if (copy_to_user(optval, icsk->icsk_ca_ops->name, len)) ipv4/tcp.c: if (copy_to_user(optval, icsk->icsk_ulp_ops->name, len)) ipv6/raw.c: if (copy_from_user(&raw6_sk(sk)->filter, optval, optlen)) ipv6/raw.c: if (copy_to_user(optval, &raw6_sk(sk)->filter, len)) sctp/socket.c: if (copy_from_user(&sctp_sk(sk)->subscribe, optval, optlen)) sctp/socket.c: if (copy_to_user(optval, &sctp_sk(sk)->subscribe, len)) sctp/socket.c: if (copy_to_user(optval, &sctp_sk(sk)->initmsg, len)) we can switch the default proto usercopy region to size 0. Any protocols needing to add whitelisted regions must annotate the fields with the useroffset and usersize fields of struct proto. This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15sctp: Copy struct sctp_sock.autoclose to userspace using put_user()David Windsor
The autoclose field can be copied with put_user(), so there is no need to use copy_to_user(). In both cases, hardened usercopy is being bypassed since the size is constant, and not open to runtime manipulation. This patch is verbatim from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Signed-off-by: David Windsor <dave@nullcore.net> [kees: adjust commit log] Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15sctp: Define usercopy region in SCTP proto slab cacheDavid Windsor
The SCTP socket event notification subscription information need to be copied to/from userspace. In support of usercopy hardening, this patch defines a region in the struct proto slab cache in which userspace copy operations are allowed. Additionally moves the usercopy fields to be adjacent for the region to cover both. example usage trace: net/sctp/socket.c: sctp_getsockopt_events(...): ... copy_to_user(..., &sctp_sk(sk)->subscribe, len) sctp_setsockopt_events(...): ... copy_from_user(&sctp_sk(sk)->subscribe, ..., optlen) sctp_getsockopt_initmsg(...): ... copy_to_user(..., &sctp_sk(sk)->initmsg, len) This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Signed-off-by: David Windsor <dave@nullcore.net> [kees: split from network patch, move struct members adjacent] [kees: add SCTPv6 struct whitelist, provide usage trace] Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15caif: Define usercopy region in caif proto slab cacheDavid Windsor
The CAIF channel connection request parameters need to be copied to/from userspace. In support of usercopy hardening, this patch defines a region in the struct proto slab cache in which userspace copy operations are allowed. example usage trace: net/caif/caif_socket.c: setsockopt(...): ... copy_from_user(&cf_sk->conn_req.param.data, ..., ol) This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Signed-off-by: David Windsor <dave@nullcore.net> [kees: split from network patch, provide usage trace] Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15ip: Define usercopy region in IP proto slab cacheDavid Windsor
The ICMP filters for IPv4 and IPv6 raw sockets need to be copied to/from userspace. In support of usercopy hardening, this patch defines a region in the struct proto slab cache in which userspace copy operations are allowed. example usage trace: net/ipv4/raw.c: raw_seticmpfilter(...): ... copy_from_user(&raw_sk(sk)->filter, ..., optlen) raw_geticmpfilter(...): ... copy_to_user(..., &raw_sk(sk)->filter, len) net/ipv6/raw.c: rawv6_seticmpfilter(...): ... copy_from_user(&raw6_sk(sk)->filter, ..., optlen) rawv6_geticmpfilter(...): ... copy_to_user(..., &raw6_sk(sk)->filter, len) This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Signed-off-by: David Windsor <dave@nullcore.net> [kees: split from network patch, provide usage trace] Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15net: Define usercopy region in struct proto slab cacheDavid Windsor
In support of usercopy hardening, this patch defines a region in the struct proto slab cache in which userspace copy operations are allowed. Some protocols need to copy objects to/from userspace, and they can declare the region via their proto structure with the new usersize and useroffset fields. Initially, if no region is specified (usersize == 0), the entire field is marked as whitelisted. This allows protocols to be whitelisted in subsequent patches. Once all protocols have been annotated, the full-whitelist default can be removed. This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Signed-off-by: David Windsor <dave@nullcore.net> [kees: adjust commit log, split off per-proto patches] [kees: add logic for by-default full-whitelist] Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANYJim Westfall
Map all lookup neigh keys to INADDR_ANY for loopback/point-to-point devices to avoid making an entry for every remote ip the device needs to talk to. This used the be the old behavior but became broken in a263b3093641f (ipv4: Make neigh lookups directly in output packet path) and later removed in 0bb4087cbec0 (ipv4: Fix neigh lookup keying over loopback/point-to-point devices) because it was broken. Signed-off-by: Jim Westfall <jwestfall@surrealistic.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net: Allow neigh contructor functions ability to modify the primary_keyJim Westfall
Use n->primary_key instead of pkey to account for the possibility that a neigh constructor function may have modified the primary_key value. Signed-off-by: Jim Westfall <jwestfall@surrealistic.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15Revert "openvswitch: Add erspan tunnel support."William Tu
This reverts commit ceaa001a170e43608854d5290a48064f57b565ed. The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed as a nested attribute to support all ERSPAN v1 and v2's fields. The current attr is a be32 supporting only one field. Thus, this patch reverts it and later patch will redo it using nested attr. Signed-off-by: William Tu <u9012063@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Cc: Pravin Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv6: Fix build with gcc-4.4.5Ido Schimmel
Emil reported the following compiler errors: net/ipv6/route.c: In function `rt6_sync_up`: net/ipv6/route.c:3586: error: unknown field `nh_flags` specified in initializer net/ipv6/route.c:3586: warning: missing braces around initializer net/ipv6/route.c:3586: warning: (near initialization for `arg.<anonymous>`) net/ipv6/route.c: In function `rt6_sync_down_dev`: net/ipv6/route.c:3695: error: unknown field `event` specified in initializer net/ipv6/route.c:3695: warning: missing braces around initializer net/ipv6/route.c:3695: warning: (near initialization for `arg.<anonymous>`) Problem is with the named initializers for the anonymous union members. Fix this by adding curly braces around the initialization. Fixes: 4c981e28d373 ("ipv6: Prepare to handle multiple netdev events") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Emil S Tantilov <emils.tantilov@gmail.com> Tested-by: Emil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15tipc: fix bug during lookup of multicast destination nodesJon Maloy
In commit 232d07b74a33 ("tipc: improve groupcast scope handling") we inadvertently broke non-group multicast transmission when changing the parameter 'domain' to 'scope' in the function tipc_nametbl_lookup_dst_nodes(). We missed to make the corresponding change in the calling function, with the result that the lookup always fails. A closer anaysis reveals that this parameter is not needed at all. Non-group multicast is hard coded to use CLUSTER_SCOPE, and in the current implementation this will be delivered to all matching destinations except those which are published with NODE_SCOPE on other nodes. Since such publications never will be visible on the sending node anyway, it makes no sense to discriminate by scope at all. We now remove this parameter altogether. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net: Convert atomic_t net::count to refcount_tKirill Tkhai
Since net could be obtained from RCU lists, and there is a race with net destruction, the patch converts net::count to refcount_t. This provides sanity checks for the cases of incrementing counter of already dead net, when maybe_get_net() has to used instead of get_net(). Drivers: allyesconfig and allmodconfig are OK. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net/tls: Fix inverted error codes to avoid endless loopr.hering@avm.de
sendfile() calls can hang endless with using Kernel TLS if a socket error occurs. Socket error codes must be inverted by Kernel TLS before returning because they are stored with positive sign. If returned non-inverted they are interpreted as number of bytes sent, causing endless looping of the splice mechanic behind sendfile(). Signed-off-by: Robert Hering <r.hering@avm.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv6: ip6_make_skb() needs to clear cork.base.dstEric Dumazet
In my last patch, I missed fact that cork.base.dst was not initialized in ip6_make_skb() : If ip6_setup_cork() returns an error, we might attempt a dst_release() on some random pointer. Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15sctp: removed unused var from sctp_make_authMarcelo Ricardo Leitner
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15sctp: avoid compiler warning on implicit fallthruMarcelo Ricardo Leitner
These fall-through are expected. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15net: ipv4: Make "ip route get" match iif lo rules again.Lorenzo Colitti
Commit 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") broke "ip route get" in the presence of rules that specify iif lo. Host-originated traffic always has iif lo, because ip_route_output_key_hash and ip6_route_output_flags set the flow iif to LOOPBACK_IFINDEX. Thus, putting "iif lo" in an ip rule is a convenient way to select only originated traffic and not forwarded traffic. inet_rtm_getroute used to match these rules correctly because even though it sets the flow iif to 0, it called ip_route_output_key which overwrites iif with LOOPBACK_IFINDEX. But now that it calls ip_route_output_key_hash_rcu, the ifindex will remain 0 and not match the iif lo in the rule. As a result, "ip route get" will return ENETUNREACH. Fixes: 3765d35ed8b9 ("net: ipv4: Convert inet_rtm_getroute to rcu versions of route lookup") Tested: https://android.googlesource.com/kernel/tests/+/master/net/test/multinetwork_test.py passes again Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15netlink: extack needs to be reset each time through loopDavid Ahern
syzbot triggered the WARN_ON in netlink_ack testing the bad_attr value. The problem is that netlink_rcv_skb loops over the skb repeatedly invoking the callback and without resetting the extack leaving potentially stale data. Initializing each time through avoids the WARN_ON. Fixes: 2d4bc93368f5a ("netlink: extended ACK reporting") Reported-by: syzbot+315fa6766d0f7c359327@syzkaller.appspotmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>