summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-06-12genetlink: clean up family attributes allocationsCong Wang
genl_family_rcv_msg_attrs_parse() and genl_family_rcv_msg_attrs_free() take a boolean parameter to determine whether allocate/free the family attrs. This is unnecessary as we can just check family->parallel_ops. More importantly, callers would not need to worry about pairing these parameters correctly after this patch. And this fixes a memory leak, as after commit c36f05559104 ("genetlink: fix memory leaks in genl_family_rcv_msg_dumpit()") we call genl_family_rcv_msg_attrs_parse() for both parallel and non-parallel cases. Fixes: c36f05559104 ("genetlink: fix memory leaks in genl_family_rcv_msg_dumpit()") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-12netfilter: nf_tables: hook list memleak in flowtable deletionPablo Neira Ayuso
After looking up for the flowtable hooks that need to be removed, release the hook objects in the deletion list. The error path needs to released these hook objects too. Fixes: abadb2f865d7 ("netfilter: nf_tables: delete devices from flowtable") Reported-by: syzbot+eb9d5924c51d6d59e094@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-11rxrpc: Fix race between incoming ACK parser and retransmitterDavid Howells
There's a race between the retransmission code and the received ACK parser. The problem is that the retransmission loop has to drop the lock under which it is iterating through the transmission buffer in order to transmit a packet, but whilst the lock is dropped, the ACK parser can crank the Tx window round and discard the packets from the buffer. The retransmission code then updated the annotations for the wrong packet and a later retransmission thought it had to retransmit a packet that wasn't there, leading to a NULL pointer dereference. Fix this by: (1) Moving the annotation change to before we drop the lock prior to transmission. This means we can't vary the annotation depending on the outcome of the transmission, but that's fine - we'll retransmit again later if it failed now. (2) Skipping the packet if the skb pointer is NULL. The following oops was seen: BUG: kernel NULL pointer dereference, address: 000000000000002d Workqueue: krxrpcd rxrpc_process_call RIP: 0010:rxrpc_get_skb+0x14/0x8a ... Call Trace: rxrpc_resend+0x331/0x41e ? get_vtime_delta+0x13/0x20 rxrpc_process_call+0x3c0/0x4ac process_one_work+0x18f/0x27f worker_thread+0x1a3/0x247 ? create_worker+0x17d/0x17d kthread+0xe6/0xeb ? kthread_delayed_work_timer_fn+0x83/0x83 ret_from_fork+0x1f/0x30 Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-11xdp: Fix xsk_generic_xmit errnoLi RongQing
Propagate sock_alloc_send_skb error code, not set it to EAGAIN unconditionally, when fail to allocate skb, which might cause that user space unnecessary loops. Fixes: 35fcde7f8deb ("xsk: support for Tx") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/1591852266-24017-1-git-send-email-lirongqing@baidu.com
2020-06-11tipc: fix NULL pointer dereference in tipc_disc_rcv()Tuong Lien
When a bearer is enabled, we create a 'tipc_discoverer' object to store the bearer related data along with a timer and a preformatted discovery message buffer for later probing... However, this is only carried after the bearer was set 'up', that left a race condition resulting in kernel panic. It occurs when a discovery message from a peer node is received and processed in bottom half (since the bearer is 'up' already) just before the discoverer object is created but is now accessed in order to update the preformatted buffer (with a new trial address, ...) so leads to the NULL pointer dereference. We solve the problem by simply moving the bearer 'up' setting to later, so make sure everything is ready prior to any message receiving. Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-11tipc: fix kernel WARNING in tipc_msg_append()Tuong Lien
syzbot found the following issue: WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 check_copy_size include/linux/thread_info.h:150 [inline] WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 copy_from_iter include/linux/uio.h:144 [inline] WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 tipc_msg_append+0x49a/0x5e0 net/tipc/msg.c:242 Kernel panic - not syncing: panic_on_warn set ... This happens after commit 5e9eeccc58f3 ("tipc: fix NULL pointer dereference in streaming") that tried to build at least one buffer even when the message data length is zero... However, it now exposes another bug that the 'mss' can be zero and the 'cpy' will be negative, thus the above kernel WARNING will appear! The zero value of 'mss' is never expected because it means Nagle is not enabled for the socket (actually the socket type was 'SOCK_SEQPACKET'), so the function 'tipc_msg_append()' must not be called at all. But that was in this particular case since the message data length was zero, and the 'send <= maxnagle' check became true. We resolve the issue by explicitly checking if Nagle is enabled for the socket, i.e. 'maxnagle != 0' before calling the 'tipc_msg_append()'. We also reinforce the function to against such a negative values if any. Reported-by: syzbot+75139a7d2605236b0b7f@syzkaller.appspotmail.com Fixes: c0bceb97db9e ("tipc: add smart nagle feature") Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-11Merge tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New features and improvements: - Sunrpc receive buffer sizes only change when establishing a GSS credentials - Add more sunrpc tracepoints - Improve on tracepoints to capture internal NFS I/O errors Other bugfixes and cleanups: - Move a dprintk() to after a call to nfs_alloc_fattr() - Fix off-by-one issues in rpc_ntop6 - Fix a few coccicheck warnings - Use the correct SPDX license identifiers - Fix rpc_call_done assignment for BIND_CONN_TO_SESSION - Replace zero-length array with flexible array - Remove duplicate headers - Set invalid blocks after NFSv4 writes to update space_used attribute - Fix direct WRITE throughput regression" * tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits) NFS: Fix direct WRITE throughput regression SUNRPC: rpc_xprt lifetime events should record xprt->state xprtrdma: Make xprt_rdma_slot_table_entries static nfs: set invalid blocks after NFSv4 writes NFS: remove redundant initialization of variable result sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by sysfs NFS: Add a tracepoint in nfs_set_pgio_error() NFS: Trace short NFS READs NFS: nfs_xdr_status should record the procedure name SUNRPC: Set SOFTCONN when destroying GSS contexts SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS SUNRPC: trace RPC client lifetime events SUNRPC: Trace transport lifetime events SUNRPC: Split the xdr_buf event class SUNRPC: Add tracepoint to rpc_call_rpcerror() SUNRPC: Update the RPC_SHOW_SOCKET() macro SUNRPC: Update the rpc_show_task_flags() macro SUNRPC: Trace GSS context lifetimes SUNRPC: receive buffer size estimation values almost never change ...
2020-06-11xprtrdma: Make xprt_rdma_slot_table_entries staticZou Wei
Fix the following sparse warning: net/sunrpc/xprtrdma/transport.c:71:14: warning: symbol 'xprt_rdma_slot_table_entries' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by ↵Xiongfeng Wang
sysfs When I cat parameter '/sys/module/sunrpc/parameters/auth_hashtable_size', it displays as follows. It is better to add a newline for easy reading. [root@hulk-202 ~]# cat /sys/module/sunrpc/parameters/auth_hashtable_size 16[root@hulk-202 ~]# Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11SUNRPC: Set SOFTCONN when destroying GSS contextsChuck Lever
Move the RPC_TASK_SOFTCONN flag into rpc_call_null_helper(). The only minor behavior change is that it is now also set when destroying GSS contexts. This gives a better guarantee that gss_send_destroy_context() will not hang for long if a connection cannot be established. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFTChuck Lever
Clean up. All of rpc_call_null_helper() call sites assert RPC_TASK_SOFT, so move that setting into rpc_call_null_helper() itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDSChuck Lever
Clean up. Commit a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.") made rpc_call_null_helper() set RPC_TASK_NULLCREDS unconditionally. Therefore there's no need for rpc_call_null_helper()'s call sites to set RPC_TASK_NULLCREDS. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11SUNRPC: trace RPC client lifetime eventsChuck Lever
The "create" tracepoint records parts of the rpc_create arguments, and the shutdown tracepoint records when the rpc_clnt is about to signal pending tasks and destroy auths. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11SUNRPC: Trace transport lifetime eventsChuck Lever
Refactor: Hoist create/destroy/disconnect tracepoints out of xprtrdma and into the generic RPC client. Some benefits include: - Enable tracing of xprt lifetime events for the socket transport types - Expose the different types of disconnect to help run down issues with lingering connections Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11SUNRPC: Split the xdr_buf event classChuck Lever
To help tie the recorded xdr_buf to a particular RPC transaction, the client side version of this class should display task ID information and the server side one should show the request's XID. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11SUNRPC: Add tracepoint to rpc_call_rpcerror()Chuck Lever
Add a tracepoint in another common exit point for failing RPCs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11SUNRPC: Trace GSS context lifetimesChuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11SUNRPC: receive buffer size estimation values almost never changeChuck Lever
Avoid unnecessary cache sloshing by placing the buffer size estimation update logic behind an atomic bit flag. The size of GSS information included in each wrapped Reply does not change during the lifetime of a GSS context. Therefore, the au_rslack and au_ralign fields need to be updated only once after establishing a fresh GSS credential. Thus a slack size update must occur after a cred is created, duplicated, renewed, or expires. I'm not sure I have this exactly right. A trace point is introduced to track updates to these variables to enable troubleshooting the problem if I missed a spot. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-06-11Merge tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Highlights: - Keep nfsd clients from unnecessarily breaking their own delegations. Note this requires a small kthreadd addition. The result is Tejun Heo's suggestion (see link), and he was OK with this going through my tree. - Patch nfsd/clients/ to display filenames, and to fix byte-order when displaying stateid's. - fix a module loading/unloading bug, from Neil Brown. - A big series from Chuck Lever with RPC/RDMA and tracing improvements, and lay some groundwork for RPC-over-TLS" Link: https://lore.kernel.org/r/1588348912-24781-1-git-send-email-bfields@redhat.com * tag 'nfsd-5.8' of git://linux-nfs.org/~bfields/linux: (49 commits) sunrpc: use kmemdup_nul() in gssp_stringify() nfsd: safer handling of corrupted c_type nfsd4: make drc_slab global, not per-net SUNRPC: Remove unreachable error condition in rpcb_getport_async() nfsd: Fix svc_xprt refcnt leak when setup callback client failed sunrpc: clean up properly in gss_mech_unregister() sunrpc: svcauth_gss_register_pseudoflavor must reject duplicate registrations. sunrpc: check that domain table is empty at module unload. NFSD: Fix improperly-formatted Doxygen comments NFSD: Squash an annoying compiler warning SUNRPC: Clean up request deferral tracepoints NFSD: Add tracepoints for monitoring NFSD callbacks NFSD: Add tracepoints to the NFSD state management code NFSD: Add tracepoints to NFSD's duplicate reply cache SUNRPC: svc_show_status() macro should have enum definitions SUNRPC: Restructure svc_udp_recvfrom() SUNRPC: Refactor svc_recvfrom() SUNRPC: Clean up svc_release_skb() functions SUNRPC: Refactor recvfrom path dealing with incomplete TCP receives SUNRPC: Replace dprintk() call sites in TCP receive path ...
2020-06-11net/filter: Permit reading NET in load_bytes_relative when MAC not setYiFei Zhu
Added a check in the switch case on start_header that checks for the existence of the header, and in the case that MAC is not set and the caller requests for MAC, -EFAULT. If the caller requests for NET then MAC's existence is completely ignored. There is no function to check NET header's existence and as far as cgroup_skb/egress is concerned it should always be set. Removed for ptr >= the start of header, considering offset is bounded unsigned and should always be true. len <= end - mac is redundant to ptr + len <= end. Fixes: 3eee1f75f2b9 ("bpf: fix bpf_skb_load_bytes_relative pkt length check") Signed-off-by: YiFei Zhu <zhuyifei@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/76bb820ddb6a95f59a772ecbd8c8a336f646b362.1591812755.git.zhuyifei@google.com
2020-06-10mptcp: don't leak msk in token containerPaolo Abeni
If a listening MPTCP socket has unaccepted sockets at close time, the related msks are freed via mptcp_sock_destruct(), which in turn does not invoke the proto->destroy() method nor the mptcp_token_destroy() function. Due to the above, the child msk socket is not removed from the token container, leading to later UaF. Address the issue explicitly removing the token even in the above error path. Fixes: 79c0949e9a09 ("mptcp: Add key generation and token tree") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-10Merge branch 'work.sysctl' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull sysctl fixes from Al Viro: "Fixups to regressions in sysctl series" * 'work.sysctl' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sysctl: reject gigantic reads/write to sysctl files cdrom: fix an incorrect __user annotation on cdrom_sysctl_info trace: fix an incorrect __user annotation on stack_trace_sysctl random: fix an incorrect __user annotation on proc_do_entropy net/sysctl: remove leftover __user annotations on neigh_proc_dointvec* net/sysctl: use cpumask_parse in flow_limit_cpu_sysctl
2020-06-10Merge branch 'rwonce/rework' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/will/linux Pull READ/WRITE_ONCE rework from Will Deacon: "This the READ_ONCE rework I've been working on for a while, which bumps the minimum GCC version and improves code-gen on arm64 when stack protector is enabled" [ Side note: I'm _really_ tempted to raise the minimum gcc version to 4.9, so that we can just say that we require _Generic() support. That would allow us to more cleanly handle a lot of the cases where we depend on very complex macros with 'sizeof' or __builtin_choose_expr() with __builtin_types_compatible_p() etc. This branch has a workaround for sparse not handling _Generic(), either, but that was already fixed in the sparse development branch, so it's really just gcc-4.9 that we'd require. - Linus ] * 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux: compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse compiler_types.h: Optimize __unqual_scalar_typeof compilation time compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long) compiler-types.h: Include naked type in __pick_integer_type() match READ_ONCE: Fix comment describing 2x32-bit atomicity gcov: Remove old GCC 3.4 support arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros READ_ONCE: Drop pointer qualifiers when reading from scalar types READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE() arm64: csum: Disable KASAN for do_csum() fault_inject: Don't rely on "return value" from WRITE_ONCE() net: tls: Avoid assigning 'const' pointer to non-const pointer netfilter: Avoid assigning 'const' pointer to non-const pointer compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
2020-06-10mptcp: fix races between shutdown and recvmsgPaolo Abeni
The msk sk_shutdown flag is set by a workqueue, possibly introducing some delay in user-space notification. If the last subflow carries some data with the fin packet, the user space can wake-up before RCV_SHUTDOWN is set. If it executes unblocking recvmsg(), it may return with an error instead of eof. Address the issue explicitly checking for eof in recvmsg(), when no data is found. Fixes: 59832e246515 ("mptcp: subflow: check parent mptcp socket on subflow state change") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-10nexthop: Fix fdb labeling for groupsDavid Ahern
fdb nexthops are marked with a flag. For standalone nexthops, a flag was added to the nh_info struct. For groups that flag was added to struct nexthop when it should have been added to the group information. Fix by removing the flag from the nexthop struct and adding a flag to nh_group that mirrors nh_info and is really only a caching of the individual types. Add a helper, nexthop_is_fdb, for use by the vxlan code and fixup the internal code to use the flag from either nh_info or nh_group. v2 - propagate fdb_nh in remove_nh_grp_entry Fixes: 38428d68719c ("nexthop: support for fdb ecmp nexthops") Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-10netfilter: ctnetlink: memleak in filter initialization error pathPablo Neira Ayuso
Release the filter object in case of error. Fixes: cb8aa9a3affb ("netfilter: ctnetlink: add kernel side filtering for dump") Reported-by: syzbot+38b8b548a851a01793c5@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-09dccp: Fix possible memleak in dccp_init and dccp_finiWang Hai
There are some memory leaks in dccp_init() and dccp_fini(). In dccp_fini() and the error handling path in dccp_init(), free lhash2 is missing. Add inet_hashinfo2_free_mod() to do it. If inet_hashinfo2_init_mod() failed in dccp_init(), percpu_counter_destroy() should be called to destroy dccp_orphan_count. It need to goto out_free_percpu when inet_hashinfo2_init_mod() failed. Fixes: c92c81df93df ("net: dccp: fix kernel crash on module load") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-09net: sched: export __netdev_watchdog_up()Valentin Longchamp
Since the quiesce/activate rework, __netdev_watchdog_up() is directly called in the ucc_geth driver. Unfortunately, this function is not available for modules and thus ucc_geth cannot be built as a module anymore. Fix it by exporting __netdev_watchdog_up(). Since the commit introducing the regression was backported to stable branches, this one should ideally be as well. Fixes: 79dde73cf9bc ("net/ethernet/freescale: rework quiesce/activate for ucc_geth") Signed-off-by: Valentin Longchamp <valentin@longchamp.me> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-09net: change addr_list_lock back to static keyCong Wang
The dynamic key update for addr_list_lock still causes troubles, for example the following race condition still exists: CPU 0: CPU 1: (RCU read lock) (RTNL lock) dev_mc_seq_show() netdev_update_lockdep_key() -> lockdep_unregister_key() -> netif_addr_lock_bh() because lockdep doesn't provide an API to update it atomically. Therefore, we have to move it back to static keys and use subclass for nest locking like before. In commit 1a33e10e4a95 ("net: partially revert dynamic lockdep key changes"), I already reverted most parts of commit ab92d68fc22f ("net: core: add generic lockdep keys"). This patch reverts the rest and also part of commit f3b0a18bb6cb ("net: remove unnecessary variables and callback"). After this patch, addr_list_lock changes back to using static keys and subclasses to satisfy lockdep. Thanks to dev->lower_level, we do not have to change back to ->ndo_get_lock_subclass(). And hopefully this reduces some syzbot lockdep noises too. Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com Cc: Taehee Yoo <ap420073@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-09bpf, sockhash: Synchronize delete from bucket list on map freeJakub Sitnicki
We can end up modifying the sockhash bucket list from two CPUs when a sockhash is being destroyed (sock_hash_free) on one CPU, while a socket that is in the sockhash is unlinking itself from it on another CPU it (sock_hash_delete_from_link). This results in accessing a list element that is in an undefined state as reported by KASAN: | ================================================================== | BUG: KASAN: wild-memory-access in sock_hash_free+0x13c/0x280 | Write of size 8 at addr dead000000000122 by task kworker/2:1/95 | | CPU: 2 PID: 95 Comm: kworker/2:1 Not tainted 5.7.0-rc7-02961-ge22c35ab0038-dirty #691 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 | Workqueue: events bpf_map_free_deferred | Call Trace: | dump_stack+0x97/0xe0 | ? sock_hash_free+0x13c/0x280 | __kasan_report.cold+0x5/0x40 | ? mark_lock+0xbc1/0xc00 | ? sock_hash_free+0x13c/0x280 | kasan_report+0x38/0x50 | ? sock_hash_free+0x152/0x280 | sock_hash_free+0x13c/0x280 | bpf_map_free_deferred+0xb2/0xd0 | ? bpf_map_charge_finish+0x50/0x50 | ? rcu_read_lock_sched_held+0x81/0xb0 | ? rcu_read_lock_bh_held+0x90/0x90 | process_one_work+0x59a/0xac0 | ? lock_release+0x3b0/0x3b0 | ? pwq_dec_nr_in_flight+0x110/0x110 | ? rwlock_bug.part.0+0x60/0x60 | worker_thread+0x7a/0x680 | ? _raw_spin_unlock_irqrestore+0x4c/0x60 | kthread+0x1cc/0x220 | ? process_one_work+0xac0/0xac0 | ? kthread_create_on_node+0xa0/0xa0 | ret_from_fork+0x24/0x30 | ================================================================== Fix it by reintroducing spin-lock protected critical section around the code that removes the elements from the bucket on sockhash free. To do that we also need to defer processing of removed elements, until out of atomic context so that we can unlink the socket from the map when holding the sock lock. Fixes: 90db6d772f74 ("bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200607205229.2389672-3-jakub@cloudflare.com
2020-06-09bpf, sockhash: Fix memory leak when unlinking sockets in sock_hash_freeJakub Sitnicki
When sockhash gets destroyed while sockets are still linked to it, we will walk the bucket lists and delete the links. However, we are not freeing the list elements after processing them, leaking the memory. The leak can be triggered by close()'ing a sockhash map when it still contains sockets, and observed with kmemleak: unreferenced object 0xffff888116e86f00 (size 64): comm "race_sock_unlin", pid 223, jiffies 4294731063 (age 217.404s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 81 de e8 41 00 00 00 00 c0 69 2f 15 81 88 ff ff ...A.....i/..... backtrace: [<00000000dd089ebb>] sock_hash_update_common+0x4ca/0x760 [<00000000b8219bd5>] sock_hash_update_elem+0x1d2/0x200 [<000000005e2c23de>] __do_sys_bpf+0x2046/0x2990 [<00000000d0084618>] do_syscall_64+0xad/0x9a0 [<000000000d96f263>] entry_SYSCALL_64_after_hwframe+0x49/0xb3 Fix it by freeing the list element when we're done with it. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200607205229.2389672-2-jakub@cloudflare.com
2020-06-09bpf/sockmap: Fix kernel panic at __tcp_bpf_recvmsgdihu
When user application calls read() with MSG_PEEK flag to read data of bpf sockmap socket, kernel panic happens at __tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg queue after read out under MSG_PEEK flag is set. Because it's not judged whether sk_msg is the last msg of ingress_msg queue, the next sk_msg may be the head of ingress_msg queue, whose memory address of sg page is invalid. So it's necessary to add check codes to prevent this problem. [20759.125457] BUG: kernel NULL pointer dereference, address: 0000000000000008 [20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E 5.4.32 #1 [20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS 4.1.12 06/18/2017 [20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300 [20759.270877] __tcp_bpf_recvmsg+0x12c/0x350 [20759.276099] tcp_bpf_recvmsg+0x113/0x370 [20759.281137] inet_recvmsg+0x55/0xc0 [20759.285734] __sys_recvfrom+0xc8/0x130 [20759.290566] ? __audit_syscall_entry+0x103/0x130 [20759.296227] ? syscall_trace_enter+0x1d2/0x2d0 [20759.301700] ? __audit_syscall_exit+0x1e4/0x290 [20759.307235] __x64_sys_recvfrom+0x24/0x30 [20759.312226] do_syscall_64+0x55/0x1b0 [20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: dihu <anny.hu@linux.alibaba.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20200605084625.9783-1-anny.hu@linux.alibaba.com
2020-06-09mmap locking API: convert mmap_sem API commentsMichel Lespinasse
Convert comments that reference old mmap_sem APIs to reference corresponding new mmap locking APIs instead. Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09mmap locking API: use coccinelle to convert mmap_sem rwsem call sitesMichel Lespinasse
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-08Merge tag 'rxrpc-fixes-20200605' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Fix hang due to missing notification Here's a fix for AF_RXRPC. Occasionally calls hang because there are circumstances in which rxrpc generate a notification when a call is completed - primarily because initial packet transmission failed and the call was killed off and an error returned. But the AFS filesystem driver doesn't check this under all circumstances, expecting failure to be delivered by asynchronous notification. There are two patches: the first moves the problematic bits out-of-line and the second contains the fix. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-08mptcp: bugfix for RM_ADDR option parsingGeliang Tang
In MPTCPOPT_RM_ADDR option parsing, the pointer "ptr" pointed to the "Subtype" octet, the pointer "ptr+1" pointed to the "Address ID" octet: +-------+-------+---------------+ |Subtype|(resvd)| Address ID | +-------+-------+---------------+ | | ptr ptr+1 We should set mp_opt->rm_id to the value of "ptr+1", not "ptr". This patch will fix this bug. Fixes: 3df523ab582c ("mptcp: Add ADD_ADDR handling") Signed-off-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-08net-zerocopy: use vm_insert_pages() for tcp rcv zerocopyArjun Roy
Use vm_insert_pages() for tcp receive zerocopy. Spin lock cycles (as reported by perf) drop from a couple of percentage points to a fraction of a percent. This results in a roughly 6% increase in efficiency, measured roughly as zerocopy receive count divided by CPU utilization. The intention of this patchset is to reduce atomic ops for tcp zerocopy receives, which normally hits the same spinlock multiple times consecutively. [akpm@linux-foundation.org: suppress gcc-7.2.0 warning] Link: http://lkml.kernel.org/r/20200128025958.43490-3-arjunroy.kdev@gmail.com Signed-off-by: Arjun Roy <arjunroy@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Cc: David Miller <davem@davemloft.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-08Merge tag 'mac80211-for-davem-2020-06-08' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Just a small update: * fix the deadlock on rfkill/wireless removal that a few people reported * fix an uninitialized variable * update wiki URLs ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-08netfilter: nft_set_pipapo: Disable preemption before getting per-CPU pointerStefano Brivio
The lkp kernel test robot reports, with CONFIG_DEBUG_PREEMPT enabled: [ 165.316525] BUG: using smp_processor_id() in preemptible [00000000] code: nft/6247 [ 165.319547] caller is nft_pipapo_insert+0x464/0x610 [nf_tables] [ 165.321846] CPU: 1 PID: 6247 Comm: nft Not tainted 5.6.0-rc5-01595-ge32a4dc6512ce3 #1 [ 165.332128] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 165.334892] Call Trace: [ 165.336435] dump_stack+0x8f/0xcb [ 165.338128] debug_smp_processor_id+0xb2/0xc0 [ 165.340117] nft_pipapo_insert+0x464/0x610 [nf_tables] [ 165.342290] ? nft_trans_alloc_gfp+0x1c/0x60 [nf_tables] [ 165.344420] ? rcu_read_lock_sched_held+0x52/0x80 [ 165.346460] ? nft_trans_alloc_gfp+0x1c/0x60 [nf_tables] [ 165.348543] ? __mmu_interval_notifier_insert+0xa0/0xf0 [ 165.350629] nft_add_set_elem+0x5ff/0xa90 [nf_tables] [ 165.352699] ? __lock_acquire+0x241/0x1400 [ 165.354573] ? __lock_acquire+0x241/0x1400 [ 165.356399] ? reacquire_held_locks+0x12f/0x200 [ 165.358384] ? nf_tables_valid_genid+0x1f/0x40 [nf_tables] [ 165.360502] ? nla_strcmp+0x10/0x50 [ 165.362199] ? nft_table_lookup+0x4f/0xa0 [nf_tables] [ 165.364217] ? nla_strcmp+0x10/0x50 [ 165.365891] ? nf_tables_newsetelem+0xd5/0x150 [nf_tables] [ 165.367997] nf_tables_newsetelem+0xd5/0x150 [nf_tables] [ 165.370083] nfnetlink_rcv_batch+0x4fd/0x790 [nfnetlink] [ 165.372205] ? __lock_acquire+0x241/0x1400 [ 165.374058] ? __nla_validate_parse+0x57/0x8a0 [ 165.375989] ? cap_inode_getsecurity+0x230/0x230 [ 165.377954] ? security_capable+0x38/0x50 [ 165.379795] nfnetlink_rcv+0x11d/0x140 [nfnetlink] [ 165.381779] netlink_unicast+0x1b2/0x280 [ 165.383612] netlink_sendmsg+0x351/0x470 [ 165.385439] sock_sendmsg+0x5b/0x60 [ 165.387133] ____sys_sendmsg+0x200/0x280 [ 165.388871] ? copy_msghdr_from_user+0xd9/0x160 [ 165.390805] ___sys_sendmsg+0x88/0xd0 [ 165.392524] ? __might_fault+0x3e/0x90 [ 165.394273] ? sock_getsockopt+0x3d5/0xbb0 [ 165.396021] ? __handle_mm_fault+0x545/0x6a0 [ 165.397822] ? find_held_lock+0x2d/0x90 [ 165.399593] ? __sys_sendmsg+0x5e/0xa0 [ 165.401338] __sys_sendmsg+0x5e/0xa0 [ 165.402979] do_syscall_64+0x60/0x280 [ 165.404680] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 165.406621] RIP: 0033:0x7ff1fa46e783 [ 165.408299] Code: c7 c0 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48 [ 165.414163] RSP: 002b:00007ffedf59ea78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 165.416804] RAX: ffffffffffffffda RBX: 00007ffedf59fc60 RCX: 00007ff1fa46e783 [ 165.419419] RDX: 0000000000000000 RSI: 00007ffedf59fb10 RDI: 0000000000000005 [ 165.421886] RBP: 00007ffedf59fc10 R08: 00007ffedf59ea54 R09: 0000000000000001 [ 165.424445] R10: 00007ff1fa630c6c R11: 0000000000000246 R12: 0000000000020000 [ 165.426954] R13: 0000000000000280 R14: 0000000000000005 R15: 00007ffedf59ea90 Disable preemption before accessing the lookup scratch area in nft_pipapo_insert(). Reported-by: kernel test robot <lkp@intel.com> Analysed-by: Florian Westphal <fw@strlen.de> Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-08Merge tag 'ceph-for-5.8-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "The highlights are: - OSD/MDS latency and caps cache metrics infrastructure for the filesytem (Xiubo Li). Currently available through debugfs and will be periodically sent to the MDS in the future. - support for replica reads (balanced and localized reads) for rbd and the filesystem (myself). The default remains to always read from primary, users can opt-in with the new crush_location and read_from_replica options. Note that reading from replica is safe for general use only since Octopus. - support for RADOS allocation hint flags (myself). Currently used by rbd to propagate the compressible/incompressible hint given with the new compression_hint map option and ready for passing on more advanced hints, e.g. based on fadvise() from the filesystem. - support for efficient cross-quota-realm renames (Luis Henriques) - assorted cap handling improvements and cleanups, particularly untangling some of the locking (Jeff Layton)" * tag 'ceph-for-5.8-rc1' of git://github.com/ceph/ceph-client: (29 commits) rbd: compression_hint option libceph: support for alloc hint flags libceph: read_from_replica option libceph: support for balanced and localized reads libceph: crush_location infrastructure libceph: decode CRUSH device/bucket types and names libceph: add non-asserting rbtree insertion helper ceph: skip checking caps when session reconnecting and releasing reqs ceph: make sure mdsc->mutex is nested in s->s_mutex to fix dead lock ceph: don't return -ESTALE if there's still an open file libceph, rbd: replace zero-length array with flexible-array ceph: allow rename operation under different quota realms ceph: normalize 'delta' parameter usage in check_quota_exceeded ceph: ceph_kick_flushing_caps needs the s_mutex ceph: request expedited service on session's last cap flush ceph: convert mdsc->cap_dirty to a per-session list ceph: reset i_requested_max_size if file write is not wanted ceph: throw a warning if we destroy session with mutex still locked ceph: fix potential race in ceph_check_caps ceph: document what protects i_dirty_item and i_flushing_item ...
2020-06-08netfilter: nft_set_rbtree: Don't account for expired elements on insertionStefano Brivio
While checking the validity of insertion in __nft_rbtree_insert(), we currently ignore conflicting elements and intervals only if they are not active within the next generation. However, if we consider expired elements and intervals as potentially conflicting and overlapping, we'll return error for entries that should be added instead. This is particularly visible with garbage collection intervals that are comparable with the element timeout itself, as reported by Mike Dillinger. Other than the simple issue of denying insertion of valid entries, this might also result in insertion of a single element (opening or closing) out of a given interval. With single entries (that are inserted as intervals of size 1), this leads in turn to the creation of new intervals. For example: # nft add element t s { 192.0.2.1 } # nft list ruleset [...] elements = { 192.0.2.1-255.255.255.255 } Always ignore expired elements active in the next generation, while checking for conflicts. It might be more convenient to introduce a new macro that covers both inactive and expired items, as this type of check also appears quite frequently in other set back-ends. This is however beyond the scope of this fix and can be deferred to a separate patch. Other than the overlap detection cases introduced by commit 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion"), we also have to cover the original conflict check dealing with conflicts between two intervals of size 1, which was introduced before support for timeout was introduced. This won't return an error to the user as -EEXIST is masked by nft if NLM_F_EXCL is not given, but would result in a silent failure adding the entry. Reported-by: Mike Dillinger <miked@softtalker.com> Cc: <stable@vger.kernel.org> # 5.6.x Fixes: 8d8540c4f5e0 ("netfilter: nft_set_rbtree: add timeout support") Fixes: 7c84d41416d8 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-06-08sunrpc: use kmemdup_nul() in gssp_stringify()Chen Zhou
It is more efficient to use kmemdup_nul() if the size is known exactly . According to doc: "Note: Use kmemdup_nul() instead if the size is known exactly." Signed-off-by: Chen Zhou <chenzhou10@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2020-06-08net/sysctl: use cpumask_parse in flow_limit_cpu_sysctlChristoph Hellwig
cpumask_parse_user works on __user pointers, so this is wrong now. Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler") Reported-by: build test robot <lkp@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-06-08net: fix wiki website url mac80211 and wireless filesFlavio Suligoi
In the files: - net/mac80211/rx.c - net/wireless/Kconfig the wiki url is still the old "wireless.kernel.org" instead of the new "wireless.wiki.kernel.org" Signed-off-by: Flavio Suligoi <f.suligoi@asem.it> Link: https://lore.kernel.org/r/20200605154112.16277-10-f.suligoi@asem.it Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-06-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: - Fix the build with certain Kconfig combinations for the Chelsio inline TLS device, from Rohit Maheshwar and Vinay Kumar Yadavi. - Fix leak in genetlink, from Cong Lang. - Fix out of bounds packet header accesses in seg6, from Ahmed Abdelsalam. - Two XDP fixes in the ENA driver, from Sameeh Jubran - Use rwsem in device rename instead of a seqcount because this code can sleep, from Ahmed S. Darwish. - Fix WoL regressions in r8169, from Heiner Kallweit. - Fix qed crashes in kdump mode, from Alok Prasad. - Fix the callbacks used for certain thermal zones in mlxsw, from Vadim Pasternak. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits) net: dsa: lantiq_gswip: fix and improve the unsupported interface error mlxsw: core: Use different get_trend() callbacks for different thermal zones net: dp83869: Reset return variable if PHY strap is read rhashtable: Drop raw RCU deref in nested_table_free cxgb4: Use kfree() instead kvfree() where appropriate net: qed: fixes crash while running driver in kdump kernel vsock/vmci: make vmci_vsock_transport_cb() static net: ethtool: Fix comment mentioning typo in IS_ENABLED() net: phy: mscc: fix Serdes configuration in vsc8584_config_init net: mscc: Fix OF_MDIO config check net: marvell: Fix OF_MDIO config check net: dp83867: Fix OF_MDIO config check net: dp83869: Fix OF_MDIO config check net: ethernet: mvneta: fix MVNETA_SKB_HEADROOM alignment ethtool: linkinfo: remove an unnecessary NULL check net/xdp: use shift instead of 64 bit division crypto/chtls:Fix compile error when CONFIG_IPV6 is disabled inet_connection_sock: clear inet_num out of destroy helper yam: fix possible memory leak in yam_init_driver lan743x: Use correct MAC_CR configuration for 1 GBit speed ...
2020-06-06Merge tag 'kbuild-v5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - fix warnings in 'make clean' for ARCH=um, hexagon, h8300, unicore32 - ensure to rebuild all objects when the compiler is upgraded - exclude system headers from dependency tracking and fixdep processing - fix potential bit-size mismatch between the kernel and BPF user-mode helper - add the new syntax 'userprogs' to build user-space programs for the target architecture (the same arch as the kernel) - compile user-space sample code under samples/ for the target arch instead of the host arch - make headers_install fail if a CONFIG option is leaked to user-space - sanitize the output format of scripts/checkstack.pl - handle ARM 'push' instruction in scripts/checkstack.pl - error out before modpost if a module name conflict is found - error out when multiple directories are passed to M= because this feature is broken for a long time - add CONFIG_DEBUG_INFO_COMPRESSED to support compressed debug info - a lot of cleanups of modpost - dump vmlinux symbols out into vmlinux.symvers, and reuse it in the second pass of modpost - do not run the second pass of modpost if nothing in modules is updated - install modules.builtin(.modinfo) by 'make install' as well as by 'make modules_install' because it is useful even when CONFIG_MODULES=n - add new command line variables, GZIP, BZIP2, LZOP, LZMA, LZ4, and XZ to allow users to use alternatives such as pigz, pbzip2, etc. * tag 'kbuild-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (96 commits) kbuild: add variables for compression tools Makefile: install modules.builtin even if CONFIG_MODULES=n mksysmap: Fix the mismatch of '.L' symbols in System.map kbuild: doc: rename LDFLAGS to KBUILD_LDFLAGS modpost: change elf_info->size to size_t modpost: remove is_vmlinux() helper modpost: strip .o from modname before calling new_module() modpost: set have_vmlinux in new_module() modpost: remove mod->skip struct member modpost: add mod->is_vmlinux struct member modpost: remove is_vmlinux() call in check_for_{gpl_usage,unused}() modpost: remove mod->is_dot_o struct member modpost: move -d option in scripts/Makefile.modpost modpost: remove -s option modpost: remove get_next_text() and make {grab,release_}file static modpost: use read_text_file() and get_line() for reading text files modpost: avoid false-positive file open error modpost: fix potential mmap'ed file overrun in get_src_version() modpost: add read_text_file() and get_line() helpers modpost: do not call get_modinfo() for vmlinux(.o) ...
2020-06-05Merge tag 'afs-next-20200604' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull AFS updates from David Howells: "There's some core VFS changes which affect a couple of filesystems: - Make the inode hash table RCU safe and providing some RCU-safe accessor functions. The search can then be done without taking the inode_hash_lock. Care must be taken because the object may be being deleted and no wait is made. - Allow iunique() to avoid taking the inode_hash_lock. - Allow AFS's callback processing to avoid taking the inode_hash_lock when using the inode table to find an inode to notify. - Improve Ext4's time updating. Konstantin Khlebnikov said "For now, I've plugged this issue with try-lock in ext4 lazy time update. This solution is much better." Then there's a set of changes to make a number of improvements to the AFS driver: - Improve callback (ie. third party change notification) processing by: (a) Relying more on the fact we're doing this under RCU and by using fewer locks. This makes use of the RCU-based inode searching outlined above. (b) Moving to keeping volumes in a tree indexed by volume ID rather than a flat list. (c) Making the server and volume records logically part of the cell. This means that a server record now points directly at the cell and the tree of volumes is there. This removes an N:M mapping table, simplifying things. - Improve keeping NAT or firewall channels open for the server callbacks to reach the client by actively polling the fileserver on a timed basis, instead of only doing it when we have an operation to process. - Improving detection of delayed or lost callbacks by including the parent directory in the list of file IDs to be queried when doing a bulk status fetch from lookup. We can then check to see if our copy of the directory has changed under us without us getting notified. - Determine aliasing of cells (such as a cell that is pointed to be a DNS alias). This allows us to avoid having ambiguity due to apparently different cells using the same volume and file servers. - Improve the fileserver rotation to do more probing when it detects that all of the addresses to a server are listed as non-responsive. It's possible that an address that previously stopped responding has become responsive again. Beyond that, lay some foundations for making some calls asynchronous: - Turn the fileserver cursor struct into a general operation struct and hang the parameters off of that rather than keeping them in local variables and hang results off of that rather than the call struct. - Implement some general operation handling code and simplify the callers of operations that affect a volume or a volume component (such as a file). Most of the operation is now done by core code. - Operations are supplied with a table of operations to issue different variants of RPCs and to manage the completion, where all the required data is held in the operation object, thereby allowing these to be called from a workqueue. - Put the standard "if (begin), while(select), call op, end" sequence into a canned function that just emulates the current behaviour for now. There are also some fixes interspersed: - Don't let the EACCES from ICMP6 mapping reach the user as such, since it's confusing as to whether it's a filesystem error. Convert it to EHOSTUNREACH. - Don't use the epoch value acquired through probing a server. If we have two servers with the same UUID but in different cells, it's hard to draw conclusions from them having different epoch values. - Don't interpret the argument to the CB.ProbeUuid RPC as a fileserver UUID and look up a fileserver from it. - Deal with servers in different cells having the same UUIDs. In the event that a CB.InitCallBackState3 RPC is received, we have to break the callback promises for every server record matching that UUID. - Don't let afs_statfs return values that go below 0. - Don't use running fileserver probe state to make server selection and address selection decisions on. Only make decisions on final state as the running state is cleared at the start of probing" Acked-by: Al Viro <viro@zeniv.linux.org.uk> (fs/inode.c part) * tag 'afs-next-20200604' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (27 commits) afs: Adjust the fileserver rotation algorithm to reprobe/retry more quickly afs: Show more a bit more server state in /proc/net/afs/servers afs: Don't use probe running state to make decisions outside probe code afs: Fix afs_statfs() to not let the values go below zero afs: Fix the by-UUID server tree to allow servers with the same UUID afs: Reorganise volume and server trees to be rooted on the cell afs: Add a tracepoint to track the lifetime of the afs_volume struct afs: Detect cell aliases 3 - YFS Cells with a canonical cell name op afs: Detect cell aliases 2 - Cells with no root volumes afs: Detect cell aliases 1 - Cells with root volumes afs: Implement client support for the YFSVL.GetCellName RPC op afs: Retain more of the VLDB record for alias detection afs: Fix handling of CB.ProbeUuid cache manager op afs: Don't get epoch from a server because it may be ambiguous afs: Build an abstraction around an "operation" concept afs: Rename struct afs_fs_cursor to afs_operation afs: Remove the error argument from afs_protocol_error() afs: Set error flag rather than return error from file status decode afs: Make callback processing more efficient. afs: Show more information in /proc/net/afs/servers ...
2020-06-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "A more active cycle than most of the recent past, with a few large, long discussed works this time. The RNBD block driver has been posted for nearly two years now, and flowing through RDMA due to it also introducing a new ULP. The removal of FMR has been a recurring discussion theme for a long time. And the usual smattering of features and bug fixes. Summary: - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa - Continuing driver cleanups in bnxt_re, hns - Big cleanup of mlx5 QP creation flows - More consistent use of src port and flow label when LAG is used and a mlx5 implementation - Additional set of cleanups for IB CM - 'RNBD' network block driver and target. This is a network block RDMA device specific to ionos's cloud environment. It brings strong multipath and resiliency capabilities. - Accelerated IPoIB for HFI1 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds - Support for exchanging the new IBTA defiend ECE data during RDMA CM exchanges - Removal of the very old and insecure FMR interface from all ULPs and drivers. FRWR should be preferred for at least a decade now" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits) RDMA/cm: Spurious WARNING triggered in cm_destroy_id() RDMA/mlx5: Return ECE DC support RDMA/mlx5: Don't rely on FW to set zeros in ECE response RDMA/mlx5: Return an error if copy_to_user fails IB/hfi1: Use free_netdev() in hfi1_netdev_free() RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr() RDMA/core: Move and rename trace_cm_id_create() IB/hfi1: Fix hfi1_netdev_rx_init() error handling RDMA: Remove 'max_map_per_fmr' RDMA: Remove 'max_fmr' RDMA/core: Remove FMR device ops RDMA/rdmavt: Remove FMR memory registration RDMA/mthca: Remove FMR support for memory registration RDMA/mlx4: Remove FMR support for memory registration RDMA/i40iw: Remove FMR leftovers RDMA/bnxt_re: Remove FMR leftovers RDMA/mlx5: Remove FMR leftovers RDMA/core: Remove FMR pool API RDMA/rds: Remove FMR support for memory registration RDMA/srp: Remove support for FMR memory registration ...
2020-06-05vsock/vmci: make vmci_vsock_transport_cb() staticStefano Garzarella
Fix the following gcc-9.3 warning when building with 'make W=1': net/vmw_vsock/vmci_transport.c:2058:6: warning: no previous prototype for ‘vmci_vsock_transport_cb’ [-Wmissing-prototypes] 2058 | void vmci_vsock_transport_cb(bool is_host) | ^~~~~~~~~~~~~~~~~~~~~~~ Fixes: b1bba80a4376 ("vsock/vmci: register vmci_transport only when VMCI guest/host are active") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-05ethtool: linkinfo: remove an unnecessary NULL checkDan Carpenter
This code generates a Smatch warning: net/ethtool/linkinfo.c:143 ethnl_set_linkinfo() warn: variable dereferenced before check 'info' (see line 119) Fortunately, the "info" pointer is never NULL so the check can be removed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>