summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-07-09Merge tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - Multiple patches to add support for fcntl() leases over NFSv4. - A sysfs interface to display more information about the various transport connections used by the RPC client - A sysfs interface to allow a suitably privileged user to offline a transport that may no longer point to a valid server - A sysfs interface to allow a suitably privileged user to change the server IP address used by the RPC client Stable fixes: - Two sunrpc fixes for deadlocks involving privileged rpc_wait_queues Bugfixes: - SUNRPC: Avoid a KASAN slab-out-of-bounds bug in xdr_set_page_base() - SUNRPC: prevent port reuse on transports which don't request it. - NFSv3: Fix memory leak in posix_acl_create() - NFS: Various fixes to attribute revalidation timeouts - NFSv4: Fix handling of non-atomic change attribute updates - NFSv4: If a server is down, don't cause mounts to other servers to hang as well - pNFS: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT - NFS: Fix mount failures due to incorrect setting of the has_sec_mnt_opts filesystem flag - NFS: Ensure nfs_readpage returns promptly when an internal error occurs - NFS: Fix fscache read from NFS after cache error - pNFS: Various bugfixes around the LAYOUTGET operation" * tag 'nfs-for-5.14-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits) NFSv4/pNFS: Return an error if _nfs4_pnfs_v3_ds_connect can't load NFSv3 NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times NFSv4/pnfs: Clean up layout get on open NFSv4/pnfs: Fix layoutget behaviour after invalidation NFSv4/pnfs: Fix the layout barrier update NFS: Fix fscache read from NFS after cache error NFS: Ensure nfs_readpage returns promptly when internal error occurs sunrpc: remove an offlined xprt using sysfs sunrpc: provide showing transport's state info in the sysfs directory sunrpc: display xprt's queuelen of assigned tasks via sysfs sunrpc: provide multipath info in the sysfs directory NFSv4.1 identify and mark RPC tasks that can move between transports sunrpc: provide transport info in the sysfs directory SUNRPC: take a xprt offline using sysfs sunrpc: add dst_attr attributes to the sysfs xprt directory SUNRPC for TCP display xprt's source port in sysfs xprt_info SUNRPC query transport's source port SUNRPC display xprt's main value in sysfs's xprt_info SUNRPC mark the first transport sunrpc: add add sysfs directory per xprt under each xprt_switch ...
2021-07-08net/ncsi: add dummy response handler for Intel boardsIvan Mikhaylov
Add the dummy response handler for Intel boards to prevent incorrect handling of OEM commands. Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08net/ncsi: add NCSI Intel OEM command to keep PHY upIvan Mikhaylov
This allows to keep PHY link up and prevents any channel resets during the host load. It is KEEP_PHY_LINK_UP option(Veto bit) in i210 datasheet which block PHY reset and power state changes. Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08net/ncsi: fix restricted cast warning of sparseIvan Mikhaylov
Sparse reports: net/ncsi/ncsi-rsp.c:406:24: warning: cast to restricted __be32 net/ncsi/ncsi-manage.c:732:33: warning: cast to restricted __be32 net/ncsi/ncsi-manage.c:756:25: warning: cast to restricted __be32 net/ncsi/ncsi-manage.c:779:25: warning: cast to restricted __be32 Signed-off-by: Ivan Mikhaylov <i.mikhaylov@yadro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08ipv6: tcp: drop silly ICMPv6 packet too big messagesEric Dumazet
While TCP stack scales reasonably well, there is still one part that can be used to DDOS it. IPv6 Packet too big messages have to lookup/insert a new route, and if abused by attackers, can easily put hosts under high stress, with many cpus contending on a spinlock while one is stuck in fib6_run_gc() ip6_protocol_deliver_rcu() icmpv6_rcv() icmpv6_notify() tcp_v6_err() tcp_v6_mtu_reduced() inet6_csk_update_pmtu() ip6_rt_update_pmtu() __ip6_rt_update_pmtu() ip6_rt_cache_alloc() ip6_dst_alloc() dst_alloc() ip6_dst_gc() fib6_run_gc() spin_lock_bh() ... Some of our servers have been hit by malicious ICMPv6 packets trying to _increase_ the MTU/MSS of TCP flows. We believe these ICMPv6 packets are a result of a bug in one ISP stack, since they were blindly sent back for _every_ (small) packet sent to them. These packets are for one TCP flow: 09:24:36.266491 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 09:24:36.266509 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 09:24:36.316688 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 09:24:36.316704 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 09:24:36.608151 IP6 Addr1 > Victim ICMP6, packet too big, mtu 1460, length 1240 TCP stack can filter some silly requests : 1) MTU below IPV6_MIN_MTU can be filtered early in tcp_v6_err() 2) tcp_v6_mtu_reduced() can drop requests trying to increase current MSS. This tests happen before the IPv6 routing stack is entered, thus removing the potential contention and route exhaustion. Note that IPv6 stack was performing these checks, but too late (ie : after the route has been added, and after the potential garbage collect war) v2: fix typo caught by Martin, thanks ! v3: exports tcp_mtu_to_mss(), caught by David, thanks ! Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Cc: Martin KaFai Lau <kafai@fb.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-08Merge part 2 of branch 'sysfs-devel'Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08Merge branch 'sysfs-devel'Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: remove an offlined xprt using sysfsOlga Kornievskaia
Once a transport has been put offline, this transport can be also removed from the list of transports. Any tasks that have been stuck on this transport would find the next available active transport and be re-tried. This transport would be removed from the xprt_switch list and freed. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: provide showing transport's state info in the sysfs directoryOlga Kornievskaia
In preparation of being able to change the xprt's state, add a way to show currect state of the transport. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: display xprt's queuelen of assigned tasks via sysfsOlga Kornievskaia
Once a task grabs a trasnport it's reflected in the queuelen of the rpc_xprt structure. Add display of that value in the xprt's info file in sysfs. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: provide multipath info in the sysfs directoryOlga Kornievskaia
Allow to query xrpt_switch attributes. Currently showing the following fields of the rpc_xprt_switch structure: xps_nxprts, xps_nactive, xps_queuelen. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: provide transport info in the sysfs directoryOlga Kornievskaia
Allow to query transport's attributes. Currently showing following fields of the rpc_xprt structure: state, last_used, cong, cwnd, max_reqs, min_reqs, num_reqs, sizes of queues binding, sending, pending, backlog. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08SUNRPC: take a xprt offline using sysfsOlga Kornievskaia
Using sysfs's xprt_state attribute, mark a particular transport offline. It will not be picked during the round-robin selection. It's not allowed to take the main (1st created transport associated with the rpc_client) offline. Also bring a transport back online via sysfs by writing "online" and that would allow for this transport to be picked during the round- robin selection. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: add dst_attr attributes to the sysfs xprt directoryOlga Kornievskaia
Allow to query and set the destination's address of a transport. Setting of the destination address is allowed only for TCP or RDMA based connections. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08SUNRPC for TCP display xprt's source port in sysfs xprt_infoOlga Kornievskaia
Using TCP connection's source port it is useful to match connections seen on the network traces to the xprts used by the linux nfs client. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08SUNRPC query transport's source portOlga Kornievskaia
Provide ability to query transport's source port. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08SUNRPC display xprt's main value in sysfs's xprt_infoOlga Kornievskaia
Display in sysfs in the information about the xprt if this is a main transport or not. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08SUNRPC mark the first transportOlga Kornievskaia
When an RPC client gets created it's first transport is special and should be marked a main transport. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: add add sysfs directory per xprt under each xprt_switchOlga Kornievskaia
Add individual transport directories under each transport switch group. For instance, for each nconnect=X connections there will be a transport directory. Naming conventions also identifies transport type -- xprt-<id>-<type> where type is udp, tcp, rdma, local, bc. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: add a symlink from rpc-client directory to the xprt_switchOlga Kornievskaia
An rpc client uses a transport switch and one ore more transports associated with that switch. Since transports are shared among rpc clients, create a symlink into the xprt_switch directory instead of duplicating entries under each rpc client. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: add xprt_switch direcotry to sunrpc's sysfsOlga Kornievskaia
Add xprt_switch directory to the sysfs and create individual xprt_swith subdirectories for multipath transport group. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: keep track of the xprt_class in rpc_xprt structureOlga Kornievskaia
We need to keep track of the type for a given transport. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: add IDs to multipathOlga Kornievskaia
This is used to uniquely identify sunrpc multipath objects in /sys. Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: add xprt idOlga Kornievskaia
This adds a unique identifier for a sunrpc transport in sysfs, which is similarly managed to the unique IDs of clients. Signed-off-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: Create per-rpc_clnt sysfs kobjectsOlga Kornievskaia
These will eventually have files placed under them for sysfs operations. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: Create a client/ subdirectory in the sunrpc sysfsOlga Kornievskaia
For network namespace separation. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08sunrpc: Create a sunrpc directory under /sys/kernel/Olga Kornievskaia
This is where we'll put per-rpc_client related files Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-07-08skbuff: Fix build with SKB extensions disabledFlorian Fainelli
We will fail to build with CONFIG_SKB_EXTENSIONS disabled after 8550ff8d8c75 ("skbuff: Release nfct refcount on napi stolen or re-used skbs") since there is an unconditionally use of skb_ext_find() without an appropriate stub. Simply build the code conditionally and properly guard against both COFNIG_SKB_EXTENSIONS as well as CONFIG_NET_TC_SKB_EXT being disabled. Fixes: Fixes: 8550ff8d8c75 ("skbuff: Release nfct refcount on napi stolen or re-used skbs") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07ipmr: Fix indentation issueRoy, UjjaL
Fixed indentation by removing extra spaces. Signed-off-by: Roy, UjjaL <royujjal@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07sock: unlock on error in sock_setsockopt()Dan Carpenter
If copy_from_sockptr() then we need to unlock before returning. Fixes: d463126e23f1 ("net: sock: extend SO_TIMESTAMPING for PHC binding") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07bpf: devmap: Implement devmap prog execution for generic XDPKumar Kartikeya Dwivedi
This lifts the restriction on running devmap BPF progs in generic redirect mode. To match native XDP behavior, it is invoked right before generic_xdp_tx is called, and only supports XDP_PASS/XDP_ABORTED/ XDP_DROP actions. We also return 0 even if devmap program drops the packet, as semantically redirect has already succeeded and the devmap prog is the last point before TX of the packet to device where it can deliver a verdict on the packet. This also means it must take care of freeing the skb, as xdp_do_generic_redirect callers only do that in case an error is returned. Since devmap entry prog is supported, remove the check in generic_xdp_install entirely. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-5-memxor@gmail.com
2021-07-07bpf: cpumap: Implement generic cpumapKumar Kartikeya Dwivedi
This change implements CPUMAP redirect support for generic XDP programs. The idea is to reuse the cpu map entry's queue that is used to push native xdp frames for redirecting skb to a different CPU. This will match native XDP behavior (in that RPS is invoked again for packet reinjected into networking stack). To be able to determine whether the incoming skb is from the driver or cpumap, we reuse skb->redirected bit that skips generic XDP processing when it is set. To always make use of this, CONFIG_NET_REDIRECT guard on it has been lifted and it is always available. >From the redirect side, we add the skb to ptr_ring with its lowest bit set to 1. This should be safe as skb is not 1-byte aligned. This allows kthread to discern between xdp_frames and sk_buff. On consumption of the ptr_ring item, the lowest bit is unset. In the end, the skb is simply added to the list that kthread is anyway going to maintain for xdp_frames converted to skb, and then received again by using netif_receive_skb_list. Bulking optimization for generic cpumap is left as an exercise for a future patch for now. Since cpumap entry progs are now supported, also remove check in generic_xdp_install for the cpumap. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-4-memxor@gmail.com
2021-07-07net: core: Split out code to run generic XDP progKumar Kartikeya Dwivedi
This helper can later be utilized in code that runs cpumap and devmap programs in generic redirect mode and adjust skb based on changes made to xdp_buff. When returning XDP_REDIRECT/XDP_TX, it invokes __skb_push, so whenever a generic redirect path invokes devmap/cpumap prog if set, it must __skb_pull again as we expect mac header to be pulled. It also drops the skb_reset_mac_len call after do_xdp_generic, as the mac_header and network_header are advanced by the same offset, so the difference (mac_len) remains constant. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210702111825.491065-2-memxor@gmail.com
2021-07-07bpf: Support specifying ingress via xdp_md context in BPF_PROG_TEST_RUNZvi Effron
Support specifying the ingress_ifindex and rx_queue_index of xdp_md contexts for BPF_PROG_TEST_RUN. The intended use case is to allow testing XDP programs that make decisions based on the ingress interface or RX queue. If ingress_ifindex is specified, look up the device by the provided index in the current namespace and use its xdp_rxq for the xdp_buff. If the rx_queue_index is out of range, or is non-zero when the ingress_ifindex is 0, return -EINVAL. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-4-zeffron@riotgames.com
2021-07-07bpf: Support input xdp_md context in BPF_PROG_TEST_RUNZvi Effron
Support passing a xdp_md via ctx_in/ctx_out in bpf_attr for BPF_PROG_TEST_RUN. The intended use case is to pass some XDP meta data to the test runs of XDP programs that are used as tail calls. For programs that use bpf_prog_test_run_xdp, support xdp_md input and output. Unlike with an actual xdp_md during a non-test run, data_meta must be 0 because it must point to the start of the provided user data. From the initial xdp_md, use data and data_end to adjust the pointers in the generated xdp_buff. All other non-zero fields are prohibited (with EINVAL). If the user has set ctx_out/ctx_size_out, copy the (potentially different) xdp_md back to the userspace. We require all fields of input xdp_md except the ones we explicitly support to be set to zero. The expectation is that in the future we might add support for more fields and we want to fail explicitly if the user runs the program on the kernel where we don't yet support them. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-3-zeffron@riotgames.com
2021-07-07bpf: Add function for XDP meta data length checkZvi Effron
This commit prepares to use the XDP meta data length check in multiple places by making it into a static inline function instead of a literal. Co-developed-by: Cody Haas <chaas@riotgames.com> Co-developed-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Cody Haas <chaas@riotgames.com> Signed-off-by: Lisa Watanabe <lwatanabe@riotgames.com> Signed-off-by: Zvi Effron <zeffron@riotgames.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210707221657.3985075-2-zeffron@riotgames.com
2021-07-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Do not refresh timeout in SYN_SENT for syn retransmissions. Add selftest for unreplied TCP connection, from Florian Westphal. 2) Fix null dereference from error path with hardware offload in nftables. 3) Remove useless nf_ct_gre_keymap_flush() from netns exit path, from Vasily Averin. 4) Missing rcu read-lock side in ctnetlink helper info dump, also from Vasily. 5) Do not mark RST in the reply direction coming after SYN packet for an out-of-sync entry, from Ali Abdallah and Florian Westphal. 6) Add tcp_ignore_invalid_rst sysctl to allow to disable out of segment RSTs, from Ali. 7) KCSAN fix for nf_conntrack_all_lock(), from Manfred Spraul. 8) Honor NFTA_LAST_SET in nft_last. 9) Fix incorrect arithmetics when restore last_jiffies in nft_last. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-07Merge tag 'nfsd-5.14' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: - add tracepoints for callbacks and for client creation and destruction - cache the mounts used for server-to-server copies - expose callback information in /proc/fs/nfsd/clients/*/info - don't hold locks unnecessarily while waiting for commits - update NLM to use xdr_stream, as we have for NFSv2/v3/v4 * tag 'nfsd-5.14' of git://linux-nfs.org/~bfields/linux: (69 commits) nfsd: fix NULL dereference in nfs3svc_encode_getaclres NFSD: Prevent a possible oops in the nfs_dirent() tracepoint nfsd: remove redundant assignment to pointer 'this' nfsd: Reduce contention for the nfsd_file nf_rwsem lockd: Update the NLMv4 SHARE results encoder to use struct xdr_stream lockd: Update the NLMv4 nlm_res results encoder to use struct xdr_stream lockd: Update the NLMv4 TEST results encoder to use struct xdr_stream lockd: Update the NLMv4 void results encoder to use struct xdr_stream lockd: Update the NLMv4 FREE_ALL arguments decoder to use struct xdr_stream lockd: Update the NLMv4 SHARE arguments decoder to use struct xdr_stream lockd: Update the NLMv4 SM_NOTIFY arguments decoder to use struct xdr_stream lockd: Update the NLMv4 nlm_res arguments decoder to use struct xdr_stream lockd: Update the NLMv4 UNLOCK arguments decoder to use struct xdr_stream lockd: Update the NLMv4 CANCEL arguments decoder to use struct xdr_stream lockd: Update the NLMv4 LOCK arguments decoder to use struct xdr_stream lockd: Update the NLMv4 TEST arguments decoder to use struct xdr_stream lockd: Update the NLMv4 void arguments decoder to use struct xdr_stream lockd: Update the NLMv1 SHARE results encoder to use struct xdr_stream lockd: Update the NLMv1 nlm_res results encoder to use struct xdr_stream lockd: Update the NLMv1 TEST results encoder to use struct xdr_stream ...
2021-07-06rpc: remove redundant initialization of variable statusColin Ian King
The variable status is being initialized with a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-07-06xprtrdma: Fix spelling mistakesZheng Yongjun
Fix some spelling mistakes in comments: succes ==> success Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-07-06ipv6: fix 'disable_policy' for fwd packetsNicolas Dichtel
The goal of commit df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl") was to have the disable_policy from ipv4 available on ipv6. However, it's not exactly the same mechanism. On IPv4, all packets coming from an interface, which has disable_policy set, bypass the policy check. For ipv6, this is done only for local packets, ie for packets destinated to an address configured on the incoming interface. Let's align ipv6 with ipv4 so that the 'disable_policy' sysctl has the same effect for both protocols. My first approach was to create a new kind of route cache entries, to be able to set DST_NOPOLICY without modifying routes. This would have added a lot of code. Because the local delivery path is already handled, I choose to focus on the forwarding path to minimize code churn. Fixes: df789fe75206 ("ipv6: Provide ipv6 version of "disable_policy" sysctl") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06tcp: fix tcp_init_transfer() to not reset icsk_ca_initializedNguyen Dinh Phi
This commit fixes a bug (found by syzkaller) that could cause spurious double-initializations for congestion control modules, which could cause memory leaks or other problems for congestion control modules (like CDG) that allocate memory in their init functions. The buggy scenario constructed by syzkaller was something like: (1) create a TCP socket (2) initiate a TFO connect via sendto() (3) while socket is in TCP_SYN_SENT, call setsockopt(TCP_CONGESTION), which calls: tcp_set_congestion_control() -> tcp_reinit_congestion_control() -> tcp_init_congestion_control() (4) receive ACK, connection is established, call tcp_init_transfer(), set icsk_ca_initialized=0 (without first calling cc->release()), call tcp_init_congestion_control() again. Note that in this sequence tcp_init_congestion_control() is called twice without a cc->release() call in between. Thus, for CC modules that allocate memory in their init() function, e.g, CDG, a memory leak may occur. The syzkaller tool managed to find a reproducer that triggered such a leak in CDG. The bug was introduced when that commit 8919a9b31eb4 ("tcp: Only init congestion control if not initialized already") introduced icsk_ca_initialized and set icsk_ca_initialized to 0 in tcp_init_transfer(), missing the possibility for a sequence like the one above, where a process could call setsockopt(TCP_CONGESTION) in state TCP_SYN_SENT (i.e. after the connect() or TFO open sendmsg()), which would call tcp_init_congestion_control(). It did not intend to reset any initialization that the user had already explicitly made; it just missed the possibility of that particular sequence (which syzkaller managed to find). Fixes: 8919a9b31eb4 ("tcp: Only init congestion control if not initialized already") Reported-by: syzbot+f1e24a0594d4e3a895d3@syzkaller.appspotmail.com Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com> Acked-by: Neal Cardwell <ncardwell@google.com> Tested-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06skbuff: Release nfct refcount on napi stolen or re-used skbsPaul Blakey
When multiple SKBs are merged to a new skb under napi GRO, or SKB is re-used by napi, if nfct was set for them in the driver, it will not be released while freeing their stolen head state or on re-use. Release nfct on napi's stolen or re-used SKBs, and in gro_list_prepare, check conntrack metadata diff. Fixes: 5c6b94604744 ("net/mlx5e: CT: Handle misses after executing CT action") Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-06netfilter: nft_last: incorrect arithmetics when restoring last usedPablo Neira Ayuso
Subtract the jiffies that have passed by to current jiffies to fix last used restoration. Fixes: 836382dc2471 ("netfilter: nf_tables: add last expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-06netfilter: nft_last: honor NFTA_LAST_SET on restorationPablo Neira Ayuso
NFTA_LAST_SET tells us if this expression has ever seen a packet, do not ignore this attribute when restoring the ruleset. Fixes: 836382dc2471 ("netfilter: nf_tables: add last expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-06netfilter: conntrack: Mark access for KCSANManfred Spraul
KCSAN detected an data race with ipc/sem.c that is intentional. As nf_conntrack_lock() uses the same algorithm: Update nf_conntrack_core as well: nf_conntrack_lock() contains a1) spin_lock() a2) smp_load_acquire(nf_conntrack_locks_all). a1) actually accesses one lock from an array of locks. nf_conntrack_locks_all() contains b1) nf_conntrack_locks_all=true (normal write) b2) spin_lock() b3) spin_unlock() b2 and b3 are done for every lock. This guarantees that nf_conntrack_locks_all() prevents any concurrent nf_conntrack_lock() owners: If a thread past a1), then b2) will block until that thread releases the lock. If the threat is before a1, then b3)+a1) ensure the write b1) is visible, thus a2) is guaranteed to see the updated value. But: This is only the latest time when b1) becomes visible. It may also happen that b1) is visible an undefined amount of time before the b3). And thus KCSAN will notice a data race. In addition, the compiler might be too clever. Solution: Use WRITE_ONCE(). Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-06netfilter: conntrack: add new sysctl to disable RST checkAli Abdallah
This patch adds a new sysctl tcp_ignore_invalid_rst to disable marking out of segments RSTs as INVALID. Signed-off-by: Ali Abdallah <aabdallah@suse.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-06netfilter: conntrack: improve RST handling when tuple is re-usedAli Abdallah
If we receive a SYN packet in original direction on an existing connection tracking entry, we let this SYN through because conntrack might be out-of-sync. Conntrack gets back in sync when server responds with SYN/ACK and state gets updated accordingly. However, if server replies with RST, this packet might be marked as INVALID because td_maxack value reflects the *old* conntrack state and not the state of the originator of the RST. Avoid td_maxack-based checks if previous packet was a SYN. Unfortunately that is not be enough: an out of order ACK in original direction updates last_index, so we still end up marking valid RST. Thus disable the sequence check when we are not in established state and the received RST has a sequence of 0. Because marking RSTs as invalid usually leads to unwanted timeouts, also skip RST sequence checks if a conntrack entry is already closing. Such entries can already be evicted via GC in case the table is full. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Ali Abdallah <aabdallah@suse.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-07-05Merge tag 'tty-5.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty / serial updates from Greg KH: "Here is the big set of tty and serial driver patches for 5.14-rc1. A bit more than normal, but nothing major, lots of cleanups. Highlights are: - lots of tty api cleanups and mxser driver cleanups from Jiri - build warning fixes - various serial driver updates - coding style cleanups - various tty driver minor fixes and updates - removal of broken and disable r3964 line discipline (finally!) All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (227 commits) serial: mvebu-uart: remove unused member nb from struct mvebu_uart arm64: dts: marvell: armada-37xx: Fix reg for standard variant of UART dt-bindings: mvebu-uart: fix documentation serial: mvebu-uart: correctly calculate minimal possible baudrate serial: mvebu-uart: do not allow changing baudrate when uartclk is not available serial: mvebu-uart: fix calculation of clock divisor tty: make linux/tty_flip.h self-contained serial: Prefer unsigned int to bare use of unsigned serial: 8250: 8250_omap: Fix possible interrupt storm on K3 SoCs serial: qcom_geni_serial: use DT aliases according to DT bindings Revert "tty: serial: Add UART driver for Cortina-Access platform" tty: serial: Add UART driver for Cortina-Access platform MAINTAINERS: add me back as mxser maintainer mxser: Documentation, fix typos mxser: Documentation, make the docs up-to-date mxser: Documentation, remove traces of callout device mxser: introduce mxser_16550A_or_MUST helper mxser: rename flags to old_speed in mxser_set_serial_info mxser: use port variable in mxser_set_serial_info mxser: access info->MCR under info->slock ...
2021-07-02udp: properly flush normal packet at GRO timePaolo Abeni
If an UDP packet enters the GRO engine but is not eligible for aggregation and is not targeting an UDP tunnel, udp_gro_receive() will not set the flush bit, and packet could delayed till the next napi flush. Fix the issue ensuring non GROed packets traverse skb_gro_flush_final(). Reported-and-tested-by: Matthias Treydte <mt@waldheinz.de> Fixes: 18f25dc39990 ("udp: skip L4 aggregation for UDP tunnel packets") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>