summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-03-26Merge tag 'ceph-for-5.6-rc8' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph fixes from Ilya Dryomov: "A patch for a rather old regression in fullness handling and two memory leak fixes, marked for stable" * tag 'ceph-for-5.6-rc8' of git://github.com/ceph/ceph-client: ceph: fix memory leak in ceph_cleanup_snapid_map() libceph: fix alloc_msg_with_page_vector() memory leaks ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
2020-03-26Merge tag 'mac80211-for-net-2020-03-26' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have the following fixes: * drop data packets if there's no key for them anymore, after there had been one, to avoid sending them in clear when hostapd removes the key before it removes the station and the packets are still queued * check port authorization again after dequeue, to avoid sending packets if the station is no longer authorized * actually remove the authorization flag before the key so packets are also dropped properly because of this * fix nl80211 control port packet tagging to handle them as packets allowed to go out without encryption * fix NL80211_ATTR_CHANNEL_WIDTH outgoing netlink attribute width (should be 32 bits, not 8) * don't WARN in a CSA scenario that happens on some APs * fix HE spatial reuse element size calculation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26Merge tag 'mlx5-updates-2020-03-25' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-03-25 1) Cleanups from Dan Carpenter and wenxu. 2) Paul and Roi, Some minor updates and fixes to E-Switch to address issues introduced in the previous reg_c0 updates series. 3) Eli Cohen simplifies and improves flow steering matching group searches and flow table entries version management. 4) Parav Pandit, improves devlink eswitch mode changes thread safety. By making devlink rely on driver for thread safety and introducing mlx5 eswitch mode change protection. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26tipc: Add a missing case of TIPC_DIRECT_MSG typeHoang Le
In the commit f73b12812a3d ("tipc: improve throughput between nodes in netns"), we're missing a check to handle TIPC_DIRECT_MSG type, it's still using old sending mechanism for this message type. So, throughput improvement is not significant as expected. Besides that, when sending a large message with that type, we're also handle wrong receiving queue, it should be enqueued in socket receiving instead of multicast messages. Fix this by adding the missing case for TIPC_DIRECT_MSG. Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns") Reported-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au> Acked-by: Jon Maloy <jmaloy@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TXJohannes Berg
When a frame is transmitted via the nl80211 TX rather than as a normal frame, IEEE80211_TX_CTRL_PORT_CTRL_PROTO wasn't set and this will lead to wrong decisions (rate control etc.) being made about the frame; fix this. Fixes: 911806491425 ("mac80211: Add support for tx_control_port") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200326155333.f183f52b02f0.I4054e2a8c11c2ddcb795a0103c87be3538690243@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-26mac80211: mark station unauthorized before key removalJohannes Berg
If a station is still marked as authorized, mark it as no longer so before removing its keys. This allows frames transmitted to it to be rejected, providing additional protection against leaking plain text data during the disconnection flow. Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200326155133.ccb4fb0bb356.If48f0f0504efdcf16b8921f48c6d3bb2cb763c99@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-26mac80211: Check port authorization in the ieee80211_tx_dequeue() caseJouni Malinen
mac80211 used to check port authorization in the Data frame enqueue case when going through start_xmit(). However, that authorization status may change while the frame is waiting in a queue. Add a similar check in the dequeue case to avoid sending previously accepted frames after authorization change. This provides additional protection against potential leaking of frames after a station has been disconnected and the keys for it are being removed. Cc: stable@vger.kernel.org Signed-off-by: Jouni Malinen <jouni@codeaurora.org> Link: https://lore.kernel.org/r/20200326155133.ced84317ea29.I34d4c47cd8cc8a4042b38a76f16a601fbcbfd9b3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-26SUNRPC: fix krb5p mount to provide large enough buffer in rq_rcvsizeOlga Kornievskaia
Ever since commit 2c94b8eca1a2 ("SUNRPC: Use au_rslack when computing reply buffer size"). It changed how "req->rq_rcvsize" is calculated. It used to use au_cslack value which was nice and large and changed it to au_rslack value which turns out to be too small. Since 5.1, v3 mount with sec=krb5p fails against an Ontap server because client's receive buffer it too small. For gss krb5p, we need to account for the mic token in the verifier, and the wrap token in the wrap token. RFC 4121 defines: mic token Octet no Name Description -------------------------------------------------------------- 0..1 TOK_ID Identification field. Tokens emitted by GSS_GetMIC() contain the hex value 04 04 expressed in big-endian order in this field. 2 Flags Attributes field, as described in section 4.2.2. 3..7 Filler Contains five octets of hex value FF. 8..15 SND_SEQ Sequence number field in clear text, expressed in big-endian order. 16..last SGN_CKSUM Checksum of the "to-be-signed" data and octet 0..15, as described in section 4.2.4. that's 16bytes (GSS_KRB5_TOK_HDR_LEN) + chksum wrap token Octet no Name Description -------------------------------------------------------------- 0..1 TOK_ID Identification field. Tokens emitted by GSS_Wrap() contain the hex value 05 04 expressed in big-endian order in this field. 2 Flags Attributes field, as described in section 4.2.2. 3 Filler Contains the hex value FF. 4..5 EC Contains the "extra count" field, in big- endian order as described in section 4.2.3. 6..7 RRC Contains the "right rotation count" in big- endian order, as described in section 4.2.5. 8..15 SND_SEQ Sequence number field in clear text, expressed in big-endian order. 16..last Data Encrypted data for Wrap tokens with confidentiality, or plaintext data followed by the checksum for Wrap tokens without confidentiality, as described in section 4.2.4. Also 16bytes of header (GSS_KRB5_TOK_HDR_LEN), encrypted data, and cksum (other things like padding) RFC 3961 defines known cksum sizes: Checksum type sumtype checksum section or value size reference --------------------------------------------------------------------- CRC32 1 4 6.1.3 rsa-md4 2 16 6.1.2 rsa-md4-des 3 24 6.2.5 des-mac 4 16 6.2.7 des-mac-k 5 8 6.2.8 rsa-md4-des-k 6 16 6.2.6 rsa-md5 7 16 6.1.1 rsa-md5-des 8 24 6.2.4 rsa-md5-des3 9 24 ?? sha1 (unkeyed) 10 20 ?? hmac-sha1-des3-kd 12 20 6.3 hmac-sha1-des3 13 20 ?? sha1 (unkeyed) 14 20 ?? hmac-sha1-96-aes128 15 20 [KRB5-AES] hmac-sha1-96-aes256 16 20 [KRB5-AES] [reserved] 0x8003 ? [GSS-KRB5] Linux kernel now mainly supports type 15,16 so max cksum size is 20bytes. (GSS_KRB5_MAX_CKSUM_LEN) Re-use already existing define of GSS_KRB5_MAX_SLACK_NEEDED that's used for encoding the gss_wrap tokens (same tokens are used in reply). Fixes: 2c94b8eca1a2 ("SUNRPC: Use au_rslack when computing reply buffer size") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-03-26cfg80211: Do not warn on same channel at the end of CSAIlan Peer
When cfg80211_update_assoc_bss_entry() is called, there is a verification that the BSS channel actually changed. As some APs use CSA also for bandwidth changes, this would result with a kernel warning. Fix this by removing the WARN_ON(). Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20200326150855.96316ada0e8d.I6710376b1b4257e5f4712fc7ab16e2b638d512aa@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-26mac80211: drop data frames without key on encrypted linksJohannes Berg
If we know that we have an encrypted link (based on having had a key configured for TX in the past) then drop all data frames in the key selection handler if there's no key anymore. This fixes an issue with mac80211 internal TXQs - there we can buffer frames for an encrypted link, but then if the key is no longer there when they're dequeued, the frames are sent without encryption. This happens if a station is disconnected while the frames are still on the TXQ. Detecting that a link should be encrypted based on a first key having been configured for TX is fine as there are no use cases for a connection going from with encryption to no encryption. With extended key IDs, however, there is a case of having a key configured for only decryption, so we can't just trigger this behaviour on a key being configured. Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20200326150855.6865c7f28a14.I9fb1d911b064262d33e33dfba730cdeef83926ca@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-26xfrm: add prep for esp beet mode offloadXin Long
Like __xfrm_transport/mode_tunnel_prep(), this patch is to add __xfrm_mode_beet_prep() to fix the transport_header for gso segments, and reset skb mac_len, and pull skb data to the proto inside esp. This patch also fixes a panic, reported by ltp: # modprobe esp4_offload # runltp -f net_stress.ipsec_tcp [ 2452.780511] kernel BUG at net/core/skbuff.c:109! [ 2452.799851] Call Trace: [ 2452.800298] <IRQ> [ 2452.800705] skb_push.cold.98+0x14/0x20 [ 2452.801396] esp_xmit+0x17b/0x270 [esp4_offload] [ 2452.802799] validate_xmit_xfrm+0x22f/0x2e0 [ 2452.804285] __dev_queue_xmit+0x589/0x910 [ 2452.806264] __neigh_update+0x3d7/0xa50 [ 2452.806958] arp_process+0x259/0x810 [ 2452.807589] arp_rcv+0x18a/0x1c It was caused by the skb going to esp_xmit with a wrong transport header. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-03-26esp6: add gso_segment for esp6 beet modeXin Long
Similar to xfrm6_tunnel/transport_gso_segment(), _gso_segment() is added to do gso_segment for esp6 beet mode. Before calling inet6_offloads[proto]->callbacks.gso_segment, it needs to do: - Get the upper proto from ph header to get its gso_segment when xo->proto is IPPROTO_BEETPH. - Add SKB_GSO_TCPV6 to gso_type if x->sel.family != AF_INET6 and the proto == IPPROTO_TCP, so that the current tcp ipv6 packet can be segmented. - Calculate a right value for skb->transport_header and move skb->data to the transport header position. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-03-26esp4: add gso_segment for esp4 beet modeXin Long
Similar to xfrm4_tunnel/transport_gso_segment(), _gso_segment() is added to do gso_segment for esp4 beet mode. Before calling inet_offloads[proto]->callbacks.gso_segment, it needs to do: - Get the upper proto from ph header to get its gso_segment when xo->proto is IPPROTO_BEETPH. - Add SKB_GSO_TCPV4 to gso_type if x->sel.family == AF_INET6 and the proto == IPPROTO_TCP, so that the current tcp ipv4 packet can be segmented. - Calculate a right value for skb->transport_header and move skb->data to the transport header position. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-03-25devlink: Rely on driver eswitch thread safety instead of devlinkParav Pandit
devlink_nl_cmd_eswitch_set_doit() doesn't hold devlink->lock mutex while invoking driver callback. This is likely due to eswitch mode setting involves adding/remove devlink ports, health reporters or other devlink objects for a devlink device. So it is driver responsiblity to ensure thread safe eswitch state transition happening via either sriov legacy enablement or via devlink eswitch set callback. Therefore, get() callback should also be invoked without holding devlink->lock mutex. Vendor driver can use same internal lock which it uses during eswitch mode set() callback. This makes get() and set() implimentation symmetric in devlink core and in vendor drivers. Hence, remove holding devlink->lock mutex during eswitch get() callback. Failing to do so results into below deadlock scenario when mlx5_core driver is improved to handle eswitch mode set critical section invoked by devlink and sriov sysfs interface in subsequent patch. devlink_nl_cmd_eswitch_set_doit() mlx5_eswitch_mode_set() mutex_lock(esw->mode_lock) <- Lock A [...] register_devlink_port() mutex_lock(&devlink->lock); <- lock B mutex_lock(&devlink->lock); <- lock B devlink_nl_cmd_eswitch_get_doit() mlx5_eswitch_mode_get() mutex_lock(esw->mode_lock) <- Lock A In subsequent patch, mlx5_core driver uses its internal lock during get() and set() eswitch callbacks. Other drivers have been inspected which returns either constant during get operations or reads the value from already allocated structure. Hence it is safe to remove the lock in get( ) callback and let vendor driver handle it. Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-03-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Overlapping header include additions in macsec.c A bug fix in 'net' overlapping with the removal of 'version' string in ena_netdev.c Overlapping test additions in selftests Makefile Overlapping PCI ID table adjustments in iwlwifi driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25Bluetooth: L2CAP: Use DEFER_SETUP to group ECRED connectionsLuiz Augusto von Dentz
This uses the DEFER_SETUP flag to group channels with L2CAP_CREDIT_BASED_CONNECTION_REQ. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-03-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix deadlock in bpf_send_signal() from Yonghong Song. 2) Fix off by one in kTLS offload of mlx5, from Tariq Toukan. 3) Add missing locking in iwlwifi mvm code, from Avraham Stern. 4) Fix MSG_WAITALL handling in rxrpc, from David Howells. 5) Need to hold RTNL mutex in tcindex_partial_destroy_work(), from Cong Wang. 6) Fix producer race condition in AF_PACKET, from Willem de Bruijn. 7) cls_route removes the wrong filter during change operations, from Cong Wang. 8) Reject unrecognized request flags in ethtool netlink code, from Michal Kubecek. 9) Need to keep MAC in reset until PHY is up in bcmgenet driver, from Doug Berger. 10) Don't leak ct zone template in act_ct during replace, from Paul Blakey. 11) Fix flushing of offloaded netfilter flowtable flows, also from Paul Blakey. 12) Fix throughput drop during tx backpressure in cxgb4, from Rahul Lakkireddy. 13) Don't let a non-NULL skb->dev leave the TCP stack, from Eric Dumazet. 14) TCP_QUEUE_SEQ socket option has to update tp->copied_seq as well, also from Eric Dumazet. 15) Restrict macsec to ethernet devices, from Willem de Bruijn. 16) Fix reference leak in some ethtool *_SET handlers, from Michal Kubecek. 17) Fix accidental disabling of MSI for some r8169 chips, from Heiner Kallweit. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (138 commits) net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build net: ena: Add PCI shutdown handler to allow safe kexec selftests/net/forwarding: define libs as TEST_PROGS_EXTENDED selftests/net: add missing tests to Makefile r8169: re-enable MSI on RTL8168c net: phy: mdio-bcm-unimac: Fix clock handling cxgb4/ptp: pass the sign of offset delta in FW CMD net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop net: cbs: Fix software cbs to consider packet sending time net/mlx5e: Do not recover from a non-fatal syndrome net/mlx5e: Fix ICOSQ recovery flow with Striding RQ net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset net/mlx5e: Enhance ICOSQ WQE info fields net/mlx5_core: Set IB capability mask1 to fix ib_srpt connection failure selftests: netfilter: add nfqueue test case netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress netfilter: nft_fwd_netdev: validate family and chain type netfilter: nft_set_rbtree: Detect partial overlaps on insertion netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start() netfilter: nft_set_pipapo: Separate partial and complete overlap cases on insertion ...
2020-03-25net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} buildPablo Neira Ayuso
net/netfilter/nft_fwd_netdev.c: In function ‘nft_fwd_netdev_eval’: net/netfilter/nft_fwd_netdev.c:32:10: error: ‘struct sk_buff’ has no member named ‘tc_redirected’ pkt->skb->tc_redirected = 1; ^~ net/netfilter/nft_fwd_netdev.c:33:10: error: ‘struct sk_buff’ has no member named ‘tc_from_ingress’ pkt->skb->tc_from_ingress = 1; ^~ To avoid a direct dependency with tc actions from netfilter, wrap the redirect bits around CONFIG_NET_REDIRECT and move helpers to include/linux/skbuff.h. Turn on this toggle from the ifb driver, the only existing client of these bits in the tree. This patch adds skb_set_redirected() that sets on the redirected bit on the skbuff, it specifies if the packet was redirect from ingress and resets the timestamp (timestamp reset was originally missing in the netfilter bugfix). Fixes: bcfabee1afd99484 ("netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress") Reported-by: noreply@ellerman.id.au Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25net: use indirect call wrappers for skb_copy_datagram_iter()Eric Dumazet
TCP recvmsg() calls skb_copy_datagram_iter(), which calls an indirect function (cb pointing to simple_copy_to_iter()) for every MSS (fragment) present in the skb. CONFIG_RETPOLINE=y forces a very expensive operation that we can avoid thanks to indirect call wrappers. This patch gives a 13% increase of performance on a single flow, if the bottleneck is the thread reading the TCP socket. Fixes: 950fcaecd5cc ("datagram: consolidate datagram copy to iter helpers") Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25Bluetooth: don't assume key size is 16 when the command failsAlain Michaud
With this change, the encryption key size is not assumed to be 16 if the read_encryption_key_size command fails for any reason. This ensures that if the controller fails the command for any reason that the encryption key size isn't implicitely set to 16 and instead take a more concervative posture to assume it is 0. Signed-off-by: Alain Michaud <alainm@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-03-25.gitignore: add SPDX License IdentifierMasahiro Yamada
Add SPDX License Identifier to all .gitignore files. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute typeJohannes Berg
The new opmode notification used this attribute with a u8, when it's documented as a u32 and indeed used in userspace as such, it just happens to work on little-endian systems since userspace isn't doing any strict size validation, and the u8 goes into the lower byte. Fix this. Cc: stable@vger.kernel.org Fixes: 466b9936bf93 ("cfg80211: Add support to notify station's opmode change to userspace") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://lore.kernel.org/r/20200325090531.be124f0a11c7.Iedbf4e197a85471ebd729b186d5365c0343bf7a8@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) A new selftest for nf_queue, from Florian Westphal. This test covers two recent fixes: 07f8e4d0fddb ("tcp: also NULL skb->dev when copy was needed") and b738a185beaa ("tcp: ensure skb->dev is NULL before leaving TCP stack"). 2) The fwd action breaks with ifb. For safety in next extensions, make sure the fwd action only runs from ingress until it is extended to be used from a different hook. 3) The pipapo set type now reports EEXIST in case of subrange overlaps. Update the rbtree set to validate range overlaps, so far this validation is only done only from userspace. From Stefano Brivio. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24ethtool: fix incorrect tx-checksumming settings reportingVladyslav Tarasiuk
Currently, ethtool feature mask for checksum command is ORed with NETIF_F_FCOE_CRC_BIT, which is bit's position number, instead of the actual feature bit - NETIF_F_FCOE_CRC. The invalid bitmask here might affect unrelated features when toggling TX checksumming. For example, TX checksumming is always mistakenly reported as enabled on the netdevs tested (mlx5, virtio_net). Fixes: f70bb06563ed ("ethtool: update mapping of features to legacy ioctl requests") Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_popVladimir Oltean
Not only did this wheel did not need reinventing, but there is also an issue with it: It doesn't remove the VLAN header in a way that preserves the L2 payload checksum when that is being provided by the DSA master hw. It should recalculate checksum both for the push, before removing the header, and for the pull afterwards. But the current implementation is quite dizzying, with pulls followed immediately afterwards by pushes, the memmove is done before the push, etc. This makes a DSA master with RX checksumming offload to print stack traces with the infamous 'hw csum failure' message. So remove the dsa_8021q_remove_header function and replace it with something that actually works with inet checksumming. Fixes: d461933638ae ("net: dsa: tag_8021q: Create helper function for removing VLAN header") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24net: cbs: Fix software cbs to consider packet sending timeZh-yuan Ye
Currently the software CBS does not consider the packet sending time when depleting the credits. It caused the throughput to be Idleslope[kbps] * (Port transmit rate[kbps] / |Sendslope[kbps]|) where Idleslope * (Port transmit rate / (Idleslope + |Sendslope|)) = Idleslope is expected. In order to fix the issue above, this patch takes the time when the packet sending completes into account by moving the anchor time variable "last" ahead to the send completion time upon transmission and adding wait when the next dequeue request comes before the send completion time of the previous packet. changelog: V2->V3: - remove unnecessary whitespace cleanup - add the checks if port_rate is 0 before division V1->V2: - combine variable "send_completed" into "last" - add the comment for estimate of the packet sending Fixes: 585d763af09c ("net/sched: Introduce Credit Based Shaper (CBS) qdisc") Signed-off-by: Zh-yuan Ye <ye.zh-yuan@socionext.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-24netfilter: nft_fwd_netdev: allow to redirect to ifb via ingressPablo Neira Ayuso
Set skb->tc_redirected to 1, otherwise the ifb driver drops the packet. Set skb->tc_from_ingress to 1 to reinject the packet back to the ingress path after leaving the ifb egress path. This patch inconditionally sets on these two skb fields that are meaningful to the ifb driver. The existing forward action is guaranteed to run from ingress path. Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nft_fwd_netdev: validate family and chain typePablo Neira Ayuso
Make sure the forward action is only used from ingress. Fixes: 39e6dea28adc ("netfilter: nf_tables: add forward expression to the netdev family") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nft_set_rbtree: Detect partial overlaps on insertionStefano Brivio
...and return -ENOTEMPTY to the front-end in this case, instead of proceeding. Currently, nft takes care of checking for these cases and not sending them to the kernel, but if we drop the set_overlap() call in nft we can end up in situations like: # nft add table t # nft add set t s '{ type inet_service ; flags interval ; }' # nft add element t s '{ 1 - 5 }' # nft add element t s '{ 6 - 10 }' # nft add element t s '{ 4 - 7 }' # nft list set t s table ip t { set s { type inet_service flags interval elements = { 1-3, 4-5, 6-7 } } } This change has the primary purpose of making the behaviour consistent with nft_set_pipapo, but is also functional to avoid inconsistent behaviour if userspace sends overlapping elements for any reason. v2: When we meet the same key data in the tree, as start element while inserting an end element, or as end element while inserting a start element, actually check that the existing element is active, before resetting the overlap flag (Pablo Neira Ayuso) Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nft_set_rbtree: Introduce and use nft_rbtree_interval_start()Stefano Brivio
Replace negations of nft_rbtree_interval_end() with a new helper, nft_rbtree_interval_start(), wherever this helps to visualise the problem at hand, that is, for all the occurrences except for the comparison against given flags in __nft_rbtree_get(). This gets especially useful in the next patch. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nft_set_pipapo: Separate partial and complete overlap cases on ↵Stefano Brivio
insertion ...and return -ENOTEMPTY to the front-end on collision, -EEXIST if an identical element already exists. Together with the previous patch, element collision will now be returned to the user as -EEXIST. Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24netfilter: nf_tables: Allow set back-ends to report partial overlaps on ↵Pablo Neira Ayuso
insertion Currently, the -EEXIST return code of ->insert() callbacks is ambiguous: it might indicate that a given element (including intervals) already exists as such, or that the new element would clash with existing ones. If identical elements already exist, the front-end is ignoring this without returning error, in case NLM_F_EXCL is not set. However, if the new element can't be inserted due an overlap, we should report this to the user. To this purpose, allow set back-ends to return -ENOTEMPTY on collision with existing elements, translate that to -EEXIST, and return that to userspace, no matter if NLM_F_EXCL was set. Reported-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-03-24Bluetooth: L2CAP: Add get_peer_pid callbackLuiz Augusto von Dentz
This adds a callback to read the socket pid. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-03-24xfrm: policy: Fix doulbe free in xfrm_policy_timerYueHaibing
After xfrm_add_policy add a policy, its ref is 2, then xfrm_policy_timer read_lock xp->walk.dead is 0 .... mod_timer() xfrm_policy_kill policy->walk.dead = 1 .... del_timer(&policy->timer) xfrm_pol_put //ref is 1 xfrm_pol_put //ref is 0 xfrm_policy_destroy call_rcu xfrm_pol_hold //ref is 1 read_unlock xfrm_pol_put //ref is 0 xfrm_policy_destroy call_rcu xfrm_policy_destroy is called twice, which may leads to double free. Call Trace: RIP: 0010:refcount_warn_saturate+0x161/0x210 ... xfrm_policy_timer+0x522/0x600 call_timer_fn+0x1b3/0x5e0 ? __xfrm_decode_session+0x2990/0x2990 ? msleep+0xb0/0xb0 ? _raw_spin_unlock_irq+0x24/0x40 ? __xfrm_decode_session+0x2990/0x2990 ? __xfrm_decode_session+0x2990/0x2990 run_timer_softirq+0x5c5/0x10e0 Fix this by use write_lock_bh in xfrm_policy_kill. Fixes: ea2dea9dacc2 ("xfrm: remove policy lock when accessing policy->walk.dead") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2020-03-23Remove DST_HOSTDavid Laight
Previous changes to the IP routing code have removed all the tests for the DS_HOST route flag. Remove the flags and all the code that sets it. Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23ethtool: fix reference leak in some *_SET handlersMichal Kubecek
Andrew noticed that some handlers for *_SET commands leak a netdev reference if required ethtool_ops callbacks do not exist. A simple reproducer would be e.g. ip link add veth1 type veth peer name veth2 ethtool -s veth1 wol g ip link del veth1 Make sure dev_put() is called when ethtool_ops check fails. v2: add Fixes tags Fixes: a53f3d41e4d3 ("ethtool: set link settings with LINKINFO_SET request") Fixes: bfbcfe2032e7 ("ethtool: set link modes related data with LINKMODES_SET request") Fixes: e54d04e3afea ("ethtool: set message mask with DEBUG_SET request") Fixes: 8d425b19b305 ("ethtool: set wake-on-lan settings with WOL_SET request") Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23net: dsa: Implement flow dissection for tag_brcm.cFlorian Fainelli
Provide a flow_dissect callback which returns the network offset and where to find the skb protocol, given the tags structure a common function works for both tagging formats that are supported. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23net: dsa: Fix duplicate frames flooded by learningFlorian Fainelli
When both the switch and the bridge are learning about new addresses, switch ports attached to the bridge would see duplicate ARP frames because both entities would attempt to send them. Fixes: 5037d532b83d ("net: dsa: add Broadcom tag RX/TX handler") Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23devlink: Only pass packet trap group identifier in trap structureIdo Schimmel
Packet trap groups are now explicitly registered by drivers and not implicitly registered when the packet traps are registered. Therefore, there is no need to encode entire group structure the trap is associated with inside the trap structure. Instead, only pass the group identifier. Refer to it as initial group identifier, as future patches will allow user space to move traps between groups. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23devlink: Stop reference counting packet trap groupsIdo Schimmel
Now that drivers explicitly register their supported packet trap groups there is no for devlink to create them on-demand and destroy them when their reference count reaches zero. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23devlink: Add API to register packet trap groupsIdo Schimmel
Currently, packet trap groups are implicitly registered by drivers upon packet trap registration. When the traps are registered, each is associated with a group and the group is created by devlink, if it does not exist already. This makes it difficult for drivers to pass additional attributes for the groups. Therefore, as a preparation for future patches that require passing additional group attributes, add an API to explicitly register / unregister these groups. Next patches will convert existing drivers to use this API. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23ipv4: fix a RCU-list lock in inet_dump_fib()Qian Cai
There is a place, inet_dump_fib() fib_table_dump fn_trie_dump_leaf() hlist_for_each_entry_rcu() without rcu_read_lock() will trigger a warning, WARNING: suspicious RCU usage ----------------------------- net/ipv4/fib_trie.c:2216 RCU-list traversed in non-reader section!! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by ip/1923: #0: ffffffff8ce76e40 (rtnl_mutex){+.+.}, at: netlink_dump+0xd6/0x840 Call Trace: dump_stack+0xa1/0xea lockdep_rcu_suspicious+0x103/0x10d fn_trie_dump_leaf+0x581/0x590 fib_table_dump+0x15f/0x220 inet_dump_fib+0x4ad/0x5d0 netlink_dump+0x350/0x840 __netlink_dump_start+0x315/0x3e0 rtnetlink_rcv_msg+0x4d1/0x720 netlink_rcv_skb+0xf0/0x220 rtnetlink_rcv+0x15/0x20 netlink_unicast+0x306/0x460 netlink_sendmsg+0x44b/0x770 __sys_sendto+0x259/0x270 __x64_sys_sendto+0x80/0xa0 do_syscall_64+0x69/0xf4 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Fixes: 18a8021a7be3 ("net/ipv4: Plumb support for filtering route dumps") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23net: sched: rename more stats_typesJakub Kicinski
Commit 53eca1f3479f ("net: rename flow_action_hw_stats_types* -> flow_action_hw_stats*") renamed just the flow action types and helpers. For consistency rename variables, enums, struct members and UAPI too (note that this UAPI was not in any official release, yet). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23net: mptcp: don't hang in mptcp_sendmsg() after TCP fallbackDavide Caratti
it's still possible for packetdrill to hang in mptcp_sendmsg(), when the MPTCP socket falls back to regular TCP (e.g. after receiving unsupported flags/version during the three-way handshake). Adjust MPTCP socket state earlier, to ensure correct functionality of mptcp_sendmsg() even in case of TCP fallback. Fixes: 767d3ded5fb8 ("net: mptcp: don't hang before sending 'MP capable with data'") Fixes: 1954b86016cf ("mptcp: Check connection state before attempting send") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23tcp: repair: fix TCP_QUEUE_SEQ implementationEric Dumazet
When application uses TCP_QUEUE_SEQ socket option to change tp->rcv_next, we must also update tp->copied_seq. Otherwise, stuff relying on tcp_inq() being precise can eventually be confused. For example, tcp_zerocopy_receive() might crash because it does not expect tcp_recv_skb() to return NULL. We could add tests in various places to fix the issue, or simply make sure tcp_inq() wont return a random value, and leave fast path as it is. Note that this fixes ioctl(fd, SIOCINQ, &val) at the same time. Fixes: ee9952831cfd ("tcp: Initial repair mode") Fixes: 05255b823a61 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23net: Make skb_segment not to compute checksum if network controller supports ↵Yadu Kishore
checksumming Problem: TCP checksum in the output path is not being offloaded during GSO in the following case: The network driver does not support scatter-gather but supports checksum offload with NETIF_F_HW_CSUM. Cause: skb_segment calls skb_copy_and_csum_bits if the network driver does not announce NETIF_F_SG. It does not check if the driver supports NETIF_F_HW_CSUM. So for devices which might want to offload checksum but do not support SG there is currently no way to do so if GSO is enabled. Solution: In skb_segment check if the network controller does checksum and if so call skb_copy_bits instead of skb_copy_and_csum_bits. Testing: Without the patch, ran iperf TCP traffic with NETIF_F_HW_CSUM enabled in the network driver. Observed the TCP checksum offload is not happening since the skbs received by the driver in the output path have skb->ip_summed set to CHECKSUM_NONE. With the patch ran iperf TCP traffic and observed that TCP checksum is being offloaded with skb->ip_summed set to CHECKSUM_PARTIAL. Also tested with the patch by disabling NETIF_F_HW_CSUM in the driver to cover the newly introduced if-else code path in skb_segment. Link: https://lore.kernel.org/netdev/CA+FuTSeYGYr3Umij+Mezk9CUcaxYwqEe5sPSuXF8jPE2yMFJAw@mail.gmail.com Signed-off-by: Yadu Kishore <kyk.segfault@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23bpf: Add bpf_sk_storage support to bpf_tcp_caMartin KaFai Lau
This patch adds bpf_sk_storage_get() and bpf_sk_storage_delete() helper to the bpf_tcp_ca's struct_ops. That would allow bpf-tcp-cc to: 1) share sk private data with other bpf progs. 2) use bpf_sk_storage as a private storage for a bpf-tcp-cc if the existing icsk_ca_priv is not big enough. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200320152101.2169498-1-kafai@fb.com
2020-03-23Bluetooth: Fix incorrect branch in connection completeAbhishek Pandit-Subedi
When handling auto-connected devices, we should execute the rest of the connection complete when it was previously discovered and it is an ACL connection. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-03-23Bluetooth: Restore running state if suspend failsAbhishek Pandit-Subedi
If Bluetooth fails to enter the suspended state correctly, restore the state to running (re-enabling scans). PM_POST_SUSPEND is only sent to notifiers that successfully return from PM_PREPARE_SUSPEND notification so we should recover gracefully if it fails. Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2020-03-23libceph: fix alloc_msg_with_page_vector() memory leaksIlya Dryomov
Make it so that CEPH_MSG_DATA_PAGES data item can own pages, fixing a bunch of memory leaks for a page vector allocated in alloc_msg_with_page_vector(). Currently, only watch-notify messages trigger this allocation, and normally the page vector is freed either in handle_watch_notify() or by the caller of ceph_osdc_notify(). But if the message is freed before that (e.g. if the session faults while reading in the message or if the notify is stale), we leak the page vector. This was supposed to be fixed by switching to a message-owned pagelist, but that never happened. Fixes: 1907920324f1 ("libceph: support for sending notifies") Reported-by: Roman Penyaev <rpenyaev@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Roman Penyaev <rpenyaev@suse.de>