summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2020-11-17tcp: only postpone PROBE_RTT if RTT is < current min_rtt estimateRyan Sharpelletti
During loss recovery, retransmitted packets are forced to use TCP timestamps to calculate the RTT samples, which have a millisecond granularity. BBR is designed using a microsecond granularity. As a result, multiple RTT samples could be truncated to the same RTT value during loss recovery. This is problematic, as BBR will not enter PROBE_RTT if the RTT sample is <= the current min_rtt sample, meaning that if there are persistent losses, PROBE_RTT will constantly be pushed off and potentially never re-entered. This patch makes sure that BBR enters PROBE_RTT by checking if RTT sample is < the current min_rtt sample, rather than <=. The Netflix transport/TCP team discovered this bug in the Linux TCP BBR code during lab tests. Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control") Signed-off-by: Ryan Sharpelletti <sharpelletti@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Link: https://lore.kernel.org/r/20201116174412.1433277-1-sharpelletti.kdev@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17net: dsa: tag_dsa: Use a consistent comment styleTobias Waldekranz
Use a consistent style of one-line/multi-line comments throughout the file. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17net: dsa: tag_dsa: Unify regular and ethertype DSA taggersTobias Waldekranz
Ethertype DSA encodes exactly the same information in the DSA tag as the non-ethertype variety. So refactor out the common parts and reuse them for both protocols. This is ensures tag parsing and generation is always consistent across all mv88e6xxx chips. While we are at it, explicitly deal with all possible CPU codes on receive, making sure to set offload_fwd_mark as appropriate. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17net: dsa: tag_dsa: Allow forwarding of redirected IGMP trafficTobias Waldekranz
When receiving an IGMP/MLD frame with a TO_CPU tag, the switch has not performed any forwarding of it. This means that we should not set the offload_fwd_mark on the skb, in case a software bridge wants it forwarded. This is a port of: 1ed9ec9b08ad ("dsa: Allow forwarding of redirected IGMP traffic") Which corrected the issue for chips using EDSA tags, but not for those using regular DSA tags. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16net/tls: fix corrupted data in recvmsgVadim Fedorenko
If tcp socket has more data than Encrypted Handshake Message then tls_sw_recvmsg will try to decrypt next record instead of returning full control message to userspace as mentioned in comment. The next message - usually Application Data - gets corrupted because it uses zero copy for decryption that's why the data is not stored in skb for next iteration. Revert check to not decrypt next record if current is not Application Data. Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru> Link: https://lore.kernel.org/r/1605413760-21153-1-git-send-email-vfedorenko@novek.ru Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16bpf: Fix the irq and nmi check in bpf_sk_storage for tracing usageMartin KaFai Lau
The intention of the current check is to avoid using bpf_sk_storage in irq and nmi. Jakub pointed out that the current check cannot do that. For example, in_serving_softirq() returns true if the softirq handling is interrupted by hard irq. Fixes: 8e4597c627fb ("bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201116200113.2868539-1-kafai@fb.com
2020-11-16net: bridge: add missing counters to ndo_get_stats64 callbackHeiner Kallweit
In br_forward.c and br_input.c fields dev->stats.tx_dropped and dev->stats.multicast are populated, but they are ignored in ndo_get_stats64. Fixes: 28172739f0a2 ("net: fix 64 bit counters on 32 bit arches") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/58ea9963-77ad-a7cf-8dfd-fc95ab95f606@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: send explicit ack on delayed ack_seq incrPaolo Abeni
When the worker moves some bytes from the OoO queue into the receive queue, the msk->ask_seq is updated, the MPTCP-level ack carrying that value needs to wait the next ingress packet, possibly slowing down or hanging the peer Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: keep track of advertised windows right edgeFlorian Westphal
Before sending 'x' new bytes also check that the new snd_una would be within the permitted receive window. For every ACK that also contains a DSS ack, check whether its tcp-level receive window would advance the current mptcp window right edge and update it if so. Signed-off-by: Florian Westphal <fw@strlen.de> Co-developed-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: rework poll+nospace handlingFlorian Westphal
MPTCP maintains a status bit, MPTCP_SEND_SPACE, that is set when at least one subflow and the mptcp socket itself are writeable. mptcp_poll returns EPOLLOUT if the bit is set. mptcp_sendmsg makes sure MPTCP_SEND_SPACE gets cleared when last write has used up all subflows or the mptcp socket wmem. This reworks nospace handling as follows: MPTCP_SEND_SPACE is replaced with MPTCP_NOSPACE, i.e. inverted meaning. This bit is set when the mptcp socket is not writeable. The mptcp-level ack path schedule will then schedule the mptcp worker to allow it to free already-acked data (and reduce wmem usage). This will then wake userspace processes that wait for a POLLOUT event. sendmsg will set MPTCP_NOSPACE only when it has to wait for more wmem (blocking I/O case). poll path will set MPTCP_NOSPACE in case the mptcp socket is not writeable. Normal tcp-level notification (SOCK_NOSPACE) is only enabled in case the subflow socket has no available wmem. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: try to push pending data on snd una updatesPaolo Abeni
After the previous patch we may end-up with unsent data in the write buffer. If such buffer is full, the writer will block for unlimited time. We need to trigger the MPTCP xmit path even for the subflow rx path, on MPTCP snd_una updates. Keep things simple and just schedule the work queue if needed. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: move page frag allocation in mptcp_sendmsg()Paolo Abeni
mptcp_sendmsg() is refactored so that first it copies the data provided from user space into the send queue, and then tries to spool the send queue via sendmsg_frag. There a subtle change in the mptcp level collapsing on consecutive data fragment: we now allow that only on unsent data. The latter don't need to deal with msghdr data anymore and can be simplified in a relevant way. snd_nxt and write_seq are now tracked independently. Overall this allows some relevant cleanup and will allow sending pending mptcp data on msk una update in later patch. Co-developed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: refactor shutdown and closePaolo Abeni
We must not close the subflows before all the MPTCP level data, comprising the DATA_FIN has been acked at the MPTCP level, otherwise we could be unable to retransmit as needed. __mptcp_wr_shutdown() shutdown is responsible to check for the correct status and close all subflows. Is called by the output path after spooling any data and at shutdown/close time. In a similar way, __mptcp_destroy_sock() is responsible to clean-up the MPTCP level status, and is called when the msk transition to TCP_CLOSE. The protocol level close() does not force anymore the TCP_CLOSE status, but orphan the msk socket and all the subflows. Orphaned msk sockets are forciby closed after a timeout or when all MPTCP-level data is acked. There is a caveat about keeping the orphaned subflows around: the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via tcp_close(). To prevent accessing freed memory on later MPTCP level operations, the msk acquires a reference to each subflow socket and prevent subflow_ulp_release() from releasing the subflow context before __mptcp_destroy_sock(). The additional subflow references are released by __mptcp_done() and the async ULP release is detected checking ULP ops. If such field has been already cleared by the ULP release path, the dangling context is freed directly by __mptcp_done(). Co-developed-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: introduce MPTCP snd_nxtPaolo Abeni
Track the next MPTCP sequence number used on xmit, currently always equal to write_next. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: add accounting for pending dataPaolo Abeni
Preparation patch to track the data pending in the msk write queue. No functional change introduced here Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: reduce the arguments of mptcp_sendmsg_fragPaolo Abeni
The current argument list is pretty long and quite unreadable, move many of them into a specific struct. Later patches will add more stuff to such struct. Additionally drop the 'timeo' argument, now unused. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: introduce mptcp_schedule_workPaolo Abeni
remove some of code duplications an allow preventing rescheduling on close. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16tcp: factor out __tcp_close() helperPaolo Abeni
unlocked version of protocol level close, will be used by MPTCP to allow decouple orphaning and subflow level close. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16mptcp: use tcp_build_frag()Paolo Abeni
mptcp_push_pending() is called even on orphaned msk (and orphaned subflows), if there is outstanding data at close() time. To cope with the above MPTCP needs to handle explicitly the allocation failure on xmit. The newly introduced do_tcp_sendfrag() allows that, just plug it. We can additionally drop a couple of sanity checks, duplicate in the TCP code. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16tcp: factor out tcp_build_frag()Paolo Abeni
Will be needed by the next patch, as MPTCP needs to handle directly the error/memory-allocation-needed path. No functional changes intended. Additionally let MPTCP code access the tcp_remove_empty_skb() helper. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16ipv6/netfilter: Discard first fragment not including all headersGeorg Kohmann
Packets are processed even though the first fragment don't include all headers through the upper layer header. This breaks TAHI IPv6 Core Conformance Test v6LC.1.3.6. Referring to RFC8200 SECTION 4.5: "If the first fragment does not include all headers through an Upper-Layer header, then that fragment should be discarded and an ICMP Parameter Problem, Code 3, message should be sent to the source of the fragment, with the Pointer field set to zero." The fragment needs to be validated the same way it is done in commit 2efdaaaf883a ("IPv6: reply ICMP error if the first fragment don't include all headers") for ipv6. Wrap the validation into a common function, ipv6_frag_thdr_truncated() to check for truncation in the upper layer header. This validation does not fullfill all aspects of RFC 8200, section 4.5, but is at the moment sufficient to pass mentioned TAHI test. In netfilter, utilize the fragment offset returned by find_prev_fhdr() to let ipv6_frag_thdr_truncated() start it's traverse from the fragment header. Return 0 to drop the fragment in the netfilter. This is the same behaviour as used on other protocol errors in this function, e.g. when nf_ct_frag6_queue() returns -EPROTO. The Fragment will later be picked up by ipv6_frag_rcv() in reassembly.c. ipv6_frag_rcv() will then send an appropriate ICMP Parameter Problem message back to the source. References commit 2efdaaaf883a ("IPv6: reply ICMP error if the first fragment don't include all headers") Signed-off-by: Georg Kohmann <geokohma@cisco.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Link: https://lore.kernel.org/r/20201111115025.28879-1-geokohma@cisco.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16treewide: rename nla_strlcpy to nla_strscpy.Francis Laniel
Calls to nla_strlcpy are now replaced by calls to nla_strscpy which is the new name of this function. Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16Modify return value of nla_strlcpy to match that of strscpy.Francis Laniel
nla_strlcpy now returns -E2BIG if src was truncated when written to dst. It also returns this error value if dstsize is 0 or higher than INT_MAX. For example, if src is "foo\0" and dst is 3 bytes long, the result will be: 1. "foG" after memcpy (G means garbage). 2. "fo\0" after memset. 3. -E2BIG is returned because src was not completely written into dst. The callers of nla_strlcpy were modified to take into account this modification. Signed-off-by: Francis Laniel <laniel_francis@privacyrequired.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-16Merge tag 'linux-can-fixes-for-5.10-20201115' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2020-11-15 Anant Thazhemadam contributed two patches for the AF_CAN that prevent potential access of uninitialized member in can_rcv() and canfd_rcv(). The next patch is by Alejandro Concepcion Rodriguez and changes can_restart() to use the correct function to push a skb into the networking stack from process context. Zhang Qilong's patch fixes a memory leak in the error path of the ti_hecc's probe function. A patch by me fixes mcba_usb_start_xmit() function in the mcba_usb driver, to first fill the skb and then pass it to can_put_echo_skb(). Colin Ian King's patch fixes a potential integer overflow on shift in the peak_usb driver. The next two patches target the flexcan driver, a patch by me adds the missing "req_bit" to the stop mode property comment (which was broken during net-next for v5.10). Zhang Qilong's patch fixes the failure handling of pm_runtime_get_sync(). The next seven patches target the m_can driver including the tcan4x5x spi driver glue code. Enric Balletbo i Serra's patch for the tcan4x5x Kconfig fix the REGMAP_SPI dependency handling. A patch by me for the tcan4x5x driver's probe() function adds missing error handling to for devm_regmap_init(), and in tcan4x5x_can_remove() the order of deregistration is fixed. Wu Bo's patch for the m_can driver fixes the state change handling in m_can_handle_state_change(). Two patches by Dan Murphy first introduce m_can_class_free_dev() and then make use of it to fix the freeing of the can device. A patch by Faiz Abbas add a missing shutdown of the CAN controller in the m_can_stop() function. * tag 'linux-can-fixes-for-5.10-20201115' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can: can: m_can: m_can_stop(): set device to software init mode before closing can: m_can: Fix freeing of can device from peripherials can: m_can: m_can_class_free_dev(): introduce new function can: m_can: m_can_handle_state_change(): fix state change can: tcan4x5x: tcan4x5x_can_remove(): fix order of deregistration can: tcan4x5x: tcan4x5x_can_probe(): add missing error checking for devm_regmap_init() can: tcan4x5x: replace depends on REGMAP_SPI with depends on SPI can: flexcan: fix failure handling of pm_runtime_get_sync() can: flexcan: flexcan_setup_stop_mode(): add missing "req_bit" to stop mode property comment can: peak_usb: fix potential integer overflow on shift of a int can: mcba_usb: mcba_usb_start_xmit(): first fill skb, then pass to can_put_echo_skb() can: ti_hecc: Fix memleak in ti_hecc_probe can: dev: can_restart(): post buffer from the right context can: af_can: prevent potential access of uninitialized member in canfd_rcv() can: af_can: prevent potential access of uninitialized member in can_rcv() ==================== Link: https://lore.kernel.org/r/20201115174131.2089251-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-15can: af_can: prevent potential access of uninitialized member in canfd_rcv()Anant Thazhemadam
In canfd_rcv(), cfd->len is uninitialized when skb->len = 0, and this uninitialized cfd->len is accessed nonetheless by pr_warn_once(). Fix this uninitialized variable access by checking cfd->len's validity condition (cfd->len > CANFD_MAX_DLEN) separately after the skb->len's condition is checked, and appropriately modify the log messages that are generated as well. In case either of the required conditions fail, the skb is freed and NET_RX_DROP is returned, same as before. Fixes: d4689846881d ("can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once") Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com Tested-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Link: https://lore.kernel.org/r/20201103213906.24219-3-anant.thazhemadam@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-15can: af_can: prevent potential access of uninitialized member in can_rcv()Anant Thazhemadam
In can_rcv(), cfd->len is uninitialized when skb->len = 0, and this uninitialized cfd->len is accessed nonetheless by pr_warn_once(). Fix this uninitialized variable access by checking cfd->len's validity condition (cfd->len > CAN_MAX_DLEN) separately after the skb->len's condition is checked, and appropriately modify the log messages that are generated as well. In case either of the required conditions fail, the skb is freed and NET_RX_DROP is returned, same as before. Fixes: 8cb68751c115 ("can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once") Reported-by: syzbot+9bcb0c9409066696d3aa@syzkaller.appspotmail.com Tested-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Signed-off-by: Anant Thazhemadam <anant.thazhemadam@gmail.com> Link: https://lore.kernel.org/r/20201103213906.24219-2-anant.thazhemadam@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2020-11-15batman-adv: set .owner to THIS_MODULETaehee Yoo
If THIS_MODULE is not set, the module would be removed while debugfs is being used. It eventually makes kernel panic. Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2020-11-14net: xfrm: use core API for updating/providing statsLev Stipakov
Commit d3fd65484c781 ("net: core: add dev_sw_netstats_tx_add") has added function "dev_sw_netstats_tx_add()" to update net device per-cpu TX stats. Use this function instead of own code. While on it, remove xfrmi_get_stats64() and replace it with dev_get_tstats64(). Signed-off-by: Lev Stipakov <lev@openvpn.net> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/20201113215939.147007-1-lev@openvpn.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14net: openvswitch: use core API to update/provide statsLev Stipakov
Commit d3fd65484c781 ("net: core: add dev_sw_netstats_tx_add") has added function "dev_sw_netstats_tx_add()" to update net device per-cpu TX stats. Use this function instead of own code. While on it, remove internal_get_stats() and replace it with dev_get_tstats64(). Signed-off-by: Lev Stipakov <lev@openvpn.net> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://lore.kernel.org/r/20201113215336.145998-1-lev@openvpn.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14nfc: refined function nci_hci_resp_receivedAlex Shi
We don't use the parameter result actually, so better to remove it and skip a gcc warning for unused variable. Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com> Link: https://lore.kernel.org/r/1605239517-49707-1-git-send-email-alex.shi@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14devlink: Add missing genlmsg_cancel() in devlink_nl_sb_port_pool_fill()Wang Hai
If sb_occ_port_pool_get() failed in devlink_nl_sb_port_pool_fill(), msg should be canceled by genlmsg_cancel(). Fixes: df38dafd2559 ("devlink: implement shared buffer occupancy monitoring interface") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wang Hai <wanghai38@huawei.com> Link: https://lore.kernel.org/r/20201113111622.11040-1-wanghai38@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14inet: unexport udp{4|6}_lib_lookup_skb()Eric Dumazet
These functions do not need to be exported. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201113113553.3411756-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14tcp: uninline tcp_stream_memory_free()Eric Dumazet
Both IPv4 and IPv6 needs it via a function pointer. Following patch will avoid the indirect call. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14IPv4: RTM_GETROUTE: Add RTA_ENCAP to resultOliver Herms
This patch adds an IPv4 routes encapsulation attribute to the result of netlink RTM_GETROUTE requests (e.g. ip route get 192.0.2.1). Signed-off-by: Oliver Herms <oliver.peter.herms@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20201113085517.GA1307262@tws Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14netlabel: fix an uninitialized warning in netlbl_unlabel_staticlist()Paul Moore
Static checking revealed that a previous fix to netlbl_unlabel_staticlist() leaves a stack variable uninitialized, this patches fixes that. Fixes: 866358ec331f ("netlabel: fix our progress tracking in netlbl_unlabel_staticlist()") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Link: https://lore.kernel.org/r/160530304068.15651.18355773009751195447.stgit@sifl Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14ipv6: remove unused function ipv6_skb_idev()Lukas Bulwahn
Commit bdb7cc643fc9 ("ipv6: Count interface receive statistics on the ingress netdev") removed all callees for ipv6_skb_idev(). Hence, since then, ipv6_skb_idev() is unused and make CC=clang W=1 warns: net/ipv6/exthdrs.c:909:33: warning: unused function 'ipv6_skb_idev' [-Wunused-function] So, remove this unused function and a -Wunused-function warning. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Link: https://lore.kernel.org/r/20201113135012.32499-1-lukas.bulwahn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14sctp: change to hold/put transport for proto_unreach_timerXin Long
A call trace was found in Hangbin's Codenomicon testing with debug kernel: [ 2615.981988] ODEBUG: free active (active state 0) object type: timer_list hint: sctp_generate_proto_unreach_event+0x0/0x3a0 [sctp] [ 2615.995050] WARNING: CPU: 17 PID: 0 at lib/debugobjects.c:328 debug_print_object+0x199/0x2b0 [ 2616.095934] RIP: 0010:debug_print_object+0x199/0x2b0 [ 2616.191533] Call Trace: [ 2616.194265] <IRQ> [ 2616.202068] debug_check_no_obj_freed+0x25e/0x3f0 [ 2616.207336] slab_free_freelist_hook+0xeb/0x140 [ 2616.220971] kfree+0xd6/0x2c0 [ 2616.224293] rcu_do_batch+0x3bd/0xc70 [ 2616.243096] rcu_core+0x8b9/0xd00 [ 2616.256065] __do_softirq+0x23d/0xacd [ 2616.260166] irq_exit+0x236/0x2a0 [ 2616.263879] smp_apic_timer_interrupt+0x18d/0x620 [ 2616.269138] apic_timer_interrupt+0xf/0x20 [ 2616.273711] </IRQ> This is because it holds asoc when transport->proto_unreach_timer starts and puts asoc when the timer stops, and without holding transport the transport could be freed when the timer is still running. So fix it by holding/putting transport instead for proto_unreach_timer in transport, just like other timers in transport. v1->v2: - Also use sctp_transport_put() for the "out_unlock:" path in sctp_generate_proto_unreach_event(), as Marcelo noticed. Fixes: 50b5d6ad6382 ("sctp: Fix a race between ICMP protocol unreachable and connect()") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/102788809b554958b13b95d33440f5448113b8d6.1605331373.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14vsock: forward all packets to the host when no H2G is registeredStefano Garzarella
Before commit c0cfa2d8a788 ("vsock: add multi-transports support"), if a G2H transport was loaded (e.g. virtio transport), every packets was forwarded to the host, regardless of the destination CID. The H2G transports implemented until then (vhost-vsock, VMCI) always responded with an error, if the destination CID was not VMADDR_CID_HOST. From that commit, we are using the remote CID to decide which transport to use, so packets with remote CID > VMADDR_CID_HOST(2) are sent only through H2G transport. If no H2G is available, packets are discarded directly in the guest. Some use cases (e.g. Nitro Enclaves [1]) rely on the old behaviour to implement sibling VMs communication, so we restore the old behavior when no H2G is registered. It will be up to the host to discard packets if the destination is not the right one. As it was already implemented before adding multi-transport support. Tested with nested QEMU/KVM by me and Nitro Enclaves by Andra. [1] Documentation/virt/ne_overview.rst Cc: Jorgen Hansen <jhansen@vmware.com> Cc: Dexuan Cui <decui@microsoft.com> Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reported-by: Andra Paraschiv <andraprs@amazon.com> Tested-by: Andra Paraschiv <andraprs@amazon.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20201112133837.34183-1-sgarzare@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-11-14 1) Add BTF generation for kernel modules and extend BTF infra in kernel e.g. support for split BTF loading and validation, from Andrii Nakryiko. 2) Support for pointers beyond pkt_end to recognize LLVM generated patterns on inlined branch conditions, from Alexei Starovoitov. 3) Implements bpf_local_storage for task_struct for BPF LSM, from KP Singh. 4) Enable FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage infra, from Martin KaFai Lau. 5) Add XDP bulk APIs that introduce a defer/flush mechanism to optimize the XDP_REDIRECT path, from Lorenzo Bianconi. 6) Fix a potential (although rather theoretical) deadlock of hashtab in NMI context, from Song Liu. 7) Fixes for cross and out-of-tree build of bpftool and runqslower allowing build for different target archs on same source tree, from Jean-Philippe Brucker. 8) Fix error path in htab_map_alloc() triggered from syzbot, from Eric Dumazet. 9) Move functionality from test_tcpbpf_user into the test_progs framework so it can run in BPF CI, from Alexander Duyck. 10) Lift hashtab key_size limit to be larger than MAX_BPF_STACK, from Florian Lehner. Note that for the fix from Song we have seen a sparse report on context imbalance which requires changes in sparse itself for proper annotation detection where this is currently being discussed on linux-sparse among developers [0]. Once we have more clarification/guidance after their fix, Song will follow-up. [0] https://lore.kernel.org/linux-sparse/CAHk-=wh4bx8A8dHnX612MsDO13st6uzAz1mJ1PaHHVevJx_ZCw@mail.gmail.com/T/ https://lore.kernel.org/linux-sparse/20201109221345.uklbp3lzgq6g42zb@ltop.local/T/ * git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (66 commits) net: mlx5: Add xdp tx return bulking support net: mvpp2: Add xdp tx return bulking support net: mvneta: Add xdp tx return bulking support net: page_pool: Add bulk support for ptr_ring net: xdp: Introduce bulking for xdp tx return path bpf: Expose bpf_d_path helper to sleepable LSM hooks bpf: Augment the set of sleepable LSM hooks bpf: selftest: Use bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TP bpf: Rename some functions in bpf_sk_storage bpf: Folding omem_charge() into sk_storage_charge() selftests/bpf: Add asm tests for pkt vs pkt_end comparison. selftests/bpf: Add skb_pkt_end test bpf: Support for pointers beyond pkt_end. tools/bpf: Always run the *-clean recipes tools/bpf: Add bootstrap/ to .gitignore bpf: Fix NULL dereference in bpf_task_storage tools/bpftool: Fix build slowdown tools/runqslower: Build bpftool using HOSTCC tools/runqslower: Enable out-of-tree build ... ==================== Link: https://lore.kernel.org/r/20201114020819.29584-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13ipv6: Fix error path to cancel the meseageZhang Qilong
genlmsg_cancel() needs to be called in the error path of inet6_fill_ifmcaddr and inet6_fill_ifacaddr to cancel the message. Fixes: 6ecf4c37eb3e ("ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com> Link: https://lore.kernel.org/r/20201112080950.1476302-1-zhangqilong3@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-14net: page_pool: Add bulk support for ptr_ringLorenzo Bianconi
Introduce the capability to batch page_pool ptr_ring refill since it is usually run inside the driver NAPI tx completion loop. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/bpf/08dd249c9522c001313f520796faa777c4089e1c.1605267335.git.lorenzo@kernel.org
2020-11-14net: xdp: Introduce bulking for xdp tx return pathLorenzo Bianconi
XDP bulk APIs introduce a defer/flush mechanism to return pages belonging to the same xdp_mem_allocator object (identified via the mem.id field) in bulk to optimize I-cache and D-cache since xdp_return_frame is usually run inside the driver NAPI tx completion loop. The bulk queue size is set to 16 to be aligned to how XDP_REDIRECT bulking works. The bulk is flushed when it is full or when mem.id changes. xdp_frame_bulk is usually stored/allocated on the function call-stack to avoid locking penalties. Current implementation considers only page_pool memory model. Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Link: https://lore.kernel.org/bpf/e190c03eac71b20c8407ae0fc2c399eda7835f49.1605267335.git.lorenzo@kernel.org
2020-11-13net: Exempt multicast addresses from five-second neighbor lifetimeJeff Dike
Commit 58956317c8de ("neighbor: Improve garbage collection") guarantees neighbour table entries a five-second lifetime. Processes which make heavy use of multicast can fill the neighour table with multicast addresses in five seconds. At that point, neighbour entries can't be GC-ed because they aren't five seconds old yet, the kernel log starts to fill up with "neighbor table overflow!" messages, and sends start to fail. This patch allows multicast addresses to be thrown out before they've lived out their five seconds. This makes room for non-multicast addresses and makes messages to all addresses more reliable in these circumstances. Fixes: 58956317c8de ("neighbor: Improve garbage collection") Signed-off-by: Jeff Dike <jdike@akamai.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20201113015815.31397-1-jdike@akamai.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13tipc: fix -Wstringop-truncation warningsWenlin Kang
Replace strncpy() with strscpy(), fixes the following warning: In function 'bearer_name_validate', inlined from 'tipc_enable_bearer' at net/tipc/bearer.c:246:7: net/tipc/bearer.c:141:2: warning: 'strncpy' specified bound 32 equals destination size [-Wstringop-truncation] strncpy(name_copy, name, TIPC_MAX_BEARER_NAME); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Wenlin Kang <wenlin.kang@windriver.com> Acked-by: Ying Xue <ying.xue@windriver.com> Link: https://lore.kernel.org/r/20201112093442.8132-1-wenlin.kang@windriver.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13Merge tag 'mac80211-next-for-net-next-2020-11-13' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Some updates: * injection/radiotap updates for new test capabilities * remove WDS support - even years ago when we turned it off by default it was already basically unusable * support for HE (802.11ax) rates for beacons * support for some vendor-specific HE rates * many other small features/cleanups * tag 'mac80211-next-for-net-next-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next: (21 commits) nl80211: fix kernel-doc warning in the new SAE attribute cfg80211: remove WDS code mac80211: remove WDS-related code rt2x00: remove WDS code b43legacy: remove WDS code b43: remove WDS code carl9170: remove WDS code ath9k: remove WDS code wireless: remove CONFIG_WIRELESS_WDS mac80211: assure that certain drivers adhere to DONT_REORDER flag mac80211: don't overwrite QoS TID of injected frames mac80211: adhere to Tx control flag that prevents frame reordering mac80211: add radiotap flag to assure frames are not reordered mac80211: save HE oper info in BSS config for mesh cfg80211: add support to configure HE MCS for beacon rate nl80211: fix beacon tx rate mask validation nl80211/cfg80211: fix potential infinite loop cfg80211: Add support to calculate and report 4096-QAM HE rates cfg80211: Add support to configure SAE PWE value to drivers ieee80211: Add definition for WFA DPP ... ==================== Link: https://lore.kernel.org/r/20201113101148.25268-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13Merge tag 'mac80211-for-net-2020-11-13' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== A handful of fixes: * a use-after-free fix in rfkill * a memory leak fix in the mac80211 TX status path * some rate scaling fixes * a fix for the often-reported (by syzbot) sleeping in atomic issue with mac80211's station removal * tag 'mac80211-for-net-2020-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: mac80211: free sta in sta_info_insert_finish() on errors mac80211: minstrel: fix tx status processing corner case mac80211: minstrel: remove deferred sampling code mac80211: fix memory leak on filtered powersave frames rfkill: Fix use-after-free in rfkill_resume() ==================== Link: https://lore.kernel.org/r/20201113093421.24025-1-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-13mac80211: free sta in sta_info_insert_finish() on errorsJohannes Berg
If sta_info_insert_finish() fails, we currently keep the station around and free it only in the caller, but there's only one such caller and it always frees it immediately. As syzbot found, another consequence of this split is that we can put things that sleep only into __cleanup_single_sta() and not in sta_info_free(), but this is the only place that requires such of sta_info_free() now. Change this to free the station in sta_info_insert_finish(), in which case we can still sleep. This will also let us unify the cleanup code later. Cc: stable@vger.kernel.org Fixes: dcd479e10a05 ("mac80211: always wind down STA state") Reported-by: syzbot+32c6c38c4812d22f2f0b@syzkaller.appspotmail.com Reported-by: syzbot+4c81fe92e372d26c4246@syzkaller.appspotmail.com Reported-by: syzbot+6a7fe9faf0d1d61bc24a@syzkaller.appspotmail.com Reported-by: syzbot+abed06851c5ffe010921@syzkaller.appspotmail.com Reported-by: syzbot+b7aeb9318541a1c709f1@syzkaller.appspotmail.com Reported-by: syzbot+d5a9416c6cafe53b5dd0@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20201112112201.ee6b397b9453.I9c31d667a0ea2151441cc64ed6613d36c18a48e0@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-11-12bpf: Allow using bpf_sk_storage in FENTRY/FEXIT/RAW_TPMartin KaFai Lau
This patch enables the FENTRY/FEXIT/RAW_TP tracing program to use the bpf_sk_storage_(get|delete) helper, so those tracing programs can access the sk's bpf_local_storage and the later selftest will show some examples. The bpf_sk_storage is currently used in bpf-tcp-cc, tc, cg sockops...etc which is running either in softirq or task context. This patch adds bpf_sk_storage_get_tracing_proto and bpf_sk_storage_delete_tracing_proto. They will check in runtime that the helpers can only be called when serving softirq or running in a task context. That should enable most common tracing use cases on sk. During the load time, the new tracing_allowed() function will ensure the tracing prog using the bpf_sk_storage_(get|delete) helper is not tracing any bpf_sk_storage*() function itself. The sk is passed as "void *" when calling into bpf_local_storage. This patch only allows tracing a kernel function. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201112211313.2587383-1-kafai@fb.com
2020-11-12bpf: Rename some functions in bpf_sk_storageMartin KaFai Lau
Rename some of the functions currently prefixed with sk_storage to bpf_sk_storage. That will make the next patch have fewer prefix check and also bring the bpf_sk_storage.c to a more consistent function naming. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20201112211307.2587021-1-kafai@fb.com
2020-11-12bpf: Folding omem_charge() into sk_storage_charge()Martin KaFai Lau
sk_storage_charge() is the only user of omem_charge(). This patch simplifies it by folding omem_charge() into sk_storage_charge(). Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Acked-by: KP Singh <kpsingh@google.com> Link: https://lore.kernel.org/bpf/20201112211301.2586255-1-kafai@fb.com