summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2021-05-29netfilter: use nfnetlink_unicast()Pablo Neira Ayuso
Replace netlink_unicast() calls by nfnetlink_unicast() which already deals with translating EAGAIN to ENOBUFS as the nfnetlink core expects. nfnetlink_unicast() calls nlmsg_unicast() which returns zero in case of success, otherwise the netlink core function netlink_rcv_skb() turns err > 0 into an acknowlegment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: xt_CT: Remove redundant assignment to retYang Li
Variable 'ret' is set to zero but this value is never read as it is overwritten with a new value later on, hence it is a redundant assignment and can be removed Clean up the following clang-analyzer warning: net/netfilter/xt_CT.c:175:2: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: x_tables: improve limit_mt scalabilityJason Baron
We've seen this spin_lock show up high in profiles. Let's introduce a lockless version. I've tested this using pktgen_sample01_simple.sh. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: Remove leading spaces in KconfigJuerg Haefliger
Remove leading spaces before tabs in Kconfig file(s) by running the following command: $ find net/netfilter -name 'Kconfig*' | xargs sed -r -i 's/^[ ]+\t/\t/' Signed-off-by: Juerg Haefliger <juergh@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-29netfilter: nf_tables: prefer direct calls for set lookupsFlorian Westphal
Extend nft_set_do_lookup() to use direct calls when retpoline feature is enabled. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-28mptcp: restrict values of 'enabled' sysctlMatthieu Baerts
To avoid confusions, it seems better to parse this sysctl parameter as a boolean. We use it as a boolean, no need to parse an integer and bring confusions if we see a value different from 0 and 1, especially with this parameter name: enabled. It seems fine to do this modification because the default value is 1 (enabled). Then the only other interesting value to set is 0 (disabled). All other values would not have changed the default behaviour. Suggested-by: Florian Westphal <fw@strlen.de> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: support SYSCTL only if enabledMatthieu Baerts
Since the introduction of the sysctl support in MPTCP with commit 784325e9f037 ("mptcp: new sysctl to control the activation per NS"), we don't check CONFIG_SYSCTL. Until now, that was not an issue: the register and unregister functions were replaced by NO-OP one if SYSCTL was not enabled in the config. The only thing we could have avoid is not to reserve memory for the table but that's for the moment only a small table per net-ns. But the following commit is going to use SYSCTL_ZERO and SYSCTL_ONE which are not be defined if SYSCTL is not enabled in the config. This causes 'undefined reference' errors from the linker. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: make sure flag signal is set when add addr with portJianguo Wu
When add address with port, it is mean to create a listening socket, and send an ADD_ADDR to remote, so it must have flag signal set, add this check in mptcp_pm_parse_addr(). Fixes: a77e9179c7651 ("mptcp: deal with MPTCP_PM_ADDR_ATTR_PORT in PM netlink") Acked-by: Geliang Tang <geliangtang@gmail.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: remove redundant initialization in pm_nl_init_net()Jianguo Wu
Memory of struct pm_nl_pernet{} is allocated by kzalloc() in setup_net()->ops_init(), so it's no need to reset counters and zero bitmap in pm_nl_init_net(). Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: generate subflow hmac after mptcp_finish_join()Jianguo Wu
For outgoing subflow join, when recv SYNACK, in subflow_finish_connect(), the mptcp_finish_join() may return false in some cases, and send a RESET to remote, and no local hmac is required. So generate subflow hmac after mptcp_finish_join(). Fixes: ec3edaa7ca6c ("mptcp: Add handling of outgoing MP_JOIN requests") Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: using TOKEN_MAX_RETRIES instead of magic numberJianguo Wu
We have macro TOKEN_MAX_RETRIES for the number of token generate retries, so using TOKEN_MAX_RETRIES in subflow_check_req(). And rename TOKEN_MAX_RETRIES to MPTCP_TOKEN_MAX_RETRIES as it is now exposed. Fixes: 535fb8152f31 ("mptcp: token: move retry to caller") Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: fix pr_debug in mptcp_token_new_connectJianguo Wu
After commit 2c5ebd001d4f ("mptcp: refactor token container"), pr_debug() is called before mptcp_crypto_key_gen_sha() in mptcp_token_new_connect(), so the output local_key, token and idsn are 0, like: MPTCP: ssk=00000000f6b3c4a2, local_key=0, token=0, idsn=0 Move pr_debug() after mptcp_crypto_key_gen_sha(). Fixes: 2c5ebd001d4f ("mptcp: refactor token container") Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: do not reset MP_CAPABLE subflow on mapping errorsPaolo Abeni
When some mapping related errors occurs we close the main MPC subflow with a RST. We should instead fallback gracefully to TCP, and do the reset only for MPJ subflows. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/192 Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: always parse mptcp options for MPC reqskPaolo Abeni
In subflow_syn_recv_sock() we currently skip options parsing for OoO packet, given that such packets may not carry the relevant MPC option. If the peer generates an MPC+data TSO packet and some of the early segments are lost or get reorder, we server will ignore the peer key, causing transient, unexpected fallback to TCP. The solution is always parsing the incoming MPTCP options, and do the fallback only for in-order packets. This actually cleans the existing code a bit. Fixes: d22f4988ffec ("mptcp: process MP_CAPABLE data option") Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28mptcp: fix sk_forward_memory corruption on retransmissionPaolo Abeni
MPTCP sk_forward_memory handling is a bit special, as such field is protected by the msk socket spin_lock, instead of the plain socket lock. Currently we have a code path updating such field without handling the relevant lock: __mptcp_retrans() -> __mptcp_clean_una_wakeup() Several helpers in __mptcp_clean_una_wakeup() will update sk_forward_alloc, possibly causing such field corruption, as reported by Matthieu. Address the issue providing and using a new variant of blamed function which explicitly acquires the msk spin lock. Fixes: 64b9cea7a0af ("mptcp: fix spurious retransmissions") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/172 Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net> Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-28netfilter: add and use nft_set_do_lookup helperFlorian Westphal
Followup patch will add a CONFIG_RETPOLINE wrapper to avoid the ops->lookup() indirection cost for retpoline builds. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-28netfilter: nft_set_pipapo_avx2: Skip LDMXCSR, we don't need a valid MXCSR stateStefano Brivio
We don't need a valid MXCSR state for the lookup routines, none of the instructions we use rely on or affect any bit in the MXCSR register. Instead of calling kernel_fpu_begin(), we can pass 0 as mask to kernel_fpu_begin_mask() and spare one LDMXCSR instruction. Commit 49200d17d27d ("x86/fpu/64: Don't FNINIT in kernel_fpu_begin()") already speeds up lookups considerably, and by dropping the MCXSR initialisation we can now get a much smaller, but measurable, increase in matching rates. The table below reports matching rates and a wild approximation of clock cycles needed for a match in a "port,net" test with 10 entries from selftests/netfilter/nft_concat_range.sh, limited to the first field, i.e. the port (with nft_set_rbtree initialisation skipped), run on a single AMD Epyc 7351 thread (2.9GHz, 512 KiB L1D$, 8 MiB L2$). The (very rough) estimation of clock cycles is obtained by simply dividing frequency by matching rate. The "cycles spared" column refers to the difference in cycles compared to the previous row, and the rate increase also refers to the previous row. Results are averages of six runs. Merely for context, I'm also reporting packet rates obtained by skipping kernel_fpu_begin() and kernel_fpu_end() altogether (which shows a very limited impact now), as well as skipping the whole lookup function, compared to simply counting and dropping all packets using the netdev hook drop (see nft_concat_range.sh for details). This workload also includes packet generation with pktgen and the receive path of veth. |matching| est. | cycles | rate | | rate | cycles | spared |increase| | (Mpps) | | | | --------------------------------------|--------|--------|--------|--------| FNINIT, LDMXCSR (before 49200d17d27d) | 5.245 | 553 | - | - | LDMXCSR only (with 49200d17d27d) | 6.347 | 457 | 96 | 21.0% | Without LDMXCSR (this patch) | 6.461 | 449 | 8 | 1.8% | -------- for reference only: ---------|--------|--------|--------|--------| Without kernel_fpu_begin() | 6.513 | 445 | 4 | 0.8% | Without actual matching (return true) | 7.649 | 379 | 66 | 17.4% | Without lookup operation (netdev drop)| 10.320 | 281 | 98 | 34.9% | The clock cycles spared by avoiding LDMXCSR appear to be in line with CPI and latency indicated in the manuals of comparable architectures: Intel Skylake (CPI: 1, latency: 7) and AMD 12h (latency: 12) -- I couldn't find this information for AMD 17h. Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-28netfilter: nft_exthdr: Support SCTP chunksPhil Sutter
Chunks are SCTP header extensions similar in implementation to IPv6 extension headers or TCP options. Reusing exthdr expression to find and extract field values from them is therefore pretty straightforward. For now, this supports extracting data from chunks at a fixed offset (and length) only - chunks themselves are an extensible data structure; in order to make all fields available, a nested extension search is needed. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-28Merge tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client bugfixes from Trond Myklebust: "Stable fixes: - Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config - Fix Oops in xs_tcp_send_request() when transport is disconnected - Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return() Bugfixes: - Fix instances where signal_pending() should be fatal_signal_pending() - fix an incorrect limit in filelayout_decode_layout() - Fixes for the SUNRPC backlogged RPC queue - Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce() - Revert commit 586a0787ce35 ("Clean up rpcrdma_prepare_readch()")" * tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Remove trailing semicolon in macros xprtrdma: Revert 586a0787ce35 NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config NFS: Clean up reset of the mirror accounting variables NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce() NFS: Fix an Oopsable condition in __nfs_pageio_add_request() SUNRPC: More fixes for backlog congestion SUNRPC: Fix Oops in xs_tcp_send_request() when transport is disconnected NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return() SUNRPC in case of backlog, hand free slots directly to waiting task pNFS/NFSv4: Remove redundant initialization of 'rd_size' NFS: fix an incorrect limit in filelayout_decode_layout() fs/nfs: Use fatal_signal_pending instead of signal_pending
2021-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for net: 1) Fix incorrect sockopts unregistration from error path, from Florian Westphal. 2) A few patches to provide better error reporting when missing kernel netfilter options are missing in .config. 3) Fix dormant table flag updates. 4) Memleak in IPVS when adding service with IP_VS_SVC_F_HASHED flag. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: ipvs: ignore IP_VS_SVC_F_HASHED flag when adding service netfilter: nf_tables: fix table flag updates netfilter: nf_tables: extended netlink error reporting for chain type netfilter: nf_tables: missing error reporting for not selected expressions netfilter: conntrack: unregister ipv4 sockopts on error unwind ==================== Link: https://lore.kernel.org/r/20210527190115.98503-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27net/sched: act_ct: Fix ct template allocation for zone 0Ariel Levkovich
Fix current behavior of skipping template allocation in case the ct action is in zone 0. Skipping the allocation may cause the datapath ct code to ignore the entire ct action with all its attributes (commit, nat) in case the ct action in zone 0 was preceded by a ct clear action. The ct clear action sets the ct_state to untracked and resets the skb->_nfct pointer. Under these conditions and without an allocated ct template, the skb->_nfct pointer will remain NULL which will cause the tc ct action handler to exit without handling commit and nat actions, if such exist. For example, the following rule in OVS dp: recirc_id(0x2),ct_state(+new-est-rel-rpl+trk),ct_label(0/0x1), \ in_port(eth0),actions:ct_clear,ct(commit,nat(src=10.11.0.12)), \ recirc(0x37a) Will result in act_ct skipping the commit and nat actions in zone 0. The change removes the skipping of template allocation for zone 0 and treats it the same as any other zone. Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct") Signed-off-by: Ariel Levkovich <lariel@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20210526170110.54864-1-lariel@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27net/sched: act_ct: Offload connections with commit actionPaul Blakey
Currently established connections are not offloaded if the filter has a "ct commit" action. This behavior will not offload connections of the following scenario: $ tc_filter add dev $DEV ingress protocol ip prio 1 flower \ ct_state -trk \ action ct commit action goto chain 1 $ tc_filter add dev $DEV ingress protocol ip chain 1 prio 1 flower \ action mirred egress redirect dev $DEV2 $ tc_filter add dev $DEV2 ingress protocol ip prio 1 flower \ action ct commit action goto chain 1 $ tc_filter add dev $DEV2 ingress protocol ip prio 1 chain 1 flower \ ct_state +trk+est \ action mirred egress redirect dev $DEV Offload established connections, regardless of the commit flag. Fixes: 46475bb20f4b ("net/sched: act_ct: Software offload of established flows") Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Paul Blakey <paulb@nvidia.com> Link: https://lore.kernel.org/r/1622029449-27060-1-git-send-email-paulb@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27devlink: Correct VIRTUAL port to not have phys_port attributesParav Pandit
Physical port name, port number attributes do not belong to virtual port flavour. When VF or SF virtual ports are registered they incorrectly append "np0" string in the netdevice name of the VF/SF. Before this fix, VF netdevice name were ens2f0np0v0, ens2f0np0v1 for VF 0 and 1 respectively. After the fix, they are ens2f0v0, ens2f0v1. With this fix, reading /sys/class/net/ens2f0v0/phys_port_name returns -EOPNOTSUPP. Also devlink port show example for 2 VFs on one PF to ensure that any physical port attributes are not exposed. $ devlink port show pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false pci/0000:06:00.3/196608: type eth netdev ens2f0v0 flavour virtual splittable false pci/0000:06:00.4/262144: type eth netdev ens2f0v1 flavour virtual splittable false This change introduces a netdevice name change on systemd/udev version 245 and higher which honors phys_port_name sysfs file for generation of netdevice name. This also aligns to phys_port_name usage which is limited to switchdev ports as described in [1]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/Documentation/networking/switchdev.rst Fixes: acf1ee44ca5d ("devlink: Introduce devlink port flavour virtual") Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20210526200027.14008-1-parav@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27Merge tag 'linux-can-next-for-5.14-20210527' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== can-next 2021-05-27 The first 2 patches are by Geert Uytterhoeven and convert the rcan_can and rcan_canfd device tree bindings to yaml. The next 2 patches are by Oliver Hartkopp and me and update the CAN uapi headers. zuoqilin's patch removes an unnecessary variable from the CAN proc code. Patrick Menschel contributes 3 patches for CAN ISOTP to enhance the error messages. Jiapeng Chong's patch removes two dead stores from the softing driver. The next 4 patches are by me and silence several warnings found by clang compiler. Jimmy Assarsson's patches for the kvaser_usb driver add support for the Kvaser hydra devices. Dario Binacchi provides 2 patches for the c_can driver, first removing an unused variable, then adding basic ethtool support to query driver and ring parameter info. The last 4 patches are by Torin Cooper-Bennun and clean up the m_can driver. * tag 'linux-can-next-for-5.14-20210527' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (21 commits) can: m_can: fix whitespace in a few comments can: m_can: make TXESC, RXESC config more explicit can: m_can: clean up CCCR reg defs, order by revs can: m_can: use bits.h macros for all regmasks can: c_can: add ethtool support can: c_can: remove unused variable struct c_can_priv::rxmasked can: kvaser_usb: Add new Kvaser hydra devices can: kvaser_usb: Rename define USB_HYBRID_{,PRO_}CANLIN_PRODUCT_ID can: at91_can: silence clang warning can: mcp251xfd: silence clang warning can: mcp251x: mcp251x_can_probe(): silence clang warning can: hi311x: hi3110_can_probe(): silence clang warning can: softing: Remove redundant variable ptr can: isotp: Add error message if txqueuelen is too small can: isotp: add symbolic error message to isotp_module_init() can: isotp: change error format from decimal to symbolic error names can: proc: remove unnecessary variables can: uapi: introduce CANFD_FDF flag for mixed content in struct canfd_frame can: uapi: update CAN-FD frame description dt-bindings: can: rcar_canfd: Convert to json-schema ... ==================== Link: https://lore.kernel.org/r/20210527084532.1384031-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27devlink: append split port number to the port nameJiri Pirko
Instead of doing sprintf twice in case the port is split or not, append the split port suffix in case the port is split. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20210527104819.789840-1-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
cdc-wdm: s/kill_urbs/poison_urbs/ to fix build Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27Bluetooth: fix the erroneous flush_work() orderLin Ma
In the cleanup routine for failed initialization of HCI device, the flush_work(&hdev->rx_work) need to be finished before the flush_work(&hdev->cmd_work). Otherwise, the hci_rx_work() can possibly invoke new cmd_work and cause a bug, like double free, in late processings. This was assigned CVE-2021-3564. This patch reorder the flush_work() to fix this bug. Cc: Marcel Holtmann <marcel@holtmann.org> Cc: Johan Hedberg <johan.hedberg@gmail.com> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: linux-bluetooth@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Lin Ma <linma@zju.edu.cn> Signed-off-by: Hao Xiong <mart1n@zju.edu.cn> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-05-27xprtrdma: Revert 586a0787ce35Chuck Lever
Commit 9ed5af268e88 ("SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages()") [Dec 2020] affects RPC Replies that have a data payload (i.e., Write chunks). rpcrdma_prepare_readch(), as its name suggests, sets up Read chunks which are data payloads within RPC Calls. Those payloads are constructed by xdr_write_pages(), which continues to stuff the call buffer's tail kvec with the payload's XDR roundup. Thus removing the tail buffer logic in rpcrdma_prepare_readch() was the wrong thing to do. Fixes: 586a0787ce35 ("xprtrdma: Clean up rpcrdma_prepare_readch()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-27ipvs: ignore IP_VS_SVC_F_HASHED flag when adding serviceJulian Anastasov
syzbot reported memory leak [1] when adding service with HASHED flag. We should ignore this flag both from sockopt and netlink provided data, otherwise the service is not hashed and not visible while releasing resources. [1] BUG: memory leak unreferenced object 0xffff888115227800 (size 512): comm "syz-executor263", pid 8658, jiffies 4294951882 (age 12.560s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff83977188>] kmalloc include/linux/slab.h:556 [inline] [<ffffffff83977188>] kzalloc include/linux/slab.h:686 [inline] [<ffffffff83977188>] ip_vs_add_service+0x598/0x7c0 net/netfilter/ipvs/ip_vs_ctl.c:1343 [<ffffffff8397d770>] do_ip_vs_set_ctl+0x810/0xa40 net/netfilter/ipvs/ip_vs_ctl.c:2570 [<ffffffff838449a8>] nf_setsockopt+0x68/0xa0 net/netfilter/nf_sockopt.c:101 [<ffffffff839ae4e9>] ip_setsockopt+0x259/0x1ff0 net/ipv4/ip_sockglue.c:1435 [<ffffffff839fa03c>] raw_setsockopt+0x18c/0x1b0 net/ipv4/raw.c:857 [<ffffffff83691f20>] __sys_setsockopt+0x1b0/0x360 net/socket.c:2117 [<ffffffff836920f2>] __do_sys_setsockopt net/socket.c:2128 [inline] [<ffffffff836920f2>] __se_sys_setsockopt net/socket.c:2125 [inline] [<ffffffff836920f2>] __x64_sys_setsockopt+0x22/0x30 net/socket.c:2125 [<ffffffff84350efa>] do_syscall_64+0x3a/0xb0 arch/x86/entry/common.c:47 [<ffffffff84400068>] entry_SYSCALL_64_after_hwframe+0x44/0xae Reported-and-tested-by: syzbot+e562383183e4b1766930@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Julian Anastasov <ja@ssi.bg> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-05-27can: isotp: Add error message if txqueuelen is too smallPatrick Menschel
This patch adds an additional error message in case that txqueuelen is set too small and advices the user to increase txqueuelen. This is likely to happen even with small transfers if txqueuelen is at default value 10 frames. Link: https://lore.kernel.org/r/20210427052150.2308-4-menschel.p@posteo.de Signed-off-by: Patrick Menschel <menschel.p@posteo.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-05-27can: isotp: add symbolic error message to isotp_module_init()Patrick Menschel
This patch adds the value of err with format %pe to the already existing error message. Link: https://lore.kernel.org/r/20210427052150.2308-3-menschel.p@posteo.de Signed-off-by: Patrick Menschel <menschel.p@posteo.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-05-27can: isotp: change error format from decimal to symbolic error namesPatrick Menschel
This patch changes the format string for errors from decimal %d to symbolic error names %pe to achieve more comprehensive log messages. Link: https://lore.kernel.org/r/20210427052150.2308-2-menschel.p@posteo.de Signed-off-by: Patrick Menschel <menschel.p@posteo.de> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-05-27can: proc: remove unnecessary variableszuoqilin
There is no need to define the variable "rate" to receive, just return directly. Link: https://lore.kernel.org/r/20210514100806.792-1-zuoqilin1@163.com Signed-off-by: zuoqilin <zuoqilin@yulong.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2021-05-26Merge tag 'net-5.13-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.13-rc4, including fixes from bpf, netfilter, can and wireless trees. Notably including fixes for the recently announced "FragAttacks" WiFi vulnerabilities. Rather large batch, touching some core parts of the stack, too, but nothing hair-raising. Current release - regressions: - tipc: make node link identity publish thread safe - dsa: felix: re-enable TAS guard band mode - stmmac: correct clocks enabled in stmmac_vlan_rx_kill_vid() - stmmac: fix system hang if change mac address after interface ifdown Current release - new code bugs: - mptcp: avoid OOB access in setsockopt() - bpf: Fix nested bpf_bprintf_prepare with more per-cpu buffers - ethtool: stats: fix a copy-paste error - init correct array size Previous releases - regressions: - sched: fix packet stuck problem for lockless qdisc - net: really orphan skbs tied to closing sk - mlx4: fix EEPROM dump support - bpf: fix alu32 const subreg bound tracking on bitwise operations - bpf: fix mask direction swap upon off reg sign change - bpf, offload: reorder offload callback 'prepare' in verifier - stmmac: Fix MAC WoL not working if PHY does not support WoL - packetmmap: fix only tx timestamp on request - tipc: skb_linearize the head skb when reassembling msgs Previous releases - always broken: - mac80211: address recent "FragAttacks" vulnerabilities - mac80211: do not accept/forward invalid EAPOL frames - mptcp: avoid potential error message floods - bpf, ringbuf: deny reserve of buffers larger than ringbuf to prevent out of buffer writes - bpf: forbid trampoline attach for functions with variable arguments - bpf: add deny list of functions to prevent inf recursion of tracing programs - tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT - can: isotp: prevent race between isotp_bind() and isotp_setsockopt() - netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version Misc: - bpf: add kconfig knob for disabling unpriv bpf by default" * tag 'net-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (172 commits) net: phy: Document phydev::dev_flags bits allocation mptcp: validate 'id' when stopping the ADD_ADDR retransmit timer mptcp: avoid error message on infinite mapping mptcp: drop unconditional pr_warn on bad opt mptcp: avoid OOB access in setsockopt() nfp: update maintainer and mailing list addresses net: mvpp2: add buffer header handling in RX bnx2x: Fix missing error code in bnx2x_iov_init_one() net: zero-initialize tc skb extension on allocation net: hns: Fix kernel-doc sctp: fix the proc_handler for sysctl encap_port sctp: add the missing setting for asoc encap_port bpf, selftests: Adjust few selftest result_unpriv outcomes bpf: No need to simulate speculative domain for immediates bpf: Fix mask direction swap upon off reg sign change bpf: Wrap aux data inside bpf_sanitize_info container bpf: Fix BPF_LSM kconfig symbol dependency selftests/bpf: Add test for l3 use of bpf_redirect_peer bpftool: Add sock_release help info for cgroup attach/prog load command net: dsa: microchip: enable phy errata workaround on 9567 ...
2021-05-26locking/atomic: net: use linux/atomic.h for xchg & cmpxchgMark Rutland
As xchg*() and cmpxchg*() may be instrumented by atomic-instrumented.h, it's necessary to include <linux/atomic.h> to use these, rather than <asm/cmpxchg.h>, which is effectively an arch-internal header. In a couple of places we include <asm/cmpxchg.h>, but get away with this as <linux/atomic.h> gets pulled in inidrectly by another include. Before we convert more architectures to use atomic-instrumented.h, let's fix these up to use <linux/atomic.h> so that we don't make things more fragile. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210525140232.53872-3-mark.rutland@arm.com
2021-05-26SUNRPC: More fixes for backlog congestionTrond Myklebust
Ensure that we fix the XPRT_CONGESTED starvation issue for RDMA as well as socket based transports. Ensure we always initialise the request after waking up from the backlog list. Fixes: e877a88d1f06 ("SUNRPC in case of backlog, hand free slots directly to waiting task") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-05-26xdp: Extend xdp_redirect_map with broadcast supportHangbin Liu
This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to extend xdp_redirect_map for broadcast support. With BPF_F_BROADCAST the packet will be broadcasted to all the interfaces in the map. with BPF_F_EXCLUDE_INGRESS the ingress interface will be excluded when do broadcasting. When getting the devices in dev hash map via dev_map_hash_get_next_key(), there is a possibility that we fall back to the first key when a device was removed. This will duplicate packets on some interfaces. So just walk the whole buckets to avoid this issue. For dev array map, we also walk the whole map to find valid interfaces. Function bpf_clear_redirect_map() was removed in commit ee75aef23afe ("bpf, xdp: Restructure redirect actions"). Add it back as we need to use ri->map again. With test topology: +-------------------+ +-------------------+ | Host A (i40e 10G) | ---------- | eno1(i40e 10G) | +-------------------+ | | | Host B | +-------------------+ | | | Host C (i40e 10G) | ---------- | eno2(i40e 10G) | +-------------------+ | | | +------+ | | veth0 -- | Peer | | | veth1 -- | | | | veth2 -- | NS | | | +------+ | +-------------------+ On Host A: # pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64 On Host B(Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory): Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing. All the veth peers in the NS have a XDP_DROP program loaded. The forward_map max_entries in xdp_redirect_map_multi is modify to 4. Testing the performance impact on the regular xdp_redirect path with and without patch (to check impact of additional check for broadcast mode): 5.12 rc4 | redirect_map i40e->i40e | 2.0M | 9.7M 5.12 rc4 | redirect_map i40e->veth | 1.7M | 11.8M 5.12 rc4 + patch | redirect_map i40e->i40e | 2.0M | 9.6M 5.12 rc4 + patch | redirect_map i40e->veth | 1.7M | 11.7M Testing the performance when cloning packets with the redirect_map_multi test, using a redirect map size of 4, filled with 1-3 devices: 5.12 rc4 + patch | redirect_map multi i40e->veth (x1) | 1.7M | 11.4M 5.12 rc4 + patch | redirect_map multi i40e->veth (x2) | 1.1M | 4.3M 5.12 rc4 + patch | redirect_map multi i40e->veth (x3) | 0.8M | 2.6M Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
2021-05-26net: Remove unnecessary variableszuoqilin
It is not necessary to define variables to receive -ENOMEM, directly return -ENOMEM. Signed-off-by: zuoqilin <zuoqilin@yulong.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2021-05-26 The following pull-request contains BPF updates for your *net* tree. We've added 14 non-merge commits during the last 14 day(s) which contain a total of 17 files changed, 513 insertions(+), 231 deletions(-). The main changes are: 1) Fix bpf_skb_change_head() helper to reset mac_len, from Jussi Maki. 2) Fix masking direction swap upon off-reg sign change, from Daniel Borkmann. 3) Fix BPF offloads in verifier by reordering driver callback, from Yinjun Zhang. 4) BPF selftest for ringbuf mmap ro/rw restrictions, from Andrii Nakryiko. 5) Follow-up fixes to nested bprintf per-cpu buffers, from Florent Revest. 6) Fix bpftool sock_release attach point help info, from Liu Jian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25mptcp: validate 'id' when stopping the ADD_ADDR retransmit timerDavide Caratti
when Linux receives an echo-ed ADD_ADDR, it checks the IP address against the list of "announced" addresses. In case of a positive match, the timer that handles retransmissions is stopped regardless of the 'Address Id' in the received packet: this behaviour does not comply with RFC8684 3.4.1. Fix it by validating the 'Address Id' in received echo-ed ADD_ADDRs. Tested using packetdrill, with the following captured output: unpatched kernel: Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0xfd2e62517888fe29,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 1 1.2.3.4,mptcp dss ack 3013740213], length 0 Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0xfd2e62517888fe29,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 90 198.51.100.2,mptcp dss ack 3013740213], length 0 ^^^ retransmission is stopped here, but 'Address Id' is 90 patched kernel: Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0x1cf372d59e05f4b8,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 1 1.2.3.4,mptcp dss ack 1672384568], length 0 Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0x1cf372d59e05f4b8,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 90 198.51.100.2,mptcp dss ack 1672384568], length 0 Out <...> Flags [.], ack 1, win 256, options [mptcp add-addr v1 id 1 198.51.100.2 hmac 0x1cf372d59e05f4b8,mptcp dss ack 3007449509], length 0 In <...> Flags [.], ack 1, win 257, options [mptcp add-addr v1-echo id 1 198.51.100.2,mptcp dss ack 1672384568], length 0 ^^^ retransmission is stopped here, only when both 'Address Id' and 'IP Address' match Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25mptcp: avoid error message on infinite mappingPaolo Abeni
Another left-over. Avoid flooding dmesg with useless text, we already have a MIB for that event. Fixes: 648ef4b88673 ("mptcp: Implement MPTCP receive path") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25mptcp: drop unconditional pr_warn on bad optPaolo Abeni
This is a left-over of early day. A malicious peer can flood the kernel logs with useless messages, just drop it. Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25mptcp: avoid OOB access in setsockopt()Paolo Abeni
We can't use tcp_set_congestion_control() on an mptcp socket, as such function can end-up accessing a tcp-specific field - prior_ssthresh - causing an OOB access. To allow propagating the correct ca algo on subflow, cache the ca name at initialization time. Additionally avoid overriding the user-selected CA (if any) at clone time. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/182 Fixes: aa1fbd94e5c7 ("mptcp: sockopt: add TCP_CONGESTION and TCP_INFO") Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25net: zero-initialize tc skb extension on allocationVlad Buslov
Function skb_ext_add() doesn't initialize created skb extension with any value and leaves it up to the user. However, since extension of type TC_SKB_EXT originally contained only single value tc_skb_ext->chain its users used to just assign the chain value without setting whole extension memory to zero first. This assumption changed when TC_SKB_EXT extension was extended with additional fields but not all users were updated to initialize the new fields which leads to use of uninitialized memory afterwards. UBSAN log: [ 778.299821] UBSAN: invalid-load in net/openvswitch/flow.c:899:28 [ 778.301495] load of value 107 is not a valid value for type '_Bool' [ 778.303215] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.12.0-rc7+ #2 [ 778.304933] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 778.307901] Call Trace: [ 778.308680] <IRQ> [ 778.309358] dump_stack+0xbb/0x107 [ 778.310307] ubsan_epilogue+0x5/0x40 [ 778.311167] __ubsan_handle_load_invalid_value.cold+0x43/0x48 [ 778.312454] ? memset+0x20/0x40 [ 778.313230] ovs_flow_key_extract.cold+0xf/0x14 [openvswitch] [ 778.314532] ovs_vport_receive+0x19e/0x2e0 [openvswitch] [ 778.315749] ? ovs_vport_find_upcall_portid+0x330/0x330 [openvswitch] [ 778.317188] ? create_prof_cpu_mask+0x20/0x20 [ 778.318220] ? arch_stack_walk+0x82/0xf0 [ 778.319153] ? secondary_startup_64_no_verify+0xb0/0xbb [ 778.320399] ? stack_trace_save+0x91/0xc0 [ 778.321362] ? stack_trace_consume_entry+0x160/0x160 [ 778.322517] ? lock_release+0x52e/0x760 [ 778.323444] netdev_frame_hook+0x323/0x610 [openvswitch] [ 778.324668] ? ovs_netdev_get_vport+0xe0/0xe0 [openvswitch] [ 778.325950] __netif_receive_skb_core+0x771/0x2db0 [ 778.327067] ? lock_downgrade+0x6e0/0x6f0 [ 778.328021] ? lock_acquire+0x565/0x720 [ 778.328940] ? generic_xdp_tx+0x4f0/0x4f0 [ 778.329902] ? inet_gro_receive+0x2a7/0x10a0 [ 778.330914] ? lock_downgrade+0x6f0/0x6f0 [ 778.331867] ? udp4_gro_receive+0x4c4/0x13e0 [ 778.332876] ? lock_release+0x52e/0x760 [ 778.333808] ? dev_gro_receive+0xcc8/0x2380 [ 778.334810] ? lock_downgrade+0x6f0/0x6f0 [ 778.335769] __netif_receive_skb_list_core+0x295/0x820 [ 778.336955] ? process_backlog+0x780/0x780 [ 778.337941] ? mlx5e_rep_tc_netdevice_event_unregister+0x20/0x20 [mlx5_core] [ 778.339613] ? seqcount_lockdep_reader_access.constprop.0+0xa7/0xc0 [ 778.341033] ? kvm_clock_get_cycles+0x14/0x20 [ 778.342072] netif_receive_skb_list_internal+0x5f5/0xcb0 [ 778.343288] ? __kasan_kmalloc+0x7a/0x90 [ 778.344234] ? mlx5e_handle_rx_cqe_mpwrq+0x9e0/0x9e0 [mlx5_core] [ 778.345676] ? mlx5e_xmit_xdp_frame_mpwqe+0x14d0/0x14d0 [mlx5_core] [ 778.347140] ? __netif_receive_skb_list_core+0x820/0x820 [ 778.348351] ? mlx5e_post_rx_mpwqes+0xa6/0x25d0 [mlx5_core] [ 778.349688] ? napi_gro_flush+0x26c/0x3c0 [ 778.350641] napi_complete_done+0x188/0x6b0 [ 778.351627] mlx5e_napi_poll+0x373/0x1b80 [mlx5_core] [ 778.352853] __napi_poll+0x9f/0x510 [ 778.353704] ? mlx5_flow_namespace_set_mode+0x260/0x260 [mlx5_core] [ 778.355158] net_rx_action+0x34c/0xa40 [ 778.356060] ? napi_threaded_poll+0x3d0/0x3d0 [ 778.357083] ? sched_clock_cpu+0x18/0x190 [ 778.358041] ? __common_interrupt+0x8e/0x1a0 [ 778.359045] __do_softirq+0x1ce/0x984 [ 778.359938] __irq_exit_rcu+0x137/0x1d0 [ 778.360865] irq_exit_rcu+0xa/0x20 [ 778.361708] common_interrupt+0x80/0xa0 [ 778.362640] </IRQ> [ 778.363212] asm_common_interrupt+0x1e/0x40 [ 778.364204] RIP: 0010:native_safe_halt+0xe/0x10 [ 778.365273] Code: 4f ff ff ff 4c 89 e7 e8 50 3f 40 fe e9 dc fe ff ff 48 89 df e8 43 3f 40 fe eb 90 cc e9 07 00 00 00 0f 00 2d 74 05 62 00 fb f4 <c3> 90 e9 07 00 00 00 0f 00 2d 64 05 62 00 f4 c3 cc cc 0f 1f 44 00 [ 778.369355] RSP: 0018:ffffffff84407e48 EFLAGS: 00000246 [ 778.370570] RAX: ffff88842de46a80 RBX: ffffffff84425840 RCX: ffffffff83418468 [ 778.372143] RDX: 000000000026f1da RSI: 0000000000000004 RDI: ffffffff8343af5e [ 778.373722] RBP: fffffbfff0884b08 R08: 0000000000000000 R09: ffff88842de46bcb [ 778.375292] R10: ffffed1085bc8d79 R11: 0000000000000001 R12: 0000000000000000 [ 778.376860] R13: ffffffff851124a0 R14: 0000000000000000 R15: dffffc0000000000 [ 778.378491] ? rcu_eqs_enter.constprop.0+0xb8/0xe0 [ 778.379606] ? default_idle_call+0x5e/0xe0 [ 778.380578] default_idle+0xa/0x10 [ 778.381406] default_idle_call+0x96/0xe0 [ 778.382350] do_idle+0x3d4/0x550 [ 778.383153] ? arch_cpu_idle_exit+0x40/0x40 [ 778.384143] cpu_startup_entry+0x19/0x20 [ 778.385078] start_kernel+0x3c7/0x3e5 [ 778.385978] secondary_startup_64_no_verify+0xb0/0xbb Fix the issue by providing new function tc_skb_ext_alloc() that allocates tc skb extension and initializes its memory to 0 before returning it to the caller. Change all existing users to use new API instead of calling skb_ext_add() directly. Fixes: 038ebb1a713d ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct") Fixes: d29334c15d33 ("net/sched: act_api: fix miss set post_ct for ovs after do conntrack in act_ct") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25net: bridge: remove redundant assignmentNigel Christian
The variable br is assigned a value that is not being read after exiting case IFLA_STATS_LINK_XSTATS_SLAVE. The assignment is redundant and can be removed. Addresses-Coverity ("Unused value") Signed-off-by: Nigel Christian <nigel.l.christian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25sctp: fix the proc_handler for sysctl encap_portXin Long
proc_dointvec() cannot do min and max check for setting a value when extra1/extra2 is set, so change it to proc_dointvec_minmax() for sysctl encap_port. Fixes: e8a3001c2120 ("sctp: add encap_port for netns sock asoc and transport") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25sctp: add the missing setting for asoc encap_portXin Long
This patch is to add the missing setting back for asoc encap_port. Fixes: 8dba29603b5c ("sctp: add SCTP_REMOTE_UDP_ENCAPS_PORT sockopt") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-25xsk: Use kvcalloc to support large umemsMagnus Karlsson
Use kvcalloc() instead of kcalloc() to support large umems with, on my server, one million pages or more in the umem. Reported-by: Dan Siemon <dan@coverfire.com> Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Björn Töpel <bjorn@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210521083301.26921-1-magnus.karlsson@gmail.com
2021-05-24net: hsr: fix mac_len checksGeorge McCollister
Commit 2e9f60932a2c ("net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_info") added the following which resulted in -EINVAL always being returned: if (skb->mac_len < sizeof(struct hsr_ethhdr)) return -EINVAL; mac_len was not being set correctly so this check completely broke HSR/PRP since it was always 14, not 20. Set mac_len correctly and modify the mac_len checks to test in the correct places since sometimes it is legitimately 14. Fixes: 2e9f60932a2c ("net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_info") Signed-off-by: George McCollister <george.mccollister@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-24sch_dsmark: fix a NULL deref in qdisc_reset()Taehee Yoo
If Qdisc_ops->init() is failed, Qdisc_ops->reset() would be called. When dsmark_init(Qdisc_ops->init()) is failed, it possibly doesn't initialize dsmark_qdisc_data->q. But dsmark_reset(Qdisc_ops->reset()) uses dsmark_qdisc_data->q pointer wihtout any null checking. So, panic would occur. Test commands: sysctl net.core.default_qdisc=dsmark -w ip link add dummy0 type dummy ip link add vw0 link dummy0 type virt_wifi ip link set vw0 up Splat looks like: KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] CPU: 3 PID: 684 Comm: ip Not tainted 5.12.0+ #910 RIP: 0010:qdisc_reset+0x2b/0x680 Code: 1f 44 00 00 48 b8 00 00 00 00 00 fc ff df 41 57 41 56 41 55 41 54 55 48 89 fd 48 83 c7 18 53 48 89 fa 48 c1 ea 03 48 83 ec 20 <80> 3c 02 00 0f 85 09 06 00 00 4c 8b 65 18 0f 1f 44 00 00 65 8b 1d RSP: 0018:ffff88800fda6bf8 EFLAGS: 00010282 RAX: dffffc0000000000 RBX: ffff8880050ed800 RCX: 0000000000000000 RDX: 0000000000000003 RSI: ffffffff99e34100 RDI: 0000000000000018 RBP: 0000000000000000 R08: fffffbfff346b553 R09: fffffbfff346b553 R10: 0000000000000001 R11: fffffbfff346b552 R12: ffffffffc0824940 R13: ffff888109e83800 R14: 00000000ffffffff R15: ffffffffc08249e0 FS: 00007f5042287680(0000) GS:ffff888119800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055ae1f4dbd90 CR3: 0000000006760002 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? rcu_read_lock_bh_held+0xa0/0xa0 dsmark_reset+0x3d/0xf0 [sch_dsmark] qdisc_reset+0xa9/0x680 qdisc_destroy+0x84/0x370 qdisc_create_dflt+0x1fe/0x380 attach_one_default_qdisc.constprop.41+0xa4/0x180 dev_activate+0x4d5/0x8c0 ? __dev_open+0x268/0x390 __dev_open+0x270/0x390 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>