Age | Commit message (Collapse) | Author |
|
Expose device capabilities for DEVX user context, it includes which caps
the device is supported and a matching bit to set as part of user
context creation.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
There is no need to perform modify_rmp in two separate function,
while one of them uses stack as a placeholder for data while other
allocates it dynamically. Combine those two functions to one call
instead of two.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
There is no need to perform create_rmp in two separate function, while
one of them uses stack as a placeholder for data while other allocates
it dynamically. Combine those two functions to one instead of two.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Transfer initialization and cleanup from mlx5_priv struct of
mlx5_core_dev to be part of mlx5_ib_dev. This completes removal
of SRQ from mlx5_core.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Reflect the change of moving SRQ code from mlx5_core to mlx5_ib by
updating function signatures do not require mlx5_core_dev as an input,
because all operations in mlx5_ib are supposed to use mlx5_ib_dev.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Reuse existing infrastructure to initialize and release DEVX uid.
The DevX interface is intended for user space access, so it is supposed
to be initialized before ib_register_device(). Also it isn't supported
in switchdev mode and don't need to initialize it in that mode.
Fixes: 76dc5a8406bf ("IB/mlx5: Manage device uid for DEVX white list commands")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
SRQ signature is not supported, hence no need for special static
global variable to announce it.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
There is no need to keep SRQ which is RDMA object in mlx5_core.
In this patch, we partially move the execution code, while next patches
will move table initialization/release logic too.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
As a preparation to move SRQ functionality to RDMA, drop all references
to mlx5_core logic and make SRQ be dependent on shared code only.
Most of the time, we are interested to know if events are working/not
working and it is possible with previous commit ("net/mlx5: Debug print
for forwarded async events").
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
lib/eq.h is needed for EQ manipulation which are not performed in SRQ.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Delete functions which are not called and not needed.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Ensure that both RDMA and netdev parts of SRQ implementation
has same copyright and license information annotated by SPDX
tags.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
|
|
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
basechain->stats is rcu protected data which is updated from
nft_chain_stats_replace(). This function is executed from the commit
phase which holds the pernet nf_tables commit mutex - not the global
nfnetlink subsystem mutex.
Test commands to reproduce the problem are:
%iptables-nft -I INPUT
%iptables-nft -Z
%iptables-nft -Z
This patch uses RCU calls to handle basechain->stats updates to fix a
splat that looks like:
[89279.358755] =============================
[89279.363656] WARNING: suspicious RCU usage
[89279.368458] 4.20.0-rc2+ #44 Tainted: G W L
[89279.374661] -----------------------------
[89279.379542] net/netfilter/nf_tables_api.c:1404 suspicious rcu_dereference_protected() usage!
[...]
[89279.406556] 1 lock held by iptables-nft/5225:
[89279.411728] #0: 00000000bf45a000 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1f/0x70 [nf_tables]
[89279.424022] stack backtrace:
[89279.429236] CPU: 0 PID: 5225 Comm: iptables-nft Tainted: G W L 4.20.0-rc2+ #44
[89279.430135] Call Trace:
[89279.430135] dump_stack+0xc9/0x16b
[89279.430135] ? show_regs_print_info+0x5/0x5
[89279.430135] ? lockdep_rcu_suspicious+0x117/0x160
[89279.430135] nft_chain_commit_update+0x4ea/0x640 [nf_tables]
[89279.430135] ? sched_clock_local+0xd4/0x140
[89279.430135] ? check_flags.part.35+0x440/0x440
[89279.430135] ? __rhashtable_remove_fast.constprop.67+0xec0/0xec0 [nf_tables]
[89279.430135] ? sched_clock_cpu+0x126/0x170
[89279.430135] ? find_held_lock+0x39/0x1c0
[89279.430135] ? hlock_class+0x140/0x140
[89279.430135] ? is_bpf_text_address+0x5/0xf0
[89279.430135] ? check_flags.part.35+0x440/0x440
[89279.430135] ? __lock_is_held+0xb4/0x140
[89279.430135] nf_tables_commit+0x2555/0x39c0 [nf_tables]
Fixes: f102d66b335a4 ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Tariq Toukan says:
====================
mlx4_core cleanups
This patchset by Erez contains cleanups to the mlx4_core driver.
Patch 1 replaces -EINVAL with -EOPNOTSUPP for unsupported operations.
Patch 2 fixes some coding style issues.
Series generated against net-next commit:
97e6c858a26e net: usb: aqc111: Initialize wol_cfg with memset in aqc111_suspend
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix 3 checkpatch errors within mlx4/main.c:
- Unnecessary mlx4_debug_level global variable initialization to 0.
- Prohibited space before comma.
- Whitespaces instead of tab.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Functions __set_port_type and mlx4_check_port_params returned
-EINVAL while the proper return code is -EOPNOTSUPP as a
result of an unsupported operation. All drivers should generate
this and all users should check for it when detecting an
unsupported functionality.
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Logic of phy_device_create() requests PHY modules according to PHY ID
but for C45 PHYs we use different field for the IDs.
Let's also request the modules for these IDs.
Changes from v1:
- Only request C22 modules if C45 are not present (Andrew)
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Joao Pinto <joao.pinto@synopsys.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jerin Jacob says:
====================
octeontx2-af: NIX and NPC enhancements
This patchset is a continuation to earlier submitted four patch
series to add a new driver for Marvell's OcteonTX2 SOC's
Resource virtualization unit (RVU) admin function driver.
1. octeontx2-af: Add RVU Admin Function driver
https://www.spinics.net/lists/netdev/msg528272.html
2. octeontx2-af: NPA and NIX blocks initialization
https://www.spinics.net/lists/netdev/msg529163.html
3. octeontx2-af: NPC parser and NIX blocks initialization
https://www.spinics.net/lists/netdev/msg530252.html
4. octeontx2-af: NPC MCAM support and FLR handling
https://www.spinics.net/lists/netdev/msg534392.html
This patch series adds support for below
NPC block:
- Add NPC(mkex) profile support for various Key extraction configurations
NIX block:
- Enable dynamic RSS flow key algorithm configuration
- Enhancements on Rx checksum and error checks
- Add support for Tx packet marking support
- TL1 schedule queue allocation enhancements
- Add LSO format configuration mbox
- VLAN TPID configuration
- Skip multicast entry init for broadcast tables
v2:
- Rename FLOW_* to NIX_FLOW_* to avoid serious global namespace collisions,
as we have various FLOW_* definitions coming from
include/uapi/linux/pkt_cls.h for example.(David Miller)
- Pack the arguments of rvu_get_tl1_schqs() function
as 80 columns allows.(David Miller)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The following set of NPC registers allow the driver to configure NPC
to generate different key value schemes to compare against packet
payload in MCAM search.
NPC_AF_INTF(0..1)_KEX_CFG
NPC_AF_KEX_LDATA(0..1)_FLAGS_CFG
NPC_AF_INTF(0..1)_LID(0..7)_LT(0..15)_LD(0..1)_CFG
NPC_AF_INTF(0..1)_LDATA(0..1)_FLAGS(0..15)_CFG
Currently, the AF driver populates these registers to
configure the default values to address the most common
use cases such as key generation for channel number + DMAC.
The secure firmware stores different configuration
value of these registers to enable different NPC use case
along with the name for the lookup.
Patch loads profile binary from secure firmware over
the exiting CGX mailbox interface and apply the profile.
AF driver shall fall back to the default configuration
in case of any errors.
The AF consumer driver can know the selected profile
on response to NPC_GET_KEX_CFG mailbox by introducing
mkex_pfl_name in the struct npc_get_kex_cfg_rsp.
Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
NIX_AF_LSO_FORMAT(0..31)_FIELD(0..7) register enables an SW defined
means to define LSO packet modification formats.
0..31 works as an index to choose the algorithm, On success, the mailbox
returns the index to the client of chosen LSO algorithm selection.
This index will be used in configuring the transmit descriptors.
Add mailbox interface to dynamically reserve and configure LSO format.
This commit also fixes 'sizem1' for NIX_LSOALG_TCP_FLAGS
to '1' i.e 2 Bytes.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds mailbox support for L4 checksum verification
and L3 and L4 length verification configuration.
Signed-off-by: Vidhya Raman <vraman@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Setup TPID's for vlan0 and vlan1 for Tx VLAN insertion offloads.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
NIX_AF_MARK_FORMAT(0..127)_CTL register enables an SW defined
means to mark/insert various data in the packet based on
final packet color from traffic shaping HW.
0..127 works as an index to choose the algorithm. On success,
the mailbox returns the index to the client.
Add NIX_MARK_FORMAT_CFG mailbox which reserves mark format based on
tuple (offset, y_mask, y_val, r_mask, r_val)
If the tuple is requested again for mark format that was already
reserved, then it will be reused. If not it will reserve a new entry
if space is available.
Also on AF init commonly used marker format such as VLAN DEI, IPv4
ECN, IPv4 DSCP are reserved for AF consumers.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch adds support for enabling RSS in promiscuous mode
if RSS is already requested by the AF client.
Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to support all NIX specific valid length errors and
checksum errors on Rx, Update all NIX_AF_RX_DEF_* registers.
Also sorted all registers in HRM definition order.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch enables the inner IPv4 checksum and
defines the error code for Rx inner and outer checksum errors.
Setting ERRCODE as 1 so that CQE descriptor can be embedded
valid checksum error code and the driver can interpret
checksum error as ERRLEV = LID + 1 and ERRCODE = 1.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The default behavior was to free all the TLx Tx schedule
queues. This patch adds support for freeing a single Tx
schedule queue if TXSCHQ_FREE_ALL flag is not set.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
TL1 is the root node in the scheduling hierarchy and
it is a global resource with a limited number.
This patch introduces restriction and validation on
the allocation of the TL1 nodes for the effective resource
sharing across the AF consumers.
- Limit TL1 allocation to 2 per lmac.
One could be for the normal link and one for IEEE802.3br
express link (Express Send DMA).
Effectively all the VF's of an RVU PF(lmac) share the two TL1 schqs.
- TL1 cannot be freed once allocated.
- Allow VF's to only apply default config to TL1 if not
already applied. PF's can always overwrite the TL1 config.
- Consider NIX_AQ_INSTOP_WRITE while validating txschq
when sq.ena is set.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduced reserve_flowkey_alg_idx()to reserve RSS algorithm index,
it would internally use set_flowkey_fields() to generate fields
based on the flow key dynamically.
On AF driver init, it would reserve a predefined set RSS algo indexes,
which will be available all the time for all the AF driver consumers.
The leftover algo indexes can be reserved at runtime through
exiting nix_rss_flowkey_cfg mailbox message.
The NIX_FLOW_KEY_TYPE_PORT is removed from predefined a set of RSS flow
type as it is not used by any consumer.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Introduce state-based algorithm to convert the flow_key value
to RSS algo field used by NIX_AF_RX_FLOW_KEY_ALGX_FIELDX register.
The outer `for loop` goes over _all_ protocol field and the following
variables depict the state machine forward progress logic.
a) keyoff_marker - Enabled when hash byte length needs to be accounted
in field->key_offset update.
b) field_marker - Enabled when a new field needs to be selected.
c) group_member - Enabled when a protocol is part of a group.
This would remove the existing hard coding and enable to add
new protocol support seamlessly.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Added response for nix_rss_flowkey_cfg message to return
selected RSS algorithm index.
The FLOW_KEY_TYPE* definition is part of the mbox message and
it will be used by the other consumers of AF driver hence moving to mbox.h.
Also renamed FLOW_* definitions to NIX_FLOW_* to avoid global
name space collisions, as we have various coming from
include/uapi/linux/pkt_cls.h for example.
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
At the time of initial broadcast packet replication table init,
NIXLFs are not yet attached to PF_FUNCs. Hence skipped checking
NIXLF while submitting MCE entry init instruction to NIX admin queue.
Also did a minor cleanup while installing bcast match entry in
packet parser unit i.e NPC.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tariq Toukan says:
====================
mlx4 fixes for 4.20-rc
This patchset includes small fixes for the mlx4_en driver.
First patch by Eran fixes the value used to init the netdevice's
min_mtu field.
Please queue it to -stable >= v4.10.
Second patch by Saeed adds missing Kconfig build dependencies.
Series generated against net commit:
35b827b6d061 tun: forbid iface creation with rtnl ops
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MLX4_EN depends on NETDEVICES, ETHERNET and INET Kconfigs.
Make sure they are listed in MLX4_EN Kconfig dependencies.
This fixes the following build break:
drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: ‘struct iphdr’ declared inside parameter list [enabled by default]
struct iphdr *iph)
^
drivers/net/ethernet/mellanox/mlx4/en_rx.c:582:18: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
drivers/net/ethernet/mellanox/mlx4/en_rx.c: In function ‘get_fixed_ipv4_csum’:
drivers/net/ethernet/mellanox/mlx4/en_rx.c:586:20: error: dereferencing pointer to incomplete type
_u8 ipproto = iph->protocol;
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
NIC driver minimal MTU size shall be set to ETH_MIN_MTU, as defined in
the RFC791 and in the network stack. Remove old mlx4_en only define for
it, which was set to wrong value.
Fixes: b80f71f5816f ("ethernet/mellanox: use core min/max MTU checking")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
netif_napi_add() could report an error like this below due to it allows
to pass a format string for wildcarding before calling
dev_get_valid_name(),
"netif_napi_add() called with weight 256 on device eth%d"
For example, hns_enet_drv module does this.
hns_nic_try_get_ae
hns_nic_init_ring_data
netif_napi_add
register_netdev
dev_get_valid_name
Hence, make it a bit more human-readable by using netdev_err_once()
instead.
Signed-off-by: Qian Cai <cai@gmx.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 17c91487364fb33797ed84022564ee7544ac4945.
Rafael found that this commit broke the SD card reader in his
Acer Aspire S5. Details of the problem are in the bugzilla below.
Fixes: 17c91487364f ("PCI/ASPM: Do not initialize link state when aspm_disabled is set")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201801
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
Disable hardware level MAC learning because it breaks station roaming.
When enabled it drops all frames that arrive from a MAC address
that is on a different port at learning table.
Signed-off-by: Anderson Luiz Alves <alacn1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
A MAC address must be unique among all the macvlan devices with the same
lower device. The only exception is the passthru [sic] mode,
which shares the lower device address.
When duplicate addresses are detected, EBUSY is returned when bringing
the interface up:
# ip link add macvlan0 link eth0 type macvlan
# read addr </sys/class/net/eth0/address
# ip link set macvlan0 address $addr
# ip link set macvlan0 up
RTNETLINK answers: Device or resource busy
Use correct error code which is EADDRINUSE, and do the check also
earlier, on address change:
# ip link set macvlan0 address $addr
RTNETLINK answers: Address already in use
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Willem de Bruijn says:
====================
udp msg_zerocopy
Enable MSG_ZEROCOPY for udp sockets
Patch 1/3 is the main patch, a rework of RFC patch
http://patchwork.ozlabs.org/patch/899630/
more details in the patch commit message
Patch 2/3 is an optimization to remove a branch from the UDP hot path
and refcount_inc/refcount_dec_and_test pair when zerocopy is used.
This used to be included in the first patch in v2.
Patch 3/3 runs the already existing udp zerocopy tests
as part of kselftest
See also recent Linux Plumbers presentation
https://linuxplumbersconf.org/event/2/contributions/106/attachments/104/128/willemdebruijn-lpc2018-udpgso-presentation-20181113.pdf
Changes:
v1 -> v2
- Fixup reverse christmas tree violation
v2 -> v3
- Split refcount avoidance optimization into separate patch
- Fix refcount leak on error in fragmented case
(thanks to Paolo Abeni for pointing this one out!)
- Fix refcount inc on zero
v3 -> v4
- Move skb_zcopy_set below the only kfree_skb that might cause
a premature uarg destroy before skb_zerocopy_put_abort
- Move the entire skb_shinfo assignment block, to keep that
cacheline access in one place
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Both msg_zerocopy and udpgso_bench have udp zerocopy variants.
Exercise these as part of the standard kselftest run.
With udp, msg_zerocopy has no control channel. Ensure that the
receiver exits after the sender by accounting for the initial
delay in starting them (in msg_zerocopy.sh).
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
Release of its last reference triggers a completion notification.
The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
the skbs, because it can build, send and free skbs within its loop,
possibly reaching refcount zero and freeing the ubuf_info too soon.
The UDP stack currently also takes this extra ref, but does not need
it as all skbs are sent after return from __ip(6)_append_data.
Avoid the extra refcount_inc and refcount_dec_and_test, and generally
the sock_zerocopy_put in the common path, by passing the initial
reference to the first skb.
This approach is taken instead of initializing the refcount to 0, as
that would generate error "refcount_t: increment on 0" on the
next skb_zcopy_set.
Changes
v3 -> v4
- Move skb_zcopy_set below the only kfree_skb that might cause
a premature uarg destroy before skb_zerocopy_put_abort
- Move the entire skb_shinfo assignment block, to keep that
cacheline access in one place
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
interpret flag MSG_ZEROCOPY.
This patch was previously part of the zerocopy RFC patchsets. Zerocopy
is not effective at small MTU. With segmentation offload building
larger datagrams, the benefit of page flipping outweights the cost of
generating a completion notification.
tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
test patch and making skb_orphan_frags_rx same as skb_orphan_frags:
ipv4 udp -t 1
tx=191312 (11938 MB) txc=0 zc=n
rx=191312 (11938 MB)
ipv4 udp -z -t 1
tx=304507 (19002 MB) txc=304507 zc=y
rx=304507 (19002 MB)
ok
ipv6 udp -t 1
tx=174485 (10888 MB) txc=0 zc=n
rx=174485 (10888 MB)
ipv6 udp -z -t 1
tx=294801 (18396 MB) txc=294801 zc=y
rx=294801 (18396 MB)
ok
Changes
v1 -> v2
- Fixup reverse christmas tree violation
v2 -> v3
- Split refcount avoidance optimization into separate patch
- Fix refcount leak on error in fragmented case
(thanks to Paolo Abeni for pointing this one out!)
- Fix refcount inc on zero
- Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
This is needed since commit 5cf4a8532c99 ("tcp: really ignore
MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In sctp_hash_transport/sctp_epaddr_lookup_transport, it dereferences
a transport's asoc under rcu_read_lock while asoc is freed not after
a grace period, which leads to a use-after-free panic.
This patch fixes it by calling kfree_rcu to make asoc be freed after
a grace period.
Note that only the asoc's memory is delayed to free in the patch, it
won't cause sk to linger longer.
Thanks Neil and Marcelo to make this clear.
Fixes: 7fda702f9315 ("sctp: use new rhlist interface on sctp transport rhashtable")
Fixes: cd2b70875058 ("sctp: check duplicate node before inserting a new transport")
Reported-by: syzbot+0b05d8aa7cb185107483@syzkaller.appspotmail.com
Reported-by: syzbot+aad231d51b1923158444@syzkaller.appspotmail.com
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit a5681e20b541 ("net/ibmnvic: Fix deadlock problem
in reset") made the change to hold the RTNL lock during
driver reset but still calls netdev_notify_peers, which
results in a deadlock. Instead, use call_netdevice_notifiers,
which is functionally the same except that it does not
take the RTNL lock again.
Fixes: a5681e20b541 ("net/ibmnvic: Fix deadlock problem in reset")
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 78139c94dc8c ("net: vhost: lock the vqs one by one") moved the vq
lock to improve scalability, but introduced a possible deadlock in
vhost-iotlb. vhost_iotlb_notify_vq() now takes vq->mutex while holding
the device's IOTLB spinlock. And on the vhost_iotlb_miss() path, the
spinlock is taken while holding vq->mutex.
Since calling vhost_poll_queue() doesn't require any lock, avoid the
deadlock by not taking vq->mutex.
Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one")
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|