Age | Commit message (Collapse) | Author |
|
Pull NFS client bugfixes from Anna Schumaker:
"This contains two delegation fixes (with the RCU lock leak fix marked
for stable), and three patches to fix destroying the the sunrpc back
channel.
Stable bugfixes:
- Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
Other fixes:
- The TCP back channel mustn't disappear while requests are
outstanding
- The RDMA back channel mustn't disappear while requests are
outstanding
- Destroy the back channel when we destroy the host transport
- Don't allow a cached open with a revoked delegation"
* tag 'nfs-for-5.4-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS: Fix an RCU lock leak in nfs4_refresh_delegation_stateid()
NFSv4: Don't allow a cached open with a revoked delegation
SUNRPC: Destroy the back channel when we destroy the host transport
SUNRPC: The RDMA back channel mustn't disappear while requests are outstanding
SUNRPC: The TCP back channel mustn't disappear while requests are outstanding
|
|
Pull block fixes from Jens Axboe:
- Two small nvme fixes, one is a fabrics connection fix, the other one
a cleanup made possible by that fix (Anton, via Keith)
- Fix requeue handling in umb ubd (Anton)
- Fix spin_lock_irq() nesting in blk-iocost (Dan)
- Three small io_uring fixes:
- Install io_uring fd after done with ctx (me)
- Clear ->result before every poll issue (me)
- Fix leak of shadow request on error (Pavel)
* tag 'for-linus-20191101' of git://git.kernel.dk/linux-block:
iocost: don't nest spin_lock_irq in ioc_weight_write()
io_uring: ensure we clear io_kiocb->result before each issue
um-ubd: Entrust re-queue to the upper layers
nvme-multipath: remove unused groups_only mode in ana log
nvme-multipath: fix possible io hang after ctrl reconnect
io_uring: don't touch ctx in setup after ring fd install
io_uring: Fix leaked shadow_req
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"One fix for PCIe users:
- Fix legacy PCI I/O port access emulation
One set of cleanups:
- Resolve most of the warnings generated by sparse across arch/riscv.
No functional changes
And one MAINTAINERS update:
- Update Palmer's E-mail address"
* tag 'riscv/for-v5.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Change to my personal email address
RISC-V: Add PCIe I/O BAR memory mapping
riscv: for C functions called only from assembly, mark with __visible
riscv: fp: add missing __user pointer annotations
riscv: add missing header file includes
riscv: mark some code and data as file-static
riscv: init: merge split string literals in preprocessor directive
riscv: add prototypes for assembly language functions from head.S
|
|
Björn Töpel says:
====================
This set consists of three patches from Maciej and myself which are
optimizing the XSKMAP lookups. In the first patch, the sockets are
moved to be stored at the tail of the struct xsk_map. The second
patch, Maciej implements map_gen_lookup() for XSKMAP. The third patch,
introduced in this revision, moves various XSKMAP functions, to permit
the compiler to do more aggressive inlining.
Based on the XDP program from tools/lib/bpf/xsk.c where
bpf_map_lookup_elem() is explicitly called, this work yields a 5%
improvement for xdpsock's rxdrop scenario. The last patch yields 2%
improvement.
Jonathan's Acked-by: for patch 1 and 2 was carried on. Note that the
overflow checks are done in the bpf_map_area_alloc() and
bpf_map_charge_init() functions, which was fixed in commit
ff1c08e1f74b ("bpf: Change size to u64 for bpf_map_{area_alloc,
charge_init}()").
[1] https://patchwork.ozlabs.org/patch/1186170/
v1->v2: * Change size/cost to size_t and use {struct, array}_size
where appropriate. (Jakub)
v2->v3: * Proper commit message for patch 2.
v3->v4: * Change size_t to u64 to handle 32-bit overflows. (Jakub)
* Introduced patch 3.
v4->v5: * Use BPF_SIZEOF size, instead of BPF_DW, for correct
pointer-sized loads. (Daniel)
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
In this commit the XSKMAP entry lookup function used by the XDP
redirect code is moved from the xskmap.c file to the xdp_sock.h
header, so the lookup can be inlined from, e.g., the
bpf_xdp_redirect_map() function.
Further the __xsk_map_redirect() and __xsk_map_flush() is moved to the
xsk.c, which lets the compiler inline the xsk_rcv() and xsk_flush()
functions.
Finally, all the XDP socket functions were moved from linux/bpf.h to
net/xdp_sock.h, where most of the XDP sockets functions are anyway.
This yields a ~2% performance boost for the xdpsock "rx_drop"
scenario.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191101110346.15004-4-bjorn.topel@gmail.com
|
|
Inline the xsk_map_lookup_elem() via implementing the map_gen_lookup()
callback. This results in emitting the bpf instructions in place of
bpf_map_lookup_elem() helper call and better performance of bpf
programs.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/20191101110346.15004-3-bjorn.topel@gmail.com
|
|
Prior this commit, the array storing XDP socket instances were stored
in a separate allocated array of the XSKMAP. Now, we store the sockets
as a flexible array member in a similar fashion as the arraymap. Doing
so, we do less pointer chasing in the lookup.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Link: https://lore.kernel.org/bpf/20191101110346.15004-2-bjorn.topel@gmail.com
|
|
We have seen many crashes on powerpc hosts while loading bpf programs.
The problem here is that bpf_int_jit_compile() does a first pass
to compute the program length.
Then it allocates memory to store the generated program and
calls bpf_jit_build_body() a second time (and a third time
later)
What I have observed is that the second bpf_jit_build_body()
could end up using few more words than expected.
If bpf_jit_binary_alloc() put the space for the program
at the end of the allocated page, we then write on
a non mapped memory.
It appears that bpf_jit_emit_tail_call() calls
bpf_jit_emit_common_epilogue() while ctx->seen might not
be stable.
Only after the second pass we can be sure ctx->seen wont be changed.
Trying to avoid a second pass seems quite complex and probably
not worth it.
Fixes: ce0761419faef ("powerpc/bpf: Implement support for tail calls")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20191101033444.143741-1-edumazet@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"Fix a parisc kernel crash with ftrace functions when compiled without
frame pointers"
* 'parisc-5.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix frame pointer in ftrace_regs_caller()
|
|
Jakub Kicinski says:
====================
fix BPF offload related bugs
test_offload.py catches some recently added bugs.
First of a bug in test_offload.py itself after recent changes
to netdevsim is fixed.
Second patch fixes a bug in cls_bpf, and last one addresses
a problem with the recently added XDP installation optimization.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When netdevice with offloaded BPF programs is destroyed
the programs are orphaned and removed from the program
IDA - their IDs get released (the programs may remain
accessible via existing open file descriptors and pinned
files). After IDs are released they are set to 0.
This confuses dev_change_xdp_fd() because it compares
the __dev_xdp_query() result where 0 means no program
with prog->aux->id where 0 means orphaned.
dev_change_xdp_fd() would have incorrectly returned success
even though it had not installed the program.
Since drivers already catch this case via bpf_offload_dev_match()
let them handle this case. The error message drivers produce in
this case ("program loaded for a different device") is in fact
correct as the orphaned program must had to be loaded for a
different device.
Fixes: c14a9f633d9e ("net: Don't call XDP_SETUP_PROG when nothing is changed")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 401192113730 ("net: sched: refactor block offloads counter
usage") missed the fact that either new prog or old prog may be
NULL.
Fixes: 401192113730 ("net: sched: refactor block offloads counter usage")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
DebugFS for netdevsim now contains some "action trigger" files
which are write only. Don't try to capture the contents of those.
Note that we can't use os.access() because the script requires
root.
Fixes: 4418f862d675 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This test reports EINVAL for getsockopt(SOL_SOCKET, SO_DOMAIN)
occasionally due to the uninitialized length parameter.
Initialize it to fix this, and also use int for "test_family" to comply
with the API standard.
Fixes: d6a61f80b871 ("soreuseport: test mixed v4/v6 sockets")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Cc: Craig Gallek <cgallek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As reported in [0] at least one RTL8168dp version has problems
establishing a link. This chip version has an integrated RTL8211b PHY,
however the chip seems to report a wrong PHY ID, resulting in a wrong
PHY driver (for Generic Realtek PHY) being loaded.
Work around this issue by adding a hook to r8168dp_2_mdio_read()
for returning the correct PHY ID.
[0] https://bbs.archlinux.org/viewtopic.php?id=246508
Fixes: 242cd9b5866a ("r8169: use phy_resume/phy_suspend")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Since it became possible for the DSA core to use a CPU port different
than 8, our bcm_sf2_imp_setup() function was broken because it assumes
that registers are applicable to port 8. In particular, the port's MAC
is going to stay disabled, so make sure we clear the RX_DIS and TX_DIS
bits if we are not configured for port 8.
Fixes: 9f91484f6fcc ("net: dsa: make "label" property optional for dsa2")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The phylink_dbg() macro does not follow dynamic debug or defined(DEBUG)
and as a result, it spams the kernel log since a PR_DEBUG level is
currently used. Fix it to be defined appropriately whether
CONFIG_DYNAMIC_DEBUG or defined(DEBUG) are set.
Fixes: 17091180b152 ("net: phylink: Add phylink_{printk, err, warn, info, dbg} macros")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Synces the DMA buffer properly in order for CPU and device to see
the most up-to-data data.
Signed-off-by: Yangchun Fu <yangchun@google.com>
Reviewed-by: Catherine Sullivan <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Historically linux tried to stick to RFC 791, 1122, 2003
for IPv4 ID field generation.
RFC 6864 made clear that no matter how hard we try,
we can not ensure unicity of IP ID within maximum
lifetime for all datagrams with a given source
address/destination address/protocol tuple.
Linux uses a per socket inet generator (inet_id), initialized
at connection startup with a XOR of 'jiffies' and other
fields that appear clear on the wire.
Thiemo Nagel pointed that this strategy is a privacy
concern as this provides 16 bits of entropy to fingerprint
devices.
Let's switch to a random starting point, this is just as
good as far as RFC 6864 is concerned and does not leak
anything critical.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thiemo Nagel <tnagel@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Until now SW steering supported matchers that are IPv4 and IPv6.
The limitation was mixed matchers in which the outer header IP version
was different from the inner header IP version.
To support the mixed matcher we create all the possible ste_builder
combinations, once we create a rule we select the correct one to
be used for rule creation.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Instead of using explicit indexes, simply use affinity
type enumerators to make the code more readable.
Fixes: 544fe7c2e654 ("net/mlx5e: Activate HW multipath and handle port affinity based on FIB events")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Instead of using explicit array indexes, simply use
ports enumerators to make the code more readable.
Fixes: 7907f23adc18 ("net/mlx5: Implement RoCE LAG feature")
Signed-off-by: Erez Alfasi <ereza@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
when debug a bug, which triggers TX hang, and kernel log is
spammed with the following info message
[ 1172.044764] mlx5_core 0000:21:00.0: cmd_work_handler:930:(pid 8):
failed to allocate command entry
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Add support for rewriting of DSCP part of ToS field.
Next commands, for example, can be used to offload rewrite action:
OVS:
$ ovs-ofctl add-flow ovs-sriov "ip, in_port=REP, \
actions=mod_nw_tos:68, output:NIC"
iproute2 (used retain mask, as tc command rewrite whole ToS field):
$ tc filter add dev REP ingress protocol ip prio 1 flower skip_sw \
ip_proto icmp action pedit munge ip tos set 68 retain 0xfc pipe \
action mirred egress redirect dev NIC
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
This patch doesn't change any functionality, but is a pre-step for
adding support for rewriting of bit-sized fields, like DSCP and ECN
in IPv4 header, similar fields in IPv6, etc.
Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Move short Work Queue API getter functions into the WQ
header file.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Dump the Work Queue's TX WQE descriptor when a completion with
error is received.
Example:
[5.331832] mlx5_core 0000:00:04.0 enp0s4: Error cqe on cqn 0xa, ci 0x1, TXQ-SQ qpn 0xe, opcode 0xd, syndrome 0x2, vendor syndrome 0x0
[5.333127] 00000000: 55 65 02 75 31 fe c2 d2 6b 6c 62 1e f9 e1 d8 5c
[5.333837] 00000010: d3 b2 6c b8 89 e4 84 20 0b f4 3c e0 f3 75 41 ca
[5.334568] 00000020: 46 00 00 00 cd 70 a0 92 18 3a 01 de 00 00 00 00
[5.335313] 00000030: 7d bc 05 89 b2 e9 00 02 1e 00 00 0e 00 00 30 d2
[5.335972] WQE DUMP: WQ size 1024 WQ cur size 0, WQE index 0x0, len: 64
[5.336710] 00000000: 00 00 00 1e 00 00 0e 04 00 00 00 08 00 00 00 00
[5.337524] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 12 33 33
[5.338151] 00000020: 00 00 00 16 52 54 00 00 00 01 86 dd 60 00 00 00
[5.338740] 00000030: 00 00 00 48 00 00 00 00 00 00 00 00 66 ba 58 14
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
During connection tracking offloads with high number of connections,
(40K connections per second), flow table group lock contention is
observed.
To improve the performance by reducing lock contention, lockless
FTE read lookup is performed as described below.
Each flow table entry is refcounted.
Flow table entry is removed when refcount drops to zero.
rhash table allows rcu protected lookup.
Each hash table entry insertion and removal is write lock protected.
Hence, it is possible to perform lockless lookup in rhash table using
following scheme.
(a) Guard FTE entry lookup per group using rcu read lock.
(b) Before freeing the FTE entry, wait for all readers to finish
accessing the FTE.
Below example of one reader and write in parallel racing, shows
protection in effect with rcu lock.
lookup_fte_locked()
rcu_read_lock();
search_hash_table()
existing_flow_group_write_lock();
tree_put_node(fte)
drop_ref_cnt(fte)
del_sw_fte(fte)
del_hash_table_entry();
call_rcu();
existing_flow_group_write_unlock();
get_ref_cnt(fte) fails
rcu_read_unlock();
rcu grace period();
[..]
kmem_cache_free(fte);
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
FTE memory allocation using alloc_fte() doesn't have any dependency
on the flow group.
Hence, do not hold flow group lock while performing alloc_fte().
This helps to reduce contention of flow group lock.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently, mlx5 tc layer doesn't verify that rule has at least one forward
or drop action which leads to following firmware syndrome when user tries
to offload such action:
[ 1824.860501] mlx5_core 0000:81:00.0: mlx5_cmd_check:753:(pid 29458): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x144b7a)
Add check at the end of parse_tc_fdb_actions() that verifies that resulting
attribute has action fwd or drop flag set.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
When setting number of VFs to 0 (disable SRIOV), clear VF's
configuration.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
mlx5_unload_one do not need local variable to store different value,
Hence just remove it.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Not all mlx5 cards with FPGA device use it for network processing.
mlx5_core driver configures network connection to FPGA device
for all mlx5 cards with installed FPGA. If FPGA is not a part of
network path, driver crashes in this case
Check FPGA name in function mlx5_fpga_device_start() and continue
integrate FPGA into packets flow only for dedicated cards.
Currently there are Newton and Edison cards.
Signed-off-by: Igor Leshenko <igorle@mellanox.com>
Reviewed-by: Meir Lichtinger <meirl@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Use kernel function to calculate crc32 Instead of dr implementation
since it has the same algorithm "slice by 8".
Fixes: 26d688e33f88 ("net/mlx5: DR, Add Steering entry (STE) utilities")
Signed-off-by: Hamdan Igbaria <hamdani@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2019-11-01
This series contains updates to e1000, igb, igc, ixgbe, i40e and driver
documentation.
Lyude Paul fixes an issue where a fatal read error occurs when the
device is unplugged from the machine. So change the read error into a
warn while the device is still present.
Manfred Rudigier found that the i350 device was not apart of the "Media
Auto Sense" feature, yet the device supports it. So add the missing
i350 device to the check and fix an issue where the media auto sense
would flip/flop when no cable was connected to the port causing spurious
kernel log messages.
I fixed an issue where the fix to resolve receive buffer starvation was
applied in more than one place in the driver, one being the incorrect
location in the i40e driver.
Wenwen Wang fixes a potential memory leak in e1000 where allocated
memory is not properly cleaned up in one of the error paths.
Jonathan Neuschäfer cleans up the driver documentation to be consistent
and remove the footnote reference, since the footnote no longer exists in
the documentation.
Igor Pylypiv cleans up a duplicate clearing of a bit, no need to clear
it twice.
v2: Fixed alignment issue in patch 3 of the series based on community
feedback.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Instead of deciding a given device is virtual function or
not based on a device is PF or not, use already defined
MLX5_COREDEV_VF by introducing an helper API mlx5_core_is_vf().
This enables to clearly identify PF, VF and non virtual functions.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently on ECPF, metadata is enabled on the ECPF vport = 0xfffe
(manager vport).
Metadata when supported, must be enabled on own vport which is
used to pass metadata to vport of NIC Rx Flow Table.
Due to this error, traffic tagged by ingress ACL is not processed
correctly at NIC rx flow table level which is supposed to work
on metadata tag.
Hence, instead of working on eswitch manager vport, always working on
eswitch own vport regardless of PF or ECPF.
Given that mlx5_eswitch_query/modify_esw_vport_context() is used to
access other vport in legacy mode and own vport settings in switchdev mode,
extend low level API to explicitly specify other_vport.
Fixes: c1286050cf47 ("net/mlx5: E-Switch, Pass metadata from FDB to eswitch manager")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Drop, untagged, spoof check and untagged spoof check flow groups are
limited to legacy mode only.
Therefore, following refactoring is done to
(a) improve code readability
(b) have better code split between legacy and offloads mode
1. Move legacy flow groups under legacy structure
2. Add validity check for group deletion
3. Restrict scope of esw_vport_disable_ingress_acl to legacy mode
4. Rename esw_vport_enable_ingress_acl() to
esw_vport_create_ingress_acl_table() and limit its scope to
table creation
5. Introduce legacy flow groups creation helper
esw_legacy_create_ingress_acl_groups() and keep its scope to legacy mode
6. Reduce offloads ingress groups from 4 to just 1 metadata group
per vport
7. Removed redundant IS_ERR_OR_NULL as entries are marked NULL on free.
8. Shortern error message to remove redundant 'E-switch'
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Now that there is clear separation for acl setup/cleanup between legacy
and offloads mode, limit metdata disablement to offloads mode.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently legacy mode enables ACL while enabling vport, while offloads
mode enable ACL when moving to offloads mode.
Bring consistency to both modes by enabling/disabling ACL when
enabling/disabling a vport.
It also eliminates creating ingress ACL table on unused ECPF vport in
offloads mode.
Signed-off-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Introduce and use per vport ACL tables creation and destroy APIs, so that
subsequently patch can use them during enabling/disabling a vport.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
It is better to create/destroy ACL related drop counters where the actual
drop rule ACLs are created/destroyed, so that ACL configuration is self
contained for ingress and egress.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Introduce and use per vport ACL tables creation and destroy APIs, so that
subsequently patch can use them during enabling/disabling a vport in
unified way for legacy vs offloads mode.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
In subsequent patch, esw_enable_vport() could fail and return error.
Prepare code to handle such error.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
When eswitch is disabled, vport event handler is unregistered.
This unregistration already synchronizes with running EQ event handler
in below code flow.
mlx5_eswitch_disable()
mlx5_eswitch_event_handlers_unregister()
mlx5_eq_notifier_unregister()
atomic_notifier_chain_unregister()
synchronize_rcu()
notifier_callchain
eswitch_vport_event()
queue_work()
Additionally vport->enabled flag is set under state_lock during
esw_enable_vport() but is not read under state_lock in
(a) esw_disable_vport() and (b) under atomic context
eswitch_vport_event().
It is also necessary to synchronize with already scheduled vport event.
This is already achieved using below sequence.
mlx5_eswitch_event_handlers_unregister()
[..]
flush_workqueue()
Hence,
(a) Remove vport->enabled check in eswitch_vport_event() which
doesn't make any sense.
(b) Remove redundant flush_workqueue() on every vport disable.
(c) Keep esw_disable_vport() symmetric with esw_enable_vport() for
state_lock.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
To improve code readability, move legacy drop counters and droup rule
under legacy structure.
While at it,
(a) prefix drop flow counters helper with legacy_.
(b) nullify the rule pointers only if they were valid.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Metadata fields are offload mode specific.
To improve code readability, move metadata under offloads structure.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
fdb_table is used for both legacy and offloads mode.
It was incorrect to comment that fdb_table is legacy specific.
Hence, fix the comment to reflect that fdb_table is used in legacy and
offloads mode.
Fixes: 131ce7014043 ("net/mlx5: E-Switch, Remove redundant mc_promisc NULL check")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Currently esw_enable_vport() does vport check for zero to enable drop
counters regardless of execution on ECPF/PF.
While esw_disable_vport() considers such scenario.
To keep consistency across code for checking for manager_vport,
introduce and use mlx5_esw_is_manager_vport() to check if a specified
vport is eswitch manager vport or not.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|