summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-01-06mlxsw: spectrum_acl_bloom_filter: Make mlxsw_sp_acl_bf_key_encode() more ↵Amit Cohen
flexible Spectrum-4 will calculate hash function for bloom filter differently from the existing ASICs. One of the changes is related to the way that the chunks will be build - without padding. As preparation for support of Spectrum-4 bloom filter, make mlxsw_sp_acl_bf_key_encode() more flexible, so it will be able to use it for Spectrum-4 as well. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06mlxsw: spectrum_acl_bloom_filter: Reorder functions to make the code more ↵Amit Cohen
aesthetic Currently, mlxsw_sp_acl_bf_rule_count_index_get() is implemented before mlxsw_sp_acl_bf_index_get() but is used after it. Adding a new function for Spectrum-4 would make them further apart still. Fix by moving them around. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06mlxsw: Introduce flex key elements for Spectrum-4Amit Cohen
Spectrum-4 ASIC will support more virtual routers and local ports compared to the existing ASICs. Therefore, the virtual router and local port ACL key elements need to be increased. Introduce new key elements for Spectrum-4 to be aligned with the elements used already for other Spectrum ASICs. The key blocks layout is the same for Spectrum-4, so use the existing code for encode_block() and clear_block(), just create separate blocks. Note that size of `VIRT_ROUTER_MSB` is 4 bits in Spectrum-4, therefore declare it using `MLXSW_AFK_ELEMENT_INST_U32()`, in order to be able to set `.avoid_size_check` to true. Otherwise, `mlxsw_afk_blocks_check()` will fail and warn. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06mlxsw: Rename virtual router flex key elementAmit Cohen
In Spectrum-4, the size of the virtual router ACL key element increased from 11 bits to 12 bits. In order to reuse the existing virtual router ACL key element enumerators for Spectrum-4, rename 'VIRT_ROUTER_8_10' and 'VIRT_ROUTER_0_7' to 'VIRT_ROUTER_MSB' and 'VIRT_ROUTER_LSB', respectively. No functional changes intended. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06Merge branch 'dpaa2-eth-small-cleanup'Jakub Kicinski
Ioana Ciornei says: ==================== dpaa2-eth: small cleanup These 3 patches are just part of a small cleanup on the dpaa2-eth and the dpaa2-switch drivers. In case we are hitting a case in which the fwnode of the root dprc device we initiate a deferred probe. On the dpaa2-switch side, if we are on the remove path, make sure that we check for a non-NULL pointer before accessing the port private structure. ==================== Link: https://lore.kernel.org/r/20220106135905.81923-1-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06dpaa2-switch: check if the port priv is validIoana Ciornei
Before accessing the port private structure make sure that there is still a non-NULL pointer there. A NULL pointer access can happen when we are on the remove path, some switch ports are unregistered and some are in the process of unregistering. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is ↵Ioana Ciornei
not set We could get into a situation when the fwnode of the parent device is not yet set because its probe didn't yet finish. When this happens, any caller of the dpaa2_mac_open() will not have the fwnode available, thus cause problems at the PHY connect time. Avoid this by just returning -EPROBE_DEFER from the dpaa2_mac_open when this happens. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06dpaa2-mac: bail if the dpmacs fwnode is not foundRobert-Ionut Alexa
The parent pointer node handler must be declared with a NULL initializer. Before using it, a check must be performed to make sure that a valid address has been assigned to it. Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Refcount leak in ipt_CLUSTERIP rule loading path, from Xin Xiong. 2) Use socat in netfilter selftests, from Hangbin Liu. 3) Skip layer checksum 4 update for IP fragments. 4) Missing allocation of pcpu scratch maps on clone in nft_set_pipapo, from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: netfilter: nft_set_pipapo: allocate pcpu scratch maps on clone netfilter: nft_payload: do not update layer 4 checksum when mangling fragments selftests: netfilter: switch to socat for tests using -q option netfilter: ipt_CLUSTERIP: fix refcount leak in clusterip_tg_check() ==================== Link: https://lore.kernel.org/r/20220106215139.170824-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-01-06 We've added 41 non-merge commits during the last 2 day(s) which contain a total of 36 files changed, 1214 insertions(+), 368 deletions(-). The main changes are: 1) Various fixes in the verifier, from Kris and Daniel. 2) Fixes in sockmap, from John. 3) bpf_getsockopt fix, from Kuniyuki. 4) INET_POST_BIND fix, from Menglong. 5) arm64 JIT fix for bpf pseudo funcs, from Hou. 6) BPF ISA doc improvements, from Christoph. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits) bpf: selftests: Add bind retry for post_bind{4, 6} bpf: selftests: Use C99 initializers in test_sock.c net: bpf: Handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() bpf/selftests: Test bpf_d_path on rdonly_mem. libbpf: Add documentation for bpf_map batch operations selftests/bpf: Don't rely on preserving volatile in PT_REGS macros in loop3 xdp: Add xdp_do_redirect_frame() for pre-computed xdp_frames xdp: Move conversion to xdp_frame out of map functions page_pool: Store the XDP mem id page_pool: Add callback to init pages when they are allocated xdp: Allow registering memory model without rxq reference samples/bpf: xdpsock: Add timestamp for Tx-only operation samples/bpf: xdpsock: Add time-out for cleaning Tx samples/bpf: xdpsock: Add sched policy and priority support samples/bpf: xdpsock: Add cyclic TX operation capability samples/bpf: xdpsock: Add clockid selection support samples/bpf: xdpsock: Add Dest and Src MAC setting for Tx-only operation samples/bpf: xdpsock: Add VLAN support for Tx-only operation libbpf 1.0: Deprecate bpf_object__find_map_by_offset() API libbpf 1.0: Deprecate bpf_map__is_offload_neutral() ... ==================== Link: https://lore.kernel.org/r/20220107013626.53943-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-01-06Merge branch 'net: bpf: handle return value of post_bind{4,6} and add ↵Alexei Starovoitov
selftests for it' Menglong Dong says: ==================== From: Menglong Dong <imagedong@tencent.com> The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in __inet_bind() is not handled properly. While the return value is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and exit: err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk); if (err) { inet->inet_saddr = inet->inet_rcv_saddr = 0; goto out_release_sock; } Let's take UDP for example and see what will happen. For UDP socket, it will be added to 'udp_prot.h.udp_table->hash' and 'udp_prot.h.udp_table->hash2' after the sk->sk_prot->get_port() called success. If 'inet->inet_rcv_saddr' is specified here, then 'sk' will be in the 'hslot2' of 'hash2' that it don't belong to (because inet_saddr is changed to 0), and UDP packet received will not be passed to this sock. If 'inet->inet_rcv_saddr' is not specified here, the sock will work fine, as it can receive packet properly, which is wired, as the 'bind()' is already failed. To undo the get_port() operation, introduce the 'put_port' field for 'struct proto'. For TCP proto, it is inet_put_port(); For UDP proto, it is udp_lib_unhash(); For icmp proto, it is ping_unhash(). Therefore, after sys_bind() fail caused by BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), it will be unbinded, which means that it can try to be binded to another port. The second patch use C99 initializers in test_sock.c The third patch is the selftests for this modification. Changes since v4: - use C99 initializers in test_sock.c before adding the test case Changes since v3: - add the third patch which use C99 initializers in test_sock.c Changes since v2: - NULL check for sk->sk_prot->put_port Changes since v1: - introduce 'put_port' field for 'struct proto' - add selftests for it ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-06bpf: selftests: Add bind retry for post_bind{4, 6}Menglong Dong
With previous patch, kernel is able to 'put_port' after sys_bind() fails. Add the test for that case: rebind another port after sys_bind() fails. If the bind success, it means previous bind operation is already undoed. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220106132022.3470772-4-imagedong@tencent.com
2022-01-06bpf: selftests: Use C99 initializers in test_sock.cMenglong Dong
Use C99 initializers for the initialization of 'tests' in test_sock.c. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220106132022.3470772-3-imagedong@tencent.com
2022-01-06net: bpf: Handle return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND()Menglong Dong
The return value of BPF_CGROUP_RUN_PROG_INET{4,6}_POST_BIND() in __inet_bind() is not handled properly. While the return value is non-zero, it will set inet_saddr and inet_rcv_saddr to 0 and exit: err = BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk); if (err) { inet->inet_saddr = inet->inet_rcv_saddr = 0; goto out_release_sock; } Let's take UDP for example and see what will happen. For UDP socket, it will be added to 'udp_prot.h.udp_table->hash' and 'udp_prot.h.udp_table->hash2' after the sk->sk_prot->get_port() called success. If 'inet->inet_rcv_saddr' is specified here, then 'sk' will be in the 'hslot2' of 'hash2' that it don't belong to (because inet_saddr is changed to 0), and UDP packet received will not be passed to this sock. If 'inet->inet_rcv_saddr' is not specified here, the sock will work fine, as it can receive packet properly, which is wired, as the 'bind()' is already failed. To undo the get_port() operation, introduce the 'put_port' field for 'struct proto'. For TCP proto, it is inet_put_port(); For UDP proto, it is udp_lib_unhash(); For icmp proto, it is ping_unhash(). Therefore, after sys_bind() fail caused by BPF_CGROUP_RUN_PROG_INET4_POST_BIND(), it will be unbinded, which means that it can try to be binded to another port. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220106132022.3470772-2-imagedong@tencent.com
2022-01-06Revert "net/mlx5: Add retry mechanism to the command entry index allocation"Moshe Shemesh
This reverts commit 410bd754cd73c4a2ac3856d9a03d7b08f9c906bf. The reverted commit had added a retry mechanism to the command entry index allocation. The previous patch ensures that there is a free command entry index once the command work handler holds the command semaphore. Thus the retry mechanism is not needed. Fixes: 410bd754cd73 ("net/mlx5: Add retry mechanism to the command entry index allocation") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5: Set command entry semaphore up once got index freeMoshe Shemesh
Avoid a race where command work handler may fail to allocate command entry index, by holding the command semaphore down till command entry index is being freed. Fixes: 410bd754cd73 ("net/mlx5: Add retry mechanism to the command entry index allocation") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Sync VXLAN udp ports during uplink representor profile changeMaor Dickman
Currently during NIC profile disablement all VXLAN udp ports offloaded to the HW are flushed and during its enablement the driver send notification to the stack to inform the core that the entire UDP tunnel port state has been lost, uplink representor doesn't have the same behavior which can cause VXLAN udp ports offload to be in bad state while moving between modes while VXLAN interface exist. Fixed by aligning the uplink representor profile behavior to the NIC behavior. Fixes: 84db66124714 ("net/mlx5e: Move set vxlan nic info to profile init") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5: Fix access to sf_dev_table on allocation failureShay Drory
Even when SF devices are supported, the SF device table allocation can still fail. In such case mlx5_sf_dev_supported still reports true, but SF device table is invalid. This can result in NULL table access. Hence, fix it by adding NULL table check. Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Fix matching on modified inner ip_ecn bitsPaul Blakey
Tunnel device follows RFC 6040, and during decapsulation inner ip_ecn might change depending on inner and outer ip_ecn as follows: +---------+----------------------------------------+ |Arriving | Arriving Outer Header | | Inner +---------+---------+---------+----------+ | Header | Not-ECT | ECT(0) | ECT(1) | CE | +---------+---------+---------+---------+----------+ | Not-ECT | Not-ECT | Not-ECT | Not-ECT | <drop> | | ECT(0) | ECT(0) | ECT(0) | ECT(1) | CE* | | ECT(1) | ECT(1) | ECT(1) | ECT(1)* | CE* | | CE | CE | CE | CE | CE | +---------+---------+---------+---------+----------+ Cells marked above are changed from original inner packet ip_ecn value. Tc then matches on the modified inner ip_ecn, but hw offload which matches the inner ip_ecn value before decap, will fail. Fix that by mapping all the cases of outer and inner ip_ecn matching, and only supporting cases where we know inner wouldn't be changed by decap, or in the outer ip_ecn=CE case, inner ip_ecn didn't matter. Fixes: bcef735c59f2 ("net/mlx5e: Offload TC matching on tos/ttl for ip tunnels") Signed-off-by: Paul Blakey <paulb@nvidia.com> Reviewed-by: Oz Shlomo <ozsh@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06Revert "net/mlx5e: Block offload of outer header csum for GRE tunnel"Aya Levin
This reverts commit 54e1217b90486c94b26f24dcee1ee5ef5372f832. Although the NIC doesn't support offload of outer header CSUM, using gso_partial_features allows offloading the tunnel's segmentation. The driver relies on the stack CSUM calculation of the outer header. For this, NETIF_F_GSO_GRE_CSUM must be a member of the device's features. Fixes: 54e1217b9048 ("net/mlx5e: Block offload of outer header csum for GRE tunnel") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06Revert "net/mlx5e: Block offload of outer header csum for UDP tunnels"Aya Levin
This reverts commit 6d6727dddc7f93fcc155cb8d0c49c29ae0e71122. Although the NIC doesn't support offload of outer header CSUM, using gso_partial_features allows offloading the tunnel's segmentation. The driver relies on the stack CSUM calculation of the outer header. For this, NETIF_F_GSO_UDP_TUNNEL_CSUM must be a member of the device's features. Fixes: 6d6727dddc7f ("net/mlx5e: Block offload of outer header csum for UDP tunnels") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Don't block routes with nexthop objects in SWMaor Dickman
Routes with nexthop objects is currently not supported by multipath offload and any attempts to use it is blocked, however this also block adding SW routes with nexthop. Resolve this by returning NOTIFY_DONE instead of an error which will allow such a route to be created in SW but not offloaded. This fix also solve an issue which block adding such routes on different devices due to missing check if the route FIB device is one of multipath devices. Fixes: 6a87afc072c3 ("mlx5: Fail attempts to use routes with nexthop objects") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Fix wrong usage of fib_info_nh when routes with nexthop objects ↵Maor Dickman
are used Creating routes with nexthop objects while in switchdev mode leads to access to un-allocated memory and trigger bellow call trace due to hitting WARN_ON. This is caused due to illegal usage of fib_info_nh in TC tunnel FIB event handling to resolve the FIB device while fib_info built in with nexthop. Fixed by ignoring attempts to use nexthop objects with routes until support can be properly added. WARNING: CPU: 1 PID: 1724 at include/net/nexthop.h:468 mlx5e_tc_tun_fib_event+0x448/0x570 [mlx5_core] CPU: 1 PID: 1724 Comm: ip Not tainted 5.15.0_for_upstream_min_debug_2021_11_09_02_04 #1 RIP: 0010:mlx5e_tc_tun_fib_event+0x448/0x570 [mlx5_core] RSP: 0018:ffff8881349f7910 EFLAGS: 00010202 RAX: ffff8881492f1980 RBX: ffff8881349f79e8 RCX: 0000000000000000 RDX: ffff8881349f79e8 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff8881349f7950 R08: 00000000000000fe R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88811e9d0000 R13: ffff88810eb62000 R14: ffff888106710268 R15: 0000000000000018 FS: 00007f1d5ca6e800(0000) GS:ffff88852c880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffedba44ff8 CR3: 0000000129808004 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> atomic_notifier_call_chain+0x42/0x60 call_fib_notifiers+0x21/0x40 fib_table_insert+0x479/0x6d0 ? try_charge_memcg+0x480/0x6d0 inet_rtm_newroute+0x65/0xb0 rtnetlink_rcv_msg+0x2af/0x360 ? page_add_file_rmap+0x13/0x130 ? do_set_pte+0xcd/0x120 ? rtnl_calcit.isra.0+0x120/0x120 netlink_rcv_skb+0x4e/0xf0 netlink_unicast+0x1ee/0x2b0 netlink_sendmsg+0x22e/0x460 sock_sendmsg+0x33/0x40 ____sys_sendmsg+0x1d1/0x1f0 ___sys_sendmsg+0xab/0xf0 ? __mod_memcg_lruvec_state+0x40/0x60 ? __mod_lruvec_page_state+0x95/0xd0 ? page_add_new_anon_rmap+0x4e/0xf0 ? __handle_mm_fault+0xec6/0x1470 __sys_sendmsg+0x51/0x90 ? internal_get_user_pages_fast+0x480/0xa10 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: 8914add2c9e5 ("net/mlx5e: Handle FIB events to update tunnel endpoint device") Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Fix nullptr on deleting mirroring ruleDima Chumak
Deleting a Tc rule with multiple outputs, one of which is internal port, like this one: tc filter del dev enp8s0f0_0 ingress protocol ip pref 5 flower \ dst_mac 0c:42:a1:d1:d0:88 \ src_mac e4:ea:09:08:00:02 \ action tunnel_key set \ src_ip 0.0.0.0 \ dst_ip 7.7.7.8 \ id 8 \ dst_port 4789 \ action mirred egress mirror dev vxlan_sys_4789 pipe \ action mirred egress redirect dev enp8s0f0_1 Triggers a call trace: BUG: kernel NULL pointer dereference, address: 0000000000000230 RIP: 0010:del_sw_hw_rule+0x2b/0x1f0 [mlx5_core] Call Trace: tree_remove_node+0x16/0x30 [mlx5_core] mlx5_del_flow_rules+0x51/0x160 [mlx5_core] __mlx5_eswitch_del_rule+0x4b/0x170 [mlx5_core] mlx5e_tc_del_fdb_flow+0x295/0x550 [mlx5_core] mlx5e_flow_put+0x1f/0x70 [mlx5_core] mlx5e_delete_flower+0x286/0x390 [mlx5_core] tc_setup_cb_destroy+0xac/0x170 fl_hw_destroy_filter+0x94/0xc0 [cls_flower] __fl_delete+0x15e/0x170 [cls_flower] fl_delete+0x36/0x80 [cls_flower] tc_del_tfilter+0x3a6/0x6e0 rtnetlink_rcv_msg+0xe5/0x360 ? rtnl_calcit.isra.0+0x110/0x110 netlink_rcv_skb+0x46/0x110 netlink_unicast+0x16b/0x200 netlink_sendmsg+0x202/0x3d0 sock_sendmsg+0x33/0x40 ____sys_sendmsg+0x1c3/0x200 ? copy_msghdr_from_user+0xd6/0x150 ___sys_sendmsg+0x88/0xd0 ? ___sys_recvmsg+0x88/0xc0 ? do_futex+0x10c/0x460 __sys_sendmsg+0x59/0xa0 do_syscall_64+0x48/0x140 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fix by disabling offloading for flows matching esw_is_chain_src_port_rewrite() which have more than one output. Fixes: 10742efc20a4 ("net/mlx5e: VF tunnel TX traffic offloading") Signed-off-by: Dima Chumak <dchumak@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Fix page DMA map/unmap attributesAya Levin
Driver initiates DMA sync, hence it may skip CPU sync. Add DMA_ATTR_SKIP_CPU_SYNC as input attribute both to dma_map_page and dma_unmap_page to avoid redundant sync with the CPU. When forcing the device to work with SWIOTLB, the extra sync might cause data corruption. The driver unmaps the whole page while the hardware used just a part of the bounce buffer. So syncing overrides the entire page with bounce buffer that only partially contains real data. Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE") Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06Documentation: devlink: mlx5.rst: Fix htmldoc build warningSaeed Mahameed
Fix the following build warning: Documentation/networking/devlink/mlx5.rst:13: WARNING: Error parsing content block for the "list-table" directive: +uniform two-level bullet list expected, but row 2 does not contain the same number of items as row 1 (2 vs 3). ... Add the missing item in the first row. Fixes: 0844fa5f7b89 ("net/mlx5: Let user configure io_eq_size param") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Add recovery flow in case of error CQEGal Pressman
The rep legacy RQ completion handling was missing the appropriate handling of error CQEs (dump the CQE and queue a recover work), fix it by calling trigger_report() when needed. Since all CQE handling flows do the exact same error CQE handling, extract it to a common helper function. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: TC, Remove redundant error loggingRoi Dayan
Remove redundant and trivial error logging when trying to offload mirred device with unsupported devices. Using OVS could hit those a lot and the errors are still logged in extack. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Refactor set_pflag_cqe_based_moderSaeed Mahameed
Rearrange the code and use cqe_mode_to_period_mode() helper. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Move HW-GRO and CQE compression check to fix features flowGal Pressman
Feature dependencies should be resolved in fix features rather than in set features flow. Move the check that disables HW-GRO in case CQE compression is enabled from set_feature_hw_gro() to mlx5e_fix_features(). Signed-off-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Fix feature check per profileAya Levin
Remove redundant space when constructing the feature's enum. Validate against the indented enum value. Fixes: 6c72cb05d4b8 ("net/mlx5e: Use bitmap field for profile features") Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Unblock setting vid 0 for VF in case PF isn't eswitch managerMaor Dickman
When using libvirt to passthrough VF to VM it will always set the VF vlan to 0 even if user didn’t request it, this will cause libvirt to fail to boot in case the PF isn't eswitch owner. Example of such case is the DPU host PF which isn't eswitch manager, so any attempt to passthrough VF of it using libvirt will fail. Fix it by not returning error in case set VF vlan is called with vid 0. Signed-off-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5e: Expose FEC counters via ethtoolLama Kayal
Add FEC counters' statistics of corrected_blocks and uncorrectable_blocks, along with their lanes via ethtool. HW supports corrected_blocks and uncorrectable_blocks counters both for RS-FEC mode and FC-FEC mode. In FC mode these counters are accumulated per lane, while in RS mode the correction method crosses lanes, thus only total corrected_blocks and uncorrectable_blocks are reported in this mode. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5: Update log_max_qp value to FW max capabilityMaher Sanalla
log_max_qp in driver's default profile #2 was set to 18, but FW actually supports 17 at the most - a situation that led to the concerning print when the driver is loaded: "log_max_qp value in current profile is 18, changing to HCA capabaility limit (17)" The expected behavior from mlx5_profile #2 is to match the maximum FW capability in regards to log_max_qp. Thus, log_max_qp in profile #2 is initialized to a defined static value (0xff) - which basically means that when loading this profile, log_max_qp value will be what the currently installed FW supports at most. Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5: SF, Use all available cpu for setting cpu affinityShay Drory
Currently all SFs are using the same CPUs. Spreading SF over CPUs, in round-robin manner, in order to achieve better distribution of the SFs over available CPUs. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5: Introduce API for bulk request and release of IRQsShay Drory
Currently IRQs are requested one by one. To balance spreading IRQs among cpus using such scheme requires remembering cpu mask for the cpus used for a given device. This complicates the IRQ allocation scheme in subsequent patch. Hence, prepare the code for bulk IRQs allocation. This enables spreading IRQs among cpus in subsequent patch. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5: Split irq_pool_affinity logic to new fileShay Drory
The downstream patches add more functionality to irq_pool_affinity. Move the irq_pool_affinity logic to a new file in order to ease the coding and maintenance of it. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5: Move affinity assignment into irq_requestShay Drory
Move affinity binding of the IRQ to irq_request function in order to bind the IRQ before inserting it to the xarray. After this change, the IRQ is ready for use when inserted to the xarray. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5: Introduce control IRQ request APIShay Drory
Currently, IRQ layer have a separate flow for ctrl and comp IRQs, and the distinction between ctrl and comp IRQs is done in the IRQ layer. In order to ease the coding and maintenance of the IRQ layer, introduce a new API for requesting control IRQs - mlx5_ctrl_irq_request(struct mlx5_core_dev *dev). Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06net/mlx5: mlx5e_hv_vhca_stats_create return type to voidSaeed Mahameed
Callers of this functions ignore its return value, as reported by Wang Qing, in one of the return paths, it returns positive values. Since return value is ignored anyways, void out the return type of the function. Reported-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-01-06bpf/selftests: Test bpf_d_path on rdonly_mem.Hao Luo
The second parameter of bpf_d_path() can only accept writable memories. Rdonly_mem obtained from bpf_per_cpu_ptr() can not be passed into bpf_d_path for modification. This patch adds a selftest to verify this behavior. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220106205525.2116218-1-haoluo@google.com
2022-01-06libbpf: Add documentation for bpf_map batch operationsGrant Seltzer
This adds documention for: - bpf_map_delete_batch() - bpf_map_lookup_batch() - bpf_map_lookup_and_delete_batch() - bpf_map_update_batch() This also updates the public API for the `keys` parameter of `bpf_map_delete_batch()`, and both the `keys` and `values` parameters of `bpf_map_update_batch()` to be constants. Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220106201304.112675-1-grantseltzer@gmail.com
2022-01-06selftests/bpf: Don't rely on preserving volatile in PT_REGS macros in loop3Andrii Nakryiko
PT_REGS*() macro on some architectures force-cast struct pt_regs to other types (user_pt_regs, etc) and might drop volatile modifiers, if any. Volatile isn't really required as pt_regs value isn't supposed to change during the BPF program run, so this is correct behavior. But progs/loop3.c relies on that volatile modifier to ensure that loop is preserved. Fix loop3.c by declaring i and sum variables as volatile instead. It preserves the loop and makes the test pass on all architectures (including s390x which is currently broken). Fixes: 3cc31d794097 ("libbpf: Normalize PT_REGS_xxx() macro definitions") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220106205156.955373-1-andrii@kernel.org
2022-01-06ice: Use bitmap_free() to free bitmapChristophe JAILLET
kfree() and bitmap_free() are the same. But using the latter is more consistent when freeing memory allocated with bitmap_zalloc(). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06ice: Optimize a few bitmap operationsChristophe JAILLET
When a bitmap is local to a function, it is safe to use the non-atomic __[set|clear]_bit(). No concurrent accesses can occur. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06ice: Slightly simply ice_find_free_recp_res_idxChristophe JAILLET
The 'possible_idx' bitmap is set just after it is zeroed, so we can save the first step. The 'free_idx' bitmap is used only at the end of the function as the result of a bitmap xor operation. So there is no need to explicitly zero it before. So, slightly simply the code and remove 2 useless 'bitmap_zero()' call Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06ice: improve switchdev's slow-pathWojciech Drewek
In current switchdev implementation, every VF PR is assigned to individual ring on switchdev ctrl VSI. For slow-path traffic, there is a mapping VF->ring done in software based on src_vsi value (by calling ice_eswitch_get_target_netdev function). With this change, HW solution is introduced which is more efficient. For each VF, src MAC (VF's MAC) filter will be created, which forwards packets to the corresponding switchdev ctrl VSI queue based on src MAC address. This filter has to be removed and then replayed in case of resetting one VF. Keep information about this rule in repr->mac_rule, thanks to that we know which rule has to be removed and replayed for a given VF. In case of CORE/GLOBAL all rules are removed automatically. We have to take care of readding them. This is done by ice_replay_vsi_adv_rule. When driver leaves switchdev mode, remove all advanced rules from switchdev ctrl VSI. This is done by ice_rem_adv_rule_for_vsi. Flag repr->rule_added is needed because in some cases reset might be triggered before VF sends request to add MAC. Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06ice: replay advanced rules after resetVictor Raj
ice_replay_vsi_adv_rule will replay advanced rules for a given VSI. Exit this function when list of rules for given recipe is empty. Do not add rule when given vsi_handle does not match vsi_handle from the rule info. Use ICE_MAX_NUM_RECIPES instead of ICE_SW_LKUP_LAST in order to find advanced rules as well. Signed-off-by: Victor Raj <victor.raj@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-06Bluetooth: hci_event: Rework hci_inquiry_result_with_rssi_evtLuiz Augusto von Dentz
This rework the handling of hci_inquiry_result_with_rssi_evt to not use a union to represent the different inquiry responses. Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Soenke Huster <soenke.huster@eknoes.de> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2022-01-06Bluetooth: btbcm: disable read tx power for MacBook Air 8,1 and 8,2Aditya Garg
The MacBook Air 8,1 and 8,2 also need querying of LE Tx power to be disabled for Bluetooth to work. Signed-off-by: Aditya Garg <gargaditya08@live.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org