linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-08-17	Merge branch 'bridge-vlan-fixes'	David S. Miller
	Nikolay Aleksandrov says: ==================== net: bridge: vlan: fixes for vlan mcast contexts These are four fixes for vlan multicast contexts. The first patch enables mcast ctx snooping when adding already existing master vlans to be consistent with the rest of the code. The second patch accounts for the mcast ctx router ports when allocating skb for notification. The third one fixes two suspicious rcu usages due to wrong vlan group helper, and the fourth updates host vlan mcast state along with port mcast state. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	net: bridge: mcast: toggle also host vlan state in br_multicast_toggle_vlan	Nikolay Aleksandrov
	When changing vlan mcast state by br_multicast_toggle_vlan it iterates over all ports and enables/disables the port mcast ctx based on the new state, but I forgot to update the host vlan (bridge master vlan entry) with the new state so it will be left out. Also that function is not used outside of br_multicast.c, so make it static. Fixes: f4b7002a7076 ("net: bridge: add vlan mcast snooping knob") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	net: bridge: mcast: use the correct vlan group helper	Nikolay Aleksandrov
	When dereferencing the port vlan group we should use the rcu helper instead of the one relying on rtnl. In br_multicast_pg_to_port_ctx the entry cannot disappear as we hold the multicast lock and rcu as explained in the comment above it. For the same reason we're ok in br_multicast_start_querier. ============================= WARNING: suspicious RCU usage 5.14.0-rc5+ #429 Tainted: G W ----------------------------- net/bridge/br_private.h:1478 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by swapper/2/0: #0: ffff88822be85eb0 ((&p->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x2da #1: ffff88810b32f260 (&br->multicast_lock){+.-.}-{3:3}, at: br_multicast_port_group_expired+0x28/0x13d [bridge] #2: ffffffff824f6c80 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire.constprop.0+0x0/0x22 [bridge] stack backtrace: CPU: 2 PID: 0 Comm: swapper/2 Kdump: loaded Tainted: G W 5.14.0-rc5+ #429 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x45/0x59 nbp_vlan_group+0x3e/0x44 [bridge] br_multicast_pg_to_port_ctx+0xd6/0x10d [bridge] br_multicast_star_g_handle_mode+0xa1/0x2ce [bridge] ? netlink_broadcast+0xf/0x11 ? nlmsg_notify+0x56/0x99 ? br_mdb_notify+0x224/0x2e9 [bridge] ? br_multicast_del_pg+0x1dc/0x26d [bridge] br_multicast_del_pg+0x1dc/0x26d [bridge] br_multicast_port_group_expired+0xaa/0x13d [bridge] ? __grp_src_delete_marked.isra.0+0x35/0x35 [bridge] ? __grp_src_delete_marked.isra.0+0x35/0x35 [bridge] call_timer_fn+0x134/0x2da __run_timers+0x169/0x193 run_timer_softirq+0x19/0x2d __do_softirq+0x1bc/0x42a __irq_exit_rcu+0x5c/0xb3 irq_exit_rcu+0xa/0x12 sysvec_apic_timer_interrupt+0x5e/0x75 </IRQ> asm_sysvec_apic_timer_interrupt+0x12/0x20 RIP: 0010:default_idle+0xc/0xd Code: e8 14 40 71 ff e8 10 b3 ff ff 4c 89 e2 48 89 ef 31 f6 5d 41 5c e9 a9 e8 c2 ff cc cc cc cc 0f 1f 44 00 00 e8 7f 55 65 ff fb f4 <c3> 0f 1f 44 00 00 55 65 48 8b 2c 25 40 6f 01 00 53 f0 80 4d 02 20 RSP: 0018:ffff88810033bf00 EFLAGS: 00000206 RAX: ffffffff819cf828 RBX: ffff888100328000 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff819cfa2d RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 R10: ffff8881008302c0 R11: 00000000000006db R12: 0000000000000000 R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000 ? __sched_text_end+0x4/0x4 ? default_idle_call+0x15/0x7b default_idle_call+0x4d/0x7b do_idle+0x124/0x2a2 cpu_startup_entry+0x1d/0x1f secondary_startup_64_no_verify+0xb0/0xbb Fixes: 74edfd483de8 ("net: bridge: multicast: add helper to get port mcast context from port group") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	net: bridge: vlan: account for router port lists when notifying	Nikolay Aleksandrov
	When sending a global vlan notification we should account for the number of router ports when allocating the skb, otherwise we might end up losing notifications. Fixes: dc002875c22b ("net: bridge: vlan: use br_rports_fill_info() to export mcast router ports") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	net: bridge: vlan: enable mcast snooping for existing master vlans	Nikolay Aleksandrov
	We always create a vlan with enabled mcast snooping, so when the user turns on per-vlan mcast contexts they'll get consistent behaviour with the current situation, but one place wasn't updated when a bridge/master vlan which already exists (created due to port vlans) is being added as real bridge vlan (BRIDGE_VLAN_INFO_BRENTRY). We need to enable mcast snooping for that vlan when that happens. Fixes: 7b54aaaf53cb ("net: bridge: multicast: add vlan state initialization and control") Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	Merge branch 'octeonx2-mcam-management-rework'	David S. Miller
	Subbaraya Sundeep says: ==================== octeontx2: Rework MCAM flows management for VFs From Octeontx2 hardware point of view there is no difference between PFs and VFs. Hence with refactoring in driver the packet classification features or offloads can be supported by VFs also. This patchset unifies the mcam flows management so that VFs can also support ntuple filters. Since there are MCAM allocations by all PFs and VFs in the system it is required to have the ability to modify number of mcam rules count for a PF/VF in runtime. This is achieved by using devlink. Below is the summary of patches: Patch 1,2,3 are trivial patches which helps in debugging in case of errors by using custom error codes and displaying proper error messages. Patches 4,5 brings rx-all and ntuple support for CGX mapped VFs and LBK VFs. Patches 6,7,8 brings devlink support to PF netdev driver so that mcam entries count can be changed at runtime. To change mcam rule count at runtime where multiple rule allocations are done sorting is required. Also both ntuple and TC rules needs to be unified. Patch 9 is related to AF NPC where a PF allocated entries are allocated at bottom(low priority). On CN10K there is slight change in reading NPC counters which is handled by patch 10. Patch 11 is to allow packets from CPT for NPC parsing on CN10K. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-af: configure npc for cn10k to allow packets from cpt	Vidya
	On CN10K, the higher bits in the channel number represents the CPT channel number. Mask out these higher bits in the npc configuration to allow packets from cpt for parsing. Signed-off-by: Vidya <vvelumuri@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-af: cn10K: Get NPC counters value	Hariprasad Kelam
	The way SW can identify the number NPC counters supported by silicon has changed for CN10K. This patch addresses this reading appropriate registers to find out number of counters available. Signed-off-by: Hariprasad Kelam <hkelam@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-af: Allocate low priority entries for PF	Subbaraya Sundeep
	If the mcam entry allocation request is from PF and NOT a priority allocation request then allocate low priority entries so that PF entries always have lower priority than its VFs. This is required so that entries with (base) MCAM match criteria have lower priority compared to entries with (base + additional) match criteria. This patch considers only best case scenario where PF entries are allocated from low priority zone if low priority zone has free space. There are worst case scenarios like: 1. VFs allocating hundreds of MCAM entries leading to VFs using all mid priority zone and low priority zone entries hence no entries free from low priority zone for PF. 2. All the PFs and VFs in the system allocating and freeing entries causing fragmentation in MCAM space and all the entries requested by PF could not fit in low priority zone for allocation. This patch do not handle worst case scenarios. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-pf: devlink params support to set mcam entry count	Sunil Goutham
	Added support for setting or modifying MCAM entry count at runtime via devlink params. commands: devlink dev param show pci/0002:02:00.0: name mcam_count type driver-specific values: cmode runtime value 16 devlink dev param set pci/0002:02:00.0 name mcam_count value 64 cmode runtime Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-pf: Unify flow management variables	Sunil Goutham
	Variables used for TC flow management like maximum number of flows, number of flows installed etc are a copy of ntuple flow management variables. Since both TC and NTUPLE are not supported at the same time, it's better to unify these with common variables. This patch addresses this unification and also does cleanup of other minor stuff wrt TC. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-pf: Sort the allocated MCAM entry indices	Sunil Goutham
	Per single mailbox request a maximum of 256 MCAM entries can be allocated. If more than 256 are being allocated, then the mcam indices in the final list could get jumbled. Hence sort the indices. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-pf: Ntuple filters support for VF netdev	Rakesh Babu
	Add packet flow classification support for both LMAC mapped virtual functions and loopback VFs. This patch adds supports for ntuple offload feature. Signed-off-by: Rakesh Babu <rsaladi2@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-pf: Enable NETIF_F_RXALL support for VF driver	Sunil Goutham
	Enabled NETIF_F_RXALL support for VF driver. Also removed MTU range comments which are no longer valid. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-af: Add debug messages for failures	Sunil Goutham
	Added debug messages for various failures during probe. This will help in quickly identifying the API where the failure is happening. Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-af: add proper return codes for AF mailbox handlers	Naveen Mamindlapalli
	Add appropriate error codes to be used when returning from AF mailbox handlers due to some error condition. Signed-off-by: Naveen Mamindlapalli <naveenm@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	octeontx2-af: Modify install flow error codes	Subbaraya Sundeep
	When installing a flow using npc_install_flow mailbox there are number of reasons to reject the request like caller is not permitted, invalid channel specified in request, flow not supported in extraction profile and so on. Hence define new error codes for npc flows and use them instead of generic error codes. Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	Merge tag 'mlx5-updates-2021-08-16' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-08-16 The following patchset provides two separate mlx5 updates 1) Ethtool RSS context and MQPRIO channel mode support: 1.1) enable mlx5e netdev driver to allow creating Transport Interface RX (TIRs) objects on the fly to be used for ethtool RSS contexts and TX MQPRIO channel mode 1.2) Introduce mlx5e_rss object to manage such TIRs. 1.3) Ethtool support for RSS context 1.4) Support MQPRIO channel mode 2) Bridge offloads Lag support: to allow adding bond net devices to mlx5 bridge 2.1) Address bridge port by (vport_num, esw_owner_vhca_id) pair since vport_num is only unique per eswitch and in lag mode we need to manage ports from both eswitches. 2.2) Allow connectivity between representors of different eswitch instances that are attached to same bridge 2.3) Bridge LAG, Require representors to be in shared FDB mode and introduce local and peer ports representors, match on paired eswitch metadata in peer FDB entries, And finally support addition/deletion and aging of peer flows. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-17	ALSA: hda/via: Apply runtime PM workaround for ASUS B23E	Takashi Iwai
	ASUS B23E requires the same workaround like other machines with VT1802, otherwise it looses the codec power on a few nodes and the sound kept silence. Fixes: a0645daf1610 ("ALSA: HDA: Early Forbid of runtime PM") Link: https://lore.kernel.org/r/ac2232f142efcd67fe6ac38897f704f7176bd200.camel@gmail.com Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210817052432.14751-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-08-17	ALSA: hda: Fix hang during shutdown due to link reset	Imre Deak
	During system shutdown codecs may be still active, and resetting the controller->codec HW link in this state - based on the bug reporter's tests - leads to the shutdown sequence to get stuck. This happens at least on the reporter's KBL system with an ALC662 codec. For now fix the issue by skipping the link reset step. Fixes: 472e18f63c42 ("ALSA: hda: Release controller display power during shutdown/reboot") References: https://bugzilla.kernel.org/show_bug.cgi?id=214045 References: https://gitlab.freedesktop.org/drm/intel/-/issues/3618#note_1024665 Reported-and-tested-by: youling257@gmail.com Cc: youling257@gmail.com Signed-off-by: Imre Deak <imre.deak@intel.com> Link: https://lore.kernel.org/r/20210816174259.2759103-1-imre.deak@intel.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2021-08-16	selftests/bpf: Add exponential backoff to map_update_retriable in test_maps	Yucong Sun
	Using a fixed delay of 1 microsecond has proven flaky in slow CPU environment, e.g. Github Actions CI system. This patch adds exponential backoff with a cap of 50ms to reduce the flakiness of the test. Initial delay is chosen at random in the range [0ms, 5ms). Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210816175250.296110-1-fallentree@fb.com
2021-08-16	Merge branch 'sockmap: add sockmap support for unix stream socket'	Andrii Nakryiko
	Jiang Wang says: ==================== This patch series add support for unix stream type for sockmap. Sockmap already supports TCP, UDP, unix dgram types. The unix stream support is similar to unix dgram. Also add selftests for unix stream type in sockmap tests. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2021-08-16	selftest/bpf: Add new tests in sockmap for unix stream to tcp.	Jiang Wang
	Add two new test cases in sockmap tests, where unix stream is redirected to tcp and vice versa. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-6-jiang.wang@bytedance.com
2021-08-16	selftest/bpf: Change udp to inet in some function names	Jiang Wang
	This is to prepare for adding new unix stream tests. Mostly renames, also pass the socket types as an argument. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-5-jiang.wang@bytedance.com
2021-08-16	selftest/bpf: Add tests for sockmap with unix stream type.	Jiang Wang
	Add two tests for unix stream to unix stream redirection in sockmap tests. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-4-jiang.wang@bytedance.com
2021-08-16	af_unix: Add unix_stream_proto for sockmap	Jiang Wang
	Previously, sockmap for AF_UNIX protocol only supports dgram type. This patch add unix stream type support, which is similar to unix_dgram_proto. To support sockmap, dgram and stream cannot share the same unix_proto anymore, because they have different implementations, such as unhash for stream type (which will remove closed or disconnected sockets from the map), so rename unix_proto to unix_dgram_proto and add a new unix_stream_proto. Also implement stream related sockmap functions. And add dgram key words to those dgram specific functions. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com
2021-08-16	af_unix: Add read_sock for stream socket types	Jiang Wang
	To support sockmap for af_unix stream type, implement read_sock, which is similar to the read_sock for unix dgram sockets. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-2-jiang.wang@bytedance.com
2021-08-16	Merge branch 'linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This contains a fix for a potential boot failure due to a missing Kconfig dependency for people upgrading with the DRBG enabled" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: drbg - select SHA512
2021-08-16	selftests/bpf: Test btf__load_vmlinux_btf/btf__load_module_btf APIs	Hengqi Chen
	Add test for btf__load_vmlinux_btf/btf__load_module_btf APIs. The test loads bpf_testmod module BTF and check existence of a symbol which is known to exist. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210815081035.205879-1-hengqi.chen@gmail.com
2021-08-16	tcp: enable data-less, empty-cookie SYN with TFO_SERVER_COOKIE_NOT_REQD	Luke Hsiao
	Since the original TFO server code was implemented in commit 168a8f58059a22feb9e9a2dcc1b8053dbbbc12ef ("tcp: TCP Fast Open Server - main code path") the TFO server code has supported the sysctl bit flag TFO_SERVER_COOKIE_NOT_REQD. Currently, when the TFO_SERVER_ENABLE and TFO_SERVER_COOKIE_NOT_REQD sysctl bit flags are set, a server connection will accept a SYN with N bytes of data (N > 0) that has no TFO cookie, create a new fast open connection, process the incoming data in the SYN, and make the connection ready for accepting. After accepting, the connection is ready for read()/recvmsg() to read the N bytes of data in the SYN, ready for write()/sendmsg() calls and data transmissions to transmit data. This commit changes an edge case in this feature by changing this behavior to apply to (N >= 0) bytes of data in the SYN rather than only (N > 0) bytes of data in the SYN. Now, a server will accept a data-less SYN without a TFO cookie if TFO_SERVER_COOKIE_NOT_REQD is set. Caveat! While this enables a new kind of TFO (data-less empty-cookie SYN), some firewall rules setup may not work if they assume such packets are not legit TFOs and will filter them. Signed-off-by: Luke Hsiao <lukehsiao@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20210816205105.2533289-1-luke.w.hsiao@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16	bpf: Reconfigure libbpf docs to remove unversioned API	grantseltzer
	This removes the libbpf_api.rst file from the kernel documentation. The intention for this file was to pull documentation from comments above API functions in libbpf. However, due to limitations of the kernel documentation system, this API documentation could not be versioned, which is counterintuative to how users expect to use it. There is also currently no doc comments, making this a blank page. Once the kernel comment documentation is actually contributed, it will still exist in the kernel repository, just in the code itself. A seperate site is being spun up to generate documentaiton from those comments in a way in which it can be versioned properly. This also reconfigures the bpf documentation index page to make it easier to sync to the previously mentioned documentaiton site. Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210810020508.280639-1-grantseltzer@gmail.com
2021-08-16	Merge branch 'ptp-ocp-minor-updates-and-fixes'	Jakub Kicinski
	Jonathan Lemon says: ==================== ptp: ocp: minor updates and fixes. Fix errors spotted by automated tools. Add myself to the MAINTAINERS for the ptp_ocp driver. ==================== Link: https://lore.kernel.org/r/20210816221337.390645-1-jonathan.lemon@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16	MAINTAINERS: Update for ptp_ocp driver.	Jonathan Lemon
	Add maintainer info for the OpenCompute PTP driver. Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16	ptp: ocp: Have Kconfig select NET_DEVLINK	Jonathan Lemon
	NET doesn't imply NET_DEVLINK. Select this separately, so that random config combinations don't complain. Reported-by: kernel test robot <lkp@intel.com> Fixes: 773bda964921 ("ptp: ocp: Expose various resources on the timecard.") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16	ptp: ocp: Fix error path for pci_ocp_device_init()	Jonathan Lemon
	If ptp_ocp_device_init() fails, pci_disable_device() is skipped. Fix the error handling so this case is covered. Update ptp_ocp_remove() so the normal exit path is identical. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 773bda964921 ("ptp: ocp: Expose various resources on the timecard.") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16	ptp: ocp: Fix uninitialized variable warning spotted by clang.	Jonathan Lemon
	If attempting to flash the firmware with a blob of size 0, the entire write loop is skipped and the uninitialized err is returned. Fix by setting to 0 first. Fixes: 773bda964921 ("ptp: ocp: Expose various resources on the timecard.") Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16	vrf: Reset skb conntrack connection on VRF rcv	Lahav Schlesinger
	To fix the "reverse-NAT" for replies. When a packet is sent over a VRF, the POST_ROUTING hooks are called twice: Once from the VRF interface, and once from the "actual" interface the packet will be sent from: 1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct() This causes the POST_ROUTING hooks to run. 2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again. Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and second vrf_l3_rcv() calls them again. As an example, consider the following SNAT rule: > iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1 In this case sending over a VRF will create 2 conntrack entries. The first is from the VRF interface, which performs the IP SNAT. The second will run the SNAT, but since the "expected reply" will remain the same, conntrack randomizes the source port of the packet: e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack rules are: udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1 udp 17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1 i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is SNAT-ed from 10000 --> 61033. But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later conntrack entry is matched: udp 17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1 udp 17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1 And a "port 61033 unreachable" ICMP packet is sent back. The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(), the skb already has a conntrack flow attached to it, which means nf_conntrack_in() will not resolve the flow again. This means only the dest port is "reverse-NATed" (61033 -> 10000) but the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's not received. This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'. The fix is then to reset the flow when skb is received on a VRF, to let conntrack resolve the flow again (which now will hit the earlier flow). To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by running 'bash ./repro'): $ cat run_in_A1.py import logging logging.getLogger("scapy.runtime").setLevel(logging.ERROR) from scapy.all import * import argparse def get_packet_to_send(udp_dst_port, msg_name): return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \ IP(src='3.3.3.3', dst='2.2.2.2')/ \ UDP(sport=53, dport=udp_dst_port)/ \ Raw(f'{msg_name}\x0012345678901234567890') parser = argparse.ArgumentParser() parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True, help="From run_in_A3.py") parser.add_argument('-socket_port', dest="socket_port", type=str, required=True, help="From run_in_A3.py") parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True, help="From script") args, _ = parser.parse_known_args() iface_mac = args.iface_mac socket_port = int(args.socket_port) v1_mac = args.v1_mac print(f'Source port before NAT: {socket_port}') while True: pkts = sniff(iface='_v0', store=True, count=1, timeout=10) if 0 == len(pkts): print('Something failed, rerun the script :(', flush=True) break pkt = pkts[0] if not pkt.haslayer('UDP'): continue pkt_sport = pkt.getlayer('UDP').sport print(f'Source port after NAT: {pkt_sport}', flush=True) pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port') sendp(pkt_to_send, '_v0', verbose=False) # Will not be received pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port') sendp(pkt_to_send, '_v0', verbose=False) break $ cat run_in_A2.py import socket import netifaces print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}", flush=True) s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, str('vrf_1' + '\0').encode('utf-8')) s.connect(('3.3.3.3', 53)) print(f'{s. getsockname()[1]}', flush=True) s.settimeout(5) while True: try: # Periodically send in order to keep the conntrack entry alive. s.send(b'a'*40) resp = s.recvfrom(1024) msg_name = resp[0].decode('utf-8').split('\0')[0] print(f"Got {msg_name}", flush=True) except Exception as e: pass $ cat repro.sh ip netns del A1 2> /dev/null ip netns del A2 2> /dev/null ip netns add A1 ip netns add A2 ip -n A1 link add _v0 type veth peer name _v1 netns A2 ip -n A1 link set _v0 up ip -n A2 link add e00000 type bond ip -n A2 link add lo0 type dummy ip -n A2 link add vrf_1 type vrf table 10001 ip -n A2 link set vrf_1 up ip -n A2 link set e00000 master vrf_1 ip -n A2 addr add 1.1.1.1/24 dev e00000 ip -n A2 link set e00000 up ip -n A2 link set _v1 master e00000 ip -n A2 link set _v1 up ip -n A2 link set lo0 up ip -n A2 addr add 2.2.2.2/32 dev lo0 ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000 ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001 ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \ SNAT --to-source 2.2.2.2 -o vrf_1 sleep 5 ip netns exec A2 python3 run_in_A2.py > x & XPID=$! sleep 5 IFACE_MAC=`sed -n 1p x` SOCKET_PORT=`sed -n 2p x` V1_MAC=`ip -n A2 link show _v1 \| sed -n 2p \| awk '{print $2'}` ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \ ${SOCKET_PORT} -v1_mac ${SOCKET_PORT} sleep 5 kill -9 $XPID wait $XPID 2> /dev/null ip netns del A1 ip netns del A2 tail x -n 2 rm x set +x Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device") Signed-off-by: Lahav Schlesinger <lschlesinger@drivenets.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210815120002.2787653-1-lschlesinger@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-16	net/mlx5: Bridge, support LAG	Vlad Buslov
	Allow adding bond net devices to mlx5 bridge with following changes: - Modify bridge representor code to obtain uplink represetor that belongs to eswitch that is registered for notification. Require representor to be in shared FDB mode. If representor is the lag master, then consider its port as local, otherwise treat it as peer. - Use devcom to match on paired eswitch metadata in peer FDB entries. This is necessary for shared FDB LAG to function since packets are always received on active eswitch instance as opposed to parent eswitch of port. - Support for deleting peer flows when receiving SWITCHDEV_FDB_DEL_TO_BRIDGE notification was implemented in one of previous patches in series. Now also implement support for handling SWITCHDEV_FDB_ADD_TO_BRIDGE which can be generated on peer by bridge update workqueue task in LAG configuration. Refresh the flow 'lastuse' timestamp to current jiffies when receiving such notification on eswitch that manages the local FDB entry. This allows peer entries to prevent ageing of the FDB. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5: Bridge, allow merged eswitch connectivity	Vlad Buslov
	Allow connectivity between representors of different eswitch instances that are attached to same bridge when merged_eswitch capability is enabled. Add ports of peer eswitch to bridge instance and mark them with MLX5_ESW_BRIDGE_PORT_FLAG_PEER. Mark FDBs offloaded on peer ports with MLX5_ESW_BRIDGE_FLAG_PEER flag. Such FDBs can only be aged out on their local eswitch instance, which then sends SWITCHDEV_FDB_DEL_TO_BRIDGE event. Listen to the event on mlx5 bridge implementation and delete peer FDBs in event handler. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5: Bridge, extract FDB delete notification to function	Vlad Buslov
	SWITCHDEV_FDB_DEL_TO_BRIDGE notification is generated in multiple places in bridge code. Following patch in series changes the condition for the notification. Extract the notification into dedicated helper function mlx5_esw_bridge_fdb_del_notify() to only modify it in single place in the future changes. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5: Bridge, identify port by vport_num+esw_owner_vhca_id pair	Vlad Buslov
	Following patches in series allow traffic between vports of different eswitch instances, which requires addressing bridge port by vport_num+esw_owner_vhca_id pair since vport_num is only unique per-eswitch. As a preparation, extend struct mlx5_esw_bridge_port with 'esw_owner_vhca_id' field and use it as part of key for mlx5_esw_bridge->vports xarray. With this change we can't rely on switchdev_handle_port_obj_add() helper to get mlx5 representor from stacked device because we need specifically representor from parent eswitch that registered the callback to obtain correct esw_owner_vhca_id. The helper doesn't allow passing additional parameters to predicate function and doesn't provide access to the notifier block to obtain eswitch through br_offloads. Implement custom helpers to obtain mlx5 representor and use them in mlx5_esw_bridge_port_obj_{add\|del\|attr_set}() implementations. Remove direct pointer to parent bridge from struct mlx5_vport as it is no longer needed. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5: Bridge, obtain core device from eswitch instead of priv	Vlad Buslov
	Following patches in series will pass bond device to bridge, which means the code can't assume the device is mlx5 representor. Moreover, the core device can be easily obtained from eswitch instance, so there is no reason for more complex code that obtains struct mlx5_priv from net_device in order to use its mdev. Refactor the code to use esw->dev instead of priv->mdev. Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5: Bridge, release bridge in same function where it is taken	Vlad Buslov
	Refactor mlx5_esw_bridge_vport_link() to release the bridge instance if mlx5_esw_bridge_vport_init() returned an error instead of relying on it to release the bridge. This improves the design because object instance is taken and released in same layer and simplifies following patches that add more logic to mlx5_esw_bridge_vport_link(). Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5e: Support MQPRIO channel mode	Tariq Toukan
	Add support for MQPRIO channel mode, in which a partition to TCs is defined over the channels. We allow partitions with contiguous queue indices, with no holes within. We do not allow modification to the num of channels while this MQPRIO mode is active. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5e: Handle errors of netdev_set_num_tc()	Tariq Toukan
	Add handling for failures in netdev_set_num_tc(). Let mlx5e_netdev_set_tcs return an int. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5e: Maintain MQPRIO mode parameter	Tariq Toukan
	This is in preparation for supporting MQPRIO CHANNEL mode in downstream patch, in addition to DCB mode that's supported today. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5e: Abstract MQPRIO params	Tariq Toukan
	Abstract the MQPRIO params into a struct. Use a getter for DCB mode num_tcs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5e: Support flow classification into RSS contexts	Tariq Toukan
	Extend the existing flow classification support, to steer flows not only directly to a receive ring, but also into the new RSS contexts. Create needed TIR objects on demand, and hold reference on the RSS context. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5e: Support multiple RSS contexts	Tariq Toukan
	Add support to multiple RSS contexts. Resources of the non-default RSS contexts are allocated and created on demand. Each RSS context can be controlled and configured separately, via the implemented ethtool ops. Here we limit the num of total contexts to 16. We do not enforce any kind of new limitation over the indirection table content. More specifically, two separate contexts can be configured to fully or partially point to the same set of receive rings. The default RSS context (index 0) is created with its full set of TIRs. All other contexts are created with an empty set, then TIRs are added upon first usage when steering rules are added. We use a reference counting mechanism to make sure an RSS context is not removed before the rules pointing to it. Block ethtool set_channels operations when multiple RSS contexts exist, as currently the kernel doesn't protect against inconsistent channels configs that break non-default RSS contexts. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-16	net/mlx5e: Dynamically allocate TIRs in RSS contexts	Tariq Toukan
	Move from static to dynamic memory allocations for TIR. This is in preparation to supporting on-demand TIR operations in downstream patches, where every RSS context will be init with an empty set of TIRs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>