linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2021-12-23	xfrm: Add support for SM3 secure hash	Xu Jia
	This patch allows IPsec to use SM3 HMAC authentication algorithm. Signed-off-by: Xu Jia <xujia39@huawei.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-12-23	xfrm: update SA curlft.use_time	Antony Antony
	SA use_time was only updated once, for the first packet. with this fix update the use_time for every packet. Signed-off-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2021-12-22	net/mlx5: Fix some error handling paths in 'mlx5e_tc_add_fdb_flow()'	Christophe JAILLET
	All the error handling paths of 'mlx5e_tc_add_fdb_flow()' end to 'err_out' where 'flow_flag_set(flow, FAILED);' is called. All but the new error handling paths added by the commits given in the Fixes tag below. Fix these error handling paths and branch to 'err_out'. Fixes: 166f431ec6be ("net/mlx5e: Add indirect tc offload of ovs internal port") Fixes: b16eb3c81fe2 ("net/mlx5: Support internal port as decap route device") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> (cherry picked from commit 31108d142f3632970f6f3e0224bd1c6781c9f87d)
2021-12-22	net/mlx5e: Delete forward rule for ct or sample action	Chris Mi
	When there is ct or sample action, the ct or sample rule will be deleted and return. But if there is an extra mirror action, the forward rule can't be deleted because of the return. Fix it by removing the return. Fixes: 69e2916ebce4 ("net/mlx5: CT: Add support for mirroring") Fixes: f94d6389f6a8 ("net/mlx5e: TC, Add support to offload sample action") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5e: Fix ICOSQ recovery flow for XSK	Maxim Mikityanskiy
	There are two ICOSQs per channel: one is needed for RX, and the other for async operations (XSK TX, kTLS offload). Currently, the recovery flow for both is the same, and async ICOSQ is mistakenly treated like the regular ICOSQ. This patch prevents running the regular ICOSQ recovery on async ICOSQ. The purpose of async ICOSQ is to handle XSK wakeup requests and post kTLS offload RX parameters, it has nothing to do with RQ and XSKRQ UMRs, so the regular recovery sequence is not applicable here. Fixes: be5323c8379f ("net/mlx5e: Report and recover from CQE error on ICOSQ") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5e: Fix interoperability between XSK and ICOSQ recovery flow	Maxim Mikityanskiy
	Both regular RQ and XSKRQ use the same ICOSQ for UMRs. When doing recovery for the ICOSQ, don't forget to deactivate XSKRQ. XSK can be opened and closed while channels are active, so a new mutex prevents the ICOSQ recovery from running at the same time. The ICOSQ recovery deactivates and reactivates XSKRQ, so any parallel change in XSK state would break consistency. As the regular RQ is running, it's not enough to just flush the recovery work, because it can be rescheduled. Fixes: be5323c8379f ("net/mlx5e: Report and recover from CQE error on ICOSQ") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5e: Fix skb memory leak when TC classifier action offloads are disabled	Gal Pressman
	When TC classifier action offloads are disabled (CONFIG_MLX5_CLS_ACT in Kconfig), the mlx5e_rep_tc_receive() function which is responsible for passing the skb to the stack (or freeing it) is defined as a nop, and results in leaking the skb memory. Replace the nop with a call to napi_gro_receive() to resolve the leak. Fixes: 28e7606fa8f1 ("net/mlx5e: Refactor rx handler of represetor device") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5e: Wrap the tx reporter dump callback to extract the sq	Amir Tzin
	Function mlx5e_tx_reporter_dump_sq() casts its void * argument to struct mlx5e_txqsq , but in TX-timeout-recovery flow the argument is actually of type struct mlx5e_tx_timeout_ctx . mlx5_core 0000:08:00.1 enp8s0f1: TX timeout detected mlx5_core 0000:08:00.1 enp8s0f1: TX timeout on queue: 1, SQ: 0x11ec, CQ: 0x146d, SQ Cons: 0x0 SQ Prod: 0x1, usecs since last trans: 21565000 BUG: stack guard page was hit at 0000000093f1a2de (stack is 00000000b66ea0dc..000000004d932dae) kernel stack overflow (page fault): 0000 [#1] SMP NOPTI CPU: 5 PID: 95 Comm: kworker/u20:1 Tainted: G W OE 5.13.0_mlnx #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5e mlx5e_tx_timeout_work [mlx5_core] RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180 [mlx5_core] Call Trace: mlx5e_tx_reporter_dump+0x43/0x1c0 [mlx5_core] devlink_health_do_dump.part.91+0x71/0xd0 devlink_health_report+0x157/0x1b0 mlx5e_reporter_tx_timeout+0xb9/0xf0 [mlx5_core] ? mlx5e_tx_reporter_err_cqe_recover+0x1d0/0x1d0 [mlx5_core] ? mlx5e_health_queue_dump+0xd0/0xd0 [mlx5_core] ? update_load_avg+0x19b/0x550 ? set_next_entity+0x72/0x80 ? pick_next_task_fair+0x227/0x340 ? finish_task_switch+0xa2/0x280 mlx5e_tx_timeout_work+0x83/0xb0 [mlx5_core] process_one_work+0x1de/0x3a0 worker_thread+0x2d/0x3c0 ? process_one_work+0x3a0/0x3a0 kthread+0x115/0x130 ? kthread_park+0x90/0x90 ret_from_fork+0x1f/0x30 --[ end trace 51ccabea504edaff ]--- RIP: 0010:mlx5e_tx_reporter_dump_sq+0xd3/0x180 PKRU: 55555554 Kernel panic - not syncing: Fatal exception Kernel Offset: disabled end Kernel panic - not syncing: Fatal exception To fix this bug add a wrapper for mlx5e_tx_reporter_dump_sq() which extracts the sq from struct mlx5e_tx_timeout_ctx and set it as the TX-timeout-recovery flow dump callback. Fixes: 5f29458b77d5 ("net/mlx5e: Support dump callback in TX reporter") Signed-off-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Amir Tzin <amirtz@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5: Fix tc max supported prio for nic mode	Chris Mi
	Only prio 1 is supported if firmware doesn't support ignore flow level for nic mode. The offending commit removed the check wrongly. Add it back. Fixes: 9a99c8f1253a ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5: Fix SF health recovery flow	Moshe Shemesh
	SF do not directly control the PCI device. During recovery flow SF should not be allowed to do pci disable or pci reset, its PF will do it. It fixes the following kernel trace: mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:387:(pid 40948): starting health recovery flow mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called mlx5_core 0000:03:00.0: wait vital counter value 0xab175 after 1 iterations mlx5_core.sf mlx5_core.sf.25: firmware version: 24.32.532 mlx5_core.sf mlx5_core.sf.23: mlx5_health_try_recover:387:(pid 40946): starting health recovery flow mlx5_core 0000:03:00.0: mlx5_pci_slot_reset was called mlx5_core 0000:03:00.0: wait vital counter value 0xab193 after 1 iterations mlx5_core.sf mlx5_core.sf.23: firmware version: 24.32.532 mlx5_core.sf mlx5_core.sf.25: mlx5_cmd_check:813:(pid 40948): ENABLE_HCA(0x104) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x658908) mlx5_core.sf mlx5_core.sf.25: mlx5_function_setup:1292:(pid 40948): enable hca failed mlx5_core.sf mlx5_core.sf.25: mlx5_health_try_recover:389:(pid 40948): health recovery failed Fixes: 1958fc2f0712 ("net/mlx5: SF, Add auxiliary device driver") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5: Fix error print in case of IRQ request failed	Shay Drory
	In case IRQ layer failed to find or to request irq, the driver is printing the first cpu of the provided affinity as part of the error print. Empty affinity is a valid input for the IRQ layer, and it is an error to call cpumask_first() on empty affinity. Remove the first cpu print from the error message. Fixes: c36326d38d93 ("net/mlx5: Round-Robin EQs over IRQs") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5: Use first online CPU instead of hard coded CPU	Shay Drory
	Hard coded CPU (0 in our case) might be offline. Hence, use the first online CPU instead. Fixes: f891b7cdbdcd ("net/mlx5: Enable single IRQ for PCI Function") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5: DR, Fix querying eswitch manager vport for ECPF	Yevgeny Kliteynik
	On BlueField the E-Switch manager is the ECPF (vport 0xFFFE), but when querying capabilities of ECPF eswitch manager, need to query vport 0 with other_vport = 0. Fixes: 9091b821aaa4 ("net/mlx5: DR, Handle eswitch manager and uplink vports separately") Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	net/mlx5: DR, Fix NULL vs IS_ERR checking in dr_domain_init_resources	Miaoqian Lin
	The mlx5_get_uars_page() function returns error pointers. Using IS_ERR() to check the return value to fix this. Fixes: 4ec9e7b02697 ("net/mlx5: DR, Expose steering domain functionality") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-12-22	scsi: vmw_pvscsi: Set residual data length conditionally	Alexey Makhalov
	The PVSCSI implementation in the VMware hypervisor under specific configuration ("SCSI Bus Sharing" set to "Physical") returns zero dataLen in the completion descriptor for READ CAPACITY(16). As a result, the kernel can not detect proper disk geometry. This can be recognized by the kernel message: [ 0.776588] sd 1:0:0:0: [sdb] Sector size 0 reported, assuming 512. The PVSCSI implementation in QEMU does not set dataLen at all, keeping it zeroed. This leads to a boot hang as was reported by Shmulik Ladkani. It is likely that the controller returns the garbage at the end of the buffer. Residual length should be set by the driver in that case. The SCSI layer will erase corresponding data. See commit bdb2b8cab439 ("[SCSI] erase invalid data returned by device") for details. Commit e662502b3a78 ("scsi: vmw_pvscsi: Set correct residual data length") introduced the issue by setting residual length unconditionally, causing the SCSI layer to erase the useful payload beyond dataLen when this value is returned as 0. As a result, considering existing issues in implementations of PVSCSI controllers, we do not want to call scsi_set_resid() when dataLen == 0. Calling scsi_set_resid() has no effect if dataLen equals buffer length. Link: https://lore.kernel.org/lkml/20210824120028.30d9c071@blondie/ Link: https://lore.kernel.org/r/20211220190514.55935-1-amakhalov@vmware.com Fixes: e662502b3a78 ("scsi: vmw_pvscsi: Set correct residual data length") Cc: Matt Wang <wwentao@vmware.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Vishal Bhakta <vbhakta@vmware.com> Cc: VMware PV-Drivers <pv-drivers@vmware.com> Cc: James E.J. Bottomley <jejb@linux.ibm.com> Cc: linux-scsi@vger.kernel.org Cc: stable@vger.kernel.org Reported-and-suggested-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22	scsi: libiscsi: Fix UAF in iscsi_conn_get_param()/iscsi_conn_teardown()	Lixiaokeng
	\|- iscsi_if_destroy_conn \|-dev_attr_show \|-iscsi_conn_teardown \|-spin_lock_bh \|-iscsi_sw_tcp_conn_get_param \|-kfree(conn->persistent_address) \|-iscsi_conn_get_param \|-kfree(conn->local_ipaddr) ==>\|-read persistent_address ==>\|-read local_ipaddr \|-spin_unlock_bh When iscsi_conn_teardown() and iscsi_conn_get_param() happen in parallel, a UAF may be triggered. Link: https://lore.kernel.org/r/046ec8a0-ce95-d3fc-3235-666a7c65b224@huawei.com Reported-by: Lu Tixiong <lutianxiong@huawei.com> Reviewed-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Lixiaokeng <lixiaokeng@huawei.com> Signed-off-by: Linfeilong <linfeilong@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-12-22	io_uring: zero iocb->ki_pos for stream file types	Jens Axboe
	io_uring supports using offset == -1 for using the current file position, and we read that in as part of read/write command setup. For the non-iter read/write types we pass in NULL for the position pointer, but for the iter types we should not be passing any anything but 0 for the position for a stream. Clear kiocb->ki_pos if the file is a stream, don't leave it as -1. If we do, then the request will error with -ESPIPE. Fixes: ba04291eb66e ("io_uring: allow use of offset == -1 to mean file position") Link: https://github.com/axboe/liburing/discussions/501 Reported-by: Samuel Williams <samuel.williams@oriontransfer.co.nz> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-22	drm/amdgpu: fix runpm documentation	Alex Deucher
	It's not only supported by HG/PX laptops. It's supported by all dGPUs which supports BOCO/BACO functionality (runtime D3). BOCO - Bus Off, Chip Off. The entire chip is powered off. This is controlled by ACPI. BACO - Bus Active, Chip Off. The chip still shows up on the PCI bus, but the device itself is powered down. v2: fix missed HG/PX reference Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-23	Merge tag 'exynos-drm-next-for-v5.17' of ↵	Dave Airlie
	git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next Four cleanups - Replacing lagacy gpio interface of dsi driver with gpiod one. - Implementing a generic GEM object mmap and use it instead of exynos specific one. - Dropping the use of label from dsi driver. Which also fixes a build warning. - Just trivial cleanup by dropping unnecessay code. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Inki Dae <inki.dae@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211222035345.26595-1-inki.dae@samsung.com
2021-12-23	Merge tag 'drm/tegra/for-5.17-rc1' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/drm/tegra into drm-next drm/tegra: Changes for v5.17-rc1 This contains a fairly large rework that makes the buffer objects behave more according to what the DMA-BUF infrastructure expects. A buffer object cache is implemented on top of that to make certain operations such as page-flipping more efficient by avoiding needless map/unmap operations. This in turn is useful to implement asynchronous commits to support legacy cursor updates. Another fairly big addition is the NVDEC driver. This uses the updated UABI introduced in v5.15-rc1 to provide access to the video decode engines found on Tegra210 and later. This also includes some power management improvements that are useful on older devices in particular because they, together with a bunch of other changes across the kernel, allow the system to scale down frequency and voltages when mostly idle and prevent these devices from becoming excessively hot. The remainder of these changes is an assortment of cleanups and minor fixes. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thierry Reding <thierry.reding@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211217142912.558095-1-thierry.reding@gmail.com
2021-12-23	Merge tag 'amd-drm-next-5.17-2021-12-16' of ↵	Dave Airlie
	https://gitlab.freedesktop.org/agd5f/linux into drm-next amdgpu: - Add some display debugfs entries - RAS fixes - SR-IOV fixes - W=1 fixes - Documentation fixes - IH timestamp fix - Misc power fixes - IP discovery fixes - Large driver documentation updates - Multi-GPU memory use reductions - Misc display fixes and cleanups - Add new SMU debug option amdkfd: - SVM fixes radeon: - Fix typo in comment From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211216202731.5900-1-alexander.deucher@amd.com
2021-12-23	Merge tag 'drm-intel-fixes-2021-12-22' of ↵	Dave Airlie
	git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fixes for v5.16-rc7: - Fix fallout from guc submission locking rework Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87wnjwydhs.fsf@intel.com
2021-12-22	Merge branch 'add-tests-for-vxlan-with-ipv6-underlay'	Jakub Kicinski
	Amit Cohen says: ==================== Add tests for VxLAN with IPv6 underlay mlxsw driver lately added support for VxLAN with IPv6 underlay. This set adds the relevant tests for IPv6, most of them are same to IPv4 tests with the required changes. Patch set overview: Patch #1 relaxes requirements for offloading TC filters that match on 802.1q fields. The following selftests make use of these newly-relaxed filters. Patch #2 adds preparation as part of selftests API, which will be used later. Patches #3-#4 add tests for VxLAN with bridge aware and unaware. Patche #5 cleans unused function. Patches #6-#7 add tests for VxLAN symmetric and asymmetric. Patch #8 adds test for Q-in-VNI. ==================== Link: https://lore.kernel.org/r/20211221144949.2527545-1-amcohen@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	selftests: forwarding: Add Q-in-VNI test for IPv6	Amit Cohen
	Add test to check Q-in-VNI traffic with IPv6 underlay and overlay. The test is similar to the existing IPv4 test. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	selftests: forwarding: Add a test for VxLAN symmetric routing with IPv6	Amit Cohen
	In a similar fashion to the asymmetric test, add a test for symmetric routing. In symmetric routing both the ingress and egress VTEPs perform routing in the overlay network into / from the VxLAN tunnel. Packets in different directions use the same VNI - the L3 VNI. Different tenants (VRFs) use different L3 VNIs. Add a test which is similar to the existing IPv4 test to check IPv6. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	selftests: forwarding: Add a test for VxLAN asymmetric routing with IPv6	Amit Cohen
	In asymmetric routing the ingress VTEP routes the packet into the correct VxLAN tunnel, whereas the egress VTEP only bridges the packet to the correct host. Therefore, packets in different directions use different VNIs - the target VNI. Add a test which is similar to the existing IPv4 test to check IPv6. The test uses a simple topology with two VTEPs and two VNIs and verifies that ping passes between hosts (local / remote) in the same VLAN (VNI) and in different VLANs belonging to the same tenant (VRF). While the test does not check VM mobility, it does configure an anycast gateway using a macvlan device on both VTEPs. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	selftests: forwarding: vxlan_bridge_1q: Remove unused function	Amit Cohen
	Remove `vxlan_ping_test()` which is not used and probably was copied mistakenly from vxlan_bridge_1d.sh. This was found while adding an equivalent test for IPv6. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	selftests: forwarding: Add VxLAN tests with a VLAN-aware bridge for IPv6	Amit Cohen
	The tests are very similar to their VLAN-unaware counterpart (vxlan_bridge_1d_ipv6.sh and vxlan_bridge_1d_port_8472_ipv6.sh), but instead of using multiple VLAN-unaware bridges, a single VLAN-aware bridge is used with multiple VLANs. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	selftests: forwarding: Add VxLAN tests with a VLAN-unaware bridge for IPv6	Amit Cohen
	Add tests similar to vxlan_bridge_1d.sh and vxlan_bridge_1d_port_8472.sh. The tests set up a topology with three VxLAN endpoints: one "local", possibly offloaded, and two "remote", formed using veth pairs and likely purely software bridges. The "local" endpoint is connected to host systems by a VLAN-unaware bridge. Since VxLAN tunnels must be unique per namespace, each of the "remote" endpoints is in its own namespace. H3 forms the bridge between the three domains. Send IPv4 packets and IPv6 packets with IPv6 underlay. Use `TC_FLAG`, which is defined in `forwarding.config` file, for TC checks. `TC_FLAG` allows testing that on HW datapath, the traffic actually goes through HW. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	selftests: lib.sh: Add PING_COUNT to allow sending configurable amount of ↵	Amit Cohen
	packets Currently `ping_do()` and `ping6_do()` send 10 packets. There are cases that it is not possible to catch only the interesting packets using tc rule, so then, it is possible to send many packets and verify that at least this amount of packets hit the rule. Add `PING_COUNT` variable, which is set to 10 by default, to allow tests sending more than 10 packets using the existing ping API. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	mlxsw: spectrum_flower: Make vlan_id limitation more specific	Amit Cohen
	Spectrum ASICs do not support matching of VLAN ID at egress. Currently, mlxsw driver forbids matching of all VLAN related fields at egress, which is too strict check. For example, the following filter is not supported by the driver: $ tc filter add dev swpX egress protocol 802.1q pref 1 handle 101 flower vlan_ethtype ipv4 src_ip .. dst_ip .. skip_sw action pass Error: mlxsw_spectrum: vlan_id key is not supported on egress. We have an error talking to the kernel The filter above does not match on VLAN ID, but is bounced anyway. Make the check more specific, forbid only matching of 'vlan_id' at egress. Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	Merge tag 'mlx5-updates-2021-12-21' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-12-21 1) From Shay Drory: Devlink user knobs to control device's EQ size This series provides knobs which will enable users to minimize memory consumption of mlx5 Functions (PF/VF/SF). mlx5 exposes two new generic devlink params for EQ size configuration and uses devlink generic param max_macs. LINK: https://lore.kernel.org/netdev/20211208141722.13646-1-shayd@nvidia.com/ 2) From Tariq and Lama, allocate software channel objects and statistics of a mlx5 netdevice private data dynamically upon first demand to save on memory. * tag 'mlx5-updates-2021-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Take packet_merge params directly from the RX res struct net/mlx5e: Allocate per-channel stats dynamically at first usage net/mlx5e: Use dynamic per-channel allocations in stats net/mlx5e: Allow profile-specific limitation on max num of channels net/mlx5e: Save memory by using dynamic allocation in netdev priv net/mlx5e: Add profile indications for PTP and QOS HTB features net/mlx5e: Use bitmap field for profile features net/mlx5: Remove the repeated declaration net/mlx5: Let user configure max_macs generic param devlink: Clarifies max_macs generic devlink param net/mlx5: Let user configure event_eq_size param devlink: Add new "event_eq_size" generic device param net/mlx5: Let user configure io_eq_size param devlink: Add new "io_eq_size" generic device param ==================== Link: https://lore.kernel.org/r/20211222031604.14540-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23	Merge tag 'mediatek-drm-fixes-5.16' of ↵	Dave Airlie
	https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes Mediatek DRM Fixes for Linux 5.16 1. Perform NULL pointer check for mtk_hdmi_conf. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/1639956861-14873-1-git-send-email-chunkuang.hu@kernel.org
2021-12-23	netfilter: flowtable: remove ipv4/ipv6 modules	Florian Westphal
	Just place the structs and registration in the inet module. nf_flow_table_ipv6, nf_flow_table_ipv4 and nf_flow_table_inet share same module dependencies: nf_flow_table, nf_tables. before: text data bss dec hex filename 2278 1480 0 3758 eae nf_flow_table_inet.ko 1159 1352 0 2511 9cf nf_flow_table_ipv6.ko 1154 1352 0 2506 9ca nf_flow_table_ipv4.ko after: 2369 1672 0 4041 fc9 nf_flow_table_inet.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23	netfilter: nat: force port remap to prevent shadowing well-known ports	Florian Westphal
	If destination port is above 32k and source port below 16k assume this might cause 'port shadowing' where a 'new' inbound connection matches an existing one, e.g. inbound X:41234 -> Y:53 matches existing conntrack entry Z:53 -> X:4123, where Z got natted to X. In this case, new packet is natted to Z:53 which is likely unwanted. We avoid the rewrite for connections that originate from local host: port-shadowing is only possible with forwarded connections. Also adjust test case. v3: no need to call tuple_force_port_remap if already in random mode (Phil) Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Phil Sutter <phil@nwl.cc> Acked-by: Eric Garver <eric@garver.life> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23	netfilter: conntrack: tag conntracks picked up in local out hook	Florian Westphal
	This allows to identify flows that originate from local machine in a followup patch. It would be possible to make this a ->status bit instead. For now I did not do that yet because I don't have a use-case for exposing this info to userspace. If one comes up the toggle can be replaced with a status bit. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23	netfilter: nf_tables: make counter support built-in	Pablo Neira Ayuso
	Make counter support built-in to allow for direct call in case of CONFIG_RETPOLINE. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23	netfilter: nf_tables: replace WARN_ON by WARN_ON_ONCE for unknown verdicts	Pablo Neira Ayuso
	Bug might trigger warning for each packet, call WARN_ON_ONCE instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23	netfilter: nf_tables: consolidate rule verdict trace call	Pablo Neira Ayuso
	Add function to consolidate verdict tracing. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23	netfilter: nft_payload: WARN_ON_ONCE instead of BUG	Pablo Neira Ayuso
	BUG() is too harsh for unknown payload base, use WARN_ON_ONCE() instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23	netfilter: nf_tables: remove rcu read-size lock	Pablo Neira Ayuso
	Chain stats are updated from the Netfilter hook path which already run under rcu read-size lock section. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-12-23	Revert "ARM: dts: BCM5301X: define RTL8365MB switch on Asus RT-AC88U"	Arnd Bergmann
	This reverts commit 3d2d52a0d1835b56f6bd67d268f6c39df0e41692, it caused a build regression: arch/arm/boot/dts/bcm47094-asus-rt-ac88u.dts:109.4-14: Warning (reg_format): /switch/ports:reg: property has invalid length (4 bytes) (#address-cells == 2, #size-cells == 1) arch/arm/boot/dts/bcm47094-asus-rt-ac88u.dtb: Warning (pci_device_reg): Failed prerequisite 'reg_format' arch/arm/boot/dts/bcm47094-asus-rt-ac88u.dtb: Warning (pci_device_bus_num): Failed prerequisite 'reg_format' arch/arm/boot/dts/bcm47094-asus-rt-ac88u.dtb: Warning (i2c_bus_reg): Failed prerequisite 'reg_format' arch/arm/boot/dts/bcm47094-asus-rt-ac88u.dtb: Warning (spi_bus_reg): Failed prerequisite 'reg_format' arch/arm/boot/dts/bcm47094-asus-rt-ac88u.dts:106.9-149.5: Warning (avoid_default_addr_size): /switch/ports: Relying on default #address-cells value arch/arm/boot/dts/bcm47094-asus-rt-ac88u.dts:106.9-149.5: Warning (avoid_default_addr_size): /switch/ports: Relying on default #size-cells value Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-12-22	codel: remove unnecessary pkt_sched.h include	Jakub Kicinski
	Commit d068ca2ae2e6 ("codel: split into multiple files") moved all Qdisc-related code to codel_qdisc.h, move the include of pkt_sched.h as well. This is similar to the previous commit, although we don't care as much about incremental builds after pkt_sched.h was touched itself it is included by net/sch_generic.h which is modified ~20 times a year. This decreases the incremental build size after touching pkt_sched.h from 1592 to 617 objects. Fix unmasked missing includes in WiFi drivers. Acked-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20211221193941.3805147-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	codel: remove unnecessary sock.h include	Jakub Kicinski
	Since sock.h is modified relatively often (60 times in the last 12 months) it seems worthwhile to decrease the incremental build work. CoDel's header includes net/inet_ecn.h which in turn includes net/sock.h. codel.h is itself included by mac80211 which is included by much of the WiFi stack and drivers. Removing the net/inet_ecn.h include from CoDel breaks the dependecy between WiFi and sock.h. Commit d068ca2ae2e6 ("codel: split into multiple files") moved all the code which actually needs ECN helpers out to net/codel_impl.h, the include can be moved there as well. This decreases the incremental build size after touching sock.h from 4999 objects to 4051 objects. Fix unmasked missing includes in WiFi drivers. Acked-by: Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20211221193941.3805147-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	net: broadcom: bcm4908enet: remove redundant variable bytes	Colin Ian King
	The variable bytes is being used to summate slot lengths, however the value is never used afterwards. The summation is redundant so remove variable bytes. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20211222003937.727325-1-colin.i.king@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	ice: trivial: fix odd indenting	Jesse Brandeburg
	Fix an odd indent where some code was left indented, and causes smatch to warn: ice_log_pkg_init() warn: inconsistent indenting While here, for consistency, add a break after the default case. This commit has a Fixes: but we caught this while it was only in net-next. Fixes: 247dd97d713c ("ice: Refactor status flow for DDP load") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20211221230538.2546315-1-jesse.brandeburg@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	asix: fix wrong return value in asix_check_host_enable()	Pavel Skripkin
	If asix_read_cmd() returns 0 on 30th interation, 0 will be returned from asix_check_host_enable(), which is logically wrong. Fix it by returning -ETIMEDOUT explicitly if we have exceeded 30 iterations Also, replaced 30 with #define as suggested by Andrew Fixes: a786e3195d6a ("net: asix: fix uninit value bugs") Reported-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/ecd3470ce6c2d5697ac635d0d3b14a47defb4acb.1640117288.git.paskripkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	asix: fix uninit-value in asix_mdio_read()	Pavel Skripkin
	asix_read_cmd() may read less than sizeof(smsr) bytes and in this case smsr will be uninitialized. Fail log: BUG: KMSAN: uninit-value in asix_check_host_enable drivers/net/usb/asix_common.c:82 [inline] BUG: KMSAN: uninit-value in asix_check_host_enable drivers/net/usb/asix_common.c:82 [inline] drivers/net/usb/asix_common.c:497 BUG: KMSAN: uninit-value in asix_mdio_read+0x3c1/0xb00 drivers/net/usb/asix_common.c:497 drivers/net/usb/asix_common.c:497 asix_check_host_enable drivers/net/usb/asix_common.c:82 [inline] asix_check_host_enable drivers/net/usb/asix_common.c:82 [inline] drivers/net/usb/asix_common.c:497 asix_mdio_read+0x3c1/0xb00 drivers/net/usb/asix_common.c:497 drivers/net/usb/asix_common.c:497 Fixes: d9fe64e51114 ("net: asix: Add in_pm parameter") Reported-and-tested-by: syzbot+f44badb06036334e867a@syzkaller.appspotmail.com Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/8966e3b514edf39857dd93603fc79ec02e000a75.1640117288.git.paskripkin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	Merge branch '100GbE' of ↵	Jakub Kicinski
	git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-12-21 This series contains updates to ice driver only. Karol modifies the reset flow to correct issues with PTP reset. Jake extends PTP support for E822 based devices. This includes a few cleanup patches, that fix some minor issues. In addition, there are some slight refactors to ease the addition of E822 support, followed by adding the new hardware implementation ice_ptp_hw.c. There are a few major differences with E822 support compared to E810 support: ) The E822 device has a Clock Generation Unit which must be initialized in order to generate proper clock frequencies on the output that drives the PTP hardware clock registers ) The E822 PHY is a bit different and requires a more complex initialization procedure which must be rerun any time the link configuration changes. ) The E822 devices support enhanced timestamp calibration by making use of a process called Vernier offset measurement. This allows the hardware to measure phase offset related to the PHY clocks for Serdes and FEC, reducing the inaccuracy of the timestamp relative to the actual packet transmission and receipt. Making use of this requires data gathered from the first transmitted and received packets, and waiting for the PHY to complete the calibration measurements. This is done as part of a new kthread, ov_work. Note that to avoid delay in enabling timestamps, we start the PHY in 'bypass' mode which allows timestamps to be captured without the Vernier calibration measurement. Once the first packets have been sent and received, we then complete the calibration setup and exit bypass mode and begin using the more precise timestamps. According to the datasheet, timestamps without calibration data can be incorrect relative to actual receipt or transmission by up to 1 clock cycle (~1.25 nanoseconds), while calibrated timestamps should be correct to within 1/8th of a clock cycle (~0.15 nanoseconds). ) E822 devices support crosstimestamping via PCIe PTM, which we enable when available on the platform. There is a fair amount of logic required to perform PHY and CGU initialization, which is the vast majority of the new code, but it is fairly self contained within ice_ptp_hw.c, with the exception of monitoring for offset validity being handled by a kthread. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: support crosstimestamping on E822 devices if supported ice: exit bypass mode once hardware finishes timestamp calibration ice: ensure the hardware Clock Generation Unit is configured ice: implement basic E822 PTP support ice: convert clk_freq capability into time_ref ice: introduce ice_ptp_init_phc function ice: use 'int err' instead of 'int status' in ice_ptp_hw.c ice: PTP: move setting of tstamp_config ice: introduce ice_base_incval function ice: Fix E810 PTP reset flow ==================== Link: https://lore.kernel.org/r/20211221174845.3063640-1-anthony.l.nguyen@intel.com Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-22	xfs: map unwritten blocks in XFS_IOC_{ALLOC,FREE}SP just like fallocate	Darrick J. Wong
	The old ALLOCSP/FREESP ioctls in XFS can be used to preallocate space at the end of files, just like fallocate and RESVSP. Make the behavior consistent with the other ioctls. Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>