summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-07drm/amd/display: Clean up flip pending timeout handlingJoshua Aberback
[Why] Adjust timeout handling code for easier runtime manipulation during debug. Change has no functional effect by default. Reviewed-by: Aric Cyr <aric.cyr@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Joshua Aberback <joshua.aberback@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: init TA microcode for SRIOV VF when MP0 IP is 13.0.6Zhigang Luo
Init TA ucode for SRIOV. Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: remove SRIOV VF FB location programmingZhigang Luo
For SRIOV VF, FB location is programmed by host driver, no need to program it in guest driver. v2: squash in unused variable removal Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd/display: Add Functions to enable Freesync Panel ReplayBhawanpreet Lakha
Add various functions for replay, such as construct, destroy, enable get_state, and copy_setting etc. These functions communicate with the firmware to setup and enable panel replay Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: enable SDMA MGCG for SDMA 5.2.xPrike Liang
Now the SDMA firmware can support SDMA MGCG properly, so let's enable it from the driver side. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Issue ras enable_feature for gfx ip onlyHawking Zhang
For non-GFX IP blocks, set up ras obj if ras feature is allowed. For GFX IP blocks, force issue ras enable_feature command to firmware and only set up ras obj if ras feature is allowed Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Remove gfx v11_0_3 ras_late_init callHawking Zhang
amdgpu_ras_late_init will invoke ras_late_init call per IP block Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Clean up style problems in mmhub_v2_3.cSrinivasan Shanmugam
Fixes the following: ERROR: code indent should use tabs where possible WARNING: Missing a blank line after declarations WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: suspect code indent for conditional statements (8, 24) + if (!(data & (DAGB0_CNTL_MISC2__DISABLE_WRREQ_CG_MASK | [...] + *flags |= AMD_CG_SUPPORT_MC_MGCG; Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Move vram, gtt & flash defines to amdgpu_ ttm & _psp.hSrinivasan Shanmugam
As amdgpu.h is getting decomposed, move vram and gtt extern defines into amdgpu_ttm.h & flash extern to amdgpu_psp.h Fixes: f9acfafc3458 ("drm/amdgpu: Move externs to amdgpu.h file from amdgpu_drv.c") Suggested-by: Christian König <christian.koenig@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07Revert "drm/radeon: Prefer dev_* variant over printk"Srinivasan Shanmugam
Usage of container_of is wrong here. struct acpi_device *adev = container_of(handle, struct acpi_device, handle) This reverts commit cbd0606e6a776bf2ba10d4a6957bb7628c0da947. References: https://gitlab.freedesktop.org/drm/amd/-/issues/2744 Cc: Guchun Chen <guchun.chen@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Apply poison mode check to GFX IP onlyHawking Zhang
For GFX IP that only supports poison consumption, GFX RAS won't be marked as enabled. i.e., hardware doesn't support gfx sram ecc. But driver still needs to issue firmware to enable poison consumption mode for GFX IP. In such case, check poison mode and treat GFX IP as RAS capable IP block. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Only create err_count sysfs when hw_op is supportedHawking Zhang
Some IP blocks only support partial ras feature and don't have ras counter and/or ras error status register at all. Driver should not create err_count sysfs node for those IP blocks. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd/display: Add structs for Freesync Panel ReplayBhawanpreet Lakha
In some instances, the GPU is transmitting repeated frame to the sink without any updates or changes in the content. These repeat transmission are wasteful, resulting in power draw in different aspects of the system 1. DCN is fetching the frame of data from DF/UMC/DRAM. This memory traffic prevents power down of parts of this HW path. 2. GPU is transmitting pixel data to the display through the main link of the DisplayPort interface. This prevents power down of both the Source transmitter (TX) and the Sink receiver (RX)  The concepts of utilizing replay is similar to PSR, but there is a benefit of: Source and Sink remaining synchronized which allows for - lower latency when switching from replay to live frames - enable the possibility of more use cases - easy control of the sink's refresh rate during replay Due to Source and Sink remaining timing synchronized, Replay can be activated in more UI scenarios. Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Sort the includes in amdgpu/amdgpu_drv.cSrinivasan Shanmugam
Sort the include files that are included in amdgpu_drv.c alphabetically. Suggested-by: Mario Limonciello <mario.limonciello@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Cleanup amdgpu/amdgpu_cgs.cSrinivasan Shanmugam
Fixes the below: ERROR: switch and case should be at the same indent WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: Block comments use * on subsequent lines WARNING: Comparisons should place the constant on the right side of the test Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Fix style issues in amdgpu_psp.cPraful Swarnakar
Fixes the following to align to linux coding style: WARNING: Block comments use a trailing */ on a separate line WARNING: Block comments should align the * on each line Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Praful Swarnakar <Praful.Swarnakar@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Fix style issues in amdgpu_debugfs.cPraful Swarnakar
Fixes the following to align to linux coding style: WARNING: Missing a blank line after declarations WARNING: sizeof *rd should be sizeof(*rd) Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Praful Swarnakar <Praful.Swarnakar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd/pm: fix pp_dpm_sclk node not displaying correctlyYang Wang
if GFX clock is in DS (Deep Sleep) state, the current gfx freq may less then dpm level 0, then pp_dpm_sclk node unable show correct freq. (align output format with other cards) Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd/pm: disable the SMU13 OD feature support temporarilyEvan Quan
The existing OD interface cannot support the growing demand for more OD features. We are in the transition to a new OD mechanism. So, disable the SMU13 OD feature support temporarily. And this should be reverted when the new OD mechanism online. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07Merge branch 'msm-fixes' into msm-nextRob Clark
Back-merge msm-fixes to resolve conflicts. Signed-off-by: Rob Clark <robdclark@chromium.org>
2023-08-07drm/mcde: remove redundant of_match_ptrZhu Wang
The driver depends on CONFIG_OF, so it is not necessary to use of_match_ptr here. Even for drivers that do not depend on CONFIG_OF, it's almost always better to leave out the of_match_ptr(), since the only thing it can possibly do is to save a few bytes of .text if a driver can be used both with and without it. Hence we remove of_match_ptr. Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230731131810.103379-1-wangzhu9@huawei.com
2023-08-07drm/tve200: remove redundant of_match_ptrZhu Wang
The driver depends on CONFIG_OF, so it is not necessary to use of_match_ptr here. Even for drivers that do not depend on CONFIG_OF, it's almost always better to leave out the of_match_ptr(), since the only thing it can possibly do is to save a few bytes of .text if a driver can be used both with and without it. Hence we remove of_match_ptr. Signed-off-by: Zhu Wang <wangzhu9@huawei.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20230731124222.78643-1-wangzhu9@huawei.com
2023-08-07drm/amdkfd: avoid unmap dma address when svm_ranges are splitAlex Sierra
DMA address reference within svm_ranges should be unmapped only after the memory has been released from the system. In case of range splitting, the DMA address information should be copied to the corresponding range after this has split. But leaving dma mapping intact. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd/pm: correct the pcie width for smu 13.0.0Kenneth Feng
correct the pcie width value in pp_dpm_pcie for smu 13.0.0 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd/display: Don't show stack trace for missing eDPMario Limonciello
Some systems are only connected by HDMI or DP, so warning related to missing eDP is unnecessary. Downgrade to debug instead. Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Fixes: 6d9b6dceaa51 ("drm/amd/display: only warn once in dce110_edp_wait_for_hpd_ready()") Reported-by: Mastan.Katragadda@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd/display: Fix typo in enable and disable symclk_seTaimur Hassan
[Why & How] Symclk should be based on link_enc_inst, and symclk_fe_sel should be based on stream_enc_inst. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Taimur Hassan <syed.hassan@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd/display: Add symclk enable/disable during stream enable/disableTaimur Hassan
[Why & How] Using dig_stream_mapper, program symclk_en and symclk_fe_src_sel when enabling or disabling the corresponding stream. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Taimur Hassan <syed.hassan@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu/discovery: add ih 6.1.0 supportPrike Liang
Add to IP discovery table. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: add ih 6.1 supportBen Li
Add initial support for IH 6.1. v2: Fix copyright date (Alex) Signed-off-by: Ben Li <ben.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: add ih 6.1 registersBen Li
Add new registers. v2: updates (Alex) Signed-off-by: Ben Li <ben.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu/discovery: add smuio 14.0.0 supportPrike Liang
Add to IP discovery table. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu/discovery: add hdp 6.1.0 supportPrike Liang
Add to IP discovery table. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu: Match against exact bootloader statusLijo Lazar
On PSP v13.x ASICs, boot loader will set only the MSB to 1 and clear the least significant bits for any command submission. Hence match against the exact register value, otherwise a register value of all 0xFFs also could falsely indicate that boot loader is ready. Also, from PSP v13.0.6 and newer, bits[7:0] will be used to indicate command error status. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd/pm: skip the RLC stop when S0i3 suspend for SMU v13.0.4/11Tim Huang
For SMU v13.0.4/11, driver does not need to stop RLC for S0i3, the firmwares will handle that properly. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu/discovery: enable sdma6 for SDMA 6.1.0Prike Liang
Add to IP discovery table. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amd: Disable S/G for APUs when 64GB or more host memoryMario Limonciello
Users report a white flickering screen on multiple systems that is tied to having 64GB or more memory. When S/G is enabled pages will get pinned to both VRAM carve out and system RAM leading to this. Until it can be fixed properly, disable S/G when 64GB of memory or more is detected. This will force pages to be pinned into VRAM. This should fix white screen flickers but if VRAM pressure is encountered may lead to black screens. It's a trade-off for now. Fixes: 81d0bcf99009 ("drm/amdgpu: make display pinning more flexible (v2)") Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: <stable@vger.kernel.org> # 6.1.y: bf0207e172703 ("drm/amdgpu: add S/G display parameter") Cc: <stable@vger.kernel.org> # 6.4.y Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2735 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2354 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07drm/amdgpu/sdma6: initialize sdma 6.1.0Prike Liang
Add firmware declaration. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-07Merge tag 'wq-for-6.5-rc5-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: - The recently added cpu_intensive auto detection and warning mechanism was spuriously triggered on slow CPUs. While not causing serious issues, it's still a nuisance and can cause unintended concurrency management behaviors. Relax the threshold on machines with lower BogoMIPS. While BogoMIPS is not an accurate measure of performance by most measures, we don't have to be accurate and it has rough but strong enough correlation. - A correction in Kconfig help text * tag 'wq-for-6.5-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Scale up wq_cpu_intensive_thresh_us if BogoMIPS is below 4000 workqueue: Fix cpu_intensive_thresh_us name in help text
2023-08-07Merge tag 'tpmdd-v6.5-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fixes from Jarkko Sakkinen: "A few more bug fixes" * tag 'tpmdd-v6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm/tpm_tis: Disable interrupts for Lenovo P620 devices tpm: Disable RNG for all AMD fTPMs sysctl: set variable key_sysctls storage-class-specifier to static tpm/tpm_tis: Disable interrupts for TUXEDO InfinityBook S 15/17 Gen7
2023-08-07Merge branch 'wireguard-fixes-for-6-5-rc6'Jakub Kicinski
Jason A. Donenfeld says: ==================== wireguard fixes for 6.5-rc6 Just one patch this time, somewhat late in the cycle: 1) Fix an off-by-one calculation for the maximum node depth size in the allowedips trie data structure, and also adjust the self-tests to hit this case so it doesn't regress again in the future. ==================== Link: https://lore.kernel.org/r/20230807132146.2191597-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-07wireguard: allowedips: expand maximum node depthJason A. Donenfeld
In the allowedips self-test, nodes are inserted into the tree, but it generated an even amount of nodes, but for checking maximum node depth, there is of course the root node, which makes the total number necessarily odd. With two few nodes added, it never triggered the maximum depth check like it should have. So, add 129 nodes instead of 128 nodes, and do so with a more straightforward scheme, starting with all the bits set, and shifting over one each time. Then increase the maximum depth to 129, and choose a better name for that variable to make it clear that it represents depth as opposed to bits. Cc: stable@vger.kernel.org Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Link: https://lore.kernel.org/r/20230807132146.2191597-2-Jason@zx2c4.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-07bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slavesZiyang Xuan
BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with following testcase: # ip netns add ns1 # ip netns exec ns1 ip link add bond0 type bond mode 0 # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 # ip netns exec ns1 ip link set bond_slave_1 master bond0 # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link set bond_slave_1 nomaster # ip netns del ns1 The logical analysis of the problem is as follows: 1. create ETH_P_8021AD protocol vlan10 for bond_slave_1: register_vlan_dev() vlan_vid_add() vlan_info_alloc() __vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1 2. create ETH_P_8021AD protocol bond0_vlan10 for bond0: register_vlan_dev() vlan_vid_add() __vlan_vid_add() vlan_add_rx_filter_info() if (!vlan_hw_filter_capable(dev, proto)) // condition established because bond0 without NETIF_F_HW_VLAN_STAG_FILTER return 0; if (netif_device_present(dev)) return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will be never called // The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid. 3. detach bond_slave_1 from bond0: __bond_release_one() vlan_vids_del_by_dev() list_for_each_entry(vid_info, &vlan_info->vid_list, list) vlan_vid_del(dev, vid_info->proto, vid_info->vid); // bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted. // bond_slave_1->vlan_info will be assigned NULL. 4. delete vlan10 during delete ns1: default_device_exit_batch() dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10 vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1 BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!! Add S-VLAN tag related features support to bond driver. So the bond driver will always propagate the VLAN info to its slaves. Fixes: 8ad227ff89a7 ("net: vlan: add 802.1ad support") Suggested-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-07net/mlx5e: Add capability check for vnic countersLama Kayal
Add missing capability check for each of the vnic counters exposed by devlink health reporter, and thus avoid unexpected behavior due to invalid access to registers. While at it, read only the exact number of bits for each counter whether it was 32 bits or 64 bits. Fixes: b0bc615df488 ("net/mlx5: Add vnic devlink health reporter to PFs/VFs") Fixes: a33682e4e78e ("net/mlx5e: Expose catastrophic steering error counters") Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Reload auxiliary devices in pci error handlersMoshe Shemesh
Handling pci errors should fully teardown and load back auxiliary devices, same as done through mlx5 health recovery flow. Fixes: 72ed5d5624af ("net/mlx5: Suspend auxiliary devices only in case of PCI device suspend") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Skip clock update work when device is in error stateMoshe Shemesh
When device is in error state, marked by the flag MLX5_DEVICE_STATE_INTERNAL_ERROR, the HW and PCI may not be accessible and so clock update work should be skipped. Furthermore, such access through PCI in error state, after calling mlx5_pci_disable_device() can result in failing to recover from pci errors. Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support") Reported-and-tested-by: Ganesh G R <ganeshgr@linux.ibm.com> Closes: https://lore.kernel.org/netdev/9bdb9b9d-140a-7a28-f0de-2e64e873c068@nvidia.com Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: LAG, Check correct bucket when modifying LAGShay Drory
Cited patch introduced buckets in hash mode, but missed to update the ports/bucket check when modifying LAG. Fix the check. Fixes: 352899f384d4 ("net/mlx5: Lag, use buckets in hash mode") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5e: Unoffload post act rule when handling FIB eventsChris Mi
If having the following tc rule on stack device: filter parent ffff: protocol ip pref 3 flower chain 1 filter parent ffff: protocol ip pref 3 flower chain 1 handle 0x1 dst_mac 24:25:d0:e1:00:00 src_mac 02:25:d0:25:01:02 eth_type ipv4 ct_state +trk+new in_hw in_hw_count 1 action order 1: ct commit zone 0 pipe index 2 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec Action statistics: Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 2: tunnel_key set src_ip 192.168.1.25 dst_ip 192.168.1.26 key_id 4 dst_port 4789 csum pipe index 3 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec Action statistics: Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed action order 3: mirred (Egress Redirect to device vxlan1) stolen index 9 ref 1 bind 1 installed 3807 sec used 3779 sec firstused 3800 sec Action statistics: Sent 120 bytes 2 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 used_hw_stats delayed When handling FIB events, the rule in post act will not be deleted. And because the post act rule has packet reformat and modify header actions, also will hit the following syndromes: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:829:(pid 11613): DEALLOC_MODIFY_HEADER_CONTEXT(0x941) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x1ab444), err(-22) mlx5_core 0000:08:00.0: mlx5_cmd_out_err:829:(pid 11613): DEALLOC_PACKET_REFORMAT_CONTEXT(0x93e) op_mod(0x0) failed, status bad resource state(0x9), syndrome (0x179e84), err(-22) Fix it by unoffloading post act rule when handling FIB events. Fixes: 314e1105831b ("net/mlx5e: Add post act offload/unoffload API") Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Fix devlink controller number for ECVFDaniel Jurgens
The controller number for ECVFs is always 0, because the ECPF must be the eswitch owner for EC VFs to be enabled. Fixes: dc13180824b7 ("net/mlx5: Enable devlink port for embedded cpu VF vports") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Allow 0 for total host VFsDaniel Jurgens
When querying eswitch functions 0 is a valid number of host VFs. After introducing ARM SRIOV falling through to getting the max value from PCI results in using the total VFs allowed on the ARM for the host. Fixes: 86eec50beaf3 ("net/mlx5: Support querying max VFs from device"); Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-07net/mlx5: Return correct EC_VF function IDDaniel Jurgens
The ECVF function ID range is 1..max_ec_vfs. Currently mlx5_vport_to_func_id returns 0..max_ec_vfs - 1. Which results in a syndrome when querying the caps with more recent firmware, or reading incorrect caps with older firmware that supports EC VFs. Fixes: 9ac0b128248e ("net/mlx5: Update vport caps query/set for EC VFs") Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>