Age | Commit message (Collapse) | Author |
|
CMU_BUSMC is responsible to control clocks of BLK_BUSMC which represents
Data/Peri buses. Most clocks except PDMA/SPDMA are not necessary to
be controlled by HLOS. So, this adds PDMA/SPDMA gate clocks.
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://lore.kernel.org/r/20220504075154.58819-7-chanho61.park@samsung.com
|
|
CMU_PERIS is responsible to control clocks of BLK_PERIS which has
OPT/MCT/WDT and TMU. This patch only supports WDT gate clocks and all
other clocks except WDT will be supported later.
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://lore.kernel.org/r/20220504075154.58819-6-chanho61.park@samsung.com
|
|
Add CMU_CORE clock which represents Core BUS clocks. The source clocks
of this CMU block are oscclk or dout_clkcmu_core_bus. Thus, two source
clocks should be provided via device tree. All the gate clocks are
defined as CLK_IS_CRITICAL because they control(gate/ungate) core bus
clocks but not been assigned to any drivers.
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://lore.kernel.org/r/20220504075154.58819-5-chanho61.park@samsung.com
|
|
This adds support for CMU_TOP which generates clocks for all the
function blocks such as CORE, FSYS0/1/2, PERIC0/1 and so on. For
CMU_TOP, PLL_SHARED0,1,2,3 and 4 will be the sources of this block
and they will generate bus clocks.
Signed-off-by: Chanho Park <chanho61.park@samsung.com>
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Link: https://lore.kernel.org/r/20220504075154.58819-4-chanho61.park@samsung.com
|
|
The variables vht_mcs and he_mcs are being initialized in the
start of for-loops however they are re-assigned new values in
the loop and not used outside the loop. The initializations
are redundant and can be removed.
Cleans up clang scan warnings:
warning: Although the value stored to 'vht_mcs' is used in the
enclosing expression, the value is never actually read from
'vht_mcs' [deadcode.DeadStores]
warning: Although the value stored to 'he_mcs' is used in the
enclosing expression, the value is never actually read from
'he_mcs' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220507184155.26939-1-colin.i.king@gmail.com
|
|
Ath11k allocates memory when firmware requests memory in QMI.
Coldboot calibration and firmware recovery uses firmware reload.
On firmware reload, firmware sends memory request again. If Ath11k
allocates memory on first firmware boot, reuse the available
memory. Also check if the segment type and size is same
on the next firmware boot. Reuse if segment type/size is
same as previous firmware boot else free the segment and
allocate the segment with size/type.
Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.6.0.1-00752-QCAHKSWPL_SILICONZ-1
Signed-off-by: Anilkumar Kolli <quic_akolli@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220506141448.10340-1-quic_akolli@quicinc.com
|
|
This is completely racy, remove it.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220506095451.9852305a92c8.I05155824a1b9059eb59beccde81786dc69de354a@changeid
|
|
In case of Passpoint, the WLAN interface may be requested to
remain on a specific channel and then to send some management
frames on that channel. Now chanfreq of wmi_mgmt_send_cmd is set
as 0, as a result firmware may choose a default but wrong channel.
Fix it by assigning chanfreq field with the designated channel.
This change only applies to WCN6855 and QCA6390, other chips are
not affected.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220506013614.1580274-4-quic_bqiang@quicinc.com
|
|
Commit 66307ca04057 ("ath11k: fix mgmt_tx_wmi cmd sent to FW for
deleted vdev") wants both of below two conditions are true before
sending management frames:
1: ar->allocated_vdev_map & (1LL << arvif->vdev_id)
2: arvif->is_started
Actually the second one is not necessary because with the first one
we can make sure the vdev is present.
Also use ar->conf_mutex to synchronize vdev delete and mgmt. TX.
This issue is found in case of Passpoint scenario where ath11k
needs to send action frames before vdev is started.
Fix it by removing the second condition.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Fixes: 66307ca04057 ("ath11k: fix mgmt_tx_wmi cmd sent to FW for deleted vdev")
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220506013614.1580274-3-quic_bqiang@quicinc.com
|
|
Add remain on channel support, it is needed in several
scenarios such as Passpoint etc.
Currently this is supported by QCA6390, WCN6855, IPQ8074,
IPQ6018 and QCN9074.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220506013614.1580274-2-quic_bqiang@quicinc.com
|
|
With WoWLAN enabled and after sleeping for a rather long time,
we are seeing that with some APs, it is not able to wake up
the STA though the correct wake up pattern has been configured.
This is because the host doesn't send keepalive command to
firmware, thus firmware will not send any packet to the AP and
after a specific time the AP kicks out the STA.
Fix this issue by enabling keepalive before going to suspend
and disabling it after resume back.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1
Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Link: https://lore.kernel.org/r/20220506012540.1579604-1-quic_bqiang@quicinc.com
|
|
The &chip->gdev->ei[] array has chip->nlines elements so this >
comparison needs to be >= to prevent an out of bounds access. The
gdev->ei[] array is allocated in hte_register_chip().
Fixes: 31ab09b42188 ("drivers: Add hardware timestamp engine (HTE) subsystem")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dipen Patel <dipenp@nvidia.com>
Acked-by: Dipen Patel <dipenp@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
del_timer() does not wait until the timer handler finishing.
This means that the timer handler may still be running after
the driver's remove function has finished, which would result
in a use-after-free.
Fix it by calling del_timer_sync(), which makes sure the timer
handler has finished.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Dipen Patel <dipenp@nvidia.com>
Acked-by: Dipen Patel <dipenp@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Eliminate the follow versioncheck warning:
./drivers/hte/hte-tegra194-test.c: 8 linux/version.h not needed.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Dipen Patel <dipenp@nvidia.com>
Acked-by: Dipen Patel <dipenp@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
|
|
Remove a couple of unnecessary casts to `(void *)` when initializing the
`.data` members in the device ID table.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20220510115141.212779-3-abbotti@mev.co.uk
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Fix "WARNING: Missing a blank line after declarations" reported by
checkpatch.pl.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Link: https://lore.kernel.org/r/20220510115141.212779-2-abbotti@mev.co.uk
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a
VF register to be notified for its enablement / disablement by the PF.
Upon VF probe it will call mlx5_sriov_blocking_notifier_register() with
its notifier block and upon VF remove it will call
mlx5_sriov_blocking_notifier_unregister() to drop its registration.
This can give a VF the ability to clean some resources upon disable
before that the command interface goes down and on the other hand sets
some stuff before that it's enabled.
This may be used by a VF which is migration capable in few cases.(e.g.
PF load/unload upon an health recovery).
Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.com
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Remove the irrelevant changelogs and todo notes and just leave the SPDX
marker and the copyright notice.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220419063303.583106-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The copyright statement says:
"Redistribution of this file is permitted under the GNU General Public
License." and was added by Ted in 1993, at which point GPLv2 only
was the default Linux license.
Replace it with the usual GPLv2 only SPDX header.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220419063303.583106-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Merge loop.h into loop.c as all the content is only used there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220419063303.583106-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Make sure to check the pmu type first and then check event->attr.disabled.
Doing so would avoid reading the disabled attribute of an event that is
not handled by TAD PMU.
Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
Link: https://lore.kernel.org/r/20220510102657.487539-1-tanmay@marvell.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
When using the legacy "mmu-masters" DT binding, we reject DMA domains
since we have no guarantee of driver probe order and thus can't rely on
client drivers getting the correct DMA ops. However, we can do better
than fall back to the old no-default-domain behaviour now, by forcing an
identity default domain instead. This also means that detaching from a
VFIO domain can actually work - that looks to have been broken for over
6 years, so clearly isn't something that legacy binding users care
about, but we may as well make the driver code make sense anyway.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/9805e4c492cb972bdcdd57999d2d001a2d8b5aab.1652171938.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
If an eMMC card supports TRIM and indicates that it erases to zeros, we can
use it to support hardware offloading of REQ_OP_WRITE_ZEROES, so let's add
support for this.
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Reviewed-by: Avri Altman <Avri.Altman@wdc.com>
Link: https://lore.kernel.org/r/20220429152118.3617303-1-vincent.whitchurch@axis.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Add driver for Sunplus SP7021 SoC.
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Wells Lu <wellslutw@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
When dma_buf_stats_setup() fails, it closes the dmabuf file which
results into the calling of dma_buf_file_release() where it does
list_del(&dmabuf->list_node) with out first adding it to the proper
list. This is resulting into panic in the below path:
__list_del_entry_valid+0x38/0xac
dma_buf_file_release+0x74/0x158
__fput+0xf4/0x428
____fput+0x14/0x24
task_work_run+0x178/0x24c
do_notify_resume+0x194/0x264
work_pending+0xc/0x5f0
Fix it by moving the dma_buf_stats_setup() after dmabuf is added to the
list.
Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
Signed-off-by: Charan Teja Reddy <quic_charante@quicinc.com>
Tested-by: T.J. Mercier <tjmercier@google.com>
Acked-by: T.J. Mercier <tjmercier@google.com>
Cc: <stable@vger.kernel.org> # 5.15.x+
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1652125797-2043-1-git-send-email-quic_charante@quicinc.com
|
|
Fix following coccichek error:
./drivers/cpufreq/mediatek-cpufreq.c:199:2-8: preceding lock on line
./drivers/cpufreq/mediatek-cpufreq.c:208:2-8: preceding lock on line
mutex_lock is acquired but not released before return.
Use 'goto out' to help releasing the mutex_lock.
Fixes: c210063b40ac ("cpufreq: mediatek: Add opp notification support")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Rex-BC Chen <rex-bc.chen@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
|
|
The impact of this regression is the same for resume that I saw on
thaw: the kernel hangs and nothing except SysRq rebooting can be done.
Fixes regression in commit cbe6c3a8f8f4 ("net: atlantic: invert deep
par in pm functions, preventing null derefs"), where I disabled deep
pm resets in suspend and resume, trying to make sense of the
atl_resume_common() deep parameter in the first place.
It turns out, that atlantic always has to deep reset on pm
operations. Even though I expected that and tested resume, I screwed
up by kexec-rebooting into an unpatched kernel, thus missing the
breakage.
This fixup obsoletes the deep parameter of atl_resume_common, but I
leave the cleanup for the maintainers to post to mainline.
Suspend and hibernation were successfully tested by the reporters.
Fixes: cbe6c3a8f8f4 ("net: atlantic: invert deep par in pm functions, preventing null derefs")
Link: https://lore.kernel.org/regressions/9-Ehc_xXSwdXcvZqKD5aSqsqeNj5Izco4MYEwnx5cySXVEc9-x_WC4C3kAoCqNTi-H38frroUK17iobNVnkLtW36V6VWGSQEOHXhmVMm5iQ=@protonmail.com/
Reported-by: Jordan Leppert <jordanleppert@protonmail.com>
Reported-by: Holger Hoffstaette <holger@applied-asynchrony.com>
Tested-by: Jordan Leppert <jordanleppert@protonmail.com>
Tested-by: Holger Hoffstaette <holger@applied-asynchrony.com>
CC: <stable@vger.kernel.org> # 5.10+
Signed-off-by: Manuel Ullmann <labre@posteo.de>
Link: https://lore.kernel.org/r/87bkw8dfmp.fsf@posteo.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Currently we use PORTn_OFFSET to locate PORT DFLs, and PORT DFLs are not
connected FME DFL. But for some cases (e.g. Intel Open FPGA Stack device),
PORT DFLs are connected to FME DFL directly, so we don't need to search
PORT DFLs via PORTn_OFFSET again. If BAR value of PORTn_OFFSET is 0x7
(FME_PORT_OFST_BAR_SKIP) then driver will skip searching the DFL for that
port. If BAR value is invalid, return -EINVAL.
Signed-off-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Signed-off-by: Tianfei Zhang <tianfei.zhang@intel.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Link: https://lore.kernel.org/r/20220505100617.703672-1-tianfei.zhang@intel.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
|
|
Previously the feature IDs defined are unique, no matter
which feature type. But currently we want to extend its
usage to have a per-type feature ID space, so this patch
adds feature type checking as well just before look into
feature ID for different features which have irq info.
Signed-off-by: Tianfei zhang <tianfei.zhang@intel.com>
Reviewed-by: Matthew Gerlach <matthew.gerlach@linux.intel.com>
Acked-by: Wu Hao <hao.wu@intel.com>
Acked-by: Moritz Fischer <mdf@kernel.org>
Link: https://lore.kernel.org/r/20220419032942.427429-2-tianfei.zhang@intel.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
|
|
To fix below kernel-doc warnings this patch does the following
->Replaced Return\Returns with 'Return:' keyword.
->Added 'Return' description For __init of_fpga_region_init()' API.
->Added description for 'child_regions_with_firmware()' API.
warning: No description found for return value of
'of_fpga_region_find'.
warning: No description found for return value of
'of_fpga_region_get_bridges'.
warning: missing initial short description on line:
* child_regions_with_firmware
warning: No description found for return value of
'child_regions_with_firmware'.
warning: No description found for return value of
'of_fpga_region_notify_pre_apply'.
warning: No description found for return value of
'of_fpga_region_notify'.
warning: No description found for return value of
'of_fpga_region_init'.
Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com>
Link: https://lore.kernel.org/r/20220423170235.2115479-6-nava.manne@xilinx.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
|
|
In FPGA Makefile has both space and tab indentation, to
make them align use tab instead of space indentation.
Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com>
Link: https://lore.kernel.org/r/20220423170235.2115479-5-nava.manne@xilinx.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
|
|
warnings: No description found for return value of 'xxx'
In-order to fix the above kernel-doc warnings added the
'Return' description for 'devm_fpga_mgr_register_full()'
and 'devm_fpga_mgr_register()' APIs.
Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com>
Link: https://lore.kernel.org/r/20220423170235.2115479-4-nava.manne@xilinx.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
|
|
fixes the below checks reported by checkpatch.pl:
- Lines should not end with a '('
- Alignment should match open parenthesis
Signed-off-by: Nava kishore Manne <nava.manne@xilinx.com>
Link: https://lore.kernel.org/r/20220423170235.2115479-3-nava.manne@xilinx.com
Signed-off-by: Xu Yilun <yilun.xu@intel.com>
|
|
The TSN endpoint Ethernet MAC supports a free running counter
additionally to its clock. This free running counter can be read and
hardware timestamps are supported. As the name implies, this counter
cannot be set and its frequency cannot be adjusted.
Add free running cycle counter support based on this free running
counter to physical clock. This also requires hardware time stamps
based on that free running counter.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ptp_convert_timestamp() is called in the RX path of network messages.
The current implementation takes ~5000ns on 1.2GHz A53. This is too much
for the hot path of packet processing.
Introduce hash table for fast vclock lookup in ptp_convert_timestamp().
The execution time of ptp_convert_timestamp() is reduced to ~700ns on
1.2GHz A53.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ptp_convert_timestamp() converts only the timestamp hwtstamp, which is
a field of the argument with the type struct skb_shared_hwtstamps *. So
a pointer to the hwtstamp field of this structure is sufficient.
Rework ptp_convert_timestamp() to use an argument of type ktime_t *.
This allows to add additional timestamp manipulation stages before the
call of ptp_convert_timestamp().
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ptp vclocks require a free running time for their timecounter.
Currently only a physical clock forced to free running is supported.
If vclocks are used, then the physical clock cannot be synchronized
anymore. The synchronized time is not available in hardware in this
case. As a result, timed transmission with TAPRIO hardware support
is not possible anymore.
If hardware would support a free running time additionally to the
physical clock, then the physical clock does not need to be forced to
free running. Thus, the physical clocks can still be synchronized
while vclocks are in use.
The physical clock could be used to synchronize the time domain of the
TSN network and trigger TAPRIO. In parallel vclocks can be used to
synchronize other time domains.
Introduce support for a free running cycle counter called cycles to
physical clocks. Rework ptp vclocks to use this free running cycle
counter. Default implementation is based on time of physical clock.
Thus, behavior of ptp vclocks based on physical clocks without free
running cycle counter is identical to previous behavior.
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Since commit 4e30e98c4b4c ("dpaa2-mac: return -EPROBE_DEFER from dpaa2_mac_open in case the fwnode is not set")
@parent can't be NULL after the if. It's either the address
of the ->fwnode of @dpmacs or @fwnode in case of ACPI.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20220506200029.852310-1-kuba@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Lag state has become very complicated with many modes, flags, types and
port selections methods and future work will add additional features.
Add a debugfs to query the current lag state. A new directory named "lag"
will be created under the mlx5 debugfs directory. As the driver has
debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag
For example:
/sys/kernel/debug/mlx5/0000:08:00.0/lag
The following files are exposed:
- state: Returns "active" or "disabled". If "active" it means hardware
lag is active.
- members: Returns the BDFs of all the members of lag object.
- type: Returns the type of the lag currently configured. Valid only
if hardware lag is active.
* "roce" - Members are bare metal PFs.
* "switchdev" - Members are in switchdev mode.
* "multipath" - ECMP offloads.
- port_sel_mode: Returns the egress port selection method, valid
only if hardware lag is active.
* "queue_affinity" - Egress port is selected by
the QP/SQ affinity.
* "hash" - Egress port is selected by hash done on
each packet. Controlled by: xmit_hash_policy of the
bond device.
- flags: Returns flags that are specific per lag @type. Valid only if
hardware lag is active.
* "shared_fdb" - "on" or "off", if "on" single FDB is used.
- mapping: Returns the mapping which is used to select egress port.
Valid only if hardware lag is active.
If @port_sel_mode is "hash" returns the active egress ports.
The hash result will select only active ports.
if @port_sel_mode is "queue_affinity" returns the mapping
between the configured port affinity of the QP/SQ and actual
egress port. For example:
* 1:1 - Mapping means if the configured affinity is port 1
traffic will egress via port 1.
* 1:2 - Mapping means if the configured affinity is port 1
traffic will egress via port 2. This can happen
if port 1 is down or in active/backup mode and port 1
is backup.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When in hardware lag and the NIC has more than 2 ports when one port
goes down need to distribute the traffic between the remaining
active ports.
For better spread in such cases instead of using 1-to-1 mapping and only
4 slots in the hash, use many.
Each port will have many slots that point to it. When a port goes down
go over all the slots that pointed to that port and spread them between
the remaining active ports. Once the port comes back restore the default
mapping.
We will have number_of_ports * MLX5_LAG_MAX_HASH_BUCKETS slots.
Each MLX5_LAG_MAX_HASH_BUCKETS belong to a different port.
The native mapping is such that:
port 1: The first MLX5_LAG_MAX_HASH_BUCKETS slots are: [1, 1, .., 1]
which means if a packet is hased into one of this slots it will hit the
wire via port 1.
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS slots are: [2, 2, .., 2]
which means if a packet is hased into one of this slots it will hit the
wire via port2.
and this mapping is the same of the rest of the ports.
On a failover, lets say port 2 goes down (port 1, 3, 4 are still up).
the new mapping for port 2 will be:
port 2: The second MLX5_LAG_MAX_HASH_BUCKETS are: [1, 3, 1, 4, .., 4]
which means the mapping was changed from the native mapping to a mapping
that consists of only the active ports.
With this if a port goes down the traffic will be split between the
active ports randomly
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Combine dmesg lag prints into a single function.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
to support NICs with 4 ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Refactor the entire lag code to use ldev->ports instead of hard-coded
defines (like MLX5_MAX_PORTS) for its operations.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Downstream patches will add support for lag over 4 ports.
In that mode we will only use hash as the uplink selection method.
Using hash instead of queue affinity (before this patch) offers key
advantages like:
- Align ports selection method with the method used by the bond device
- Better packets distribution where a single queue can transmit from
multiple ports (with queue affinity a queue is bound to a single port
regardless of the packet being sent).
- In case of failover we traffic is split between multiple ports and not
a single one like in queue affinity.
Going forward it was decided that queue affinity will be deprecated
as using hash provides a better user experience which means on 4 ports
HCAs hash will always be used.
Future work will add hash support for 2 ports HCAs as well.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
E-Switch currently doesn't support more than 2 E-Switch managers
being aggregated under a single hardware lag. Have specific checks
to disallow creating lag when the code doesn't support it.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Store the number of lag ports inside the lag object. Lag object is a single
shared object managing the lag state of multiple mlx5 devices on the same
physical HCA.
Downstream patches will allow hardware lag to be created over devices with
more than 2 ports.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
When search for a peer lag device we can filter based on that
device's capabilities.
Downstream patch will be less strict when filtering compatible devices
and remove the limitation where we require exact MLX5_MAX_PORTS and
change it to a range.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Use a lag specific lock instead of depending on external locks to
synchronise the lag creation/destruction.
With this, taking E-Switch mode lock is no longer needed for syncing
lag logic.
Cleanup any dead code that is left over and don't export functions that
aren't used outside the E-Switch core code.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
There is no need to expose E-Switch function for something that can be
checked with already present API inside lag code.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Devcom API is intended to be used between 2 devices only add this
implied assumption into the code and check when it's no true.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|