Age | Commit message (Collapse) | Author |
|
Update lpfc version to 14.0.0.1
Link: https://lore.kernel.org/r/20210816162901.121235-16-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add a bsg ioctl to allow user applications to retrieve the adapter
congestion management framework buffer.
Link: https://lore.kernel.org/r/20210816162901.121235-15-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Allow abbreviated cm framework status information to be obtained via sysfs.
Link: https://lore.kernel.org/r/20210816162901.121235-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add support via debugfs to report the cm statistics, cm enablement, and rx
monitor information.
Link: https://lore.kernel.org/r/20210816162901.121235-13-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add the logic to move the congestion management and event information into
the cmd statistics buffer maintained for the adapter. The update includes
rolling up values for the last minute, hour, and day information.
Link: https://lore.kernel.org/r/20210816162901.121235-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The driver provides overwatch of the cm behavior by maintaining a set of rx
I/O statistics. This information is also used in later updating of the cm
statistics buffer.
Link: https://lore.kernel.org/r/20210816162901.121235-11-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Complete the enablement of the cm framework feature in the adapter. Perform
the following:
- Detect the presence of the congestion management framework feature.
When the cm framework is present:
- Issue the SET_FEATURE command to enable the feature.
- Register the cm statistics buffer with the adapter.
- Read the cm enablement buffer to determine the cm framework state for cm
management.
When cm management is enabled:
- Monitor all FPIN and congestion signalling events, incrementing
counters.
- Regularly sync with the adapter to communicate congestion events and to
receive an rx request limit.
- Monitor requests for rx data and ensure that no more than the
adapter prescribed limit is issued on the link. If the limit is
exceeded, SCSI and/or NVMe traffic is temporarily suspended.
- Maintain the minute, hourly, daily statistics buffer.
- Monitor for congestion enablement change events, causing a reread of the
enablement buffer and acting on any change in enablement.
And:
- Add teardown logic, including buffer deregistration, on adapter
detachment or reset.
Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When congestion mgmt is enabled, cmf has the driver regularly issue a
command to synchronize reporting of congestion mgmt events such as fpin and
signal delivery.
This patch adds the definition of the CMF_SYNC WQE and its CQE fields as
well as support for issuing the command. The patch also adds the few
remaining cmf-related SLI additions, such as feature definition for
enablement of CMF and notifications to the driver if the cm enablement mode
changes.
Link: https://lore.kernel.org/r/20210816162901.121235-9-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
As part of the cmf framework, the firmware maintains a table with
congestion related state information, specifically whether enabled and if
enabled, whether monitoring or actively managing congestion.
Add definition of the table and add support to read the table from the
adapter and determine if it is enabled. In support of this, the READ_OBJECT
mailbox command definition is added to the driver.
Link: https://lore.kernel.org/r/20210816162901.121235-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The cmf framework requires the driver to maintain a cm statistics table,
accessible inband, of congestion related statistics that are reported per
minute, rolled up to per hour, and rolled up again per day. Several days
worth may be maintained. The table is registered with the adapter when the
MIB feature is enabled.
Add definition of the table and add support to register the table with the
adapter. Includes definition and initialization of event counters that are
later added to the statistics table.
Link: https://lore.kernel.org/r/20210816162901.121235-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When congestion management is enabled, issue EDC ELS to register congestion
signaling capabilities with the fabric. The response handling will process
the fabric parameters and set the reporting parameters.
Similarly, add support for receiving an EDC request from the fabric
generating a corresponding response.
Implement handlers for congestion signals from the fabric and maintain
statistics for them.
Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Expand FPIN logging:
- Display Attached Port Names for Link Integrity and Peer Congestion
events
- Log Delivery, Peer Congestion, and Congestion events
- Sanity check FPIN descriptor lengths when processing FPIN descriptors.
Log RDF events when congestion logging is enabled.
Link: https://lore.kernel.org/r/20210816162901.121235-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
MIB support is currently limited to detecting support in the adapter and
ensuring FDMI support is enabled if present. For the new framework MIB
support also requires active enablement of support via the SET_FEATURES
command with the firmware.
Rework the MIB detection and enablement for the following:
- Move detection away from the get_sli4_parameters routine, and into the
hba_setup path. get_sli4_parameters is only called once at attachment
while hba_setup is called as part of any SLI port reset path. This
ensures detection after firmware download.
- Update SET_FEATURES mbx command for the MIB enablement feature and add
support for the feature.
- Create the cmf_setup routine to encapsulate the detection of MIB support
and perform the enablement of the MIB support feature.
Link: https://lore.kernel.org/r/20210816162901.121235-4-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Implement the SET_HOST_DATA mbox command to set date / time during
initialization. It is used by the firmware for various purposes including
congestion management and firmware dumps.
Link: https://lore.kernel.org/r/20210816162901.121235-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add Exchange Diagnostic Capabilities (EDC) ELS definition and the following
capability descriptors:
- Link Fault Capability Descriptor
- Congestion Signaling Capability Descriptor
Definitions taken from FC-LS-5 r5.01
Link: https://lore.kernel.org/r/20210816162901.121235-2-jsmart2021@gmail.com
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Change "allcation" to "allocation".
Link: https://lore.kernel.org/r/1891546521.01629282781634.JavaMail.epsvc@epcpadp4
Signed-off-by: Keoseong Park <keosung.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Driver is not setting up IRQs in the resume path. As a result, hibernation
path is broken and controller will not be operational after system is
resumed.
Set up IRQs to handle the hibernation case.
Link: https://lore.kernel.org/r/20210818081755.1274470-1-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Cc: thenzl@redhat.com
Reported-by: Marco Patalano <mpatalan@redhat.com>
Tested-by: Marco Patalano <mpatalan@redhat.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Link: https://lore.kernel.org/r/20210823133458.3536824-1-pasic@linux.ibm.com
Fixes: f2542a3be327 ("scsi: scsi_ioctl: Move the "block layer" SCSI ioctl handling to drivers/scsi")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When HPB pinned region exists and mctx allocation for this region fails, a
memory leak is possible because memory is not released for the subregion
table of the current region.
Free memory for the subregion table of the current region.
Link: https://lore.kernel.org/r/1891546521.01629711601304.JavaMail.epsvc@epcpadp3
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Keoseong Park <keosung.park@samsung.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
There is a spelling mistake in a SNIC_HOST_INFO message. Fix it.
Link: https://lore.kernel.org/r/20210820151835.59804-1-colin.king@canonical.com
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Link: https://lore.kernel.org/r/20210820095405.12801-4-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
ncr_reset_bus() will complete all outstanding commands anyway, so there's
no need to single out a specific command.
Link: https://lore.kernel.org/r/20210820095405.12801-3-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Always '1', so we can remove it.
Link: https://lore.kernel.org/r/20210820095405.12801-2-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Yup, the VFS hoist broke it, and nobody noticed. Bulkstat workloads
make it clear that it doesn't work as it should.
Fixes: dae2f8ed7992 ("fs: Lift XFS_IDONTCACHE to the VFS layer")
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
|
|
Without suspend/resume callbacks, the PHY cannot be powered down/up
administratively.
Fixes: e40d2cca0189 ("net: phy: add MediaTek Gigabit Ethernet PHY driver")
Signed-off-by: DENG Qingfang <dqfext@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20210823044422.164184-1-dqfext@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
br_handle_ingress_vlan_tunnel() is only referenced in
br_handle_frame(). If br_handle_ingress_vlan_tunnel() is called and
return non-zero value, goto drop in br_handle_frame().
But, br_handle_ingress_vlan_tunnel() always return 0. So, the
routines that check the return value and goto drop has no meaning.
Therefore, change return type of br_handle_ingress_vlan_tunnel() to
void and remove if statement of br_handle_frame().
Signed-off-by: Kangmin Park <l4stpr0gr4m@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Link: https://lore.kernel.org/r/20210823102118.17966-1-l4stpr0gr4m@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are several test cases in the net directory are still using
exit 0 or exit 1 when they need to be skipped. Use kselftest
framework skip code instead so it can help us to distinguish the
return status.
Criterion to filter out what should be fixed in net directory:
grep -r "exit [01]" -B1 | grep -i skip
This change might cause some false-positives if people are running
these test scripts directly and only checking their return codes,
which will change from 0 to 4. However I think the impact should be
small as most of our scripts here are already using this skip code.
And there will be no such issue if running them with the kselftest
framework.
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Link: https://lore.kernel.org/r/20210823085854.40216-1-po-hsu.lin@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This allows using the normal sg_table APIs and makes all the code
cleaner. Remove sgt, nents and nmapd from ib_umem.
Link: https://lore.kernel.org/r/20210824142531.3877007-4-maorg@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
orig_nents should represent the number of entries with pages,
but __sg_alloc_table_from_pages sets orig_nents as the number of
total entries in the table. This is wrong when the API is used for
dynamic allocation where not all the table entries are mapped with
pages. It wasn't observed until now, since RDMA umem who uses this
API in the dynamic form doesn't use orig_nents implicit or explicit
by the scatterlist APIs.
Fix it by changing the append API to track the SG append table
state and have an API to free the append table according to the
total number of entries in the table.
Now all APIs set orig_nents as number of enries with pages.
Fixes: 07da1223ec93 ("lib/scatterlist: Add support in dynamic allocation of SG table from pages")
Link: https://lore.kernel.org/r/20210824142531.3877007-3-maorg@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
AUDIT_TRIM is expected to be idempotent, but multiple executions resulted
in a refcount underflow and use-after-free.
git bisect fingered commit fb041bb7c0a9 ("locking/refcount: Consolidate
implementations of refcount_t") but this patch with its more thorough
checking that wasn't in the x86 assembly code merely exposed a previously
existing tree refcount imbalance in the case of tree trimming code that
was refactored with prune_one() to remove a tree introduced in
commit 8432c7006297 ("audit: Simplify locking around untag_chunk()")
Move the put_tree() to cover only the prune_one() case.
Passes audit-testsuite and 3 passes of "auditctl -t" with at least one
directory watch.
Cc: Jan Kara <jack@suse.cz>
Cc: Will Deacon <will@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Seiji Nishikawa <snishika@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 8432c7006297 ("audit: Simplify locking around untag_chunk()")
Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
[PM: reformatted/cleaned-up the commit description]
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
The block layer may call the I/O scheduler .finish_request() callback
without having called the .insert_requests() callback. Make sure that the
mq-deadline I/O statistics are correct if the block layer inserts an I/O
request that bypasses the I/O scheduler. This patch prevents that lower
priority I/O is delayed longer than necessary for mixed I/O priority
workloads.
Cc: Niklas Cassel <Niklas.Cassel@wdc.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: Niklas Cassel <Niklas.Cassel@wdc.com>
Fixes: 08a9ad8bf607 ("block/mq-deadline: Add cgroup support")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20210824170520.1659173-1-bvanassche@acm.org
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Remove the architecture-specific code for handling the
"linux,usable-memory-range" property under the "/chosen" node in DT, as
the platform-agnostic FDT core code already takes care of this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/7356c531c49a24b4a55577bf8e46d93f4d8ae460.1628670468.git.geert+renesas@glider.be
|
|
Remove the architecture-specific code for handling the
"linux,elfcorehdr" property under the "/chosen" node in DT, as the
platform-agnostic handling in the FDT core code already takes care of
this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/3b8f801f9b92066855e87f3079fafc153ab20f69.1628670468.git.geert+renesas@glider.be
|
|
RISC-V uses platform-specific code to locate the elf core header in
memory. However, this does not conform to the standard
"linux,elfcorehdr" DT bindings, as it relies on a reserved memory node
with the "linux,elfcorehdr" compatible value, instead of on a
"linux,elfcorehdr" property under the "/chosen" node.
The non-compliant code can just be removed, as the standard behavior is
already implemented by platform-agnostic handling in the FDT core code.
Fixes: 5640975003d0234d ("RISC-V: Add crash kernel support")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/41c75d6ee3114ae6304f8afe0051895af91200ee.1628670468.git.geert+renesas@glider.be
|
|
Replace the conditional compilation using "#ifdef CONFIG_BLK_DEV_INITRD"
by a check for "IS_ENABLED(CONFIG_BLK_DEV_INITRD)", to increase compile
coverage and to simplify the code.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/604c13747f09d800da6a7c12f661e1ec146f1dfd.1628670468.git.geert+renesas@glider.be
|
|
Add support for handling the "linux,usable-memory-range" property in the
"/chosen" node to the FDT core code. This can co-exist safely with the
architecture-specific handling, until the latter has been removed.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/3bd69bada93ee59b7d23c38b3527fc1654e19343.1628670468.git.geert+renesas@glider.be
|
|
There are two methods to specify the location of the elf core headers:
using the "elfcorehdr=" kernel parameter, as handled by generic code in
kernel/crash_dump.c, or using the "linux,elfcorehdr" property under the
"/chosen" node in the Device Tree, as handled by architecture-specific
code in arch/arm64/mm/init.c.
Extend support for "linux,elfcorehdr" to all platforms supporting DT by
adding platform-agnostic handling for handling this property to the FDT
core code. This can co-exist safely with the architecture-specific
handling, until the latter has been removed.
This requires moving the call to of_scan_flat_dt() up, as the code
scanning the "/chosen" node now needs to be aware of the values of
"#address-cells" and "#size-cells".
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/c7e46e50aaf87ef49bdaa61358d25b122f32b7df.1628670468.git.geert+renesas@glider.be
|
|
Make the forward declarations of elfcorehdr_addr and elfcorehdr_size,
and the definitions of ELFCORE_ADDR_MAX and ELFCORE_ADDR_ERR always
available, like is done for phys_initrd_start and phys_initrd_size.
Code referring to these symbols can then just check for
IS_ENABLED(CONFIG_CRASH_DUMP), instead of requiring conditional
compilation using an #ifdef, thus preparing to increase compile
coverage.
Suggested-by: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/ba965ca613c0cc82c1ec2fe353ee34fb13b36474.1628670468.git.geert+renesas@glider.be
|
|
Convert Samsung Exynos5422 SoC frequency and voltage scaling for
Dynamic Memory Controller to DT schema format using json-schema.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Acked-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210820150353.161161-3-krzysztof.kozlowski@canonical.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Convert Samsung Exynos PPMU bindings to DT schema format using
json-schema. The example is quite different due to the nature of
dtschema examples parsing (no overriding via-label allowed).
New bindings contain copied description from previous bindings document,
therefore the license is set as GPL-2.0-only.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Link: https://lore.kernel.org/r/20210820150353.161161-2-krzysztof.kozlowski@canonical.com
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Kumar Kartikeya says:
====================
This set revamps XDP samples related to redirection to show better output and
implement missing features consolidating all their differences and giving them a
consistent look and feel, by implementing common features and command line
options. Some of the TODO items like reporting redirect error numbers
(ENETDOWN, EINVAL, ENOSPC, etc.) have also been implemented.
Some of the features are:
* Received packet statistics
* xdp_redirect/xdp_redirect_map tracepoint statistics
* xdp_redirect_err/xdp_redirect_map_err tracepoint statistics (with support for
showing exact errno)
* xdp_cpumap_enqueue/xdp_cpumap_kthread tracepoint statistics
* xdp_devmap_xmit tracepoint statistics
* xdp_exception tracepoint statistics
* Per ifindex pair devmap_xmit stats shown dynamically (for xdp_monitor) to
decompose the total.
* Use of BPF skeleton and BPF static linking to share BPF programs.
* Use of vmlinux.h and tp_btf for raw_tracepoint support.
* Removal of redundant -N/--native-mode option (enforced by default now)
* ... and massive cleanups all over the place.
All tracepoints also use raw_tp now, and tracepoints like xdp_redirect
are only enabled when requested explicitly to capture successful redirection
statistics.
The set of programs converted as part of this series are:
* xdp_redirect_cpu
* xdp_redirect_map_multi
* xdp_redirect_map
* xdp_redirect
* xdp_monitor
Explanation of the output:
There is now a concise output mode by default that shows primarily four fields:
rx/s Number of packets received per second
redir/s Number of packets successfully redirected per second
err,drop/s Aggregated count of errors per second (including dropped packets)
xmit/s Number of packets transmitted on the output device per second
Some examples:
; sudo ./xdp_redirect_map veth0 veth1 -s
Redirecting from veth0 (ifindex 15; driver veth) to veth1 (ifindex 14; driver veth)
veth0->veth1 0 rx/s 0 redir/s 0 err,drop/s 0 xmit/s
veth0->veth1 9,998,660 rx/s 9,998,658 redir/s 0 err,drop/s 9,998,654 xmit/s
...
There is also a verbose mode, that can also be enabled by default using -v (--verbose).
The output mode can be switched dynamically at runtime using Ctrl + \ (SIGQUIT).
To make the concise output more useful, the errors that occur are expanded inline
(as if verbose mode was enabled) to let the user pin down the source of the
problem without having to clutter output (or possibly miss it) or always use verbose mode.
For instance, let's consider a case where the output device link state is set to
down while redirection is happening:
[...]
veth0->veth1 24,503,376 rx/s 0 err,drop/s 24,503,372 xmit/s
veth0->veth1 25,044,775 rx/s 0 err,drop/s 25,044,783 xmit/s
veth0->veth1 25,263,046 rx/s 4 err,drop/s 25,263,028 xmit/s
redirect_err 4 error/s
ENETDOWN 4 error/s
[...]
The same holds for xdp_exception actions.
An example of how a complete xdp_redirect_map session would look:
; sudo ./xdp_redirect_map veth0 veth1
Redirecting from veth0 (ifindex 5; driver veth) to veth1 (ifindex 4; driver veth)
veth0->veth1 7,411,506 rx/s 0 err,drop/s 7,411,470 xmit/s
veth0->veth1 8,931,770 rx/s 0 err,drop/s 8,931,771 xmit/s
^\
veth0->veth1 8,787,295 rx/s 0 err,drop/s 8,787,325 xmit/s
receive total 8,787,295 pkt/s 0 drop/s 0 error/s
cpu:7 8,787,295 pkt/s 0 drop/s 0 error/s
redirect_err 0 error/s
xdp_exception 0 hit/s
xmit veth0->veth1 8,787,325 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
cpu:7 8,787,325 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
veth0->veth1 8,842,610 rx/s 0 err,drop/s 8,842,606 xmit/s
receive total 8,842,610 pkt/s 0 drop/s 0 error/s
cpu:7 8,842,610 pkt/s 0 drop/s 0 error/s
redirect_err 0 error/s
xdp_exception 0 hit/s
xmit veth0->veth1 8,842,606 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
cpu:7 8,842,606 xmit/s 0 drop/s 0 drv_err/s 2.00 bulk-avg
^C
Packets received : 33,973,181
Average packets/s : 4,246,648
Packets transmitted : 33,973,172
Average transmit/s : 4,246,647
The xdp_redirect tracepoint (for success stats) needs to be enabled explicitly
using --stats/-s. Documentation for entire output and options is provided when
user specifies --help/-h with a sample.
Changelog:
----------
v3 -> v4:
v3: https://lore.kernel.org/bpf/20210728165552.435050-1-memxor@gmail.com
* Address all feedback from Daniel
* Use READ_ONCE/WRITE_ONCE from linux/compiler.h (cannot directly include
due to conflicts with vmlinux.h)
* Fix MAX_CPUS hardcoding by switching to mmapable array maps, that are
resized based on the value of libbpf_num_possible_cpus
* s/ELEMENTS_OF/ARRAY_SIZE/g
* Use tools/include/linux/hashtable.h
* Coding style fixes
* Remove hyperlinks for tracepoints
* Split into smaller reviewable changes
* Restore support for specifying custom xdp_redirect_cpu cpumap prog with some
enhancements, including built-in programs for common actions (pass, drop,
redirect). By default, cpumap prog is now disabled.
* Misc bug fixes all over the place
The printing stuff is a lot more basic without hyperlink support, hence it
has not been exported into a more general facility.
v2 -> v3
v2: https://lore.kernel.org/bpf/20210721212833.701342-1-memxor@gmail.com
* Address all feedback from Andrii
* Replace usage of libbpf hashmap (internal API) with custom one
* Rename ATOMIC_* macros to NO_TEAR_* to better reflect their use
* Use size_t as a portable word sized data type
* Set libbpf_set_strict_mode
* Invert conditions in BPF programs to exit early and reduce nesting
* Use canonical SEC("xdp") naming for all XDP BPF progams
* Add missing help description for cpumap enqueue and kthread tracepoints
* Move private struct declarations from xdp_sample_user.h to .c file
* Improve help output for cpumap enqueue and cpumap kthread tracepoints
* Fix a bug where keys array for BPF_MAP_LOOKUP_BATCH is overallocated
* Fix some conditions for printing stats (earlier only checked pps, now pps,
drop, err and print if any is greater than zero)
* Fix alloc_stats_record to properly return and cleanup allocated memory on
allocation failure instead of calling exit(3)
* Bump bpf_map_lookup_batch count to 32 to reduce lookup time with multiple
devices in map
* Fix a bug where devmap_xmit_multi stats are not printed when previous record
is missing (i.e. when the first time stats are printed), by simply using a
dummy record that is zeroed out
* Also print per-CPU counts for devmap_xmit_multi which we collect already
* Change mac_map to be BPF_MAP_TYPE_HASH instead of array to prevent resizing
to a large size when max_ifindex is high, in xdp_redirect_map_multi
* Fix instance of strerror(errno) in sample_install_xdp to use saved errno
* Provide a usage function from samples helper
* Provide a fix where incorrect stats are shown for parallel sessions of
xdp_redirect_* samples by introducing matching support for input device(s),
output device(s) and cpumap map id for enqueue and kthread stats.
Only xdp_monitor doesn't filter stats, all others do.
RFC (v1) -> v2
RFC (v1): https://lore.kernel.org/bpf/20210528235250.2635167-1-memxor@gmail.com
* Address all feedback from Andrii
* Use BPF static linking
* Use vmlinux.h
* Use BPF_PROG macro
* Use global variables instead of maps
* Use of tp_btf for raw_tracepoint progs
* Switch to timerfd for polling
* Use libbpf hashmap for maintaing device sets for per ifindex pair
devmap_xmit stats
* Fix Makefile to specify object dependencies properly
* Use in-tree bpftool
* ... misc fixes and cleanups all over the place
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper. Also adapt to change of type of mac address map, so that
no resizing is required.
Add a new flag for sample mask that skips priting the
from_device->to_device heading for each line, as xdp_redirect_map_multi
may have two devices but the flow of data may be bidirectional, so the
output would be confusing.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-23-memxor@gmail.com
|
|
One of the notable changes is using a BPF_MAP_TYPE_HASH instead of array
map to store mac addresses of devices, as the resizing behavior was
based on max_ifindex, which unecessarily maximized the capacity of map
beyond what was needed.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-22-memxor@gmail.com
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
Since get_mac_addr is already provided by XDP samples helper, we drop
it. Also convert to XDP samples helper similar to prior samples to
minimize duplication of code.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-21-memxor@gmail.com
|
|
Also update it to use consistent SEC("xdp") and SEC("xdp_devmap")
naming, and use global variable instead of BPF map for copying the mac
address.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-20-memxor@gmail.com
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
Similar to xdp_monitor, xdp_redirect_cpu was quite featureful except a
few minor omissions (e.g. redirect errno reporting). All of these have
been moved to XDP samples helper, hence drop the unneeded code and
convert to usage of helpers provided by it.
One of the important changes here is dropping of mprog-disable option,
as we make that the default. Also, we support built-in programs for some
common actions on the packet when it reaches kthread (pass, drop,
redirect to device). If the user still needs to install a custom
program, they can still supply a BPF object, however the program should
be suitably tagged with SEC("xdp_cpumap") annotation so that the
expected attach type is correct when updating our cpumap map element.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-19-memxor@gmail.com
|
|
Similar to xdp_monitor_kern, a lot of these BPF programs have been
reimplemented properly consolidating missing features from other XDP
samples. Hence, drop the unneeded code and rename to .bpf.c suffix.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-18-memxor@gmail.com
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
One important note:
The XDP samples helper handles ownership of installed XDP programs on
devices, including responding to SIGINT and SIGTERM, so drop the code
here and use the helpers we provide going forward for all xdp_redirect*
conversions.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-17-memxor@gmail.com
|
|
We moved swap_src_dst_mac to xdp_sample.bpf.h to be shared with other
potential users, so drop it while moving code to the new file.
Also, consistently use SEC("xdp") naming instead.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-16-memxor@gmail.com
|
|
Use the libbpf skeleton facility and other utilities provided by XDP
samples helper.
A lot of the code in xdp_monitor and xdp_redirect_cpu has been moved to
the xdp_sample_user.o helper, so we remove the duplicate functions here
that are no longer needed.
Thanks to BPF skeleton, we no longer depend on order of tracepoints to
uninstall them on startup. Instead, the sample mask is used to install
the needed tracepoints.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210821002010.845777-15-memxor@gmail.com
|