summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-02-16net/sched: Retire rsvp classifierJamal Hadi Salim
The rsvp classifier has served us well for about a quarter of a century but has has not been getting much maintenance attention due to lack of known users. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: Retire tcindex classifierJamal Hadi Salim
The tcindex classifier has served us well for about a quarter of a century but has not been getting much TLC due to lack of known users. Most recently it has become easy prey to syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: Retire dsmark qdiscJamal Hadi Salim
The dsmark qdisc has served us well over the years for diffserv but has not been getting much attention due to other more popular approaches to do diffserv services. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: Retire ATM qdiscJamal Hadi Salim
The ATM qdisc has served us well over the years but has not been getting much TLC due to lack of known users. Most recently it has become a shooting target for syzkaller. For this reason, we are retiring it. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net/sched: Retire CBQ qdiscJamal Hadi Salim
While this amazing qdisc has served us well over the years it has not been getting any tender love and care and has bitrotted over time. It has become mostly a shooting target for syzkaller lately. For this reason, we are retiring it. Goodbye CBQ - we loved you. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16Merge branch 'adding-sparx5-es0-vcap-support'Paolo Abeni
Steen Hegelund says: ==================== Adding Sparx5 ES0 VCAP support This provides the Egress Stage 0 (ES0) VCAP (Versatile Content-Aware Processor) support for the Sparx5 platform. The ES0 VCAP is an Egress Access Control VCAP that uses frame keyfields and previously classified keyfields to add, rewrite or remove VLAN tags on the egress frames, and is therefore often referred to as the rewriter. The ES0 VCAP also supports trapping frames to the host. The ES0 VCAP has 1 lookup accessible with this chain id: - chain 10000000: ES0 Lookup 0 The ES0 VCAP does not do traffic classification to select a keyset, but it does have two keysets that can be used on all traffic. For now only the ISDX keyset is used. The ES0 VCAP can match on an ISDX key (Ingress Service Index) as one of the frame metadata keyfields, similar to the ES2 VCAP. The ES0 VCAP uses external counters in the XQS (statistics) group. ==================== Link: https://lore.kernel.org/r/20230214104049.1553059-1-steen.hegelund@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Add TC vlan action support for the ES0 VCAPSteen Hegelund
This provides these 3 actions for rule in the ES0 VCAP: - action vlan pop - action vlan modify id X priority Y - action vlan push id X priority Y protocol Z Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Add TC support for the ES0 VCAPSteen Hegelund
This enables the TC command to use the Sparx5 ES0 VCAP, and handling of rule links between IS0 and ES0. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Add ES0 VCAP keyset configuration for Sparx5Steen Hegelund
This adds the ES0 VCAP port keyset configuration for Sparx5. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Updated register interface with VCAP ES0 accessSteen Hegelund
This provides access to the ES0 VCAP register targets Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Add ES0 VCAP model and updated KUNIT VCAP modelSteen Hegelund
This provides the VCAP model for the Sparx5 ES0 (Egress Stage 0) VCAP. This VCAP provides rewriting functionality in the egress path. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Improve the error handling for linked rulesSteen Hegelund
Ensure that an error is returned if the VCAP instance was not found. The chain offset (diff) is allowed to be zero as this just means that the user did not request rules to be linked. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Use chain ids without offsets when enabling rulesSteen Hegelund
This improves the check performed on linked rules when enabling or disabling them. The chain id used must be the chain id without the offset used for linking the rules. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Egress VLAN TPID configuration follows IFHSteen Hegelund
This changes the TPID of the egress frames to use the TPID stored in the IFH (internal frame header), which ensures that this is the TPID classified for the frame at ingress. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Clear rule counter even if lookup is disabledSteen Hegelund
The rule counter must be cleared when creating a new rule, even if the VCAP lookup is currently disabled. This ensures that rules located in VCAPs that use external counters (such as Sparx5 IS2 and ES0) will have their counter reset even if the VCAP lookup is not enabled at the moment. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Fixes: 95fa74148daa ("net: microchip: sparx5: Reset VCAP counter for new rules") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16net: microchip: sparx5: Discard frames with SMAC multicast addressesSteen Hegelund
A valid frame should never use a multicast address as its source MAC address, so discard these invalid frames. Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-15Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-02-14 (ice) This series contains updates to ice driver only. Karol extends support for GPIO pins to E823 devices. Daniel Vacek stops processing of PTP packets when link is down. Pawel adds support for BIG TCP for IPv6. Tony changes return type of ice_vsi_realloc_stat_arrays() as it always returns success. Zhu Yanjun updates kdoc stating supported TLVs. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Mention CEE DCBX in code comment ice: Change ice_vsi_realloc_stat_arrays() to void ice: add support BIG TCP on IPv6 ice/ptp: fix the PTP worker retrying indefinitely if the link went down ice: Add GPIO pin support for E823 products ==================== Link: https://lore.kernel.org/r/20230214213003.2117125-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15Documentation: core-api: packing: correct spellingRandy Dunlap
Correct spelling problems for Documentation/core-api/packing.rst as reported by codespell. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vladimir Oltean <olteanv@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Reviewed-by: Mukesh Ojha <quic_mojha@quicinc.com> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Link: https://lore.kernel.org/r/20230215053738.11562-1-rdunlap@infradead.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15dt-bindings: net: snps,dwmac: Fix snps,reset-delays-us dependencyAndrew Halaney
The schema had snps,reset-delay-us as dependent on snps,reset-gpio. The actual property is called snps,reset-delays-us, so fix this to catch any devicetree defining snsps,reset-delays-us without snps,reset-gpio. Fixes: 7db3545aef5f ("dt-bindings: net: stmmac: Convert the binding to a schemas") Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Andrew Halaney <ahalaney@redhat.com> Link: https://lore.kernel.org/r/20230214171505.224602-1-ahalaney@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15selftests: forwarding: tc_actions: cleanup temporary files when test is abortedDavide Caratti
remove temporary files created by 'mirred_egress_to_ingress_tcp' test in the cleanup() handler. Also, change variable names to avoid clashing with globals from lib.sh. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/091649045a017fc00095ecbb75884e5681f7025f.1676368027.git.dcaratti@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15net: wangxun: Add the basic ethtool interfacesMengyuan Lou
Add the basic ethtool ops get_drvinfo and get_link for ngbe and txgbe. Ngbe implements get_link_ksettings, nway_reset and set_link_ksettings for free using phylib code. The code related to the physical interface is not yet fully implemented in txgbe using phylink code. So do not implement get_link_ksettings, nway_reset and set_link_ksettings in txgbe. Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230214091527.69943-1-mengyuanlou@net-swift.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15net: msg_zerocopy: elide page accounting if RLIM_INFINITYWillem de Bruijn
MSG_ZEROCOPY ensures that pinned user pages do not exceed the limit. If no limit is set, skip this accounting as otherwise expensive atomic_long operations are called for no reason. This accounting is already skipped for privileged (CAP_IPC_LOCK) users. Rely on the same mechanism: if no mmp->user is set, mm_unaccount_pinned_pages does not decrement either. Tested by running tools/testing/selftests/net/msg_zerocopy.sh with an unprivileged user for the TXMODE binary: ip netns exec "${NS1}" sudo -u "{$USER}" "${BIN}" "-${IP}" ... Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20230214155740.3448763-1-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15net: phy: c45: genphy_c45_an_config_aneg(): fix uninitialized symbol errorOleksij Rempel
Fix warning: drivers/net/phy/phy-c45.c:712 genphy_c45_write_eee_adv() error: uninitialized symbol 'changed' Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/202302150232.q6idsV8s-lkp@intel.com/ Fixes: 022c3f87f88e ("net: phy: add genphy_c45_ethtool_get/set_eee() support") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20230215050453.2251360-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15net: phy: motorcomm: uninitialized variables in yt8531_link_change_notify()Dan Carpenter
These booleans are never set to false, but are just used without being initialized. Fixes: 4ac94f728a58 ("net: phy: Add driver for Motorcomm yt8531 gigabit ethernet phy") Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Frank Sae <Frank.Sae@motor-comm.com> Link: https://lore.kernel.org/r/Y+xd2yJet2ImHLoQ@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15igb: conditionalize I2C bit banging on external thermal sensor supportCorinna Vinschen
Commit a97f8783a937 ("igb: unbreak I2C bit-banging on i350") introduced code to change I2C settings to bit banging unconditionally. However, this patch introduced a regression: On an Intel S2600CWR Server Board with three NICs: - 1x dual-port copper Intel I350 Gigabit Network Connection [8086:1521] (rev 01) fw 1.63, 0x80000dda - 2x quad-port SFP+ with copper SFP Avago ABCU-5700RZ Intel I350 Gigabit Fiber Network Connection [8086:1522] (rev 01) fw 1.52.0 the SFP NICs no longer get link at all. Reverting commit a97f8783a937 or switching to the Intel out-of-tree driver both fix the problem. Per the igb out-of-tree driver, I2C bit banging on i350 depends on support for an external thermal sensor (ETS). However, commit a97f8783a937 added bit banging unconditionally. Additionally, the out-of-tree driver always calls init_thermal_sensor_thresh on probe, while our driver only calls init_thermal_sensor_thresh only in igb_reset(), and only if an ETS is present, ignoring the internal thermal sensor. The affected SFPs don't provide an ETS. Per Intel, the behaviour is a result of i350 firmware requirements. This patch fixes the problem by aligning the behaviour to the out-of-tree driver: - split igb_init_i2c() into two functions: - igb_init_i2c() only performs the basic I2C initialization. - igb_set_i2c_bb() makes sure that E1000_CTRL_I2C_ENA is set and enables bit-banging. - igb_probe() only calls igb_set_i2c_bb() if an ETS is present. - igb_probe() calls init_thermal_sensor_thresh() unconditionally. - igb_reset() aligns its behaviour to igb_probe(), i. e., call igb_set_i2c_bb() if an ETS is present and call init_thermal_sensor_thresh() unconditionally. Fixes: a97f8783a937 ("igb: unbreak I2C bit-banging on i350") Tested-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Co-developed-by: Jamie Bainbridge <jbainbri@redhat.com> Signed-off-by: Jamie Bainbridge <jbainbri@redhat.com> Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230214185549.1306522-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15drm/amd/display: Fail atomic_check early on normalize_zpos errorLeo Li
[Why] drm_atomic_normalize_zpos() can return an error code when there's modeset lock contention. This was being ignored. [How] Bail out of atomic check if normalize_zpos() returns an error. Fixes: b261509952bc ("drm/amd/display: Fix double cursor on non-video RGB MPO") Signed-off-by: Leo Li <sunpeng.li@amd.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Reviewed-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-02-15drm/amd/amdgpu: fix warning during suspendJack Xiao
Freeing memory was warned during suspend. Move the self test out of suspend. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2151825 Cc: jfalempe@redhat.com Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-and-tested-by: Evan Quan <evan.quan@amd.com> Tested-by: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-02-15Merge tag 'mlx5-updates-2023-02-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-02-10 1) From Roi and Mark: MultiPort eswitch support MultiPort E-Switch builds on newer hardware's capabilities and introduces a mode where a single E-Switch is used and all the vports and physical ports on the NIC are connected to it. The new mode will allow in the future a decrease in the memory used by the driver and advanced features that aren't possible today. This represents a big change in the current E-Switch implantation in mlx5. Currently, by default, each E-Switch manager manages its E-Switch. Steering rules in each E-Switch can only forward traffic to the native physical port associated with that E-Switch. While there are ways to target non-native physical ports, for example using a bond or via special TC rules. None of the ways allows a user to configure the driver to operate by default in such a mode nor can the driver decide to move to this mode by default as it's user configuration-driven right now. While MultiPort E-Switch single FDB mode is the preferred mode, older generations of ConnectX hardware couldn't support this mode so it was never implemented. Now that there is capable hardware present, start the transition to having this mode by default. Introduce a devlink parameter to control MultiPort Eswitch single FDB mode. This will allow users to select this mode on their system right now and in the future will allow the driver to move to this mode by default. 2) From Jiri: Improvements and fixes for mlx5 netdev's devlink logic 2.1) Cleanups related to mlx5's devlink port logic 2.2) Move devlink port registration to be done before netdev alloc 2.3) Create auxdev devlink instance in the same ns as parent devlink 2.4) Suspend auxiliary devices only in case of PCI device suspend * tag 'mlx5-updates-2023-02-10' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Suspend auxiliary devices only in case of PCI device suspend net/mlx5: Remove "recovery" arg from mlx5_load_one() function net/mlx5e: Create auxdev devlink instance in the same ns as parent devlink net/mlx5e: Move devlink port registration to be done before netdev alloc net/mlx5e: Move dl_port to struct mlx5e_dev net/mlx5e: Replace usage of mlx5e_devlink_get_dl_port() by netdev->devlink_port net/mlx5e: Pass mdev to mlx5e_devlink_port_register() net/mlx5: Remove outdated comment net/mlx5e: TC, Remove redundant parse_attr argument net/mlx5e: Use a simpler comparison for uplink rep net/mlx5: Lag, Add single RDMA device in multiport mode net/mlx5: Lag, set different uplink vport metadata in multiport eswitch mode net/mlx5: E-Switch, rename bond update function to be reused net/mlx5e: TC, Add peer flow in mpesw mode net/mlx5: Lag, Control MultiPort E-Switch single FDB mode ==================== Link: https://lore.kernel.org/r/20230214221239.159033-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15Merge branch '10GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-02-14 (ixgbe, i40e) This series contains updates to ixgbe and i40e drivers. Jason Xing corrects comparison of frame sizes for setting MTU with XDP on ixgbe and adjusts frame size to account for a second VLAN header on ixgbe and i40e. * '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ixgbe: add double of VLAN header when computing the max MTU i40e: add double of VLAN header when computing the max MTU ixgbe: allow to increase MTU to 3K with XDP enabled ==================== Link: https://lore.kernel.org/r/20230214185146.1305819-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15Merge branch ↵Jakub Kicinski
'devlink-cleanups-and-move-devlink-health-functionality-to-separate-file' Moshe Shemesh says: ==================== devlink: cleanups and move devlink health functionality to separate file This patchset moves devlink health callbacks, helpers and related code from leftover.c to new file health.c. About 1.3K LoC are moved by this patchset, covering all devlink health functionality. In addition this patchset includes a couple of small cleanups in devlink health code and documentation update. ==================== Link: https://lore.kernel.org/r/1676392686-405892-1-git-send-email-moshe@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Fix TP_STRUCT_entry in trace of devlink health reportMoshe Shemesh
Fix a bug in trace point definition for devlink health report, as TP_STRUCT_entry of reporter_name should get reporter_name and not msg. Note no fixes tag as this is a harmless bug as both reporter_name and msg are strings and TP_fast_assign for this entry is correct. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Update devlink health documentationMoshe Shemesh
Update devlink-health.rst file: - Add devlink formatted message (fmsg) API documentation. - Add auto-dump as a condition to do dump once error reported. - Expand OOB to clarify this acronym. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Move health common function to health fileMoshe Shemesh
Now that all devlink health callbacks and related code are in file health.c move common health functions and devlink_health_reporter struct to be local in health.c file. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Move devlink health test to health fileMoshe Shemesh
Move devlink health report test callback from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Move devlink health dump to health fileMoshe Shemesh
Move devlink health report dump callbacks and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Move devlink fmsg and health diagnose to health fileMoshe Shemesh
Devlink fmsg (formatted message) is used by devlink health diagnose, dump and drivers which support these devlink health callbacks. Therefore, move devlink fmsg helpers and related code to file health.c. Move devlink health diagnose to file health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Move devlink health report and recover to health fileMoshe Shemesh
Move devlink health report helper and recover callback and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Move devlink health get and set code to health fileMoshe Shemesh
Move devlink health get and set callbacks and related code from leftover.c to health.c. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: health: Fix nla_nest_end in error flowMoshe Shemesh
devlink_nl_health_reporter_fill() error flow calls nla_nest_end(). Fix it to call nla_nest_cancel() instead. Note the bug is harmless as genlmsg_cancel() cancel the entire message, so no fixes tag added. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15devlink: Split out health reporter create codeMoshe Shemesh
Move devlink health reporter create/destroy and related dev code to new file health.c. This file shall include all callbacks and functionality that are related to devlink health. In addition, fix kdoc indentation and make reporter create/destroy kdoc more clear. No functional change in this patch. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15ice: update xdp_features with xdp multi-buffLorenzo Bianconi
Now ice driver supports xdp multi-buffer so add it to xdp_features. Check vsi type before setting xdp_features flag. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/8a4781511ab6e3cd280e944eef69158954f1a15f.1676385351.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15i40e: check vsi type before setting xdp_features flagLorenzo Bianconi
Set xdp_features flag just for I40E_VSI_MAIN vsi type since XDP is supported just in this configuration. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/f2b537f86b34fc176fbc6b3d249b46a20a87a2f3.1676405131.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMESAlexander Lobakin
&xdp_buff and &xdp_frame are bound in a way that xdp_buff->data_hard_start == xdp_frame It's always the case and e.g. xdp_convert_buff_to_frame() relies on this. IOW, the following: for (u32 i = 0; i < 0xdead; i++) { xdpf = xdp_convert_buff_to_frame(&xdp); xdp_convert_frame_to_buff(xdpf, &xdp); } shouldn't ever modify @xdpf's contents or the pointer itself. However, "live packet" code wrongly treats &xdp_frame as part of its context placed *before* the data_hard_start. With such flow, data_hard_start is sizeof(*xdpf) off to the right and no longer points to the XDP frame. Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several places and praying that there are no more miscalcs left somewhere in the code, unionize ::frm with ::data in a flex array, so that both starts pointing to the actual data_hard_start and the XDP frame actually starts being a part of it, i.e. a part of the headroom, not the context. A nice side effect is that the maximum frame size for this mode gets increased by 40 bytes, as xdp_buff::frame_sz includes everything from data_hard_start (-> includes xdpf already) to the end of XDP/skb shared info. Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it hardcoded for 64 bit && 4k pages, it can be made more flexible later on. Minor: align `&head->data` with how `head->frm` is assigned for consistency. Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for clarity. (was found while testing XDP traffic generator on ice, which calls xdp_convert_frame_to_buff() for each XDP frame) Fixes: b530e9e1063e ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN") Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20230215185440.4126672-1-aleksander.lobakin@intel.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-15Merge branch 'New benchmark for hashmap lookups'Andrii Nakryiko
Anton Protopopov says: ==================== Add a new benchmark for hashmap lookups and fix several typos. In commit 3 I've patched the bench utility so that now command line options can be reused by different benchmarks. The benchmark itself is added in the last commit 7. I was using this benchmark to test map lookup productivity when using a different hash function [1]. When run with --quiet, the results can be easily plotted [2]. The results provided by the benchmark look reasonable and match the results of my different benchmarks (requiring to patch kernel to get actual statistics on map lookups). Links: [1] https://fosdem.org/2023/schedule/event/bpf_hashing/ [2] https://github.com/aspsk/bpf-bench/tree/master/hashmap-bench Changes, v1->v2: - percpu_times_index[] is of wrong size (Martin) - use base 0 for strtol (Andrii) - just use -q without argument (Andrii) - use less hacks when parsing arguments (Andrii) ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2023-02-15selftest/bpf/benchs: Add benchmark for hashmap lookupsAnton Protopopov
Add a new benchmark which measures hashmap lookup operations speed. A user can control the following parameters of the benchmark: * key_size (max 1024): the key size to use * max_entries: the hashmap max entries * nr_entries: the number of entries to insert/lookup * nr_loops: the number of loops for the benchmark * map_flags The hashmap flags passed to BPF_MAP_CREATE The BPF program performing the benchmarks calls two nested bpf_loop: bpf_loop(nr_loops/nr_entries) bpf_loop(nr_entries) bpf_map_lookup() So the nr_loops determines the number of actual map lookups. All lookups are successful. Example (the output is generated on a AMD Ryzen 9 3950X machine): for nr_entries in `seq 4096 4096 65536`; do echo -n "$((nr_entries*100/65536))% full: "; sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=4 --nr_entries=$nr_entries --max_entries=65536 --nr_loops=1000000 --map_flags=0x40 | grep cpu; done 6% full: cpu01: lookup 50.739M ± 0.018M events/sec (approximated from 32 samples of ~19ms) 12% full: cpu01: lookup 47.751M ± 0.015M events/sec (approximated from 32 samples of ~20ms) 18% full: cpu01: lookup 45.153M ± 0.013M events/sec (approximated from 32 samples of ~22ms) 25% full: cpu01: lookup 43.826M ± 0.014M events/sec (approximated from 32 samples of ~22ms) 31% full: cpu01: lookup 41.971M ± 0.012M events/sec (approximated from 32 samples of ~23ms) 37% full: cpu01: lookup 41.034M ± 0.015M events/sec (approximated from 32 samples of ~24ms) 43% full: cpu01: lookup 39.946M ± 0.012M events/sec (approximated from 32 samples of ~25ms) 50% full: cpu01: lookup 38.256M ± 0.014M events/sec (approximated from 32 samples of ~26ms) 56% full: cpu01: lookup 36.580M ± 0.018M events/sec (approximated from 32 samples of ~27ms) 62% full: cpu01: lookup 36.252M ± 0.012M events/sec (approximated from 32 samples of ~27ms) 68% full: cpu01: lookup 35.200M ± 0.012M events/sec (approximated from 32 samples of ~28ms) 75% full: cpu01: lookup 34.061M ± 0.009M events/sec (approximated from 32 samples of ~29ms) 81% full: cpu01: lookup 34.374M ± 0.010M events/sec (approximated from 32 samples of ~29ms) 87% full: cpu01: lookup 33.244M ± 0.011M events/sec (approximated from 32 samples of ~30ms) 93% full: cpu01: lookup 32.182M ± 0.013M events/sec (approximated from 32 samples of ~31ms) 100% full: cpu01: lookup 31.497M ± 0.016M events/sec (approximated from 32 samples of ~31ms) Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230213091519.1202813-8-aspsk@isovalent.com
2023-02-15selftest/bpf/benchs: Print less if the quiet option is setAnton Protopopov
The bench utility will print Setting up benchmark '<bench-name>'... Benchmark '<bench-name>' started. on startup to stdout. Suppress this output if --quiet option if given. This makes it simpler to parse benchmark output by a script. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230213091519.1202813-7-aspsk@isovalent.com
2023-02-15selftest/bpf/benchs: Make quiet option commonAnton Protopopov
The "local-storage-tasks-trace" benchmark has a `--quiet` option. Move it to the list of common options, so that the main code and other benchmarks can use (new) env.quiet variable. Patch the run_bench_local_storage_rcu_tasks_trace.sh helper script accordingly. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230213091519.1202813-6-aspsk@isovalent.com
2023-02-15selftest/bpf/benchs: Remove an unused headerAnton Protopopov
The benchs/bench_bpf_hashmap_full_update.c doesn't set a custom argp, so it shouldn't include the <argp.h> header. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230213091519.1202813-5-aspsk@isovalent.com
2023-02-15selftest/bpf/benchs: Enhance argp parsingAnton Protopopov
To parse command line the bench utility uses the argp_parse() function. This function takes as an argument a parent 'struct argp' structure which defines common command line options and an array of children 'struct argp' structures which defines additional command line options for particular benchmarks. This implementation doesn't allow benchmarks to share option names, e.g., if two benchmarks want to use, say, the --option option, then only one of them will succeed (the first one encountered in the array). This will be convenient if same option names could be used in different benchmarks (with the same semantics, e.g., --nr_loops=N). Fix this by calling the argp_parse() function twice. The first call is the same as it was before, with all children argps, and helps to find the benchmark name and to print a combined help message if anything is wrong. Given the name, we can call the argp_parse the second time, but now the children array points only to a correct benchmark thus always calling the correct parsers. (If there's no a specific list of arguments, then only one call to argp_parse will be done.) Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230213091519.1202813-4-aspsk@isovalent.com
2023-02-15selftest/bpf/benchs: Make a function static in bpf_hashmap_full_updateAnton Protopopov
The hashmap_report_final callback function defined in the benchs/bench_bpf_hashmap_full_update.c file should be static. Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230213091519.1202813-3-aspsk@isovalent.com