summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-06-22mlxsw: core_acl_flex_actions: Add L4_PORT_ACTIONPetr Machata
Add fields related to L4_PORT_ACTION, which is used for changing of TCP and UDP port numbers. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22mlxsw: spectrum: Split handling of pedit mangle by chip typePetr Machata
Certain ACL actions are only available on some Spectrum revisions. In particular, L4_PORT_ACTION is not available on Spectrum-1. Introduce a new ops struct intended to hold these differences, mlxsw_sp_rulei_ops. Prime it with a sole member, act_mangle_field, meant for handling of pedit mangles. Create two ops structures, one for Spectrum-1, the other for Spectrum-2 and above. Add callbacks for act_mangle_field and dispatch to the common handler. Invoke mlxsw_sp_rulei_ops.act_mangle_field from the field mangler instead of calling the common handler directly. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22mlxsw: spectrum: Do not rely on machine endiannessIdo Schimmel
The second commit cited below performed a cast of 'u32 buffsize' to '(u16 *)' when calling mlxsw_sp_port_headroom_8x_adjust(): mlxsw_sp_port_headroom_8x_adjust(mlxsw_sp_port, (u16 *) &buffsize); Colin noted that this will behave differently on big endian architectures compared to little endian architectures. Fix this by following Colin's suggestion and have the function accept and return 'u32' instead of passing the current size by reference. Fixes: da382875c616 ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC") Fixes: 60833d54d56c ("mlxsw: spectrum: Adjust headroom buffers for 8x ports") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Colin Ian King <colin.king@canonical.com> Suggested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22Merge branch 'Add-Marvell-88E1340S-88E1548P-support'David S. Miller
Maxim Kochetkov says: ==================== Add Marvell 88E1340S, 88E1548P support This patch series add new PHY id support. Russell King asked to use single style for referencing functions. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: phy: marvell: Add Marvell 88E1548P supportMaxim Kochetkov
Add support for this new phy ID. Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: phy: marvell: Add Marvell 88E1340S supportMaxim Kochetkov
Add support for this new phy ID. Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: phy: marvell: use a single style for referencing functionsMaxim Kochetkov
The kernel in general does not use &func referencing format. Signed-off-by: Maxim Kochetkov <fido_max@inbox.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22Merge branch 'r8169-mark-device-as-detached-in-PCI-D3-and-improve-locking'David S. Miller
Heiner Kallweit says: ==================== r8169: mark device as detached in PCI D3 and improve locking Mark the netdevice as detached whenever parent is in PCI D3hot and not accessible. This mainly applies to runtime-suspend state. In addition take RTNL lock in suspend calls, this allows to remove the driver-specific mutex and improve PM callbacks in general. ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22r8169: improve rtl8169_runtime_resumeHeiner Kallweit
Simplify rtl8169_runtime_resume() by calling rtl8169_resume(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22r8169: remove driver-specific mutexHeiner Kallweit
Now that the critical sections are protected with RTNL lock, we don't need a separate mutex any longer. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22r8169: use RTNL to protect critical sectionsHeiner Kallweit
Most relevant ops (open, close, ethtool ops) are protected with RTNL lock by net core. Make sure that such ops can't be interrupted by e.g. (runtime-)suspending by taking the RTNL lock in suspend ops and the PCI error handler. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22r8169: add rtl8169_upHeiner Kallweit
Factor out bringing device up to a new function rtl8169_up(), similar to rtl8169_down() for bringing the device down. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22r8169: remove no longer needed checks for device being runtime-activeHeiner Kallweit
Because the netdevice is marked as detached now when parent is not accessible we can remove quite some checks. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22r8169: mark device as not present when in PCI D3Heiner Kallweit
Mark the netdevice as detached whenever we go into PCI D3hot. This allows to remove some checks e.g. from ethtool ops because dev_ethtool() checks for netif_device_present() in the beginning. In this context move waking up the queue out of rtl_reset_work() because in cases where netif_device_attach() is called afterwards the queue should be woken up by the latter function only. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: core: try to runtime-resume detached device in __dev_openHeiner Kallweit
A netdevice may be marked as detached because the parent is runtime-suspended and not accessible whilst interface or link is down. An example are PCI network devices that go into PCI D3hot, see e.g. __igc_shutdown() or rtl8169_net_suspend(). If netdevice is down and marked as detached we can only open it if we runtime-resume it before __dev_open() calls netif_device_present(). Therefore, if netdevice is detached, try to runtime-resume the parent and only return with an error if it's still detached. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22Merge branch 'prepare-dwmac-meson8b-for-G12A-specific-initialization'David S. Miller
Martin Blumenstingl says: ==================== prepare dwmac-meson8b for G12A specific initialization Some users are reporting that RGMII (and sometimes also RMII) Ethernet is not working for them on G12A/G12B/SM1 boards. Upon closer inspection of the vendor code for these SoCs new register bits are found. It's not clear yet how these registers work. Add a new compatible string as the first preparation step to improve Ethernet support on these SoCs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: stmmac: dwmac-meson8b: add a compatible string for G12A SoCsMartin Blumenstingl
Amlogic Meson G12A, G12B and SM1 have the same (at least as far as we know at the time of writing) PRG_ETHERNET glue register implementation. This implementation however is slightly different from AXG as it now has an undocument "auto cali idx val" register in PRG_ETH1[17:16] which seems to be related to RGMII Ethernet. Add a new compatible string for G12A SoCs so the logic for this new register can be implemented in the future. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22dt-bindings: net: dwmac-meson: Add a compatible string for G12A onwardsMartin Blumenstingl
Amlogic Meson G12A, G12B and SM1 have the same (at least as far as we know at the time of writing) PRG_ETHERNET glue register implementation. This implementation however is slightly different from AXG as it now has an undocument "auto cali idx val" register in PRG_ETH1[17:16] which seems to be related to RGMII Ethernet. Add a compatible string for G12A and newer so the new registers can be used. Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22Merge branch 'devlink-Add-board-serial_number-field-to-info_get-cb'David S. Miller
Vasundhara Volam says: ==================== devlink: Add board.serial_number field to info_get cb. This patchset adds support for board.serial_number to devlink info_get cb and also use it in bnxt_en driver. Sample output: $ devlink dev info pci/0000:af:00.1 pci/0000:af:00.1: driver bnxt_en serial_number 00-10-18-FF-FE-AD-1A-00 board.serial_number 433551F+172300000 versions: fixed: board.id 7339763 Rev 0. asic.id 16D7 asic.rev 1 running: fw 216.1.216.0 fw.psid 0.0.0 fw.mgmt 216.1.192.0 fw.mgmt.api 1.10.1 fw.ncsi 0.0.0.0 fw.roce 216.1.16.0 v2: - Modify board_serial_number to board.serial_number for maintaining consistency. - Combine 2 lines in second patchset as column limit is 100 now ==================== Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22bnxt_en: Add board.serial_number field to info_get cbVasundhara Volam
Add board.serial_number field info to info_get cb via devlink, if driver can fetch the information from the device. Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22devlink: Add support for board.serial_number to info_get cb.Vasundhara Volam
Board serial number is a serial number, often available in PCI *Vital Product Data*. Also, update devlink-info.rst documentation file. Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: phy: smsc: fix printing too many logsDejin Zheng
Commit 7ae7ad2f11ef47 ("net: phy: smsc: use phy_read_poll_timeout() to simplify the code") will print a lot of logs as follows when Ethernet cable is not connected: [ 4.473105] SMSC LAN8710/LAN8720 2188000.ethernet-1:00: lan87xx_read_status failed: -110 When wait 640 ms for check ENERGYON bit, the timeout should not be regarded as an actual error and an error message also should not be printed. due to a hardware bug in LAN87XX device, it leads to unstable detection of plugging in Ethernet cable when LAN87xx is in Energy Detect Power-Down mode. the workaround for it involves, when the link is down, and at each read_status() call: - disable EDPD mode, forcing the PHY out of low-power mode - waiting 640ms to see if we have any energy detected from the media - re-enable entry to EDPD mode This is presumably enough to allow the PHY to notice that a cable is connected, and resume normal operations to negotiate with the partner. The problem is that when no media is detected, the 640ms wait times out and this commit was modified to prints an error message. it is an inappropriate conversion by used phy_read_poll_timeout() to introduce this bug. so fix this issue by use read_poll_timeout() to replace phy_read_poll_timeout(). Fixes: 7ae7ad2f11ef47 ("net: phy: smsc: use phy_read_poll_timeout() to simplify the code") Reported-by: Kevin Groeneveld <kgroeneveld@gmail.com> Signed-off-by: Dejin Zheng <zhengdejin5@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22Merge branch 'Cosmetic-cleanup-in-SJA1105-DSA-driver'David S. Miller
Vladimir Oltean says: ==================== Cosmetic cleanup in SJA1105 DSA driver This removes the sparse warnings from the sja1105 driver and makes some structures constant. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: dsa: sja1105: make the instantiations of struct sja1105_info constantVladimir Oltean
Since struct sja1105_private only holds a const pointer to one of these structures based on device tree compatible string, the structures themselves can be made const. Also add an empty line between each structure definition, to appease checkpatch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: dsa: sja1105: make config table operation structures constantVladimir Oltean
The per-chip instantiations of struct sja1105_table_ops and struct sja1105_dynamic_table_ops can be made constant, so do that. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: dsa: sja1105: remove empty structures from config table opsVladimir Oltean
Sparse is complaining and giving the following warning message: 'Using plain integer as NULL pointer'. This is not what's going on, instead {0} is used as a zero initializer for the structure members, to indicate that the particular chip revision does not support those particular config tables. But since the config tables are declared globally, the unpopulated elements are zero-initialized anyway. So, to make sparse shut up, let's remove the zero initializers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22Merge branch 'net-dsa-qca8k-Improve-SGMII-interface-handling'David S. Miller
Jonathan McDowell says: ==================== net: dsa: qca8k: Improve SGMII interface handling This 3 patch series migrates the qca8k switch driver over to PHYLINK, and then adds the SGMII clean-ups (i.e. the missing initialisation) on top of that as a second patch. The final patch is a simple spelling fix in a comment. As before, tested with a device where the CPU connection is RGMII (i.e. the common current use case) + one where the CPU connection is SGMII. I don't have any devices where the SGMII interface is brought out to something other than the CPU. v5: - Move spelling fix to separate patch - Use ds directly rather than ds->priv v4: - Enable pcs_poll so we keep phylink updated when doing in-band negotiation - Explicitly check for PHY_INTERFACE_MODE_1000BASEX when setting SGMII port mode. - Address Vladimir's review comments v3: - Move phylink changes to separate patch - Address rmk review comments v2: - Switch to phylink - Avoid need for device tree configuration options ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: dsa: qca8k: Minor comment spelling fixJonathan McDowell
Signed-off-by: Jonathan McDowell <noodles@earth.li> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: dsa: qca8k: Improve SGMII interface handlingJonathan McDowell
This patch improves the handling of the SGMII interface on the QCA8K devices. Previously the driver did no configuration of the port, even if it was selected. We now configure it up in the appropriate PHY/MAC/Base-X mode depending on what phylink tells us we are connected to and ensure it is enabled. Tested with a device where the CPU connection is RGMII (i.e. the common current use case) + one where the CPU connection is SGMII. I don't have any devices where the SGMII interface is brought out to something other than the CPU. Signed-off-by: Jonathan McDowell <noodles@earth.li> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net: dsa: qca8k: Switch to PHYLINK instead of PHYLIBJonathan McDowell
Update the driver to use the new PHYLINK callbacks, removing the legacy adjust_link callback. Signed-off-by: Jonathan McDowell <noodles@earth.li> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22Merge branch 'bonding-initial-support-for-hardware-crypto-offload'David S. Miller
Jarod Wilson says: ==================== bonding: initial support for hardware crypto offload This is an initial functional implementation for doing pass-through of hardware encryption from bonding device to capable slaves, in active-backup bond setups. This was developed and tested using ixgbe-driven Intel x520 interfaces with libreswan and a transport mode connection, primarily using netperf, with assorted connection failures forced during transmission. The failover works quite well in my testing, and overall performance is right on par with offload when running on a bare interface, no bond involved. Caveats: this is ONLY enabled for active-backup, because I'm not sure how one would manage multiple offload handles for different devices all running at the same time in the same xfrm, and it relies on some minor changes to both the xfrm code and slave device driver code to get things to behave, and I don't have immediate access to any other hardware that could function similarly, but the NIC driver changes are minimal and straight-forward enough that I've included what I think ought to be enough for mlx5 devices too. v2: reordered patches, switched (back) to using CONFIG_XFRM_OFFLOAD to wrap the code additions and wrapped overlooked additions. v3: rebase w/net-next open, add proper cc list to cover letter ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22bonding: support hardware encryption offload to slavesJarod Wilson
Currently, this support is limited to active-backup mode, as I'm not sure about the feasilibity of mapping an xfrm_state's offload handle to multiple hardware devices simultaneously, and we rely on being able to pass some hints to both the xfrm and NIC driver about whether or not they're operating on a slave device. I've tested this atop an Intel x520 device (ixgbe) using libreswan in transport mode, succesfully achieving ~4.3Gbps throughput with netperf (more or less identical to throughput on a bare NIC in this system), as well as successful failover and recovery mid-netperf. v2: just use CONFIG_XFRM_OFFLOAD for wrapping, isolate more code with it CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22mlx5: become aware of when running as a bonding slaveJarod Wilson
I've been unable to get my hands on suitable supported hardware to date, but I believe this ought to be all that is needed to enable the mlx5 driver to also work with bonding active-backup crypto offload passthru. CC: Boris Pismenny <borisp@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Leon Romanovsky <leon@kernel.org> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22ixgbe_ipsec: become aware of when running as a bonding slaveJarod Wilson
Slave devices in a bond doing hardware encryption also need to be aware that they're slaves, so we operate on the slave instead of the bonding master to do the actual hardware encryption offload bits. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Acked-by: Jeff Kirsher <Jeffrey.t.kirsher@intel.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22xfrm: bail early on slave pass over skbJarod Wilson
This is prep work for initial support of bonding hardware encryption pass-through support. The bonding driver will fill in the slave_dev pointer, and we use that to know not to skb_push() again on a given skb that was already processed on the bond device. CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org CC: intel-wired-lan@lists.osuosl.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22Merge branch 'devlink-Support-get-set-mac-address-of-a-port-function'David S. Miller
Parav Pandit says: ==================== devlink: Support get,set mac address of a port function Currently, ip link set dev <pfndev> vf <vf_num> <param> <value> has below few limitations. 1. Command is limited to set VF parameters only. It cannot set the default MAC address for the PCI PF. 2. It can be set only on system where PCI SR-IOV capability exists. In smartnic based system, eswitch of a NIC resides on a different embedded cpu which has the VF and PF representors for the SR-IOV functions of a host system in which this smartnic is plugged-in. 3. It cannot setup the function attributes of sub-function described in detail in comprehensive RFC [1] and [2]. This series covers the first small part to let user query and set MAC address (hardware address) of a PCI PF/VF which is represented by devlink port pcipf, pcivf port flavours respectively. Whenever a devlink port manages a function connected to a devlink port, it allows to query and set its hardware address. Driver implements necessary get/set callback functions if it supports port function for a given port type. Patch summary: Patch-1 Prepares devlink port fill routines for extack Patch-2 and 3 extended devlink interface to get/set port function attributes, mainly hardware address to start with. Patch-2 Extended port dump command to query port function hardware address Patch-3 Introduces a command to set the hardware address of a port function Patch-4 to 9 refactors and implement devlink callbacks in mlx5_core driver. Patch-4 Constify the mac address pointer in set routines Patch-5 Introduces eswich check helper to use in devlink facing callbacks Patch-6 Moves port index, port number conversion routine to eswitch header file Patch-7 Implements port function query devlink callback Patch-8 Refactors mac address setting routine to uniformly use state_lock Patch-9 Implements port function set devlink callback [1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/ [2] https://marc.info/?l=linux-netdev&m=158555928517777&w=2 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net/mlx5: E-switch, Supporting setting devlink port function mac addressParav Pandit
Enable user to set mac address of the PCI PF and VF port function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net/mlx5: Split mac address setting function for using state_lockParav Pandit
Refactor mac address setting function to let caller hold the necessary state_lock mutex, so that subsequent patch and use this helper routine. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net/mlx5: E-switch, Support querying port function mac addressParav Pandit
Support querying mac address of the eswitch devlink port function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net/mlx5: Move helper to eswitch layerParav Pandit
To use port number to port index conversion at eswitch level, move it to eswitch header. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net/mlx5: E-switch, Introduce and use eswitch support check helperParav Pandit
Introduce an helper routine to get esw from a devlink device and use it at eswitch callbacks and in subsequent patch. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net/mlx5: Constify mac address pointerParav Pandit
Since none of the functions need to modify the input mac address, constify them. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net/devlink: Support setting hardware address of port functionParav Pandit
PCI PF and VF devlink port can manage the function represented by a devlink port. Allow users to set port function's hardware address. Example of a PCI VF port which supports a port function: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 $ devlink port function set pci/0000:06:00.0/2 hw_addr 00:11:22:33:44:55 $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:11:22:33:44:55 Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net/devlink: Support querying hardware address of port functionParav Pandit
PCI PF and VF devlink port can manage the function represented by a devlink port. Enable users to query port function's hardware address. Example of a PCI VF port which supports a port function: $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:11:22:33:44:66 $ devlink port show pci/0000:06:00.0/2 -jp { "port": { "pci/0000:06:00.0/2": { "type": "eth", "netdev": "enp6s0pf0vf1", "flavour": "pcivf", "pfnum": 0, "vfnum": 1, "function": { "hw_addr": "00:11:22:33:44:66" } } } } Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22net/devlink: Prepare devlink port functions to fill extackParav Pandit
Prepare devlink port related functions to optionally fill up the extack information which will be used in subsequent patch by port function attribute(s). Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-22KVM: nVMX: Plumb L2 GPA through to PML emulationSean Christopherson
Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all intents and purposes is vmx_write_pml_buffer(), instead of having the latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS. If the dirty bit update is the result of KVM emulation (rare for L2), then the GPA in the VMCS may be stale and/or hold a completely unrelated GPA. Fixes: c5f983f6e8455 ("nVMX: Implement emulated Page Modification Logging") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-06-23libbpf: Add a bunch of attribute getters/setters for map definitionsAndrii Nakryiko
Add a bunch of getter for various aspects of BPF map. Some of these attribute (e.g., key_size, value_size, type, etc) are available right now in struct bpf_map_def, but this patch adds getter allowing to fetch them individually. bpf_map_def approach isn't very scalable, when ABI stability requirements are taken into account. It's much easier to extend libbpf and add support for new features, when each aspect of BPF map has separate getter/setter. Getters follow the common naming convention of not explicitly having "get" in its name: bpf_map__type() returns map type, bpf_map__key_size() returns key_size. Setters, though, explicitly have set in their name: bpf_map__set_type(), bpf_map__set_key_size(). This patch ensures we now have a getter and a setter for the following map attributes: - type; - max_entries; - map_flags; - numa_node; - key_size; - value_size; - ifindex. bpf_map__resize() enforces unnecessary restriction of max_entries > 0. It is unnecessary, because libbpf actually supports zero max_entries for some cases (e.g., for PERF_EVENT_ARRAY map) and treats it specially during map creation time. To allow setting max_entries=0, new bpf_map__set_max_entries() setter is added. bpf_map__resize()'s behavior is preserved for backwards compatibility reasons. Map ifindex getter is added as well. There is a setter already, but no corresponding getter. Fix this assymetry as well. bpf_map__set_ifindex() itself is converted from void function into error-returning one, similar to other setters. The only error returned right now is -EBUSY, if BPF map is already loaded and has corresponding FD. One lacking attribute with no ability to get/set or even specify it declaratively is numa_node. This patch fixes this gap and both adds programmatic getter/setter, as well as adds support for numa_node field in BTF-defined map. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20200621062112.3006313-1-andriin@fb.com
2020-06-22selftests/bpf: Test access to bpf map pointerAndrey Ignatov
Add selftests to test access to map pointers from bpf program for all map types except struct_ops (that one would need additional work). verifier test focuses mostly on scenarios that must be rejected. prog_tests test focuses on accessing multiple fields both scalar and a nested struct from bpf program and verifies that those fields have expected values. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/139a6a17f8016491e39347849b951525335c6eb4.1592600985.git.rdna@fb.com
2020-06-22bpf: Set map_btf_{name, id} for all map typesAndrey Ignatov
Set map_btf_name and map_btf_id for all map types so that map fields can be accessed by bpf programs. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/a825f808f22af52b018dbe82f1c7d29dab5fc978.1592600985.git.rdna@fb.com
2020-06-22bpf: Support access to bpf map fieldsAndrey Ignatov
There are multiple use-cases when it's convenient to have access to bpf map fields, both `struct bpf_map` and map type specific struct-s such as `struct bpf_array`, `struct bpf_htab`, etc. For example while working with sock arrays it can be necessary to calculate the key based on map->max_entries (some_hash % max_entries). Currently this is solved by communicating max_entries via "out-of-band" channel, e.g. via additional map with known key to get info about target map. That works, but is not very convenient and error-prone while working with many maps. In other cases necessary data is dynamic (i.e. unknown at loading time) and it's impossible to get it at all. For example while working with a hash table it can be convenient to know how much capacity is already used (bpf_htab.count.counter for BPF_F_NO_PREALLOC case). At the same time kernel knows this info and can provide it to bpf program. Fill this gap by adding support to access bpf map fields from bpf program for both `struct bpf_map` and map type specific fields. Support is implemented via btf_struct_access() so that a user can define their own `struct bpf_map` or map type specific struct in their program with only necessary fields and preserve_access_index attribute, cast a map to this struct and use a field. For example: struct bpf_map { __u32 max_entries; } __attribute__((preserve_access_index)); struct bpf_array { struct bpf_map map; __u32 elem_size; } __attribute__((preserve_access_index)); struct { __uint(type, BPF_MAP_TYPE_ARRAY); __uint(max_entries, 4); __type(key, __u32); __type(value, __u32); } m_array SEC(".maps"); SEC("cgroup_skb/egress") int cg_skb(void *ctx) { struct bpf_array *array = (struct bpf_array *)&m_array; struct bpf_map *map = (struct bpf_map *)&m_array; /* .. use map->max_entries or array->map.max_entries .. */ } Similarly to other btf_struct_access() use-cases (e.g. struct tcp_sock in net/ipv4/bpf_tcp_ca.c) the patch allows access to any fields of corresponding struct. Only reading from map fields is supported. For btf_struct_access() to work there should be a way to know btf id of a struct that corresponds to a map type. To get btf id there should be a way to get a stringified name of map-specific struct, such as "bpf_array", "bpf_htab", etc for a map type. Two new fields are added to `struct bpf_map_ops` to handle it: * .map_btf_name keeps a btf name of a struct returned by map_alloc(); * .map_btf_id is used to cache btf id of that struct. To make btf ids calculation cheaper they're calculated once while preparing btf_vmlinux and cached same way as it's done for btf_id field of `struct bpf_func_proto` While calculating btf ids, struct names are NOT checked for collision. Collisions will be checked as a part of the work to prepare btf ids used in verifier in compile time that should land soon. The only known collision for `struct bpf_htab` (kernel/bpf/hashtab.c vs net/core/sock_map.c) was fixed earlier. Both new fields .map_btf_name and .map_btf_id must be set for a map type for the feature to work. If neither is set for a map type, verifier will return ENOTSUPP on a try to access map_ptr of corresponding type. If just one of them set, it's verifier misconfiguration. Only `struct bpf_array` for BPF_MAP_TYPE_ARRAY and `struct bpf_htab` for BPF_MAP_TYPE_HASH are supported by this patch. Other map types will be supported separately. The feature is available only for CONFIG_DEBUG_INFO_BTF=y and gated by perfmon_capable() so that unpriv programs won't have access to bpf map fields. Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/6479686a0cd1e9067993df57b4c3eef0e276fec9.1592600985.git.rdna@fb.com