summaryrefslogtreecommitdiff
path: root/net/ethtool
AgeCommit message (Collapse)Author
2021-12-06ethtool: do not perform operations on net devices being unregisteredAntoine Tenart
There is a short period between a net device starts to be unregistered and when it is actually gone. In that time frame ethtool operations could still be performed, which might end up in unwanted or undefined behaviours[1]. Do not allow ethtool operations after a net device starts its unregistration. This patch targets the netlink part as the ioctl one isn't affected: the reference to the net device is taken and the operation is executed within an rtnl lock section and the net device won't be found after unregister. [1] For example adding Tx queues after unregister ends up in NULL pointer exceptions and UaFs, such as: BUG: KASAN: use-after-free in kobject_get+0x14/0x90 Read of size 1 at addr ffff88801961248c by task ethtool/755 CPU: 0 PID: 755 Comm: ethtool Not tainted 5.15.0-rc6+ #778 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-4.fc34 04/014 Call Trace: dump_stack_lvl+0x57/0x72 print_address_description.constprop.0+0x1f/0x140 kasan_report.cold+0x7f/0x11b kobject_get+0x14/0x90 kobject_add_internal+0x3d1/0x450 kobject_init_and_add+0xba/0xf0 netdev_queue_update_kobjects+0xcf/0x200 netif_set_real_num_tx_queues+0xb4/0x310 veth_set_channels+0x1c3/0x550 ethnl_set_channels+0x524/0x610 Fixes: 041b1c5d4a53 ("ethtool: helper functions for netlink interface") Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Antoine Tenart <atenart@kernel.org> Link: https://lore.kernel.org/r/20211203101318.435618-1-atenart@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26ethtool: ioctl: fix potential NULL deref in ethtool_set_coalesce()Julian Wiedmann
ethtool_set_coalesce() now uses both the .get_coalesce() and .set_coalesce() callbacks. But the check for their availability is buggy, so changing the coalesce settings on a device where the driver provides only _one_ of the callbacks results in a NULL pointer dereference instead of an -EOPNOTSUPP. Fix the condition so that the availability of both callbacks is ensured. This also matches the netlink code. Note that reproducing this requires some effort - it only affects the legacy ioctl path, and needs a specific combination of driver options: - have .get_coalesce() and .coalesce_supported but no .set_coalesce(), or - have .set_coalesce() but no .get_coalesce(). Here eg. ethtool doesn't cause the crash as it first attempts to call ethtool_get_coalesce() and bails out on error. Fixes: f3ccfda19319 ("ethtool: extend coalesce setting uAPI with CQE mode") Cc: Yufeng Mo <moyufeng@huawei.com> Cc: Huazhong Tan <tanhuazhong@huawei.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Link: https://lore.kernel.org/r/20211126175543.28000-1-jwi@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-03ethtool: fix ethtool msg len calculation for pause statsJakub Kicinski
ETHTOOL_A_PAUSE_STAT_MAX is the MAX attribute id, so we need to subtract non-stats and add one to get a count (IOW -2+1 == -1). Otherwise we'll see: ethnl cmd 21: calculated reply length 40, but consumed 52 Fixes: 9a27a33027f2 ("ethtool: add standard pause stats") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01ethtool: don't drop the rtnl_lock half way thru the ioctlJakub Kicinski
devlink compat code needs to drop rtnl_lock to take devlink->lock to ensure correct lock ordering. This is problematic because we're not strictly guaranteed that the netdev will not disappear after we re-lock. It may open a possibility of nested ->begin / ->complete calls. Instead of calling into devlink under rtnl_lock take a ref on the devlink instance and make the call after we've dropped rtnl_lock. We (continue to) assume that netdevs have an implicit reference on the devlink returned from ndo_get_devlink_port Note that ndo_get_devlink_port will now get called under rtnl_lock. That should be fine since none of the drivers seem to be taking serious locks inside ndo_get_devlink_port. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01ethtool: handle info/flash data copying outside rtnl_lockJakub Kicinski
We need to increase the lifetime of the data for .get_info and .flash_update beyond their handlers inside rtnl_lock. Allocate a union on the heap and use it instead. Note that we now copy the ethcmd before we lookup dev, hopefully there is no crazy user space depending on error codes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01ethtool: push the rtnl_lock into dev_ethtool()Jakub Kicinski
Don't take the lock in net/core/dev_ioctl.c, we'll have things to do outside rtnl_lock soon. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-24net: convert users of bitmap_foo() to linkmode_foo()Sean Anderson
This converts instances of bitmap_foo(args..., __ETHTOOL_LINK_MODE_MASK_NBITS) to linkmode_foo(args...) I manually fixed up some lines to prevent them from being excessively long. Otherwise, this change was generated with the following semantic patch: // Generated with // echo linux/linkmode.h > includes // git grep -Flf includes include/ | cut -f 2- -d / | cat includes - \ // | sort | uniq | tee new_includes | wc -l && mv new_includes includes // and repeating until the number stopped going up @i@ @@ ( #include <linux/acpi_mdio.h> | #include <linux/brcmphy.h> | #include <linux/dsa/loop.h> | #include <linux/dsa/sja1105.h> | #include <linux/ethtool.h> | #include <linux/ethtool_netlink.h> | #include <linux/fec.h> | #include <linux/fs_enet_pd.h> | #include <linux/fsl/enetc_mdio.h> | #include <linux/fwnode_mdio.h> | #include <linux/linkmode.h> | #include <linux/lsm_audit.h> | #include <linux/mdio-bitbang.h> | #include <linux/mdio.h> | #include <linux/mdio-mux.h> | #include <linux/mii.h> | #include <linux/mii_timestamper.h> | #include <linux/mlx5/accel.h> | #include <linux/mlx5/cq.h> | #include <linux/mlx5/device.h> | #include <linux/mlx5/driver.h> | #include <linux/mlx5/eswitch.h> | #include <linux/mlx5/fs.h> | #include <linux/mlx5/port.h> | #include <linux/mlx5/qp.h> | #include <linux/mlx5/rsc_dump.h> | #include <linux/mlx5/transobj.h> | #include <linux/mlx5/vport.h> | #include <linux/of_mdio.h> | #include <linux/of_net.h> | #include <linux/pcs-lynx.h> | #include <linux/pcs/pcs-xpcs.h> | #include <linux/phy.h> | #include <linux/phy_led_triggers.h> | #include <linux/phylink.h> | #include <linux/platform_data/bcmgenet.h> | #include <linux/platform_data/xilinx-ll-temac.h> | #include <linux/pxa168_eth.h> | #include <linux/qed/qed_eth_if.h> | #include <linux/qed/qed_fcoe_if.h> | #include <linux/qed/qed_if.h> | #include <linux/qed/qed_iov_if.h> | #include <linux/qed/qed_iscsi_if.h> | #include <linux/qed/qed_ll2_if.h> | #include <linux/qed/qed_nvmetcp_if.h> | #include <linux/qed/qed_rdma_if.h> | #include <linux/sfp.h> | #include <linux/sh_eth.h> | #include <linux/smsc911x.h> | #include <linux/soc/nxp/lpc32xx-misc.h> | #include <linux/stmmac.h> | #include <linux/sunrpc/svc_rdma.h> | #include <linux/sxgbe_platform.h> | #include <net/cfg80211.h> | #include <net/dsa.h> | #include <net/mac80211.h> | #include <net/selftests.h> | #include <rdma/ib_addr.h> | #include <rdma/ib_cache.h> | #include <rdma/ib_cm.h> | #include <rdma/ib_hdrs.h> | #include <rdma/ib_mad.h> | #include <rdma/ib_marshall.h> | #include <rdma/ib_pack.h> | #include <rdma/ib_pma.h> | #include <rdma/ib_sa.h> | #include <rdma/ib_smi.h> | #include <rdma/ib_umem.h> | #include <rdma/ib_umem_odp.h> | #include <rdma/ib_verbs.h> | #include <rdma/iw_cm.h> | #include <rdma/mr_pool.h> | #include <rdma/opa_addr.h> | #include <rdma/opa_port_info.h> | #include <rdma/opa_smi.h> | #include <rdma/opa_vnic.h> | #include <rdma/rdma_cm.h> | #include <rdma/rdma_cm_ib.h> | #include <rdma/rdmavt_cq.h> | #include <rdma/rdma_vt.h> | #include <rdma/rdmavt_qp.h> | #include <rdma/rw.h> | #include <rdma/tid_rdma_defs.h> | #include <rdma/uverbs_ioctl.h> | #include <rdma/uverbs_named_ioctl.h> | #include <rdma/uverbs_std_types.h> | #include <rdma/uverbs_types.h> | #include <soc/mscc/ocelot.h> | #include <soc/mscc/ocelot_ptp.h> | #include <soc/mscc/ocelot_vcap.h> | #include <trace/events/ib_mad.h> | #include <trace/events/rdma_core.h> | #include <trace/events/rdma.h> | #include <trace/events/rpcrdma.h> | #include <uapi/linux/ethtool.h> | #include <uapi/linux/ethtool_netlink.h> | #include <uapi/linux/mdio.h> | #include <uapi/linux/mii.h> ) @depends on i@ expression list args; @@ ( - bitmap_zero(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_zero(args) | - bitmap_copy(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_copy(args) | - bitmap_and(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_and(args) | - bitmap_or(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_or(args) | - bitmap_empty(args, ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_empty(args) | - bitmap_andnot(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_andnot(args) | - bitmap_equal(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_equal(args) | - bitmap_intersects(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_intersects(args) | - bitmap_subset(args, __ETHTOOL_LINK_MODE_MASK_NBITS) + linkmode_subset(args) ) Add missing linux/mii.h include to mellanox. -DaveM Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06ethtool: Add ability to control transceiver modules' power modeIdo Schimmel
Add a pair of new ethtool messages, 'ETHTOOL_MSG_MODULE_SET' and 'ETHTOOL_MSG_MODULE_GET', that can be used to control transceiver modules parameters and retrieve their status. The first parameter to control is the power mode of the module. It is only relevant for paged memory modules, as flat memory modules always operate in low power mode. When a paged memory module is in low power mode, its power consumption is reduced to the minimum, the management interface towards the host is available and the data path is deactivated. User space can choose to put modules that are not currently in use in low power mode and transition them to high power mode before putting the associated ports administratively up. This is useful for user space that favors reduced power consumption and lower temperatures over reduced link up times. In QSFP-DD modules the transition from low power mode to high power mode can take a few seconds and this transition is only expected to get longer with future / more complex modules. User space can control the power mode of the module via the power mode policy attribute ('ETHTOOL_A_MODULE_POWER_MODE_POLICY'). Possible values: * high: Module is always in high power mode. * auto: Module is transitioned by the host to high power mode when the first port using it is put administratively up and to low power mode when the last port using it is put administratively down. The operational power mode of the module is available to user space via the 'ETHTOOL_A_MODULE_POWER_MODE' attribute. The attribute is not reported to user space when a module is not plugged-in. The user API is designed to be generic enough so that it could be used for modules with different memory maps (e.g., SFF-8636, CMIS). The only implementation of the device driver API in this series is for a MAC driver (mlxsw) where the module is controlled by the device's firmware, but it is designed to be generic enough so that it could also be used by implementations where the module is controlled by the CPU. CMIS testing ============ # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x03 (ModuleReady) LowPwrAllowRequestHW : Off LowPwrRequestSW : Off The module is not in low power mode, as it is not forced by hardware (LowPwrAllowRequestHW is off) or by software (LowPwrRequestSW is off). The power mode can be queried from the kernel. In case LowPwrAllowRequestHW was on, the kernel would need to take into account the state of the LowPwrRequestHW signal, which is not visible to user space. $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy high power-mode high Change the power mode policy to 'auto': # ethtool --set-module swp11 power-mode-policy auto Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x01 (ModuleLowPwr) LowPwrAllowRequestHW : Off LowPwrRequestSW : On Put the associated port administratively up which will instruct the host to transition the module to high power mode: # ip link set dev swp11 up Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode high Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x03 (ModuleReady) LowPwrAllowRequestHW : Off LowPwrRequestSW : Off Put the associated port administratively down which will instruct the host to transition the module to low power mode: # ip link set dev swp11 down Query the power mode again: $ ethtool --show-module swp11 Module parameters for swp11: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp11 Identifier : 0x18 (QSFP-DD Double Density 8X Pluggable Transceiver (INF-8628)) ... Module State : 0x01 (ModuleLowPwr) LowPwrAllowRequestHW : Off LowPwrRequestSW : On SFF-8636 testing ================ # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled Power set : Off Power override : On ... Transmit avg optical power (Channel 1) : 0.7733 mW / -1.12 dBm Transmit avg optical power (Channel 2) : 0.7649 mW / -1.16 dBm Transmit avg optical power (Channel 3) : 0.7790 mW / -1.08 dBm Transmit avg optical power (Channel 4) : 0.7837 mW / -1.06 dBm Rcvr signal avg optical power(Channel 1) : 0.9302 mW / -0.31 dBm Rcvr signal avg optical power(Channel 2) : 0.9079 mW / -0.42 dBm Rcvr signal avg optical power(Channel 3) : 0.8993 mW / -0.46 dBm Rcvr signal avg optical power(Channel 4) : 0.8778 mW / -0.57 dBm The module is not in low power mode, as it is not forced by hardware (Power override is on) or by software (Power set is off). The power mode can be queried from the kernel. In case Power override was off, the kernel would need to take into account the state of the LPMode signal, which is not visible to user space. $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy high power-mode high Change the power mode policy to 'auto': # ethtool --set-module swp13 power-mode-policy auto Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled Power set : On Power override : On ... Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm Put the associated port administratively up which will instruct the host to transition the module to high power mode: # ip link set dev swp13 up Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode high Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) enabled Power set : Off Power override : On ... Transmit avg optical power (Channel 1) : 0.7934 mW / -1.01 dBm Transmit avg optical power (Channel 2) : 0.7859 mW / -1.05 dBm Transmit avg optical power (Channel 3) : 0.7885 mW / -1.03 dBm Transmit avg optical power (Channel 4) : 0.7985 mW / -0.98 dBm Rcvr signal avg optical power(Channel 1) : 0.9325 mW / -0.30 dBm Rcvr signal avg optical power(Channel 2) : 0.9034 mW / -0.44 dBm Rcvr signal avg optical power(Channel 3) : 0.9086 mW / -0.42 dBm Rcvr signal avg optical power(Channel 4) : 0.8885 mW / -0.51 dBm Put the associated port administratively down which will instruct the host to transition the module to low power mode: # ip link set dev swp13 down Query the power mode again: $ ethtool --show-module swp13 Module parameters for swp13: power-mode-policy auto power-mode low Verify with the data read from the EEPROM: # ethtool -m swp13 Identifier : 0x11 (QSFP28) ... Extended identifier description : 5.0W max. Power consumption, High Power Class (> 3.5 W) not enabled Power set : On Power override : On ... Transmit avg optical power (Channel 1) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 2) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 3) : 0.0000 mW / -inf dBm Transmit avg optical power (Channel 4) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 1) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 2) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 3) : 0.0000 mW / -inf dBm Rcvr signal avg optical power(Channel 4) : 0.0000 mW / -inf dBm Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-29ethtool: ioctl: Use array_size() helper in copy_{from,to}_user()Gustavo A. R. Silva
Use array_size() helper instead of the open-coded version in copy_{from,to}_user(). These sorts of multiplication factors need to be wrapped in array_size(). Link: https://github.com/KSPP/linux/issues/160 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-14ethtool: prevent endless loop if eeprom size is smaller than announcedHeiner Kallweit
It shouldn't happen, but can happen that readable eeprom size is smaller than announced. Then we would be stuck in an endless loop here because after reaching the actual end reads return eeprom.len = 0. I faced this issue when making a mistake in driver development. Detect this scenario and return an error. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-24ethtool: extend coalesce setting uAPI with CQE modeYufeng Mo
In order to support more coalesce parameters through netlink, add two new parameter kernel_coal and extack for .set_coalesce and .get_coalesce, then some extra info can return to user with the netlink API. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24ethtool: add two coalesce attributes for CQE modeYufeng Mo
Currently, there are many drivers who support CQE mode configuration, some configure it as a fixed when initialized, some provide an interface to change it by ethtool private flags. In order to make it more generic, add two new 'ETHTOOL_A_COALESCE_USE_CQE_TX' and 'ETHTOOL_A_COALESCE_USE_CQE_RX' coalesce attributes, then these parameters can be accessed by ethtool netlink coalesce uAPI. Also add an new structure kernel_ethtool_coalesce, then the new parameter can be added into this struct. Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-06ethtool: return error from ethnl_ops_begin if dev is NULLHeiner Kallweit
Julian reported that after d43c65b05b84 Coverity complains about a missing check whether dev is NULL in ethnl_ops_complete(). There doesn't seem to be any valid case where dev could be NULL when calling ethnl_ops_begin(), therefore return an error if dev is NULL. Fixes: d43c65b05b84 ("ethtool: runtime-resume netdev parent in ethnl_ops_begin") Reported-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-05net: Remove redundant if statementsYajun Deng
The 'if (dev)' statement already move into dev_{put , hold}, so remove redundant if statements. Signed-off-by: Yajun Deng <yajun.deng@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03ethtool: runtime-resume netdev parent in ethnl_ops_beginHeiner Kallweit
If a network device is runtime-suspended then: - network device may be flagged as detached and all ethtool ops (even if not accessing the device) will fail because netif_device_present() returns false - ethtool ops may fail because device is not accessible (e.g. because being in D3 in case of a PCI device) It may not be desirable that userspace can't use even simple ethtool ops that not access the device if interface or link is down. To be more friendly to userspace let's ensure that device is runtime-resumed when executing the respective ethtool op in kernel. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03ethtool: move netif_device_present check from ethnl_parse_header_dev_get to ↵Heiner Kallweit
ethnl_ops_begin If device is runtime-suspended and not accessible then it may be flagged as not present. If checking whether device is present is done too early then we may bail out before we have the chance to runtime-resume the device. Therefore move this check to ethnl_ops_begin(). This is in preparation of a follow-up patch that tries to runtime-resume the device before executing ethtool ops. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03ethtool: move implementation of ethnl_ops_begin/complete to netlink.cHeiner Kallweit
In preparation of subsequent extensions to both functions move the implementations from netlink.h to netlink.c. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-03ethtool: runtime-resume netdev parent before ethtool ioctl opsHeiner Kallweit
If a network device is runtime-suspended then: - network device may be flagged as detached and all ethtool ops (even if not accessing the device) will fail because netif_device_present() returns false - ethtool ops may fail because device is not accessible (e.g. because being in D3 in case of a PCI device) It may not be desirable that userspace can't use even simple ethtool ops that not access the device if interface or link is down. To be more friendly to userspace let's ensure that device is runtime-resumed when executing the respective ethtool op in kernel. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27dev_ioctl: pass SIOCDEVPRIVATE data separatelyArnd Bergmann
The compat handlers for SIOCDEVPRIVATE are incorrect for any driver that passes data as part of struct ifreq rather than as an ifr_data pointer, or that passes data back this way, since the compat_ifr_data_ioctl() helper overwrites the ifr_data pointer and does not copy anything back out. Since all drivers using devprivate commands are now converted to the new .ndo_siocdevprivate callback, fix this by adding the missing piece and passing the pointer separately the whole way. This further unifies the native and compat logic for socket ioctls, as the new code now passes the correct pointer as well as the correct data for both native and compat ioctls. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-27ethtool: Fix rxnfc copy to user buffer overflowSaeed Mahameed
In the cited commit, copy_to_user() got called with the wrong pointer, instead of passing the actual buffer ptr to copy from, a pointer to the pointer got passed, which causes a buffer overflow calltrace to pop up when executing "ethtool -x ethX". Fix ethtool_rxnfc_copy_to_user() to use the rxnfc pointer as passed to the function, instead of a pointer to it. This fixes below call trace: [ 15.533533] ------------[ cut here ]------------ [ 15.539007] Buffer overflow detected (8 < 192)! [ 15.544110] WARNING: CPU: 3 PID: 1801 at include/linux/thread_info.h:200 copy_overflow+0x15/0x20 [ 15.549308] Modules linked in: [ 15.551449] CPU: 3 PID: 1801 Comm: ethtool Not tainted 5.14.0-rc2+ #1058 [ 15.553919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 15.558378] RIP: 0010:copy_overflow+0x15/0x20 [ 15.560648] Code: e9 7c ff ff ff b8 a1 ff ff ff eb c4 66 0f 1f 84 00 00 00 00 00 55 48 89 f2 89 fe 48 c7 c7 88 55 78 8a 48 89 e5 e8 06 5c 1e 00 <0f> 0b 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 [ 15.565114] RSP: 0018:ffffad49c0523bd0 EFLAGS: 00010286 [ 15.566231] RAX: 0000000000000000 RBX: 00000000000000c0 RCX: 0000000000000000 [ 15.567616] RDX: 0000000000000001 RSI: ffffffff8a7912e7 RDI: 00000000ffffffff [ 15.569050] RBP: ffffad49c0523bd0 R08: ffffffff8ab2ae28 R09: 00000000ffffdfff [ 15.570534] R10: ffffffff8aa4ae40 R11: ffffffff8aa4ae40 R12: 0000000000000000 [ 15.571899] R13: 00007ffd4cc2a230 R14: ffffad49c0523c00 R15: 0000000000000000 [ 15.573584] FS: 00007f538112f740(0000) GS:ffff96d5bdd80000(0000) knlGS:0000000000000000 [ 15.575639] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.577092] CR2: 00007f5381226d40 CR3: 0000000013542000 CR4: 00000000001506e0 [ 15.578929] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 15.580695] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 15.582441] Call Trace: [ 15.582970] ethtool_rxnfc_copy_to_user+0x30/0x46 [ 15.583815] ethtool_get_rxnfc.cold+0x23/0x2b [ 15.584584] dev_ethtool+0x29c/0x25f0 [ 15.585286] ? security_netlbl_sid_to_secattr+0x77/0xd0 [ 15.586728] ? do_set_pte+0xc4/0x110 [ 15.587349] ? _raw_spin_unlock+0x18/0x30 [ 15.588118] ? __might_sleep+0x49/0x80 [ 15.588956] dev_ioctl+0x2c1/0x490 [ 15.589616] sock_ioctl+0x18e/0x330 [ 15.591143] __x64_sys_ioctl+0x41c/0x990 [ 15.591823] ? irqentry_exit_to_user_mode+0x9/0x20 [ 15.592657] ? irqentry_exit+0x33/0x40 [ 15.593308] ? exc_page_fault+0x32f/0x770 [ 15.593877] ? exit_to_user_mode_prepare+0x3c/0x130 [ 15.594775] do_syscall_64+0x35/0x80 [ 15.595397] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 15.596037] RIP: 0033:0x7f5381226d4b [ 15.596492] Code: 0f 1e fa 48 8b 05 3d b1 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 0d b1 0c 00 f7 d8 64 89 01 48 [ 15.598743] RSP: 002b:00007ffd4cc2a1f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 15.599804] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f5381226d4b [ 15.600795] RDX: 00007ffd4cc2a350 RSI: 0000000000008946 RDI: 0000000000000003 [ 15.601712] RBP: 00007ffd4cc2a340 R08: 00007ffd4cc2a350 R09: 0000000000000001 [ 15.602751] R10: 00007f538128a990 R11: 0000000000000246 R12: 0000000000000000 [ 15.603882] R13: 00007ffd4cc2a350 R14: 00007ffd4cc2a4b0 R15: 0000000000000000 [ 15.605042] ---[ end trace 325cf185e2795048 ]--- Fixes: dd98d2895de6 ("ethtool: improve compat ioctl handling") Reported-by: Shannon Nelson <snelson@pensando.io> CC: Arnd Bergmann <arnd@arndb.de> CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Tested-by: Shannon Nelson <snelson@pensando.io> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-23ethtool: improve compat ioctl handlingArnd Bergmann
The ethtool compat ioctl handling is hidden away in net/socket.c, which introduces a couple of minor oddities: - The implementation may end up diverging, as seen in the RXNFC extension in commit 84a1d9c48200 ("net: ethtool: extend RXNFC API to support RSS spreading of filter matches") that does not work in compat mode. - Most architectures do not need the compat handling at all because u64 and compat_u64 have the same alignment. - On x86, the conversion is done for both x32 and i386 user space, but it's actually wrong to do it for x32 and cannot work there. - On 32-bit Arm, it never worked for compat oabi user space, since that needs to do the same conversion but does not. - It would be nice to get rid of both compat_alloc_user_space() and copy_in_user() throughout the kernel. None of these actually seems to be a serious problem that real users are likely to encounter, but fixing all of them actually leads to code that is both shorter and more readable. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01net: sock: extend SO_TIMESTAMPING for PHC bindingYangbo Lu
Since PTP virtual clock support is added, there can be several PTP virtual clocks based on one PTP physical clock for timestamping. This patch is to extend SO_TIMESTAMPING API to support PHC (PTP Hardware Clock) binding by adding a new flag SOF_TIMESTAMPING_BIND_PHC. When PTP virtual clocks are in use, user space can configure to bind one for timestamping, but PTP physical clock is not supported and not needed to bind. This patch is preparation for timestamp conversion from raw timestamp to a specific PTP virtual clock time in core net. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-01ethtool: add a new command for getting PHC virtual clocksYangbo Lu
Add an interface for getting PHC (PTP Hardware Clock) virtual clocks, which are based on PHC physical clock providing hardware timestamp to network packets. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22ethtool: Validate module EEPROM offset as part of policyIdo Schimmel
Validate the offset to read from module EEPROM as part of the netlink policy and remove the corresponding check from the code. This also makes it possible to query the offset range from user space: $ genl ctrl policy name ethtool ... ID: 0x14 policy[32]:attr[2]: type=U32 range:[0,255] ... Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22ethtool: Validate module EEPROM length as part of policyIdo Schimmel
Validate the number of bytes to read from the module EEPROM as part of the netlink policy and remove the corresponding check from the code. This also makes it possible to query the length range from user space: $ genl ctrl policy name ethtool ... ID: 0x14 policy[32]:attr[3]: type=U32 range:[1,128] ... Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-22ethtool: Decrease size of module EEPROM get policy arrayIdo Schimmel
The 'ETHTOOL_A_MODULE_EEPROM_DATA' attribute is not part of the get request. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Trivial conflicts in net/can/isotp.c and tools/testing/selftests/net/mptcp/mptcp_connect.sh scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c to include/linux/ptp_clock_kernel.h in -next so re-apply the fix there. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-16ethtool: add a stricter length checkJakub Kicinski
There has been a few errors in the ethtool reply size calculations, most of those are hard to trigger during basic testing because of skb size rounding up and netdev names being shorter than max. Add a more precise check. This change will affect the value of payload length displayed in case of -EMSGSIZE but that should be okay, "payload length" isn't a well defined term here. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-14ethtool: strset: fix message length calculationJakub Kicinski
Outer nest for ETHTOOL_A_STRSET_STRINGSETS is not accounted for. This may result in ETHTOOL_MSG_STRSET_GET producing a warning like: calculated message payload length (684) not sufficient WARNING: CPU: 0 PID: 30967 at net/ethtool/netlink.c:369 ethnl_default_doit+0x87a/0xa20 and a splat. As usually with such warnings three conditions must be met for the warning to trigger: - there must be no skb size rounding up (e.g. reply_size of 684); - string set must be per-device (so that the header gets populated); - the device name must be at least 12 characters long. all in all with current user space it looks like reading priv flags is the only place this could potentially happen. Or with syzbot :) Reported-by: syzbot+59aa77b92d06cd5a54f2@syzkaller.appspotmail.com Fixes: 71921690f974 ("ethtool: provide string sets with STRSET_GET request") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-09net: ethtool: clear heap allocations for ethtool functionAustin Kim
Several ethtool functions leave heap uncleared (potentially) by drivers. This will leave the unused portion of heap unchanged and might copy the full contents back to userspace. Signed-off-by: Austin Kim <austindh.kim@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07ethtool: Fix NULL pointer dereference during module EEPROM dumpIdo Schimmel
When get_module_eeprom_by_page() is not implemented by the driver, NULL pointer dereference can occur [1]. Fix by testing if get_module_eeprom_by_page() is implemented instead of get_module_info(). [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] CPU: 0 PID: 251 Comm: ethtool Not tainted 5.13.0-rc3-custom-00940-g3822d0670c9d #989 Call Trace: eeprom_prepare_data+0x101/0x2d0 ethnl_default_doit+0xc2/0x290 genl_family_rcv_msg_doit+0xdc/0x140 genl_rcv_msg+0xd7/0x1d0 netlink_rcv_skb+0x49/0xf0 genl_rcv+0x1f/0x30 netlink_unicast+0x1f6/0x2c0 netlink_sendmsg+0x1f9/0x400 __sys_sendto+0xe1/0x130 __x64_sys_sendto+0x1b/0x20 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c97a31f66ebc ("ethtool: wire in generic SFP module access") Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-02ethtool: Fix a typoZheng Yongjun
atribute ==> attribute Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-19ethtool: stats: Fix a copy-paste errorYueHaibing
data->ctrl_stats should be memset with correct size. Fixes: bfad2b979ddc ("ethtool: add interface to read standard MAC Ctrl stats") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-05ethtool: fix missing NLM_F_MULTI flag when dumpingFernando Fernandez Mancera
When dumping the ethtool information from all the interfaces, the netlink reply should contain the NLM_F_MULTI flag. This flag allows userspace tools to identify that multiple messages are expected. Link: https://bugzilla.redhat.com/1953847 Fixes: 365f9ae4ee36 ("ethtool: fix genlmsg_put() failure handling in ethnl_default_dumpit()") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19ethtool: stats: clarify the initialization to ETHTOOL_STAT_NOT_SETJakub Kicinski
Ido suggests we add a comment about the init of stats to -1. This is unlikely to be clear to first time readers. Suggested-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-19ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()Gustavo A. R. Silva
Fix the following out-of-bounds warning: net/ethtool/ioctl.c:492:2: warning: 'memcpy' offset [49, 84] from the object at 'link_usettings' is out of the bounds of referenced subobject 'base' with type 'struct ethtool_link_settings' at offset 0 [-Warray-bounds] The problem is that the original code is trying to copy data into a some struct members adjacent to each other in a single call to memcpy(). This causes a legitimate compiler warning because memcpy() overruns the length of &link_usettings.base. Fix this by directly using &link_usettings and _from_ as destination and source addresses, instead. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c - keep the ZC code, drop the code related to reinit net/bridge/netfilter/ebtables.c - fix build after move to net_generic Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-04-16ethtool: add interface to read RMON statsJakub Kicinski
Most devices maintain RMON (RFC 2819) stats - particularly the "histogram" of packets received by size. Unlike other RFCs which duplicate IEEE stats, the short/oversized frame counters in RMON don't seem to match IEEE stats 1-to-1 either, so expose those, too. Do not expose basic packet, CRC errors etc - those are already otherwise covered. Because standard defines packet ranges only up to 1518, and everything above that should theoretically be "oversized" - devices often create their own ranges. Going beyond what the RFC defines - expose the "histogram" in the Tx direction (assume for now that the ranges will be the same). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16ethtool: add interface to read standard MAC Ctrl statsJakub Kicinski
Number of devices maintains the standard-based MAC control counters for control frames. Add a API for those. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16ethtool: add interface to read standard MAC statsJakub Kicinski
Most of the MAC statistics are included in struct rtnl_link_stats64, but some fields are aggregated. Besides it's good to expose these clearly hardware stats separately. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-16ethtool: add a new command for reading standard statsJakub Kicinski
Add an interface for reading standard stats, including stats which don't have a corresponding control interface. Start with IEEE 802.3 PHY stats. There seems to be only one stat to expose there. Define API to not require user space changes when new stats or groups are added. Groups are based on bitset, stats have a string set associated. v1: wrap stats in a nest Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15ethtool: add FEC statisticsJakub Kicinski
Similarly to pause statistics add stats for FEC. The IEEE standard mandates two sets of counters: - 30.5.1.1.17 aFECCorrectedBlocks - 30.5.1.1.18 aFECUncorrectableBlocks where block is a block of bits FEC operates on. Each of these counters is defined per lane (PCS instance). Multiple vendors provide number of corrected _bits_ rather than/as well as blocks. This set adds the 2 standard-based block counters and a extra one for corrected bits. Counters are exposed to user space via netlink in new attributes. Each attribute carries an array of u64s, first element is the total count, and the following ones are a per-lane break down. Much like with pause stats the operation will not fail when driver does not implement the get_fec_stats callback (nor can the driver fail the operation by returning an error). If stats can't be reported the relevant attributes will be empty. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15ethtool: fec_prepare_data() - jump to error handlingJakub Kicinski
Refactor fec_prepare_data() a little bit to skip the body of the function and exit on error. Currently the code depends on the fact that we only have one call which may fail between ethnl_ops_begin() and ethnl_ops_complete() and simply saves the error code. This will get hairy with the stats also being queried. No functional changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-15ethtool: move ethtool_stats_initJakub Kicinski
We'll need it for FEC stats as well. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-14ethtool: pause: make sure we init driver statsJakub Kicinski
The intention was for pause statistics to not be reported when driver does not have the relevant callback (only report an empty netlink nest). What happens currently we report all 0s instead. Make sure statistics are initialized to "not set" (which is -1) so the dumping code skips them. Fixes: 9a27a33027f2 ("ethtool: add standard pause stats") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-12ethtool: fix kdoc attr nameJakub Kicinski
Add missing 't' in attrtype. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11ethtool: wire in generic SFP module accessAndrew Lunn
If the device has a sfp bus attached, call its sfp_get_module_eeprom_by_page() function, otherwise use the ethtool op for the device. This follows how the IOCTL works. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11ethtool: Add fallback to get_module_eeprom from netlink commandVladyslav Tarasiuk
In case netlink get_module_eeprom_by_page() callback is not implemented by the driver, try to call old get_module_info() and get_module_eeprom() pair. Recalculate parameters to get_module_eeprom() offset and len using page number and their sizes. Return error if this can't be done. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11net: ethtool: Export helpers for getting EEPROM infoAndrew Lunn
There are two ways to retrieve information from SFP EEPROMs. Many devices make use of the common code, and assign the sfp_bus pointer in the netdev to point to the bus holding the SFP device. Some MAC drivers directly implement ops in there ethool structure. Export within net/ethtool the two helpers used to call these methods, so that they can also be used in the new netlink code. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-11ethtool: Allow network drivers to dump arbitrary EEPROM dataVladyslav Tarasiuk
Define get_module_eeprom_by_page() ethtool callback and implement netlink infrastructure. get_module_eeprom_by_page() allows network drivers to dump a part of module's EEPROM specified by page and bank numbers along with offset and length. It is effectively a netlink replacement for get_module_info() and get_module_eeprom() pair, which is needed due to emergence of complex non-linear EEPROM layouts. Signed-off-by: Vladyslav Tarasiuk <vladyslavt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>