summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-07-05bnxt_en: Add bnxt_en initial params table and register it.Vasundhara Volam
Create initial devlink parameters table for bnxt_en. Table consists of a permanent generic parameter. enable_sriov - Enables Single-Root Input/Output Virtualization(SR-IOV) characteristic of the device. Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05devlink: Add enable_sriov boolean generic parameterVasundhara Volam
enable_sriov - Enables Single-Root Input/Output Virtualization(SR-IOV) characteristic of the device. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05mlx4: Add support for devlink reload and load driverinit valuesMoshe Shemesh
Add mlx4_devlink_reload() to support devlink reload operation. Add mlx4_devlink_param_load_driverinit_values() to load values which were set using driverinit configuration mode. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05mlx4: Add mlx4 initial parameters table and register itMoshe Shemesh
Create initial parameters table for mlx4. The table consists of two generic parameters and two driver-specific parameters. Generic: internal_err_reset - Enable reset device on internal errors. This parameter can be configured on mlx4 either on runtime or during driver initialization. max_macs - Max number of MACs per ETH port. For mlx4 this parameter value range is between 1 and 128. This parameter can be configured on mlx4 only during driver initialization. Driver specific: enable_64b_cqe_eqe - Enable 64 byte CQEs/EQEs when the FW supports it. This parameter can be configured on mlx4 only during driver initialization. enable_4k_uar - Enable using 4K UAR. This parameter can be configured on mlx4 only during driver initialization. Register the parameters table on mlx4_init_one() and unregister on mlx4_remove_one(). Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05devlink: Add generic parameters internal_err_reset and max_macsMoshe Shemesh
Add 2 first generic parameters to devlink configuration parameters set: internal_err_reset - When set enables reset device on internal errors. max_macs - max number of MACs per ETH port. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05devlink: Add devlink notifications support for paramsMoshe Shemesh
Add devlink_param_notify() function to support devlink param notifications. Add notification call to devlink param set, register and unregister functions. Add devlink_param_value_changed() function to enable the driver notify devlink on value change. Driver should use this function after value was changed on any configuration mode part to driverinit. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05devlink: Add support for get/set driverinit valueMoshe Shemesh
"driverinit" configuration mode value is held by devlink to enable the driver query the value after reload. Two additional functions added to help the driver get/set the value from/to devlink: devlink_param_driverinit_value_set() and devlink_param_driverinit_value_get(). Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05devlink: Add param set commandMoshe Shemesh
Add param set command to set value for a parameter. Value can be set to any of the supported configuration modes. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05devlink: Add param get commandMoshe Shemesh
Add param get command which gets data per parameter. Option to dump the parameters data per device. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05devlink: Add devlink_param register and unregisterMoshe Shemesh
Define configuration parameters data structure. Add functions to register and unregister the driver supported configuration parameters table. For each parameter registered, the driver should fill all the parameter's fields. In case the only supported configuration mode is "driverinit" the parameter's get()/set() functions are not required and should be set to NULL, for any other configuration mode, these functions are required and should be set by the driver. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05net/hamradio/6pack: remove redundant variable channelColin Ian King
Variable channel is being assigned but is never used hence it is redundant and can be removed. Cleans up two clang warnings: warning: variable 'channel' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05fjes: use currently unused variable my_epid and max_epidColin Ian King
Variables my_epid and max_epid are currently assigned and not being used - however, I suspect they were intended to be used in the for-loops to reduce the dereferencing of hw. Replace hw->my_epid and hw->max_epid with these variables. Cleans up clang warnings: warning: variable 'my_epid' set but not used [-Wunused-but-set-variable] variable 'max_epid' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05net: tehuti: remove redundant pointer skbColin Ian King
Pointer skb is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'skb' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05net: socionext: remove redundant pointer ndevColin Ian King
Pointer ndev is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'ndev' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05net: aquantia: Make some functions staticWei Yongjun
Fixes the following sparse warnings: drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:525:5: warning: symbol 'hw_atl_utils_mpi_set_speed' was not declared. Should it be static? drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_utils.c:536:5: warning: symbol 'hw_atl_utils_mpi_set_state' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05net: dsa: vsc73xx: Make some functions staticWei Yongjun
Fixes the following sparse warnings: drivers/net/dsa/vitesse-vsc73xx.c:1054:6: warning: symbol 'vsc73xx_get_strings' was not declared. Should it be static? drivers/net/dsa/vitesse-vsc73xx.c:1113:5: warning: symbol 'vsc73xx_get_sset_count' was not declared. Should it be static? drivers/net/dsa/vitesse-vsc73xx.c:1122:6: warning: symbol 'vsc73xx_get_ethtool_stats' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05net: dsa: fix spelling mistake "waitting" -> "waiting"Colin Ian King
Trivial fix to spelling mistake in dev_err error message. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05net: limit each hash list length to MAX_GRO_SKBSLi RongQing
After commit 07d78363dcff ("net: Convert NAPI gro list into a small hash table.")' there is 8 hash buckets, which allows more flows to be held for merging. but MAX_GRO_SKBS, the total held skb for merging, is 8 skb still, limit the hash table performance. keep MAX_GRO_SKBS as 8 skb, but limit each hash list length to 8 skb, not the total 8 skb Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05r8169: fix runtime suspendHeiner Kallweit
When runtime-suspending we configure WoL w/o touching saved_wolopts. If saved_wolopts == 0 we would power down the PHY in this case what's wrong. Therefore we have to check the actual chip WoL settings here. Fixes: 433f9d0ddcc6 ("r8169: improve saved_wolopts handling") Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()Edward Cree
Since callees (ip_rcv_core() and ip_rcv_finish_core()) might free or steal the skb, we can't use the list_cut_before() method; we can't even do a list_del(&skb->list) in the drop case, because skb might have already been freed and reused. So instead, take each skb off the source list before processing, and add it to the sublist afterwards if it wasn't freed or stolen. Fixes: 5fa12739a53d net: ipv4: listify ip_rcv_finish Fixes: 17266ee93984 net: ipv4: listified version of ip_rcv Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-05cxgb4: Add support to read actual provisioned resourcesCasey Leedom
In highly constrained resources environments (like the 124VF T5 and 248VF T6 configurations), PF4 may not have very many resources at all and we need to adapt to whatever we've been allocated, this patch adds support to get the provisioned resources. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04epic100: remove redundant variable 'irq'Colin Ian King
Variable 'irq' is being assigned but is never used hence it is and can be removed. Cleans up clang warning: warning: variable 'irq' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04sfc: remove redundant variable old_vlanColin Ian King
Variable old_vlan is being assigned but is never used hence it is and can be removed. Cleans up clang warning: warning: variable 'old_vlan' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04qed: remove redundant pointer 'name'Colin Ian King
Pointer 'name' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'name' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04ethernet: micrel: remove redundant pointer 'info'Colin Ian King
Pointer 'info' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'info' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: hinic: remove redundant pointer pfhwdevColin Ian King
Pointer pfhwdev is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'pfhwdev' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: hns3: remove redundant variable 'protocol'Colin Ian King
Variable 'protocol' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'protocol' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: ethernet: gianfar_ethtool: remove redundant variable last_rule_idxColin Ian King
Variable last_rule_idx is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'last_rule_idx' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: fec: remove redundant variable 'inc'Colin Ian King
Variable 'inc' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'inc' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04cnic: remove redundant pointer req and variable funcColin Ian King
Pointer req and variable func are being assigned but are never used hence they are redundant and can be removed. Cleans up clang warnings: warning: variable 'req' set but not used [-Wunused-but-set-variable] warning: variable 'func' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: bgmac: remove redundant variable 'freed'Colin Ian King
Variable 'freed' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'freed' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: ethernet: nb8800: remove redundant pointer rxdColin Ian King
Pointer rxd is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'rxb' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: alx: remove redundant variable old_duplexColin Ian King
Variable old_duplex is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'old_duplex' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: alteon: acenic: remove redundant pointer rxdescColin Ian King
Pointer rxdesc is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'rxdesc' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: dsa: bcm_sf2: remove redundant variable offColin Ian King
Variable 'off' is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: warning: variable 'off' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04Merge branch 'Scheduled-packet-Transmission-ETF'David S. Miller
Jesus Sanchez-Palencia says: ==================== Scheduled packet Transmission: ETF Changes since v1: - moved struct sock_txtime from socket.h to uapi net_tstamp.h; - sk_clockid was changed from u16 to u8; - sk_txtime_flags was changed from u16 to a u8 bit field in struct sock; - the socket option flags are now validated in sock_setsockopt(); - added SO_EE_ORIGIN_TXTIME; - sockc.transmit_time is now initialized from all IPv4 Tx paths; - added support for the IPv6 Tx path; Overview ======== This work consists of a set of kernel interfaces that can be used by applications that require (time-based) Scheduled Tx of packets. It is comprised by 3 new components to the kernel: - SO_TXTIME: socket option + cmsg programming interfaces. - etf: the "earliest txtime first" qdisc, that provides per-queue TxTime-based scheduling. This has been renamed from 'tbs' to 'etf' to better describe its functionality. - taprio: the "time-aware priority scheduler" qdisc, that provides per-port Time-Aware scheduling; This patchset is providing the first 2 components, which have been developed for longer. The taprio qdisc will be shared as an RFC separately (shortly). Note that this series is a follow up of the "Time based packet transmission" RFCv3 [1]. etf (formerly known as 'tbs') ============================= For applications/systems that the concept of time slices isn't precise enough, the etf qdisc allows applications to control the instant when a packet should leave the network controller. When used in conjunction with taprio, it can also be used in case the application needs to control with greater guarantee the offset into each time slice a packet will be sent. Another use case of etf, is when only a small number of applications on a system are time sensitive, so it can then be used with a more traditional root qdisc (like mqprio). The etf qdisc is designed so it buffers packets until a configurable time before their deadline (Tx time). The qdisc uses a rbtree internally so the buffered packets are always 'ordered' by their txtime (deadline) and will be dequeued following the earliest txtime first. It relies on the SO_TXTIME API set for receiving the per-packet timestamp (txtime) as well as the config flags for each socket: the clockid to be used as a reference, if the expected mode of txtime for that socket is deadline or strict mode, and if packet drops should be reported on the socket's error queue or not. The qdisc will drop any packets with a Tx time in the past, or if a packet expires while waiting for being dequeued. Drops can be reported as errors back to userspace through the socket's error queue. Example configuration: $ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 200000 \ clockid CLOCK_TAI Here, the Qdisc will use HW offload for the txtime control. Packets will be dequeued by the qdisc "delta" (200000) nanoseconds before their transmission time. Because this will be using HW offload and since dynamic clocks are not supported by hrtimers, the system clock and the PHC clock must be synchronized for this mode to behave as expected. A more complete example can be found here, with instructions of how to test it: https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f [2] Note that we haven't modified the qdisc so it uses a timerqueue because the modification needed was increasing the number of cachelines of a sk_buff. This series is also hosted on github and can be found at [3]. The companion iproute2 patches can be found at [4]. [1] https://patchwork.ozlabs.org/cover/882342/ [2] github doesn't make it clear, but the gist can be cloned like this: $ git clone https://gist.github.com/jeez/bd3afeff081ba64a695008dd8215866f scheduled-tx-tests [3] https://github.com/jeez/linux/tree/etf-v2 [4] https://github.com/jeez/iproute2/tree/etf-v2 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net/sched: Make etf report drops on error_queueJesus Sanchez-Palencia
Use the socket error queue for reporting dropped packets if the socket has enabled that feature through the SO_TXTIME API. Packets are dropped either on enqueue() if they aren't accepted by the qdisc or on dequeue() if the system misses their deadline. Those are reported as different errors so applications can react accordingly. Userspace can retrieve the errors through the socket error queue and the corresponding cmsg interfaces. A struct sock_extended_err* is used for returning the error data, and the packet's timestamp can be retrieved by adding both ee_data and ee_info fields as e.g.: ((__u64) serr->ee_data << 32) + serr->ee_info This feature is disabled by default and must be explicitly enabled by applications. Enabling it can bring some overhead for the Tx cycles of the application. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04igb: Add support for ETF offloadJesus Sanchez-Palencia
Implement HW offload support for SO_TXTIME through igb's Launchtime feature. This is done by extending igb_setup_tc() so it supports TC_SETUP_QDISC_ETF and configuring i210 so time based transmit arbitration is enabled. The FQTSS transmission mode added before is extended so strict priority (SP) queues wait for stream reservation (SR) ones. igb_config_tx_modes() is extended so it can support enabling/disabling Launchtime following the previous approach used for the credit-based shaper (CBS). As the previous flow, FQTSS transmission mode is enabled automatically by the driver once Launchtime (or CBS, as before) is enabled. Similarly, it's automatically disabled when the feature is disabled for the last queue that had it setup on. The driver just consumes the transmit times from the skbuffs directly, so no special handling is done in case an 'invalid' time is provided. We assume this has been handled by the ETF qdisc already. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04igb: Only call skb_tx_timestamp after descriptors are readyJesus Sanchez-Palencia
Currently, skb_tx_timestamp() is being called before the Tx descriptors are prepared in igb_xmit_frame_ring(), which happens during either the igb_tso() or igb_tx_csum() calls. Given that now the skb->tstamp might be used to carry the timestamp for SO_TXTIME, we must only call skb_tx_timestamp() after the information has been copied into the Tx descriptors. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04igb: Refactor igb_offload_cbs()Jesus Sanchez-Palencia
Split code into a separate function (igb_offload_apply()) that will be used by ETF offload implementation. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04igb: Only change Tx arbitration when CBS is onJesus Sanchez-Palencia
Currently the data transmission arbitration algorithm - DataTranARB field on TQAVCTRL reg - is always set to CBS when the Tx mode is changed from legacy to 'Qav' mode. Make that configuration a bit more granular in preparation for the upcoming Launchtime enabling patches, since CBS and Launchtime can be enabled separately. That is achieved by moving the DataTranARB setup to igb_config_tx_modes() instead. Similarly, when disabling CBS we must check if it has been disabled for all queues, and clear the DataTranARB accordingly. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04igb: Refactor igb_configure_cbs()Jesus Sanchez-Palencia
Make this function retrieve what it needs from the Tx ring being addressed since it already relies on what had been saved on it before. Also, since this function will be used by the upcoming Launchtime patches rename it to better reflect its intention. Note that Launchtime is not part of what 802.1Qav specifies, but the i210 datasheet refers to this set of functionality as "Qav Transmission Mode". Here we also perform a tiny refactor at is_any_cbs_enabled(), and add further documentation to igb_setup_tx_mode(). Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net/sched: Add HW offloading capability to ETFJesus Sanchez-Palencia
Add infra so etf qdisc supports HW offload of time-based transmission. For hw offload, the time sorted list is still used, so packets are dequeued always in order of txtime. Example: $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 100000 \ clockid CLOCK_REALTIME In this example, the Qdisc will use HW offload for the control of the transmission time through the network adapter. The hrtimer used for packets scheduling inside the qdisc will use the clockid CLOCK_REALTIME as reference and packets leave the Qdisc "delta" (100000) nanoseconds before their transmission time. Because this will be using HW offload and since dynamic clocks are not supported by the hrtimer, the system clock and the PHC clock must be synchronized for this mode to behave as expected. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net/sched: Introduce the ETF QdiscVinicius Costa Gomes
The ETF (Earliest TxTime First) qdisc uses the information added earlier in this series (the socket option SO_TXTIME and the new role of sk_buff->tstamp) to schedule packets transmission based on absolute time. For some workloads, just bandwidth enforcement is not enough, and precise control of the transmission of packets is necessary. Example: $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc add dev enp2s0 parent 100:1 etf delta 100000 \ clockid CLOCK_TAI In this example, the Qdisc will provide SW best-effort for the control of the transmission time to the network adapter, the time stamp in the socket will be in reference to the clockid CLOCK_TAI and packets will leave the qdisc "delta" (100000) nanoseconds before its transmission time. The ETF qdisc will buffer packets sorted by their txtime. It will drop packets on enqueue() if their skbuff clockid does not match the clock reference of the Qdisc. Moreover, on dequeue(), a packet will be dropped if it expires while being enqueued. The qdisc also supports the SO_TXTIME deadline mode. For this mode, it will dequeue a packet as soon as possible and change the skb timestamp to 'now' during etf_dequeue(). Note that both the qdisc's and the SO_TXTIME ABIs allow for a clockid to be configured, but it's been decided that usage of CLOCK_TAI should be enforced until we decide to allow for other clockids to be used. The rationale here is that PTP times are usually in the TAI scale, thus no other clocks should be necessary. For now, the qdisc will return EINVAL if any clocks other than CLOCK_TAI are used. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net/sched: Allow creating a Qdisc watchdog with other clocksVinicius Costa Gomes
This adds 'qdisc_watchdog_init_clockid()' that allows a clockid to be passed, this allows other time references to be used when scheduling the Qdisc to run. Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: packet: Hook into time based transmission.Richard Cochran
For raw layer-2 packets, copy the desired future transmit time from the CMSG cookie into the skb. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: ipv6: Hook into time based transmissionJesus Sanchez-Palencia
Add a struct sockcm_cookie parameter to ip6_setup_cork() so we can easily re-use the transmit_time field from struct inet_cork for most paths, by copying the timestamp from the CMSG cookie. This is later copied into the skb during __ip6_make_skb(). For the raw fast path, also pass the sockcm_cookie as a parameter so we can just perform the copy at rawv6_send_hdrinc() directly. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: ipv4: Hook into time based transmissionJesus Sanchez-Palencia
Add a transmit_time field to struct inet_cork, then copy the timestamp from the CMSG cookie at ip_setup_cork() so we can safely copy it into the skb later during __ip_make_skb(). For the raw fast path, just perform the copy at raw_send_hdrinc(). Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: Add a new socket option for a future transmit time.Richard Cochran
This patch introduces SO_TXTIME. User space enables this option in order to pass a desired future transmit time in a CMSG when calling sendmsg(2). The argument to this socket option is a 8-bytes long struct provided by the uapi header net_tstamp.h defined as: struct sock_txtime { clockid_t clockid; u32 flags; }; Note that new fields were added to struct sock by filling a 2-bytes hole found in the struct. For that reason, neither the struct size or number of cachelines were altered. Signed-off-by: Richard Cochran <rcochran@linutronix.de> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-04net: Clear skb->tstamp only on the forwarding pathJesus Sanchez-Palencia
This is done in preparation for the upcoming time based transmission patchset. Now that skb->tstamp will be used to hold packet's txtime, we must ensure that it is being cleared when traversing namespaces. Also, doing that from skb_scrub_packet() before the early return would break our feature when tunnels are used. Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>