Age | Commit message (Collapse) | Author |
|
Under rare circumstances there might be a situation where a requirement
of having XDP Tx queue per CPU could not be fulfilled and some of the Tx
resources have to be shared between CPUs. This yields a need for placing
accesses to xdp_ring inside a critical section protected by spinlock.
These accesses happen to be in the hot path, so let's introduce the
static branch that will be triggered from the control plane when driver
could not provide Tx queue dedicated for XDP on each CPU.
Currently, the design that has been picked is to allow any number of XDP
Tx queues that is at least half of a count of CPUs that platform has.
For lower number driver will bail out with a response to user that there
were not enough Tx resources that would allow configuring XDP. The
sharing of rings is signalled via static branch enablement which in turn
indicates that lock for xdp_ring accesses needs to be taken in hot path.
Approach based on static branch has no impact on performance of a
non-fallback path. One thing that is needed to be mentioned is a fact
that the static branch will act as a global driver switch, meaning that
if one PF got out of Tx resources, then other PFs that ice driver is
servicing will suffer. However, given the fact that HW that ice driver
is handling has 1024 Tx queues per each PF, this is currently an
unlikely scenario.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Optimize Tx descriptor cleaning for XDP. Current approach doesn't
really scale and chokes when multiple flows are handled.
Introduce two ring fields, @next_dd and @next_rs that will keep track of
descriptor that should be looked at when the need for cleaning arise and
the descriptor that should have the RS bit set, respectively.
Note that at this point the threshold is a constant (32), but it is
something that we could make configurable.
First thing is to get away from setting RS bit on each descriptor. Let's
do this only once NTU is higher than the currently @next_rs value. In
such case, grab the tx_desc[next_rs], set the RS bit in descriptor and
advance the @next_rs by a 32.
Second thing is to clean the Tx ring only when there are less than 32
free entries. For that case, look up the tx_desc[next_dd] for a DD bit.
This bit is written back by HW to let the driver know that xmit was
successful. It will happen only for those descriptors that had RS bit
set. Clean only 32 descriptors and advance the DD bit.
Actual cleaning routine is moved from ice_napi_poll() down to the
ice_xmit_xdp_ring(). It is safe to do so as XDP ring will not get any
SKBs in there that would rely on interrupts for the cleaning. Nice side
effect is that for rare case of Tx fallback path (that next patch is
going to introduce) we don't have to trigger the SW irq to clean the
ring.
With those two concepts, ring is kept at being almost full, but it is
guaranteed that driver will be able to produce Tx descriptors.
This approach seems to work out well even though the Tx descriptors are
produced in one-by-one manner. Test was conducted with the ice HW
bombarded with packets from HW generator, configured to generate 30
flows.
Xdp2 sample yields the following results:
<snip>
proto 17: 79973066 pkt/s
proto 17: 80018911 pkt/s
proto 17: 80004654 pkt/s
proto 17: 79992395 pkt/s
proto 17: 79975162 pkt/s
proto 17: 79955054 pkt/s
proto 17: 79869168 pkt/s
proto 17: 79823947 pkt/s
proto 17: 79636971 pkt/s
</snip>
As that sample reports the Rx'ed frames, let's look at sar output.
It says that what we Rx'ed we do actually Tx, no noticeable drops.
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: ens4f1 79842324.00 79842310.40 4678261.17 4678260.38 0.00 0.00 0.00 38.32
with tx_busy staying calm.
When compared to a state before:
Average: IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil
Average: ens4f1 90919711.60 42233822.60 5327326.85 2474638.04 0.00 0.00 0.00 43.64
it can be observed that the amount of txpck/s is almost doubled, meaning
that the performance is improved by around 90%. All of this due to the
drops in the driver, previously the tx_busy stat was bumped at a 7mpps
rate.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
With rings being split, it is now convenient to introduce a pointer to
XDP ring within the Rx ring. For XDP_TX workloads this means that
xdp_rings array access will be skipped, which was executed per each
processed frame.
Also, read the XDP prog once per NAPI and if prog is present, set up the
local xdp_ring pointer. Reading prog a single time was discussed in [1]
with some concern raised by Toke around dispatcher handling and having
the need for going through the RCU grace period in the ndo_bpf driver
callback, but ice currently is torning down NAPI instances regardless of
the prog presence on VSI.
Although the pointer to XDP ring introduced to Rx ring makes things a
lot slimmer/simpler, I still feel that single prog read per NAPI
lifetime is beneficial.
Further patch that will introduce the fallback path will also get a
profit from that as xdp_ring pointer will be set during the XDP rings
setup.
[1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
xdp_frame is not needed for XDP_TX data path in ice driver case.
For this data path cleaning of sent descriptor will not happen anywhere
outside of the driver, which means that carrying the information about
the underlying memory model via xdp_frame will not be used. Therefore,
this conversion can be simply dropped, which would relieve CPU a bit.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
There has been a long lasting issue of improper xdp_rings indexing for
XDP_TX and XDP_REDIRECT actions. Given that currently rx_ring->q_index
is mixed with smp_processor_id(), there could be a situation where Tx
descriptors are produced onto XDP Tx ring, but tail is never bumped -
for example pin a particular queue id to non-matching IRQ line.
Address this problem by ignoring the user ring count setting and always
initialize the xdp_rings array to be of num_possible_cpus() size. Then,
always use the smp_processor_id() as an index to xdp_rings array. This
provides serialization as at given time only a single softirq can run on
a particular CPU.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
While it was convenient to have a generic ring structure that served
both Tx and Rx sides, next commits are going to introduce several
Tx-specific fields, so in order to avoid hurting the Rx side, let's
pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs.
Rx ring could be handled by the old ice_ring which would reduce the code
churn within this patch, but this would make things asymmetric.
Make the union out of the ring container within ice_q_vector so that it
is possible to iterate over newly introduced ice_tx_ring.
Remove the @size as it's only accessed from control path and it can be
calculated pretty easily.
Change definitions of ice_update_ring_stats and
ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be
used for both Rx and Tx rings.
Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In
Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
difference now.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently ice_container_type is scoped only for ice_ethtool.c. Next
commit that will split the ice_ring struct onto Rx/Tx specific ring
structs is going to also modify the type of linked list of rings that is
within ice_ring_container. Therefore, the functions that are taking the
ice_ring_container as an input argument will need to be aware of a ring
type that will be looked up.
Embed ice_container_type within ice_ring_container and initialize it
properly when allocating the q_vectors.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This field is dead and driver is not making any use of it. Simply remove
it.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Fix the following build/link error by adding a dependency on the CRC32
routines:
ld: drivers/net/usb/lan78xx.o: in function `lan78xx_set_multicast':
lan78xx.c:(.text+0x48cf): undefined reference to `crc32_le'
The actual use of crc32_le() comes indirectly through ether_crc().
Fixes: 55d7de9de6c30 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for adaptive interrupt coalescing to the dpaa2-eth driver.
First of all, ETHTOOL_COALESCE_USE_ADAPTIVE_RX is defined as a supported
coalesce parameter and the requested state is configured through the
dpio APIs added in the previous patch.
Besides the ethtool API interaction, we keep track of how many bytes and
frames are dequeued per CDAN (Channel Data Availability Notification)
and update the Net DIM instance through the dpaa2_io_update_net_dim()
API.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use the newly exported dpio driver API to manually configure the IRQ
coalescing parameters requested by the user.
The .get_coalesce() and .set_coalesce() net_device callbacks are
implemented and directly export or setup the rx-usecs on all the
channels configured.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use of percpu_counter structure to track count of orphaned
sockets is causing problems on modern hosts with 256 cpus
or more.
Stefan Bach reported a serious spinlock contention in real workloads,
that I was able to reproduce with a netfilter rule dropping
incoming FIN packets.
53.56% server [kernel.kallsyms] [k] queued_spin_lock_slowpath
|
---queued_spin_lock_slowpath
|
--53.51%--_raw_spin_lock_irqsave
|
--53.51%--__percpu_counter_sum
tcp_check_oom
|
|--39.03%--__tcp_close
| tcp_close
| inet_release
| inet6_release
| sock_close
| __fput
| ____fput
| task_work_run
| exit_to_usermode_loop
| do_syscall_64
| entry_SYSCALL_64_after_hwframe
| __GI___libc_close
|
--14.48%--tcp_out_of_resources
tcp_write_timeout
tcp_retransmit_timer
tcp_write_timer_handler
tcp_write_timer
call_timer_fn
expire_timers
__run_timers
run_timer_softirq
__softirqentry_text_start
As explained in commit cf86a086a180 ("net/dst: use a smaller percpu_counter
batch for dst entries accounting"), default batch size is too big
for the default value of tcp_max_orphans (262144).
But even if we reduce batch sizes, there would still be cases
where the estimated count of orphans is beyond the limit,
and where tcp_too_many_orphans() has to call the expensive
percpu_counter_sum_positive().
One solution is to use plain per-cpu counters, and have
a timer to periodically refresh this cache.
Updating this cache every 100ms seems about right, tcp pressure
state is not radically changing over shorter periods.
percpu_counter was nice 15 years ago while hosts had less
than 16 cpus, not anymore by current standards.
v2: Fix the build issue for CONFIG_CRYPTO_DEV_CHELSIO_TLS=m,
reported by kernel test robot <lkp@intel.com>
Remove unused socket argument from tcp_too_many_orphans()
Fixes: dd24c00191d5 ("net: Use a percpu_counter for orphan_count")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Stefan Bach <sfb@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Move ports related config to dedicated struct to keep things organized.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
QCA original code report port instability and sa that SGMII also require
to set internal delay. Generalize the rgmii delay function and apply the
advised value if they are not defined in DT.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
QCA8328 switch is the bigger brother of the qca8327. Same regs different
chip. Change the function to set the correct pin layout and introduce a
new match_data to differentiate the 2 switch as they have the same ID
and their internal PHY have the same ID.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some qca8327 switch require to force the ignore of power on sel
strapping. Some switch require to set the led open drain mode in regs
instead of using strapping. While most of the device implements this
using the correct way using pin strapping, there are still some broken
device that require to be set using sw regs.
Introduce a new binding and support these special configuration.
As led open drain require to ignore pin strapping to work, the probe
fails with EINVAL error with incorrect configuration.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Support enabling PLL on the SGMII CPU port. Some device require this
special configuration or no traffic is transmitted and the switch
doesn't work at all. A dedicated binding is added to the CPU node
port to apply the correct reg on mac config.
Fail to correctly configure sgmii with qca8327 switch and warn if pll is
used on qca8337 with a revision greater than 1.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Future proof commit. This switch have 2 CPU ports and one valid
configuration is first CPU port set to sgmii and second CPU port set to
rgmii-id. The current implementation detects delay only for CPU port
zero set to rgmii and doesn't count any delay set in a secondary CPU
port. Drop the current delay scan function and move it to the sgmii
parser function to generalize and implicitly add support for secondary
CPU port set to rgmii-id. Introduce new logic where delay is enabled
also with internal delay binding declared and rgmii set as PHY mode.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently CPU port is always hardcoded to port 0. This switch have 2 CPU
ports. The original intention of this driver seems to be use the
mac06_exchange bit to swap MAC0 with MAC6 in the strange configuration
where device have connected only the CPU port 6. To skip the
introduction of a new binding, rework the driver to address the
secondary CPU port as primary and drop any reference of hardcoded port.
With configuration of mac06 exchange, just skip the definition of port0
and define the CPU port as a secondary. The driver will autoconfigure
the switch to use that as the primary CPU port.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for this in the qca8k driver. Also add support for SGMII
rx/tx clock falling edge. This is only present for pad0, pad5 and
pad6 have these bit reserved from Documentation. Add a comment that this
is hardcoded to PAD0 as qca8327/28/34/37 have an unique sgmii line and
setting falling in port0 applies to both configuration with sgmii used
for port0 or port6.
Co-developed-by: Matthew Hagan <mnhagan88@gmail.com>
Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add missing mac power sel support needed for ipq8064/5 SoC that require
1.8v for the internal regulator port instead of the default 1.5v.
If other device needs this, consider adding a dedicated binding to
support this.
Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The variable err is being initialized with a value that is never read, it
is being updated immediately afterwards. The assignment is redundant and
can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Adds support to flush or invalidate CPT CTX entries as part of FLR
and also provides a mailbox to flush CPT CTX entries in case of
graceful exit.
This patch also adds support for AF -> CPT PF uplink mailbox messages
and adds a new mbox message to submit a CPT instruction from AF.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Perform CPT LF teardown in non FLR path as well via cpt_lf_free()
Currently CPT LF teardown and reset sequence is only
done when FLR is handled with CPT LF still attached.
This patch also fixes cpt_lf_alloc() to set EXEC_LDWB in
CPT_AF_LFX_CTL2 when being completely overwritten as that is
the default value and is better for performance.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch enables and registers interrupt handler for CPT HW
interrupts.
Signed-off-by: Srujana Challa <schalla@marvell.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On i386, when builtin (not a loadable module), the winbond-840 driver
inspects boot_cpu_data to see what CPU family it is running on, and
then acts on that data. The "family" struct member (x86) does not exist
when running on UML, so prevent that test and do the default action.
Prevents this build error on UML + i386:
../drivers/net/ethernet/dec/tulip/winbond-840.c: In function ‘init_registers’:
../drivers/net/ethernet/dec/tulip/winbond-840.c:882:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’
if (boot_cpu_data.x86 <= 4) {
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://lore.kernel.org/r/20211014050606.7288-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On a UML build, the igc_ptp driver uses CONFIG_X86_TSC for timestamp
conversion. The function that is used is not available on UML builds,
so have the function use the default system_counterval_t timestamp
instead for UML builds.
Prevents this build error on UML:
../drivers/net/ethernet/intel/igc/igc_ptp.c: In function ‘igc_device_tstamp_to_system’:
../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: implicit declaration of function ‘convert_art_ns_to_tsc’ [-Werror=implicit-function-declaration]
return convert_art_ns_to_tsc(tstamp);
../drivers/net/ethernet/intel/igc/igc_ptp.c:777:9: error: incompatible types when returning type ‘int’ but ‘struct system_counterval_t’ was expected
return convert_art_ns_to_tsc(tstamp);
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/r/20211014050516.6846-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
On i386, when builtin (not a loadable module), the fealnx driver
inspects boot_cpu_data to see what CPU family it is running on, and
then acts on that data. The "family" struct member (x86) does not exist
when running on UML, so prevent that test and do the default action.
Prevents this build error on UML + i386:
../drivers/net/ethernet/fealnx.c: In function ‘netdev_open’:
../drivers/net/ethernet/fealnx.c:861:19: error: ‘struct cpuinfo_um’ has no member named ‘x86’
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-um@lists.infradead.org
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Link: https://lore.kernel.org/r/20211014050500.5620-1-rdunlap@infradead.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Adding comment to avoid the misusing of netvsc_xdp_xmit().
Otherwise the value of skb->queue_mapping could be 0 and
then the return value of skb_get_rx_queue() could be MAX_U16
cause by overflow.
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1634174786-1810351-1-git-send-email-jiasheng@iscas.ac.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The variable pp is not in use - delete it.
Signed-off-by: Yuval Shaia <yshaia@marvell.com>
Link: https://lore.kernel.org/r/20211013064921.26346-1-yshaia@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
PHY driver dp83867 has rich supports for OF-platform to fine-tune the PHY
chip during phy configuration. However, for non-OF platform, certain PHY
tunable parameters such as IO impedance and RX & TX internal delays are
critical and should be initialized to its default during PHY driver probe.
Tested-by: Clement <clement@intel.com>
Signed-off-by: Lay, Kuan Loon <kuan.loon.lay@intel.com>
Co-developed-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Tested-by: Kurt Kanzenbach <kurt@linutronix.de>
Link: https://lore.kernel.org/r/20211013065941.2124858-1-boon.leong.ong@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
If the PTP_PEROUT_DUTY_CYCLE flag is set, then check if the
request_on value in ptp_perout_request matches the pre-defined
values or a toggle option.
Return a failure if the value is not supported.
Preserve the old behaviors if the PTP_PEROUT_DUTY_CYCLE flag is not
set.
Tested with an oscilloscope on EVB-LAN7430:
e.g., to output PPS 1sec period 500mS on (high) to GPIO 2.
./testptp -L 2,2
./testptp -p 1000000000 -w 500000000
Signed-off-by: Yuiko Oshino <yuiko.oshino@microchip.com>
Link: https://lore.kernel.org/r/1634046593-64312-1-git-send-email-yuiko.oshino@microchip.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It seems reasonable to fine-tune only some of the skew values when using
one of the rgmii-*id PHY modes, and even when all skew values are
specified, using the correct ID PHY mode makes sense for documentation
purposes. Such a configuration also appears in the binding docs in
Documentation/devicetree/bindings/net/micrel-ksz90x1.txt, so the driver
should not warn about it.
Signed-off-by: Matthias Schiffer <matthias.schiffer@ew.tq-group.com>
Link: https://lore.kernel.org/r/20211012103402.21438-1-matthias.schiffer@ew.tq-group.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
tools/testing/selftests/net/ioam6.sh
7b1700e009cc ("selftests: net: modify IOAM tests for undef bits")
bf77b1400a56 ("selftests: net: Test for the IOAM encapsulation with IPv6")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Quite calm.
The noisy DSA driver (embedded switches) changes, and adjustment to
IPv6 IOAM behavior add to diffstat's bottom line but are not scary.
Current release - regressions:
- af_unix: rename UNIX-DGRAM to UNIX to maintain backwards
compatibility
- procfs: revert "add seq_puts() statement for dev_mcast", minor
format change broke user space
Current release - new code bugs:
- dsa: fix bridge_num not getting cleared after ports leaving the
bridge, resource leak
- dsa: tag_dsa: send packets with TX fwd offload from VLAN-unaware
bridges using VID 0, prevent packet drops if pvid is removed
- dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware, prevent HW
getting confused about station to VLAN mapping
Previous releases - regressions:
- virtio-net: fix for skb_over_panic inside big mode
- phy: do not shutdown PHYs in READY state
- dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's, fix link
LED staying lit after ifdown
- mptcp: fix possible infinite wait on recvmsg(MSG_WAITALL)
- mqprio: Correct stats in mqprio_dump_class_stats()
- ice: fix deadlock for Tx timestamp tracking flush
- stmmac: fix feature detection on old hardware
Previous releases - always broken:
- sctp: account stream padding length for reconf chunk
- icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe()
- isdn: cpai: check ctr->cnr to avoid array index out of bound
- isdn: mISDN: fix sleeping function called from invalid context
- nfc: nci: fix potential UAF of rf_conn_info object
- dsa: microchip: prevent ksz_mib_read_work from kicking back in
after it's canceled in .remove and crashing
- dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged
ports
- dsa: sja1105, ocelot: break circular dependency between switch and
tag drivers
- dsa: felix: improve timestamping in presence of packe loss
- mlxsw: thermal: fix out-of-bounds memory accesses
Misc:
- ipv6: ioam: move the check for undefined bits to improve
interoperability"
* tag 'net-5.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (60 commits)
icmp: fix icmp_ext_echo_iio parsing in icmp_build_probe
MAINTAINERS: Update the devicetree documentation path of imx fec driver
sctp: account stream padding length for reconf chunk
mlxsw: thermal: Fix out-of-bounds memory accesses
ethernet: s2io: fix setting mac address during resume
NFC: digital: fix possible memory leak in digital_in_send_sdd_req()
NFC: digital: fix possible memory leak in digital_tg_listen_mdaa()
nfc: fix error handling of nfc_proto_register()
Revert "net: procfs: add seq_puts() statement for dev_mcast"
net: encx24j600: check error in devm_regmap_init_encx24j600
net: korina: select CRC32
net: arc: select CRC32
net: dsa: felix: break at first CPU port during init and teardown
net: dsa: tag_ocelot_8021q: fix inability to inject STP BPDUs into BLOCKING ports
net: dsa: felix: purge skb from TX timestamping queue if it cannot be sent
net: dsa: tag_ocelot_8021q: break circular dependency with ocelot switch lib
net: dsa: tag_ocelot: break circular dependency with ocelot switch lib driver
net: mscc: ocelot: cross-check the sequence id from the timestamp FIFO with the skb PTP header
net: mscc: ocelot: deny TX timestamping of non-PTP packets
net: mscc: ocelot: warn when a PTP IRQ is raised for an unknown skb
...
|
|
Support classifying packets with netfilter on egress to satisfy user
requirements such as:
* outbound security policies for containers (Laura)
* filtering and mangling intra-node Direct Server Return (DSR) traffic
on a load balancer (Laura)
* filtering locally generated traffic coming in through AF_PACKET,
such as local ARP traffic generated for clustering purposes or DHCP
(Laura; the AF_PACKET plumbing is contained in a follow-up commit)
* L2 filtering from ingress and egress for AVB (Audio Video Bridging)
and gPTP with nftables (Pablo)
* in the future: in-kernel NAT64/NAT46 (Pablo)
The egress hook introduced herein complements the ingress hook added by
commit e687ad60af09 ("netfilter: add netfilter ingress hook after
handle_ing() under unique static key"). A patch for nftables to hook up
egress rules from user space has been submitted separately, so users may
immediately take advantage of the feature.
Alternatively or in addition to netfilter, packets can be classified
with traffic control (tc). On ingress, packets are classified first by
tc, then by netfilter. On egress, the order is reversed for symmetry.
Conceptually, tc and netfilter can be thought of as layers, with
netfilter layered above tc.
Traffic control is capable of redirecting packets to another interface
(man 8 tc-mirred). E.g., an ingress packet may be redirected from the
host namespace to a container via a veth connection:
tc ingress (host) -> tc egress (veth host) -> tc ingress (veth container)
In this case, netfilter egress classifying is not performed when leaving
the host namespace! That's because the packet is still on the tc layer.
If tc redirects the packet to a physical interface in the host namespace
such that it leaves the system, the packet is never subjected to
netfilter egress classifying. That is only logical since it hasn't
passed through netfilter ingress classifying either.
Packets can alternatively be redirected at the netfilter layer using
nft fwd. Such a packet *is* subjected to netfilter egress classifying
since it has reached the netfilter layer.
Internally, the skb->nf_skip_egress flag controls whether netfilter is
invoked on egress by __dev_queue_xmit(). Because __dev_queue_xmit() may
be called recursively by tunnel drivers such as vxlan, the flag is
reverted to false after sch_handle_egress(). This ensures that
netfilter is applied both on the overlay and underlying network.
Interaction between tc and netfilter is possible by setting and querying
skb->mark.
If netfilter egress classifying is not enabled on any interface, it is
patched out of the data path by way of a static_key and doesn't make a
performance difference that is discernible from noise:
Before: 1537 1538 1538 1537 1538 1537 Mb/sec
After: 1536 1534 1539 1539 1539 1540 Mb/sec
Before + tc accept: 1418 1418 1418 1419 1419 1418 Mb/sec
After + tc accept: 1419 1424 1418 1419 1422 1420 Mb/sec
Before + tc drop: 1620 1619 1619 1619 1620 1620 Mb/sec
After + tc drop: 1616 1624 1625 1624 1622 1619 Mb/sec
When netfilter egress classifying is enabled on at least one interface,
a minimal performance penalty is incurred for every egress packet, even
if the interface it's transmitted over doesn't have any netfilter egress
rules configured. That is caused by checking dev->nf_hooks_egress
against NULL.
Measurements were performed on a Core i7-3615QM. Commands to reproduce:
ip link add dev foo type dummy
ip link set dev foo up
modprobe pktgen
echo "add_device foo" > /proc/net/pktgen/kpktgend_3
samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh -i foo -n 400000000 -m "11:11:11:11:11:11" -d 1.1.1.1
Accept all traffic with tc:
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 0,'
Drop all traffic with tc:
tc qdisc add dev foo clsact
tc filter add dev foo egress bpf da bytecode '1,6 0 0 2,'
Apply this patch when measuring packet drops to avoid errors in dmesg:
https://lore.kernel.org/netdev/a73dda33-57f4-95d8-ea51-ed483abd6a7a@iogearbox.net/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Laura García Liébana <nevola@gmail.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Currently when a user uses "devlink dev info", the fw.mgmt.api will be
the major.minor numbers as shown below:
devlink dev info pci/0000:3b:00.0
pci/0000:3b:00.0:
driver ice
serial_number 00-01-00-ff-ff-00-00-00
versions:
fixed:
board.id K91258-000
running:
fw.mgmt 6.1.2
fw.mgmt.api 1.7 <--- No patch number included
fw.mgmt.build 0xd75e7d06
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.app.name ICE OS Default Package
fw.app 1.3.27.0
fw.app.bundle_id 0xc0000001
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
stored:
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
There are many features in the driver that depend on the major, minor,
and patch version of the FW. Without the patch number in the output for
fw.mgmt.api debugging issues related to the FW API version is difficult.
Also, using major.minor.patch aligns with the existing firmware version
which uses a 3 digit value.
Fix this by making the fw.mgmt.api print the major.minor.patch
versions. Shown below is the result:
devlink dev info pci/0000:3b:00.0
pci/0000:3b:00.0:
driver ice
serial_number 00-01-00-ff-ff-00-00-00
versions:
fixed:
board.id K91258-000
running:
fw.mgmt 6.1.2
fw.mgmt.api 1.7.9 <--- patch number included
fw.mgmt.build 0xd75e7d06
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.app.name ICE OS Default Package
fw.app 1.3.27.0
fw.app.bundle_id 0xc0000001
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
stored:
fw.mgmt.srev 5
fw.undi 1.2992.0
fw.undi.srev 5
fw.psid.api 3.10
fw.bundle_id 0x800085cc
fw.netlist 3.10.2000-3.1e.0
fw.netlist.build 0x2a76e110
Fixes: ff2e5c700e08 ("ice: add basic handler for devlink .info_get")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Correct parameters order in call to ice_tunnel_idx_to_entry function.
Entry in sparse port table is correct when the idx is 0. For idx 1 one
correct entry should be skipped, for idx 2 two of them should be skipped
etc. Change if condition to be true when idx is 0, which means that
previous valid entry of this tunnel type were skipped.
Fixes: b20e6c17c468 ("ice: convert to new udp_tunnel infrastructure")
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In the remove path, there is an attempt to free the aux_idx IDA whether
it was allocated or not. This can potentially cause a crash when
unloading the driver on systems that do not initialize support for RDMA.
But, this free cannot be gated by the status bit for RDMA, since it is
allocated if the driver detects support for RDMA at probe time, but the
driver can enter into a state where RDMA is not supported after the IDA
has been allocated at probe time and this would lead to a memory leak.
Initialize aux_idx to an invalid value and check for a valid value when
unloading to determine if an IDA free is necessary.
Fixes: d25a0fc41c1f9 ("ice: Initialize RDMA support")
Reported-by: Jun Miao <jun.miao@windriver.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently if the VSI is rebuilt/removed and the RDMA PF driver is active
the RDMA Tx queue scheduler node configuration will not be cleaned up.
This will cause the rebuild/re-add of the VSI to fail due to the
software structures not being correctly cleaned up for the VSI index.
Fix this by always calling ice_rm_vsi_rdma_cfg() for all VSI. If there
are no RDMA scheduler nodes created, then there is no harm in calling
ice_rm_vsi_rdma_cfg(). This change applies to all VSI types, so if
RDMA support is added for other VSI types they will also get this
change.
Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Jerzy Wiktor Jurkowski <jerzy.wiktor.jurkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
random_ether_addr() was the original name of the helper which
was kept for backward compatibility (?) after the rename in
commit 0a4dd594982a ("etherdevice: Rename random_ether_addr
to eth_random_addr").
We have a single random_ether_addr() caller left in tree
while there are 70 callers of eth_random_addr().
Time to drop this define.
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20211013205450.328092-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
This patch takes care of drivers which cast netdev->dev_addr to
a 16bit type, often with an explicit byte order.
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A handful of drivers contains loops assigning the mac
addr byte by byte. Convert those to eth_hw_addr_set().
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A straggler I somehow missed in the automated conversion in
commit 9ca01b25dfff ("ethernet: use of_get_ethdev_address()").
Use the new helper instead of using netdev->dev_addr directly.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
A handful of drivers use sizeof(addr) for the size of
the address, after manually confirming the size is
indeed 6 convert them to eth_hw_addr_set().
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Petko Manolov <petkan@nucleusys.com>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Number of drivers use eth_random_addr(netdev->dev_addr)
thus writing to netdev->dev_addr directly, and not setting
the address type. Switch them to eth_hw_addr_random().
@@
expression netdev;
@@
- eth_random_addr(netdev->dev_addr);
+ eth_hw_addr_random(netdev);
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This big patch sprinkles const on local variables and
function arguments which may refer to netdev->dev_addr.
Commit 406f42fa0d3c ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it got through appropriate helpers.
Some of the changes here are not strictly required - const
is sometimes cast off but pointer is not used for writing.
It seems like it's still better to add the const in case
the code changes later or relevant -W flags get enabled
for the build.
No functional changes.
Link: https://lore.kernel.org/r/20211014142432.449314-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Expose SMA and U.FL connectors as ptp_pins on E810-T based adapters and
allow controlling them.
E810-T adapters are equipped with:
- 2 external bidirectional SMA connectors
- 1 internal TX U.FL
- 1 internal RX U.FL
U.FL connectors share signal lines with the SMA connectors. The TX U.FL1
share the line with the SMA1 and the RX U.FL2 share line with the SMA2.
This dependence is controlled by the ice_verify_pin_e810t.
Additionally add support for the E810-T-based devices which don't use the
SMA/U.FL controller. If the IO expander is not detected don't expose pins
and use 2 predefined 1PPS input and output pins.
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
E810-T adapters have two external bidirectional SMA connectors and two
internal unidirectional U.FL connectors. Multiplexing between U.FL and
SMA and SMA direction is controlled using the PCA9575 expander.
Add support for the PCA9575 detection and control of the respective pins
of the SMA/U.FL multiplexer using the GPIO AQ API.
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Implement ice_aq_get_gpio and ice_aq_set_gpio for reading and changing
the state of GPIO pins described in the topology.
Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|