Age | Commit message (Collapse) | Author |
|
DWMAC4 is capable to support clause 45 mdio communication.
This patch enable the feature on stmmac_mdio_write() and
stmmac_mdio_read() by following phy_write_mmd() and
phy_read_mmd() mdiobus read write implementation format.
Reviewed-by: Li, Yifan <yifan2.li@intel.com>
Signed-off-by: Kweh Hock Leong <hock.leong.kweh@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Voon Weifeng <weifeng.voon@intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Break out parts of mshyperv.h that are ISA independent into a
separate file in include/asm-generic. This move facilitates
ARM64 code reusing these definitions and avoids code
duplication. No functionality or behavior is changed.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
|
|
Use netif_ovs_is_port() function instead of open code.
This patch doesn't change logic.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In this release cycle the number of NIC drivers using page_pool
will likely reach 4 drivers. It is about time to add a maintainer
entry. Add myself and Ilias.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Maxime Chevallier says:
====================
net: mvpp2: Add classification based on the ETHER flow
This series adds support for classification of the ETHER flow in the
mvpp2 driver.
The first patch allows detecting when a user specifies a flow_type that
isn't supported by the driver, while the second adds support for this
flow_type by adding the mapping between the ETHER_FLOW enum value and
the relevant classifier flow entries.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Users can specify classification actions based on the 'ether' flow type.
In that case, this will apply to all ethernet traffic, superseeding
flows such as 'udp4' or 'tcp6'.
Add support for this flow type in the PPv2 classifier, by mapping the
ETHER_FLOW value to the corresponding entries in the classifier.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a missing check to detect flow types that we don't support, so that
user can be informed of this.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"The changes in this cycle are:
- RCU flavor consolidation cleanups and optmizations
- Documentation updates
- Miscellaneous fixes
- SRCU updates
- RCU-sync flavor consolidation
- Torture-test updates
- Linux-kernel memory-consistency-model updates, most notably the
addition of plain C-language accesses"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
tools/memory-model: Improve data-race detection
tools/memory-model: Change definition of rcu-fence
tools/memory-model: Expand definition of barrier
tools/memory-model: Do not use "herd" to refer to "herd7"
tools/memory-model: Fix comment in MP+poonceonces.litmus
Documentation: atomic_t.txt: Explain ordering provided by smp_mb__{before,after}_atomic()
rcu: Don't return a value from rcu_assign_pointer()
rcu: Force inlining of rcu_read_lock()
rcu: Fix irritating whitespace error in rcu_assign_pointer()
rcu: Upgrade sync_exp_work_done() to smp_mb()
rcutorture: Upper case solves the case of the vanishing NULL pointer
torture: Suppress propagating trace_printk() warning
rcutorture: Dump trace buffer for callback pipe drain failures
torture: Add --trust-make to suppress "make clean"
torture: Make --cpus override idleness calculations
torture: Run kernel build in source directory
torture: Add function graph-tracing cheat sheet
torture: Capture qemu output
rcutorture: Tweak kvm options
rcutorture: Add trivial RCU implementation
...
|
|
If mmap() fails it returns MAP_FAILED, which is defined as ((void *) -1).
The current if-statement incorrectly tests if *ring is NULL.
Fixes: 358be656406d ("selftests/net: add txring_overwrite")
Signed-off-by: Frank de Brabander <debrabander@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Stefano Garzarella says:
====================
vsock/virtio: several fixes in the .probe() and .remove()
During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
before registering the driver", Stefan pointed out some possible issues
in the .probe() and .remove() callbacks of the virtio-vsock driver.
This series tries to solve these issues:
- Patch 1 adds RCU critical sections to avoid use-after-free of
'the_virtio_vsock' pointer.
- Patch 2 stops workers before to call vdev->config->reset(vdev) to
be sure that no one is accessing the device.
- Patch 3 moves the works flush at the end of the .remove() to avoid
use-after-free of 'vsock' object.
v3:
- Patch 1: use rcu_dereference_protected() to get the_virtio_vosck value in
the virtio_vsock_probe() [Jason]
v2: https://patchwork.kernel.org/cover/11022343/
v1: https://patchwork.kernel.org/cover/10964733/
Before this series the guest crashes in a few second. After this series the
test runs (~12h) without issues.
Tested on an SMP guest (-smp 4 -monitor tcp:127.0.0.1:1234,server,nowait)
with these scripts to stress the .probe()/.remove() path:
- guest
while true; do
cat /dev/urandom | nc-vsock -l 4321 > /dev/null &
cat /dev/urandom | nc-vsock -l 5321 > /dev/null &
cat /dev/urandom | nc-vsock -l 6321 > /dev/null &
cat /dev/urandom | nc-vsock -l 7321 > /dev/null &
wait
done
- host
while true; do
cat /dev/urandom | nc-vsock 3 4321 > /dev/null &
cat /dev/urandom | nc-vsock 3 5321 > /dev/null &
cat /dev/urandom | nc-vsock 3 6321 > /dev/null &
cat /dev/urandom | nc-vsock 3 7321 > /dev/null &
sleep 2
echo "device_del v1" | nc 127.0.0.1 1234
sleep 1
echo "device_add vhost-vsock-pci,id=v1,guest-cid=3" | nc 127.0.0.1 1234
sleep 1
done
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch moves the flush of works after vdev->config->del_vqs(vdev),
because we need to be sure that no workers run before to free the
'vsock' object.
Since we stopped the workers using the [tx|rx|event]_run flags,
we are sure no one is accessing the device while we are calling
vdev->config->reset(vdev), so we can safely move the workers' flush.
Before the vdev->config->del_vqs(vdev), workers can be scheduled
by VQ callbacks, so we must flush them after del_vqs(), to avoid
use-after-free of 'vsock' object.
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before to call vdev->config->reset(vdev) we need to be sure that
no one is accessing the device, for this reason, we add new variables
in the struct virtio_vsock to stop the workers during the .remove().
This patch also add few comments before vdev->config->reset(vdev)
and vdev->config->del_vqs(vdev).
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Some callbacks used by the upper layers can run while we are in the
.remove(). A potential use-after-free can happen, because we free
the_virtio_vsock without knowing if the callbacks are over or not.
To solve this issue we move the assignment of the_virtio_vsock at the
end of .probe(), when we finished all the initialization, and at the
beginning of .remove(), before to release resources.
For the same reason, we do the same also for the vdev->priv.
We use RCU to be sure that all callbacks that use the_virtio_vsock
ended before freeing it. This is not required for callbacks that
use vdev->priv, because after the vdev->config->del_vqs() we are sure
that they are ended and will no longer be invoked.
We also take the mutex during the .remove() to avoid that .probe() can
run while we are resetting the device.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Benedikt Spranger says:
====================
Document the configuration of b53
this is the third round to document the configuration of a b53 supported
switch.
v3..v2:
- fix a typo
- improve b53 configuration in DSA_TAG_PROTO_NONE showcase.
- grade up from RFC to patch for mainline inclusion.
v1..v2:
- split out generic parts of the configuration.
- target comments by Andrew Lunn and Florian Fainelli.
- make changes visible to build system
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Document the different needs of documentation for the b53 driver.
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Document DSA tagged and VLAN based switch configuration by showcases.
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Without CONFIG_OF, we get a build failure in the reboot-mode
implementation:
drivers/power/reset/reboot-mode.c: In function 'reboot_mode_register':
drivers/power/reset/reboot-mode.c:72:2: error: implicit declaration of function 'for_each_property_of_node'; did you mean 'for_each_child_of_node'? [-Werror=implicit-function-declaration]
for_each_property_of_node(np, prop) {
Add a Kconfig dependency like we have for the other users of
CONFIG_REBOOT_MODE.
Fixes: 7a78a7f7695b ("power: reset: nvmem-reboot-mode: use NVMEM as reboot mode write interface")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
Fix to return negative error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: 1f35a56cf586 ("nfp: tls: add/delete TLS TX connections")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Michael Chan says:
====================
bnxt_en: Add XDP_REDIRECT support.
This patch series adds XDP_REDIRECT support by Andy Gospodarek.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This removes contention over page allocation for XDP_REDIRECT actions by
adding page_pool support per queue for the driver. The performance for
XDP_REDIRECT actions scales linearly with the number of cores performing
redirect actions when using the page pools instead of the standard page
allocator.
v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This adds basic support for XDP_REDIRECT in the bnxt_en driver. Next
patch adds the more optimized page pool support.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
__bnxt_xmit_xdp() is used by XDP_TX and ethtool loopback packet transmit.
Refactor it so that it can be re-used by the XDP_REDIRECT logic.
Restructure the TX interrupt handler logic to cleanly separate XDP_TX
logic in preparation for XDP_REDIRECT.
Acked-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Renaming bnxt_xmit_xdp to __bnxt_xmit_xdp to get ready for XDP_REDIRECT
support and reduce confusion/namespace collision.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Ivan Khoronzhuk says:
====================
net: ethernet: ti: cpsw: Add XDP support
This patchset adds XDP support for TI cpsw driver and base it on
page_pool allocator. It was verified on af_xdp socket drop,
af_xdp l2f, ebpf XDP_DROP, XDP_REDIRECT, XDP_PASS, XDP_TX.
It was verified with following configs enabled:
CONFIG_JIT=y
CONFIG_BPFILTER=y
CONFIG_BPF_SYSCALL=y
CONFIG_XDP_SOCKETS=y
CONFIG_BPF_EVENTS=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_JIT=y
CONFIG_CGROUP_BPF=y
Link on previous v7:
https://lkml.org/lkml/2019/7/4/715
Also regular tests with iperf2 were done in order to verify impact on
regular netstack performance, compared with base commit:
https://pastebin.com/JSMT0iZ4
v8..v9:
- fix warnings on arm64 caused by typos in type casting
v7..v8:
- corrected dma calculation based on headroom instead of hard start
- minor comment changes
v6..v7:
- rolled back to v4 solution but with small modification
- picked up patch:
https://www.spinics.net/lists/netdev/msg583145.html
- added changes related to netsec fix and cpsw
v5..v6:
- do changes that is rx_dev while redirect/flush cycle is kept the same
- dropped net: ethernet: ti: davinci_cpdma: return handler status
- other changes desc in patches
v4..v5:
- added two plreliminary patches:
net: ethernet: ti: davinci_cpdma: allow desc split while down
net: ethernet: ti: cpsw_ethtool: allow res split while down
- added xdp alocator refcnt on xdp level, avoiding page pool refcnt
- moved flush status as separate argument for cpdma_chan_process
- reworked cpsw code according to last changes to allocator
- added missed statistic counter
v3..v4:
- added page pool user counter
- use same pool for ndevs in dual mac
- restructured page pool create/destroy according to the last changes in API
v2..v3:
- each rxq and ndev has its own page pool
v1..v2:
- combined xdp_xmit functions
- used page allocation w/o refcnt juggle
- unmapped page for skb netstack
- moved rxq/page pool allocation to open/close pair
- added several preliminary patches:
net: page_pool: add helper function to retrieve dma addresses
net: page_pool: add helper function to unmap dma addresses
net: ethernet: ti: cpsw: use cpsw as drv data
net: ethernet: ti: cpsw_ethtool: simplify slave loops
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add XDP support based on rx page_pool allocator, one frame per page.
Page pool allocator is used with assumption that only one rx_handler
is running simultaneously. DMA map/unmap is reused from page pool
despite there is no need to map whole page.
Due to specific of cpsw, the same TX/RX handler can be used by 2
network devices, so special fields in buffer are added to identify
an interface the frame is destined to. Thus XDP works for both
interfaces, that allows to test xdp redirect between two interfaces
easily. Also, each rx queue have own page pools, but common for both
netdevs.
XDP prog is common for all channels till appropriate changes are added
in XDP infrastructure. Also, once page_pool recycling becomes part of
skb netstack some simplifications can be added, like removing
page_pool_release_page() before skb receive.
In order to keep rx_dev while redirect, that can be somehow used in
future, do flush in rx_handler, that allows to keep rx dev the same
while redirect. It allows to conform with tracing rx_dev pointed
by Jesper.
Also, there is probability, that XDP generic code can be extended to
support multi ndev drivers like this one, using same rx queue for
several ndevs, based on switchdev for instance or else. In this case,
driver can be modified like exposed here:
https://lkml.org/lkml/2019/7/3/243
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
That's possible to set channel num while interfaces are down. When
interface gets up it should resplit budget. This resplit can happen
after phy is up but only if speed is changed, so should be set before
this, for this allow it to happen while changing number of channels,
when interfaces are down.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
That's possible to set ring params while interfaces are down. When
interface gets up it uses number of descs to fill rx queue and on
later on changes to create rx pools. Usually, this resplit can happen
after phy is up, but it can be needed before this, so allow it to
happen while setting number of rx descs, when interfaces are down.
Also, if no dependency on intf state, move it to cpdma layer, where
it should be.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In case if dma mapped packet needs to be sent, like with XDP
page pool, the "mapped" submit can be used. This patch adds dma
mapped submit based on regular one.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Jesper recently removed page_pool_destroy() (from driver invocation)
and moved shutdown and free of page_pool into xdp_rxq_info_unreg(),
in-order to handle in-flight packets/pages. This created an asymmetry
in drivers create/destroy pairs.
This patch reintroduce page_pool_destroy and add page_pool user
refcnt. This serves the purpose to simplify drivers error handling as
driver now drivers always calls page_pool_destroy() and don't need to
track if xdp_rxq_info_reg_mem_model() was unsuccessful.
This could be used for a special cases where a single RX-queue (with a
single page_pool) provides packets for two net_device'es, and thus
needs to register the same page_pool twice with two xdp_rxq_info
structures.
This patch is primarily to ease API usage for drivers. The recently
merged netsec driver, actually have a bug in this area, which is
solved by this API change.
This patch is a modified version of Ivan Khoronzhuk's original patch.
Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
Fixes: 5c67bf0ec4d0 ("net: netsec: Use page_pool API")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The third argument 'nomap' of early_init_dt_reserve_memory_arch() is
bool. It is preferred to pass it with a bool type parameter.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
Without the __always_inline at least i386 configs that have
CONFIG_OPTIMIZE_INLINING set seem fail to inline
dma_alloc_need_uncached, leading to a linker error because of
undefined symbols.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
|
|
When using the automarkup extension with:
make pdfdocs
without passing an specific book, the code will raise an exception:
File "/devel/v4l/docs/Documentation/sphinx/automarkup.py", line 86, in auto_markup
node.parent.replace(node, markup_funcs(name, app, node))
File "/devel/v4l/docs/Documentation/sphinx/automarkup.py", line 59, in markup_funcs
'function', target, pxref, lit_text)
File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/domains/c.py", line 308, in resolve_xref
contnode, target)
File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/util/nodes.py", line 450, in make_refnode
'#' + targetid)
File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/builders/latex/__init__.py", line 159, in get_relative_uri
return self.get_target_uri(to, typ)
File "/devel/v4l/docs/sphinx_2.0/lib/python3.7/site-packages/sphinx/builders/latex/__init__.py", line 152, in get_target_uri
raise NoUri
sphinx.environment.NoUri
This happens because not all references will belong to a single
PDF/LaTeX document.
Better to just ignore those than breaking Sphinx build.
Fixes: d74b0d31ddde ("Docs: An initial automarkup extension for sphinx")
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
[jc: Narrowed the "except" and tweaked the comment]
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
The documentation is more appropriate for the administrator than for
the internal kernel API section it is currently in.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Christian Brauner <christian@brauner.io>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
|
|
There is no need to cast "u64" to "unsigned long long" before printing
it, as both types have been made identical on all architectures many
years ago.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rob Herring <robh@kernel.org>
|
|
While creating new RDMA devices based on netdevice name, consider the net
namespace of the caller skb's socket similar to rest of the doit()
callbacks and nldev_dellink() which deletes the RDMA device created using
nldev_newlink().
Fixes: 3856ec4b93c94 ("RDMA/core: Add RDMA_NLDEV_CMD_NEWLINK/DELLINK support")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
The frags_q is not properly initialized, it may result in illegal memory
access when conn_info is NULL.
The "goto free_exit" should be replaced by "goto exit".
Signed-off-by: Yang Wei <albin_yang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When CONFIG_OF is disabled, we get a harmless warning about the
newly added variable:
drivers/net/ethernet/cadence/macb_main.c:48:39: error: 'mgmt' defined but not used [-Werror=unused-variable]
static struct sifive_fu540_macb_mgmt *mgmt;
Move the variable closer to its use inside of the #ifdef.
Fixes: c218ad559020 ("macb: Add support for SiFive FU540-C000")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On unusual page sizes, we get harmless warnings:
drivers/net/ethernet/google/gve/gve_rx.c:283:6: error: unused variable 'pagecount' [-Werror,-Wunused-variable]
drivers/net/ethernet/google/gve/gve_rx.c:336:1: error: unused label 'have_skb' [-Werror,-Wunused-label]
Change the preprocessor #if to regular if() to avoid this.
Fixes: f5cedc84a30d ("gve: Add transmit and receive support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Calculate the correct byte_len on the receiving side when a work
completion is generated with IB_WC_RECV_RDMA_WITH_IMM opcode.
According to the IBA byte_len must indicate the number of written bytes,
whereas it was always equal to zero for the IB_WC_RECV_RDMA_WITH_IMM
opcode, even though data was transferred.
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Konstantin Taranov <konstantin.taranov@inf.ethz.ch>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Enable RDMA DIM by default for better user experience.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Added parameter in ib_device for enabling dynamic interrupt moderation so
that it can be configured in userspace using rdma tool.
In order to set adaptive-moderation for an ib device the command is:
rdma dev set [DEV] adaptive-moderation [on|off]
Please set on/off.
rdma dev show
0: mlx5_0: node_type ca fw 16.26.0055 node_guid 248a:0703:00a5:29d0
sys_image_guid 248a:0703:00a5:29d0 adaptive-moderation on
rdma resource show cq
dev mlx5_0 cqn 0 cqe 1023 users 4 poll-ctx UNBOUND_WORKQUEUE
adaptive-moderation off comm [ib_core]
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Added the interface in the infiniband driver that applies the rdma_dim
adaptive moderation. There is now a special function for allocating an
ib_cq that uses rdma_dim.
Performance improvement (ConnectX-5 100GbE, x86) running FIO benchmark over
NVMf between two equal end-hosts with 56 cores across a Mellanox switch
using null_blk device:
READS without DIM:
blk size | BW | IOPS | 99th percentile latency | 99.99th latency
512B | 3.8GiB/s | 7.7M | 1401 usec | 2442 usec
4k | 7.0GiB/s | 1.8M | 4817 usec | 6587 usec
64k | 10.7GiB/s| 175k | 9896 usec | 10028 usec
IO WRITES without DIM:
blk size | BW | IOPS | 99th percentile latency | 99.99th latency
512B | 3.6GiB/s | 7.5M | 1434 usec | 2474 usec
4k | 6.3GiB/s | 1.6M | 938 usec | 1221 usec
64k | 10.7GiB/s| 175k | 8979 usec | 12780 usec
IO READS with DIM:
blk size | BW | IOPS | 99th percentile latency | 99.99th latency
512B | 4GiB/s | 8.2M | 816 usec | 889 usec
4k | 10.1GiB/s| 2.65M| 3359 usec | 5080 usec
64k | 10.7GiB/s| 175k | 9896 usec | 10028 usec
IO WRITES with DIM:
blk size | BW | IOPS | 99th percentile latency | 99.99th latency
512B | 3.9GiB/s | 8.1M | 799 usec | 922 usec
4k | 9.6GiB/s | 2.5M | 717 usec | 1004 usec
64k | 10.7GiB/s| 176k | 8586 usec | 12256 usec
The rdma_dim algorithm was designed to measure the effectiveness of
moderation on the flow in a general way and thus should be appropriate
for all RDMA storage protocols.
rdma_dim is configured to be the default option based on performance
improvement seen after extensive tests.
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
RDMA DIM implements a different algorithm from net DIM and is based on
completions which is how we can implement interrupt moderation in RDMA.
The algorithm optimizes for number of completions and ratio between
completions and events. In order to avoid long latencies, the
implementation performs fast reduction of moderation level when the
traffic changes.
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Generic DIM
From: Tal Gilboa and Yamin Fridman
Implement net DIM over a generic DIM library, add RDMA DIM
dim.h lib exposes an implementation of the DIM algorithm for
dynamically-tuned interrupt moderation for networking interfaces.
We want a similar functionality for other protocols, which might need to
optimize interrupts differently. Main motivation here is DIM for NVMf
storage protocol.
Current DIM implementation prioritizes reducing interrupt overhead over
latency. Also, in order to reduce DIM's own overhead, the algorithm might
take some time to identify it needs to change profiles. While this is
acceptable for networking, it might not work well on other scenarios.
Here we propose a new structure to DIM. The idea is to allow a slightly
modified functionality without the risk of breaking Net DIM behavior for
netdev. We verified there are no degradations in current DIM behavior with
the modified solution.
Suggested solution:
- Common logic is implemented in lib/dim/dim.c
- Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
the common logic in dim.c
- Any new DIM logic will be implemented in "lib/dim/new_dim.c".
This new implementation will expose modified versions of profiles,
dim_step() and dim_decision().
- DIM API is declared in include/linux/dim.h for all implementations.
Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- Relatively more code reuse (compared to two separate solutions)
- Increased extensibility
Required for dependencies in the next series.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
Ben Hutchings says:
"This is the wrong place to change the queue mapping.
stmmac_xmit() is called with a specific TX queue locked,
and accessing a different TX queue results in a data race
for all of that queue's state.
I think this commit should be reverted upstream and in all
stable branches. Instead, the driver should implement the
ndo_select_queue operation and override the queue mapping there."
Fixes: c5acdbee22a1 ("net: stmmac: Send TSO packets always from Queue 0")
Suggested-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
On 32-bit architectures, dividing a 64-bit integer in the kernel
leads to a link error:
ERROR: "__udivdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
ERROR: "__divdi3" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
Change the two recently introduced instances to a multiply+shift
operation that is also much cheaper on 32-bit architectures.
We can do that here, since both of them are really 32-bit numbers
that change a few percent.
Fixes: bedbbe6af4be ("drm/amd/display: Move link functions from dc to dc_link")
Fixes: f18bc4e53ad6 ("drm/amd/display: update calculated bounding box logic for NV")
Acked-by: Slava Abramov <slava.abramov@amd.com>
Tested-by: Slava Abramov <slava.abramov@amd.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
"The speculative paranoia departement delivers a few more plugs for
possible (probably theoretical) spectre/mds leaks"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tls: Fix possible spectre-v1 in do_get_thread_area()
x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
x86/speculation/mds: Eliminate leaks by trace_hardirqs_on()
|
|
This is only at notice level but it was pointed out that no other driver
does this.
Also there is no action the user can take as it is really a property of
the server.
Signed-off-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Thomas Gleixner:
"A rather large series consolidating the HPET code, which was triggered
by the attempt to bolt HPET NMI watchdog support on to the existing
maze with the usual duct tape and super glue approach.
This mainly removes two separate partially redundant storage layers
and consolidates them into a single one which provides a consistent
view of the different HPET channels and their usage and allows to
integrate HPET NMI watchdog support (if it turns out to be feasible)
in a non intrusive way"
* 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
x86/hpet: Use channel for legacy clockevent storage
x86/hpet: Use common init for legacy clockevent
x86/hpet: Carve out shareable parts of init_one_hpet_msi_clockevent()
x86/hpet: Consolidate clockevent functions
x86/hpet: Wrap legacy clockevent in hpet_channel
x86/hpet: Use cached info instead of extra flags
x86/hpet: Move clockevents into channels
x86/hpet: Rename variables to prepare for switching to channels
x86/hpet: Add function to select a /dev/hpet channel
x86/hpet: Add mode information to struct hpet_channel
x86/hpet: Use cached channel data
x86/hpet: Introduce struct hpet_base and struct hpet_channel
x86/hpet: Coding style cleanup
x86/hpet: Clean up comments
x86/hpet: Make naming consistent
x86/hpet: Remove not required includes
x86/hpet: Decapitalize and rename EVT_TO_HPET_DEV
x86/hpet: Simplify counter validation
x86/hpet: Separate counter check out of clocksource register code
x86/hpet: Shuffle code around for readability sake
...
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Move bridge keys in nft_meta to nft_meta_bridge, from wenxu.
2) Support for bridge pvid matching, from wenxu.
3) Support for bridge vlan protocol matching, also from wenxu.
4) Add br_vlan_get_pvid_rcu(), to fetch the bridge port pvid
from packet path.
5) Prefer specific family extension in nf_tables.
6) Autoload specific family extension in case it is missing.
7) Add synproxy support to nf_tables, from Fernando Fernandez Mancera.
8) Support for GRE encapsulation in IPVS, from Vadim Fedorenko.
9) ICMP handling for GRE encapsulation, from Julian Anastasov.
10) Remove unused parameter in nf_queue, from Florian Westphal.
11) Replace seq_printf() by seq_puts() in nf_log, from Markus Elfring.
12) Rename nf_SYNPROXY.h => nf_synproxy.h before this header becomes
public.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|