Age | Commit message (Collapse) | Author |
|
If the object type doesn't implement an update operation and the user tries to
update it will silently ignore the update operation.
Fixes: aa4095a156b5 ("netfilter: nf_tables: fix possible null-pointer dereference in object update")
Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Invoking the following commands on a 32-bit architecture with strict
alignment requirements (such as an ARMv7-based Raspberry Pi) results
in an alignment exception:
# nft add table ip test-ip4
# nft add chain ip test-ip4 output { type filter hook output priority 0; }
# nft add rule ip test-ip4 output quota 1025 bytes
Alignment trap: not handling instruction e1b26f9f at [<7f4473f8>]
Unhandled fault: alignment exception (0x001) at 0xb832e824
Internal error: : 1 [#1] PREEMPT SMP ARM
Hardware name: BCM2835
[<7f4473fc>] (nft_quota_do_init [nft_quota])
[<7f447448>] (nft_quota_init [nft_quota])
[<7f4260d0>] (nf_tables_newrule [nf_tables])
[<7f4168dc>] (nfnetlink_rcv_batch [nfnetlink])
[<7f416bd0>] (nfnetlink_rcv [nfnetlink])
[<8078b334>] (netlink_unicast)
[<8078b664>] (netlink_sendmsg)
[<8071b47c>] (sock_sendmsg)
[<8071bd18>] (___sys_sendmsg)
[<8071ce3c>] (__sys_sendmsg)
[<8071ce94>] (sys_sendmsg)
The reason is that nft_quota_do_init() calls atomic64_set() on an
atomic64_t which is only aligned to 32-bit, not 64-bit, because it
succeeds struct nft_expr in memory which only contains a 32-bit pointer.
Fix by aligning the nft_expr private data to 64-bit.
Fixes: 96518518cc41 ("netfilter: add nftables")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since v5.2 (commit "netlink: re-add parse/validate functions in strict
mode") NL_VALIDATE_STRICT is enabled. Fix the ipset nla_policies which did
not support strict mode and convert from deprecated parsings to verified ones.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
Same as commit 1b4a75108d5b ("netfilter: ipset: Copy the right MAC
address in bitmap:ip,mac and hash:ip,mac sets"), another copy and paste
went wrong in commit 8cc4ccf58379 ("netfilter: ipset: Allow matching on
destination MAC address for mac and ipmac sets").
When I fixed this for IPv4 in 1b4a75108d5b, I didn't realise that
hash:ip,mac sets also support IPv6 as family, and this is covered by a
separate function, hash_ipmac6_kadt().
In hash:ip,mac sets, the first dimension is the IP address, and the
second dimension is the MAC address: check the IPSET_DIM_TWO_SRC flag
in flags while deciding which MAC address to copy, destination or
source.
This way, mixing source and destination matches for the two dimensions
of ip,mac hash type works as expected, also for IPv6. With this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 2001:db8::1/64 dev veth1
ip -net A addr add 2001:db8::2/64 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset create test_hash hash:ip,mac family inet6
ip netns exec A ipset add test_hash 2001:db8::1,${dst}
ip netns exec A ip6tables -A INPUT -p icmpv6 --icmpv6-type 135 -j ACCEPT
ip netns exec A ip6tables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
ipset now correctly matches a test packet:
# ping -c1 2001:db8::2 >/dev/null
# echo $?
0
Reported-by: Chen, Yi <yiche@redhat.com>
Fixes: 8cc4ccf58379 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
The copy_to_user() function returns the number of bytes remaining to be
copied. In this code, that positive return is checked at the end of the
function and we return zero/success. What we should do instead is
return -EFAULT.
Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
|
|
In preparation of AF XDP, move functions that will be used both by skb and
zero-copy paths to a new file called ice_txrx_lib.c. This allows us to
avoid using ifdefs to control the staticness of said functions.
Move other functions (ice_rx_csum, ice_rx_hash and ice_ptype_to_htype)
called only by the moved ones to the new file as well.
Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
For some reason I missed the case of DCCP passive
flows in my previous patch.
Fixes: a904a0693c18 ("inet: stop leaking jiffies on the wire")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Thiemo Nagel <tnagel@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This driver forgets to disable and unprepare clks when remove.
Add calls to clk_disable_unprepare to fix it.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Commit 5bc60de50dfe ("selftests: bpf: Don't try to read files without
read permission") got reverted as the fix was not working as expected
and real fix came in via 8101e069418d ("selftests: bpf: Skip write
only files in debugfs"). When bpf-next got merged into net-next, the
test_offload.py had a small conflict. Fix the resolution in ae8a76fb8b5d
iby not reintroducing 5bc60de50dfe again.
Fixes: ae8a76fb8b5d ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Building with W=1 (cf.scripts/Makefile.extrawarn) outputs:
warning: variable ‘ret’ set but not used [-Wunused-but-set-variable]
Drop the unused 'ret' variable.
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
IOC3 hardware needs a 16k aligned TX ring.
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Set NETIF_F_HIGHDMA together with the NETIF_F_IP_CSUM flag instead of
letting the second assignment overwrite it. Probably doesn't matter
in practice as none of the systems an IOC3 is usually found in has
highmem to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need to fall back to a lower mask these days, the DMA mask
just communicates the hardware supported features.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dma_alloc_coherent always zeroes memory, there is no need for
__GFP_ZERO. Also doing a GFP_ATOMIC allocation just before a GFP_KERNEL
one is clearly bogus.
Fixes: ed870f6a7aa2 ("net: sgi: ioc3-eth: use dma-direct for dma allocations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
dma_direct_ is a low-level API that must never be used by drivers
directly. Switch to use the proper DMA API instead.
Fixes: ed870f6a7aa2 ("net: sgi: ioc3-eth: use dma-direct for dma allocations")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Faster jhash2() can be used instead of jhash(), since
IPv6 addresses have the needed alignment requirement.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Before this change of_get_phy_mode() returned an enum,
phy_interface_t. On error, -ENODEV etc, is returned. If the result of
the function is stored in a variable of type phy_interface_t, and the
compiler has decided to represent this as an unsigned int, comparision
with -ENODEV etc, is a signed vs unsigned comparision.
Fix this problem by changing the API. Make the function return an
error, or 0 on success, and pass a pointer, of type phy_interface_t,
where the phy mode should be stored.
v2:
Return with *interface set to PHY_INTERFACE_MODE_NA on error.
Add error checks to all users of of_get_phy_mode()
Fixup a few reverse christmas tree errors
Fixup a few slightly malformed reverse christmas trees
v3:
Fix 0-day reported errors.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When commit df1c0b8468b3 ("[BRIDGE]: Packets leaking out of
disabled/blocked ports.") introduced the port state tests in
br_fdb_update() it was to avoid learning/refreshing from STP BPDUs, it was
also used to avoid learning/refreshing from user-space with NTF_USE. Those
two tests are done for every packet entering the bridge if it's learning,
but for the fast-path we already have them checked in br_handle_frame() and
is unnecessary to do it again. Thus push the checks to the unlikely cases
and drop them from br_fdb_update(), the new nbp_state_should_learn() helper
is used to determine if the port state allows br_fdb_update() to be called.
The two places which need to do it manually are:
- user-space add call with NTF_USE set
- link-local packet learning done in __br_handle_local_finish()
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add support for XDP. Implement ndo_bpf and ndo_xdp_xmit. Upon load of
an XDP program, allocate additional Tx rings for dedicated XDP use.
The following actions are supported: XDP_TX, XDP_DROP, XDP_REDIRECT,
XDP_PASS, and XDP_ABORTED.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
There's no reason for treating DCB as first class citizen when configuring
the Tx queues and going through TCs. Reverse the logic and base the
configuration logic on rings, which is the object of interest anyway.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
Remove a few uses of kernel configuration flags from ice_lib.c by
introducing a new source file ice_base.c. Also move corresponding
function prototypes from ice_lib.h to ice_base.h and include ice_base.h
where required.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/snawrocki/clk into clk-fixes
Pull Samsung clk driver fixes from Sylwester Nawrocki:
- system suspend related fixes for the exynos542x clocks driver
- probe() error paths fixes in the exynos5433 CMU driver adding
proper release of memory and clk resources
* tag 'clk-v5.4-samsung-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/snawrocki/clk:
clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume
clk: samsung: exynos542x: Move G3D subsystem clocks to its sub-CMU
clk: samsung: exynos5433: Fix error paths
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes
Two patches that fix some operator precedence and zeroing of bits
* tag 'sunxi-clk-fixes-for-5.4-1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18
clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup
|
|
Commit 3d8598fb9c5a ("clk: ti: clkctrl: use fallback udelay approach if
timekeeping is suspended") added handling for cases when timekeeping is
suspended. But looks like we can still get occasional "failed to enable"
errors on the PM runtime resume path with udelay() returning faster than
expected.
With ti-sysc interconnect target module driver this leads into device
failure with PM runtime failing with "failed to enable" clkctrl error.
Let's fix the issue with a delay of two times the desired delay as in
often done for udelay() to account for the inaccuracy.
Fixes: 3d8598fb9c5a ("clk: ti: clkctrl: use fallback udelay approach if timekeeping is suspended")
Cc: Keerthy <j-keerthy@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Link: https://lkml.kernel.org/r/20190930154001.46581-1-tony@atomide.com
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
ti_clk_register() calls it already so the driver should not create
duplicated alias.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Link: https://lkml.kernel.org/r/20191002083436.10194-1-peter.ujfalusi@ti.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
Make sure register data length does not mismatch immediate data length,
otherwise hit EOPNOTSUPP.
Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
When ASoC card instance is removed containing a HDA codec,
hdac_hda_codec_remove() may run in parallel with codec resume.
This will cause problems if the HDA link is freed with
snd_hdac_ext_bus_link_put() while the codec is still in
middle of its resume process.
To fix this, change the order such that pm_runtime_disable()
is called before the link is freed. This will ensure any
pending runtime PM action is completed before proceeding
to free the link.
This issue can be easily hit with e.g. SOF driver by loading and
unloading the drivers.
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20191101170635.26389-1-pierre-louis.bossart@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
When the compiler decides not to inline the Chunky-to-Planar core
functions, the build fails with:
c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported'
c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported'
c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported'
c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported'
Fix this by marking the functions __always_inline.
While this could be triggered before by manually enabling both
CONFIG_OPTIMIZE_INLINING and CONFIG_CC_OPTIMIZE_FOR_SIZE, it was exposed
in the m68k defconfig by commit ac7c3e4ff401b304 ("compiler: enable
CONFIG_OPTIMIZE_INLINING forcibly").
Fixes: 9012d011660ea5cf ("compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING")
Reported-by: noreply@ellerman.id.au
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190927094708.11563-1-geert@linux-m68k.org
|
|
Saffire Pro i/o series
For Focusrite Saffire Pro i/o, the lowest 8 bits of register represents
configured source of sampling clock. The next lowest 8 bits represents
whether the configured source is actually detected or not just after
the register is changed for the source.
Current implementation evaluates whole the register to detect configured
source. This results in failure due to the next lowest 8 bits when the
source is connected in advance.
This commit fixes the bug.
Fixes: 25784ec2d034 ("ALSA: bebob: Add support for Focusrite Saffire/SaffirePro series")
Cc: <stable@vger.kernel.org> # v3.16+
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Link: https://lore.kernel.org/r/20191102150920.20367-1-o-takashi@sakamocchi.jp
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-11-01
Misc updates for mlx5 netdev and core driver
1) Steering Core: Replace CRC32 internal implementation with standard
kernel lib.
2) Steering Core: Support IPv4 and IPv6 mixed matcher.
3) Steering Core: Lockless FTE read lookups
4) TC: Bit sized fields rewrite support.
5) FPGA: Standalone FPGA support.
6) SRIOV: Reset VF parameters configurations on SRIOV disable.
7) netdev: Dump WQs wqe descriptors on CQE with error events.
8) MISC Cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
drivers/isdn/hardware/mISDN/mISDNisar.c:30:17:
warning: faxmodulation_s defined but not used [-Wunused-const-variable=]
It is never used, so can be removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The IDT ClockMatrix (TM) family includes integrated devices that provide
eight PLL channels. Each PLL channel can be independently configured as a
frequency synthesizer, jitter attenuator, digitally controlled
oscillator (DCO), or a digital phase lock loop (DPLL). Typically
these devices are used as timing references and clock sources for PTP
applications. This patch adds support for the device.
Co-developed-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add device tree binding doc for the IDT ClockMatrix PTP clock.
Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
traceroute6 output can be confusing, in that it shows the address
that a router would use to reach the sender, rather than the address
the packet used to reach the router.
Consider this case:
------------------------ N2
| |
------ ------ N3 ----
| R1 | | R2 |------|H2|
------ ------ ----
| |
------------------------ N1
|
----
|H1|
----
where H1's default route is through R1, and R1's default route is
through R2 over N2.
traceroute6 from H1 to H2 shows R2's address on N1 rather than on N2.
The script below can be used to reproduce this scenario.
traceroute6 output without this patch:
traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets
1 2000:101::1 (2000:101::1) 0.036 ms 0.008 ms 0.006 ms
2 2000:101::2 (2000:101::2) 0.011 ms 0.008 ms 0.007 ms
3 2000:103::4 (2000:103::4) 0.013 ms 0.010 ms 0.009 ms
traceroute6 output with this patch:
traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets
1 2000:101::1 (2000:101::1) 0.056 ms 0.019 ms 0.006 ms
2 2000:102::2 (2000:102::2) 0.013 ms 0.008 ms 0.008 ms
3 2000:103::4 (2000:103::4) 0.013 ms 0.009 ms 0.009 ms
#!/bin/bash
#
# ------------------------ N2
# | |
# ------ ------ N3 ----
# | R1 | | R2 |------|H2|
# ------ ------ ----
# | |
# ------------------------ N1
# |
# ----
# |H1|
# ----
#
# N1: 2000:101::/64
# N2: 2000:102::/64
# N3: 2000:103::/64
#
# R1's host part of address: 1
# R2's host part of address: 2
# H1's host part of address: 3
# H2's host part of address: 4
#
# For example:
# the IPv6 address of R1's interface on N2 is 2000:102::1/64
#
# Nets are implemented by macvlan interfaces (bridge mode) over
# dummy interfaces.
#
# Create net namespaces
ip netns add host1
ip netns add host2
ip netns add rtr1
ip netns add rtr2
# Create nets
ip link add net1 type dummy; ip link set net1 up
ip link add net2 type dummy; ip link set net2 up
ip link add net3 type dummy; ip link set net3 up
# Add interfaces to net1, move them to their nemaspaces
ip link add link net1 dev host1net1 type macvlan mode bridge
ip link set host1net1 netns host1
ip link add link net1 dev rtr1net1 type macvlan mode bridge
ip link set rtr1net1 netns rtr1
ip link add link net1 dev rtr2net1 type macvlan mode bridge
ip link set rtr2net1 netns rtr2
# Add interfaces to net2, move them to their nemaspaces
ip link add link net2 dev rtr1net2 type macvlan mode bridge
ip link set rtr1net2 netns rtr1
ip link add link net2 dev rtr2net2 type macvlan mode bridge
ip link set rtr2net2 netns rtr2
# Add interfaces to net3, move them to their nemaspaces
ip link add link net3 dev rtr2net3 type macvlan mode bridge
ip link set rtr2net3 netns rtr2
ip link add link net3 dev host2net3 type macvlan mode bridge
ip link set host2net3 netns host2
# Configure interfaces and routes in host1
ip netns exec host1 ip link set lo up
ip netns exec host1 ip link set host1net1 up
ip netns exec host1 ip -6 addr add 2000:101::3/64 dev host1net1
ip netns exec host1 ip -6 route add default via 2000:101::1
# Configure interfaces and routes in rtr1
ip netns exec rtr1 ip link set lo up
ip netns exec rtr1 ip link set rtr1net1 up
ip netns exec rtr1 ip -6 addr add 2000:101::1/64 dev rtr1net1
ip netns exec rtr1 ip link set rtr1net2 up
ip netns exec rtr1 ip -6 addr add 2000:102::1/64 dev rtr1net2
ip netns exec rtr1 ip -6 route add default via 2000:102::2
ip netns exec rtr1 sysctl net.ipv6.conf.all.forwarding=1
# Configure interfaces and routes in rtr2
ip netns exec rtr2 ip link set lo up
ip netns exec rtr2 ip link set rtr2net1 up
ip netns exec rtr2 ip -6 addr add 2000:101::2/64 dev rtr2net1
ip netns exec rtr2 ip link set rtr2net2 up
ip netns exec rtr2 ip -6 addr add 2000:102::2/64 dev rtr2net2
ip netns exec rtr2 ip link set rtr2net3 up
ip netns exec rtr2 ip -6 addr add 2000:103::2/64 dev rtr2net3
ip netns exec rtr2 sysctl net.ipv6.conf.all.forwarding=1
# Configure interfaces and routes in host2
ip netns exec host2 ip link set lo up
ip netns exec host2 ip link set host2net3 up
ip netns exec host2 ip -6 addr add 2000:103::4/64 dev host2net3
ip netns exec host2 ip -6 route add default via 2000:103::2
# Ping host2 from host1
ip netns exec host1 ping6 -c5 2000:103::4
# Traceroute host2 from host1
ip netns exec host1 traceroute6 2000:103::4
# Delete nets
ip link del net3
ip link del net2
ip link del net1
# Delete namespaces
ip netns del rtr2
ip netns del rtr1
ip netns del host2
ip netns del host1
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Original-patch-by: Honggang Xu <hxu@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As mentioned in commit e95584a889e1 ("tipc: fix unlimited bundling of
small messages"), the current message bundling algorithm is inefficient
that can generate bundles of only one payload message, that causes
unnecessary overheads for both the sender and receiver.
This commit re-designs the 'tipc_msg_make_bundle()' function (now named
as 'tipc_msg_try_bundle()'), so that when a message comes at the first
place, we will just check & keep a reference to it if the message is
suitable for bundling. The message buffer will be put into the link
backlog queue and processed as normal. Later on, when another one comes
we will make a bundle with the first message if possible and so on...
This way, a bundle if really needed will always consist of at least two
payload messages. Otherwise, we let the first buffer go its way without
any need of bundling, so reduce the overheads to zero.
Moreover, since now we have both the messages in hand, we can even
optimize the 'tipc_msg_bundle()' function, make bundle of a very large
(size ~ MSS) and small messages which is not with the current algorithm
e.g. [1400-byte message] + [10-byte message] (MTU = 1500).
Acked-by: Ying Xue <ying.xue@windreiver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the
primary address of the interface the packet was received on, even if
the path goes through a secondary address. In the example:
1.0.3.1/24
---- 1.0.1.3/24 1.0.1.1/24 ---- 1.0.2.1/24 1.0.2.4/24 ----
|H1|--------------------------|R1|--------------------------|H2|
---- N1 ---- N2 ----
where 1.0.3.1/24 is R1's primary address on N1, traceroute from
H1 to H2 returns:
traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
1 1.0.3.1 (1.0.3.1) 0.018 ms 0.006 ms 0.006 ms
2 1.0.2.4 (1.0.2.4) 0.021 ms 0.007 ms 0.007 ms
After applying this patch, it returns:
traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets
1 1.0.1.1 (1.0.1.1) 0.033 ms 0.007 ms 0.006 ms
2 1.0.2.4 (1.0.2.4) 0.011 ms 0.007 ms 0.007 ms
Original-patch-by: Bill Fenner <fenner@arista.com>
Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Tonghao Zhang says:
====================
optimize openvswitch flow looking up
This series patch optimize openvswitch for performance or simplify
codes.
Patch 1, 2, 4: Port Pravin B Shelar patches to
linux upstream with little changes.
Patch 5, 6, 7: Optimize the flow looking up and
simplify the flow hash.
Patch 8, 9: are bugfix.
The performance test is on Intel Xeon E5-2630 v4.
The test topology is show as below:
+-----------------------------------+
| +---------------------------+ |
| | eth0 ovs-switch eth1 | | Host0
| +---------------------------+ |
+-----------------------------------+
^ |
| |
| |
| |
| v
+-----+----+ +----+-----+
| netperf | Host1 | netserver| Host2
+----------+ +----------+
We use netperf send the 64B packets, and insert 255+ flow-mask:
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2
...
$ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2
$
$ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18
* Without series patch, throughput 8.28Mbps
* With series patch, throughput 46.05Mbps
v6:
some coding style fixes
v5:
rewrite patch 8, release flow-mask when freeing flow
v4:
access ma->count with READ_ONCE/WRITE_ONCE API. More information,
see patch 5 comments.
v3:
update ma point when realloc mask_array in patch 5
v2:
simplify codes. e.g. use kfree_rcu instead of call_rcu
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
use the specified functions to init resource.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Unlocking of a not locked mutex is not allowed.
Other kernel thread may be in critical section while
we unlock it because of setting user_feature fail.
Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index")
Cc: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When we destroy the flow tables which may contain the flow_mask,
so release the flow mask struct.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The most case *index < ma->max, and flow-mask is not NULL.
We add un/likely for performance.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simplify the code and remove the unnecessary BUILD_BUG_ON.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The full looking up on flow table traverses all mask array.
If mask-array is too large, the number of invalid flow-mask
increase, performance will be drop.
One bad case, for example: M means flow-mask is valid and NULL
of flow-mask means deleted.
+-------------------------------------------+
| M | NULL | ... | NULL | M|
+-------------------------------------------+
In that case, without this patch, openvswitch will traverses all
mask array, because there will be one flow-mask in the tail. This
patch changes the way of flow-mask inserting and deleting, and the
mask array will be keep as below: there is not a NULL hole. In the
fast path, we can "break" "for" (not "continue") in flow_lookup
when we get a NULL flow-mask.
"break"
v
+-------------------------------------------+
| M | M | NULL |... | NULL | NULL|
+-------------------------------------------+
This patch don't optimize slow or control path, still using ma->max
to traverse. Slow path:
* tbl_mask_array_realloc
* ovs_flow_tbl_lookup_exact
* flow_mask_find
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Port the codes to linux upstream and with little changes.
Pravin B Shelar, says:
| In case hash collision on mask cache, OVS does extra flow
| lookup. Following patch avoid it.
Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When creating and inserting flow-mask, if there is no available
flow-mask, we realloc the mask array. When removing flow-mask,
if necessary, we shrink mask array.
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Port the codes to linux upstream and with little changes.
Pravin B Shelar, says:
| mask caches index of mask in mask_list. On packet recv OVS
| need to traverse mask-list to get cached mask. Therefore array
| is better for retrieving cached mask. This also allows better
| cache replacement algorithm by directly checking mask's existence.
Link: https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The idea of this optimization comes from a patch which
is committed in 2014, openvswitch community. The author
is Pravin B Shelar. In order to get high performance, I
implement it again. Later patches will use it.
Pravin B Shelar, says:
| On every packet OVS needs to lookup flow-table with every
| mask until it finds a match. The packet flow-key is first
| masked with mask in the list and then the masked key is
| looked up in flow-table. Therefore number of masks can
| affect packet processing performance.
Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Tested-by: Greg Rose <gvrose8192@gmail.com>
Acked-by: William Tu <u9012063@gmail.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 8f86a5b4ad679e4836733b47414226074eee4e4d.
It has been established that this causes a boot regression on
both Baytrail and Cherrytrail SoCs, and we can't have that in
the final kernel release, so we need to revert it.
Reported-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
This reverts commit 6658f87f219427ee776c498e07c878eb5cad1be2.
This revert is a prerequisite for the later revert of commit
8f86a5b4ad679e4836733b47414226074eee4e4d.
Reported-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
This reverts commit 4c87540940cbc7ddbe9674087919c605fd5c2ef1.
This revert is a prerequisite for the later revert of commit
8f86a5b4ad679e4836733b47414226074eee4e4d.
Reported-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|