linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2019-11-04	netfilter: nf_tables: fix unexpected EOPNOTSUPP error	Fernando Fernandez Mancera
	If the object type doesn't implement an update operation and the user tries to update it will silently ignore the update operation. Fixes: aa4095a156b5 ("netfilter: nf_tables: fix possible null-pointer dereference in object update") Signed-off-by: Fernando Fernandez Mancera <ffmancera@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-04	netfilter: nf_tables: Align nft_expr private data to 64-bit	Lukas Wunner
	Invoking the following commands on a 32-bit architecture with strict alignment requirements (such as an ARMv7-based Raspberry Pi) results in an alignment exception: # nft add table ip test-ip4 # nft add chain ip test-ip4 output { type filter hook output priority 0; } # nft add rule ip test-ip4 output quota 1025 bytes Alignment trap: not handling instruction e1b26f9f at [<7f4473f8>] Unhandled fault: alignment exception (0x001) at 0xb832e824 Internal error: : 1 [#1] PREEMPT SMP ARM Hardware name: BCM2835 [<7f4473fc>] (nft_quota_do_init [nft_quota]) [<7f447448>] (nft_quota_init [nft_quota]) [<7f4260d0>] (nf_tables_newrule [nf_tables]) [<7f4168dc>] (nfnetlink_rcv_batch [nfnetlink]) [<7f416bd0>] (nfnetlink_rcv [nfnetlink]) [<8078b334>] (netlink_unicast) [<8078b664>] (netlink_sendmsg) [<8071b47c>] (sock_sendmsg) [<8071bd18>] (___sys_sendmsg) [<8071ce3c>] (__sys_sendmsg) [<8071ce94>] (sys_sendmsg) The reason is that nft_quota_do_init() calls atomic64_set() on an atomic64_t which is only aligned to 32-bit, not 64-bit, because it succeeds struct nft_expr in memory which only contains a 32-bit pointer. Fix by aligning the nft_expr private data to 64-bit. Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v3.13+ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-04	netfilter: ipset: Fix nla_policies to fully support NL_VALIDATE_STRICT	Jozsef Kadlecsik
	Since v5.2 (commit "netlink: re-add parse/validate functions in strict mode") NL_VALIDATE_STRICT is enabled. Fix the ipset nla_policies which did not support strict mode and convert from deprecated parsings to verified ones. Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2019-11-04	netfilter: ipset: Copy the right MAC address in hash:ip,mac IPv6 sets	Stefano Brivio
	Same as commit 1b4a75108d5b ("netfilter: ipset: Copy the right MAC address in bitmap:ip,mac and hash:ip,mac sets"), another copy and paste went wrong in commit 8cc4ccf58379 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets"). When I fixed this for IPv4 in 1b4a75108d5b, I didn't realise that hash:ip,mac sets also support IPv6 as family, and this is covered by a separate function, hash_ipmac6_kadt(). In hash:ip,mac sets, the first dimension is the IP address, and the second dimension is the MAC address: check the IPSET_DIM_TWO_SRC flag in flags while deciding which MAC address to copy, destination or source. This way, mixing source and destination matches for the two dimensions of ip,mac hash type works as expected, also for IPv6. With this setup: ip netns add A ip link add veth1 type veth peer name veth2 netns A ip addr add 2001:db8::1/64 dev veth1 ip -net A addr add 2001:db8::2/64 dev veth2 ip link set veth1 up ip -net A link set veth2 up dst=$(ip netns exec A cat /sys/class/net/veth2/address) ip netns exec A ipset create test_hash hash:ip,mac family inet6 ip netns exec A ipset add test_hash 2001:db8::1,${dst} ip netns exec A ip6tables -A INPUT -p icmpv6 --icmpv6-type 135 -j ACCEPT ip netns exec A ip6tables -A INPUT -m set ! --match-set test_hash src,dst -j DROP ipset now correctly matches a test packet: # ping -c1 2001:db8::2 >/dev/null # echo $? 0 Reported-by: Chen, Yi <yiche@redhat.com> Fixes: 8cc4ccf58379 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2019-11-04	netfilter: ipset: Fix an error code in ip_set_sockfn_get()	Dan Carpenter
	The copy_to_user() function returns the number of bytes remaining to be copied. In this code, that positive return is checked at the end of the function and we return zero/success. What we should do instead is return -EFAULT. Fixes: a7b4f989a629 ("netfilter: ipset: IP set core support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
2019-11-04	ice: Move common functions to ice_txrx_lib.c	Krzysztof Kazimierczak
	In preparation of AF XDP, move functions that will be used both by skb and zero-copy paths to a new file called ice_txrx_lib.c. This allows us to avoid using ifdefs to control the staticness of said functions. Move other functions (ice_rx_csum, ice_rx_hash and ice_ptype_to_htype) called only by the moved ones to the new file as well. Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04	dccp: do not leak jiffies on the wire	Eric Dumazet
	For some reason I missed the case of DCCP passive flows in my previous patch. Fixes: a904a0693c18 ("inet: stop leaking jiffies on the wire") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Thiemo Nagel <tnagel@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	net: fec: add missed clk_disable_unprepare in remove	Chuhong Yuan
	This driver forgets to disable and unprepare clks when remove. Add calls to clk_disable_unprepare to fix it. Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	bpf: re-fix skip write only files in debugfs	Daniel Borkmann
	Commit 5bc60de50dfe ("selftests: bpf: Don't try to read files without read permission") got reverted as the fix was not working as expected and real fix came in via 8101e069418d ("selftests: bpf: Skip write only files in debugfs"). When bpf-next got merged into net-next, the test_offload.py had a small conflict. Fix the resolution in ae8a76fb8b5d iby not reintroducing 5bc60de50dfe again. Fixes: ae8a76fb8b5d ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	net: ethernet: stmmac: drop unused variable in stm32mp1_set_mode()	Christophe Roullier
	Building with W=1 (cf.scripts/Makefile.extrawarn) outputs: warning: variable ‘ret’ set but not used [-Wunused-but-set-variable] Drop the unused 'ret' variable. Signed-off-by: Christophe Roullier <christophe.roullier@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	net: sgi: ioc3-eth: ensure tx ring is 16k aligned.	Thomas Bogendoerfer
	IOC3 hardware needs a 16k aligned TX ring. Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	net: sgi: ioc3-eth: fix setting NETIF_F_HIGHDMA	Christoph Hellwig
	Set NETIF_F_HIGHDMA together with the NETIF_F_IP_CSUM flag instead of letting the second assignment overwrite it. Probably doesn't matter in practice as none of the systems an IOC3 is usually found in has highmem to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	net: sgi: ioc3-eth: simplify setting the DMA mask	Christoph Hellwig
	There is no need to fall back to a lower mask these days, the DMA mask just communicates the hardware supported features. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	net: sgi: ioc3-eth: fix usage of GFP_* flags	Christoph Hellwig
	dma_alloc_coherent always zeroes memory, there is no need for __GFP_ZERO. Also doing a GFP_ATOMIC allocation just before a GFP_KERNEL one is clearly bogus. Fixes: ed870f6a7aa2 ("net: sgi: ioc3-eth: use dma-direct for dma allocations") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	net: sgi: ioc3-eth: don't abuse dma_direct_* calls	Christoph Hellwig
	dma_direct_ is a low-level API that must never be used by drivers directly. Switch to use the proper DMA API instead. Fixes: ed870f6a7aa2 ("net: sgi: ioc3-eth: use dma-direct for dma allocations") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	ipv6: use jhash2() in rt6_exception_hash()	Eric Dumazet
	Faster jhash2() can be used instead of jhash(), since IPv6 addresses have the needed alignment requirement. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	net: of_get_phy_mode: Change API to solve int/unit warnings	Andrew Lunn
	Before this change of_get_phy_mode() returned an enum, phy_interface_t. On error, -ENODEV etc, is returned. If the result of the function is stored in a variable of type phy_interface_t, and the compiler has decided to represent this as an unsigned int, comparision with -ENODEV etc, is a signed vs unsigned comparision. Fix this problem by changing the API. Make the function return an error, or 0 on success, and pass a pointer, of type phy_interface_t, where the phy mode should be stored. v2: Return with *interface set to PHY_INTERFACE_MODE_NA on error. Add error checks to all users of of_get_phy_mode() Fixup a few reverse christmas tree errors Fixup a few slightly malformed reverse christmas trees v3: Fix 0-day reported errors. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	net: bridge: fdb: eliminate extra port state tests from fast-path	Nikolay Aleksandrov
	When commit df1c0b8468b3 ("[BRIDGE]: Packets leaking out of disabled/blocked ports.") introduced the port state tests in br_fdb_update() it was to avoid learning/refreshing from STP BPDUs, it was also used to avoid learning/refreshing from user-space with NTF_USE. Those two tests are done for every packet entering the bridge if it's learning, but for the fast-path we already have them checked in br_handle_frame() and is unnecessary to do it again. Thus push the checks to the unlikely cases and drop them from br_fdb_update(), the new nbp_state_should_learn() helper is used to determine if the port state allows br_fdb_update() to be called. The two places which need to do it manually are: - user-space add call with NTF_USE set - link-local packet learning done in __br_handle_local_finish() Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-04	ice: Add support for XDP	Maciej Fijalkowski
	Add support for XDP. Implement ndo_bpf and ndo_xdp_xmit. Upon load of an XDP program, allocate additional Tx rings for dedicated XDP use. The following actions are supported: XDP_TX, XDP_DROP, XDP_REDIRECT, XDP_PASS, and XDP_ABORTED. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04	ice: get rid of per-tc flow in Tx queue configuration routines	Maciej Fijalkowski
	There's no reason for treating DCB as first class citizen when configuring the Tx queues and going through TCs. Reverse the logic and base the configuration logic on rings, which is the object of interest anyway. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04	ice: Introduce ice_base.c	Anirudh Venkataramanan
	Remove a few uses of kernel configuration flags from ice_lib.c by introducing a new source file ice_base.c. Also move corresponding function prototypes from ice_lib.h to ice_base.h and include ice_base.h where required. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04	Merge tag 'clk-v5.4-samsung-fixes' of ↵	Stephen Boyd
	https://git.kernel.org/pub/scm/linux/kernel/git/snawrocki/clk into clk-fixes Pull Samsung clk driver fixes from Sylwester Nawrocki: - system suspend related fixes for the exynos542x clocks driver - probe() error paths fixes in the exynos5433 CMU driver adding proper release of memory and clk resources * tag 'clk-v5.4-samsung-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/snawrocki/clk: clk: samsung: exynos5420: Preserve PLL configuration during suspend/resume clk: samsung: exynos542x: Move G3D subsystem clocks to its sub-CMU clk: samsung: exynos5433: Fix error paths
2019-11-04	Merge tag 'sunxi-clk-fixes-for-5.4-1' of ↵	Stephen Boyd
	https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes Two patches that fix some operator precedence and zeroing of bits * tag 'sunxi-clk-fixes-for-5.4-1' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux: clk: sunxi-ng: a80: fix the zero'ing of bits 16 and 18 clk: sunxi: Fix operator precedence in sunxi_divs_clk_setup
2019-11-04	clk: ti: clkctrl: Fix failed to enable error with double udelay timeout	Tony Lindgren
	Commit 3d8598fb9c5a ("clk: ti: clkctrl: use fallback udelay approach if timekeeping is suspended") added handling for cases when timekeeping is suspended. But looks like we can still get occasional "failed to enable" errors on the PM runtime resume path with udelay() returning faster than expected. With ti-sysc interconnect target module driver this leads into device failure with PM runtime failing with "failed to enable" clkctrl error. Let's fix the issue with a delay of two times the desired delay as in often done for udelay() to account for the inaccuracy. Fixes: 3d8598fb9c5a ("clk: ti: clkctrl: use fallback udelay approach if timekeeping is suspended") Cc: Keerthy <j-keerthy@ti.com> Cc: Tero Kristo <t-kristo@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lkml.kernel.org/r/20190930154001.46581-1-tony@atomide.com Tested-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-11-04	clk: ti: dra7-atl-clock: Remove ti_clk_add_alias call	Peter Ujfalusi
	ti_clk_register() calls it already so the driver should not create duplicated alias. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Link: https://lkml.kernel.org/r/20191002083436.10194-1-peter.ujfalusi@ti.com Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2019-11-04	netfilter: nf_tables_offload: check for register data length mismatches	Pablo Neira Ayuso
	Make sure register data length does not mismatch immediate data length, otherwise hit EOPNOTSUPP. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-11-04	ASoC: hdac_hda: fix race in device removal	Kai Vehmanen
	When ASoC card instance is removed containing a HDA codec, hdac_hda_codec_remove() may run in parallel with codec resume. This will cause problems if the HDA link is freed with snd_hdac_ext_bus_link_put() while the codec is still in middle of its resume process. To fix this, change the order such that pm_runtime_disable() is called before the link is freed. This will ensure any pending runtime PM action is completed before proceeding to free the link. This issue can be easily hit with e.g. SOF driver by loading and unloading the drivers. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20191101170635.26389-1-pierre-louis.bossart@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org>
2019-11-04	fbdev: c2p: Fix link failure on non-inlining	Geert Uytterhoeven
	When the compiler decides not to inline the Chunky-to-Planar core functions, the build fails with: c2p_planar.c:(.text+0xd6): undefined reference to `c2p_unsupported' c2p_planar.c:(.text+0x1dc): undefined reference to `c2p_unsupported' c2p_iplan2.c:(.text+0xc4): undefined reference to `c2p_unsupported' c2p_iplan2.c:(.text+0x150): undefined reference to `c2p_unsupported' Fix this by marking the functions __always_inline. While this could be triggered before by manually enabling both CONFIG_OPTIMIZE_INLINING and CONFIG_CC_OPTIMIZE_FOR_SIZE, it was exposed in the m68k defconfig by commit ac7c3e4ff401b304 ("compiler: enable CONFIG_OPTIMIZE_INLINING forcibly"). Fixes: 9012d011660ea5cf ("compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING") Reported-by: noreply@ellerman.id.au Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190927094708.11563-1-geert@linux-m68k.org
2019-11-04	ALSA: bebob: fix to detect configured source of sampling clock for Focusrite ↵	Takashi Sakamoto
	Saffire Pro i/o series For Focusrite Saffire Pro i/o, the lowest 8 bits of register represents configured source of sampling clock. The next lowest 8 bits represents whether the configured source is actually detected or not just after the register is changed for the source. Current implementation evaluates whole the register to detect configured source. This results in failure due to the next lowest 8 bits when the source is connected in advance. This commit fixes the bug. Fixes: 25784ec2d034 ("ALSA: bebob: Add support for Focusrite Saffire/SaffirePro series") Cc: <stable@vger.kernel.org> # v3.16+ Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp> Link: https://lore.kernel.org/r/20191102150920.20367-1-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai <tiwai@suse.de>
2019-11-03	Merge tag 'mlx5-updates-2019-11-01' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-11-01 Misc updates for mlx5 netdev and core driver 1) Steering Core: Replace CRC32 internal implementation with standard kernel lib. 2) Steering Core: Support IPv4 and IPv6 mixed matcher. 3) Steering Core: Lockless FTE read lookups 4) TC: Bit sized fields rewrite support. 5) FPGA: Standalone FPGA support. 6) SRIOV: Reset VF parameters configurations on SRIOV disable. 7) netdev: Dump WQs wqe descriptors on CQE with error events. 8) MISC Cleanups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	mISDN: remove unused variable 'faxmodulation_s'	YueHaibing
	drivers/isdn/hardware/mISDN/mISDNisar.c:30:17: warning: faxmodulation_s defined but not used [-Wunused-const-variable=] It is never used, so can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	ptp: Add a ptp clock driver for IDT ClockMatrix.	Vincent Cheng
	The IDT ClockMatrix (TM) family includes integrated devices that provide eight PLL channels. Each PLL channel can be independently configured as a frequency synthesizer, jitter attenuator, digitally controlled oscillator (DCO), or a digital phase lock loop (DPLL). Typically these devices are used as timing references and clock sources for PTP applications. This patch adds support for the device. Co-developed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	dt-bindings: ptp: Add device tree binding for IDT ClockMatrix based PTP clock	Vincent Cheng
	Add device tree binding doc for the IDT ClockMatrix PTP clock. Signed-off-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: icmp6: provide input address for traceroute6	Francesco Ruggeri
	traceroute6 output can be confusing, in that it shows the address that a router would use to reach the sender, rather than the address the packet used to reach the router. Consider this case: ------------------------ N2 \| \| ------ ------ N3 ---- \| R1 \| \| R2 \|------\|H2\| ------ ------ ---- \| \| ------------------------ N1 \| ---- \|H1\| ---- where H1's default route is through R1, and R1's default route is through R2 over N2. traceroute6 from H1 to H2 shows R2's address on N1 rather than on N2. The script below can be used to reproduce this scenario. traceroute6 output without this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.036 ms 0.008 ms 0.006 ms 2 2000:101::2 (2000:101::2) 0.011 ms 0.008 ms 0.007 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.010 ms 0.009 ms traceroute6 output with this patch: traceroute to 2000:103::4 (2000:103::4), 30 hops max, 80 byte packets 1 2000:101::1 (2000:101::1) 0.056 ms 0.019 ms 0.006 ms 2 2000:102::2 (2000:102::2) 0.013 ms 0.008 ms 0.008 ms 3 2000:103::4 (2000:103::4) 0.013 ms 0.009 ms 0.009 ms #!/bin/bash # # ------------------------ N2 # \| \| # ------ ------ N3 ---- # \| R1 \| \| R2 \|------\|H2\| # ------ ------ ---- # \| \| # ------------------------ N1 # \| # ---- # \|H1\| # ---- # # N1: 2000:101::/64 # N2: 2000:102::/64 # N3: 2000:103::/64 # # R1's host part of address: 1 # R2's host part of address: 2 # H1's host part of address: 3 # H2's host part of address: 4 # # For example: # the IPv6 address of R1's interface on N2 is 2000:102::1/64 # # Nets are implemented by macvlan interfaces (bridge mode) over # dummy interfaces. # # Create net namespaces ip netns add host1 ip netns add host2 ip netns add rtr1 ip netns add rtr2 # Create nets ip link add net1 type dummy; ip link set net1 up ip link add net2 type dummy; ip link set net2 up ip link add net3 type dummy; ip link set net3 up # Add interfaces to net1, move them to their nemaspaces ip link add link net1 dev host1net1 type macvlan mode bridge ip link set host1net1 netns host1 ip link add link net1 dev rtr1net1 type macvlan mode bridge ip link set rtr1net1 netns rtr1 ip link add link net1 dev rtr2net1 type macvlan mode bridge ip link set rtr2net1 netns rtr2 # Add interfaces to net2, move them to their nemaspaces ip link add link net2 dev rtr1net2 type macvlan mode bridge ip link set rtr1net2 netns rtr1 ip link add link net2 dev rtr2net2 type macvlan mode bridge ip link set rtr2net2 netns rtr2 # Add interfaces to net3, move them to their nemaspaces ip link add link net3 dev rtr2net3 type macvlan mode bridge ip link set rtr2net3 netns rtr2 ip link add link net3 dev host2net3 type macvlan mode bridge ip link set host2net3 netns host2 # Configure interfaces and routes in host1 ip netns exec host1 ip link set lo up ip netns exec host1 ip link set host1net1 up ip netns exec host1 ip -6 addr add 2000:101::3/64 dev host1net1 ip netns exec host1 ip -6 route add default via 2000:101::1 # Configure interfaces and routes in rtr1 ip netns exec rtr1 ip link set lo up ip netns exec rtr1 ip link set rtr1net1 up ip netns exec rtr1 ip -6 addr add 2000:101::1/64 dev rtr1net1 ip netns exec rtr1 ip link set rtr1net2 up ip netns exec rtr1 ip -6 addr add 2000:102::1/64 dev rtr1net2 ip netns exec rtr1 ip -6 route add default via 2000:102::2 ip netns exec rtr1 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in rtr2 ip netns exec rtr2 ip link set lo up ip netns exec rtr2 ip link set rtr2net1 up ip netns exec rtr2 ip -6 addr add 2000:101::2/64 dev rtr2net1 ip netns exec rtr2 ip link set rtr2net2 up ip netns exec rtr2 ip -6 addr add 2000:102::2/64 dev rtr2net2 ip netns exec rtr2 ip link set rtr2net3 up ip netns exec rtr2 ip -6 addr add 2000:103::2/64 dev rtr2net3 ip netns exec rtr2 sysctl net.ipv6.conf.all.forwarding=1 # Configure interfaces and routes in host2 ip netns exec host2 ip link set lo up ip netns exec host2 ip link set host2net3 up ip netns exec host2 ip -6 addr add 2000:103::4/64 dev host2net3 ip netns exec host2 ip -6 route add default via 2000:103::2 # Ping host2 from host1 ip netns exec host1 ping6 -c5 2000:103::4 # Traceroute host2 from host1 ip netns exec host1 traceroute6 2000:103::4 # Delete nets ip link del net3 ip link del net2 ip link del net1 # Delete namespaces ip netns del rtr2 ip netns del rtr1 ip netns del host2 ip netns del host1 Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Original-patch-by: Honggang Xu <hxu@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	tipc: improve message bundling algorithm	Tuong Lien
	As mentioned in commit e95584a889e1 ("tipc: fix unlimited bundling of small messages"), the current message bundling algorithm is inefficient that can generate bundles of only one payload message, that causes unnecessary overheads for both the sender and receiver. This commit re-designs the 'tipc_msg_make_bundle()' function (now named as 'tipc_msg_try_bundle()'), so that when a message comes at the first place, we will just check & keep a reference to it if the message is suitable for bundling. The message buffer will be put into the link backlog queue and processed as normal. Later on, when another one comes we will make a bundle with the first message if possible and so on... This way, a bundle if really needed will always consist of at least two payload messages. Otherwise, we let the first buffer go its way without any need of bundling, so reduce the overheads to zero. Moreover, since now we have both the messages in hand, we can even optimize the 'tipc_msg_bundle()' function, make bundle of a very large (size ~ MSS) and small messages which is not with the current algorithm e.g. [1400-byte message] + [10-byte message] (MTU = 1500). Acked-by: Ying Xue <ying.xue@windreiver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: icmp: use input address in traceroute	Francesco Ruggeri
	Even with icmp_errors_use_inbound_ifaddr set, traceroute returns the primary address of the interface the packet was received on, even if the path goes through a secondary address. In the example: 1.0.3.1/24 ---- 1.0.1.3/24 1.0.1.1/24 ---- 1.0.2.1/24 1.0.2.4/24 ---- \|H1\|--------------------------\|R1\|--------------------------\|H2\| ---- N1 ---- N2 ---- where 1.0.3.1/24 is R1's primary address on N1, traceroute from H1 to H2 returns: traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets 1 1.0.3.1 (1.0.3.1) 0.018 ms 0.006 ms 0.006 ms 2 1.0.2.4 (1.0.2.4) 0.021 ms 0.007 ms 0.007 ms After applying this patch, it returns: traceroute to 1.0.2.4 (1.0.2.4), 30 hops max, 60 byte packets 1 1.0.1.1 (1.0.1.1) 0.033 ms 0.007 ms 0.006 ms 2 1.0.2.4 (1.0.2.4) 0.011 ms 0.007 ms 0.007 ms Original-patch-by: Bill Fenner <fenner@arista.com> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	Merge branch 'optimize-openvswitch-flow-looking-up'	David S. Miller
	Tonghao Zhang says: ==================== optimize openvswitch flow looking up This series patch optimize openvswitch for performance or simplify codes. Patch 1, 2, 4: Port Pravin B Shelar patches to linux upstream with little changes. Patch 5, 6, 7: Optimize the flow looking up and simplify the flow hash. Patch 8, 9: are bugfix. The performance test is on Intel Xeon E5-2630 v4. The test topology is show as below: +-----------------------------------+ \| +---------------------------+ \| \| \| eth0 ovs-switch eth1 \| \| Host0 \| +---------------------------+ \| +-----------------------------------+ ^ \| \| \| \| \| \| \| \| v +-----+----+ +----+-----+ \| netperf \| Host1 \| netserver\| Host2 +----------+ +----------+ We use netperf send the 64B packets, and insert 255+ flow-mask: $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:01:00:00:00:00/ff:ff:ff:ff:ff:01),eth_type(0x0800),ipv4(frag=no)" 2 ... $ ovs-dpctl add-flow ovs-switch "in_port(1),eth(dst=00:ff:00:00:00:00/ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(frag=no)" 2 $ $ netperf -t UDP_STREAM -H 2.2.2.200 -l 40 -- -m 18 * Without series patch, throughput 8.28Mbps * With series patch, throughput 46.05Mbps v6: some coding style fixes v5: rewrite patch 8, release flow-mask when freeing flow v4: access ma->count with READ_ONCE/WRITE_ONCE API. More information, see patch 5 comments. v3: update ma point when realloc mask_array in patch 5 v2: simplify codes. e.g. use kfree_rcu instead of call_rcu ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: simplify the ovs_dp_cmd_new	Tonghao Zhang
	use the specified functions to init resource. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: don't unlock mutex when changing the user_features fails	Tonghao Zhang
	Unlocking of a not locked mutex is not allowed. Other kernel thread may be in critical section while we unlock it because of setting user_feature fail. Fixes: 95a7233c4 ("net: openvswitch: Set OvS recirc_id from tc chain index") Cc: Paul Blakey <paulb@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: fix possible memleak on destroy flow-table	Tonghao Zhang
	When we destroy the flow tables which may contain the flow_mask, so release the flow mask struct. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: add likely in flow_lookup	Tonghao Zhang
	The most case *index < ma->max, and flow-mask is not NULL. We add un/likely for performance. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: simplify the flow_hash	Tonghao Zhang
	Simplify the code and remove the unnecessary BUILD_BUG_ON. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: optimize flow-mask looking up	Tonghao Zhang
	The full looking up on flow table traverses all mask array. If mask-array is too large, the number of invalid flow-mask increase, performance will be drop. One bad case, for example: M means flow-mask is valid and NULL of flow-mask means deleted. +-------------------------------------------+ \| M \| NULL \| ... \| NULL \| M\| +-------------------------------------------+ In that case, without this patch, openvswitch will traverses all mask array, because there will be one flow-mask in the tail. This patch changes the way of flow-mask inserting and deleting, and the mask array will be keep as below: there is not a NULL hole. In the fast path, we can "break" "for" (not "continue") in flow_lookup when we get a NULL flow-mask. "break" v +-------------------------------------------+ \| M \| M \| NULL \|... \| NULL \| NULL\| +-------------------------------------------+ This patch don't optimize slow or control path, still using ma->max to traverse. Slow path: * tbl_mask_array_realloc * ovs_flow_tbl_lookup_exact * flow_mask_find Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: optimize flow mask cache hash collision	Tonghao Zhang
	Port the codes to linux upstream and with little changes. Pravin B Shelar, says: \| In case hash collision on mask cache, OVS does extra flow \| lookup. Following patch avoid it. Link: https://github.com/openvswitch/ovs/commit/0e6efbe2712da03522532dc5e84806a96f6a0dd1 Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: shrink the mask array if necessary	Tonghao Zhang
	When creating and inserting flow-mask, if there is no available flow-mask, we realloc the mask array. When removing flow-mask, if necessary, we shrink mask array. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: convert mask list in mask array	Tonghao Zhang
	Port the codes to linux upstream and with little changes. Pravin B Shelar, says: \| mask caches index of mask in mask_list. On packet recv OVS \| need to traverse mask-list to get cached mask. Therefore array \| is better for retrieving cached mask. This also allows better \| cache replacement algorithm by directly checking mask's existence. Link: https://github.com/openvswitch/ovs/commit/d49fc3ff53c65e4eca9cabd52ac63396746a7ef5 Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	net: openvswitch: add flow-mask cache for performance	Tonghao Zhang
	The idea of this optimization comes from a patch which is committed in 2014, openvswitch community. The author is Pravin B Shelar. In order to get high performance, I implement it again. Later patches will use it. Pravin B Shelar, says: \| On every packet OVS needs to lookup flow-table with every \| mask until it finds a match. The packet flow-key is first \| masked with mask in the list and then the masked key is \| looked up in flow-table. Therefore number of masks can \| affect packet processing performance. Link: https://github.com/openvswitch/ovs/commit/5604935e4e1cbc16611d2d97f50b717aa31e8ec5 Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Greg Rose <gvrose8192@gmail.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-03	Revert "gpio: merrifield: Pass irqchip when adding gpiochip"	Linus Walleij
	This reverts commit 8f86a5b4ad679e4836733b47414226074eee4e4d. It has been established that this causes a boot regression on both Baytrail and Cherrytrail SoCs, and we can't have that in the final kernel release, so we need to revert it. Reported-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-11-03	Revert "gpio: merrifield: Restore use of irq_base"	Linus Walleij
	This reverts commit 6658f87f219427ee776c498e07c878eb5cad1be2. This revert is a prerequisite for the later revert of commit 8f86a5b4ad679e4836733b47414226074eee4e4d. Reported-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2019-11-03	Revert "gpio: merrifield: Move hardware initialization to callback"	Linus Walleij
	This reverts commit 4c87540940cbc7ddbe9674087919c605fd5c2ef1. This revert is a prerequisite for the later revert of commit 8f86a5b4ad679e4836733b47414226074eee4e4d. Reported-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>