summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-12-15Merge branch 'net-mitigate-retpoline-overhead'David S. Miller
Paolo Abeni says: ==================== net: mitigate retpoline overhead The spectre v2 counter-measures, aka retpolines, are a source of measurable overhead[1]. We can partially address that when the function pointer refers to a builtin symbol resorting to a list of tests vs well-known builtin function and direct calls. Experimental results show that replacing a single indirect call via retpoline with several branches and a direct call gives performance gains even when multiple branches are added - 5 or more, as reported in [2]. This may lead to some uglification around the indirect calls. In netconf 2018 Eric Dumazet described a technique to hide the most relevant part of the needed boilerplate with some macro help. This series is a [re-]implementation of such idea, exposing the introduced helpers in a new header file. They are later leveraged to avoid the indirect call overhead in the GRO path, when possible. Overall this gives > 10% performance improvement for UDP GRO benchmark and smaller but measurable for TCP syn flood. The added infra can be used in follow-up patches to cope with retpoline overhead in other points of the networking stack (e.g. at the qdisc layer) and possibly even in other subsystems. v2 -> v3: - fix build error with CONFIG_IPV6=m v1 -> v2: - list explicitly the builtin function names in INDIRECT_CALL_*(), as suggested by Ed Cree - expand the recipients list rfc -> v1: - use branch prediction hints, as suggested by Eric [1] http://vger.kernel.org/netconf2018_files/PaoloAbeni_netconf2018.pdf [2] https://linuxplumbersconf.org/event/2/contributions/99/attachments/98/117/lpc18_paper_af_xdp_perf-v2.pdf ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15udp: use indirect call wrappers for GRO socket lookupPaolo Abeni
This avoids another indirect call for UDP GRO. Again, the test for the IPv6 variant is performed first. v1 -> v2: - adapted to INDIRECT_CALL_ changes Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: use indirect call wrappers at GRO transport layerPaolo Abeni
This avoids an indirect call in the receive path for TCP and UDP packets. TCP takes precedence on UDP, so that we have a single additional conditional in the common case. When IPV6 is build as module, all gro symbols except UDPv6 are builtin, while the latter belong to the ipv6 module, so we need some special care. v1 -> v2: - adapted to INDIRECT_CALL_ changes v2 -> v3: - fix build issue with CONFIG_IPV6=m Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: use indirect call wrappers at GRO network layerPaolo Abeni
This avoids an indirect calls for L3 GRO receive path, both for ipv4 and ipv6, if the latter is not compiled as a module. Note that when IPv6 is compiled as builtin, it will be checked first, so we have a single additional compare for the more common path. v1 -> v2: - adapted to INDIRECT_CALL_ changes Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15indirect call wrappers: helpers to speed-up indirect calls of builtinPaolo Abeni
This header define a bunch of helpers that allow avoiding the retpoline overhead when calling builtin functions via function pointers. It boils down to explicitly comparing the function pointers to known builtin functions and eventually invoke directly the latter. The macros defined here implement the boilerplate for the above schema and will be used by the next patches. rfc -> v1: - use branch prediction hint, as suggested by Eric v1 -> v2: - list explicitly the builtin function names in INDIRECT_CALL_*(), as suggested by Ed Cree Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: socionext: remove mmio reads on TxIlias Apalodimas
Currently the driver issues 2 mmio reads to figure out the number of transmitted packets and clean them. We can get rid of the expensive reads since BIT 31 of the Tx descriptor can be used for that. We can also remove the budget counting of Tx completions since all of the descriptors are not deliberately processed. Performance numbers using pktgen are: size pre-patch(pps) post-patch(pps) 64 362483 427916 128 358315 411686 256 352725 389683 512 215675 216464 1024 113812 114442 Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: socionext: correctly recover txq after being fullIlias Apalodimas
Running pktgen with packets sizes > 512b ends up in the interface Txq getting stuck. "netsec 522d0000.ethernet eth0: netsec_netdev_start_xmit: TxQFull!" appears on dmesg but the interface never recovers. It requires an ifconfig down/up to make the interface usable again. The reason that triggers this, is a race condition between .ndo_start_xmit and the napi completion. The available budget is calculated first and indicates the queue is full. Due to a costly netif_err() the queue is not stopped in time while the napi completion runs, clears the irq and frees up descriptors, thus the queue never wakes up again. Fix this by moving the print after stopping the queue, make the print ratelimited, add barriers and check for cleaned descriptors.. Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15dt-bindings: net: ravb: Add support for r8a774c0 SoCFabrizio Castro
Document RZ/G2E (R8A774C0) SoC bindings. Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15mod_devicetable.h: correct kerneldoc typo, "PHYSID2" -> "MII_PHYSID2"Robert P. J. Day
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: ipv4: do not handle duplicate fragments as overlappingMichal Kubecek
Since commit 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") IPv4 reassembly code drops the whole queue whenever an overlapping fragment is received. However, the test is written in a way which detects duplicate fragments as overlapping so that in environments with many duplicate packets, fragmented packets may be undeliverable. Add an extra test and for (potentially) duplicate fragment, only drop the new fragment rather than the whole queue. Only starting offset and length are checked, not the contents of the fragments as that would be too expensive. For similar reason, linear list ("run") of a rbtree node is not iterated, we only check if the new fragment is a subset of the interval covered by existing consecutive fragments. v2: instead of an exact check iterating through linear list of an rbtree node, only check if the new fragment is subset of the "run" (suggested by Eric Dumazet) Fixes: 7969e5c40dfd ("ip: discard IPv4 datagrams with overlapping segments.") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15neighbor: Improve neighbour struct layoutDavid Ahern
Move arp_queue_len_bytes ahead of arp_queue to remove two 4-byte holes. Ensure ha element is always 8-byte aligned. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15qmi_wwan: Added support for Telit LN940 seriesJörgen Storvist
Added support for the Telit LN940 series cellular modules QMI interface. QMI_QUIRK_SET_DTR quirk requied for Qualcomm MDM9x40 chipset. Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: sched: simplify the qdisc_leaf codeTonghao Zhang
Except for returning, the var leaf is not used in the qdisc_leaf(). For simplicity, remove it. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15ipv6: Fix handling of LLA with VRF and sockets bound to VRFDavid Ahern
A recent commit allows sockets bound to a VRF to receive ipv6 link local packets. However, it only works for UDP and worse TCP connection attempts to the LLA with the only listener bound to the VRF just hang where as before the client gets a reset and connection refused. Fix by adjusting ir_iif for LL addresses and packets received through a device enslaved to a VRF. Fixes: 6f12fa775530 ("vrf: mark skb for multicast or link-local as enslaved to VRF") Reported-by: Donald Sharp <sharpd@cumulusnetworks.com> Cc: Mike Manning <mmanning@vyatta.att-mail.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15qmi_wwan: Added support for Fibocom NL668 seriesJörgen Storvist
Added support for Fibocom NL668 series QMI interface. Using QMI_QUIRK_SET_DTR required for Qualcomm MDM9x07 chipsets. Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15r8169: improve spurious interrupt detectionHeiner Kallweit
Improve detection of spurious interrupts by checking against the interrupt mask as currently set in the chip. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15cxgb4: remove DEFINE_SIMPLE_DEBUGFS_FILE()Yangtao Li
We already have the DEFINE_SHOW_ATTRIBUTE. There is no need to define such a macro, so remove DEFINE_SIMPLE_DEBUGFS_FILE. Also use the DEFINE_SHOW_ATTRIBUTE macro to simplify some code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15ipconfig: convert to DEFINE_SHOW_ATTRIBUTEYangtao Li
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf 2018-12-15 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) fix liveness propagation of callee saved registers, from Jakub. 2) fix overflow in bpf_jit_limit knob, from Daniel. 3) bpf_flow_dissector api fix, from Stanislav. 4) bpf_perf_event api fix on powerpc, from Sandipan. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15Merge branch 'hns3-Add-more-commands-to-Debugfs-in-HNS3-driver'David S. Miller
Salil Mehta says: ==================== net: hns3: Add more commands to Debugfs in HNS3 driver This patch-set adds few more debugfs commands to HNS3 Ethernet Driver. Support has been added to query info related to below items: 1. Packet buffer descriptor ("echo bd info [queue no] [bd index] > cmd") 2. Manager table("echo dump mng tbl > cmd") 3. Dfx status register("echo dump reg ssu [prt id] > cmd") 4. Dcb status register("echo dump reg dcb [port id] > cmd") 5. Queue map ("echo queue map [queue no] > cmd") 6. Tm map ("echo tm map [queue no] > cmd") NOTE: Above commands are *read-only* and are only intended to query the information from the SoC(and dump inside the kernel, for now) and in no way tries to perform write operations for the purpose of configuration etc. Change Log: V1-->V2: 1. Addressed the GCC-8.2 compiler issue reported by David S. Miller. Link: https://lkml.org/lkml/2018/12/14/1298 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "tm map" status information query functionliuzhongzhu
This patch prints dcb register status information by module. debugfs command: root@(none)# echo dump tm map 100 > cmd queue_id | qset_id | pri_id | tc_id 0100 | 0065 | 08 | 00 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "queue map" information query functionliuzhongzhu
This patch prints queue map information. debugfs command: echo dump queue map > cmd Sample Command: root@(none)# echo queue map > cmd local queue id | global queue id | vector id 0 32 769 1 33 770 2 34 771 3 35 772 4 36 773 5 37 774 6 38 775 7 39 776 8 40 777 9 41 778 10 42 779 11 43 780 12 44 781 13 45 782 14 46 783 15 47 784 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "dcb register" status information query functionliuzhongzhu
This patch prints dcb register status information by module. debugfs command: root@(none)# echo dump reg dcb > cmd roce_qset_mask: 0x0 nic_qs_mask: 0x0 qs_shaping_pass: 0x0 qs_bp_sts: 0x0 pri_mask: 0x0 pri_cshaping_pass: 0x0 pri_pshaping_pass: 0x0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "status register" information query functionliuzhongzhu
This patch prints status register information by module. debugfs command: echo dump reg [mode name] > cmd Sample Command: root@(none)# echo dump reg bios common > cmd BP_CPU_STATE: 0x0 DFX_MSIX_INFO_NIC_0: 0xc000 DFX_MSIX_INFO_NIC_1: 0xf DFX_MSIX_INFO_NIC_2: 0x2 DFX_MSIX_INFO_NIC_3: 0x2 DFX_MSIX_INFO_ROC_0: 0xc000 DFX_MSIX_INFO_ROC_1: 0x0 DFX_MSIX_INFO_ROC_2: 0x0 DFX_MSIX_INFO_ROC_3: 0x0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "manager table" information query functionliuzhongzhu
This patch prints manager table information. debugfs command: echo dump mng tbl > cmd Sample Command: root@(none)# echo dump mng tbl > cmd entry|mac_addr |mask|ether|mask|vlan|mask|i_map|i_dir|e_type 00 |01:00:5e:00:00:01|0 |00000|0 |0000|0 |00 |00 |0 01 |c2:f1:c5:82:68:17|0 |00000|0 |0000|0 |00 |00 |0 root@(none)# Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15net: hns3: Add "bd info" query functionliuzhongzhu
This patch prints Sending and receiving package descriptor information. debugfs command: echo dump bd info 1 > cmd Sample Command: root@(none)# echo bd info 1 > cmd hns3 0000:7d:00.0: TX Queue Num: 0, BD Index: 0 hns3 0000:7d:00.0: (TX) addr: 0x0 hns3 0000:7d:00.0: (TX)vlan_tag: 0 hns3 0000:7d:00.0: (TX)send_size: 0 hns3 0000:7d:00.0: (TX)vlan_tso: 0 hns3 0000:7d:00.0: (TX)l2_len: 0 hns3 0000:7d:00.0: (TX)l3_len: 0 hns3 0000:7d:00.0: (TX)l4_len: 0 hns3 0000:7d:00.0: (TX)vlan_tag: 0 hns3 0000:7d:00.0: (TX)tv: 0 hns3 0000:7d:00.0: (TX)vlan_msec: 0 hns3 0000:7d:00.0: (TX)ol2_len: 0 hns3 0000:7d:00.0: (TX)ol3_len: 0 hns3 0000:7d:00.0: (TX)ol4_len: 0 hns3 0000:7d:00.0: (TX)paylen: 0 hns3 0000:7d:00.0: (TX)vld_ra_ri: 0 hns3 0000:7d:00.0: (TX)mss: 0 hns3 0000:7d:00.0: RX Queue Num: 0, BD Index: 120 hns3 0000:7d:00.0: (RX)addr: 0xffee7000 hns3 0000:7d:00.0: (RX)pkt_len: 0 hns3 0000:7d:00.0: (RX)size: 0 hns3 0000:7d:00.0: (RX)rss_hash: 0 hns3 0000:7d:00.0: (RX)fd_id: 0 hns3 0000:7d:00.0: (RX)vlan_tag: 0 hns3 0000:7d:00.0: (RX)o_dm_vlan_id_fb: 0 hns3 0000:7d:00.0: (RX)ot_vlan_tag: 0 hns3 0000:7d:00.0: (RX)bd_base_info: 0 Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-15Merge branch 'bpf-bpftool-cleanups'Daniel Borkmann
Quentin Monnet says: ==================== This series contains several minor fixes for bpftool source and documentation. The first patches focus on documentation: addition of an option in the page for "bpftool prog", clean up and update of the same page, and addition of an example of prog array map manipulation in "bpftool map" page. The last two fix warnings susceptible to appear when libbfd is not present (patch 4), or with additional warning flags passed to the compiler (last patch). ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15tools: bpftool: fix -Wmissing declaration warningsQuentin Monnet
Help compiler check arguments for several utility functions used to print items to the console by adding the "printf" attribute when declaring those functions. Also, declare as "static" two functions that are only used in prog.c. All of them discovered by compiling bpftool with -Wmissing-format-attribute -Wmissing-declarations. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15tools: bpftool: fix warning on struct bpf_prog_linfo definitionQuentin Monnet
The following warning appears when compiling bpftool without BFD support: main.h:198:23: warning: 'struct bpf_prog_linfo' declared inside parameter list will not be visible outside of this definition or declaration const struct bpf_prog_linfo *prog_linfo, Fix it by declaring struct bpf_prog_linfo even in the case BFD is not supported. Fixes: b053b439b72a ("bpf: libbpf: bpftool: Print bpf_line_info during prog dump") Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15tools: bpftool: add a prog array map update example to documentationQuentin Monnet
Add an example in map documentation to show how to use bpftool in order to update the references to programs hold by prog array maps. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15tools: bpftool: fix examples in documentation for bpftool progQuentin Monnet
Bring various fixes to the manual page for "bpftool prog" set of commands: - Fix typos ("dum" -> "dump") - Harmonise indentation and format for command output - Update date format for program load time - Add instruction numbers on program dumps - Fix JSON format for the example program listing Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15tools: bpftool: add doc for -m option to bpftool-prog.rstQuentin Monnet
The --mapcompat|-m option has been documented on the main bpftool.rst page, and on the interactive help. As this option is useful for loading programs with maps with the "bpftool prog load" command, it should also appear in the related bpftool-prog.rst documentation page. Let's add it. Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15Merge branch 'bpf-improve-verifier-state-analysis'Daniel Borkmann
Alexei Starovoitov says: ==================== v1->v2: With optimization suggested by Jakub patch 4 safety check became cheap enough. Several improvements to verifier state logic. Patch 1 - trivial optimization Patch 3 - significant optimization for stack state equivalence Patch 4 - safety check for liveness and prep for future state merging ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15bpf: add self-check logic to liveness analysisAlexei Starovoitov
Introduce REG_LIVE_DONE to check the liveness propagation and prepare the states for merging. See algorithm description in clean_live_states(). Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15bpf: improve stacksafe state comparisonAlexei Starovoitov
"if (old->allocated_stack > cur->allocated_stack)" check is too conservative. In some cases explored stack could have allocated more space, but that stack space was not live. The test case improves from 19 to 15 processed insns and improvement on real programs is significant as well: before after bpf_lb-DLB_L3.o 1940 1831 bpf_lb-DLB_L4.o 3089 3029 bpf_lb-DUNKNOWN.o 1065 1064 bpf_lxc-DDROP_ALL.o 28052 26309 bpf_lxc-DUNKNOWN.o 35487 33517 bpf_netdev.o 10864 9713 bpf_overlay.o 6643 6184 bpf_lcx_jit.o 38437 37335 Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15selftests/bpf: check insn processed in test_verifierAlexei Starovoitov
Teach test_verifier to parse verifier output for insn processed and compare with expected number. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-15bpf: speed up stacksafe checkAlexei Starovoitov
Don't check the same stack liveness condition 8 times. once is enough. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Edward Cree <ecree@solarflare.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-12-14Merge branch 'net-prefer-listeners-bound-to-an-address'David S. Miller
Peter Oskolkov says: ==================== net: prefer listeners bound to an address A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patchset eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In a future patchset I plan to explore whether it is possible to remove port-only hashtables completely: additional refactoring will be required, as some non-lookup code uses the hashtables. ==================== Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14selftests: net: test that listening sockets match on address properlyPeter Oskolkov
This patch adds a selftest that verifies that a socket listening on a specific address is chosen in preference over sockets that listen on any address. The test covers UDP/UDP6/TCP/TCP6. It is based on, and similar to, reuseport_dualstack.c selftest. Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net: tcp6: prefer listeners bound to an addressPeter Oskolkov
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net: tcp: prefer listeners bound to an addressPeter Oskolkov
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net: udp6: prefer listeners bound to an addressPeter Oskolkov
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14net: udp: prefer listeners bound to an addressPeter Oskolkov
A relatively common use case is to have several IPs configured on a host, and have different listeners for each of them. We would like to add a "catch all" listener on addr_any, to match incoming connections not served by any of the listeners bound to a specific address. However, port-only lookups can match addr_any sockets when sockets listening on specific addresses are present if so_reuseport flag is set. This patch eliminates lookups into port-only hashtable, as lookups by (addr,port) tuple are easily available. In addition, compute_score() is tweaked to _not_ match addr_any sockets to specific addresses, as hash collisions could result in the unwanted behavior described above. Tested: the patch compiles; full test in the last patch in this patchset. Existing reuseport_* selftests also pass. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14add snmp counters documentyupeng
Add explainations for some general IP counters, SACK and DSACK related counters Signed-off-by: yupeng <yupeng0921@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14tipc: check tsk->group in tipc_wait_for_cond()Cong Wang
tipc_wait_for_cond() drops socket lock before going to sleep, but tsk->group could be freed right after that release_sock(). So we have to re-check and reload tsk->group after it wakes up. After this patch, tipc_wait_for_cond() returns -ERESTARTSYS when tsk->group is NULL, instead of continuing with the assumption of a non-NULL tsk->group. (It looks like 'dsts' should be re-checked and reloaded too, but it is a different bug.) Similar for tipc_send_group_unicast() and tipc_send_group_anycast(). Reported-by: syzbot+10a9db47c3a0e13eb31c@syzkaller.appspotmail.com Fixes: b7d42635517f ("tipc: introduce flow control for group broadcast messages") Fixes: ee106d7f942d ("tipc: introduce group anycast messaging") Fixes: 27bd9ec027f3 ("tipc: introduce group unicast messaging") Cc: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14Merge branch 'neighbor-More-gc_list-changes'David S. Miller
David Ahern says: ==================== neighbor: More gc_list changes More gc_list changes and cleanups. The first 2 patches are bug fixes from the first gc_list change. Specifically, fix the locking order to be consistent - table lock followed by neighbor lock, and then entries in the FAILED state should always be candidates for forced_gc without waiting for any time span (return to the eviction logic prior to the separate gc_list). Patch 3 removes 2 now unnecessary arguments to neigh_del. Patch 4 moves a helper from a header file to core code in preparation for Patch 5 which removes NTF_EXT_LEARNED entries from the gc_list. These entries are already exempt from forced_gc; patch 5 removes them from consideration and makes them on par with PERMANENT entries given that they are also managed by userspace. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14neighbor: Remove externally learned entries from gc_listDavid Ahern
Externally learned entries are similar to PERMANENT entries in the sense they are managed by userspace and can not be garbage collected. As such remove them from the gc_list, remove the flags check from neigh_forced_gc and skip threshold checks in neigh_alloc. As with PERMANENT entries, this allows unlimited number of NTF_EXT_LEARNED entries. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14neighbor: Move neigh_update_ext_learned to core fileDavid Ahern
neigh_update_ext_learned has one caller in neighbour.c so does not need to be defined in the header. Move it and in the process remove the intialization of ndm_flags and just set it based on the flags check. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14neighbor: Remove state and flags arguments to neigh_delDavid Ahern
neigh_del now only has 1 caller, and the state and flags arguments are both 0. Remove them and simplify neigh_del. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-14neighbor: Fix state check in neigh_forced_gcDavid Ahern
PERMANENT entries are not on the gc_list so the state check is now redundant. Also, the move to not purge entries until after 5 seconds should not apply to FAILED entries; those can be removed immediately to make way for newer ones. This restores the previous logic prior to the gc_list. Fixes: 58956317c8de ("neighbor: Improve garbage collection") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>