summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-01-11mac80211: fix the TID on NDPs sent as EOSP carrierEmmanuel Grumbach
In the commit below, I forgot to translate the mac80211's AC to QoS IE order. Moreover, the condition in the if was wrong. Fix both issues. This bug would hit only with clients that didn't set all the ACs as delivery enabled. Fixes: f438ceb81d4 ("mac80211: uapsd_queues is in QoS IE order") Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11mac80211: Fix headroom allocation when forwarding mesh pktCedric Izoard
This patch fix issue introduced by my previous commit that tried to ensure enough headroom was present, and instead broke it. When forwarding mesh pkt, mac80211 may also add security header, and it must therefore be taken into account in the needed headroom. Fixes: d8da0b5d64d5 ("mac80211: Ensure enough headroom when forwarding mesh pkt") Signed-off-by: Cedric Izoard <cedric.izoard@ceva-dsp.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11sctp: Fix spelling mistake: "Atempt" -> "Attempt"Colin Ian King
Trivial fix to spelling mistake in WARN_ONCE message Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11net: ipv4: Fix multipath selection with vrfDavid Ahern
fib_select_path does not call fib_select_multipath if oif is set in the flow struct. For VRF use cases oif is always set, so multipath route selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif check similar to what is done in fib_table_lookup. Add saddr and proto to the flow struct for the fib lookup done by the VRF driver to better match hash computation for a flow. Fixes: 613d09b30f8b ("net: Use VRF device index for lookups on TX") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11Revert "net: dsa: Implement ndo_get_phys_port_id"Florian Fainelli
This reverts commit 3a543ef479868e36c95935de320608a7e41466ca ("net: dsa: Implement ndo_get_phys_port_id") since it misuses the purpose of ndo_get_phys_port_id(). We have ndo_get_phys_port_name() to do the correct thing for us now. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11net: dsa: Implement ndo_get_phys_port_name()Florian Fainelli
Return the physical port number of a DSA created network device using ndo_get_phys_port_name(). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11cgroup: move CONFIG_SOCK_CGROUP_DATA to init/KconfigArnd Bergmann
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is not right when CONFIG_NET is disabled and there is no socket interface: warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET) I don't know what the correct solution for this is, but simply removing the dependency on NET from SOCK_CGROUP_DATA by moving it out of the 'if NET' section avoids the warning and does not produce other build errors. Fixes: 483c4933ea09 ("cgroup: Fix CGROUP_BPF config") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11net: dsa: make "label" property optional for dsa2Vivien Didelot
In the new DTS bindings for DSA (dsa2), the "ethernet" and "link" phandles are respectively mandatory and exclusive to CPU port and DSA link device tree nodes. Simplify dsa2.c a bit by checking the presence of such phandle instead of checking the redundant "label" property. Then the Linux philosophy for Ethernet switch ports is to expose them to userspace as standard NICs by default. Thus use the standard enumerated "eth%d" device name if no "label" property is provided for a user port. This allows to save DTS files from subjective net device names. If one wants to rename an interface, udev rules can be used as usual. Of course the current behavior is unchanged, and the optional "label" property for user ports has precedence over the enumerated name. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11gro: use min_t() in skb_gro_reset_offset()Eric Dumazet
On 32bit arches, (skb->end - skb->data) is not 'unsigned int', so we shall use min_t() instead of min() to avoid a compiler error. Fixes: 1272ce87fa01 ("gro: Enter slow-path if there is no tailroom") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11cfg80211: wext does not need to set monitor channel in managed modeJorge Ramirez-Ortiz
There is not a valid reason to attempt setting the monitor channel while in managed mode. Since this code path only deals with this mode, remove the code block. Johannes: I'll note that the comment indicated it was for backward compatibility, but the code wasn't functional since switching the monitor channel isn't supported (any more?) when in managed mode, as that mode owns the channel configuration. Additionally, since monitor can't be done on a managed mode interface, this would only have had any effect to start with if a separate monitor interface is present, in which case it's better to change the channel through that anyway, if even possible. Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-10mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to ↵Alexander Duyck
page_frag_free Patch series "Page fragment updates", v4. This patch series takes care of a few cleanups for the page fragments API. First we do some renames so that things are much more consistent. First we move the page_frag_ portion of the name to the front of the functions names. Secondly we split out the cache specific functions from the other page fragment functions by adding the word "cache" to the name. Finally I added a bit of documentation that will hopefully help to explain some of this. I plan to revisit this later as we get things more ironed out in the near future with the changes planned for the DMA setup to support eXpress Data Path. This patch (of 3): This patch renames the page frag functions to be more consistent with other APIs. Specifically we place the name page_frag first in the name and then have either an alloc or free call name that we append as the suffix. This makes it a bit clearer in terms of naming. In addition we drop the leading double underscores since we are technically no longer a backing interface and instead the front end that is called from the networking APIs. Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10gro: Disable frag0 optimization on IPv6 ext headersHerbert Xu
The GRO fast path caches the frag0 address. This address becomes invalid if frag0 is modified by pskb_may_pull or its variants. So whenever that happens we must disable the frag0 optimization. This is usually done through the combination of gro_header_hard and gro_header_slow, however, the IPv6 extension header path did the pulling directly and would continue to use the GRO fast path incorrectly. This patch fixes it by disabling the fast path when we enter the IPv6 extension header path. Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address") Reported-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10gro: Enter slow-path if there is no tailroomHerbert Xu
The GRO path has a fast-path where we avoid calling pskb_may_pull and pskb_expand by directly accessing frag0. However, this should only be done if we have enough tailroom in the skb as otherwise we'll have to expand it later anyway. This patch adds the check by capping frag0_len with the skb tailroom. Fixes: cb18978cbf45 ("gro: Open-code final pskb_may_pull") Reported-by: Slava Shwartsman <slavash@mellanox.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net/af_iucv: don't use paged skbs for TX on HiperSocketsJulian Wiedmann
With commit e53743994e21 ("af_iucv: use paged SKBs for big outbound messages"), we transmit paged skbs for both of AF_IUCV's transport modes (IUCV or HiperSockets). The qeth driver for Layer 3 HiperSockets currently doesn't support NETIF_F_SG, so these skbs would just be linearized again by the stack. Avoid that overhead by using paged skbs only for IUCV transport. cc stable, since this also circumvents a significant skb leak when sending large messages (where the skb then needs to be linearized). Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> # v4.8+ Fixes: e53743994e21 ("af_iucv: use paged SKBs for big outbound messages") Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10packet: pdiag_put_ring() should return TX_RING info for TPACKET_V3Sowmini Varadhan
Commit 7f953ab2ba46 ("af_packet: TX_RING support for TPACKET_V3") now makes it possible to use TX_RING with TPACKET_V3, so make the the relevant information available via 'ss -e -a --packet' Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net: add the AF_QIPCRTR entries to family name tablesAnna, Suman
Commit bdabad3e363d ("net: Add Qualcomm IPC router") introduced a new address family. Update the family name tables accordingly so that the lockdep initialization can use the proper names for this family. Cc: Courtney Cavin <courtney.cavin@sonymobile.com> Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Suman Anna <s-anna@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net: qrtr: Mark 'buf' as little endianStephen Boyd
Failure to mark this pointer as __le32 causes checkers like sparse to complain: net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:274:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:274:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:275:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:275:16: got restricted __le32 [usertype] <noident> net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types) net/qrtr/qrtr.c:276:16: expected unsigned int [unsigned] [usertype] <noident> net/qrtr/qrtr.c:276:16: got restricted __le32 [usertype] <noident> Silence it. Cc: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net: dsa: Ensure validity of dst->ds[0]Florian Fainelli
It is perfectly possible to have non zero indexed switches being present in a DSA switch tree, in such a case, we will be deferencing a NULL pointer while dsa_cpu_port_ethtool_{setup,restore}. Be more defensive and ensure that dst->ds[0] is valid before doing anything with it. Fixes: 0c73c523cf73 ("net: dsa: Initialize CPU port ethtool ops per tree") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net: skb_flow_get_be16() can be staticEric Dumazet
Removes following sparse complain : net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16' was not declared. Should it be static? Fixes: 972d3876faa8 ("flow dissector: ICMP support") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10net: socket: Make unnecessarily global sockfs_setattr() staticTobias Klauser
Make sockfs_setattr() static as it is not used outside of net/socket.c This fixes the following GCC warning: net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes] Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10xfrm: state: simplify rcu_read_unlock handling in two spotsFlorian Westphal
Instead of: if (foo) { unlock(); return bar(); } unlock(); do: unlock(); if (foo) return bar(); This is ok because rcu protected structure is only dereferenced before the conditional. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10xfrm: add and use xfrm_state_afinfo_get_rcuFlorian Westphal
xfrm_init_tempstate is always called from within rcu read side section. We can thus use a simpler function that doesn't call rcu_read_lock again. While at it, also make xfrm_init_tempstate return value void, the return value was never tested. A followup patch will replace remaining callers of xfrm_state_get_afinfo with xfrm_state_afinfo_get_rcu variant and then remove the 'old' get_afinfo interface. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10xfrm: remove xfrm_state_put_afinfoFlorian Westphal
commit 44abdc3047aecafc141dfbaf1ed ("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made xfrm_state_put_afinfo equivalent to rcu_read_unlock. Use spatch to replace it with direct calls to rcu_read_unlock: @@ struct xfrm_state_afinfo *a; @@ - xfrm_state_put_afinfo(a); + rcu_read_unlock(); old: text data bss dec hex filename 22570 72 424 23066 5a1a xfrm_state.o 1612 0 0 1612 64c xfrm_output.o new: 22554 72 424 23050 5a0a xfrm_state.o 1596 0 0 1596 63c xfrm_output.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10xfrm: avoid rcu sparse warningFlorian Westphal
xfrm/xfrm_state.c:1973:21: error: incompatible types in comparison expression (different address spaces) Harmless, but lets fix it to reduce the noise. While at it, get rid of unneeded NULL check, its never hit: net/ipv4/xfrm4_state.c: xfrm_state_register_afinfo(&xfrm4_state_afinfo); net/ipv6/xfrm6_state.c: return xfrm_state_register_afinfo(&xfrm6_state_afinfo); net/ipv6/xfrm6_state.c: xfrm_state_unregister_afinfo(&xfrm6_state_afinfo); ... are the only callsites. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10xfrm: remove unused functionFlorian Westphal
Has been ifdef'd out for more than 10 years, remove it. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-09net: dsa: select NET_SWITCHDEVVivien Didelot
The support for DSA Ethernet switch chips depends on TCP/IP networking, thus explicit that HAVE_NET_DSA depends on INET. DSA uses SWITCHDEV, thus select it instead of depending on it. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09tcp: make TCP_INFO more consistentEric Dumazet
tcp_get_info() has to lock the socket, so lets lock it for an extended critical section, so that various fields have consistent values. This solves an annoying issue that some applications reported when multiple counters are updated during one particular rx/rx event, and TCP_INFO was called from another cpu. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09bpf: rename ARG_PTR_TO_STACKAlexei Starovoitov
since ARG_PTR_TO_STACK is no longer just pointer to stack rename it to ARG_PTR_TO_MEM and adjust comment. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09tcp: do not export tcp_peer_is_proven()Eric Dumazet
After commit 1fb6f159fd21 ("tcp: add tcp_conn_request"), tcp_peer_is_proven() no longer needs to be exported. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned intPavel Tikhomirov
> cat /proc/sys/net/ipv4/tcp_notsent_lowat -1 > echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat -bash: echo: write error: Invalid argument > echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat > cat /proc/sys/net/ipv4/tcp_notsent_lowat -2147483648 but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER" v2: simplify to just proc_douintvec Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09ipv6: fix typosAlexander Alemayhu
o s/approriate/appropriate o s/discouvery/discovery Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: netlink interface for SMC socketsUrsula Braun
Support for SMC socket monitoring via netlink sockets of protocol NETLINK_SOCK_DIAG. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: socket closing and linkgroup cleanupUrsula Braun
smc_shutdown() and smc_release() handling delayed linkgroup cleanup for linkgroups without connections Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: receive data from RMBEUrsula Braun
move RMBE data into user space buffer and update managing cursors Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: send data (through RDMA)Ursula Braun
copy data to kernel send buffer, and trigger RDMA write Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: connection data control (CDC)Ursula Braun
send and receive CDC messages (via IB message send and CQE) Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: link layer control (LLC)Ursula Braun
send and receive LLC messages CONFIRM_LINK (via IB message send and CQE) Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: initialize IB transport incl. PD, MR, QP, CQ, event, WRUrsula Braun
Prepare the link for RDMA transport: Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR). Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: work request (WR) base for use by LLC and CDCUrsula Braun
The base containers for RDMA transport are work requests and completion queue entries processed through Infiniband verbs: * allocate and initialize these areas * map these areas to DMA * implement the basic communication consisting of work request posting and receival of completion queue events Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: remote memory buffers (RMBs)Ursula Braun
* allocate data RMB memory for sending and receiving * size depends on the maximum socket send and receive buffers * allocated RMBs are kept during life time of the owning link group * map the allocated RMBs to DMA Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: connection and link group creationUrsula Braun
* create smc_connection for SMC-sockets * determine suitable link group for a connection * create a new link group if necessary Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: CLC handshake (incl. preparation steps)Ursula Braun
* CLC (Connection Layer Control) handshake Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: establish pnet table managementThomas Richter
Connection creation with SMC-R starts through an internal TCP-connection. The Ethernet interface for this TCP-connection is not restricted to the Ethernet interface of a RoCE device. Any existing Ethernet interface belonging to the same physical net can be used, as long as there is a defined relation between the Ethernet interface and some RoCE devices. This relation is defined with the help of an identification string called "Physical Net ID" or short "pnet ID". Information about defined pnet IDs and their related Ethernet interfaces and RoCE devices is stored in the SMC-R pnet table. A pnet table entry consists of the identifying pnet ID and the associated network and IB device. This patch adds pnet table configuration support using the generic netlink message interface referring to network and IB device by their names. Commands exist to add, delete, and display pnet table entries, and to flush or display the entire pnet table. There are cross-checks to verify whether the ethernet interfaces or infiniband devices really exist in the system. If either device is not available, the pnet ID entry is not created. Loss of network devices and IB devices is also monitored; a pnet ID entry is removed when an associated network or IB device is removed. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: introduce SMC as an IB-clientUrsula Braun
* create a list of SMC IB-devices Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09smc: establish new socket familyUrsula Braun
* enable smc module loading and unloading * register new socket family * basic smc socket creation and deletion * use backing TCP socket to run CLC (Connection Layer Control) handshake of SMC protocol * Setup for infiniband traffic is implemented in follow-on patches. For now fallback to TCP socket is always used. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: introduce keepalive function in struct protoUrsula Braun
Direct call of tcp_set_keepalive() function from protocol-agnostic sock_setsockopt() function in net/core/sock.c violates network layering. And newly introduced protocol (SMC-R) will need its own keepalive function. Therefore, add "keepalive" function pointer to "struct proto", and call it from sock_setsockopt() via this pointer. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: for rate-limited ICMP replies save one atomic operationJesper Dangaard Brouer
It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock, by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls, as pointed out by Eric Dumazet, but the BH disabled state must be correct. The icmp_global_allow() call states it must be called with BH disabled. This protection was given by the calls icmp_xmit_lock and icmpv6_xmit_lock. Thus, split out local_bh_disable/enable from these functions and maintain it explicitly at callers. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09net: reduce cycles spend on ICMP replies that gets rate limitedJesper Dangaard Brouer
This patch split the global and per (inet)peer ICMP-reply limiter code, and moves the global limit check to earlier in the packet processing path. Thus, avoid spending cycles on ICMP replies that gets limited/suppressed anyhow. The global ICMP rate limiter icmp_global_allow() is a good solution, it just happens too late in the process. The kernel goes through the full route lookup (return path) for the ICMP message, before taking the rate limit decision of not sending the ICMP reply. Details: The kernels global rate limiter for ICMP messages got added in commit 4cdf507d5452 ("icmp: add a global rate limitation"). It is a token bucket limiter with a global lock. It brilliantly avoids locking congestion by only updating when 20ms (HZ/50) were elapsed. It can then avoids taking lock when credit is exhausted (when under pressure) and time constraint for refill is not yet meet. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09Revert "icmp: avoid allocating large struct on stack"Jesper Dangaard Brouer
This reverts commit 9a99d4a50cb8 ("icmp: avoid allocating large struct on stack"), because struct icmp_bxm no really a large struct, and allocating and free of this small 112 bytes hurts performance. Fixes: 9a99d4a50cb8 ("icmp: avoid allocating large struct on stack") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09Merge tag 'rxrpc-rewrite-20170109' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== afs: Refcount afs_call struct These patches provide some tracepoints for AFS and fix a potential leak by adding refcounting to the afs_call struct. The patches are: (1) Add some tracepoints for logging incoming calls and monitoring notifications from AF_RXRPC and data reception. (2) Get rid of afs_wait_mode as it didn't turn out to be as useful as initially expected. It can be brought back later if needed. This clears some stuff out that I don't then need to fix up in (4). (3) Allow listen(..., 0) to be used to disable listening. This makes shutting down the AFS cache manager server in the kernel much easier and the accounting simpler as we can then be sure that (a) all preallocated afs_call structs are relesed and (b) no new incoming calls are going to be started. For the moment, listening cannot be reenabled. (4) Add refcounting to the afs_call struct to fix a potential multiple release detected by static checking and add a tracepoint to follow the lifecycle of afs_call objects. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>