summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-06-17net: vrf: Implement get_saddr for IPv6David Ahern
IPv6 source address selection needs to consider the real egress route. Similar to IPv4 implement a get_saddr6 method which is called if source address has not been set. The get_saddr6 method does a full lookup which means pulling a route from the VRF FIB table and properly considering linklocal/multicast destination addresses. Lookup failures (eg., unreachable) then cause the source address selection to fail which gets propagated back to the caller. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17net: ipv6: Move ip6_route_get_saddr to inlineDavid Ahern
VRF driver needs access to ip6_route_get_saddr code. Since it does little beyond ipv6_dev_get_saddr and ipv6_dev_get_saddr is already exported for modules move ip6_route_get_saddr to the header as an inline. Code move only; no functional change. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17net: Remove deprecated tunnel specific UDP offload functionsAlexander Duyck
Now that we have all the drivers using udp_tunnel_get_rx_ports, ndo_add_udp_enc_rx_port, and ndo_del_udp_enc_rx_port we can drop the function calls that were specific to VXLAN and GENEVE. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17net: Merge VXLAN and GENEVE push notifiers into a single notifierAlexander Duyck
This patch merges the notifiers for VXLAN and GENEVE into a single UDP tunnel notifier. The idea is that we will want to only have to make one notifier call to receive the list of ports for VXLAN and GENEVE tunnels that need to be offloaded. In addition we add a new set of ndo functions named ndo_udp_tunnel_add and ndo_udp_tunnel_del that are meant to allow us to track the tunnel meta-data such as port and address family as tunnels are added and removed. The tunnel meta-data is now transported in a structure named udp_tunnel_info which for now carries the type, address family, and port number. In the future this could be updated so that we can include a tuple of values including things such as the destination IP address and other fields. I also ended up going with a naming scheme that consisted of using the prefix udp_tunnel on function names. I applied this to the notifier and ndo ops as well so that it hopefully points to the fact that these are primarily used in the udp_tunnel functions. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17net: Combine GENEVE and VXLAN port notifiers into single functionsAlexander Duyck
This patch merges the GENEVE and VXLAN code so that both functions pass through a shared code path. This way we can start the effort of using a single function on the network device drivers to handle both of these tunnel types. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are rather small patches but fixing several outstanding bugs in nf_conntrack and nf_tables, as well as minor problems with missing SYNPROXY header uapi installation: 1) Oneliner not to leak conntrack kmemcache on module removal, this problem was introduced in the previous merge window, patch from Florian Westphal. 2) Two fixes for insufficient ruleset loop validation, one due to incorrect flag check in nf_tables_bind_set() and another related to silly wrong generation mask logic from the walk path, from Liping Zhang. 3) Fix double-free of anonymous sets on error, this fix simplifies the code to let the abort path take care of releasing the set object, also from Liping Zhang. 4) The introduction of helper function for transactions broke the skip inactive rules logic from the nft_do_chain(), again from Liping Zhang. 5) Two patches to install uapi xt_SYNPROXY.h header and calm down kbuild robot due to missing #include <linux/types.h>. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-17can: bcm: add support for CAN FD framesOliver Hartkopp
The programming API of the CAN_BCM depends on struct can_frame which is given as array directly behind the bcm_msg_head structure. To follow this schema for the CAN FD frames a new flag 'CAN_FD_FRAME' in the bcm_msg_head flags indicates that the concatenated CAN frame structures behind the bcm_msg_head are defined as struct canfd_frame. This patch adds the support to handle CAN and CAN FD frames on a per BCM-op base. Main changes: - generally use struct canfd_frames instead if struct can_frames - use canfd_frame.flags instead of can_frame.can_dlc for private BCM flags - make all CAN frame sizes depending on the new CAN_FD_FRAME flags - separate between CAN and CAN FD when sending/receiving frames Due to the dependence of the CAN_FD_FRAME flag the former binary interface for classic CAN frames remains stable. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-06-17can: bcm: unify bcm_msg_head handling and prepare function parametersOliver Hartkopp
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-06-17can: bcm: use CAN frame instead of can_frame in commentsOliver Hartkopp
can_frame is the name of the struct can_frame which is not meant in the corrected comments. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-06-17can: bcm: fix indention and other minor style issuesOliver Hartkopp
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-06-17can: build proc support only if CONFIG_PROC_FS is activatedAlexander Aring
When building can subsystem with CONFIG_PROC_FS=n I detected some unused variables warning by using proc functions. In CAN the proc handling is nicely placed in one object file. This patch adds simple add a dependency on CONFIG_PROC_FS for CAN's proc.o file and corresponding static inline no-op functions. Signed-off-by: Alexander Aring <aar@pengutronix.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> [mkl: provide static inline noops instead of using #ifdefs] Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2016-06-16net, cls: also reject deleting all filters when TCA_KIND presentDaniel Borkmann
When we check for RTM_DELTFILTER, we should also reject the request for deleting all filters under a given parent when TCA_KIND attribute is present. If present, it's currently just ignored but there's also no point to let it pass in the first place either since this doesn't have any meaning with wild-card removal. Fixes: ea7f8277f907 ("net, cls: allow for deleting all filters for given parent") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: xfrm: fix old-style declarationArnd Bergmann
Modern C standards expect the '__inline__' keyword to come before the return type in a declaration, and we get a couple of warnings for this with "make W=1" in the xfrm{4,6}_policy.c files: net/ipv6/xfrm6_policy.c:369:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] static int inline xfrm6_net_sysctl_init(struct net *net) net/ipv6/xfrm6_policy.c:374:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] static void inline xfrm6_net_sysctl_exit(struct net *net) net/ipv4/xfrm4_policy.c:339:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] static int inline xfrm4_net_sysctl_init(struct net *net) net/ipv4/xfrm4_policy.c:344:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] static void inline xfrm4_net_sysctl_exit(struct net *net) Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16Merge tag 'nfsd-4.7-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfixes from Bruce Fields: "Oleg Drokin found and fixed races in the nfsd4 state code that go back to the big nfs4_lock_state removal around 3.17 (but that were also probably hard to reproduce before client changes in 3.20 allowed the client to perform parallel opens). Also fix a 4.1 backchannel crash due to rpc multipath changes in 4.6. Trond acked the client-side rpc fixes going through my tree" * tag 'nfsd-4.7-1' of git://linux-nfs.org/~bfields/linux: nfsd: Make init_open_stateid() a bit more whole nfsd: Extend the mutex holding region around in nfsd4_process_open2() nfsd: Always lock state exclusively. rpc: share one xps between all backchannels nfsd4/rpc: move backchannel create logic into rpc code SUNRPC: fix xprt leak on xps allocation failure nfsd: Fix NFSD_MDS_PR_KEY on 32-bit by adding ULL postfix
2016-06-16Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "This contains two regression fixes: one for the xattr API update and one for using the mounter's creds in file creation in overlayfs. There's also a fix for a bug in handling hard linked AF_UNIX sockets that's been there from day one. This fix is overlayfs only despite the fact that it touches code outside the overlay filesystem: d_real() is an identity function for all except overlay dentries" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix uid/gid when creating over whiteout ovl: xattr filter fix af_unix: fix hard linked sockets on overlay vfs: add d_real_inode() helper
2016-06-16net: the space is required after ','Wei Tang
The space is missing after ',', and this will introduce much more noise when checking patch around. Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16net: do not initialise statics to 0Wei Tang
This patch fixes the checkpatch.pl error to dev.c: ERROR: do not initialise statics to 0 Signed-off-by: Wei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16mpls: allow routes on ipgre devicesSimon Horman
This appears to be necessary and sufficient to provide MPLS in GRE (RFC4023) support. This can be used by establishing an ipgre tunnel device and then routing MPLS over it. The following example will forward MPLS frames received with an outermost MPLS label 100 over tun1, a GRE tunnel. The forwarded packet will have the outermost MPLS LSE removed and two new LSEs added with labels 200 (outermost) and 300 (next). ip link add name tun1 type gre remote 10.0.99.193 local 10.0.99.192 ttl 225 ip link set up dev tun1 ip addr add 10.0.98.192/24 dev tun1 ip route sh echo 1 > /proc/sys/net/mpls/conf/eth0/input echo 101 > /proc/sys/net/mpls/platform_labels ip -f mpls route add 100 as 200/300 via inet 10.0.98.193 ip -f mpls route sh Also remove unnecessary braces. Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16sit: correct IP protocol used in ipip6_errSimon Horman
Since 32b8a8e59c9c ("sit: add IPv4 over IPv4 support") ipip6_err() may be called for packets whose IP protocol is IPPROTO_IPIP as well as those whose IP protocol is IPPROTO_IPV6. In the case of IPPROTO_IPIP packets the correct protocol value is not passed to ipv4_update_pmtu() or ipv4_redirect(). This patch resolves this problem by using the IP protocol of the packet rather than a hard-coded value. This appears to be consistent with the usage of the protocol of a packet by icmp_socket_deliver() the caller of ipip6_err(). I was able to exercise the redirect case by using a setup where an ICMP redirect was received for the destination of the encapsulated packet. However, it appears that although incorrect the protocol field is not used in this case and thus no problem manifests. On inspection it does not appear that a problem will manifest in the fragmentation needed/update pmtu case either. In short I believe this is a cosmetic fix. None the less, the use of IPPROTO_IPV6 seems wrong and confusing. Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-16sctp: change sk state to CLOSED instead of CLOSING in sctp_sock_migrateXin Long
Commit d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") may set sk_state CLOSING in sctp_sock_migrate, but inet_accept doesn't allow the sk_state other than ESTABLISHED/ CLOSED for sctp. So we will change sk_state to CLOSED, instead of CLOSING, as actually sk is closed already there. Fixes: d46e416c11c8 ("sctp: sctp should change socket state when shutdown is received") Reported-by: Ye Xiaolong <xiaolong.ye@intel.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15bpf: fix matching of data/data_end in verifierAlexei Starovoitov
The ctx structure passed into bpf programs is different depending on bpf program type. The verifier incorrectly marked ctx->data and ctx->data_end access based on ctx offset only. That caused loads in tracing programs int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. } to be incorrectly marked as PTR_TO_PACKET which later caused verifier to reject the program that was actually valid in tracing context. Fix this by doing program type specific matching of ctx offsets. Fixes: 969bf05eb3ce ("bpf: direct packet access") Reported-by: Sasha Goldshtein <goldshtn@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15Merge tag 'rxrpc-rewrite-20160615' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Rework endpoint record handling Here's the next part of the AF_RXRPC rewrite. In this set I rework endpoint record handling. There are two types of endpoint record, local and peer. The local endpoint record is used as an anchor for the transport socket that AF_RXRPC uses (at the moment a UDP socket). Local endpoints can be shared between AF_RXRPC sockets under certain restricted circumstances. The peer endpoint is a record of the remote end. It is (or will be) used to keep track MTU and RTT values and, with these changes, is used to find the call(s) to abort when a network error occurs. The following significant changes are made: (1) The local endpoint event handling code is split out into its own file. (2) The local endpoint list bottom half-excluding spinlock is removed as things are arranged such that sk_user_data will not change whilst the transport socket callbacks are in progress. (3) Local endpoints can now only be shared if they have the same transport address (as before) and have a local service ID of 0 (ie. they're not listening for incoming calls). This prevents callbacks from a server to one process being picked up by another process. (4) Local endpoint destruction is now accomplished by the same work item as processes events, meaning that the destructor doesn't need to wait for the event processor. (5) Peer endpoints are now held in a hash table rather than a flat list. (6) Peer endpoints are now destroyed by RCU rather than by work item. (7) Peer endpoints are now differentiated by local endpoint and remote transport port in addition to remote transport address and transport type and family. This means that a firewall that excludes access between a particular local port and remote port won't cause calls to be aborted that use a different port pair. (8) Error report handling now no longer assumes that the source is always an IPv4 ICMP message from a UDP port and has assumptions that an ICMP message comes from an IPv4 socket removed. At some point IPv6 support will be added. (9) Peer endpoints rather than local endpoints are now the anchor point for distributing network error reports. (10) Both types of endpoint records are now disposed of as soon as all references to them are gone. There is less hanging around and once their usage counts hit zero, records can no longer be resurrected. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15gre: fix error handlerEric Dumazet
1) gre_parse_header() can be called from gre_err() At this point transport header points to ICMP header, not the inner header. 2) We can not really change transport header as ipgre_err() will later assume transport header still points to ICMP header (using icmp_hdr()) 3) pskb_may_pull() logic in gre_parse_header() really works if we are interested at zone pointed by skb->data 4) As Jiri explained in commit b7f8fe251e46 ("gre: do not pull header in ICMP error processing") we should not pull headers in error handler. So this fix : A) changes gre_parse_header() to use skb->data instead of skb_transport_header() B) Adds a nhs parameter to gre_parse_header() so that we can skip the not pulled IP header from error path. This offset is 0 for normal receive path. C) remove obsolete IPV6 includes Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15tipc: eliminate uninitialized variable warningYing Xue
net/tipc/link.c: In function ‘tipc_link_timeout’: net/tipc/link.c:744:28: warning: ‘mtyp’ may be used uninitialized in this function [-Wuninitialized] Fixes: 42b18f605fea ("tipc: refactor function tipc_link_timeout()") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15tipc: fix suspicious RCU usageYing Xue
When run tipcTS&tipcTC test suite, the following complaint appears: [ 56.926168] =============================== [ 56.926169] [ INFO: suspicious RCU usage. ] [ 56.926171] 4.7.0-rc1+ #160 Not tainted [ 56.926173] ------------------------------- [ 56.926174] net/tipc/bearer.c:408 suspicious rcu_dereference_protected() usage! [ 56.926175] [ 56.926175] other info that might help us debug this: [ 56.926175] [ 56.926177] [ 56.926177] rcu_scheduler_active = 1, debug_locks = 1 [ 56.926179] 3 locks held by swapper/4/0: [ 56.926180] #0: (((&req->timer))){+.-...}, at: [<ffffffff810e79b5>] call_timer_fn+0x5/0x340 [ 56.926203] #1: (&(&req->lock)->rlock){+.-...}, at: [<ffffffffa000c29b>] disc_timeout+0x1b/0xd0 [tipc] [ 56.926212] #2: (rcu_read_lock){......}, at: [<ffffffffa00055e0>] tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc] [ 56.926218] [ 56.926218] stack backtrace: [ 56.926221] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.7.0-rc1+ #160 [ 56.926222] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 56.926224] 0000000000000000 ffff880016803d28 ffffffff813c4423 ffff8800154252c0 [ 56.926227] 0000000000000001 ffff880016803d58 ffffffff810b7512 ffff8800124d8120 [ 56.926230] ffff880013f8a160 ffff8800132b5ccc ffff8800124d8120 ffff880016803d88 [ 56.926234] Call Trace: [ 56.926235] <IRQ> [<ffffffff813c4423>] dump_stack+0x67/0x94 [ 56.926250] [<ffffffff810b7512>] lockdep_rcu_suspicious+0xe2/0x120 [ 56.926256] [<ffffffffa00051f1>] tipc_l2_send_msg+0x131/0x1c0 [tipc] [ 56.926261] [<ffffffffa000567c>] tipc_bearer_xmit_skb+0x14c/0x2e0 [tipc] [ 56.926266] [<ffffffffa00055e0>] ? tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc] [ 56.926273] [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc] [ 56.926278] [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc] [ 56.926283] [<ffffffffa000c2d6>] disc_timeout+0x56/0xd0 [tipc] [ 56.926288] [<ffffffff810e7a68>] call_timer_fn+0xb8/0x340 [ 56.926291] [<ffffffff810e79b5>] ? call_timer_fn+0x5/0x340 [ 56.926296] [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc] [ 56.926300] [<ffffffff810e8f4a>] run_timer_softirq+0x23a/0x390 [ 56.926306] [<ffffffff810f89ff>] ? clockevents_program_event+0x7f/0x130 [ 56.926316] [<ffffffff819727c3>] __do_softirq+0xc3/0x4a2 [ 56.926323] [<ffffffff8106ba5a>] irq_exit+0x8a/0xb0 [ 56.926327] [<ffffffff81972456>] smp_apic_timer_interrupt+0x46/0x60 [ 56.926331] [<ffffffff81970a49>] apic_timer_interrupt+0x89/0x90 [ 56.926333] <EOI> [<ffffffff81027fda>] ? default_idle+0x2a/0x1a0 [ 56.926340] [<ffffffff81027fd8>] ? default_idle+0x28/0x1a0 [ 56.926342] [<ffffffff810289cf>] arch_cpu_idle+0xf/0x20 [ 56.926345] [<ffffffff810adf0f>] default_idle_call+0x2f/0x50 [ 56.926347] [<ffffffff810ae145>] cpu_startup_entry+0x215/0x3e0 [ 56.926353] [<ffffffff81040ad9>] start_secondary+0xf9/0x100 The warning appears as rtnl_dereference() is wrongly used in tipc_l2_send_msg() under RCU read lock protection. Instead the proper usage should be that rcu_dereference_rtnl() is called here. Fixes: 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer layer") Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15ila: Fix checksum neutral mappingTom Herbert
The algorithm for checksum neutral mapping is incorrect. This problem was being hidden since we were previously always performing checksum offload on the translated addresses and only with IPv6 HW csum. Enabling an ILA router shows the issue. Corrected algorithm: old_loc is the original locator in the packet, new_loc is the value to overwrite with and is found in the lookup table. old_flag is the old flag value (zero of CSUM_NEUTRAL_FLAG) and new_flag is then (old_flag ^ CSUM_NEUTRAL_FLAG) & CSUM_NEUTRAL_FLAG. Need SUM(new_id + new_flag + diff) == SUM(old_id + old_flag) for checksum neutral translation. Solving for diff gives: diff = (old_id - new_id) + (old_flag - new_flag) compute_csum_diff8(new_id, old_id) gives old_id - new_id If old_flag is set old_flag - new_flag = old_flag = CSUM_NEUTRAL_FLAG Else old_flag - new_flag = -new_flag = ~CSUM_NEUTRAL_FLAG Tested: - Implemented a user space program that creates random addresses and random locators to overwrite. Compares the checksum over the address before and after translation (must always be equal) - Enabled ILA router and showed proper operation. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net: ipv4: Add ability to have GRE ignore DF bit in IPv4 payloadsPhilip Prindeville
In the presence of firewalls which improperly block ICMP Unreachable (including Fragmentation Required) messages, Path MTU Discovery is prevented from working. A workaround is to handle IPv4 payloads opaquely, ignoring the DF bit--as is done for other payloads like AppleTalk--and doing transparent fragmentation and reassembly. Redux includes the enforcement of mutual exclusion between this feature and Path MTU Discovery as suggested by Alexander Duyck. Cc: Alexander Duyck <alexander.duyck@gmail.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Philip Prindeville <philipp@redfish-solutions.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-156lowpan: add support for 802.15.4 short addr handlingAlexander Aring
This patch adds necessary handling for use the short address for 802.15.4 6lowpan. It contains support for IPHC address compression and new matching algorithmn to decide which link layer address will be used for 802.15.4 frame. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-156lowpan: add support for getting short addressAlexander Aring
In case of sending RA messages we need some way to get the short address from an 802.15.4 6LoWPAN interface. This patch will add a temporary debugfs entry for experimental userspace api. Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-156lowpan: introduce 6lowpan-ndAlexander Aring
This patch introduce different 6lowpan handling for receive and transmit NS/NA messages for the ipv6 neighbour discovery. The first use-case is for supporting 802.15.4 short addresses inside the option fields and handling for RFC6775 6CO option field as userspace option. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15ipv6: export several functionsAlexander Aring
This patch exports some neighbour discovery functions which can be used by 6lowpan neighbour discovery ops functionality then. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15ipv6: introduce neighbour discovery opsAlexander Aring
This patch introduces neighbour discovery ops callback structure. The idea is to separate the handling for 6LoWPAN into the 6lowpan module. These callback offers 6lowpan different handling, such as 802.15.4 short address handling or RFC6775 (Neighbor Discovery Optimization for IPv6 over 6LoWPANs). Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15addrconf: put prefix address add in an own functionAlexander Aring
This patch moves the functionality to add a RA PIO prefix generated address in an own function. This move prepares to add a hook for adding a second address for a second link-layer address. E.g. short address for 802.15.4 6LoWPAN. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15ndisc: add __ndisc_fill_addr_option functionAlexander Aring
This patch adds __ndisc_fill_addr_option as low-level function for ndisc_fill_addr_option which doesn't depend on net_device parameter. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-156lowpan: remove ipv6 module requestAlexander Aring
Since we use exported function from ipv6 kernel module we don't need to request the module anymore to have ipv6 functionality. Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-156lowpan: add 802.15.4 short addr slaacAlexander Aring
This patch adds the autoconfiguration if a valid 802.15.4 short address is available for 802.15.4 6LoWPAN interfaces. Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-156lowpan: add private neighbour dataAlexander Aring
This patch will introduce a 6lowpan neighbour private data. Like the interface private data we handle private data for generic 6lowpan and for link-layer specific 6lowpan. The current first use case if to save the short address for a 802.15.4 6lowpan neighbour. Cc: David S. Miller <davem@davemloft.net> Reviewed-by: Stefan Schmidt <stefan@osg.samsung.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Alexander Aring <aar@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15htb: call qdisc_root with rcu read lock heldFlorian Westphal
saw a debug splat: net/include/net/sch_generic.h:287 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by kworker/2:1/710: #0: ("events"){.+.+.+}, at: [<ffffffff8106ca1d>] #1: ((&q->work)){+.+...}, at: [<ffffffff8106ca1d>] process_one_work+0x14d/0x690 Workqueue: events htb_work_func Call Trace: [<ffffffff812dc763>] dump_stack+0x85/0xc2 [<ffffffff8109fee7>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff814ced47>] htb_work_func+0x67/0x70 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: sch_fq: defer skb freeingEric Dumazet
sfq_reset() can use rtnl_kfree_skbs() instead of kfree_skb() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: sch_pie: defer skb freeingEric Dumazet
pie_change() can use rtnl_qdisc_drop() to benefit from deferred freeing. pie_reset() is already using qdisc_reset_queue() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: sch_netem: defer skb freeingEric Dumazet
rtnl_kfree_skbs() can be used in tfifo_reset() It would be nice if we could iterate through rb tree instead of removing one skb at a time, and build a single skb chain. But this is left for a future patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: sch_htb: defer skb freeingEric Dumazet
Both htb_reset() and htb_destroy() can use __qdisc_reset_queue() instead of __skb_queue_purge() to defer skb freeing of internal queues. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: sch_hhf: defer skb freeingEric Dumazet
Both hhf_reset() and hhf_change() can use rtnl_kfree_skbs() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: fq_codel: defer skb freeingEric Dumazet
Both fq_codel_change() and fq_codel_reset() can use rtnl_kfree_skbs() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: sch_fq: defer skb freeingEric Dumazet
Both fq_change() and fq_reset() can use rtnl_kfree_skbs() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: sch_codel: defer skb freeing in codel_change()Eric Dumazet
codel_change() can use rtnl_qdisc_drop() to defer expensive skb freeing after locks are released. codel_reset() already has support for deferred skb freeing because it uses qdisc_reset_queue() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: sch_choke: defer skb freeingEric Dumazet
choke_reset() and choke_change() can use rtnl_qdisc_drop() to defer expensive skb freeing after locks are released. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net_sched: add the ability to defer skb freeingEric Dumazet
qdisc are changed under RTNL protection and often while blocking BH and root qdisc spinlock. When lots of skbs need to be dropped, we free them under these locks causing TX/RX freezes, and more generally latency spikes. This commit adds rtnl_kfree_skbs(), used to queue skbs for deferred freeing. Actual freeing happens right after RTNL is released, with appropriate scheduling points. rtnl_qdisc_drop() can also be used in place of disc_drop() when RTNL is held. qdisc_reset_queue() and __qdisc_reset_queue() get the new behavior, so standard qdiscs like pfifo, pfifo_fast... have their ->reset() method automatically handled. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15tipc: add neighbor monitoring frameworkJon Paul Maloy
TIPC based clusters are by default set up with full-mesh link connectivity between all nodes. Those links are expected to provide a short failure detection time, by default set to 1500 ms. Because of this, the background load for neighbor monitoring in an N-node cluster increases with a factor N on each node, while the overall monitoring traffic through the network infrastructure increases at a ~(N * (N - 1)) rate. Experience has shown that such clusters don't scale well beyond ~100 nodes unless we significantly increase failure discovery tolerance. This commit introduces a framework and an algorithm that drastically reduces this background load, while basically maintaining the original failure detection times across the whole cluster. Using this algorithm, background load will now grow at a rate of ~(2 * sqrt(N)) per node, and at ~(2 * N * sqrt(N)) in traffic overhead. As an example, each node will now have to actively monitor 38 neighbors in a 400-node cluster, instead of as before 399. This "Overlapping Ring Supervision Algorithm" is completely distributed and employs no centralized or coordinated state. It goes as follows: - Each node makes up a linearly ascending, circular list of all its N known neighbors, based on their TIPC node identity. This algorithm must be the same on all nodes. - The node then selects the next M = sqrt(N) - 1 nodes downstream from itself in the list, and chooses to actively monitor those. This is called its "local monitoring domain". - It creates a domain record describing the monitoring domain, and piggy-backs this in the data area of all neighbor monitoring messages (LINK_PROTOCOL/STATE) leaving that node. This means that all nodes in the cluster eventually (default within 400 ms) will learn about its monitoring domain. - Whenever a node discovers a change in its local domain, e.g., a node has been added or has gone down, it creates and sends out a new version of its node record to inform all neighbors about the change. - A node receiving a domain record from anybody outside its local domain matches this against its own list (which may not look the same), and chooses to not actively monitor those members of the received domain record that are also present in its own list. Instead, it relies on indications from the direct monitoring nodes if an indirectly monitored node has gone up or down. If a node is indicated lost, the receiving node temporarily activates its own direct monitoring towards that node in order to confirm, or not, that it is actually gone. - Since each node is actively monitoring sqrt(N) downstream neighbors, each node is also actively monitored by the same number of upstream neighbors. This means that all non-direct monitoring nodes normally will receive sqrt(N) indications that a node is gone. - A major drawback with ring monitoring is how it handles failures that cause massive network partitionings. If both a lost node and all its direct monitoring neighbors are inside the lost partition, the nodes in the remaining partition will never receive indications about the loss. To overcome this, each node also chooses to actively monitor some nodes outside its local domain. Those nodes are called remote domain "heads", and are selected in such a way that no node in the cluster will be more than two direct monitoring hops away. Because of this, each node, apart from monitoring the member of its local domain, will also typically monitor sqrt(N) remote head nodes. - As an optimization, local list status, domain status and domain records are marked with a generation number. This saves senders from unnecessarily conveying unaltered domain records, and receivers from performing unneeded re-adaptations of their node monitoring list, such as re-assigning domain heads. - As a measure of caution we have added the possibility to disable the new algorithm through configuration. We do this by keeping a threshold value for the cluster size; a cluster that grows beyond this value will switch from full-mesh to ring monitoring, and vice versa when it shrinks below the value. This means that if the threshold is set to a value larger than any anticipated cluster size (default size is 32) the new algorithm is effectively disabled. A patch set for altering the threshold value and for listing the table contents will follow shortly. - This change is fully backwards compatible. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15net sched actions: bug fix dumping actions directly didnt produce NLMSG_DONEJamal Hadi Salim
This refers to commands to direct action access as follows: sudo tc actions add action drop index 12 sudo tc actions add action pipe index 10 And then dumping them like so: sudo tc actions ls action gact iproute2 worked because it depended on absence of TCA_ACT_TAB TLV as end of message. This fix has been tested with iproute2 and is backward compatible. Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>