summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2016-07-04net: simplify and make pkt_type_ok() available for other usersJamal Hadi Salim
Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-04Merge 4.7-rc6 into tty-nextGreg Kroah-Hartman
We want the tty/serial fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-04batman-adv: split routing API data structure in subobjectsAntonio Quartulli
The routing API data structure contains several function pointers that can easily be grouped together based on the component they work with. Split the API in subobjects in order to improve definition readability. At the same time, remove the "bat_" prefix from the API object and its fields names. These are batman-adv private structs and there is no need to always prepend such prefix, which only makes function invocations much much longer. Signed-off-by: Antonio Quartulli <a@unstable.cc> Reviewed-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04batman-adv: throughput meter implementationAntonio Quartulli
The throughput meter module is a simple, kernel-space replacement for throughtput measurements tool like iperf and netperf. It is intended to approximate TCP behaviour. It is invoked through batctl: the protocol is connection oriented, with cumulative acknowledgment and a dynamic-size sliding window. The test *can* be interrupted by batctl. A receiver side timeout avoids unlimited waitings for sender packets: after one second of inactivity, the receiver abort the ongoing test. Based on a prototype from Edo Monticelli <montik@autistici.org> Signed-off-by: Antonio Quartulli <antonio.quartulli@open-mesh.com> Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04batman-adv: return netdev status in the TX pathAntonio Quartulli
Return the proper netdev TX status along the TX path so that the tp_meter can understand when the queue is full and should stop sending packets. Signed-off-by: Antonio Quartulli <antonio.quartulli@open-mesh.com> Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04batman-adv: add netlink command to query generic mesh information filesMatthias Schiffer
BATADV_CMD_GET_MESH_INFO is used to query basic information about a batman-adv softif (name, index and MAC address for both the softif and the primary hardif; routing algorithm; batman-adv version). Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Andrew Lunn <andrew@lunn.ch> [sven.eckelmann@open-mesh.com: Reduce the number of changes to BATADV_CMD_GET_MESH_INFO, add missing kerneldoc, add policy for attributes] Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04batman-adv: add generic netlink family for batman-advMatthias Schiffer
debugfs is currently severely broken virtually everywhere in the kernel where files are dynamically added and removed (see http://lkml.iu.edu/hypermail/linux/kernel/1506.1/02196.html for some details). In addition to that, debugfs is not namespace-aware. Instead of adding new debugfs entries, the whole infrastructure should be moved to netlink. This will fix the long standing problem of large buffers for debug tables and hard to parse text files. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: Andrew Lunn <andrew@lunn.ch> [sven.eckelmann@open-mesh.com: Strip down patch to only add genl family, add missing kerneldoc] Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-07-04NFC: llcp: Use dynamic debug for hex dumpThierry Escande
LLCP skb tx and rx functions now use print_hex_dump_debug() making these verbose traces controllable using dynamic debug. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-04NFC: digital: Add a delay between poll cyclesThierry Escande
This replaces the polling work struct with a delayed work struct and add a 10 ms delay between 2 poll cycles. This avoids to flood the device with 'switch off'/'switch on' commands. Signed-off-by: Thierry Escande <thierry.escande@collabora.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-04NFC: hci: delete unused nfc_llc_get_rx_head_tail_room()Denys Vlasenko
It used to be EXPORTed, but then EXPORT usage was cleaned up (in 2012), without noticing that the function has no users at all (and curiously, never had any users). Delete it. While at it, remove non-static "inline" hints on nearby functions: these hints don't work across compilation units anyway, and these functions are not used in their .c file, thus they are never inlined. IOW: "inline" here does not help in any way. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Samuel Ortiz <sameo@linux.intel.com> CC: Christophe Ricard <christophe.ricard@gmail.com> CC: linux-wireless@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2016-07-03netfilter: Convert FWINV<[foo]> macros and uses to NF_INVFJoe Perches
netfilter uses multiple FWINV #defines with identical form that hide a specific structure variable and dereference it with a invflags member. $ git grep "#define FWINV" include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg)) net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg)) net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg))) net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg))) net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg))) net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg))) Consolidate these macros into a single NF_INVF macro. Miscellanea: o Neaten the alignment around these uses o A few lines are > 80 columns for intelligibility Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-02net/devlink: Add E-Switch mode controlOr Gerlitz
Add the commands to set and show the mode of SRIOV E-Switch, two modes are supported: * legacy: operating in the "old" L2 based mode (DMAC --> VF vport) * switchdev: the E-Switch is referred to as whitebox switch configured using standard tools such as tc, bridge, openvswitch etc. To allow working with the tools, for each VF, a VF representor netdevice is created by the E-Switch manager vendor device driver instance (e.g PF). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01Merge tag 'batadv-next-for-davem-20160701' of ↵David S. Miller
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature patchset includes the following changes: - two patches with minimal clean up work by Antonio Quartulli and Simon Wunderlich - eight patches of B.A.T.M.A.N. V, API and documentation clean up work, by Antonio Quartulli and Marek Lindner - Andrew Lunn fixed the skb priority adoption when forwarding fragmented packets (two patches) - Multicast optimization support is now enabled for bridges which comes with some protocol updates, by Linus Luessing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01tipc: fix nl compat regression for link statisticsRichard Alpe
Fix incorrect use of nla_strlcpy() where the first NLA_HDRLEN bytes of the link name where left out. Making the output of tipc-config -ls look something like: Link statistics: dcast-link 1:data0-1.1.2:data0 1:data0-1.1.3:data0 Also, for the record, the patch that introduce this regression claims "Sending the whole object out can cause a leak". Which isn't very likely as this is a compat layer, where the data we are parsing is generated by us and we know the string to be NULL terminated. But you can of course never be to secure. Fixes: 5d2be1422e02 (tipc: fix an infoleak in tipc_nl_compat_link_dump) Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: Do not send a pong to an incoming ping with 0 src portSowmini Varadhan
RDS ping messages are sent with a non-zero src port to a zero dst port, so that the rds pong messages can be sent back to the originators src port. However if a confused/malicious sender sends a ping with a 0 src port, we'd have an infinite ping-pong loop. To avoid this, the receiver should ignore ping messages with a 0 src port. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Simplify reconnect to avoid duelling reconnnect attemptsSowmini Varadhan
When reconnecting, the peer with the smaller IP address will initiate the reconnect, to avoid needless duelling SYN issues. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Hooks to set up a single connection pathSowmini Varadhan
This patch adds ->conn_path_connect callbacks in the rds_transport that are used to set up a single connection path. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: make receive path use the rds_conn_pathSowmini Varadhan
The ->sk_user_data contains a pointer to the rds_conn_path for the socket. Use this consistently in the rds_tcp_data_ready callbacks to get the rds_conn_path for rds_recv_incoming. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: make ->sk_user_data point to a rds_conn_pathSowmini Varadhan
The socket callbacks should all operate on a struct rds_conn_path, in preparation for a MP capable RDS-TCP. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Refactor connection destruction to handle multiple pathsSowmini Varadhan
A single rds_connection may have multiple rds_conn_paths that have to be carefully and correctly destroyed, for both rmmod and netns-delete cases. For both cases, we extract a single rds_tcp_connection for each conn into a temporary list, and then invoke rds_conn_destroy() which iteratively dismantles every path in the rds_connection. For the netns deletion case, we additionally have to make sure that we do not leave a socket in TIME_WAIT state, as this will hold up the netns deletion. Thus we call rds_tcp_conn_paths_destroy() to reset state quickly. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Make rds_tcp_connection track the rds_conn_pathSowmini Varadhan
The struct rds_tcp_connection is the transport-specific private data structure that tracks TCP information per rds_conn_path. Modify this structure to have a back-pointer to the rds_conn_path for which it is the ->cp_transport_data. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: TCP: Remove dead logic around c_passive in rds-tcpSowmini Varadhan
The c_passive bit is only intended for the IB transport and will never be encountered in rds-tcp, so remove the dead logic that predicates on this bit. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01RDS: Rework path specific indirectionsSowmini Varadhan
Refactor code to avoid separate indirections for single-path and multipath transports. All transports (both single and mp-capable) will get a pointer to the rds_conn_path, and can trivially derive the rds_connection from the ->cp_conn. Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01cgroup: bpf: Add bpf_skb_in_cgroup_protoMartin KaFai Lau
Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk belongs to a descendant of a cgroup2. It is similar to the feature added in netfilter: commit c38c4597e4bf ("netfilter: implement xt_cgroup cgroup2 path match") The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY which will be used by the bpf_skb_in_cgroup. Modifications to the bpf verifier is to ensure BPF_MAP_TYPE_CGROUP_ARRAY and bpf_skb_in_cgroup() are always used together. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net_sched: fix mirrored packets checksumWANG Cong
Similar to commit 9b368814b336 ("net: fix bridge multicast packet checksum validation") we need to fixup the checksum for CHECKSUM_COMPLETE when pushing skb on RX path. Otherwise we get similar splats. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01packet: Use symmetric hash for PACKET_FANOUT_HASH.David S. Miller
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that they want packets going in both directions on a flow to hash to the same bucket. The core kernel SKB hash became non-symmetric when the ipv6 flow label and other entities were incorporated into the standard flow hash order to increase entropy. But there are no users of PACKET_FANOUT_HASH who want an assymetric hash, they all want a symmetric one. Therefore, use the flow dissector to compute a flat symmetric hash over only the protocol, addresses and ports. This hash does not get installed into and override the normal skb hash, so this change has no effect whatsoever on the rest of the stack. Reported-by: Eric Leblond <eric@regit.org> Tested-by: Eric Leblond <eric@regit.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01bpf: refactor bpf_prog_get and type check into helperDaniel Borkmann
Since bpf_prog_get() and program type check is used in a couple of places, refactor this into a small helper function that we can make use of. Since the non RO prog->aux part is not used in performance critical paths and a program destruction via RCU is rather very unlikley when doing the put, we shouldn't have an issue just doing the bpf_prog_get() + prog->type != type check, but actually not taking the ref at all (due to being in fdget() / fdput() section of the bpf fd) is even cleaner and makes the diff smaller as well, so just go for that. Callsites are changed to make use of the new helper where possible. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01netfilter: Remove references to obsolete CONFIG_IP_ROUTE_FWMARKMoritz Sichert
This option was removed in commit 47dcf0cb1005 ("[NET]: Rethink mark field in struct flowi"). Signed-off-by: Moritz Sichert <moritz+linux@sichert.me> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-01etherdevice.h & bridge: netfilter: Add and use ether_addr_equal_maskedJoe Perches
There are code duplications of a masked ethernet address comparison here so make it a separate function instead. Miscellanea: o Neaten alignment of FWINV macro uses to make it clearer for the reader Signed-off-by: Joe Perches <joe@perches.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-01netfilter: x_tables: simplify ip{6}table_mangle_hook()Pablo Neira Ayuso
No need for a special case to handle NF_INET_POST_ROUTING, this is basically the same handling as for prerouting, input, forward. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-01netfilter: conntrack: avoid integer overflow when resizingFlorian Westphal
Can overflow so we might allocate very small table when bucket count is high on a 32bit platform. Note: resize is only possible from init_netns. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-07-01net: introduce NETDEV_CHANGE_TX_QUEUE_LENJason Wang
This patch introduces a new event - NETDEV_CHANGE_TX_QUEUE_LEN, this will be triggered when tx_queue_len. It could be used by net device who want to do some processing at that time. An example is tun who may want to resize tx array when tx_queue_len is changed. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: anchor virtual curve at proper vt in hfsc_change_fsc()Michal Soltys
cl->cl_vt alone is relative only to the current backlog period, while the curve operates on cumulative virtual time. This patch adds missing cl->cl_vtoff. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: go passive after vt updateMichal Soltys
When a class is going passive, it should update its cl_vt first to be consistent with the last dequeue operation. Otherwise its cl_vt will be one packet behind and parent's cvtmax might not be updated as well. One possible side effect is if some class goes passive and subsequently goes active /without/ its parent going passive - with cl_vt lagging one packet behind - comparison made in init_vf() will be affected (same period). Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: remove leftover dlist and droplistMichal Soltys
This is update to: commit a09ceb0e08140a ("sched: remove qdisc->drop") That commit removed qdisc->drop, but left alone dlist and droplist that no longer serve any meaningful purpose. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: add unlikely() in qdisc_peek_len()Michal Soltys
The condition can only succeed on wrong configurations. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01net/sched/sch_hfsc.c: handle corner cases where head may change invalidating ↵Michal Soltys
calculated deadline Realtime scheduling implemented in HFSC uses head of the queue to make the decision about which packet to schedule next. But in case of any head drop, the deadline calculated for the previous head is not necessarily correct for the next head (unless both packets have the same length). Thanks to peek() function used during dequeue - which internally is a dequeue operation - hfsc is almost safe from this issue, as peek() dequeues and isolates the head storing it temporarily until the real dequeue happens. But there is one exception: if after the class activation a drop happens before the first dequeue operation, there's never a chance to do the peek(). Adding peek() call in enqueue - if this is the first packet in a new backlog period AND the scheduler has realtime curve defined - fixes that one corner case. The 1st hfsc_dequeue() will use that peeked packet, similarly as every subsequent hfsc_dequeue() call uses packet peeked by the previous call. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01tcp: md5: use kmalloc() backed scratch areasEric Dumazet
Some arches have virtually mapped kernel stacks, or will soon have. tcp_md5_hash_header() uses an automatic variable to copy tcp header before mangling th->check and calling crypto function, which might be problematic on such arches. David says that using percpu storage is also problematic on non SMP builds. Just use kmalloc() to allocate scratch areas. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-01rxrpc: Fix processing of authenticated/encrypted jumbo packetsDavid Howells
When a jumbo packet is being split up and processed, the crypto checksum for each split-out packet is in the jumbo header and needs placing in the reconstructed packet header. When the code was changed to keep the stored copy of the packet header in host byte order, this reconstruction was missed. Found with sparse with CF=-D__CHECK_ENDIAN__: ../net/rxrpc/input.c:479:33: warning: incorrect type in assignment (different base types) ../net/rxrpc/input.c:479:33: expected unsigned short [unsigned] [usertype] _rsvd ../net/rxrpc/input.c:479:33: got restricted __be16 [addressable] [usertype] _rsvd Fixes: 0d12f8a4027d021c9cc942f09f38d28288020c5d ("rxrpc: Keep the skb private record of the Rx header in host byte order") Signed-off-by: David Howells <dhowells@redhat.com>
2016-06-30ipv4: Fix ip_skb_dst_mtu to use the sk passed by ip_finish_outputShmulik Ladkani
ip_skb_dst_mtu uses skb->sk, assuming it is an AF_INET socket (e.g. it calls ip_sk_use_pmtu which casts sk as an inet_sk). However, in the case of UDP tunneling, the skb->sk is not necessarily an inet socket (could be AF_PACKET socket, or AF_UNSPEC if arriving from tun/tap). OTOH, the sk passed as an argument throughout IP stack's output path is the one which is of PMTU interest: - In case of local sockets, sk is same as skb->sk; - In case of a udp tunnel, sk is the tunneling socket. Fix, by passing ip_finish_output's sk to ip_skb_dst_mtu. This augments 7026b1ddb6 'netfilter: Pass socket pointer down through okfn().' Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30fib_rules: Added NLM_F_EXCL support to fib_nl_newruleMateusz Bajorski
When adding rule with NLM_F_EXCL flag then check if the same rule exist. If yes then exit with -EEXIST. This is already implemented in iproute2: if (cmd == RTM_NEWRULE) { req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL; req.r.rtm_type = RTN_UNICAST; } Tested ipv4 and ipv6 with net-next linux on qemu x86 expected behavior after patch: localhost ~ # ip rule 0: from all lookup local 32766: from all lookup main 32767: from all lookup default localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005 RTNETLINK answers: File exists localhost ~ # ip rule 0: from all lookup local 1005: from 10.46.177.97 lookup 104 32766: from all lookup main 32767: from all lookup default There was already topic regarding this but I don't see any changes merged and problem still occurs. https://lkml.kernel.org/r/1135778809.5944.7.camel+%28%29+localhost+%21+localdomain Signed-off-by: Mateusz Bajorski <mateusz.bajorski@nokia.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30tcp: add an ability to dump and restore window parametersAndrey Vagin
We found that sometimes a restored tcp socket doesn't work. A reason of this bug is incorrect window parameters and in this case tcp_acceptable_seq() returns tcp_wnd_end(tp) instead of tp->snd_nxt. The other side drops packets with this seq, because seq is less than tp->rcv_nxt ( tcp_sequence() ). Data from a send queue is sent only if there is enough space in a window, so when we restore unacked data, we need to expand a window to fit this data. This was in a first version of this patch: "tcp: extend window to fit all restored unacked data in a send queue" Then Alexey recommended me to restore window parameters instead of adjusted them according with data in a sent queue. This sounds resonable. rcv_wnd has to be restored, because it was reported to another side and the offered window is never shrunk. One of reasons why we need to restore snd_wnd was described above. Cc: Pavel Emelyanov <xemul@parallels.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30net: bridge: add support for IGMP/MLD stats and export them via netlinkNikolay Aleksandrov
This patch adds stats support for the currently used IGMP/MLD types by the bridge. The stats are per-port (plus one stat per-bridge) and per-direction (RX/TX). The stats are exported via netlink via the new linkxstats API (RTM_GETSTATS). In order to minimize the performance impact, a new option is used to enable/disable the stats - multicast_stats_enabled, similar to the recent vlan stats. Also in order to avoid multiple IGMP/MLD type lookups and checks, we make use of the current "igmp" member of the bridge private skb->cb region to record the type on Rx (both host-generated and external packets pass by multicast_rcv()). We can do that since the igmp member was used as a boolean and all the valid IGMP/MLD types are positive values. The normal bridge fast-path is not affected at all, the only affected paths are the flooding ones and since we make use of the IGMP/MLD type, we can quickly determine if the packet should be counted using cache-hot data (cb's igmp member). We add counters for: * IGMP Queries * IGMP Leaves * IGMP v1/v2/v3 reports * MLD Queries * MLD Leaves * MLD v1/v2 reports These are invaluable when monitoring or debugging complex multicast setups with bridges. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30net: rtnetlink: add support for the IFLA_STATS_LINK_XSTATS_SLAVE attributeNikolay Aleksandrov
This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute which allows to export per-slave statistics if the master device supports the linkxstats callback. The attribute is passed down to the linkxstats callback and it is up to the callback user to use it (an example has been added to the only current user - the bridge). This allows us to query only specific slaves of master devices like bridge ports and export only what we're interested in instead of having to dump all ports and searching only for a single one. This will be used to export per-port IGMP/MLD stats and also per-port vlan stats in the future, possibly other statistics as well. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-30mac80211: fix fq lockdep warningsMichal Kazior
Some lockdep assertions were not fulfilled and resulted in a kernel warning/call trace if driver used intermediate software queues (e.g. ath10k). Existing code sequences should've guaranteed safety but it's always good to be extra careful. The call trace could look like this: [ 237.335805] ------------[ cut here ]------------ [ 237.335852] WARNING: CPU: 3 PID: 1921 at include/net/fq_impl.h:22 fq_flow_dequeue+0xed/0x140 [mac80211] [ 237.335855] Modules linked in: ath10k_pci(E-) ath10k_core(E) ath(E) mac80211(E) cfg80211(E) [ 237.335913] CPU: 3 PID: 1921 Comm: rmmod Tainted: G W E 4.7.0-rc4-wt-ath+ #1377 [ 237.335916] Hardware name: Hewlett-Packard HP ProBook 6540b/1722, BIOS 68CDD Ver. F.04 01/27/2010 [ 237.335918] 00200286 00200286 eff85dac c14151e2 f901574e 00000000 eff85de0 c1081075 [ 237.335928] c1ab91f0 00000003 00000781 f901574e 00000016 f8fbabad f8fbabad 00000016 [ 237.335938] eb24ff60 00000000 ef3886c0 eff85df4 c10810ba 00000009 00000000 00000000 [ 237.335948] Call Trace: [ 237.335953] [<c14151e2>] dump_stack+0x76/0xb4 [ 237.335957] [<c1081075>] __warn+0xe5/0x100 [ 237.336002] [<f8fbabad>] ? fq_flow_dequeue+0xed/0x140 [mac80211] [ 237.336046] [<f8fbabad>] ? fq_flow_dequeue+0xed/0x140 [mac80211] [ 237.336053] [<c10810ba>] warn_slowpath_null+0x2a/0x30 [ 237.336095] [<f8fbabad>] fq_flow_dequeue+0xed/0x140 [mac80211] [ 237.336137] [<f8fbc67a>] fq_flow_reset.constprop.56+0x2a/0x90 [mac80211] [ 237.336180] [<f8fbc79a>] fq_reset.constprop.59+0x2a/0x50 [mac80211] [ 237.336222] [<f8fc04e8>] ieee80211_txq_teardown_flows+0x38/0x40 [mac80211] [ 237.336258] [<f8f7c1a4>] ieee80211_unregister_hw+0xe4/0x120 [mac80211] [ 237.336275] [<f933f536>] ath10k_mac_unregister+0x16/0x50 [ath10k_core] [ 237.336292] [<f934592d>] ath10k_core_unregister+0x3d/0x90 [ath10k_core] [ 237.336301] [<f85f8836>] ath10k_pci_remove+0x36/0xa0 [ath10k_pci] [ 237.336307] [<c1470388>] pci_device_remove+0x38/0xb0 ... Fixes: 5caa328e3811 ("mac80211: implement codel on fair queuing flows") Fixes: fa962b92120b ("mac80211: implement fair queueing per txq") Tested-by: Kalle Valo <kvalo@qca.qualcomm.com> Reported-by: Kalle Valo <kvalo@qca.qualcomm.com> Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-06-30mac80211: use common cleanup for user/!user_mpmBob Copeland
We've accumulated a couple of different fixes now to mesh_sta_cleanup() due to the different paths that user_mpm and !user_mpm cases take -- one fix to flush nexthop paths and one to fix the counting. The only caller of mesh_plink_deactivate() is mesh_sta_cleanup(), so we can push the user_mpm checks down into there in order to share more code. In doing so, we can remove an extra call to mesh_path_flush_by_nexthop() and the (unnecessary) call to mesh_accept_plinks_update(). This will also ensure the powersaving state code gets called in the user_mpm case. The only cleanup tasks we need to avoid when MPM is in user-space are sending the peering frames and stopping the plink timer, so wrap those in the appropriate check. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-06-30mac80211: Encrypt "Group addressed privacy" action framesMasashi Honma
Previously, the action frames to group address was not encrypted. But [1] "Table 8-38 Category values" indicates "Mesh" and "Multihop" category action frames should be encrypted (Group addressed privacy == yes). And the encyption key should be MGTK ([1] 10.13 Group addressed robust management frame procedures). So this patch modifies the code to make it suitable for spec. [1] IEEE Std 802.11-2012 Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-06-30mac80211: silence an uninitialized variable warningDan Carpenter
We normally return an uninitialized value, but no one checks it so it doesn't matter. Anyway, let's silence the static checker warning. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-06-30nl80211: improve nl80211_parse_mesh_config type checkingArnd Bergmann
When building a kernel with W=1, the nl80211.c file causes a number of warnings, all about the same problem: net/wireless/nl80211.c: In function 'nl80211_parse_mesh_config': net/wireless/nl80211.c:5287:103: error: comparison is always false due to limited range of data type [-Werror=type-limits] net/wireless/nl80211.c:5290:96: error: comparison is always false due to limited range of data type [-Werror=type-limits] net/wireless/nl80211.c:5293:124: error: comparison is always false due to limited range of data type [-Werror=type-limits] net/wireless/nl80211.c:5295:148: error: comparison is always false due to limited range of data type [-Werror=type-limits] net/wireless/nl80211.c:5298:106: error: comparison is always false due to limited range of data type [-Werror=type-limits] net/wireless/nl80211.c:5305:116: error: comparison is always false due to limited range of data type [-Werror=type-limits] The problem is that gcc does not notice that the check is generate by a macro, so it complains about comparing an unsigned type against 0. I've tried to come up with a way to rephrase that code in a way that avoids the warnings and otherwise improves the code as well. This uses a set of new helper functions that perform the range checking, and should provide slightly better type safety than the older patch, at the expense of adding 44 lines to the code. Binary code size is basically unchanged though (20 bytes added to 126561 bytes .text). Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2016-06-30bpf: add bpf_skb_change_type helperDaniel Borkmann
This work adds a helper for changing skb->pkt_type in a controlled way. We only allow a subset of possible values and can extend that in future should other use cases come up. Doing this as a helper has the advantage that errors can be handeled gracefully and thus helper kept extensible. It's a write counterpart to pkt_type member we can already read from struct __sk_buff context. Major use case is to change incoming skbs to PACKET_HOST in a programmatic way instead of having to recirculate via redirect(..., BPF_F_INGRESS), for example. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>