path: root/net
Age  Commit message  Author
2016-06-07  tcp: record TLP and ER timer stats in v6 stats  (Yuchung Cheng)

Unlike the v4 version, the v6 TCP stats scan does not report TLP and ER timer information correctly. This patch fixes that.

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Fixes: eed530b6c676 ("tcp: early retransmit")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
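For reference, the v4 stats path reports TLP and ER the same way as the retransmit timer; a minimal sketch of that check (modeled on the 4.6-era get_tcp4_sock() in net/ipv4/tcp_ipv4.c; treat the surrounding variables as illustrative):

    if (icsk->icsk_pending == ICSK_TIME_RETRANS ||
        icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS ||
        icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
            timer_active  = 1;      /* report TLP/ER as the retransmit timer */
            timer_expires = icsk->icsk_timeout;
    } else if (icsk->icsk_pending == ICSK_TIME_PROBE0) {
            timer_active  = 4;      /* zero-window probe timer */
            timer_expires = icsk->icsk_timeout;
    }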
2016-06-07  net: sched: fix tc_should_offload for specific clsact classes  (Daniel Borkmann)

When offloading classifiers such as u32 or flower to hardware, and the qdisc is clsact (TC_H_CLSACT), we need to differentiate its classes, since not all of them handle ingress; those must be left in the software path. Add a .tcf_cl_offload() callback so we can handle them generically; tested on ixgbe.

Fixes: 10cbc6843446 ("net/sched: cls_flower: Hardware offloaded filters statistics support")
Fixes: 5b33f48842fa ("net/flower: Introduce hardware offload support")
Fixes: a1b7c5fd7fe9 ("net: sched: add cls_u32 offload hooks for netdevs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
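A sketch of the mechanism (the callback name is from the commit message; the exact wiring in include/net/pkt_cls.h is hedged):

    /* Only offload if the qdisc agrees this class can be handled
     * in hardware; clsact offloads only its ingress class.
     */
    static inline bool tc_should_offload(const struct net_device *dev,
                                         const struct tcf_proto *tp)
    {
            const struct Qdisc_class_ops *cops = tp->q->ops->cl_ops;

            if (!(dev->features & NETIF_F_HW_TC))
                    return false;
            if (cops && cops->tcf_cl_offload)
                    return cops->tcf_cl_offload(tp->classid);
            return true;
    }

    static bool clsact_cl_offload(u32 classid)
    {
            return TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_INGRESS);
    }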
2016-06-07  act_police: fix a crash during removal  (WANG Cong)

The police action uses its own code to initialize tcf hash info, which caused it to forget to initialize a->hinfo correctly. Fix this by calling the helper function tcf_hash_create() directly.

This patch fixes the following crash:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
 IP: [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
 PGD d3c34067 PUD d3e18067 PMD 0
 Oops: 0000 [#1] SMP
 CPU: 2 PID: 853 Comm: tc Not tainted 4.6.0+ #87
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 task: ffff8800d3e28040 ti: ffff8800d3f6c000 task.ti: ffff8800d3f6c000
 RIP: 0010:[<ffffffff810c099f>] [<ffffffff810c099f>] __lock_acquire+0xd3/0xf91
 RSP: 0000:ffff88011b203c80 EFLAGS: 00010002
 RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
 RBP: ffff88011b203d40 R08: 0000000000000001 R09: 0000000000000000
 R10: ffff88011b203d58 R11: ffff88011b208000 R12: 0000000000000001
 R13: ffff8800d3e28040 R14: 0000000000000028 R15: 0000000000000000
 FS: 0000000000000000(0000) GS:ffff88011b200000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000028 CR3: 00000000d4be1000 CR4: 00000000000006e0
 Stack:
  ffff8800d3e289c0 0000000000000046 000000001b203d60 ffffffff00000000
  0000000000000000 ffff880000000000 0000000000000000 ffffffff00000000
  ffffffff8187142c ffff88011b203ce8 ffff88011b203ce8 ffffffff8101dbfc
 Call Trace:
  <IRQ>
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff8101dbfc>] ? native_sched_clock+0x1a/0x35
  [<ffffffff810a9604>] ? sched_clock_local+0x11/0x78
  [<ffffffff810bf6a1>] ? mark_lock+0x24/0x201
  [<ffffffff810c1dbd>] lock_acquire+0x120/0x1b4
  [<ffffffff810c1dbd>] ? lock_acquire+0x120/0x1b4
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff81aad89f>] _raw_spin_lock_bh+0x3c/0x72
  [<ffffffff8187142c>] ? __tcf_hash_release+0x77/0xd1
  [<ffffffff8187142c>] __tcf_hash_release+0x77/0xd1
  [<ffffffff81871a27>] tcf_action_destroy+0x49/0x7c
  [<ffffffff81870b1c>] tcf_exts_destroy+0x20/0x2d
  [<ffffffff8189273b>] u32_destroy_key+0x1b/0x4d
  [<ffffffff81892788>] u32_delete_key_freepf_rcu+0x1b/0x1d
  [<ffffffff810de3b8>] rcu_process_callbacks+0x610/0x82e
  [<ffffffff8189276d>] ? u32_destroy_key+0x4d/0x4d
  [<ffffffff81ab0bc1>] __do_softirq+0x191/0x3f4

Fixes: ddf97ccdd7cb ("net_sched: add network namespace support for tc actions")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
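A sketch of the fix in tcf_act_police_locate() (net/sched/act_police.c): the helper sets up the hash info, including a->hinfo, instead of the open-coded initialization (argument list per the 4.6-era tcf_hash_create(), hedged):

    if (!tcf_hash_check(tn, parm->index, a, bind)) {
            /* let the helper initialize hash info (incl. a->hinfo) */
            ret = tcf_hash_create(tn, parm->index, NULL, a,
                                  sizeof(struct tcf_police), bind, false);
            if (ret)
                    return ret;
            ret = ACT_P_CREATED;
    }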
2016-06-07  net: sched: do not acquire qdisc spinlock in qdisc/class stats dump  (Eric Dumazet)

Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by the Google BwE host agent [1] are problematic at scale: for each qdisc/class found in the dump, we currently lock the root qdisc spinlock in order to get stats. Sampling stats every 5 seconds from thousands of HTB classes is a challenge when the root qdisc spinlock is under high pressure. Not only do the dumps take time, they also slow down the fast path (queue/dequeue packets) by 10% to 20% in some cases.

An audit of existing qdiscs showed that sch_fq_codel is the only qdisc that might need the qdisc lock in fq_codel_dump_stats() and fq_codel_dump_class_stats().

In v2 of this patch, I now use the Qdisc running seqcount to provide consistent reads of packets/bytes counters, regardless of 32/64 bit arches. I also changed rate estimators to use the same infrastructure, so that they no longer need to lock the root qdisc lock.

[1] http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43838.pdf

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Kevin Athey <kda@google.com>
Cc: Xiaotian Pei <xiaotian@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
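The consistent-read side is a standard seqcount retry loop; a minimal sketch (cf. __gnet_stats_copy_basic() in net/core/gen_stats.c; variable names illustrative):

    unsigned int seq;

    do {
            seq = read_seqcount_begin(running);
            bstats->bytes   = b->bytes;     /* 64-bit reads are now safe */
            bstats->packets = b->packets;   /* even on 32-bit arches     */
    } while (read_seqcount_retry(running, seq));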
2016-06-07  net_sched: transform qdisc running bit into a seqcount  (Eric Dumazet)
Instead of using a single bit (__QDISC___STATE_RUNNING) in sch->__state, use a seqcount. This adds lockdep support, but more importantly it will allow us to sample qdisc/class statistics without having to grab qdisc root lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
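A sketch of the resulting helpers in include/net/sch_generic.h (4.7-era; an odd sequence count means the qdisc is running):

    static inline bool qdisc_is_running(const struct Qdisc *qdisc)
    {
            return (raw_read_seqcount(&qdisc->running) & 1) ? true : false;
    }

    static inline bool qdisc_run_begin(struct Qdisc *qdisc)
    {
            if (qdisc_is_running(qdisc))
                    return false;
            /* makes the count odd and gives lockdep coverage */
            write_seqcount_begin(&qdisc->running);
            return true;
    }

    static inline void qdisc_run_end(struct Qdisc *qdisc)
    {
            write_seqcount_end(&qdisc->running);
    }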
2016-06-07  fq_codel: return non zero qlen in class dumps  (Eric Dumazet)

We properly scan the flow list to count the number of packets, but John passed 0 to gnet_stats_copy_queue(), so we report a zero value to user space instead of the result.

Fixes: 640158536632 ("net: sched: restrict use of qstats qlen")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: John Fastabend <john.r.fastabend@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07  net: cls_u32: be more strict about skip-sw flag  (Jakub Kicinski)

Return an error if the user requested skip-sw and the underlying hardware cannot handle tc offloads (or offloads are disabled).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
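The semantics boil down to failing the filter change instead of silently installing a software-only filter; a hedged sketch of the check (the real patch threads an error return through the u32 hardware-offload helpers):

    /* skip-sw was requested but the device cannot offload:
     * refuse the filter rather than fall back to software
     * behind the user's back.
     */
    if (tc_skip_sw(flags) && !tc_should_offload(dev, tp))
            return -EINVAL;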
2016-06-07  net: cls_u32: fix error code for invalid flags  (Jakub Kicinski)

The 'err' variable is not set in this test; we would return whatever the previous test set 'err' to.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07  net sched: indentation and other OCD stylistic fixes  (Jamal Hadi Salim)
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
2016-06-07  net sched actions: aggregate dumping of actions timeinfo  (Jamal Hadi Salim)
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07  net sched actions: introduce timestamp for firsttime use  (Jamal Hadi Salim)

Useful to know when the action was first used, for accounting (and debugging).

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07  net sched: actions use tcf_lastuse_update for consistency  (Jamal Hadi Salim)
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
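The helper being standardized on is tiny; a sketch (include/net/act_api.h):

    static inline void tcf_lastuse_update(struct tcf_t *tm)
    {
            unsigned long now = jiffies;

            /* only store when the value actually changes */
            if (tm->lastuse != now)
                    tm->lastuse = now;
    }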
2016-06-07  net/sched: cls_flower: Introduce support in SKIP SW flag  (Amir Vadai)

In order to have a filter processed only by hardware, the skip_sw flag should be supplied. This is an addition to the already existing skip_hw flag (filter processed by software only). If no flag is specified, the filter will be processed by both software and hardware. If only hardware-offloaded filters exist, fl_classify() will return without doing anything.

A following userspace patch will be sent once the kernel patch is accepted.

Example:

 tc filter add dev enp0s9 protocol ip prio 20 parent ffff: \
   flower \
     ip_proto 6 \
     indev enp0s9 \
     skip_sw \
   action skbedit mark 0x1234

Signed-off-by: Amir Vadai <amirva@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07  net: get rid of spin_trylock() in net_tx_action()  (Eric Dumazet)

Note: Tom Herbert posted almost the same patch 3 months back, but for different reasons.

The reasons we want to get rid of this spin_trylock() are:

1) Under high qdisc pressure, the spin_trylock() has almost no chance to succeed.
2) We loop multiple times in the softirq handler, eventually reaching the max retry count (10), and we schedule ksoftirqd. Since we want to adhere more strictly to ksoftirqd being woken up in the future (https://lwn.net/Articles/687617/), better avoid spurious wakeups.
3) Calls to __netif_reschedule() dirty the cache line containing q->next_sched, slowing down the owner of the qdisc.
4) RT kernels cannot use the spin_trylock() here.

With the help of busylock, we get the qdisc spinlock fast enough, and the trylock trick brings only a performance penalty.

Depending on qdisc setup, I observed a gain of up to 19% in qdisc performance (1016600 pps instead of 853400 pps, using prio+tbf+fq_codel). ("mpstat -I SCPU 1" is much happier now.)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
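After the change, net_tx_action() takes the root lock unconditionally; a sketch of the resulting loop (net/core/dev.c, lightly abridged):

    while (head) {
            struct Qdisc *q = head;
            spinlock_t *root_lock = qdisc_lock(q);

            head = head->next_sched;

            spin_lock(root_lock);
            /* head->next_sched must be read before we clear
             * __QDISC_STATE_SCHED
             */
            smp_mb__before_atomic();
            clear_bit(__QDISC_STATE_SCHED, &q->state);
            qdisc_run(q);
            spin_unlock(root_lock);
    }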
2016-06-07  rxrpc: fix ptr_ret.cocci warnings  (Wu Fengguang)
net/rxrpc/rxkad.c:1165:1-3: WARNING: PTR_ERR_OR_ZERO can be used Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR Generated by: scripts/coccinelle/api/ptr_ret.cocci CC: David Howells <dhowells@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
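The transformation itself, sketched with a hypothetical error-pointer 'ci' (the real site is in net/rxrpc/rxkad.c):

    /* before */
    if (IS_ERR(ci))
            return PTR_ERR(ci);
    return 0;

    /* after: identical behavior, one line */
    return PTR_ERR_OR_ZERO(ci);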
2016-06-07  RDS: TCP: fix race windows in send-path quiescence by rds_tcp_accept_one()  (Sowmini Varadhan)

The send path needs to be quiesced before resetting callbacks from rds_tcp_accept_one(), and commit eb192840266f ("RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock") achieves this using the c_state and RDS_IN_XMIT bit, following the pattern used by rds_conn_shutdown(). However, this leaves the possibility of a race window, as shown in the sequence below:

 take t_conn_lock in rds_tcp_conn_connect
 send outgoing syn to peer
 drop t_conn_lock in rds_tcp_conn_connect
 incoming from peer triggers rds_tcp_accept_one, conn is marked CONNECTING
 wait for RDS_IN_XMIT to quiesce any rds_send_xmit threads
 call rds_tcp_reset_callbacks
 [.. race-window where incoming syn-ack can cause the conn
    to be marked UP from rds_tcp_state_change ..]
 lock_sock called from rds_tcp_reset_callbacks, and we set t_sock to null

As soon as the conn is marked UP in the race window above, rds_send_xmit() threads will proceed to rds_tcp_xmit and may encounter a null-pointer deref on the t_sock.

Given that rds_tcp_state_change() is invoked in softirq context, whereas rds_tcp_reset_callbacks() is in workq context, and testing for RDS_IN_XMIT after lock_sock could result in a deadlock with tcp_sendmsg, this commit fixes the race by using a new c_state, RDS_TCP_RESETTING, which will prevent a transition to RDS_CONN_UP from rds_tcp_state_change().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07  RDS: TCP: Retransmit half-sent datagrams when switching sockets in rds_tcp_reset_callbacks  (Sowmini Varadhan)

When we switch a connection's sockets in rds_tcp_reset_callbacks, any partially sent datagram must be retransmitted on the new socket so that the receiver can correctly reassemble the RDS datagram. Use rds_send_reset(), which is designed for this purpose.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07  RDS: TCP: Add/use rds_tcp_reset_callbacks to reset tcp socket safely  (Sowmini Varadhan)

When rds_tcp_accept_one() has to replace the existing tcp socket with a newer tcp socket (duelling-syn resolution), it must lock_sock() to suppress the rds_tcp_data_recv() path while callbacks are being changed. Also, existing RDS datagram reassembly state must be reset so that the next datagram on the new socket does not have corrupted state. Similarly, when resetting the newly accepted socket, appropriate locks and synchronization are needed.

This commit ensures correct synchronization by invoking kernel_sock_shutdown to reset a newly accepted sock, and by taking appropriate lock_sock()s (for old and new sockets) when resetting existing callbacks.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-07  fq_codel: fix NET_XMIT_CN behavior  (Eric Dumazet)

My prior attempt to fix the backlogs of parents failed. If we return NET_XMIT_CN, our parents won't increase their backlog, so our qdisc_tree_reduce_backlog() should take this into account.

v2: Florian Westphal pointed out that we could drop the packet, so we need to save qdisc_pkt_len(skb) in a temp variable before calling fq_codel_drop().

Fixes: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-06  net_sched: keep backlog updated with qlen  (WANG Cong)

For gso_skb we only update qlen; the backlog should be updated too. Note, it is correct to update these stats at one layer only, because the gso_skb is cached there.

Reported-by: Stas Nichiporovich <stasn77@gmail.com>
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-06  soreuseport: add compat case for setsockopt SO_ATTACH_REUSEPORT_CBPF  (Helge Deller)

Commit 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") did not add the compat case for the SO_ATTACH_REUSEPORT_CBPF option.

Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
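A sketch of the compat-path fix (net/compat.c, compat_sock_setsockopt()): the new option carries a classic-BPF program, so it needs the same 32-bit pointer translation as SO_ATTACH_FILTER:

    if (optname == SO_ATTACH_FILTER ||
        optname == SO_ATTACH_REUSEPORT_CBPF)
            return do_set_attach_filter(sock, level, optname,
                                        optval, optlen);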
2016-06-05  net: disable fragment reassembly if high_thresh is zero  (Michal Kubeček)

Before commit 6d7b857d541e ("net: use lib/percpu_counter API for fragmentation mem accounting"), setting the reassembly high threshold to 0 prevented fragment reassembly, as the first fragment would always be evicted before the second could be added to the queue. While inefficient, some users apparently relied on this method.

Since the commit mentioned above, a percpu counter is used for reassembly memory accounting, and a high batch size avoids taking the slow path in most common scenarios. As a result, a whole full-sized packet can be reassembled without the percpu counter's main counter changing its value, so that even with high_thresh set to 0, fragmented packets can still be reassembled and processed.

Add an explicit check preventing reassembly if the high threshold is zero.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
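A sketch of the added check (net/ipv4/inet_fragment.c, inet_frag_find(); exact placement hedged):

    /* A zero high_thresh now explicitly disables reassembly,
     * instead of relying on immediate eviction.
     */
    if (!nf->high_thresh || frag_mem_limit(nf) > nf->high_thresh) {
            inet_frag_schedule_worker(f);
            return NULL;
    }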
2016-06-06  ipvs: update real-server binding of outgoing connections in SIP-pe  (Marco Angaroni)

The previous patch that introduced handling of outgoing packets in the SIP persistent engine did not call ip_vs_check_template() in case the packet matched a connection template. The assumption was that the real server was healthy, since it was sending a packet just in that moment.

There are however real-server fault conditions requiring that the association between call-id and real server (represented by a connection template) gets updated. Here is an example of the sequence of events:

 1) RS1 is a back2back user agent that handled call-id1 and call-id2
 2) RS1 is down and was marked as unavailable
 3) new message from outside comes to IPVS with call-id1
 4) IPVS reschedules the message to RS2, which becomes the new call handler
 5) RS2 forwards the message outside, translating call-id1 to call-id2
 6) inside pe->conn_out() IPVS matches call-id2 with the existing template
 7) IPVS does not change the association call-id2 <-> RS1
 8) new message comes from the client with call-id2
 9) IPVS reschedules the message to a real server potentially different
    from RS2, which is now the correct destination

This patch introduces an ip_vs_check_template() call in the handling of outgoing packets for SIP-pe. It also introduces a second, optional argument for ip_vs_check_template() that allows checking whether the dest associated to a connection template is the same dest that was identified as the source of the packet. This is to change the real server bound to a particular call-id independently of its availability status: the idea is that it is more reliable, for the in->out direction (where the internal network can be considered trusted), to always associate a call-id with the last real server that used it in one of its messages. Think about the above sequence of events where, just after step 5, RS1 returns to being available.

Comparison of dests is done by simply comparing pointers to struct ip_vs_dest; there should be no case where a struct ip_vs_dest keeps its memory address but comes to represent a different real server in terms of ip-address/port.

Fixes: 39b972231536 ("ipvs: handle connections started by real-servers")
Signed-off-by: Marco Angaroni <marcoangaroni@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
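A sketch of the extended check; the helper name here is hypothetical, but the condition mirrors the description above (cdest is the dest that sourced the packet):

    static bool ip_vs_template_dest_ok(const struct ip_vs_conn *ct,
                                       const struct ip_vs_dest *cdest)
    {
            const struct ip_vs_dest *dest = ct->dest;

            if (!dest || !(dest->flags & IP_VS_DEST_F_AVAILABLE))
                    return false;   /* server gone or unavailable */
            if (cdest && dest != cdest)
                    return false;   /* another RS now owns this call-id */
            return true;
    }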
2016-06-04  net: dsa: Add new binding implementation  (Andrew Lunn)

The existing DSA binding has a number of limitations and problems. The main problem is that it cannot represent a switch as a Linux device hanging off some bus. It is limited to one CPU port, and the DSA platform device is artificial and does not really represent hardware.

Implement a new binding which can be embedded into any type of node on a bus to represent one switch device, and its links to other switches.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04  net: dsa: Make mdio bus optional  (Andrew Lunn)
The switch may want to instantiate its own MDIO bus. Only do it centrally if the switch has not already created one, and the read op is implemented. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04  net: dsa: Refactor selection of tag ops into a function  (Andrew Lunn)
Replace the two switch statements with an array lookup, and store the result in the dsa tree structure. The drivers no longer need to know the selected tag protocol, so remove it from the dsa switch structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04  net: dsa: Split up creating/destroying of DSA and CPU ports  (Andrew Lunn)
Refactor the code to setup a single DSA/CPU port into a function of its own, and export it, so it can be used by the new binding. Similarly, refactor the destroy code into a function. When destroying the ports, don't put the of node. They should be released at the end along with the normal ports. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04  net: dsa: Copy the routing table into the switch structure  (Andrew Lunn)
The new binding will not have a chip data structure, it will place the routing directly into the switch structure. To enable backwards compatibility, copy the routing from the chip data into the switch structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04  net: dsa: Remove dynamic allocate of routing table  (Andrew Lunn)
With a maximum of four switches, the size of the routing table is the same as the pointer to it. Removing it makes the code simpler. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04  net: dsa: Move port device node into port structure  (Andrew Lunn)

Move the port device node structure into the port structure, from the chip data. This information is needed in the next step of implementing the new binding.

The chip data structure is used while parsing the whole old binding, before the individual switch structures exist. With the new binding, this is reversed: the switches exist first, and the interconnections between the switches are derived from the individual switch bindings. Thus the chip data structure becomes unneeded.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04  net: dsa: Add a ports structure and use it in the switch structure  (Andrew Lunn)
There are going to be more per-port members added to the switch structure. So add a port structure and move the netdev into it. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
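A sketch of the new layout (include/net/dsa.h):

    struct dsa_port {
            struct net_device *netdev;
    };

    struct dsa_switch {
            /* ... existing members ... */
            struct dsa_port ports[DSA_MAX_PORTS];
    };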
2016-06-04  net: dsa: tag_{e}dsa.c: Remove dependency on platform data  (Andrew Lunn)

The platform data nr_chips is used when validating a received packet, to ensure it comes from a known switch chip. The number of possible switches is limited to DSA_MAX_SWITCHES, so use this as the first validation step. The new binding allows holes in the dst->ds[] array, so also ensure there is a valid dsa_switch for this packet.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
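A sketch of the receive-side validation (net/dsa/tag_dsa.c; header parsing abridged):

    source_device = dsa_header[0] & 0x1f;
    if (source_device >= DSA_MAX_SWITCHES)
            goto out_drop;          /* not a valid switch index */

    ds = dst->ds[source_device];
    if (!ds)
            goto out_drop;          /* new binding allows holes in ds[] */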
2016-06-04  net: dsa: slave: Remove MDIO address from switch MDIO bus name  (Andrew Lunn)
The DSA layer should no longer assume the switch is connected to an MDIO bus. As a result, we cannot use the address on the MDIO bus when forming the name of the switches internal MDIO bus for its builtin and possibly external PHYs. The switch index is sufficient to make the name unique, so drop the MDIO address. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04  net: dsa: slave: chip data is optional, don't dereference NULL  (Andrew Lunn)
The new binding does not make use of dsa_chip_data, a.k.a cd. When retrieving the size of the EEPROM attached to a switch, don't assume there is a cd attached to the switch structure. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-04  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client  (Linus Torvalds)

Pull Ceph fixes from Sage Weil:
"We have a few follow-up fixes for the libceph refactor from Ilya, and then some cephfs + fscache fixes from Zheng. The first two FS-Cache patches are acked by David Howells and deemed trivial enough to go through our tree. The rest fix some issues with the ceph fscache handling (disable cache for inodes opened for write, and simplify the revalidation logic accordingly, dropping the now-unnecessary work queue)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: use i_version to check validity of fscache
  ceph: improve fscache revalidation
  ceph: disable fscache when inode is opened for write
  ceph: avoid unnecessary fscache invalidation/revlidation
  ceph: call __fscache_uncache_page() if readpages fails
  FS-Cache: make check_consistency callback return int
  FS-Cache: wake write waiter after invalidating writes
  libceph: use %s instead of %pE in dout()s
  libceph: put request only if it's done in handle_reply()
  libceph: change ceph_osdmap_flag() to take osdc
2016-06-03  net: Add docbook description for 'mtu' arg to skb_gso_validate_mtu()  (David S. Miller)
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  sctp: Fix warning in sctp_packet_transmit_chunk()  (David S. Miller)
size_t objects should be printed with %Z printf format. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  rxrpc: Use pr_<level> and pr_fmt, reduce object size a few KB  (Joe Perches)

Use the more common kernel logging style and reduce object size. The logging message prefix changes from a mixture of "RxRPC:" and "RXRPC:" to "af_rxrpc: ".

 $ size net/rxrpc/built-in.o*
    text    data     bss     dec     hex filename
   64172    1972    8304   74448   122d0 net/rxrpc/built-in.o.new
   67512    1972    8304   77788   12fdc net/rxrpc/built-in.o.old

Miscellanea:

o Consolidate the ASSERT macros to use a single pr_err call with decimal and hexadecimal output and a stringified #OP argument

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  sctp: improve debug message to also log curr pkt and new chunk size  (Marcelo Ricardo Leitner)
This is useful for debugging packet sizes. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  sctp: Add GSO support  (Marcelo Ricardo Leitner)

SCTP has the peculiarity that its packets cannot simply be segmented to (P)MTU. Its chunks must be contained in IP segments, padding respected. So we can't just generate a big skb, set gso_size to the fragmentation point and deliver it to the IP layer.

This patch takes a different approach. SCTP will now build a skb as it would be if it was received using GRO. That is, there will be a cover skb with the protocol headers and children ones containing the actual segments, already segmented in a way that respects SCTP RFCs.

With that, we can tell skb_segment() to just split based on frag_list, trusting its sizes are already in accordance. This way SCTP can benefit from GSO, and instead of passing several packets through the stack, it can pass a single large packet.

v2:
- Added support for receiving GSO frames, as requested by Dave Miller.
- Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
- Added heuristics similar to what we have in TCP for not generating single GSO packets that fill cwnd.

v3:
- consider sctphdr size in skb_gso_transport_seglen()
- rebased due to 5c7cdf339af5 ("gso: Remove arbitrary checks for unsupported GSO")

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  sctp: delay as much as possible skb_linearize  (Marcelo Ricardo Leitner)
This patch is a preparation for the GSO one. In order to successfully handle GSO packets on rx path we must not call skb_linearize, otherwise it defeats any gain GSO may have had. This patch thus delays as much as possible the call to skb_linearize, leaving it to sctp_inq_pop() moment. For that the sanity checks performed now know how to deal with fragments. One positive side-effect of this is that if the socket is backlogged it will have the chance of doing it on backlog processing instead of during softirq. With this move, it's evident that a check for non-linearity in sctp_inq_pop was ineffective and is now removed. Note that a similar check is performed a bit below this one. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  skbuff: introduce skb_gso_validate_mtu  (Marcelo Ricardo Leitner)

skb_gso_network_seglen is not enough for checking fragment sizes if the skb is using GSO_BY_FRAGS, as we have to check frag by frag. This patch introduces skb_gso_validate_mtu, based on the former, which wraps this use case inside it (all calls to skb_gso_network_seglen were validating whether the skb fits a given MTU) and improves the check.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Tested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
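The resulting helper, sketched (net/core/skbuff.c; matches the description above, details hedged):

    bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu)
    {
            const struct skb_shared_info *shinfo = skb_shinfo(skb);
            const struct sk_buff *iter;
            unsigned int hlen;

            hlen = skb_gso_network_seglen(skb);

            if (shinfo->gso_size != GSO_BY_FRAGS)
                    return hlen <= mtu;     /* old single check */

            /* undo GSO_BY_FRAGS so header sizes can be reused */
            hlen -= GSO_BY_FRAGS;

            /* with GSO_BY_FRAGS, each frag is its own segment */
            skb_walk_frags(skb, iter) {
                    if (hlen + skb_headlen(iter) > mtu)
                            return false;
            }

            return true;
    }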
2016-06-03  sk_buff: allow segmenting based on frag sizes  (Marcelo Ricardo Leitner)
This patch allows segmenting a skb based on its frags sizes instead of based on a fixed value. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  skbuff: export skb_gro_receive  (Marcelo Ricardo Leitner)
sctp GSO requires it and sctp can be compiled as a module, so we need to export this function. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Tested-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  sch_tbf: update backlog as well  (WANG Cong)
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  sch_red: update backlog as well  (WANG Cong)
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  sch_drr: update backlog as well  (WANG Cong)
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-03  sch_prio: update backlog as well  (WANG Cong)
We need to update backlog too when we update qlen. Joint work with Stas. Reported-by: Stas Nichiporovich <stasn77@gmail.com> Tested-by: Stas Nichiporovich <stasn77@gmail.com> Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
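These backlog fixes (tbf, red, drr, prio) all follow the same pattern: qlen and backlog must move together. A generic sketch for an enqueue/dequeue pair:

    /* enqueue/requeue */
    sch->q.qlen++;
    sch->qstats.backlog += qdisc_pkt_len(skb);

    /* dequeue/drop */
    sch->q.qlen--;
    sch->qstats.backlog -= qdisc_pkt_len(skb);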
2016-06-03  sch_hfsc: always keep backlog updated  (WANG Cong)
hfsc updates backlog lazily, that is only when we dump the stats. This is problematic after we begin to update backlog in qdisc_tree_reduce_backlog(). Reported-by: Stas Nichiporovich <stasn77@gmail.com> Tested-by: Stas Nichiporovich <stasn77@gmail.com> Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-02  rds: fix an infoleak in rds_inc_info_copy  (Kangjie Lu)

The last field, "flags", of object "minfo" is not initialized. Copying this object out may leak kernel stack data. Assign 0 to it to avoid the leak.

Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
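The fix itself is one assignment; sketched in context (net/rds/recv.c, rds_inc_info_copy(); surrounding lines hedged):

    minfo.seq = be64_to_cpu(inc->i_hdr.h_sequence);
    minfo.len = be32_to_cpu(inc->i_hdr.h_len);
    minfo.flags = 0;    /* the fix: never written elsewhere, so zero it
                         * to keep stack bytes out of userspace */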