summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-01-04batman-adv: Decrease hardif refcnt on fragmentation send errorSven Eckelmann
An error before the hardif is found has to free the skb. But every error after that has to free the skb + put the hard interface. Fixes: 8def0be82dd1 ("batman-adv: Consume skb in batadv_frag_send_packet") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-04xfrm: trivial typosAlexander Alemayhu
o s/descentant/descendant o s/workarbound/workaround Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-03net/sched: cls_matchall: Fix error pathYotam Gigi
Fix several error paths in matchall: - Release reference to actions in case the hardware fails offloading (relevant to skip_sw only) - Fix error path in case tcf_exts initialization/validation fail Fixes: bf3994d2ed31 ("net/sched: introduce Match-all classifier") Signed-off-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03tipc: reduce risk of user starvation during link congestionJon Paul Maloy
The socket code currently handles link congestion by either blocking and trying to send again when the congestion has abated, or just returning to the user with -EAGAIN and let him re-try later. This mechanism is prone to starvation, because the wakeup algorithm is non-atomic. During the time the link issues a wakeup signal, until the socket wakes up and re-attempts sending, other senders may have come in between and occupied the free buffer space in the link. This in turn may lead to a socket having to make many send attempts before it is successful. In extremely loaded systems we have observed latency times of several seconds before a low-priority socket is able to send out a message. In this commit, we simplify this mechanism and reduce the risk of the described scenario happening. When a message is attempted sent via a congested link, we now let it be added to the link's backlog queue anyway, thus permitting an oversubscription of one message per source socket. We still create a wakeup item and return an error code, hence instructing the sender to block or stop sending. Only when enough space has been freed up in the link's backlog queue do we issue a wakeup event that allows the sender to continue with the next message, if any. The fact that a socket now can consider a message sent even when the link returns a congestion code means that the sending socket code can be simplified. Also, since this is a good opportunity to get rid of the obsolete 'mtu change' condition in the three socket send functions, we now choose to refactor those functions completely. Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03tipc: modify struct tipc_plist to be more versatileJon Paul Maloy
During multicast reception we currently use a simple linked list with push/pop semantics to store port numbers. We now see a need for a more generic list for storing values of type u32. We therefore make some modifications to this list, while replacing the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new functions which will come to use in the next commits. Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functionsJon Paul Maloy
The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very similar. The latter function is also called from two locations, and there will be more in the coming commits, which will all need to test on different conditions. Instead of making yet another duplicates of the function, we now introduce a new macro tipc_wait_for_cond() where the wakeup condition can be stated as an argument to the call. This macro replaces all current and future uses of the two functions, which can now be eliminated. Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03drop_monitor: consider inserted data in genlmsg_endReiter Wolfgang
Final nlmsg_len field update must reflect inserted net_dm_drop_point data. This patch depends on previous patch: "drop_monitor: add missing call to genlmsg_end" Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03af_packet: TX_RING support for TPACKET_V3Sowmini Varadhan
Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3 does not currently have TX_RING support. As a result an application that wants the best perf for Tx and Rx (e.g. to handle request/response transacations) ends up needing 2 sockets, one with *_v2 for Tx and another with *_v3 for Rx. This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3 so that an application can use a single descriptor to get the benefits of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by first filling up multiple frames in the Tx ring and then triggering a transmit. This patch only support fixed size Tx frames for TPACKET_V3, and requires that tp_next_offset must be zero. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03ipmr, ip6mr: add RTNH_F_UNRESOLVED flag to unresolved cache entriesNikolay Aleksandrov
While working with ipmr, we noticed that it is impossible to determine if an entry is actually unresolved or its IIF interface has disappeared (e.g. virtual interface got deleted). These entries look almost identical to user-space when dumping or receiving notifications. So in order to recognize them add a new RTNH_F_UNRESOLVED flag which is set when sending an unresolved cache entry to user-space. Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03ipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rulesAlexander Duyck
In the case of custom rules being present we need to handle the case of the LOCAL table being intialized after the new rule has been added. To address that I am adding a new check so that we can make certain we don't use an alias of MAIN for LOCAL when allocating a new table. Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Reported-by: Oliver Brunel <jjk@jjacky.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03netfilter: nft_ct: add average bytes per packet supportLiping Zhang
Similar to xt_connbytes, user can match how many average bytes per packet a connection has transferred so far. Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-03netfilter: nat: merge udp and udplite helpersFlorian Westphal
udplite nat was copied from udp nat, they are virtually 100% identical. Not really surprising given udplite is just udp with partial csum coverage. old: text data bss dec hex filename 11606 1457 210 13273 33d9 nf_nat.ko 330 0 2 332 14c nf_nat_proto_udp.o 276 0 2 278 116 nf_nat_proto_udplite.o new: text data bss dec hex filename 11598 1457 210 13265 33d1 nf_nat.ko 640 0 4 644 284 nf_nat_proto_udp.o Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-03netfilter: merge udp and udplite conntrack helpersFlorian Westphal
udplite was copied from udp, they are virtually 100% identical. This adds udplite tracker to udp instead, removes udplite module, and then makes the udplite tracker builtin. udplite will then simply re-use udp timeout settings. It makes little sense to add separate sysctls, nowadays we have fine-grained timeout policy support via the CT target. old: text data bss dec hex filename 1633 672 0 2305 901 nf_conntrack_proto_udp.o 1756 672 0 2428 97c nf_conntrack_proto_udplite.o 69526 17937 268 87731 156b3 nf_conntrack.ko new: text data bss dec hex filename 2442 1184 0 3626 e2a nf_conntrack_proto_udp.o 68565 17721 268 86554 1521a nf_conntrack.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-02RDS: add receive message trace used by applicationSantosh Shilimkar
Socket option to tap receive path latency in various stages in nano seconds. It can be enabled on selective sockets using using SO_RDS_MSG_RXPATH_LATENCY socket option. RDS will return the data to application with RDS_CMSG_RXPATH_LATENCY in defined format. Scope is left to add more trace points for future without need of change in the interface. Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: make message size limit compliant with specAvinash Repaka
RDS support max message size as 1M but the code doesn't check this in all cases. Patch fixes it for RDMA & non-RDMA and RDS MR size and its enforced irrespective of underlying transport. Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: add stat for socket recv memory usageVenkat Venkatsubra
Tracks the receive side memory added to scokets and removed from sockets. Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: fix panic due to handlers running post teardownSantosh Shilimkar
Shutdown code reaping loop takes care of emptying the CQ's before they being destroyed. And once tasklets are killed, the hanlders are not expected to run. But because of core tasklet code issues, tasklet handler could still run even after tasklet_kill, RDS IB shutdown code already reaps the CQs before freeing cq/qp resources so as such the handlers have nothing left to do post shutdown. On other hand any handler running after teardown and trying to access already freed qp/cq resources causes issues Patch fixes this race by makes sure that handlers returns without any action post teardown. Reviewed-by: Wengang <wen.gang.wang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: RDMA: Fix the composite message user notificationSantosh Shilimkar
When application sends an RDS RDMA composite message consist of RDMA transfer to be followed up by non RDMA payload, it expect to be notified *only* when the full message gets delivered. RDS RDMA notification doesn't behave this way though. Thanks to Venkat for debug and root casuing the issue where only first part of the message(RDMA) was successfully delivered but remainder payload delivery failed. In that case, application should not be notified with a false positive of message delivery success. Fix this case by making sure the user gets notified only after the full message delivery. Reviewed-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: Add vector spreading for cqsSantosh Shilimkar
Based on available device vectors, allocate cqs accordingly to get better spread of completion vectors which helps performace great deal.. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: add few useful cache stastsSantosh Shilimkar
Tracks the ib receive cache total, incoming and frag allocations. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: track and log active side endpoint in connectionSantosh Shilimkar
Useful to know the active and passive end points in a RDS IB connection. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: RDMA: silence the use_once mr log floodSantosh Shilimkar
In absence of extension headers, message log will keep flooding the console. As such even without use_once we can clean up the MRs so its not really an error case message so make it debug message Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: split the mr registration and invalidation pathSantosh Shilimkar
MR invalidation in RDS is done in background thread and not in data path like registration. So break the dependency between them which helps to remove the performance bottleneck. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: RDMA: return appropriate error on rdma map failuresSantosh Shilimkar
The first message to a remote node should prompt a new connection even if it is RDMA operation. For RDMA operation the MR mapping can fail because connections is not yet up. Since the connection establishment is asynchronous, we make sure the map failure because of unavailable connection reach to the user by appropriate error code. Before returning to the user, lets trigger the connection so that its ready for the next retry. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: RDMA: start rdma listening after initQing Huang
This prevents RDS from handling incoming rdma packets before RDS completes initializing its recv/send components. Signed-off-by: Qing Huang <qing.huang@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: RDMA: fix the ib_map_mr_sg_zbva() argumentSantosh Shilimkar
Fixes warning: Using plain integer as NULL pointer Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: make the transport retry count smallestSantosh Shilimkar
Transport retry is not much useful since it indicate packet loss in fabric so its better to failover fast rather than longer retry. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: IB: include faddr in connection logSantosh Shilimkar
Also use pr_* for it. Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: mark few internal functions static to make sparse build happySantosh Shilimkar
Fixes below warnings: warning: symbol 'rds_send_probe' was not declared. Should it be static? warning: symbol 'rds_send_ping' was not declared. Should it be static? warning: symbol 'rds_tcp_accept_one_path' was not declared. Should it be static? warning: symbol 'rds_walk_conn_path_info' was not declared. Should it be static? Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02RDS: log the address on bind failureSantosh Shilimkar
It's useful to know the IP address when RDS fails to bind a connection. Thus, adding it to the error message. Orabug: 21894138 Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02igmp: Make igmp group member RFC 3376 compliantMichal Tesar
5.2. Action on Reception of a Query When a system receives a Query, it does not respond immediately. Instead, it delays its response by a random amount of time, bounded by the Max Resp Time value derived from the Max Resp Code in the received Query message. A system may receive a variety of Queries on different interfaces and of different kinds (e.g., General Queries, Group-Specific Queries, and Group-and-Source-Specific Queries), each of which may require its own delayed response. Before scheduling a response to a Query, the system must first consider previously scheduled pending responses and in many cases schedule a combined response. Therefore, the system must be able to maintain the following state: o A timer per interface for scheduling responses to General Queries. o A per-group and interface timer for scheduling responses to Group- Specific and Group-and-Source-Specific Queries. o A per-group and interface list of sources to be reported in the response to a Group-and-Source-Specific Query. When a new Query with the Router-Alert option arrives on an interface, provided the system has state to report, a delay for a response is randomly selected in the range (0, [Max Resp Time]) where Max Resp Time is derived from Max Resp Code in the received Query message. The following rules are then used to determine if a Report needs to be scheduled and the type of Report to schedule. The rules are considered in order and only the first matching rule is applied. 1. If there is a pending response to a previous General Query scheduled sooner than the selected delay, no additional response needs to be scheduled. 2. If the received Query is a General Query, the interface timer is used to schedule a response to the General Query after the selected delay. Any previously pending response to a General Query is canceled. --8<-- Currently the timer is rearmed with new random expiration time for every incoming query regardless of possibly already pending report. Which is not aligned with the above RFE. It also might happen that higher rate of incoming queries can postpone the report after the expiration time of the first query causing group membership loss. Now the per interface general query timer is rearmed only when there is no pending report already scheduled on that interface or the newly selected expiration time is before the already pending scheduled report. Signed-off-by: Michal Tesar <mtesar@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02flow_dissector: Update pptp handling to avoid null pointer deref.Ian Kumlien
__skb_flow_dissect can be called with a skb or a data packet, either can be NULL. All calls seems to have been moved to __skb_header_pointer except the pptp handling which is still calling skb_header_pointer. skb_header_pointer will use skb->data and thus: [ 109.556866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080 [ 109.557102] IP: [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0 [ 109.557263] PGD 0 [ 109.557338] [ 109.557484] Oops: 0000 [#1] SMP [ 109.557562] Modules linked in: chaoskey [ 109.557783] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0 #79 [ 109.557867] Hardware name: Supermicro A1SRM-LN7F/LN5F/A1SRM-LN7F-2758, BIOS 1.0c 11/04/2015 [ 109.557957] task: ffff94085c27bc00 task.stack: ffffb745c0068000 [ 109.558041] RIP: 0010:[<ffffffff88dc02f8>] [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0 [ 109.558203] RSP: 0018:ffff94087fc83d40 EFLAGS: 00010206 [ 109.558286] RAX: 0000000000000130 RBX: ffffffff8975bf80 RCX: ffff94084fab6800 [ 109.558373] RDX: 0000000000000010 RSI: 000000000000000c RDI: 0000000000000000 [ 109.558460] RBP: 0000000000000b88 R08: 0000000000000000 R09: 0000000000000022 [ 109.558547] R10: 0000000000000008 R11: ffff94087fc83e04 R12: 0000000000000000 [ 109.558763] R13: ffff94084fab6800 R14: ffff94087fc83e04 R15: 000000000000002f [ 109.558979] FS: 0000000000000000(0000) GS:ffff94087fc80000(0000) knlGS:0000000000000000 [ 109.559326] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 109.559539] CR2: 0000000000000080 CR3: 0000000281809000 CR4: 00000000001026e0 [ 109.559753] Stack: [ 109.559957] 000000000000000c ffff94084fab6822 0000000000000001 ffff94085c2b5fc0 [ 109.560578] 0000000000000001 0000000000002000 0000000000000000 0000000000000000 [ 109.561200] 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 109.561820] Call Trace: [ 109.562027] <IRQ> [ 109.562108] [<ffffffff88dfb4fa>] ? eth_get_headlen+0x7a/0xf0 [ 109.562522] [<ffffffff88c5a35a>] ? igb_poll+0x96a/0xe80 [ 109.562737] [<ffffffff88dc912b>] ? net_rx_action+0x20b/0x350 [ 109.562953] [<ffffffff88546d68>] ? __do_softirq+0xe8/0x280 [ 109.563169] [<ffffffff8854704a>] ? irq_exit+0xaa/0xb0 [ 109.563382] [<ffffffff8847229b>] ? do_IRQ+0x4b/0xc0 [ 109.563597] [<ffffffff8902d4ff>] ? common_interrupt+0x7f/0x7f [ 109.563810] <EOI> [ 109.563890] [<ffffffff88d57530>] ? cpuidle_enter_state+0x130/0x2c0 [ 109.564304] [<ffffffff88d57520>] ? cpuidle_enter_state+0x120/0x2c0 [ 109.564520] [<ffffffff8857eacf>] ? cpu_startup_entry+0x19f/0x1f0 [ 109.564737] [<ffffffff8848d55a>] ? start_secondary+0x12a/0x140 [ 109.564950] Code: 83 e2 20 a8 80 0f 84 60 01 00 00 c7 04 24 08 00 00 00 66 85 d2 0f 84 be fe ff ff e9 69 fe ff ff 8b 34 24 89 f2 83 c2 04 66 85 c0 <41> 8b 84 24 80 00 00 00 0f 49 d6 41 8d 31 01 d6 41 2b 84 24 84 [ 109.569959] RIP [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0 [ 109.570245] RSP <ffff94087fc83d40> [ 109.570453] CR2: 0000000000000080 Fixes: ab10dccb1160 ("rps: Inspect PPTP encapsulated by GRE to get flow hash") Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02mac80211: Fix addition of mesh configuration elementIlan peer
The code was setting the capabilities byte to zero, after it was already properly set previously. Fix it. The bug was found while debugging hwsim mesh tests failures that happened since the commit mentioned below. Fixes: 76f43b4c0a93 ("mac80211: Remove invalid flag operations in mesh TSF synchronization") Signed-off-by: Ilan Peer <ilan.peer@intel.com> Reviewed-by: Masashi Honma <masashi.honma@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-02mac80211: initialize fast-xmit 'info' laterJohannes Berg
In ieee80211_xmit_fast(), 'info' is initialized to point to the skb that's passed in, but that skb may later be replaced by a clone (if it was shared), leading to an invalid pointer. This can lead to use-after-free and also later crashes since the real SKB's info->hw_queue doesn't get initialized properly. Fix this by assigning info only later, when it's needed, after the skb replacement (may have) happened. Cc: stable@vger.kernel.org Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-01l2tp: take remote address into account in l2tp_ip and l2tp_ip6 socket lookupsGuillaume Nault
For connected sockets, __l2tp_ip{,6}_bind_lookup() needs to check the remote IP when looking for a matching socket. Otherwise a connected socket can receive traffic not originating from its peer. Drop l2tp_ip_bind_lookup() and l2tp_ip6_bind_lookup() instead of updating their prototype, as these functions aren't used. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01l2tp: consider '::' as wildcard address in l2tp_ip6 socket lookupGuillaume Nault
An L2TP socket bound to the unspecified address should match with any address. If not, it can't receive any packet and __l2tp_ip6_bind_lookup() can't prevent another socket from binding on the same device/tunnel ID. While there, rename the 'addr' variable to 'sk_laddr' (local addr), to make following patch clearer. Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01drop_monitor: add missing call to genlmsg_endReiter Wolfgang
Update nlmsg_len field with genlmsg_end to enable userspace processing using nlmsg_next helper. Also adds error handling. Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01net: socket: don't set sk_uid to garbage value in ->setattr()Eric Biggers
->setattr() was recently implemented for socket files to sync the socket inode's uid to the new 'sk_uid' member of struct sock. It does this by copying over the ia_uid member of struct iattr. However, ia_uid is actually only valid when ATTR_UID is set in ia_valid, indicating that the uid is being changed, e.g. by chown. Other metadata operations such as chmod or utimes leave ia_uid uninitialized. Therefore, sk_uid could be set to a "garbage" value from the stack. Fix this by only copying the uid over when ATTR_UID is set. Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Signed-off-by: Eric Biggers <ebiggers@google.com> Tested-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01batman-adv: Start new development cycleSimon Wunderlich
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2016-12-30net: Allow IP_MULTICAST_IF to set index to L3 slaveDavid Ahern
IP_MULTICAST_IF fails if sk_bound_dev_if is already set and the new index does not match it. e.g., ntpd[15381]: setsockopt IP_MULTICAST_IF 192.168.1.23 fails: Invalid argument Relax the check in setsockopt to allow setting mc_index to an L3 slave if sk_bound_dev_if points to an L3 master. Make a similar change for IPv6. In this case change the device lookup to take the rcu_read_lock avoiding a refcnt. The rcu lock is also needed for the lookup of a potential L3 master device. This really only silences a setsockopt failure since uses of mc_index are secondary to sk_bound_dev_if if it is set. In both cases, if either index is an L3 slave or master, lookups are directed to the same FIB table so relaxing the check at setsockopt time causes no harm. Patch is based on a suggested change by Darwin for a problem noted in their code base. Suggested-by: Darwin Dingel <darwin.dingel@alliedtelesis.co.nz> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-30bridge: netfilter: Fix dropping packets that moving through bridge interfaceArtur Molchanov
Problem: br_nf_pre_routing_finish() calls itself instead of br_nf_pre_routing_finish_bridge(). Due to this bug reverse path filter drops packets that go through bridge interface. User impact: Local docker containers with bridge network can not communicate with each other. Fixes: c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh") Signed-off-by: Artur Molchanov <artur.molchanov@synesis.ru> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-29net: ipv4: dst for local input routes should use l3mdev if relevantDavid Ahern
IPv4 output routes already use l3mdev device instead of loopback for dst's if it is applicable. Change local input routes to do the same. This fixes icmp responses for unreachable UDP ports which are directed to the wrong table after commit 9d1a6c4ea43e4 because local_input routes use the loopback device. Moving from ingress device to loopback loses the L3 domain causing responses based on the dst to get to lost. Fixes: 9d1a6c4ea43e4 ("net: icmp_route_lookup should use rt dev to determine L3 domain") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29net: dsa: Implement ndo_get_phys_port_idFlorian Fainelli
Implement ndo_get_phys_port_id() by returning the physical port number of the switch this per-port DSA created network interface corresponds to. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29net: dev_weight: TX/RX orthogonalityMatthias Tafelmeier
Oftenly, introducing side effects on packet processing on the other half of the stack by adjusting one of TX/RX via sysctl is not desirable. There are cases of demand for asymmetric, orthogonal configurability. This holds true especially for nodes where RPS for RFS usage on top is configured and therefore use the 'old dev_weight'. This is quite a common base configuration setup nowadays, even with NICs of superior processing support (e.g. aRFS). A good example use case are nodes acting as noSQL data bases with a large number of tiny requests and rather fewer but large packets as responses. It's affordable to have large budget and rx dev_weights for the requests. But as a side effect having this large a number on TX processed in one run can overwhelm drivers. This patch therefore introduces an independent configurability via sysctl to userland. Signed-off-by: Matthias Tafelmeier <matthias.tafelmeier@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29sctp: refactor sctp_datamsg_from_userMarcelo Ricardo Leitner
This patch refactors sctp_datamsg_from_user() in an attempt to make it better to read and avoid code duplication for handling the last fragment. It also avoids doing division and remaining operations. Even though, it should still operate similarly as before this patch. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29net: fix incorrect original ingress device index in PKTINFOWei Zhang
When we send a packet for our own local address on a non-loopback interface (e.g. eth0), due to the change had been introduced from commit 0b922b7a829c ("net: original ingress device index in PKTINFO"), the original ingress device index would be set as the loopback interface. However, the packet should be considered as if it is being arrived via the sending interface (eth0), otherwise it would break the expectation of the userspace application (e.g. the DHCPRELEASE message from dhcp_release binary would be ignored by the dnsmasq daemon, since it come from lo which is not the interface dnsmasq bind to) Fixes: 0b922b7a829c ("net: original ingress device index in PKTINFO") Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Wei Zhang <asuka.com@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29rtnl: stats - add missing netlink message size checksMathias Krause
We miss to check if the netlink message is actually big enough to contain a struct if_stats_msg. Add a check to prevent userland from sending us short messages that would make us access memory beyond the end of the message. Fixes: 10c9ead9f3c6 ("rtnetlink: add new RTM_GETSTATS message to dump...") Signed-off-by: Mathias Krause <minipli@googlemail.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29ipv6: remove unnecessary inet6_sk checkDave Jones
np is already assigned in the variable declaration of ping_v6_sendmsg. At this point, we have already dereferenced np several times, so the NULL check is also redundant. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29ipv6: Should use consistent conditional judgement for ip6 fragment between ↵Zheng Li
__ip6_append_data and ip6_finish_output There is an inconsistent conditional judgement between __ip6_append_data and ip6_finish_output functions, the variable length in __ip6_append_data just include the length of application's payload and udp6 header, don't include the length of ipv6 header, but in ip6_finish_output use (skb->len > ip6_skb_dst_mtu(skb)) as judgement, and skb->len include the length of ipv6 header. That causes some particular application's udp6 payloads whose length are between (MTU - IPv6 Header) and MTU were fragmented by ip6_fragment even though the rst->dev support UFO feature. Add the length of ipv6 header to length in __ip6_append_data to keep consistent conditional judgement as ip6_finish_output for ip6 fragment. Signed-off-by: Zheng Li <james.z.li@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29ipv4: Namespaceify tcp_max_syn_backlog knobHaishuang Yan
Different namespace application might require different maximal number of remembered connection requests. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>