summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-10-07bridge: netlink: export port's topology_change_ack and config_pendingNikolay Aleksandrov
Add IFLA_BRPORT_TOPOLOGY_CHANGE_ACK and IFLA_BRPORT_CONFIG_PENDING to allow getting port's topology_change_ack and config_pending respectively via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07bridge: netlink: export port's id and numberNikolay Aleksandrov
Add IFLA_BRPORT_(ID|NO) to allow getting port's port_id and port_no respectively via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07bridge: netlink: export port's designated cost and portNikolay Aleksandrov
Add IFLA_BRPORT_DESIGNATED_(COST|PORT) to allow getting the port's designated cost and port respectively via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07bridge: netlink: export port's bridge idNikolay Aleksandrov
Add IFLA_BRPORT_BRIDGE_ID to allow getting the designated bridge id via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07bridge: netlink: export port's root idNikolay Aleksandrov
Add IFLA_BRPORT_ROOT_ID to allow getting the designated root id via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Lookup actual route when oif is VRF deviceDavid Ahern
If the user specifies a VRF device in a get route query the custom route pointing to the VRF device is returned: $ ip route ls table vrf-red unreachable default broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2 broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2 $ ip route get oif vrf-red 10.2.1.40 10.2.1.40 dev vrf-red cache Add the flags to skip the custom route and go directly to the FIB. With this patch the actual route is returned: $ ip route get oif vrf-red 10.2.1.40 10.2.1.40 dev eth1 src 10.2.1.2 cache Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07Merge tag 'mac80211-next-for-davem-2015-10-05' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== For the current cycle, we have the following right now: * many internal fixes, API improvements, cleanups, etc. * full AP client state tracking in cfg80211/mac80211 from Ayala * VHT support (in mac80211) for mesh * some A-MSDU in A-MPDU support from Emmanuel * show current TX power to userspace (from Rafał) * support for netlink dump in vendor commands (myself) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Add l3mdev saddr lookup to raw_sendmsgDavid Ahern
ping originated on box through a VRF device is showing up in tcpdump without a source address: $ tcpdump -n -i vrf-blue 08:58:33.311303 IP 0.0.0.0 > 10.2.2.254: ICMP echo request, id 2834, seq 1, length 64 08:58:33.311562 IP 10.2.2.254 > 10.2.2.2: ICMP echo reply, id 2834, seq 1, length 64 Add the call to l3mdev_get_saddr to raw_sendmsg. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Add source address lookup op for VRFDavid Ahern
Add operation to l3mdev to lookup source address for a given flow. Add support for the operation to VRF driver and convert existing IPv4 hooks to use the new lookup. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Refactor path selection in __ip_route_output_key_hashDavid Ahern
VRF device needs the same path selection following lookup to set source address. Rather than duplicating code, move existing code into a function that is exported to modules. Code move only; no functional change. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Add netif_is_l3_slaveDavid Ahern
IPv6 addrconf keys off of IFF_SLAVE so can not use it for L3 slave. Add a new private flag and add netif_is_l3_slave function for checking it. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Rename FLOWI_FLAG_VRFSRC to FLOWI_FLAG_L3MDEV_SRCDavid Ahern
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Fix vti use case with oif in dst lookups for IPv6David Ahern
It occurred to me yesterday that 741a11d9e4103 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes the oif to be considered in lookups which is known to break vti. This explains why 58189ca7b274 did not the IPv6 change at the time it was submitted. Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Fix vti use case with oif in dst lookups for IPv6David Ahern
It occurred to me yesterday that 741a11d9e4103 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes the oif to be considered in lookups which is known to break vti. This explains why 58189ca7b274 did not the IPv6 change at the time it was submitted. Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: netlink attributes for IPv6 tunnelingJiri Benc
Add netlink attributes for IPv6 tunnel addresses. This enables IPv6 support for tunnels. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07openvswitch: add tunnel protocol to sw_flow_keyJiri Benc
Store tunnel protocol (AF_INET or AF_INET6) in sw_flow_key. This field now also acts as an indicator whether the flow contains tunnel data (this was previously indicated by tun_key.u.ipv4.dst being set but with IPv6 addresses in an union with IPv4 ones this won't work anymore). The new field was added to a hole in sw_flow_key. Signed-off-by: Jiri Benc <jbenc@redhat.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07bridge: netlink: make br_fill_info's frame size smallerNikolay Aleksandrov
When KASAN is enabled the frame size grows > 2048 bytes and we get a warning, so make it smaller. net/bridge/br_netlink.c: In function 'br_fill_info': >> net/bridge/br_netlink.c:1110:1: warning: the frame size of 2160 bytes >> is larger than 2048 bytes [-Wframe-larger-than=] Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: Add support for filtering neigh dump by device indexDavid Ahern
Add support for filtering neighbor dumps by device by adding the NDA_IFINDEX attribute to the dump request. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07net: dsa: better error reportingRussell King
Add additional error reporting to the generic DSA code, so it's easier to debug when things go wrong. This was useful when initially bringing up 88e6176 on a new board. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-07ipvs: Remove possibly unused variables from ip_vs_conn_net_{init,cleanup}Simon Horman
If CONFIG_PROC_FS is undefined then the arguments of proc_create() and remove_proc_entry() are unused. As a result the net variables of ip_vs_conn_net_{init,cleanup} are unused. net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_init’: net/netfilter/ipvs//ip_vs_conn.c:1350:14: warning: unused variable ‘net’ [-Wunused-variable] net/netfilter/ipvs//ip_vs_conn.c: In function ‘ip_vs_conn_net_cleanup’: net/netfilter/ipvs//ip_vs_conn.c:1361:14: warning: unused variable ‘net’ [-Wunused-variable] ... Resolve this by dereferencing net as needed rather than storing it in a variable. Fixes: 3d99376689ee ("ipvs: Pass ipvs not net into ip_vs_control_net_(init|cleanup)") Signed-off-by: Simon Horman <horms@verge.net.au> Acked-by: Julian Anastasov <ja@ssi.bg>
2015-10-07ipvs: Remove possibly unused variable from ip_vs_outDavid Ahern
Eric's net namespace changes in 1b75097dd7a26 leaves net unreferenced if CONFIG_IP_VS_IPV6 is not enabled: ../net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_out’: ../net/netfilter/ipvs/ip_vs_core.c:1177:14: warning: unused variable ‘net’ [-Wunused-variable] After the net refactoring there is only 1 user; push the reference to the 1 user. While the line length slightly exceeds 80 it seems to be the best change. Fixes: 1b75097dd7a26("ipvs: Pass ipvs into ip_vs_out") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Julian Anastasov <ja@ssi.bg> [horms: updated subject] Signed-off-by: Simon Horman <horms@verge.net.au>
2015-10-06xprtrdma: Don't require LOCAL_DMA_LKEY support for fastregSagi Grimberg
There is no need to require LOCAL_DMA_LKEY support as the PD allocation makes sure that there is a local_dma_lkey. Also correctly set a return value in error path. This caused a NULL pointer dereference in mlx5 which removed the support for LOCAL_DMA_LKEY. Fixes: bb6c96d72879 ("xprtrdma: Replace global lkey with lkey local to PD") Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-05ipv4: Fix compilation errors in fib_rebalancePeter Nørlund
This fixes net/built-in.o: In function `fib_rebalance': fib_semantics.c:(.text+0x9df14): undefined reference to `__divdi3' and net/built-in.o: In function `fib_rebalance': net/ipv4/fib_semantics.c:572: undefined reference to `__aeabi_ldivmod' Fixes: 0e884c78ee19 ("ipv4: L3 hash-based multipath") Signed-off-by: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05RDS: IB: split mr pool to improve 8K messages performanceSantosh Shilimkar
8K message sizes are pretty important usecase for RDS current workloads so we make provison to have 8K mrs available from the pool. Based on number of SG's in the RDS message, we pick a pool to use. Also to make sure that we don't under utlise mrs when say 8k messages are dominating which could lead to 8k pull being exhausted, we fall-back to 1m pool till 8k pool recovers for use. This helps to at least push ~55 kB/s bidirectional data which is a nice improvement. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: use max_mr from HCA caps than max_fmrSantosh Shilimkar
All HCA drivers seems to popullate max_mr caps and few of them do both max_mr and max_fmr. Hence update RDS code to make use of max_mr. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: mark rds_ib_fmr_wq staticSantosh Shilimkar
Fix below warning by marking rds_ib_fmr_wq static net/rds/ib_rdma.c:87:25: warning: symbol 'rds_ib_fmr_wq' was not declared. Should it be static? Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: use already available pool handle from ibmrSantosh Shilimkar
rds_ib_mr already keeps the pool handle which it associates with. Lets use that instead of round about way of fetching it from rds_ib_device. No functional change. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: fix the rds_ib_fmr_wq kick callSantosh Shilimkar
RDS IB mr pool has its own workqueue 'rds_ib_fmr_wq', so we need to use queue_delayed_work() to kick the work. This was hurting the performance since pool maintenance was less often triggered from other path. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: handle rds_ibdev release case instead of crashing the kernelSantosh Shilimkar
Just in case we are still handling the QP receive completion while the rds_ibdev is released, drop the connection instead of crashing the kernel. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: split send completion handling and do batch ackSantosh Shilimkar
Similar to what we did with receive CQ completion handling, we split the transmit completion handler so that it lets us implement batched work completion handling. We re-use the cq_poll routine and makes use of RDS_IB_SEND_OP to identify the send vs receive completion event handler invocation. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: IB: ack more receive completions to improve performanceSantosh Shilimkar
For better performance, we split the receive completion IRQ handler. That lets us acknowledge several WCE events in one call. We also limit the WC to max 32 to avoid latency. Acknowledging several completions in one call instead of several calls each time will provide better performance since less mutual exclusion locks are being performed. In next patch, send completion is also split which re-uses the poll_cq() and hence the code is moved to ib_cm.c Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: use rds_send_xmit() state instead of RDS_LL_SEND_FULLSantosh Shilimkar
In Transport indepedent rds_sendmsg(), we shouldn't make decisions based on RDS_LL_SEND_FULL which is used to manage the ring for RDMA based transports. We can safely issue rds_send_xmit() and the using its return value take decision on deferred work. This will also fix the scenario where at times we are seeing connections stuck with the LL_SEND_FULL bit getting set and never cleared. We kick krdsd after any time we see -ENOMEM or -EAGAIN from the ring allocation code. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05RDS: defer the over_batch work to send workerSantosh Shilimkar
Current process gives up if its send work over the batch limit. The work queue will get kicked to finish off any other requests. This fixes remainder condition from commit 443be0e5affe ("RDS: make sure not to loop forever inside rds_send_xmit"). The restart condition is only for the case where we reached to over_batch code for some other reason so just retrying again before giving up. While at it, make sure we use already available 'send_batch_count' parameter instead of magic value. The batch count threshold value of 1024 came via commit 443be0e5affe ("RDS: make sure not to loop forever inside rds_send_xmit"). The idea is to process as big a batch as we can but at the same time we don't hold other waiting processes for send. Hence back-off after the send_batch_count limit (1024) to avoid soft-lock ups. Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2015-10-05mac80211: make ieee80211_new_mesh_header return unsignedAndrzej Hajda
The function returns always non-negative values. The problem has been detected using proposed semantic patch scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1]. [1]: http://permalink.gmane.org/gmane.linux.kernel/2046107 Signed-off-by: Andrzej Hajda <a.hajda@samsung.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-10-05netfilter: nfnetlink_log: allow to attach conntrackKen-ichirou MATSUZAWA
This patch enables to include the conntrack information together with the packet that is sent to user-space via NFLOG, then a user-space program can acquire NATed information by this NFULA_CT attribute. Including the conntrack information is optional, you can set it via NFULNL_CFG_F_CONNTRACK flag with the NFULA_CFG_FLAGS attribute like NFQUEUE. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-05netfilter: ctnetlink: add const qualifier to nfnl_hook.get_ctKen-ichirou MATSUZAWA
get_ct as is and will not update its skb argument, and users of nfnl_ct_hook is currently only nfqueue, we can add const qualifier. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
2015-10-05netfilter: Kconfig rename QUEUE_CT to GLUE_CTKen-ichirou MATSUZAWA
Conntrack information attaching infrastructure is now generic and update it's name to use `glue' in previous patch. This patch updates Kconfig symbol name and adding NF_CT_NETLINK dependency. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-05netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack infoKen-ichirou MATSUZAWA
The idea of this series of patch is to attach conntrack information to nflog like nfqueue has already done. nfqueue conntrack info attaching basis is generic, rename those names to generic one, glue. Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-05netfilter: nfnetlink_queue: use y2038 safe timestampPablo Neira Ayuso
The __build_packet_message function fills a nfulnl_msg_packet_timestamp structure that uses 64-bit seconds and is therefore y2038 safe, but it uses an intermediate 'struct timespec' which is not. This trivially changes the code to use 'struct timespec64' instead, to correct the result on 32-bit architectures. This is a copy and paste of Arnd's original patch for nfnetlink_log. Suggested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-05Merge tag 'ipvs3-for-v4.4' of ↵Pablo Neira Ayuso
https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next Simon Horman says: ==================== Third Round of IPVS Updates for v4.4 please consider this build fix from Eric Biederman which resolves a build problem introduced in is excellent work to cleanup IPVS which you recently pulled: its queued up for v4.4 so no need to worry about earlier kernel versions. I have another minor cleanup, to fix a build warning, pending. However, I wanted to send this one to you now as its hit nf-next, net-next and in turn next, and a slow trickle of bug reports are appearing. ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-05openvswitch: Fix ovs_vport_get_stats()Pravin B Shelar
Not every device has dev->tstats set. So when OVS tries to calculate vport stats it causes kernel panic. Following patch fixes it by using standard API to get net-device stats. ---8<--- Unable to handle kernel paging request at virtual address 766b4008 Internal error: Oops: 96000005 [#1] PREEMPT SMP Modules linked in: vport_vxlan vxlan ip6_udp_tunnel udp_tunnel tun bridge stp llc openvswitch ipv6 CPU: 7 PID: 1108 Comm: ovs-vswitchd Not tainted 4.3.0-rc3+ #82 PC is at ovs_vport_get_stats+0x150/0x1f8 [openvswitch] <snip> Call trace: [<ffffffbffc0859f8>] ovs_vport_get_stats+0x150/0x1f8 [openvswitch] [<ffffffbffc07cdb0>] ovs_vport_cmd_fill_info+0x140/0x1e0 [openvswitch] [<ffffffbffc07cf0c>] ovs_vport_cmd_dump+0xbc/0x138 [openvswitch] [<ffffffc00045a5ac>] netlink_dump+0xb8/0x258 [<ffffffc00045ace0>] __netlink_dump_start+0x120/0x178 [<ffffffc00045dd9c>] genl_family_rcv_msg+0x2d4/0x308 [<ffffffc00045de58>] genl_rcv_msg+0x88/0xc4 [<ffffffc00045cf24>] netlink_rcv_skb+0xd4/0x100 [<ffffffc00045dab0>] genl_rcv+0x30/0x48 [<ffffffc00045c830>] netlink_unicast+0x154/0x200 [<ffffffc00045cc9c>] netlink_sendmsg+0x308/0x364 [<ffffffc00041e10c>] sock_sendmsg+0x14/0x2c [<ffffffc000420d58>] SyS_sendto+0xbc/0xf0 Code: aa1603e1 f94037a4 aa1303e2 aa1703e0 (f9400465) Reported-by: Tomasz Sawicki <tomasz.sawicki@objectiveintegration.uk> Fixes: 8c876639c98 ("openvswitch: Remove vport stats.") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05bpf, seccomp: prepare for upcoming criu supportDaniel Borkmann
The current ongoing effort to dump existing cBPF seccomp filters back to user space requires to hold the pre-transformed instructions like we do in case of socket filters from sk_attach_filter() side, so they can be reloaded in original form at a later point in time by utilities such as criu. To prepare for this, simply extend the bpf_prog_create_from_user() API to hold a flag that tells whether we should store the original or not. Also, fanout filters could make use of that in future for things like diag. While fanout filters already use bpf_prog_destroy(), move seccomp over to them as well to handle original programs when present. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Tycho Andersen <tycho.andersen@canonical.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Tycho Andersen <tycho.andersen@canonical.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05ovs: do not allocate memory from offline numa nodeKonstantin Khlebnikov
When openvswitch tries allocate memory from offline numa node 0: stats = kmem_cache_alloc_node(flow_stats_cache, GFP_KERNEL | __GFP_ZERO, 0) It catches VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid)) [ replaced with VM_WARN_ON(!node_online(nid)) recently ] in linux/gfp.h This patch disables numa affinity in this case. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05bpf: fix panic in SO_GET_FILTER with native ebpf programsDaniel Borkmann
When sockets have a native eBPF program attached through setsockopt(sk, SOL_SOCKET, SO_ATTACH_BPF, ...), and then try to dump these over getsockopt(sk, SOL_SOCKET, SO_GET_FILTER, ...), the following panic appears: [49904.178642] BUG: unable to handle kernel NULL pointer dereference at (null) [49904.178762] IP: [<ffffffff81610fd9>] sk_get_filter+0x39/0x90 [49904.182000] PGD 86fc9067 PUD 531a1067 PMD 0 [49904.185196] Oops: 0000 [#1] SMP [...] [49904.224677] Call Trace: [49904.226090] [<ffffffff815e3d49>] sock_getsockopt+0x319/0x740 [49904.227535] [<ffffffff812f59e3>] ? sock_has_perm+0x63/0x70 [49904.228953] [<ffffffff815e2fc8>] ? release_sock+0x108/0x150 [49904.230380] [<ffffffff812f5a43>] ? selinux_socket_getsockopt+0x23/0x30 [49904.231788] [<ffffffff815dff36>] SyS_getsockopt+0xa6/0xc0 [49904.233267] [<ffffffff8171b9ae>] entry_SYSCALL_64_fastpath+0x12/0x71 The underlying issue is the very same as in commit b382c0865600 ("sock, diag: fix panic in sock_diag_put_filterinfo"), that is, native eBPF programs don't store an original program since this is only needed in cBPF ones. However, sk_get_filter() wasn't updated to test for this at the time when eBPF could be attached. Just throw an error to the user to indicate that eBPF cannot be dumped over this interface. That way, it can also be known that a program _is_ attached (as opposed to just return 0), and a different (future) method needs to be consulted for a dump. Fixes: 89aa075832b0 ("net: sock: allow eBPF programs to be attached to sockets") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05openvswitch: Rename LABEL->LABELSJoe Stringer
Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name for these to be consistent with conntrack. Fixes: c2ac667 "openvswitch: Allow matching on conntrack label" Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05net/unix: fix logic about sk_peek_offsetAndrey Vagin
Now send with MSG_PEEK can return data from multiple SKBs. Unfortunately we take into account the peek offset for each skb, that is wrong. We need to apply the peek offset only once. In addition, the peek offset should be used only if MSG_PEEK is set. Cc: "David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING Cc: Eric Dumazet <edumazet@google.com> (commit_signer:1/14=7%) Cc: Aaron Conole <aconole@bytheb.org> Fixes: 9f389e35674f ("af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag") Signed-off-by: Andrey Vagin <avagin@openvz.org> Tested-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05act_mirred: always release tcf hashWANG Cong
Align with other tc actions. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05act_mirred: fix a race condition on mirred_listWANG Cong
After commit 1ce87720d456 ("net: sched: make cls_u32 lockless") we began to release tc actions in a RCU callback. However, mirred action relies on RTNL lock to protect the global mirred_list, therefore we could have a race condition between RCU callback and netdevice event, which caused a list corruption as reported by Vinson. Instead of relying on RTNL lock, introduce a spinlock to protect this list. Note, in non-bind case, it is still called with RTNL lock, therefore should disable BH too. Reported-by: Vinson Lee <vlee@twopensource.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05ipv4: fix reply_dst leakage on arp replyJiri Benc
There are cases when the created metadata reply is not used. Ensure the allocated memory is freed also in such cases. Fixes: 63d008a4e9ee ("ipv4: send arp replies to the correct tunnel") Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-05inet: fix race in reqsk_queue_unlink()Eric Dumazet
reqsk_timer_handler() tests if icsk_accept_queue.listen_opt is NULL at its beginning. By the time it calls inet_csk_reqsk_queue_drop() and reqsk_queue_unlink(), listener might have been closed and inet_csk_listen_stop() had called reqsk_queue_yank_acceptq() which sets icsk_accept_queue.listen_opt to NULL We therefore need to correctly check listen_opt being NULL after holding syn_wait_lock for proper synchronization. Fixes: fa76ce7328b2 ("inet: get rid of central tcp/dccp listener timer") Fixes: b357a364c57c ("inet: fix possible panic in reqsk_queue_unlink()") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>