summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2015-11-02net: rds: changing the return type from int to voidSaurabh Sengar
as result of function rds_iw_flush_mr_pool is nowhere checked, changing its return type from int to void. also removing the unused variable rc as there is nothing to return Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02ipv4: use l4 hash for locally generated multipath flowsPaolo Abeni
This patch changes how the multipath hash is computed for locally generated flows: now the hash comprises l4 information. This allows better utilization of the available paths when the existing flows have the same source IP and the same destination IP: with l3 hash, even when multiple connections are in place simultaneously, a single path will be used, while with l4 hash we can use all the available paths. v2 changes: - use get_hash_from_flowi4() instead of implementing just another l4 hash function Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02SUNRPC: Remove the TCP-only restriction in bc_svc_process()Chuck Lever
Allow the use of other transport classes when handling a backward direction RPC call. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02svcrdma: Add backward direction service for RPC/RDMA transportChuck Lever
On NFSv4.1 mount points, the Linux NFS client uses this transport endpoint to receive backward direction calls and route replies back to the NFSv4.1 server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Handle incoming backward direction RPC callsChuck Lever
Introduce a code path in the rpcrdma_reply_handler() to catch incoming backward direction RPC calls and route them to the ULP's backchannel server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Add support for sending backward direction RPC repliesChuck Lever
Backward direction RPC replies are sent via the client transport's send_request method, the same way forward direction RPC calls are sent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Pre-allocate Work Requests for backchannelChuck Lever
Pre-allocate extra send and receive Work Requests needed to handle backchannel receives and sends. The transport doesn't know how many extra WRs to pre-allocate until the xprt_setup_backchannel() call, but that's long after the WRs are allocated during forechannel setup. So, use a fixed value for now. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffersChuck Lever
xprtrdma's backward direction send and receive buffers are the same size as the forechannel's inline threshold, and must be pre- registered. The consumer has no control over which receive buffer the adapter chooses to catch an incoming backwards-direction call. Any receive buffer can be used for either a forward reply or a backward call. Thus both types of RPC message must all be the same size. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02SUNRPC: Abstract backchannel operationsChuck Lever
xprt_{setup,destroy}_backchannel() won't be adequate for RPC/RMDA bi-direction. In particular, receive buffers have to be pre- registered and posted in order to receive incoming backchannel requests. Add a virtual function call to allow the insertion of appropriate backchannel setup and destruction methods for each transport. In addition, freeing a backchannel request is a little different for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Saving IRQs no longer needed for rb_lockChuck Lever
Now that RPC replies are processed in a workqueue, there's no need to disable IRQs when managing send and receive buffers. This saves noticeable overhead per RPC. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Remove reply taskletChuck Lever
Clean up: The reply tasklet is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Use workqueue to process RPC/RDMA repliesChuck Lever
The reply tasklet is fast, but it's single threaded. After reply traffic saturates a single CPU, there's no more reply processing capacity. Replace the tasklet with a workqueue to spread reply handling across all CPUs. This also moves RPC/RDMA reply handling out of the soft IRQ context and into a context that allows sleeps. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Replace send and receive arraysChuck Lever
The rb_send_bufs and rb_recv_bufs arrays are used to implement a pair of stacks for keeping track of free rpcrdma_req and rpcrdma_rep structs. Replace those arrays with free lists. To allow more than 512 RPCs in-flight at once, each of these arrays would be larger than a page (assuming 8-byte addresses and 4KB pages). Allowing up to 64K in-flight RPCs (as TCP now does), each buffer array would have to be 128 pages. That's an order-6 allocation. (Not that we're going there.) A list is easier to expand dynamically. Instead of allocating a larger array of pointers and copying the existing pointers to the new array, simply append more buffers to each list. This also makes it simpler to manage receive buffers that might catch backwards-direction calls, or to post receive buffers in bulk to amortize the overhead of ib_post_recv. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Refactor reply handler error handlingChuck Lever
Clean up: The error cases in rpcrdma_reply_handler() almost never execute. Ensure the compiler places them out of the hot path. No behavior change expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Prevent loss of completion signalsChuck Lever
Commit 8301a2c047cc ("xprtrdma: Limit work done by completion handler") was supposed to prevent xprtrdma's upcall handlers from starving other softIRQ work by letting them return to the provider before all CQEs have been polled. The logic assumes the provider will call the upcall handler again immediately if the CQ is re-armed while there are still queued CQEs. This assumption is invalid. The IBTA spec says that after a CQ is armed, the hardware must interrupt only when a new CQE is inserted. xprtrdma can't rely on the provider calling again, even though some providers do. Therefore, leaving CQEs on queue makes sense only when there is another mechanism that ensures all remaining CQEs are consumed in a timely fashion. xprtrdma does not have such a mechanism. If a CQE remains queued, the transport can wait forever to send the next RPC. Finally, move the wcs array back onto the stack to ensure that the poll array is always local to the CPU where the completion upcall is running. Fixes: 8301a2c047cc ("xprtrdma: Limit work done by completion ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Re-arm after missed eventsChuck Lever
ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns a positive value if WCs were added to a CQ after the last completion upcall but before the CQ has been re-armed. Commit 7f23f6f6e388 ("xprtrmda: Reduce lock contention in completion handlers") assumed that when ib_req_notify_cq() returned a positive RC, the CQ had also been successfully re-armed, making it safe to return control to the provider without losing any completion signals. That is an invalid assumption. Change both completion handlers to continue polling while ib_req_notify_cq() returns a positive value. Fixes: 7f23f6f6e388 ("xprtrmda: Reduce lock contention in ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: Enable swap-on-NFS/RDMAChuck Lever
After adding a swapfile on an NFS/RDMA mount and removing the normal swap partition, I was able to push the NFS client well into swap without any issue. I forgot to swapoff the NFS file before rebooting. This pinned the NFS mount and the IB core and provider, causing shutdown to hang. I think this is expected and safe behavior. Probably shutdown scripts should "swapoff -a" before unmounting any filesystems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02xprtrdma: don't log warnings for flushed completionsSteve Wise
Unsignaled send WRs can get flushed as part of normal unmount, so don't log them as warnings. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-01Use 64-bit timekeepingTina Ruchandani
This patch changes the use of struct timespec in dccp_probe to use struct timespec64 instead. timespec uses a 32-bit seconds field which will overflow in the year 2038 and beyond. timespec64 uses a 64-bit seconds field. Note that the correctness of the code isn't changed, since the original code only uses the timestamps to compute a small elapsed interval. This patch is part of a larger attempt to remove instances of 32-bit timekeeping structures (timespec, timeval, time_t) from the kernel so it is easier to identify where the real 2038 issues are. Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv4: update RTNH_F_LINKDOWN flag on UP eventJulian Anastasov
When nexthop is part of multipath route we should clear the LINKDOWN flag when link goes UP or when first address is added. This is needed because we always set LINKDOWN flag when DEAD flag was set but now on UP the nexthop is not dead anymore. Examples when LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered: - link goes down (LINKDOWN is set), then link goes UP and device shows carrier OK but LINKDOWN remains set - last address is deleted (LINKDOWN is set), then address is added and device shows carrier OK but LINKDOWN remains set Steps to reproduce: modprobe dummy ifconfig dummy0 192.168.168.1 up here add a multipath route where one nexthop is for dummy0: ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE ifconfig dummy0 down ifconfig dummy0 up now ip route shows nexthop that is not dead. Now set the sysctl var: echo 1 > /proc/sys/net/ipv4/conf/dummy0/ignore_routes_with_linkdown now ip route will show a dead nexthop because the forgotten RTNH_F_LINKDOWN is propagated as RTNH_F_DEAD. Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv4: fix to not remove local route on link downJulian Anastasov
When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event we should not delete the local routes if the local address is still present. The confusion comes from the fact that both fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN constant. Fix it by returning back the variable 'force'. Steps to reproduce: modprobe dummy ifconfig dummy0 192.168.168.1 up ifconfig dummy0 down ip route list table local | grep dummy | grep host local 192.168.168.1 dev dummy0 proto kernel scope host src 192.168.168.1 Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01net: dsa: use switchdev obj for VLAN add/del opsVivien Didelot
Simplify DSA by pushing the switchdev objects for VLAN add and delete operations down to its drivers. Currently only mv88e6xxx is affected. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01VSOCK: define VSOCK_SS_LISTEN once onlyStefan Hajnoczi
The SS_LISTEN socket state is defined by both af_vsock.c and vmci_transport.c. This is risky since the value could be changed in one file and the other would be out of sync. Rename from SS_LISTEN to VSOCK_SS_LISTEN since the constant is not part of enum socket_state (SS_CONNECTED, ...). This way it is clear that the constant is vsock-specific. The big text reflow in af_vsock.c was necessary to keep to the maximum line length. Text is unchanged except for s/SS_LISTEN/VSOCK_SS_LISTEN/. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01tipc: linearize arriving NAME_DISTR and LINK_PROTO buffersJon Paul Maloy
Testing of the new UDP bearer has revealed that reception of NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE message buffers is not prepared for the case that those may be non-linear. We now linearize all such buffers before they are delivered up to the generic reception layer. In order for the commit to apply cleanly to 'net' and 'stable', we do the change in the function tipc_udp_recv() for now. Later, we will post a commit to 'net-next' moving the linearization to generic code, in tipc_named_rcv() and tipc_link_proto_rcv(). Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragmentHannes Frederic Sowa
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one of those warn about them once and handle them gracefully by recalculating the checksum. Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked socketsHannes Frederic Sowa
We cannot reliable calculate packet size on MSG_MORE corked sockets and thus cannot decide if they are going to be fragmented later on, so better not use CHECKSUM_PARTIAL in the first place. The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in the existence of IPv6 extension headers, but the condition was wrong. Fix it up, too. Also the condition to check whether the packet fits into one fragment was wrong and has been corrected. Fixes: commit 32dce968dd987 ("ipv6: Allow for partial checksums on non-ufo packets") See-also: commit 72e843bb09d45 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL") Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragmentHannes Frederic Sowa
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one of those warn about them once and handle them gracefully by recalculating the checksum. Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked socketsHannes Frederic Sowa
We cannot reliable calculate packet size on MSG_MORE corked sockets and thus cannot decide if they are going to be fragmented later on, so better not use CHECKSUM_PARTIAL in the first place. Cc: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Benjamin Coddington <bcodding@redhat.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-10-30Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2015-10-30 1) The flow cache is limited by the flow cache limit which depends on the number of cpus and the xfrm garbage collector threshold which is independent of the number of cpus. This leads to the fact that on systems with more than 16 cpus we hit the xfrm garbage collector limit and refuse new allocations, so new flows are dropped. On systems with 16 or less cpus, we hit the flowcache limit. In this case, we shrink the flow cache instead of refusing new flows. We increase the xfrm garbage collector threshold to INT_MAX to get the same behaviour, independent of the number of cpus. 2) Fix some unaligned accesses on sparc systems. From Sowmini Varadhan. 3) Fix some header checks in _decode_session4. We may call pskb_may_pull with a negative value converted to unsigened int from pskb_may_pull. This can lead to incorrect policy lookups. We fix this by a check of the data pointer position before we call pskb_may_pull. 4) Reload skb header pointers after calling pskb_may_pull in _decode_session4 as this may change the pointers into the packet. 5) Add a missing statistic counter on inner mode errors. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30switchdev: fix: pass correct obj size when deferring obj addScott Feldman
Fixes: 4d429c5dd ("switchdev: introduce possibility to defer obj_add/del") Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30switchdev: fix: erasing too much of vlan obj when handling multiple vlan specsScott Feldman
When adding vlans with multiple IFLA_BRIDGE_VLAN_INFO attrs set in AFSPEC, we would wipe the vlan obj struct after the first IFLA_BRIDGE_VLAN_INFO. Fix this by only clearing what's necessary on each IFLA_BRIDGE_VLAN_INFO iteration. Fixes: 9e8f4a54 ("switchdev: push object ID back to object structure") Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30Merge tag 'nfc-next-4.4-2' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.4 pull request This is the NFC pull request for 4.4. It's a bit bigger than usual, the 3 main culprits being: - A new driver for Intel's Fields Peak NCI chipset. In order to support this chipset we had to export a few NCI routines and extend the driver NCI ops to not only support proprietary commands but also core ones. - Support for vendor commands for both STM drivers, st-nci and st21nfca. Those vendor commands allow to run factory tests through the NFC netlink interface. - New i2c and SPI support for the Marvell driver, together with firmware download support for this driver's core. Besides that we also have: - A few file renames in the STM drivers, to keep the naming consistent between drivers. - Some improvements and fixes on the NCI HCI layer, mostly to properly reach a secure element over a legacy HCI link. - A few fixes for the s3fwrn5 and trf7970a drivers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-10-28 Here are a some more Bluetooth patches for 4.4 which collected up during the past week. The most important ones are from Kuba Pawlak for fixing locking issues with SCO sockets. There's also a fix from Alexander Aring for 6lowpan, a memleak fix from Julia Lawall for the btmrvl driver and some cleanup patches from Marcel. Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTUAlexander Duyck
This change makes it so that we reinitialize the interface if the MTU is increased back above IPV6_MIN_MTU and the interface is up. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30switchdev: Add support for flood controlIdo Schimmel
Allow devices supporting this feature to control the flooding of unknown unicast traffic, by making switchdev infrastructure propagate this setting to the switch driver. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30bridge: set is_local and is_static before fdb entry is added to the fdb ↵Roopa Prabhu
hashtable Problem Description: We can add fdbs pointing to the bridge with NULL ->dst but that has a few race conditions because br_fdb_insert() is used which first creates the fdb and then, after the fdb has been published/linked, sets "is_local" to 1 and in that time frame if a packet arrives for that fdb it may see it as non-local and either do a NULL ptr dereference in br_forward() or attach the fdb to the port where it arrived, and later br_fdb_insert() will make it local thus getting a wrong fdb entry. Call chain br_handle_frame_finish() -> br_forward(): But in br_handle_frame_finish() in order to call br_forward() the dst should not be local i.e. skb != NULL, whenever the dst is found to be local skb is set to NULL so we can't forward it, and here comes the problem since it's running only with RCU when forwarding packets it can see the entry before "is_local" is set to 1 and actually try to dereference NULL. The main issue is that if someone sends a packet to the switch while it's adding the entry which points to the bridge device, it may dereference NULL ptr. This is needed now after we can add fdbs pointing to the bridge. This poses a problem for br_fdb_update() as well, while someone's adding a bridge fdb, but before it has is_local == 1, it might get moved to a port if it comes as a source mac and then it may get its "is_local" set to 1 This patch changes fdb_create to take is_local and is_static as arguments to set these values in the fdb entry before it is added to the hash. Also adds null check for port in br_forward. Fixes: 3741873b4f73 ("bridge: allow adding of fdb entries pointing to the bridge device") Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-29ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa
issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. This is the second version of the patch which doesn't use the overflow_usub function, which got reverted for now. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-29Revert "Merge branch 'ipv6-overflow-arith'"Hannes Frederic Sowa
Linus dislikes these changes. To not hold up the net-merge let's revert it for now and fix the bug like Linus suggested. This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-28RDS/IW: Convert to new memory registration APISagi Grimberg
Get rid of fast_reg page list and its construction. Instead, just pass the RDS sg list to ib_map_mr_sg and post the new ib_reg_wr. This is done both for server IW RDMA_READ registration and the client remote key registration. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28svcrdma: Port to new memory registration APISagi Grimberg
Instead of maintaining a fastreg page list, keep an sg table and convert an array of pages to a sg list. Then call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28xprtrdma: Port to new memory registration APISagi Grimberg
Instead of maintaining a fastreg page list, keep an sg table and convert an array of pages to a sg list. Then call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Selvin Xavier <selvin.xavier@avagotech.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28Merge branch 'wr-cleanup' into k.o/for-4.4Doug Ledford
2015-10-28Merge branch 'wr-cleanup' of git://git.infradead.org/users/hch/rdma into ↵Doug Ledford
wr-cleanup Signed-off-by: Doug Ledford <dledford@redhat.com> Conflicts: drivers/infiniband/ulp/isert/ib_isert.c - Commit 4366b19ca5eb (iser-target: Change the recv buffers posting logic) changed the logic in isert_put_datain() and had to be hand merged
2015-10-28IB/cma: Add support for network namespacesGuy Shapiro
Add support for network namespaces in the ib_cma module. This is accomplished by: 1. Adding network namespace parameter for rdma_create_id. This parameter is used to populate the network namespace field in rdma_id_private. rdma_create_id keeps a reference on the network namespace. 2. Using the network namespace from the rdma_id instead of init_net inside of ib_cma, when listening on an ID and when looking for an ID for an incoming request. 3. Decrementing the reference count for the appropriate network namespace when calling rdma_destroy_id. In order to preserve the current behavior init_net is passed when calling from other modules. Signed-off-by: Guy Shapiro <guysh@mellanox.com> Signed-off-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Yotam Kenneth <yotamke@mellanox.com> Signed-off-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28NFC: nci: non-static functions can not be inlineRobert Dolca
This fixes a build error that seems to be toochain dependent (Not seen with gcc v5.1): In file included from net/nfc/nci/rsp.c:36:0: net/nfc/nci/rsp.c: In function ‘nci_rsp_packet’: include/net/nfc/nci_core.h:355:12: error: inlining failed in call to always_inline ‘nci_prop_rsp_packet’: function body not available inline int nci_prop_rsp_packet(struct nci_dev *ndev, __u16 opcode, Signed-off-by: Robert Dolca <robert.dolca@intel.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-27mpls: reduce memory usage of routesRobert Shearman
Nexthops for MPLS routes have a via address field sized for the largest via address that is expected, which is 32 bytes. This means that in the most common case of having ipv4 via addresses, 28 bytes of memory more than required are used per nexthop. In the other common case of an ipv6 nexthop then 16 bytes more than required are used. With large numbers of MPLS routes this extra memory usage could start to become significant. To avoid allocating memory for a maximum length via address when not all of it is required and to allow for ease of iterating over nexthops, then the via addresses are changed to be stored in the same memory block as the route and nexthops, but in an array after the end of the array of nexthops. New accessors are provided to retrieve a pointer to the via address. To allow for O(1) access without having to store a pointer or offset per nh, the via address for each nexthop is sized according to the maximum via address for any nexthop in the route, which is stored in a new route field, rt_max_alen, but this is in an existing hole in struct mpls_route so it doesn't increase the size of the structure. Each via address is ensured to be aligned to VIA_ALEN_ALIGN to account for architectures that don't allow unaligned accesses. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27mpls: fix forwarding using v4/v6 explicit nullRobert Shearman
Fill in the via address length for the predefined IPv4 and IPv6 explicit-null label routes. Fixes: f8efb73c97e2 ("mpls: multipath route support") Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in ↵Sowmini Varadhan
rds_tcp_data_recv Either of pskb_pull() or pskb_trim() may fail under low memory conditions. If rds_tcp_data_recv() ignores such failures, the application will receive corrupted data because the skb has not been correctly carved to the RDS datagram size. Avoid this by handling pskb_pull/pskb_trim failure in the same manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and retry via the deferred call to rds_send_worker() that gets set up on ENOMEM from rds_tcp_read_sock() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-28netfilter: nfnetlink: don't probe module if it existsFlorian Westphal
nfnetlink_bind request_module()s all the time as nfnetlink_get_subsys() shifts the argument by 8 to obtain the subsys id. So using type instead of type << 8 always returns NULL. Fixes: 03292745b02d11 ("netlink: add nlk->netlink_bind hook for module auto-loading") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>