summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-05-17xprtrdma: Refactor the FRWR recovery workerChuck Lever
Maintain the order of invalidation and DMA unmapping when doing a background MR reset. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Reset MRs in frwr_op_unmap_sync()Chuck Lever
frwr_op_unmap_sync() is now invoked in a workqueue context, the same as __frwr_queue_recovery(). There's no need to defer MR reset if posting LOCAL_INV MRs fails. This means that even when ib_post_send() fails (which should occur very rarely) the invalidation and DMA unmapping steps are still done in the correct order. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Save I/O direction in struct rpcrdma_frwrChuck Lever
Move the the I/O direction field from rpcrdma_mr_seg into the rpcrdma_frmr. This makes it possible to DMA-unmap the frwr long after an RPC has exited and its rpcrdma_mr_seg array has been released and re-used. This might occur if an RPC times out while waiting for a new connection to be established. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Rename rpcrdma_frwr::sg and sg_nentsChuck Lever
Clean up: Follow same naming convention as other fields in struct rpcrdma_frwr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Use core ib_drain_qp() APIChuck Lever
Clean up: Replace rpcrdma_flush_cqs() and rpcrdma_clean_cqs() with the new ib_drain_qp() API. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Leon Romanovsky <leonro@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Remove rpcrdma_create_chunks()Chuck Lever
rpcrdma_create_chunks() has been replaced, and can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Allow Read list and Reply chunk simultaneouslyChuck Lever
rpcrdma_marshal_req() makes a simplifying assumption: that NFS operations with large Call messages have small Reply messages, and vice versa. Therefore with RPC-over-RDMA, only one chunk type is ever needed for each Call/Reply pair, because one direction needs chunks, the other direction will always fit inline. In fact, this assumption is asserted in the code: if (rtype != rpcrdma_noch && wtype != rpcrdma_noch) { dprintk("RPC: %s: cannot marshal multiple chunk lists\n", __func__); return -EIO; } But RPCGSS_SEC breaks this assumption. Because krb5i and krb5p perform data transformation on RPC messages before they are transmitted, direct data placement techniques cannot be used, thus RPC messages must be sent via a Long call in both directions. All such calls are sent with a Position Zero Read chunk, and all such replies are handled with a Reply chunk. Thus the client must provide every Call/Reply pair with both a Read list and a Reply chunk. Without any special security in effect, NFSv4 WRITEs may now also use the Read list and provide a Reply chunk. The marshal_req logic was preventing that, meaning an NFSv4 WRITE with a large payload that included a GETATTR result larger than the inline threshold would fail. The code that encodes each chunk list is now completely contained in its own function. There is some code duplication, but the trade-off is that the overall logic should be more clear. Note that all three chunk lists now share the rl_segments array. Some additional per-req accounting is necessary to track this usage. For the same reasons that the above simplifying assumption has held true for so long, I don't expect more array elements are needed at this time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Update comments in rpcrdma_marshal_req()Chuck Lever
Update documenting comments to reflect code changes over the past year. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Avoid using Write list for small NFS READ requestsChuck Lever
Avoid the latency and interrupt overhead of registering a Write chunk when handling NFS READ requests of a few hundred bytes or less. This change does not interoperate with Linux NFS/RDMA servers that do not have commit 9d11b51ce7c1 ('svcrdma: Fix send_reply() scatter/gather set-up'). Commit 9d11b51ce7c1 was introduced in v4.3, and is included in 4.2.y, 4.1.y, and 3.18.y. Oracle bug 22925946 has been filed to request that the above fix be included in the Oracle Linux UEK4 NFS/RDMA server. Red Hat bugzillas 1327280 and 1327554 have been filed to request that RHEL NFS/RDMA server backports include the above fix. Workaround: Replace the "proto=rdma,port=20049" mount options with "proto=tcp" until commit 9d11b51ce7c1 is applied to your NFS server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Prevent inline overflowChuck Lever
When deciding whether to send a Call inline, rpcrdma_marshal_req doesn't take into account header bytes consumed by chunk lists. This results in Call messages on the wire that are sometimes larger than the inline threshold. Likewise, when a Write list or Reply chunk is in play, the server's reply has to emit an RDMA Send that includes a larger-than-minimal RPC-over-RDMA header. The actual size of a Call message cannot be estimated until after the chunk lists have been registered. Thus the size of each RPC-over-RDMA header can be estimated only after chunks are registered; but the decision to register chunks is based on the size of that header. Chicken, meet egg. The best a client can do is estimate header size based on the largest header that might occur, and then ensure that inline content is always smaller than that. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headersChuck Lever
Send buffer space is shared between the RPC-over-RDMA header and an RPC message. A large RPC-over-RDMA header means less space is available for the associated RPC message, which then has to be moved via an RDMA Read or Write. As more segments are added to the chunk lists, the header increases in size. Typical modern hardware needs only a few segments to convey the maximum payload size, but some devices and registration modes may need a lot of segments to convey data payload. Sometimes so many are needed that the remaining space in the Send buffer is not enough for the RPC message. Sending such a message usually fails. To ensure a transport can always make forward progress, cap the number of RDMA segments that are allowed in chunk lists. This prevents less-capable devices and memory registrations from consuming a large portion of the Send buffer by reducing the maximum data payload that can be conveyed with such devices. For now I choose an arbitrary maximum of 8 RDMA segments. This allows a maximum size RPC-over-RDMA header to fit nicely in the current 1024 byte inline threshold with over 700 bytes remaining for an inline RPC message. The current maximum data payload of NFS READ or WRITE requests is one megabyte. To convey that payload on a client with 4KB pages, each chunk segment would need to handle 32 or more data pages. This is well within the capabilities of FMR. For physical registration, the maximum payload size on platforms with 4KB pages is reduced to 32KB. For FRWR, a device's maximum page list depth would need to be at least 34 to support the maximum 1MB payload. A device with a smaller maximum page list depth means the maximum data payload is reduced when using that device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Bound the inline threshold valuesChuck Lever
Currently the sysctls that allow setting the inline threshold allow any value to be set. Small values only make the transport run slower. The default 1KB setting is as low as is reasonable. And the logic that decides how to divide a Send buffer between RPC-over-RDMA header and RPC message assumes (but does not check) that the lower bound is not crazy (say, 57 bytes). Send and receive buffers share a page with some control information. Values larger than about 3KB can't be supported, currently. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17sunrpc: Advertise maximum backchannel payload sizeChuck Lever
RPC-over-RDMA transports have a limit on how large a backward direction (backchannel) RPC message can be. Ensure that the NFSv4.x CREATE_SESSION operation advertises this limit to servers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17sunrpc: Update RPCBIND_MAXNETIDLENChuck Lever
Commit 176e21ee2ec8 ("SUNRPC: Support for RPC over AF_LOCAL transports") added a 5-character netid, but did not bump RPCBIND_MAXNETIDLEN from 4 to 5. Fixes: 176e21ee2ec8 ("SUNRPC: Support for RPC over AF_LOCAL ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17xprtrdma: Add rdma6 option to support NFS/RDMA IPv6Shirley Ma
RFC 5666: The "rdma" netid is to be used when IPv4 addressing is employed by the underlying transport, and "rdma6" for IPv6 addressing. Add mount -o proto=rdma6 option to support NFS/RDMA IPv6 addressing. Changes from v2: - Integrated comments from Chuck Level, Anna Schumaker, Trodt Myklebust - Add a little more to the patch description to describe NFS/RDMA IPv6 suggested by Chuck Level and Anna Schumaker - Removed duplicated rdma6 define - Remove Opt_xprt_rdma mountfamily since it doesn't support Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17nfs4: client: do not send empty SETATTR after OPEN_CREATETigran Mkrtchyan
OPEN_CREATE with EXCLUSIVE4_1 sends initial file permission. Ignoring fact, that server have indicated that file mod is set, client will send yet another SETATTR request, but, as mode is already set, new SETATTR will be empty. This is not a problem, nevertheless an extra roundtrip and slow open on high latency networks. This change is aims to skip extra setattr after open if there are no attributes to be set. Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17NFS: Add COPY nfs operationAnna Schumaker
This adds the copy_range file_ops function pointer used by the sys_copy_range() function call. This patch only implements sync copies, so if an async copy happens we decode the stateid and ignore it. Signed-off-by: Anna Schumaker <bjschuma@netapp.com>
2016-05-17NFS: Add nfs_commit_file()Anna Schumaker
Copy will use this to set up a commit request for a generic range. I don't want to allocate a new pagecache entry for the file, so I needed to change parts of the commit path to handle requests with a null wb_page. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17Fixing oops in callback pathOlga Kornievskaia
Commit 80f9642724af5 ("NFSv4.x: Enforce the ca_maxreponsesize_cached on the back channel") causes an oops when it receives a callback with cachethis=yes. [ 109.667378] BUG: unable to handle kernel NULL pointer dereference at 00000000000002c8 [ 109.669476] IP: [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4] [ 109.671216] PGD 0 [ 109.671736] Oops: 0000 [#1] SMP [ 109.705427] CPU: 1 PID: 3579 Comm: nfsv4.1-svc Not tainted 4.5.0-rc1+ #1 [ 109.706987] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014 [ 109.709468] task: ffff8800b4408000 ti: ffff88008448c000 task.ti: ffff88008448c000 [ 109.711207] RIP: 0010:[<ffffffffa08a3e68>] [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4] [ 109.713521] RSP: 0018:ffff88008448fca0 EFLAGS: 00010286 [ 109.714762] RAX: ffff880081ee202c RBX: ffff8800b7b5b600 RCX: 0000000000000001 [ 109.716427] RDX: 0000000000000008 RSI: 0000000000000008 RDI: 0000000000000000 [ 109.718091] RBP: ffff88008448fda8 R08: 0000000000000000 R09: 000000000b000000 [ 109.719757] R10: ffff880137786000 R11: ffff8800b7b5b600 R12: 0000000001000000 [ 109.721415] R13: 0000000000000002 R14: 0000000053270000 R15: 000000000000000b [ 109.723061] FS: 0000000000000000(0000) GS:ffff880139640000(0000) knlGS:0000000000000000 [ 109.724931] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 109.726278] CR2: 00000000000002c8 CR3: 0000000034d50000 CR4: 00000000001406e0 [ 109.727972] Stack: [ 109.728465] ffff880081ee202c ffff880081ee201c 000000008448fcc0 ffff8800baccb800 [ 109.730349] ffff8800baccc800 ffffffffa08d0380 0000000000000000 0000000000000000 [ 109.732211] ffff8800b7b5b600 0000000000000001 ffffffff81d073c0 ffff880081ee3090 [ 109.734056] Call Trace: [ 109.734657] [<ffffffffa03795d4>] svc_process_common+0x5c4/0x6c0 [sunrpc] [ 109.736267] [<ffffffffa0379a4c>] bc_svc_process+0x1fc/0x360 [sunrpc] [ 109.737775] [<ffffffffa08a2c2c>] nfs41_callback_svc+0x10c/0x1d0 [nfsv4] [ 109.739335] [<ffffffff810cb380>] ? prepare_to_wait_event+0xf0/0xf0 [ 109.740799] [<ffffffffa08a2b20>] ? nfs4_callback_svc+0x50/0x50 [nfsv4] [ 109.742349] [<ffffffff810a6998>] kthread+0xd8/0xf0 [ 109.743495] [<ffffffff810a68c0>] ? kthread_park+0x60/0x60 [ 109.744776] [<ffffffff816abc4f>] ret_from_fork+0x3f/0x70 [ 109.746037] [<ffffffff810a68c0>] ? kthread_park+0x60/0x60 [ 109.747324] Code: cc 45 31 f6 48 8b 85 00 ff ff ff 44 89 30 48 8b 85 f8 fe ff ff 44 89 20 48 8b 9d 38 ff ff ff 48 8b bd 30 ff ff ff 48 85 db 74 4c <4c> 8b af c8 02 00 00 4d 8d a5 08 02 00 00 49 81 c5 98 02 00 00 [ 109.754361] RIP [<ffffffffa08a3e68>] nfs4_callback_compound+0x4f8/0x690 [nfsv4] [ 109.756123] RSP <ffff88008448fca0> [ 109.756951] CR2: 00000000000002c8 [ 109.757738] ---[ end trace 2b8555511ab5dfb4 ]--- [ 109.758819] Kernel panic - not syncing: Fatal exception [ 109.760126] Kernel Offset: disabled [ 118.938934] ---[ end Kernel panic - not syncing: Fatal exception It doesn't unlock the table nor does it set the cps->clp pointer which is later needed by nfs4_cb_free_slot(). Fixes: 80f9642724af5 ("NFSv4.x: Enforce the ca_maxresponsesize_cached ...") CC: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2016-05-17Merge branches 'pci/arm64' and 'pci/host-hv' into nextBjorn Helgaas
* pci/arm64: PCI, of: Move PCI I/O space management to PCI core code PCI: generic, thunder: Use generic ECAM API PCI: Provide common functions for ECAM mapping * pci/host-hv: PCI: hv: Add explicit barriers to config space access
2016-05-17Merge branches 'pci/hotplug' and 'pci/resource' into nextBjorn Helgaas
* pci/hotplug: PCI: Use cached copy of PCI_EXP_SLTCAP_HPC bit * pci/resource: PCI: Disable all BAR sizing for devices with non-compliant BARs x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs PCI: Identify Enhanced Allocation (EA) BAR Equivalent resources in sysfs
2016-05-17remove directory incorrectly tries to set delete on close on non-empty ↵Steve French
directories Wrong return code was being returned on SMB3 rmdir of non-empty directory. For SMB3 (unlike for cifs), we attempt to delete a directory by set of delete on close flag on the open. Windows clients set this flag via a set info (SET_FILE_DISPOSITION to set this flag) which properly checks if the directory is empty. With this patch on smb3 mounts we correctly return "DIRECTORY NOT EMPTY" on attempts to remove a non-empty directory. Signed-off-by: Steve French <steve.french@primarydata.com> CC: Stable <stable@vger.kernel.org> Acked-by: Sachin Prabhu <sprabhu@redhat.com>
2016-05-17Update cifs.ko version to 2.09Steve French
Signed-off-by: Steven French <steve.french@primarydata.com>
2016-05-17fs/cifs: correctly to anonymous authentication for the NTLM(v2) authenticationStefan Metzmacher
Only server which map unknown users to guest will allow access using a non-null NTLMv2_Response. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17fs/cifs: correctly to anonymous authentication for the NTLM(v1) authenticationStefan Metzmacher
Only server which map unknown users to guest will allow access using a non-null NTChallengeResponse. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17fs/cifs: correctly to anonymous authentication for the LANMAN authenticationStefan Metzmacher
Only server which map unknown users to guest will allow access using a non-null LMChallengeResponse. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 Signed-off-by: Stefan Metzmacher <metze@samba.org> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17fs/cifs: correctly to anonymous authentication via NTLMSSPStefan Metzmacher
See [MS-NLMP] 3.2.5.1.2 Server Receives an AUTHENTICATE_MESSAGE from the Client: ... Set NullSession to FALSE If (AUTHENTICATE_MESSAGE.UserNameLen == 0 AND AUTHENTICATE_MESSAGE.NtChallengeResponse.Length == 0 AND (AUTHENTICATE_MESSAGE.LmChallengeResponse == Z(1) OR AUTHENTICATE_MESSAGE.LmChallengeResponse.Length == 0)) -- Special case: client requested anonymous authentication Set NullSession to TRUE ... Only server which map unknown users to guest will allow access using a non-null NTChallengeResponse. For Samba it's the "map to guest = bad user" option. BUG: https://bugzilla.samba.org/show_bug.cgi?id=11913 CC: Stable <stable@vger.kernel.org> Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17cifs: remove any preceding delimiter from prefix_pathSachin Prabhu
We currently do not check if any delimiter exists before the prefix path in cifs_compose_mount_options(). Consequently when building the devname using cifs_build_devname() we can end up with multiple delimiters separating the UNC and the prefix path. An issue was reported by the customer mounting a folder within a DFS share from a Netapp server which uses McAfee antivirus. We have narrowed down the cause to the use of double backslashes in the file name used to open the file. This was determined to be caused because of additional delimiters as a result of the bug. In addition to changes in cifs_build_devname(), we also fix cifs_parse_devname() to ignore any preceding delimiter for the prefix path. The problem was originally reported on RHEL 6 in RHEL bz 1252721. This is the upstream version of the fix. The fix was confirmed by looking at the packet capture of a DFS mount. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17cifs: Use file_dentry()Goldwyn Rodrigues
CIFS may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object. Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-05-17Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"David S. Miller
This reverts commit 7f32541c2fdaa84af418c3e1431bbd066ab44d09. This needs reverting too, as per requests. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17Revert "phy dp83867: Make rgmii parameters optional"David S. Miller
This reverts commit 81003bc924bac0a99bfdc2869f5dff5a87aa4a3d. Developers have asked me to revert this for now. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17r8169: default to 64-bit DMA on recent PCIe chipsArd Biesheuvel
The current logic around the 'use_dac' module parameter prevents the r81969 driver from being loadable on 64-bit systems without any RAM below 4 GB when the parameter is left at its default value. So introduce a new default value -1 which indicates that 64-bit DMA should be enabled on sufficiently recent PCIe chips, i.e., versions RTL_GIGA_MAC_VER_18 or later. Explicit param values of 0 or 1 retain the existing behavior of unconditionally enabling/disabling 64-bit DMA on 64-bit architectures (i.e., regardless of the type and version of the chip) Since PCIe chips do not need to CPlusCmd Dual Address Cycle to be set, make that conditional on the device type as well. Cc: Realtek linux nic maintainers <nic_swsd@realtek.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17phy dp83867: Make rgmii parameters optionalAlexander Graf
If you compile without OF_MDIO support in an RGMII configuration, we fail to configure the dp83867 phy today by writing garbage into its configuration registers. On the other hand if you do compile with OF_MDIO and the phy gets loaded via device tree, you have to have the properties set in the device tree, otherwise we fail to load the driver and don't even attach the generic phy driver to the interface anymore. To make things slightly more consistent, make the rgmii configuration properties optional and allow a user to omit them in their device tree. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17phy dp83867: Fix compilation with CONFIG_OF_MDIO=mAlexander Graf
When CONFIG_OF_MDIO is configured as module, the #define for it really is CONFIG_OF_MDIO_MODULE, not CONFIG_OF_MDIO. So if we are compiling it as module, the dp83867 doesn't see that OF_MDIO was selected and doesn't read the dt rgmii parameters. The fix is simple: Use IS_ENABLED(). It checks for both - module as well as compiled in code. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17Merge tag 'net-next-qcom-soc-4.7-2-merge' of git://github.com/andersson/kernelDavid S. Miller
Merge tag 'qcom-soc-for-4.7-2' into net-next This merges the Qualcomm SOC tree with the net-next, solving the merge conflict in the SMD API between the two.
2016-05-17bpf: arm64: remove callee-save registers use for tmp registersYang Shi
In the current implementation of ARM64 eBPF JIT, R23 and R24 are used for tmp registers, which are callee-saved registers. This leads to variable size of JIT prologue and epilogue. The latest blinding constant change prefers to constant size of prologue and epilogue. AAPCS reserves R9 ~ R15 for temp registers which not need to be saved/restored during function call. So, replace R23 and R24 to R10 and R11, and remove tmp_used flag to save 2 instructions for some jited BPF program. CC: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17asix: Fix offset calculation in asix_rx_fixup() causing slow transmissionsJohn Stultz
In testing with HiKey, we found that since commit 3f30b158eba5 ("asix: On RX avoid creating bad Ethernet frames"), we're seeing lots of noise during network transfers: [ 239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x54ebb5ec, offset 4 [ 239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xcdffe7a2, offset 4 [ 239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x1d36f59d, offset 4 [ 239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xaef3c1e9, offset 4 [ 239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988 [ 239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x2881912, offset 4 [ 239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x5638f7e2, offset 4 And network throughput ends up being pretty bursty and slow with a overall throughput of at best ~30kB/s (where as previously we got 1.1MB/s with the slower USB1.1 "full speed" host). We found the issue also was reproducible on a x86_64 system, using a "high-speed" USB2.0 port but the throughput did not measurably drop (possibly due to the scp transfer being cpu bound on my slow test hardware). After lots of debugging, I found the check added in the problematic commit seems to be calculating the offset incorrectly. In the normal case, in the main loop of the function, we do: (where offset is zero, or set to "offset += (copy_length + 1) & 0xfffe" in the previous loop) rx->header = get_unaligned_le32(skb->data + offset); offset += sizeof(u32); But the problematic patch calculates: offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32); rx->header = get_unaligned_le32(skb->data + offset); Adding some debug logic to check those offset calculation used to find rx->header, the one in problematic code is always too large by sizeof(u32). Thus, this patch removes the incorrect " + sizeof(u32)" addition in the problematic calculation, and resolves the issue. Cc: Dean Jenkins <Dean_Jenkins@mentor.com> Cc: "David B. Robins" <linux@davidrobins.net> Cc: Mark Craske <Mark_Craske@mentor.com> Cc: Emil Goode <emilgoode@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: YongQin Liu <yongqin.liu@linaro.org> Cc: Guodong Xu <guodong.xu@linaro.org> Cc: Ivan Vecera <ivecera@redhat.com> Cc: linux-usb@vger.kernel.org Cc: netdev@vger.kernel.org Cc: stable <stable@vger.kernel.org> #4.4+ Reported-by: Yongqin Liu <yongqin.liu@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull parallel filesystem directory handling update from Al Viro. This is the main parallel directory work by Al that makes the vfs layer able to do lookup and readdir in parallel within a single directory. That's a big change, since this used to be all protected by the directory inode mutex. The inode mutex is replaced by an rwsem, and serialization of lookups of a single name is done by a "in-progress" dentry marker. The series begins with xattr cleanups, and then ends with switching filesystems over to actually doing the readdir in parallel (switching to the "iterate_shared()" that only takes the read lock). A more detailed explanation of the process from Al Viro: "The xattr work starts with some acl fixes, then switches ->getxattr to passing inode and dentry separately. This is the point where the things start to get tricky - that got merged into the very beginning of the -rc3-based #work.lookups, to allow untangling the security_d_instantiate() mess. The xattr work itself proceeds to switch a lot of filesystems to generic_...xattr(); no complications there. After that initial xattr work, the series then does the following: - untangle security_d_instantiate() - convert a bunch of open-coded lookup_one_len_unlocked() to calls of that thing; one such place (in overlayfs) actually yields a trivial conflict with overlayfs fixes later in the cycle - overlayfs ended up switching to a variant of lookup_one_len_unlocked() sans the permission checks. I would've dropped that commit (it gets overridden on merge from #ovl-fixes in #for-next; proper resolution is to use the variant in mainline fs/overlayfs/super.c), but I didn't want to rebase the damn thing - it was fairly late in the cycle... - some filesystems had managed to depend on lookup/lookup exclusion for *fs-internal* data structures in a way that would break if we relaxed the VFS exclusion. Fixing hadn't been hard, fortunately. - core of that series - parallel lookup machinery, replacing ->i_mutex with rwsem, making lookup_slow() take it only shared. At that point lookups happen in parallel; lookups on the same name wait for the in-progress one to be done with that dentry. Surprisingly little code, at that - almost all of it is in fs/dcache.c, with fs/namei.c changes limited to lookup_slow() - making it use the new primitive and actually switching to locking shared. - parallel readdir stuff - first of all, we provide the exclusion on per-struct file basis, same as we do for read() vs lseek() for regular files. That takes care of most of the needed exclusion in readdir/readdir; however, these guys are trickier than lookups, so I went for switching them one-by-one. To do that, a new method '->iterate_shared()' is added and filesystems are switched to it as they are either confirmed to be OK with shared lock on directory or fixed to be OK with that. I hope to kill the original method come next cycle (almost all in-tree filesystems are switched already), but it's still not quite finished. - several filesystems get switched to parallel readdir. The interesting part here is dealing with dcache preseeding by readdir; that needs minor adjustment to be safe with directory locked only shared. Most of the filesystems doing that got switched to in those commits. Important exception: NFS. Turns out that NFS folks, with their, er, insistence on VFS getting the fuck out of the way of the Smart Filesystem Code That Knows How And What To Lock(tm) have grown the locking of their own. They had their own homegrown rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink is the reader there). Of course, with VFS getting the fuck out of the way, as requested, the actual smarts of the smart filesystem code etc. had become exposed... - do_last/lookup_open/atomic_open cleanups. As the result, open() without O_CREAT locks the directory only shared. Including the ->atomic_open() case. Backmerge from #for-linus in the middle of that - atomic_open() fix got brought in. - then comes NFS switch to saner (VFS-based ;-) locking, killing the homegrown "lookup and readdir are writers" kinda-sorta rwsem. All exclusion for sillyunlink/lookup is done by the parallel lookups mechanism. Exclusion between sillyunlink and rmdir is a real rwsem now - rmdir being the writer. Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel now. - the rest of the series consists of switching a lot of filesystems to parallel readdir; in a lot of cases ->llseek() gets simplified as well. One backmerge in there (again, #for-linus - rockridge fix)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits) ext4: switch to ->iterate_shared() hfs: switch to ->iterate_shared() hfsplus: switch to ->iterate_shared() hostfs: switch to ->iterate_shared() hpfs: switch to ->iterate_shared() hpfs: handle allocation failures in hpfs_add_pos() gfs2: switch to ->iterate_shared() f2fs: switch to ->iterate_shared() afs: switch to ->iterate_shared() befs: switch to ->iterate_shared() befs: constify stuff a bit isofs: switch to ->iterate_shared() get_acorn_filename(): deobfuscate a bit btrfs: switch to ->iterate_shared() logfs: no need to lock directory in lseek switch ecryptfs to ->iterate_shared 9p: switch to ->iterate_shared() fat: switch to ->iterate_shared() romfs, squashfs: switch to ->iterate_shared() more trivial ->iterate_shared conversions ...
2016-05-17switchdev: pass pointer to fib_info instead of copyJiri Pirko
The problem is that fib_info->nh is [0] so the struct fib_info allocation size depends on number of nexthops. If we just copy fib_info, we do not copy the nexthops info and driver accesses memory which is not ours. Given the fact that fib4 does not defer operations and therefore it does not need copy, just pass the pointer down to drivers as it was done before. Fixes: 850d0cbc91 ("switchdev: remove pointers from switchdev objects") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17Merge branch 'irq-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "This update delivers: - Yet another interrupt chip diver (LPC32xx) - Core functions to handle partitioned per-cpu interrupts - Enhancements to the IPI core - Proper handling of irq type configuration - A large set of ARM GIC enhancements - The usual pile of small fixes, cleanups and enhancements" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) irqchip/bcm2836: Use a more generic memory barrier call irqchip/bcm2836: Fix compiler warning on 64-bit build irqchip/bcm2836: Drop smp_set_ops on arm64 builds irqchip/gic: Add helper functions for GIC setup and teardown irqchip/gic: Store GIC configuration parameters irqchip/gic: Pass GIC pointer to save/restore functions irqchip/gic: Return an error if GIC initialisation fails irqchip/gic: Remove static irq_chip definition for eoimode1 irqchip/gic: Don't initialise chip if mapping IO space fails irqchip/gic: WARN if setting the interrupt type for a PPI fails irqchip/gic: Don't unnecessarily write the IRQ configuration irqchip: Mask the non-type/sense bits when translating an IRQ genirq: Ensure IRQ descriptor is valid when setting-up the IRQ irqchip/gic-v3: Configure all interrupts as non-secure Group-1 irqchip/gic-v2m: Add workaround for Broadcom NS2 GICv2m erratum irqchip/irq-alpine-msi: Don't use <asm-generic/msi.h> irqchip/mbigen: Checking for IS_ERR() instead of NULL irqchip/gic-v3: Remove inexistant register definition irqchip/gicv3-its: Don't allow devices whose ID is outside range irqchip: Add LPC32xx interrupt controller driver ...
2016-05-17regulator: Silence build warnings from regulator_can_change_voltage()Mark Brown
Cut down on noise for mainstream users of the API and people doing build testing by dropping the deprecated flag from regulator_can_change_voltage() as it triggers even on the EXPORT_SYMBOL_GPL() which affects all builds rather than just the remaining drivers with calls to it (for which fixes are currently pending). The function remains deprecated and is expected to be removed entirely in v4.8. Signed-off-by: Mark Brown <broonie@kernel.org>
2016-05-17Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather small set of patches from the timer departement: - Some more y2038 work - Yet another new clocksource driver - The usual set of small fixes, cleanups and enhancements" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/tegra: Remove unused suspend/resume code clockevents/driversi/mps2: add MPS2 Timer driver dt-bindings: document the MPS2 timer bindings clocksource/drivers/mtk_timer: Add __init attribute clockevents/drivers/dw_apb_timer: Implement ->set_state_oneshot_stopped() time: Introduce do_sys_settimeofday64() security: Introduce security_settime64() clocksource: Add missing include of of.h.
2016-05-17Merge tag 'trace-fixes-v4.6-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing ring-buffer fixes from Steven Rostedt: "Hao Qin reported an integer overflow possibility with signed and unsigned numbers in the ring-buffer code. https://bugzilla.kernel.org/show_bug.cgi?id=118001 At first I did not think this was too much of an issue, because the overflow would be caught later when either too much data was allocated or it would trigger RB_WARN_ON() which shuts down the ring buffer. But looking closer into it, I found that the right settings could bypass the checks and crash the kernel. Luckily, this is only accessible by root. The first fix is to convert all the variables into long, such that we don't get into issues between 32 bit variables being assigned 64 bit ones. This fixes the RB_WARN_ON() triggering. The next fix is to get rid of a duplicate DIV_ROUND_UP() that when called twice with the right value, can cause a kernel crash. The first DIV_ROUND_UP() is to normalize the input and it is checked against the minimum allowable value. But then DIV_ROUND_UP() is called again, which can overflow due to the (a + b - 1)/b, logic. The first called upped the value, the second can overflow (with the +b part). The second call to DIV_ROUND_UP() came in via a second change a while ago and the code is cleaned up to remove it" * tag 'trace-fixes-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ring-buffer: Prevent overflow of size in ring_buffer_resize() ring-buffer: Use long for nr_pages to avoid overflow failures
2016-05-17net_sched: close another race condition in tcf_mirred_release()WANG Cong
We saw the following extra refcount release on veth device: kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1 Since we heavily use mirred action to redirect packets to veth, I think this is caused by the following race condition: CPU0: tcf_mirred_release(): (in RCU callback) struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1); CPU1: mirred_device_event(): spin_lock_bh(&mirred_list_lock); list_for_each_entry(m, &mirred_list, tcfm_list) { if (rcu_access_pointer(m->tcfm_dev) == dev) { dev_put(dev); /* Note : no rcu grace period necessary, as * net_device are already rcu protected. */ RCU_INIT_POINTER(m->tcfm_dev, NULL); } } spin_unlock_bh(&mirred_list_lock); CPU0: tcf_mirred_release(): spin_lock_bh(&mirred_list_lock); list_del(&m->tcfm_list); spin_unlock_bh(&mirred_list_lock); if (dev) // <======== Stil refers to the old m->tcfm_dev dev_put(dev); // <======== dev_put() is called on it again The action init code path is good because it is impossible to modify an action that is being removed. So, fix this by moving everything under the spinlock. Fixes: 2ee22a90c7af ("net_sched: act_mirred: remove spinlock in fast path") Fixes: 6bd00b850635 ("act_mirred: fix a race condition on mirred_list") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17tipc: fix nametable publication field in nl compatRichard Alpe
The publication field of the old netlink API should contain the publication key and not the publication reference. Fixes: 44a8ae94fd55 (tipc: convert legacy nl name table dump to nl compat) Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Crypto self tests can now be disabled at boot/run time. - Add async support to algif_aead. Algorithms: - A large number of fixes to MPI from Nicolai Stange. - Performance improvement for HMAC DRBG. Drivers: - Use generic crypto engine in omap-des. - Merge ppc4xx-rng and crypto4xx drivers. - Fix lockups in sun4i-ss driver by disabling IRQs. - Add DMA engine support to ccp. - Reenable talitos hash algorithms. - Add support for Hisilicon SoC RNG. - Add basic crypto driver for the MXC SCC. Others: - Do not allocate crypto hash tfm in NORECLAIM context in ecryptfs" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits) crypto: qat - change the adf_ctl_stop_devices to void crypto: caam - fix caam_jr_alloc() ret code crypto: vmx - comply with ABIs that specify vrsave as reserved. crypto: testmgr - Add a flag allowing the self-tests to be disabled at runtime. crypto: ccp - constify ccp_actions structure crypto: marvell/cesa - Use dma_pool_zalloc crypto: qat - make adf_vf_isr.c dependant on IOV config crypto: qat - Fix typo in comments lib: asn1_decoder - add MODULE_LICENSE("GPL") crypto: omap-sham - Use dma_request_chan() for requesting DMA channel crypto: omap-des - Use dma_request_chan() for requesting DMA channel crypto: omap-aes - Use dma_request_chan() for requesting DMA channel crypto: omap-des - Integrate with the crypto engine framework crypto: s5p-sss - fix incorrect usage of scatterlists api crypto: s5p-sss - Fix missed interrupts when working with 8 kB blocks crypto: s5p-sss - Use common BIT macro crypto: mxc-scc - fix unwinding in mxc_scc_crypto_register() crypto: mxc-scc - signedness bugs in mxc_scc_ablkcipher_req_init() crypto: talitos - fix ahash algorithms registration crypto: ccp - Ensure all dependencies are specified ...
2016-05-17drivers: net: Don't print unpopulated net_device nameHarvey Hunt
For ethernet devices, net_device.name will be eth%d before register_netdev() is called. Don't print the net_device name until the format string is replaced. Signed-off-by: Harvey Hunt <harvey.hunt@imgtec.com> Cc: Marcel Ziswiler <marcel@ziswiler.com> Cc: Robert Jarzmik <robert.jarzmik@free.fr> Cc: Barry Song <Baohua.Song@csr.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17qed: add support for dcbx.Sudarsana Reddy Kalluru
This patch adds the necessary driver support for Management Firmware to configure the device/firmware with the dcbx results. Management Firmware is responsible for communicating the DCBX and driving the negotiation, but the driver has responsibility of receiving async notification and configuring the results in hw/fw. This patch also adds the dcbx support for future protocols (e.g., FCoE) as preparation to their imminent submission. Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17ravb: Add missing free_irq() calls to ravb_close()Geert Uytterhoeven
When reopening the network device on ra7795/salvator-x, e.g. after a DHCP timeout: IP-Config: Reopening network devices... genirq: Flags mismatch irq 139. 00000000 (eth0:ch0:rx_be) vs. 00000000 (ravb e6800000.ethernet eth0: cannot request IRQ eth0:ch0:rx_be IP-Config: Failed to open eth0 IP-Config: No network devices available The "mismatch" is due to requesting an IRQ that is already in use, while IRQF_PROBE_SHARED wasn't set. However, the real cause is that ravb_close() doesn't release any of the R-Car Gen3-specific secondary IRQs. Add the missing free_irq() calls to fix this. Fixes: f51bdc236b6c5835 ("ravb: Add dma queue interrupt support") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-17qed: Remove a stray tabDan Carpenter
This line was indented more than it should be. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>