summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-07-31macvlan: Initialize vlan_features to turn on offload support.Vlad Yasevich
Macvlan devices do not initialize vlan_features. As a result, any vlan devices configured on top of macvlans perform very poorly. Initialize vlan_features based on the vlan features of the lower-level device. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31AX88179_178A: Add ethtool ops for EEE supportFreddy Xin
Add functions to support ethtool EEE manipulating, and the EEE is disabled in default setting to enhance the compatibility with certain switch. Signed-off-by: Freddy Xin <freddy@asix.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31netlink: Use PAGE_ALIGNED macroTobias Klauser
Use PAGE_ALIGNED(...) instead of IS_ALIGNED(..., PAGE_SIZE). Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORSDuan Jiong
When dealing with ICMPv[46] Error Message, function icmp_socket_deliver() and icmpv6_notify() do some valid checks on packet's length, but then some protocols check packet's length redaudantly. So remove those duplicated statements, and increase counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in function icmp_socket_deliver() and icmpv6_notify() respectively. In addition, add missed counter in udp6/udplite6 when socket is NULL. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31sctp: Fixup v4mapped behaviour to comply with Sock APIJason Gunthorpe
The SCTP socket extensions API document describes the v4mapping option as follows: 8.1.15. Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR) This socket option is a Boolean flag which turns on or off the mapping of IPv4 addresses. If this option is turned on, then IPv4 addresses will be mapped to V6 representation. If this option is turned off, then no mapping will be done of V4 addresses and a user will receive both PF_INET6 and PF_INET type addresses on the socket. See [RFC3542] for more details on mapped V6 addresses. This description isn't really in line with what the code does though. Introduce addr_to_user (renamed addr_v4map), which should be called before any sockaddr is passed back to user space. The new function places the sockaddr into the correct format depending on the SCTP_I_WANT_MAPPED_V4_ADDR option. Audit all places that touched v4mapped and either sanely construct a v4 or v6 address then call addr_to_user, or drop the unnecessary v4mapped check entirely. Audit all places that call addr_to_user and verify they are on a sycall return path. Add a custom getname that formats the address properly. Several bugs are addressed: - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for addresses to user space - The addr_len returned from recvmsg was not correct when returning AF_INET on a v6 socket - flowlabel and scope_id were not zerod when promoting a v4 to v6 - Some syscalls like bind and connect behaved differently depending on v4mapped Tested bind, getpeername, getsockname, connect, and recvmsg for proper behaviour in v4mapped = 1 and 0 cases. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31net: kernel-doc compliant documentation for net_deviceKaroly Kemeny
Net_device is a vast and important structure, but it has no kernel-doc compliant documentation. This patch extracts the comments from the structure to clean it up, and let the scripts extract documentation from it. I know that the patch is big, but it's just reordering of comments into the appropriate form, and adding a few more, for the missing members. Signed-off-by: Karoly Kemeny <karoly.kemeny@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31cifs: Separate rawntlmssp auth from CIFS_SessSetup()Sachin Prabhu
Separate rawntlmssp authentication from CIFS_SessSetup(). Also cleanup CIFS_SessSetup() since we no longer do any auth within it. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
2014-07-31cifs: Split Kerberos authentication off CIFS_SessSetup()Sachin Prabhu
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
2014-07-31cifs: Split ntlm and ntlmv2 authentication methods off CIFS_SessSetup()Sachin Prabhu
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
2014-07-31cifs: Split lanman auth from CIFS_SessSetup()Sachin Prabhu
In preparation for splitting CIFS_SessSetup() into smaller more manageable chunks, we first add helper functions. We then proceed to split out lanman auth out of CIFS_SessSetup() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
2014-07-31cifs: replace code with free_rsp_buf()Sachin Prabhu
The functionality provided by free_rsp_buf() is duplicated in a number of places. Replace these instances with a call to free_rsp_buf(). Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com>
2014-07-31bus: arm-ccn: Fix error handling at event allocationPawel Moll
The bitfield allocation function returns error condition as a negative value, but in two cases its result was assigned to an unsigned member of the hw_perf_event structure, thus the error would not be ever detected. Fixed by using an intermediate, signed variable. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Olof Johansson <olof@lixom.net>
2014-07-31Merge tag 'samsung-dt-2' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into next/dt Merge "Samsung DT 2nd updates for v3.17" from Kukjin Kim: This is based on tags/exynos-power because this DT changes are depending PMU cleanup. Fixes boot for exynos5260 and exynos5410, - Since exynos cannot boot without obtaining PMU address via DT from now on, add PMU node for exynos5260 and exynos5410 For preparing exynos5250-spring, - move max77686 and cypress,cyapa trackpad from exynos5250- cros-common to exynos5250-snow DT file (Note exynos5250-spring is not included in this branch yet) For exynos3250, - add TMU node and remove duplicated interrupt-parent - add missing pinctrl property for uart0 and uart1 For exynos5250-smdk5250 board - add max77686 pmic interrupt property which is connected to gpx3 * tag 'samsung-dt-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: (28 commits) ARM: dts: Add missing pinctrl for uart0/1 for exynos3250 ARM: dts: Remove duplicate 'interrput-parent' property for exynos3250 ARM: dts: Add TMU dt node to monitor the temperature for exynos3250 ARM: dts: Specify MAX77686 pmic interrupt for exynos5250-smdk5250 ARM: dts: cypress,cyapa trackpad is exynos5250-Snow only ARM: dts: max77686 is exynos5250-snow only ARM: EXYNOS: Add exynos5260 PMU compatible string to DT match table ARM: dts: Add PMU DT node for exynos5260 SoC ARM: EXYNOS: Add support for Exynos5410 PMU ARM: dts: Add PMU to exynos5410 ARM: dts: Document exynos5410 PMU ARM: EXYNOS: Move cpufreq and cpuidle device registration to init_machine ARM: EXYNOS: Refactored code for using PMU address via DT ARM: EXYNOS: Support cluster power off on exynos5420/5800 ARM: EXYNOS: populate suspend and powered_up callbacks for mcpm ARM: EXYNOS: do not allow cpuidle registration for exynos5420 cpuidle: big.LITTLE: init driver for exynos5420 cpuidle: big.LITTLE: Add ARCH_EXYNOS entry in config ARM: EXYNOS: add generic function to calculate cpu number cpuidle: big.LITTLE: add of_device_id structure ... Signed-off-by: Olof Johansson <olof@lixom.net>
2014-08-01ACPI / LPSS: add LPSS device for Wildcat Point PCHJie Yang
INT3438 is the ADSP device on Wildcat Point platform with 2 DW DMA engines built In. The DMA engines are used for DSP FW loading and audio data transferring. These DMA engine probing need the clock, without it, probing may failed and can't go forward. Add LPSS device "INT3438" for Wildcat Point PCH, to provide clock for its ADSP DMA engine probing. Signed-off-by: Jie Yang <yang.jie@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-31mnt: Add tests for unprivileged remount cases that have found to be faultyEric W. Biederman
Kenton Varda <kenton@sandstorm.io> discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to the remount a read-only mount read-write. Upon review of the code in remount it was discovered that the code allowed nosuid, noexec, and nodev to be cleared. It was also discovered that the code was allowing the per mount atime flags to be changed. The first naive patch to fix these issues contained the flaw that using default atime settings when remounting a filesystem could be disallowed. To avoid this problems in the future add tests to ensure unprivileged remounts are succeeding and failing at the appropriate times. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-07-31mnt: Change the default remount atime from relatime to the existing valueEric W. Biederman
Since March 2009 the kernel has treated the state that if no MS_..ATIME flags are passed then the kernel defaults to relatime. Defaulting to relatime instead of the existing atime state during a remount is silly, and causes problems in practice for people who don't specify any MS_...ATIME flags and to get the default filesystem atime setting. Those users may encounter a permission error because the default atime setting does not work. A default that does not work and causes permission problems is ridiculous, so preserve the existing value to have a default atime setting that is always guaranteed to work. Using the default atime setting in this way is particularly interesting for applications built to run in restricted userspace environments without /proc mounted, as the existing atime mount options of a filesystem can not be read from /proc/mounts. In practice this fixes user space that uses the default atime setting on remount that are broken by the permission checks keeping less privileged users from changing more privileged users atime settings. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-07-31mnt: Correct permission checks in do_remountEric W. Biederman
While invesgiating the issue where in "mount --bind -oremount,ro ..." would result in later "mount --bind -oremount,rw" succeeding even if the mount started off locked I realized that there are several additional mount flags that should be locked and are not. In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime flags in addition to MNT_READONLY should all be locked. These flags are all per superblock, can all be changed with MS_BIND, and should not be changable if set by a more privileged user. The following additions to the current logic are added in this patch. - nosuid may not be clearable by a less privileged user. - nodev may not be clearable by a less privielged user. - noexec may not be clearable by a less privileged user. - atime flags may not be changeable by a less privileged user. The logic with atime is that always setting atime on access is a global policy and backup software and auditing software could break if atime bits are not updated (when they are configured to be updated), and serious performance degradation could result (DOS attack) if atime updates happen when they have been explicitly disabled. Therefore an unprivileged user should not be able to mess with the atime bits set by a more privileged user. The additional restrictions are implemented with the addition of MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME mnt flags. Taken together these changes and the fixes for MNT_LOCK_READONLY should make it safe for an unprivileged user to create a user namespace and to call "mount --bind -o remount,... ..." without the danger of mount flags being changed maliciously. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-07-31mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remountEric W. Biederman
There are no races as locked mount flags are guaranteed to never change. Moving the test into do_remount makes it more visible, and ensures all filesystem remounts pass the MNT_LOCK_READONLY permission check. This second case is not an issue today as filesystem remounts are guarded by capable(CAP_DAC_ADMIN) and thus will always fail in less privileged mount namespaces, but it could become an issue in the future. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-07-31mnt: Only change user settable mount flags in remountEric W. Biederman
Kenton Varda <kenton@sandstorm.io> discovered that by remounting a read-only bind mount read-only in a user namespace the MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user to the remount a read-only mount read-write. Correct this by replacing the mask of mount flags to preserve with a mask of mount flags that may be changed, and preserve all others. This ensures that any future bugs with this mask and remount will fail in an easy to detect way where new mount flags simply won't change. Cc: stable@vger.kernel.org Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-07-31Merge tag 'pm+acpi-3.16-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "One commit that fixes a problem causing PNP devices to be associated with wrong ACPI device objects sometimes during device enumeration due to an incorrect check in a matching function. That problem was uncovered by the ACPI device enumeration rework in 3.14" * tag 'pm+acpi-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PNP: Fix acpi_pnp_match()
2014-07-31Merge branch 'stmmac-next'David S. Miller
Vince Bridgers says: ==================== net: stmmac: Improve mcast/ucast filter for snps This patch series adds Synopsys specific bindings for the Synopsys EMAC filter characteristics since those are implementation dependent. The multicast and unicast filtering code was improved to handle different configuration variations based on device tree settings. I verified the operation of the multicast and unicast filters through Synopsys support as requested during the V1 review, and tested the GMAC configuration on an Altera Cyclone 5 SOC (which supports 256 multicast bins and 128 Unicast addresses). The 10/100 variant of this driver modification was not tested, although it was compile tested. I shared the email thread results of the investigation through Synopsys with the stmmac maintainer. V4: Remove patch from series that addressed a sparse issue from a down rev'd version of sparse that does not show up in the latest version of sparse. V3: Break up the patch into interface and functional change patches per review comments V2: Confirm with Synopsys methods to determine number of Multicast bins and Unicast address filter entries per first round review comments. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31net: stmmac: Support devicetree configs for mcast and ucast filter entriesVince Bridgers
This patch adds and modifies code to support multiple Multicast and Unicast Synopsys MAC filter configurations. The default configuration is defined to support legacy driver behavior, which is 64 Multicast bins. The Unicast filter code previously assumed all controllers support 32 or 16 Unicast addresses based on controller version number, but this has been corrected to support a default of 1 Unicast address. The filter configuration may be specified through the devicetree using a Synopsys specific device tree entry. This information was verified with Synopsys through Synopsys Support Case #8000684337 and shared with the maintainer. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31ARM: socfpga: Add socfpga Ethernet filter attributes entriesVince Bridgers
This patch adds socfpga Ethernet filter attributes for multicast and unicast filters per Synopsys Ethernet IP configuration chosen by Altera for the Cyclone 5 and Arria SOC FPGAs. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31dts: Add bindings for multicast hash bins and perfect filter entriesVince Bridgers
This change adds bindings for the number of multicast hash bins and perfect filter entries supported by the Synopsys EMAC. The Synopsys EMAC core is configurable at device creation time, and can be configured for a different number of multicast hash bins and a different number of perfect filter entries. The device does not provide a way to query these parameters, therefore parameters are required. The Altera Cyclone V SOC has support for 256 multicast hash bins and 128 perfect filter entries, and is different than what's currently provided in the stmmac driver. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31net: stmmac: Correct set_filter for multicast and unicast casesVince Bridgers
This patch removes the check for the number of mulitcast addresses when using hash based filtering since it's not necessary. If the number of multicast addresses in the list exceeds the number of multicast hash bins, the bins will "fold" over into one of the bins configured and enabled for the particular component instance. The default number of maximum unicast addresses was changed from 32 to 1 since this number is not dependent on the component revision. The maximum number of multicast and unicast addresses is dependent on the configuration of the Synopsys EMAC configured by the SOC architect at the time the features were selected and configured for a particular component. Sadly, Synopsys does not provide a way to query the precise number supported by a particular component, so we must fall back on a devicetree entry. This configuration could vary from vendor to vendor (such as STMicro, Altera, etc). The multicast bins are set for every possible filtering case (including no entries) - previously the bits were set only if multicast filter entries were present. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31net: stmmac: Change MAC interface to support multiple filter configurationsVince Bridgers
The synopsys EMAC can be configured for different numbers of multicast hash bins and perfect filter entries at device creation time and there's no way to query this configuration information at runtime. As a result, a devicetree parameter is required in order for the driver to program these filters correctly for a particular device instance. This patch modifies the 10/100/1000 MAC software interface such that these configuration parameters can be set at initialization time. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains netfilter updates for net-next, they are: 1) Add the reject expression for the nf_tables bridge family, this allows us to send explicit reject (TCP RST / ICMP dest unrech) to the packets matching a rule. 2) Simplify and consolidate the nf_tables set dumping logic. This uses netlink control->data to filter out depending on the request. 3) Perform garbage collection in xt_hashlimit using a workqueue instead of a timer, which is problematic when many entries are in place in the tables, from Eric Dumazet. 4) Remove leftover code from the removed ulog target support, from Paul Bolle. 5) Dump unmodified flags in the netfilter packet accounting when resetting counters, so userspace knows that a counter was in overquota situation, from Alexey Perevalov. 6) Fix wrong usage of the bitwise functions in nfnetlink_acct, also from Alexey. 7) Fix a crash when adding new set element with an empty NFTA_SET_ELEM_LIST attribute. This patchset also includes a couple of cleanups for xt_LED from Duan Jiong and for nf_conntrack_ipv4 (using coccinelle) from Himangi Saraogi. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31ACPI / PNP: Replace faulty is_hex_digit() by isxdigit()Arjun Sreedharan
0 is ascii for NULL. Hex digit matching should be from '0'. Faulty version returns true for #,$,%,& etc. Signed-off-by: Arjun Sreedharan <arjun024@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-07-31tcp: don't require root to read tcp_metricsBanerjee, Debabrata
commit d23ff7016 (tcp: add generic netlink support for tcp_metrics) introduced netlink support for the new tcp_metrics, however it restricted getting of tcp_metrics to root user only. This is a change from how these values could have been fetched when in the old route cache. Unless there's a legitimate reason to restrict the reading of these values it would be better if normal users could fetch them. Cc: Julian Anastasov <ja@ssi.bg> Cc: linux-kernel@vger.kernel.org Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-31xprtrdma: Handle additional connection eventsChuck Lever
Commit 38ca83a5 added RDMA_CM_EVENT_TIMEWAIT_EXIT. But that status is relevant only for consumers that re-use their QPs on new connections. xprtrdma creates a fresh QP on reconnection, so that event should be explicitly ignored. Squelch the alarming "unexpected CM event" message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macroChuck Lever
Clean up. RPCRDMA_PERSISTENT_REGISTRATION was a compile-time switch between RPCRDMA_REGISTER mode and RPCRDMA_ALLPHYSICAL mode. Since RPCRDMA_REGISTER has been removed, there's no need for the extra conditional compilation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Make rpcrdma_ep_disconnect() return voidChuck Lever
Clean up: The return code is used only for dprintk's that are already redundant. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Schedule reply tasklet once per upcallChuck Lever
Minor optimization: grab rpcrdma_tk_lock_g and disable hard IRQs just once after clearing the receive completion queue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Allocate each struct rpcrdma_mw separatelyChuck Lever
Currently rpcrdma_buffer_create() allocates struct rpcrdma_mw's as a single contiguous area of memory. It amounts to quite a bit of memory, and there's no requirement for these to be carved from a single piece of contiguous memory. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Rename frmr_wrChuck Lever
Clean up: Name frmr_wr after the opcode of the Work Request, consistent with the send and local invalidation paths. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Disable completions for LOCAL_INV Work RequestsChuck Lever
Instead of relying on a completion to change the state of an FRMR to FRMR_IS_INVALID, set it in advance. If an error occurs, a completion will fire anyway and mark the FRMR FRMR_IS_STALE. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Disable completions for FAST_REG_MR Work RequestsChuck Lever
Instead of relying on a completion to change the state of an FRMR to FRMR_IS_VALID, set it in advance. If an error occurs, a completion will fire anyway and mark the FRMR FRMR_IS_STALE. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external()Chuck Lever
Any FRMR arriving in rpcrdma_register_frmr_external() is now guaranteed to be either invalid, or to be targeted by a queued LOCAL_INV that will invalidate it before the adapter processes the FAST_REG_MR being built here. The problem with current arrangement of chaining a LOCAL_INV to the FAST_REG_MR is that if the transport is not connected, the LOCAL_INV is flushed and the FAST_REG_MR is flushed. This leaves the FRMR valid with the old rkey. But rpcrdma_register_frmr_external() has already bumped the in-memory rkey. Next time through rpcrdma_register_frmr_external(), a LOCAL_INV and FAST_REG_MR is attempted again because the FRMR is still valid. But the rkey no longer matches the hardware's rkey, and a memory management operation error occurs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work RequestChuck Lever
When a LOCAL_INV Work Request is flushed, it leaves an FRMR in the VALID state. This FRMR can be returned by rpcrdma_buffer_get(), and must be knocked down in rpcrdma_register_frmr_external() before it can be re-used. Instead, capture these in rpcrdma_buffer_get(), and reset them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnectChuck Lever
FAST_REG_MR Work Requests update a Memory Region's rkey. Rkey's are used to block unwanted access to the memory controlled by an MR. The rkey is passed to the receiver (the NFS server, in our case), and is also used by xprtrdma to invalidate the MR when the RPC is complete. When a FAST_REG_MR Work Request is flushed after a transport disconnect, xprtrdma cannot tell whether the WR actually hit the adapter or not. So it is indeterminant at that point whether the existing rkey is still valid. After the transport connection is re-established, the next FAST_REG_MR or LOCAL_INV Work Request against that MR can sometimes fail because the rkey value does not match what xprtrdma expects. The only reliable way to recover in this case is to deregister and register the MR before it is used again. These operations can be done only in a process context, so handle it in the transport connect worker. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Properly handle exhaustion of the rb_mws listChuck Lever
If the rb_mws list is exhausted, clean up and return NULL so that call_allocate() will delay and try again. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Chain together all MWs in same buffer poolChuck Lever
During connection loss recovery, need to visit every MW in a buffer pool. Any MW that is in use by an RPC will not be on the rb_mws list. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Back off rkey when FAST_REG_MR failsChuck Lever
If posting a FAST_REG_MR Work Reqeust fails, revert the rkey update to avoid subsequent IB_WC_MW_BIND_ERR completions. Suggested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Unclutter struct rpcrdma_mr_segChuck Lever
Clean ups: - make it obvious that the rl_mw field is a pointer -- allocated separately, not as part of struct rpcrdma_mr_seg - promote "struct {} frmr;" to a named type - promote the state enum to a named type - name the MW state field the same way other fields in rpcrdma_mw are named Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Don't invalidate FRMRs if registration failsChuck Lever
If FRMR registration fails, it's likely to transition the QP to the error state. Or, registration may have failed because the QP is _already_ in ERROR. Thus calling rpcrdma_deregister_external() in rpcrdma_create_chunks() is useless in FRMR mode: the LOCAL_INVs just get flushed. It is safe to leave existing registrations: when FRMR registration is tried again, rpcrdma_register_frmr_external() checks if each FRMR is already/still VALID, and knocks it down first if it is. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: On disconnect, don't ignore pending CQEsChuck Lever
xprtrdma is currently throwing away queued completions during a reconnect. RPC replies posted just before connection loss, or successful completions that change the state of an FRMR, can be missed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Update rkeys after transport reconnectChuck Lever
Various reports of: rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0 ep ffff8800bfd3e848 Ensure that rkeys in already-marshalled RPC/RDMA headers are refreshed after the QP has been replaced by a reconnect. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=249 Suggested-by: Selvin Xavier <Selvin.Xavier@Emulex.Com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Limit data payload size for ALLPHYSICALChuck Lever
When the client uses physical memory registration, each page in the payload gets its own array entry in the RPC/RDMA header's chunk list. Therefore, don't advertise a maximum payload size that would require more array entries than can fit in the RPC buffer where RPC/RDMA headers are built. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Protect ia->ri_id when unmapping/invalidating MRsChuck Lever
Ensure ia->ri_id remains valid while invoking dma_unmap_page() or posting LOCAL_INV during a transport reconnect. Otherwise, ia->ri_id->device or ia->ri_id->qp is NULL, which triggers a panic. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=259 Fixes: ec62f40 'xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting' Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-07-31xprtrdma: Fix panic in rpcrdma_register_frmr_external()Chuck Lever
seg1->mr_nsegs is not yet initialized when it is used to unmap segments during an error exit. Use the same unmapping logic for all error exits. "if (frmr_wr.wr.fast_reg.length < len) {" used to be a BUG_ON check. The broken code will never be executed under normal operation. Fixes: c977dea (xprtrdma: Remove BUG_ON() call sites) Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Shirley Ma <shirley.ma@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>