summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-02-20remoteproc: fix trace buffer va initializationLoic Pallardy
With rproc_alloc_registered_carveouts() introduction, carveouts are allocated after resource table parsing. rproc_da_to_va() may return NULL at trace resource registering. This patch modifies trace debufs registering to provide device address (da) instead of va. da to va translation is done at each trace buffer access through debugfs interface. Fixes: d7c51706d095 ("remoteproc: add alloc ops in rproc_mem_entry struct") Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20remoteproc: fix rproc_alloc_carveout() for rproc with iommu domainLoic Pallardy
Correct remoteproc core behavior when memory carveout device address is fixed in resource table and rproc device doesn't have associated IOMMU. Current returned error is breaking legacy on TI platforms. This patch restores previous behavior. It adds a warn message when allocation doesn't fit carveout request, but doesn't stop rproc_start() sequence anymore. Fixes: 3bc8140b157c ("remoteproc: configure IOMMU only if device address requested") Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20remoteproc: add warning on resource table castLoic Pallardy
Today resource table supports only 32bit address fields. This is not compliant with 64bit platform for which addresses are cast in 32bit. This patch adds warn messages when address cast is done. Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20remoteproc: fix rproc_alloc_carveout() bad variable castLoic Pallardy
As dma member of struct rproc_mem_entry is dma_addr_t, no need to cast in u32. Fixes: d7c51706d095 ("remoteproc: add alloc ops in rproc_mem_entry struct") Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20remoteproc: fix rproc_da_to_va in case of unallocated carveoutLoic Pallardy
With introduction of rproc_alloc_registered_carveouts() which delays carveout allocation just before the start of the remote processor, rproc_da_to_va() could be called before all carveouts are allocated. This patch adds a check in rproc_da_to_va() to return NULL if carveout is not allocated. Fixes: d7c51706d095 ("remoteproc: add alloc ops in rproc_mem_entry struct") Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20remoteproc: correct rproc_mem_entry_init() commentsLoic Pallardy
Add alloc parameter description and correct comment about release one. Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20remoteproc: fix recovery procedureLoic Pallardy
Commit 7e83cab824a87e83cab824a8 ("remoteproc: Modify recovery path to use rproc_{start,stop}()") replaces rproc_{shutdown,boot}() with rproc_{stop,start}(), which skips destroy the virtio device at stop but re-initializes it again at start. Issue is that struct virtio_dev is not correctly reinitialized like done at initial allocation thanks to kzalloc() and kobject is considered as already initialized by kernel. That is due to the fact struct virtio_dev is allocated and released at vdev resource handling level managed and virtio device is registered and unregistered at rproc subdevices level. Moreover kernel documentation mentions that device struct must be zero initialized before calling device_initialize(). This patch disentangles struct virtio_dev from struct rproc_vdev as the two struct don't have the same life-cycle. struct virtio_dev is now allocated on rproc_start() and released on rproc_stop(). This patch applies on top of patch remoteproc: create vdev subdevice with specific dma memory pool [1] [1]: https://patchwork.kernel.org/patch/10755781/ Fixes: 7e83cab824a8 ("remoteproc: Modify recovery path to use rproc_{start,stop}()") Reported-by: Xiang Xiao <xiaoxiang781216@gmail.com> Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20rpmsg: virtio: change header file sort styleLoic Pallardy
Make header files alphabetical order. Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20rpmsg: virtio: allocate buffer from parentLoic Pallardy
Remoteproc is now capable to create one specific sub-device per virtio link to associate a dedicated memory pool. This implies to change device used by virtio_rpmsg for buffer allocation from grand-parent to parent. Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Reviewed-by: Anup Patel <anup@brainfault.org> Tested-by: Anup Patel <anup@brainfault.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20remoteproc: st: add reserved memory supportLoic Pallardy
ST remote processor needs some specified memory regions for firmware and IPC. Memory regions are defined as reserved memory and should be registered in remoteproc core thanks to rproc_add_carveout function before rproc_start. For this, st rproc driver implements prepare ops. Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20remoteproc: create vdev subdevice with specific dma memory poolLoic Pallardy
This patch creates a dedicated vdev subdevice for each vdev declared in firmware resource table and associates carveout named "vdev%dbuffer" (with %d vdev index in resource table) if any as dma coherent memory pool. Then vdev subdevice is used as parent for virtio device. Signed-off-by: Loic Pallardy <loic.pallardy@st.com> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
2019-02-20dax: Check the end of the block-device capacity with dax_direct_access()Dan Williams
The checks in __bdev_dax_supported() helped mitigate a potential data corruption bug in the pmem driver's handling of section alignment padding. Strengthen the checks, including checking the end of the range, to validate the dev_pagemap, Xarray entries, and sector-to-pfn translation established for pmem namespaces. Acked-by: Jan Kara <jack@suse.cz> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2019-02-21RISC-V: Setup init_mm before parse_early_param()Anup Patel
We should setup init_mm before doing parse_early_param() in setup_arch() to be consistent with setup_arch() of other architectures such as x86, ARM, and ARM64. Signed-off-by: Anup Patel <anup.patel@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
2019-02-20cxgb4: Mask out interrupts that are not enabled.Vishal Kulkarni
There are rare cases where a PL_INT_CAUSE bit may end up getting set when the corresponding PL_INT_ENABLE bit isn't set. Signed-off-by: Vishal Kulkarni <vishal@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20dm: eliminate 'split_discard_bios' flag from DM target interfaceMike Snitzer
There is no need to have DM core split discards on behalf of a DM target now that blk_queue_split() handles splitting discards based on the queue_limits. A DM target just needs to set max_discard_sectors, discard_granularity, etc, in queue_limits. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-20Merge branch 'net-phy-disable-aneg-in-genphy_c45_pma_setup_forced'David S. Miller
Heiner Kallweit says: ==================== net: phy: disable aneg in genphy_c45_pma_setup_forced When genphy_c45_pma_setup_forced() is called the "aneg enabled" bit may still be set, therefore clear it. This is also in line with what genphy_setup_forced() does for Clause 22. v2: - fix a typo in patch 1 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20net: phy: marvell10g: improve mv3310_config_anegHeiner Kallweit
Now that genphy_c45_pma_setup_forced() makes sure the "aneg enabled" bit is cleared, the call to genphy_c45_an_disable_aneg() isn't needed any longer. And the code pattern is now the same as in genphy_config_aneg(). Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20net: phy: disable aneg in genphy_c45_pma_setup_forcedHeiner Kallweit
When genphy_c45_pma_setup_forced() is called the "aneg enabled" bit may still be set, therefore clear it. This is also in line with what genphy_setup_forced() does for Clause 22. v2: - fix typo Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20Merge tag 'mlx5-updates-2019-02-19' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-02-19 This series includes misc updates to mlx5 drivers and one ethtool update. 1) From Aya Levin: - ethtool: Define 50Gbps per lane link modes - add support for 50Gbps per lane link modes in mlx5 driver 2) From Tariq Toukan, - Add a helper function to unify mlx5 resource reloading 3) From Vlad Buslov, - Remove wrong and superfluous tc pedit header type check 4) From Tonghao Zhang, - Some refactoring in en_tc.c to simplify the mlx5e_tc_add_fdb_flow 5) From Leon Romanovsky & Saeed, - Compilation warning fixes 6) From Bodong wang, - E-Switch fixes that are related to the SmarNIC series ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20net_sched: fix a memory leak in cls_tcindexCong Wang
(cherry picked from commit 033b228e7f26b29ae37f8bfa1bc6b209a5365e9f) When tcindex_destroy() destroys all the filter results in the perfect hash table, it invokes the walker to delete each of them. However, results with class==0 are skipped in either tcindex_walk() or tcindex_delete(), which causes a memory leak reported by kmemleak. This patch fixes it by skipping the walker and directly deleting these filter results so we don't miss any filter result. As a result of this change, we have to initialize exts->net properly in tcindex_alloc_perfect_hash(). For net-next, we need to consider whether we should initialize ->net in tcf_exts_init() instead, before that just directly test CONFIG_NET_CLS_ACT=y. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20net_sched: fix a race condition in tcindex_destroy()Cong Wang
(cherry picked from commit 8015d93ebd27484418d4952284fd02172fa4b0b2) tcindex_destroy() invokes tcindex_destroy_element() via a walker to delete each filter result in its perfect hash table, and tcindex_destroy_element() calls tcindex_delete() which schedules tcf RCU works to do the final deletion work. Unfortunately this races with the RCU callback __tcindex_destroy(), which could lead to use-after-free as reported by Adrian. Fix this by migrating this RCU callback to tcf RCU work too, as that workqueue is ordered, we will not have use-after-free. Note, we don't need to hold netns refcnt because we don't call tcf_exts_destroy() here. Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter") Reported-by: Adrian <bugs@abtelecom.ro> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20missing barriers in some of unix_sock ->addr and ->path accessesAl Viro
Several u->addr and u->path users are not holding any locks in common with unix_bind(). unix_state_lock() is useless for those purposes. u->addr is assign-once and *(u->addr) is fully set up by the time we set u->addr (all under unix_table_lock). u->path is also set in the same critical area, also before setting u->addr, and any unix_sock with ->path filled will have non-NULL ->addr. So setting ->addr with smp_store_release() is all we need for those "lockless" users - just have them fetch ->addr with smp_load_acquire() and don't even bother looking at ->path if they see NULL ->addr. Users of ->addr and ->path fall into several classes now: 1) ones that do smp_load_acquire(u->addr) and access *(u->addr) and u->path only if smp_load_acquire() has returned non-NULL. 2) places holding unix_table_lock. These are guaranteed that *(u->addr) is seen fully initialized. If unix_sock is in one of the "bound" chains, so's ->path. 3) unix_sock_destructor() using ->addr is safe. All places that set u->addr are guaranteed to have seen all stores *(u->addr) while holding a reference to u and unix_sock_destructor() is called when (atomic) refcount hits zero. 4) unix_release_sock() using ->path is safe. unix_bind() is serialized wrt unix_release() (normally - by struct file refcount), and for the instances that had ->path set by unix_bind() unix_release_sock() comes from unix_release(), so they are fine. Instances that had it set in unix_stream_connect() either end up attached to a socket (in unix_accept()), in which case the call chain to unix_release_sock() and serialization are the same as in the previous case, or they never get accept'ed and unix_release_sock() is called when the listener is shut down and its queue gets purged. In that case the listener's queue lock provides the barriers needed - unix_stream_connect() shoves our unix_sock into listener's queue under that lock right after having set ->path and eventual unix_release_sock() caller picks them from that queue under the same lock right before calling unix_release_sock(). 5) unix_find_other() use of ->path is pointless, but safe - it happens with successful lookup by (abstract) name, so ->path.dentry is guaranteed to be NULL there. earlier-variant-reviewed-by: "Paul E. McKenney" <paulmck@linux.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-20net: marvell: mvneta: fix DMA debug warningRussell King
Booting 4.20 on SolidRun Clearfog issues this warning with DMA API debug enabled: WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes] Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291 Hardware name: Marvell Armada 380/385 (Device Tree) [<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14) [<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4) [<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124) [<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48) [<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc) [<c00b0370>] (check_sync) from [<c00b04f8>] (debug_dma_sync_single_range_for_cpu+0x6c/0x74) [<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] (mvneta_poll+0x298/0xf58) [<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424) [<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540) [<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144) [<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0) [<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] (gic_handle_irq+0x48/0x98) [<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98) ... This appears to be caused by mvneta_rx_hwbm() calling dma_sync_single_range_for_cpu() with the wrong struct device pointer, as the buffer manager device pointer is used to map and unmap the buffer. Fix this. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-21Merge tag 'drm-intel-fixes-2019-02-20' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes drm/i915 fbdev takeover fix for v5.0 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87k1hutrmc.fsf@intel.com
2019-02-21KVM: PPC: Book3S HV: Context switch AMR on Power9Michael Ellerman
kvmhv_p9_guest_entry() implements a fast-path guest entry for Power9 when guest and host are both running with the Radix MMU. Currently in that path we don't save the host AMR (Authority Mask Register) value, and we always restore 0 on return to the host. That is OK at the moment because the AMR is not used for storage keys with the Radix MMU. However we plan to start using the AMR on Radix to prevent the kernel from reading/writing to userspace outside of copy_to/from_user(). In order to make that work we need to save/restore the AMR value. We only restore the value if it is different from the guest value, which is already in the register when we exit to the host. This should mean we rarely need to actually restore the value when running a modern Linux as a guest, because it will be using the same value as us. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Russell Currey <ruscur@russell.cc>
2019-02-20thermal: rcar_gen3_thermal: Register hwmon sysfs interfaceMarek Vasut
Register the hwmon sysfs interface on R-Car Gen3 thermal driver to align it with Gen2 driver. Use devm_add_action() to unregister the hwmon interface automatically. Cc: Eduardo Valentin <edubezval@gmail.com> Cc: Wolfram Sang <wsa+renesas@sang-engineering.com> Cc: linux-renesas-soc@vger.kernel.org To: linux-pm@vger.kernel.org From: Marek Vasut <marek.vasut+renesas@gmail.com> Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-02-20thermal/qcom/tsens-common : fix possible object reference leakPeng Hao
of_find_device_by_node() takes a reference to the struct device when it finds a match via get_device. We also should make sure to drop the reference to the device taken by of_find_device_by_node() when returning error. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-02-20thermal: tegra: add get_trend opsWei Ni
Add support for get_trend ops that allows soctherm sensors to be used with the step-wise governor. Signed-off-by: Wei Ni <wni@nvidia.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-02-20thermal: tegra: fix memory allocationWei Ni
Fix memory allocation to store the pointers to thermal_zone_device. Signed-off-by: Wei Ni <wni@nvidia.com> Acked-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-02-20thermal: tegra: remove unnecessary warningsWei Ni
Convert warnings to info as not all platforms may have all the thresholds and sensors enabled. Signed-off-by: Wei Ni <wni@nvidia.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-02-20thermal: mediatek: add support for MT8183Michael Kao
MT8183 has two built-in thermal controllers with total six thermal sensors. And it doesn't have bank, so doesn't need to select bank. This patch adds support for mt8183. Signed-off-by: Michael Kao <michael.kao@mediatek.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
2019-02-20MAINTAINERS: add Eric Biggers as an fscrypt maintainerTheodore Ts'o
Also update the location of the git tree as we will be using a shared git tree. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-02-20Merge tag 'kvm-s390-master-5.0' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master KVM: s390: Fix crypto handling for nested KVM
2019-02-20SUNRPC: Remove the redundant 'zerocopy' argument to xs_sendpages()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20SUNRPC: Further cleanups of xs_sendpages()Trond Myklebust
Now that we send the pages using a struct msghdr, instead of using sendpage(), we no longer need to 'prime the socket' with an address for unconnected UDP messages. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20SUNRPC: Convert socket page send code to use iov_iter()Trond Myklebust
Simplify the page send code using iov_iter and bvecs. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec()Trond Myklebust
Prepare to the socket transmission code to use iov_iter. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20SUNRPC: Initiate a connection close on an ESHUTDOWN error in stream receiveTrond Myklebust
If the client stream receive code receives an ESHUTDOWN error either because the server closed the connection, or because it sent a callback which cannot be processed, then we should shut down the connection. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20SUNRPC: Don't suppress socket errors when a message read completesTrond Myklebust
If the message read completes, but the socket returned an error condition, we should ensure to propagate that error. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20SUNRPC: Handle zero length fragments correctlyTrond Myklebust
A zero length fragment is really a bug, but let's ensure we don't go nuts when one turns up. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20SUNRPC: Don't reset the stream record info when the receive worker is runningTrond Myklebust
To ensure that the receive worker has exclusive access to the stream record info, we must not reset the contents other than when holding the transport->recv_mutex. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20nfs: fix xfstest generic/099 failed on nfsv3ZhangXiaoxu
After setxattr, the nfsv3 cached the acl which set by user. But at the backend, the shared file system (eg. ext4) will check the acl, if it can merged with mode, it won't add acl to the file. So, the nfsv3 cached acl is redundant. Don't 'set_cached_acl' when setxattr. Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20pNFS: Avoid read/modify/write when it is not necessaryKazuo Ito
As the block and SCSI layouts can only read/write fixed-length blocks, we must perform read-modify-write when data to be written is not aligned to a block boundary or smaller than the block size. (612aa983a0410 pnfs: add flag to force read-modify-write in ->write_begin) The current code tries to see if we have to do read-modify-write on block-oriented pNFS layouts by just checking !PageUptodate(page), but the same condition also applies for overwriting of any uncached potions of existing files, making such operations excessively slow even it is block-aligned. The change does not affect the optimization for modify-write-read cases (38c73044f5f4d NFS: read-modify-write page updating), because partial update of !PageUptodate() pages can only happen in layouts that can do arbitrary length read/write and never in block-based ones. Testing results: We ran fio on one of the pNFS clients running 4.20 kernel (vanilla and patched) in this configuration to read/write/overwrite files on the storage array, exported as pnfs share by the server. pNFS clients ---1G Ethernet--- pNFS server (HP DL360 G8) (HP DL360 G8) | | | | +------8G Fiber Channel--------+ | Storage Array (HP P6350) Throughput of overwrite (both buffered and O_SYNC) is noticeably improved. Ops. |block size| Throughput | | (KiB) | (MiB/s) | | | 4.20 | patched| ---------+----------+----------------+ buffered | 4| 21.3 | 232 | overwrite| 32| 22.2 | 256 | | 512| 22.4 | 260 | ---------+----------+----------------+ O_SYNC | 4| 3.84| 4.77| overwrite| 32| 12.2 | 32.0 | | 512| 18.5 | 152 | ---------+----------+----------------+ Read and write (buffered and O_SYNC) by the same client remain unchanged by the patch either negatively or positively, as they should do. Ops. |block size| Throughput | | (KiB) | (MiB/s) | | | 4.20 | patched| ---------+----------+----------------+ read | 4| 548 | 550 | | 32| 547 | 551 | | 512| 548 | 551 | ---------+----------+----------------+ buffered | 4| 237 | 244 | write | 32| 261 | 268 | | 512| 265 | 272 | ---------+----------+----------------+ O_SYNC | 4| 0.46| 0.46| write | 32| 3.60| 3.57| | 512| 105 | 106 | ---------+----------+----------------+ Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp> Tested-by: Hiroyuki Watanabe <watanabe.hiroyuki@lab.ntt.co.jp> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20pNFS: Fix potential corruption of page being writtenKazuo Ito
nfs_want_read_modify_write() didn't check for !PagePrivate when pNFS block or SCSI layout was in use, therefore we could lose data forever if the page being written was filled by a read before completion. Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20NFS: Fix typo in comments of nfs_readdir_alloc_pages()zhangliguang
This fixes the typo in comments of nfs_readdir_alloc_pages(). Because nfs_readdir_large_page and nfs_readdir_free_pagearray had been renamed. Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20NFS: Remove redundant semicolonzhangliguang
This removes redundant semicolon for ending code. Fixes: c7944ebb9ce9 ("NFSv4: Fix lookup revalidate of regular files") Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20NFS: readdirplus optimization by cache mechanismluanshi
When listing very large directories via NFS, clients may take a long time to complete. There are about three factors involved: First of all, ls and practically every other method of listing a directory including python os.listdir and find rely on libc readdir(). However readdir() only reads 32K of directory entries at a time, which means that if you have a lot of files in the same directory, it is going to take an insanely long time to read all the directory entries. Secondly, libc readdir() reads 32K of directory entries at a time, in kernel space 32K buffer split into 8 pages. One NFS readdirplus rpc will be called for one page, which introduces many readdirplus rpc calls. Lastly, one NFS readdirplus rpc asks for 32K data (filled by nfs_dentry) to fill one page (filled by dentry), we found that nearly one third of data was wasted. To solve above problems, pagecache mechanism was introduced. One NFS readdirplus rpc will ask for a large data (more than 32k), the data can fill more than one page, the cached pages can be used for next readdir call. This can reduce many readdirplus rpc calls and improve readdirplus performance. TESTING: When listing very large directories(include 300 thousand files) via NFS time ls -l /nfs_mount | wc -l without the patch: 300001 real 1m53.524s user 0m2.314s sys 0m2.599s with the patch: 300001 real 0m23.487s user 0m2.305s sys 0m2.558s Improved performance: 79.6% readdirplus rpc calls decrease: 85% Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20fs/nfs: Fix nfs_parse_devname to not modify it's argumentEric W. Biederman
In the rare and unsupported case of a hostname list nfs_parse_devname will modify dev_name. There is no need to modify dev_name as the all that is being computed is the length of the hostname, so the computed length can just be shorted. Fixes: dc04589827f7 ("NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20SUNRPC: remove pointless test in unx_match()NeilBrown
As reported by Dan Carpenter, this test for acred->cred being set is inconsistent with the dereference of the pointer a few lines earlier. An 'auth_cred' *always* has ->cred set - every place that creates one initializes this field, often as the first thing done. So remove this test. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-02-20NFS: drop useless LIST_HEADJulia Lawall
Drop LIST_HEAD where the variable it declares has never been used. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: 0e20162ed1e9 ("NFSv4.1 Use MDS auth flavor for data server connection") Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>