Age | Commit message (Collapse) | Author |
|
With rproc_alloc_registered_carveouts() introduction, carveouts are
allocated after resource table parsing.
rproc_da_to_va() may return NULL at trace resource registering.
This patch modifies trace debufs registering to provide device address
(da) instead of va.
da to va translation is done at each trace buffer access
through debugfs interface.
Fixes: d7c51706d095 ("remoteproc: add alloc ops in rproc_mem_entry struct")
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
Correct remoteproc core behavior when memory carveout device
address is fixed in resource table and rproc device doesn't have
associated IOMMU.
Current returned error is breaking legacy on TI platforms.
This patch restores previous behavior. It adds a warn message when
allocation doesn't fit carveout request, but doesn't stop rproc_start()
sequence anymore.
Fixes: 3bc8140b157c ("remoteproc: configure IOMMU only if device address requested")
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
Today resource table supports only 32bit address fields.
This is not compliant with 64bit platform for which addresses
are cast in 32bit.
This patch adds warn messages when address cast is done.
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
As dma member of struct rproc_mem_entry is dma_addr_t, no
need to cast in u32.
Fixes: d7c51706d095 ("remoteproc: add alloc ops in rproc_mem_entry struct")
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
With introduction of rproc_alloc_registered_carveouts() which
delays carveout allocation just before the start of the remote
processor, rproc_da_to_va() could be called before all carveouts
are allocated.
This patch adds a check in rproc_da_to_va() to return NULL if
carveout is not allocated.
Fixes: d7c51706d095 ("remoteproc: add alloc ops in rproc_mem_entry struct")
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
Add alloc parameter description and correct comment
about release one.
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
Commit 7e83cab824a87e83cab824a8 ("remoteproc: Modify recovery path
to use rproc_{start,stop}()") replaces rproc_{shutdown,boot}() with
rproc_{stop,start}(), which skips destroy the virtio device at stop
but re-initializes it again at start.
Issue is that struct virtio_dev is not correctly reinitialized like done
at initial allocation thanks to kzalloc() and kobject is considered as
already initialized by kernel. That is due to the fact struct virtio_dev
is allocated and released at vdev resource handling level managed and
virtio device is registered and unregistered at rproc subdevices level.
Moreover kernel documentation mentions that device struct must be
zero initialized before calling device_initialize().
This patch disentangles struct virtio_dev from struct rproc_vdev as
the two struct don't have the same life-cycle.
struct virtio_dev is now allocated on rproc_start() and released
on rproc_stop().
This patch applies on top of patch
remoteproc: create vdev subdevice with specific dma memory pool [1]
[1]: https://patchwork.kernel.org/patch/10755781/
Fixes: 7e83cab824a8 ("remoteproc: Modify recovery path to use rproc_{start,stop}()")
Reported-by: Xiang Xiao <xiaoxiang781216@gmail.com>
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
Make header files alphabetical order.
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
Remoteproc is now capable to create one specific sub-device per
virtio link to associate a dedicated memory pool.
This implies to change device used by virtio_rpmsg for
buffer allocation from grand-parent to parent.
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
ST remote processor needs some specified memory regions for
firmware and IPC.
Memory regions are defined as reserved memory and should
be registered in remoteproc core thanks to rproc_add_carveout
function before rproc_start. For this, st rproc driver implements
prepare ops.
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
This patch creates a dedicated vdev subdevice for each vdev declared
in firmware resource table and associates carveout named "vdev%dbuffer"
(with %d vdev index in resource table) if any as dma coherent memory pool.
Then vdev subdevice is used as parent for virtio device.
Signed-off-by: Loic Pallardy <loic.pallardy@st.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
|
|
The checks in __bdev_dax_supported() helped mitigate a potential data
corruption bug in the pmem driver's handling of section alignment
padding. Strengthen the checks, including checking the end of the range,
to validate the dev_pagemap, Xarray entries, and sector-to-pfn
translation established for pmem namespaces.
Acked-by: Jan Kara <jack@suse.cz>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
We should setup init_mm before doing parse_early_param() in setup_arch()
to be consistent with setup_arch() of other architectures such as x86,
ARM, and ARM64.
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
|
|
There are rare cases where a PL_INT_CAUSE bit may end up getting
set when the corresponding PL_INT_ENABLE bit isn't set.
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is no need to have DM core split discards on behalf of a DM target
now that blk_queue_split() handles splitting discards based on the
queue_limits. A DM target just needs to set max_discard_sectors,
discard_granularity, etc, in queue_limits.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Heiner Kallweit says:
====================
net: phy: disable aneg in genphy_c45_pma_setup_forced
When genphy_c45_pma_setup_forced() is called the "aneg enabled" bit
may still be set, therefore clear it. This is also in line with what
genphy_setup_forced() does for Clause 22.
v2:
- fix a typo in patch 1
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that genphy_c45_pma_setup_forced() makes sure the "aneg enabled"
bit is cleared, the call to genphy_c45_an_disable_aneg() isn't needed
any longer. And the code pattern is now the same as in
genphy_config_aneg().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When genphy_c45_pma_setup_forced() is called the "aneg enabled" bit may
still be set, therefore clear it. This is also in line with what
genphy_setup_forced() does for Clause 22.
v2:
- fix typo
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2019-02-19
This series includes misc updates to mlx5 drivers and one ethtool update.
1) From Aya Levin:
- ethtool: Define 50Gbps per lane link modes
- add support for 50Gbps per lane link modes in mlx5 driver
2) From Tariq Toukan,
- Add a helper function to unify mlx5 resource reloading
3) From Vlad Buslov,
- Remove wrong and superfluous tc pedit header type check
4) From Tonghao Zhang,
- Some refactoring in en_tc.c to simplify the mlx5e_tc_add_fdb_flow
5) From Leon Romanovsky & Saeed,
- Compilation warning fixes
6) From Bodong wang,
- E-Switch fixes that are related to the SmarNIC series
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(cherry picked from commit 033b228e7f26b29ae37f8bfa1bc6b209a5365e9f)
When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.
This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.
As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
(cherry picked from commit 8015d93ebd27484418d4952284fd02172fa4b0b2)
tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.
Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.
Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.
Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Several u->addr and u->path users are not holding any locks in
common with unix_bind(). unix_state_lock() is useless for those
purposes.
u->addr is assign-once and *(u->addr) is fully set up by the time
we set u->addr (all under unix_table_lock). u->path is also
set in the same critical area, also before setting u->addr, and
any unix_sock with ->path filled will have non-NULL ->addr.
So setting ->addr with smp_store_release() is all we need for those
"lockless" users - just have them fetch ->addr with smp_load_acquire()
and don't even bother looking at ->path if they see NULL ->addr.
Users of ->addr and ->path fall into several classes now:
1) ones that do smp_load_acquire(u->addr) and access *(u->addr)
and u->path only if smp_load_acquire() has returned non-NULL.
2) places holding unix_table_lock. These are guaranteed that
*(u->addr) is seen fully initialized. If unix_sock is in one of the
"bound" chains, so's ->path.
3) unix_sock_destructor() using ->addr is safe. All places
that set u->addr are guaranteed to have seen all stores *(u->addr)
while holding a reference to u and unix_sock_destructor() is called
when (atomic) refcount hits zero.
4) unix_release_sock() using ->path is safe. unix_bind()
is serialized wrt unix_release() (normally - by struct file
refcount), and for the instances that had ->path set by unix_bind()
unix_release_sock() comes from unix_release(), so they are fine.
Instances that had it set in unix_stream_connect() either end up
attached to a socket (in unix_accept()), in which case the call
chain to unix_release_sock() and serialization are the same as in
the previous case, or they never get accept'ed and unix_release_sock()
is called when the listener is shut down and its queue gets purged.
In that case the listener's queue lock provides the barriers needed -
unix_stream_connect() shoves our unix_sock into listener's queue
under that lock right after having set ->path and eventual
unix_release_sock() caller picks them from that queue under the
same lock right before calling unix_release_sock().
5) unix_find_other() use of ->path is pointless, but safe -
it happens with successful lookup by (abstract) name, so ->path.dentry
is guaranteed to be NULL there.
earlier-variant-reviewed-by: "Paul E. McKenney" <paulmck@linux.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Booting 4.20 on SolidRun Clearfog issues this warning with DMA API
debug enabled:
WARNING: CPU: 0 PID: 555 at kernel/dma/debug.c:1230 check_sync+0x514/0x5bc
mvneta f1070000.ethernet: DMA-API: device driver tries to sync DMA memory it has not allocated [device address=0x000000002dd7dc00] [size=240 bytes]
Modules linked in: ahci mv88e6xxx dsa_core xhci_plat_hcd xhci_hcd devlink armada_thermal marvell_cesa des_generic ehci_orion phy_armada38x_comphy mcp3021 spi_orion evbug sfp mdio_i2c ip_tables x_tables
CPU: 0 PID: 555 Comm: bridge-network- Not tainted 4.20.0+ #291
Hardware name: Marvell Armada 380/385 (Device Tree)
[<c0019638>] (unwind_backtrace) from [<c0014888>] (show_stack+0x10/0x14)
[<c0014888>] (show_stack) from [<c07f54e0>] (dump_stack+0x9c/0xd4)
[<c07f54e0>] (dump_stack) from [<c00312bc>] (__warn+0xf8/0x124)
[<c00312bc>] (__warn) from [<c00313b0>] (warn_slowpath_fmt+0x38/0x48)
[<c00313b0>] (warn_slowpath_fmt) from [<c00b0370>] (check_sync+0x514/0x5bc)
[<c00b0370>] (check_sync) from [<c00b04f8>] (debug_dma_sync_single_range_for_cpu+0x6c/0x74)
[<c00b04f8>] (debug_dma_sync_single_range_for_cpu) from [<c051bd14>] (mvneta_poll+0x298/0xf58)
[<c051bd14>] (mvneta_poll) from [<c0656194>] (net_rx_action+0x128/0x424)
[<c0656194>] (net_rx_action) from [<c000a230>] (__do_softirq+0xf0/0x540)
[<c000a230>] (__do_softirq) from [<c00386e0>] (irq_exit+0x124/0x144)
[<c00386e0>] (irq_exit) from [<c009b5e0>] (__handle_domain_irq+0x58/0xb0)
[<c009b5e0>] (__handle_domain_irq) from [<c03a63c4>] (gic_handle_irq+0x48/0x98)
[<c03a63c4>] (gic_handle_irq) from [<c0009a10>] (__irq_svc+0x70/0x98)
...
This appears to be caused by mvneta_rx_hwbm() calling
dma_sync_single_range_for_cpu() with the wrong struct device pointer,
as the buffer manager device pointer is used to map and unmap the
buffer. Fix this.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes
drm/i915 fbdev takeover fix for v5.0
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/87k1hutrmc.fsf@intel.com
|
|
kvmhv_p9_guest_entry() implements a fast-path guest entry for Power9
when guest and host are both running with the Radix MMU.
Currently in that path we don't save the host AMR (Authority Mask
Register) value, and we always restore 0 on return to the host. That
is OK at the moment because the AMR is not used for storage keys with
the Radix MMU.
However we plan to start using the AMR on Radix to prevent the kernel
from reading/writing to userspace outside of copy_to/from_user(). In
order to make that work we need to save/restore the AMR value.
We only restore the value if it is different from the guest value,
which is already in the register when we exit to the host. This should
mean we rarely need to actually restore the value when running a
modern Linux as a guest, because it will be using the same value as
us.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Russell Currey <ruscur@russell.cc>
|
|
Register the hwmon sysfs interface on R-Car Gen3 thermal driver to
align it with Gen2 driver. Use devm_add_action() to unregister the
hwmon interface automatically.
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Wolfram Sang <wsa+renesas@sang-engineering.com>
Cc: linux-renesas-soc@vger.kernel.org
To: linux-pm@vger.kernel.org
From: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Marek Vasut <marek.vasut+renesas@gmail.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
|
of_find_device_by_node() takes a reference to the struct device
when it finds a match via get_device.
We also should make sure to drop the reference to the device
taken by of_find_device_by_node() when returning error.
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
|
Add support for get_trend ops that allows soctherm
sensors to be used with the step-wise governor.
Signed-off-by: Wei Ni <wni@nvidia.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
|
Fix memory allocation to store the pointers to
thermal_zone_device.
Signed-off-by: Wei Ni <wni@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
|
Convert warnings to info as not all platforms may
have all the thresholds and sensors enabled.
Signed-off-by: Wei Ni <wni@nvidia.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
|
MT8183 has two built-in thermal controllers with total six thermal
sensors. And it doesn't have bank, so doesn't need to select bank.
This patch adds support for mt8183.
Signed-off-by: Michael Kao <michael.kao@mediatek.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
|
Also update the location of the git tree as we will be using a shared
git tree.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
KVM: s390: Fix crypto handling for nested KVM
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Now that we send the pages using a struct msghdr, instead of
using sendpage(), we no longer need to 'prime the socket' with
an address for unconnected UDP messages.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Simplify the page send code using iov_iter and bvecs.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Prepare to the socket transmission code to use iov_iter.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the client stream receive code receives an ESHUTDOWN error either
because the server closed the connection, or because it sent a
callback which cannot be processed, then we should shut down
the connection.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the message read completes, but the socket returned an error
condition, we should ensure to propagate that error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
A zero length fragment is really a bug, but let's ensure we don't
go nuts when one turns up.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
To ensure that the receive worker has exclusive access to the stream record
info, we must not reset the contents other than when holding the
transport->recv_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
After setxattr, the nfsv3 cached the acl which set by user.
But at the backend, the shared file system (eg. ext4) will check
the acl, if it can merged with mode, it won't add acl to the file.
So, the nfsv3 cached acl is redundant.
Don't 'set_cached_acl' when setxattr.
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
As the block and SCSI layouts can only read/write fixed-length
blocks, we must perform read-modify-write when data to be written is
not aligned to a block boundary or smaller than the block size.
(612aa983a0410 pnfs: add flag to force read-modify-write in ->write_begin)
The current code tries to see if we have to do read-modify-write
on block-oriented pNFS layouts by just checking !PageUptodate(page),
but the same condition also applies for overwriting of any uncached
potions of existing files, making such operations excessively slow
even it is block-aligned.
The change does not affect the optimization for modify-write-read
cases (38c73044f5f4d NFS: read-modify-write page updating),
because partial update of !PageUptodate() pages can only happen
in layouts that can do arbitrary length read/write and never
in block-based ones.
Testing results:
We ran fio on one of the pNFS clients running 4.20 kernel
(vanilla and patched) in this configuration to read/write/overwrite
files on the storage array, exported as pnfs share by the server.
pNFS clients ---1G Ethernet--- pNFS server
(HP DL360 G8) (HP DL360 G8)
| |
| |
+------8G Fiber Channel--------+
|
Storage Array
(HP P6350)
Throughput of overwrite (both buffered and O_SYNC) is noticeably
improved.
Ops. |block size| Throughput |
| (KiB) | (MiB/s) |
| | 4.20 | patched|
---------+----------+----------------+
buffered | 4| 21.3 | 232 |
overwrite| 32| 22.2 | 256 |
| 512| 22.4 | 260 |
---------+----------+----------------+
O_SYNC | 4| 3.84| 4.77|
overwrite| 32| 12.2 | 32.0 |
| 512| 18.5 | 152 |
---------+----------+----------------+
Read and write (buffered and O_SYNC) by the same client remain unchanged
by the patch either negatively or positively, as they should do.
Ops. |block size| Throughput |
| (KiB) | (MiB/s) |
| | 4.20 | patched|
---------+----------+----------------+
read | 4| 548 | 550 |
| 32| 547 | 551 |
| 512| 548 | 551 |
---------+----------+----------------+
buffered | 4| 237 | 244 |
write | 32| 261 | 268 |
| 512| 265 | 272 |
---------+----------+----------------+
O_SYNC | 4| 0.46| 0.46|
write | 32| 3.60| 3.57|
| 512| 105 | 106 |
---------+----------+----------------+
Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
Tested-by: Hiroyuki Watanabe <watanabe.hiroyuki@lab.ntt.co.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
nfs_want_read_modify_write() didn't check for !PagePrivate when pNFS
block or SCSI layout was in use, therefore we could lose data forever
if the page being written was filled by a read before completion.
Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
This fixes the typo in comments of nfs_readdir_alloc_pages().
Because nfs_readdir_large_page and nfs_readdir_free_pagearray had been
renamed.
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
This removes redundant semicolon for ending code.
Fixes: c7944ebb9ce9 ("NFSv4: Fix lookup revalidate of regular files")
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
When listing very large directories via NFS, clients may take a long
time to complete. There are about three factors involved:
First of all, ls and practically every other method of listing a
directory including python os.listdir and find rely on libc readdir().
However readdir() only reads 32K of directory entries at a time, which
means that if you have a lot of files in the same directory, it is going
to take an insanely long time to read all the directory entries.
Secondly, libc readdir() reads 32K of directory entries at a time, in
kernel space 32K buffer split into 8 pages. One NFS readdirplus rpc will
be called for one page, which introduces many readdirplus rpc calls.
Lastly, one NFS readdirplus rpc asks for 32K data (filled by nfs_dentry)
to fill one page (filled by dentry), we found that nearly one third of
data was wasted.
To solve above problems, pagecache mechanism was introduced. One NFS
readdirplus rpc will ask for a large data (more than 32k), the data can
fill more than one page, the cached pages can be used for next readdir
call. This can reduce many readdirplus rpc calls and improve readdirplus
performance.
TESTING:
When listing very large directories(include 300 thousand files) via NFS
time ls -l /nfs_mount | wc -l
without the patch:
300001
real 1m53.524s
user 0m2.314s
sys 0m2.599s
with the patch:
300001
real 0m23.487s
user 0m2.305s
sys 0m2.558s
Improved performance: 79.6%
readdirplus rpc calls decrease: 85%
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
In the rare and unsupported case of a hostname list nfs_parse_devname
will modify dev_name. There is no need to modify dev_name as the all
that is being computed is the length of the hostname, so the computed
length can just be shorted.
Fixes: dc04589827f7 ("NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
As reported by Dan Carpenter, this test for acred->cred being set is
inconsistent with the dereference of the pointer a few lines earlier.
An 'auth_cred' *always* has ->cred set - every place that creates one
initializes this field, often as the first thing done.
So remove this test.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Drop LIST_HEAD where the variable it declares has never
been used.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
... when != x
// </smpl>
Fixes: 0e20162ed1e9 ("NFSv4.1 Use MDS auth flavor for data server connection")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|