path: root/include
2018-10-01  video/hdmi: Pass buffer size to infoframe unpack functions  (Ville Syrjälä)

To make sure the infoframe unpack functions don't end up examining stack garbage or oopsing, let's pass in the size of the buffer.

v2: Convert tda1997x.c as well (kbuild test robot)

Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: linux-media@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920185145.1912-3-ville.syrjala@linux.intel.com
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

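A minimal sketch of the kind of size check this implies, shown for one per-type unpack function (the exact function names, static qualifiers and error conventions here are assumptions, not taken from the patch):

    /* Sketch: unpack validates against the caller-supplied buffer size
     * before reading any field, instead of trusting the buffer blindly. */
    static int hdmi_avi_infoframe_unpack(struct hdmi_avi_infoframe *frame,
                                         const void *buffer, size_t size)
    {
            const u8 *ptr = buffer;

            if (size < HDMI_INFOFRAME_SIZE(AVI))
                    return -EINVAL;    /* would read past the buffer */
            if (ptr[0] != HDMI_INFOFRAME_TYPE_AVI)
                    return -EINVAL;
            /* ... parse the remaining fields as before ... */
            return 0;
    }
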
2018-10-01  video/hdmi: Constify 'buffer' to the unpack functions  (Ville Syrjälä)

The unpack functions just read from the passed in buffer, so make it const.

Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: linux-media@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180920185145.1912-2-ville.syrjala@linux.intel.com
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>

2018-10-01  net/mlx5: Cache the system image guid  (Alaa Hleihel)

The system image guid is a read-only field which is used by the TC offloads code to determine if two mlx5 devices belong to the same ASIC while adding flows. Read this once and save it on the core device rather than querying each time an offloaded flow is added.

Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

2018-10-01  ext4: adjust reserved cluster count when removing extents  (Eric Whitney)

Modify ext4_ext_remove_space() and the code it calls to correct the reserved cluster count for pending reservations (delayed allocated clusters shared with allocated blocks) when a block range is removed from the extent tree. Pending reservations may be found for the clusters at the ends of written or unwritten extents when a block range is removed. If a physical cluster at the end of an extent is freed, it's necessary to increment the reserved cluster count to maintain correct accounting if the corresponding logical cluster is shared with at least one delayed and unwritten extent as found in the extents status tree.

Add a new function, ext4_rereserve_cluster(), to reapply a reservation on a delayed allocated cluster sharing blocks with a freed allocated cluster. To avoid ENOSPC on reservation, a flag is applied to ext4_free_blocks() to briefly defer updating the freeclusters counter when an allocated cluster is freed. This prevents another thread from allocating the freed block before the reservation can be reapplied.

Redefine the partial cluster object as a struct to carry more state information and to clarify the code using it. Adjust the conditional code structure in ext4_ext_remove_space to reduce the indentation level in the main body of the code to improve readability.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

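A sketch of what the redefined partial cluster object could look like, given the state the message says it carries (field and state names are assumptions for illustration):

    /* Sketch: partial cluster state threaded through the removal path. */
    struct partial_cluster {
            ext4_fsblk_t pclu;      /* physical cluster number */
            ext4_lblk_t lblk;       /* logical block within that cluster */
            enum { initial, tofree, nofree } state;
    };
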
2018-10-01  ext4: fix reserved cluster accounting at delayed write time  (Eric Whitney)

The code in ext4_da_map_blocks sometimes reserves space for more delayed allocated clusters than it should, resulting in premature ENOSPC, exceeded quota, and inaccurate free space reporting. Fix this by checking for written and unwritten blocks shared in the same cluster with the newly delayed allocated block. A cluster reservation should not be made for a cluster for which physical space has already been allocated.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

2018-10-01  ext4: generalize extents status tree search functions  (Eric Whitney)

Ext4 contains a few functions that are used to search for delayed extents or blocks in the extents status tree. Rather than duplicate code to add new functions to search for extents with different status values, such as written or a combination of delayed and unwritten, generalize the existing code to search for caller-specified extents status values.

Also, move this code into extents_status.c where it is better associated with the data structures it operates upon, and where it can be more readily used to implement new extents status tree functions that might want a broader scope for i_es_lock.

Three missing static specifiers in the RFC version of this patch were reported and fixed by Fengguang Wu <fengguang.wu@intel.com>.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

2018-10-01  net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules  (Alaa Hleihel)

If the peer device was already unbound, then do not attempt to modify its resources, otherwise we will crash dereferencing a non-existent device.

Fixes: 5c65c564c962 ("net/mlx5e: Support offloading TC NIC hairpin flows")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

2018-10-01  Merge branch 'for-joerg/arm-smmu/updates' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux into arm/smmu  (Joerg Roedel)

2018-10-01  Merge tag 'v4.19-rc6' into for-4.20/block  (Jens Axboe)

Merge -rc6 in, for two reasons:

1) Resolve a trivial conflict in the blk-mq-tag.c documentation
2) A few important regression fixes went into upstream directly, so they aren't in the 4.20 branch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

* tag 'v4.19-rc6': (780 commits)
  Linux 4.19-rc6
  MAINTAINERS: fix reference to moved drivers/{misc => auxdisplay}/panel.c
  cpufreq: qcom-kryo: Fix section annotations
  perf/core: Add sanity check to deal with pinned event failure
  xen/blkfront: correct purging of persistent grants
  Revert "xen/blkfront: When purging persistent grants, keep them in the buffer"
  selftests/powerpc: Fix Makefiles for headers_install change
  blk-mq: I/O and timer unplugs are inverted in blktrace
  dax: Fix deadlock in dax_lock_mapping_entry()
  x86/boot: Fix kexec booting failure in the SEV bit detection code
  bcache: add separate workqueue for journal_write to avoid deadlock
  drm/amd/display: Fix Edid emulation for linux
  drm/amd/display: Fix Vega10 lightup on S3 resume
  drm/amdgpu: Fix vce work queue was not cancelled when suspend
  Revert "drm/panel: Add device_link from panel device to DRM device"
  xen/blkfront: When purging persistent grants, keep them in the buffer
  clocksource/drivers/timer-atmel-pit: Properly handle error cases
  block: fix deadline elevator drain for zoned block devices
  ACPI / hotplug / PCI: Don't scan for non-hotplug bridges if slot is not bridge
  drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
  ...

Signed-off-by: Jens Axboe <axboe@kernel.dk>

2018-10-01  dma-direct: implement complete bus_dma_mask handling  (Christoph Hellwig)

Instead of rejecting devices whose bus_dma_mask is too small, handle them by taking the bus dma_mask into account for allocations and bounce buffering decisions.

Signed-off-by: Christoph Hellwig <hch@lst.de>

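A minimal sketch of the idea, with a helper name invented for illustration (the actual patch wires this logic into the allocation and bounce-buffering paths rather than a single helper):

    /* Sketch: the effective DMA limit is the smaller of the device's own
     * mask and the mask of the bus it sits behind. */
    static u64 dma_direct_effective_mask(struct device *dev)
    {
            u64 mask = dev->coherent_dma_mask;

            if (dev->bus_dma_mask)
                    mask = min(mask, dev->bus_dma_mask);
            return mask;
    }
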
2018-10-01  dma-direct: add an explicit dma_direct_get_required_mask  (Christoph Hellwig)

This is somewhat modelled after the powerpc version, and differs from the legacy fallback in using fls64 instead of pointlessly splitting the address into low and high dwords, and in that it takes (__)phys_to_dma into account.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>

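A sketch of what such an implementation might look like, inferred from the description above (treat it as a guess, not the patch itself):

    /* Sketch: the required mask covers the highest DMA address that can
     * back physical memory, after applying the phys-to-dma translation. */
    u64 dma_direct_get_required_mask(struct device *dev)
    {
            u64 max_dma = phys_to_dma(dev, (max_pfn - 1) << PAGE_SHIFT);

            return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
    }
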
2018-10-01  dma-mapping: make the get_required_mask method available unconditionally  (Christoph Hellwig)

This saves some duplication for ia64, and makes the interface more general. In the long run we want each dma_map_ops instance to fill this out, but this will take a little more prep work.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

2018-10-01  bpf: introduce per-cpu cgroup local storage  (Roman Gushchin)

This commit introduces per-cpu cgroup local storage. Per-cpu cgroup local storage is very similar to simple cgroup storage (let's call it shared), except all the data is per-cpu.

The main goal of the per-cpu variant is to implement super fast counters (e.g. packet counters), which require neither lookups nor atomic operations.

From userspace's point of view, accessing a per-cpu cgroup storage is similar to other per-cpu map types (e.g. per-cpu hashmaps and arrays). Writing to a per-cpu cgroup storage is not atomic, but is performed by copying longs, so some minimal atomicity is here, exactly as with other per-cpu maps.

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

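A sketch of how a cgroup-attached program might use such storage as a packet counter (map definition style follows the classic libbpf conventions of the era; the map type name and details are assumptions):

    #include <linux/bpf.h>
    #include "bpf_helpers.h"

    /* Sketch: per-cpu cgroup storage holding one counter slot per cpu. */
    struct bpf_map_def SEC("maps") pkt_cntr = {
            .type = BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
            .key_size = sizeof(struct bpf_cgroup_storage_key),
            .value_size = sizeof(__u64),
    };

    SEC("cgroup/skb")
    int count_packets(struct __sk_buff *skb)
    {
            __u64 *cnt = bpf_get_local_storage(&pkt_cntr, 0);

            (*cnt)++;       /* plain increment: the slot is private to this cpu */
            return 1;       /* allow the packet */
    }
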
2018-10-01  bpf: rework cgroup storage pointer passing  (Roman Gushchin)

To simplify the following introduction of per-cpu cgroup storage, let's rework a bit the mechanism of passing a pointer to a cgroup storage into bpf_get_local_storage(). Let's save a pointer to the corresponding bpf_cgroup_storage structure instead of a pointer to the actual buffer. It will help us handle per-cpu storage later, which has a different way of accessing the actual data.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

2018-10-01  bpf: extend cgroup bpf core to allow multiple cgroup storage types  (Roman Gushchin)

In order to introduce per-cpu cgroup storage, let's generalize the bpf cgroup core to support multiple cgroup storage types. Potentially, per-node cgroup storage can be added later. This commit is mostly a formal change that replaces the cgroup_storage pointer with an array of cgroup_storage pointers. It doesn't actually introduce a new storage type; that will be done later. Each bpf program is now able to have one cgroup storage of each type.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Song Liu <songliubraving@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

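A sketch of the shape of the generalization (enum and field names are assumptions based on the description; the per-cpu type is only added by the follow-up patch):

    /* Sketch: one storage slot per storage type, instead of a single
     * cgroup_storage pointer. */
    enum bpf_cgroup_storage_type {
            BPF_CGROUP_STORAGE_SHARED,
            BPF_CGROUP_STORAGE_PERCPU,      /* introduced by the later patch */
            __BPF_CGROUP_STORAGE_MAX
    };

    /* Everywhere a single pointer used to live: */
    struct bpf_cgroup_storage *storage[__BPF_CGROUP_STORAGE_MAX];
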
2018-10-01  vgic: Add support for 52bit guest physical address  (Kristina Martsenko)

Add support for handling 52bit guest physical addresses to the VGIC layer. So far we have limited the guest physical address to 48 bits, by explicitly masking the upper bits. This patch removes the restriction. We do not have to check if the host supports 52bit, as the gpa is always validated during an access (e.g., kvm_{read,write}_guest(), kvm_is_visible_gfn()). The ITS table save-restore is also not affected by the enhancement: the DTE entries already store bits[51:8] of the ITT_addr (with a 256 byte alignment).

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
[ Macro clean ups, fix PROPBASER and PENDBASER accesses ]
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

2018-10-01  iommu/dma: Add support for non-strict mode  (Zhen Lei)

With the flush queue infrastructure already abstracted into IOVA domains, hooking it up in iommu-dma is pretty simple. Since there is a degree of dependency on the IOMMU driver knowing what to do to play along, we key the whole thing off a domain attribute which will be set on default DMA ops domains to request non-strict invalidation. That way, drivers can indicate the appropriate support by acknowledging the attribute, and we can easily fall back to strict invalidation otherwise.

The flush queue callback needs a handle on the iommu_domain which owns our cookie, so we have to add a pointer back to that, but neatly, that's also sufficient to indicate whether we're using a flush queue or not, and thus which way to release IOVAs.

The only slight subtlety is switching __iommu_dma_unmap() from calling iommu_unmap() to explicit iommu_unmap_fast()/iommu_tlb_sync() so that we can elide the sync entirely in non-strict mode.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
[rm: convert to domain attribute, tweak comments and commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

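A sketch of how the opt-in might look from the requesting side (the attribute name is an assumption based on the "domain attribute" described above):

    /* Sketch: ask for non-strict (deferred) TLB invalidation on the
     * default DMA ops domain; fall back to strict mode if refused. */
    int attr = 1;

    if (iommu_domain_set_attr(domain, DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE, &attr))
            pr_info("non-strict mode unsupported, keeping strict invalidation\n");
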
2018-10-01  Merge tag 'v4.19-rc6' into devel  (Linus Walleij)

This is the 4.19-rc6 release. I needed to merge this in because of extensive conflicts in the MSM and Intel pin control drivers. I know how to resolve them, so let's do it like this.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

2018-10-01  signal: Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack  (Will Deacon)

The sigaltstack(2) system call fails with -ENOMEM if the new alternative signal stack is found to be smaller than SIGMINSTKSZ. On architectures such as arm64, where the native value for SIGMINSTKSZ is larger than the compat value, this can result in an unexpected error being reported to a compat task. See, for example:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=904385

This patch fixes the problem by extending do_sigaltstack to take the minimum signal stack size as an additional parameter, allowing the native and compat system call entry code to pass in their respective values. COMPAT_SIGMINSTKSZ is just defined as SIGMINSTKSZ if it has not been defined by the architecture.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg@redhat.com>
Reported-by: Steve McIntyre <steve.mcintyre@arm.com>
Tested-by: Steve McIntyre <93sam@debian.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

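The fallback definition is spelled out in the message itself; a sketch of it together with the extended helper (the exact do_sigaltstack() signature is an assumption):

    /* Fallback: architectures without a compat-specific minimum reuse
     * the native value. */
    #ifndef COMPAT_SIGMINSTKSZ
    #define COMPAT_SIGMINSTKSZ SIGMINSTKSZ
    #endif

    /* Sketch: the minimum stack size becomes a parameter, so native and
     * compat entry code can pass SIGMINSTKSZ or COMPAT_SIGMINSTKSZ. */
    static int do_sigaltstack(const stack_t *ss, stack_t *oss,
                              unsigned long sp, size_t min_ss_size);
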
2018-10-01  Merge branch 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm into pm-opp  (Rafael J. Wysocki)

Pull operating performance points (OPP) material for 4.20 from Viresh Kumar.

"This contains patches that fix several bugs in the OPP core and makes it more stable."

* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
  OPP: Pass OPP table to _of_add_opp_table_v{1|2}()
  OPP: Prevent creating multiple OPP tables for devices sharing OPP nodes
  OPP: Use a single mechanism to free the OPP table
  OPP: Don't remove dynamic OPPs from _dev_pm_opp_remove_table()
  cpufreq: mvebu: Remove OPPs using dev_pm_opp_remove()
  OPP: Create separate kref for static OPPs list
  OPP: Don't take OPP table's kref for static OPPs
  OPP: Parse OPP table's DT properties from _of_init_opp_table()
  OPP: Pass index to _of_init_opp_table()
  OPP: Protect dev_list with opp_table lock
  OPP: Don't try to remove all OPP tables on failure
  OPP: Free OPP table properly on performance state irregularities

2018-10-01  gpio: Restore indentation of continued lines  (Geert Uytterhoeven)

Fixes: 3027743f83f867d8 ("gpio: Remove VLA from gpiolib")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

2018-10-01  gpio: Propagate errors from gpiod_set_array_value_complex()  (Geert Uytterhoeven)

Internal helper function gpiod_set_array_value_complex() was changed to return an error value, but not all gpiolib callers were updated to propagate the new error up.

Fixes: 3027743f83f867d8 ("gpio: Remove VLA from gpiolib")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

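A sketch of the kind of fix involved, with a simplified caller and signature chosen purely for illustration (the real callers and argument lists differ):

    /* Sketch: a gpiolib caller now forwards the helper's error code
     * instead of silently discarding it. */
    int gpiod_set_array_value(unsigned int array_size,
                              struct gpio_desc **desc_array, int *value_array)
    {
            if (!desc_array)
                    return -EINVAL;
            return gpiod_set_array_value_complex(false, true, array_size,
                                                 desc_array, value_array);
    }
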
2018-10-01  fuse: add max_pages to init_out  (Constantine Shulyupin)

Replace FUSE_MAX_PAGES_PER_REQ with the configurable parameter max_pages to improve performance.

Old RFC with detailed description of the problem and many fixes by Mitsuo Hayasaka (mitsuo.hayasaka.hu@hitachi.com):
 - https://lkml.org/lkml/2012/7/5/136

We've encountered performance degradation and fixed it on a big and complex virtual environment.

Environment to reproduce degradation and improvement:

1. Add lag to user mode FUSE:
   add nanosleep(&(struct timespec){ 0, 1000 }, NULL); to xmp_write_buf in passthrough_fh.c

2. Patch UM fuse with a configurable max_pages parameter. The patch will be provided later.

3. Run the test script and perform the test on tmpfs:

   fuse_test()
   {
           cd /tmp
           mkdir -p fusemnt
           passthrough_fh -o max_pages=$1 /tmp/fusemnt
           grep fuse /proc/self/mounts
           dd conv=fdatasync oflag=dsync if=/dev/zero of=fusemnt/tmp/tmp \
                   count=1K bs=1M 2>&1 | grep -v records
           rm fusemnt/tmp/tmp
           killall passthrough_fh
   }

Test results:

   passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
       rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0
   1073741824 bytes (1.1 GB) copied, 1.73867 s, 618 MB/s

   passthrough_fh /tmp/fusemnt fuse.passthrough_fh \
       rw,nosuid,nodev,relatime,user_id=0,group_id=0,max_pages=256 0 0
   1073741824 bytes (1.1 GB) copied, 1.15643 s, 928 MB/s

Obviously, with a bigger lag the difference between 'before' and 'after' will be more significant. Mitsuo Hayasaka, in 2012 (https://lkml.org/lkml/2012/7/5/136), observed an improvement from 400-550 to 520-740.

Signed-off-by: Constantine Shulyupin <const@MakeLinux.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

2018-09-30  f2fs: support superblock checksum  (Junling Zheng)

Now we support a crc32 checksum for the superblock.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

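A sketch of what such a verification could look like (the field placement, helper name, and seed choice are assumptions, not the patch):

    /* Sketch: the checksum covers the superblock up to the crc field. */
    static bool f2fs_sb_crc_valid(struct f2fs_super_block *sb)
    {
            size_t len = offsetof(struct f2fs_super_block, crc);

            return crc32(F2FS_SUPER_MAGIC, sb, len) == le32_to_cpu(sb->crc);
    }
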
2018-09-30  IB/hfi1: Prepare resource waits for dual leg  (Dennis Dalessandro)

Current implementation allows each qp to have only one send engine. As such, each qp has only one list to queue prebuilt packets when send engine resources are not available. To improve performance, it is desired to support multiple send engines for each qp.

This patch creates the framework to support two send engines (two legs) for each qp for the TID RDMA protocol, which can be easily extended to support more send engines. It achieves the goal by creating a leg-specific struct, iowait_work, in the iowait struct, to hold the work_struct and the tx_list as well as a pointer to the parent iowait struct.

The hfi1_pkt_state now has an additional field to record the current leg's work structure, and that is now passed to all egress waiters to determine the leg that needs to wait via a new iowait helper. The APIs are adjusted to use the new leg-specific struct as required. Many new and modified helpers are added to support this change.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2018-09-30  IB/rdmavt: Rename check_send_wqe as setup_wqe  (Kaike Wan)

The driver-provided function check_send_wqe allows the hardware driver to check and set up the incoming send wqe before it is inserted into the swqe ring. This patch will rename it as setup_wqe to better reflect its usage. In addition, this function is only called when all setup is complete in rdmavt.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2018-09-30  RDMA/qedr: Remove enumerated type qed_roce_ll2_tx_dest  (Nathan Chancellor)

Clang warns when one enumerated type is implicitly converted to another:

  drivers/infiniband/hw/qedr/qedr_roce_cm.c:198:28: warning: implicit conversion
  from enumeration type 'enum qed_roce_ll2_tx_dest' to different enumeration
  type 'enum qed_ll2_tx_dest' [-Wenum-conversion]
          ll2_tx_pkt.tx_dest = pkt->tx_dest;
                             ~ ~~~~~^~~~~~~
  1 warning generated.

It turns out that QED_ROCE_LL2_TX_DEST_NW and QED_ROCE_LL2_TX_DEST_LB are only used once in the whole tree and QED_ROCE_LL2_TX_DEST_MAX is used nowhere. Remove them and use the equivalent values from qed_ll2_tx_dest in their place.

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

2018-09-30  SUNRPC: Replace krb5_seq_lock with a lockless scheme  (Trond Myklebust)

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30  NFS: Remove private spinlock in struct nfs_pgio_header  (Trond Myklebust)

Now that each struct nfs_pgio_header corresponds to one RPC call, we only have one writer to the struct nfs_pgio_header.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  NFSv4: Save a few bytes in the nfs_pgio_args/res  (Trond Myklebust)

Save a few bytes by allowing the read/write specific fields of the structures to share storage.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  NFSv3: Improve NFSv3 performance when server returns no post-op attributes  (Trond Myklebust)

When the server fails to return post-op attributes, the client's attempt to place read data directly in the page cache fails, and so we have to do an extra copy in order to realign the data with page borders. This patch attempts to detect servers that don't return post-op attributes on read (e.g. for pNFS) and adjusts the placement calculation accordingly.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  NFS: Convert lookups of the open context to RCU  (Trond Myklebust)

Reduce contention on the inode->i_lock by ensuring that we use RCU when looking up the NFS open context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  NFS: Convert lookups of the lock context to RCU  (Trond Myklebust)

Speed up lookups of an existing lock context by avoiding the inode->i_lock, and using RCU instead.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Unexport xdr_partial_copy_from_skb()  (Trond Myklebust)

It is no longer used outside of net/sunrpc/socklib.c.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Allow AF_LOCAL sockets to use the generic stream receive  (Trond Myklebust)

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30  SUNRPC: Clean up - rename xs_tcp_data_receive() to xs_stream_data_receive()  (Trond Myklebust)

In preparation for sharing with AF_LOCAL.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Simplify TCP receive code by switching to using iterators  (Trond Myklebust)

Most of this code should also be reusable with other socket types.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Add a bvec array to struct xdr_buf for use with iovec_iter()  (Trond Myklebust)

Add a bvec array to struct xdr_buf, and have the client allocate it when we need to receive data into pages.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Add a label for RPC calls that require allocation on receive  (Trond Myklebust)

If the RPC call relies on the receive call allocating pages as buffers, then let's label it so that we
 a) Don't leak memory by allocating pages for requests that do not expect this behaviour
 b) Can optimise for the common case where calls do not require allocation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Fix priority queue fairness  (Trond Myklebust)

Fix up the priority queue to not batch by owner, but by queue, so that we allow '1 << priority' elements to be dequeued before switching to the next priority queue. The owner field is still used to wake up requests in round robin order by owner to avoid single processes hogging the RPC layer by loading the queues.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Convert xprt receive queue to use an rbtree  (Trond Myklebust)

If the server is slow, we can find ourselves with quite a lot of entries on the receive queue. Converting the search from O(n) to O(log(n)) can make a significant difference, particularly since we have to hold a number of locks while searching.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

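A sketch of the O(log n) lookup this enables, using the kernel rbtree API (the node field name and comparison details are assumptions):

    /* Sketch: find a pending request by XID in an rbtree rather than by
     * walking a list under the transport lock. */
    static struct rpc_rqst *xprt_lookup_rqst_rb(struct rb_root *root, __be32 xid)
    {
            struct rb_node *n = root->rb_node;
            struct rpc_rqst *req;

            while (n) {
                    req = rb_entry(n, struct rpc_rqst, rq_recv);
                    if ((__force u32)xid < (__force u32)req->rq_xid)
                            n = n->rb_left;
                    else if ((__force u32)xid > (__force u32)req->rq_xid)
                            n = n->rb_right;
                    else
                            return req;
            }
            return NULL;
    }
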
2018-09-30  SUNRPC: Cleanup: remove the unused 'task' argument from the request_send()  (Trond Myklebust)

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-09-30  SUNRPC: Clean up transport write space handling  (Trond Myklebust)

Treat socket write space handling in the same way we now treat transport congestion: by denying the XPRT_LOCK until the transport signals that it has free buffer space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Turn off throttling of RPC slots for TCP sockets  (Trond Myklebust)

The theory was that we would need to grab the socket lock anyway, so we might as well use it to gate the allocation of RPC slots for a TCP socket.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Support for congestion control when queuing is enabled  (Trond Myklebust)

Both RDMA and UDP transports require the request to get a "congestion control" credit before they can be transmitted. Right now, this is done when the request locks the socket. We'd like it to happen when a request attempts to be transmitted for the first time.

In order to support retransmission of requests that already hold such credits, we also want to ensure that they get queued first, so that we don't deadlock with requests that have yet to obtain a credit.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Improve latency for interactive tasks  (Trond Myklebust)

One of the intentions with the priority queues was to ensure that no single process can hog the transport. The field task->tk_owner therefore identifies the RPC call's origin, and is intended to allow the RPC layer to organise queues for fairness. This commit therefore modifies the transmit queue to group requests by task->tk_owner, and ensures that we round robin among those groups.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Treat the task and request as separate in the xprt_ops->send_request()  (Trond Myklebust)

When we shift to using the transmit queue, then the task that holds the write lock will not necessarily be the same as the one being transmitted.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Fix up the back channel transmit  (Trond Myklebust)

Fix up the back channel code to recognise that it has already been transmitted, so it does not need to be called again. Also ensure that we set req->rq_task.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Refactor RPC call encoding  (Trond Myklebust)

Move the call encoding so that it occurs before the transport connection etc.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

2018-09-30  SUNRPC: Add a transmission queue for RPC requests  (Trond Myklebust)

Add the queue that will enforce the ordering of RPC task transmission.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
