summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2022-07-26mmc: core: Add support for SDIO wakeup interruptAxe Yang
If wakeup-source flag is set in host dts node, parse EAI information from SDIO CCCR interrupt externsion segment for in-band wakeup. If async interrupt is supported by SDIO card then enable it and set enable_async_irq flag in sdio_cccr structure to 1. The parse flow is implemented in sdio_read_cccr(). Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Axe Yang <axe.yang@mediatek.com> Link: https://lore.kernel.org/r/20220726062842.18846-3-axe.yang@mediatek.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2022-07-26PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg()Logan Gunthorpe
This interface is superseded by support in dma_map_sg() which now supports heterogeneous scatterlists. There are no longer any users, so remove it. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-26dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA supportLogan Gunthorpe
Add a flags member to the dma_map_ops structure with one flag to indicate support for PCI P2PDMA. Also, add a helper to check if a device supports PCI P2PDMA. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-26PCI/P2PDMA: Introduce helpers for dma_map_sg implementationsLogan Gunthorpe
Add pci_p2pdma_map_segment() as a helper for dma_map_sg() implementations. It takes an scatterlist segment that must point to a pci_p2pdma struct page and will map it if the mapping requires a bus address. The return value indicates whether the mapping required a bus address or whether the caller still needs to map the segment normally. If the segment should not be mapped, -EREMOTEIO is returned. This helper uses a state structure to track the changes to the pgmap across calls and avoid needing to lookup into the xarray for every page. The prototype for the helper is added to dma-map-ops.h as it is only useful to dma map implementations and don't need to pollute the public pci-p2pdma header. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-26lib/scatterlist: add flag for indicating P2PDMA segments in an SGLLogan Gunthorpe
Introduce a dma_flags field in struct scatterlist. These flags will be used by dma_[un]map_sg_p2pdma() to determine when a given SGL segments dma_address points to a PCI bus address. dma_unmap_sg_p2pdma() will need to perform different cleanup when a segment is marked as a bus address. The dma_flags field will fit in the existing padding on 64BIT systems (assuming CONFIG_NEED_SG_DMA_LENGTH is also set). The new bit will only be used when CONFIG_PCI_P2PDMA is set; this means PCI P2PDMA will require CONFIG_64BIT. This should be acceptable as the majority of P2PDMA use cases are restricted to newer root complexes and roughly require the extra address space for memory BARs used in the transactions. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-07-26video: fbdev: imxfb: Fold <linux/platform_data/video-imxfb.h> into only userUwe Kleine-König
No source file but the driver itself includes the header containing the platform data definition. The last user is gone since commit 8485adf17a15 ("ARM: imx: Remove imx device directory"). Move the remaining symbols directly into the driver and remove the then unused header file. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Helge Deller <deller@gmx.de>
2022-07-26video: fbdev: imxfb: Drop unused symbols from headerUwe Kleine-König
The only file that includes <linux/platform_data/video-imxfb.h> is the imxfb driver. Drop all symbols that are unused there. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Helge Deller <deller@gmx.de>
2022-07-26video: fbdev: imxfb: Drop platform data supportUwe Kleine-König
No source file but the driver itself includes the header containing the platform data definition. The last user is gone since commit 8485adf17a15 ("ARM: imx: Remove imx device directory"). So we can safely drop platform data support. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Helge Deller <deller@gmx.de>
2022-07-25Merge tag 'wireless-next-2022-07-25' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v5.20 Third set of patches for v5.20. MLO work continues and we have a lot of stack changes due to that, including driver API changes. Not much driver patches except on mt76. Major changes: cfg80211/mac80211 - more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support, works with one link now - align with IEEE Draft P802.11be_D2.0 - hardware timestamps for receive and transmit mt76 - preparation for new chipset support - ACPI SAR support * tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (254 commits) wifi: mac80211: fix link data leak wifi: mac80211: mlme: fix disassoc with MLO wifi: mac80211: add macros to loop over active links wifi: mac80211: remove erroneous sband/link validation wifi: mac80211: mlme: transmit assoc frame with address translation wifi: mac80211: verify link addresses are different wifi: mac80211: rx: track link in RX data wifi: mac80211: optionally implement MLO multicast TX wifi: mac80211: expand ieee80211_mgmt_tx() for MLO wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API wifi: mac80211: report link ID to cfg80211 on mgmt RX wifi: cfg80211: report link ID in NL80211_CMD_FRAME wifi: mac80211: add hardware timestamps for RX and TX wifi: cfg80211: add hardware timestamps to frame RX info wifi: cfg80211/nl80211: move rx management data into a struct wifi: cfg80211: add a function for reporting TX status with hardware timestamps wifi: nl80211: add RX and TX timestamp attributes wifi: ieee80211: add helper functions for detecting TM/FTM frames wifi: mac80211_hwsim: handle links for wmediumd/virtio wifi: mac80211: sta_info: fix link_sta insertion ... ==================== Link: https://lore.kernel.org/r/20220725174547.EA465C341C6@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-25vfio: Replace phys_pfn with pages for vfio_pin_pages()Nicolin Chen
Most of the callers of vfio_pin_pages() want "struct page *" and the low-level mm code to pin pages returns a list of "struct page *" too. So there's no gain in converting "struct page *" to PFN in between. Replace the output parameter "phys_pfn" list with a "pages" list, to simplify callers. This also allows us to replace the vfio_iommu_type1 implementation with a more efficient one. And drop the pfn_valid check in the gvt code, as there is no need to do such a check at a page-backed struct page pointer. For now, also update vfio_iommu_type1 to fit this new parameter too. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Eric Farman <farman@linux.ibm.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-11-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-25vfio: Rename user_iova of vfio_dma_rw()Nicolin Chen
Following the updated vfio_pin/unpin_pages(), use the simpler "iova". Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-9-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-25vfio: Pass in starting IOVA to vfio_pin/unpin_pages APINicolin Chen
The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA. Among all three callers, there was only one caller possibly passing in a non-contiguous PFN list, which is now ensured to have contiguous PFN inputs too. Pass in the starting address with "iova" alone to simplify things, so callers no longer need to maintain a PFN list or to pin/unpin one page at a time. This also allows VFIO to use more efficient implementations of pin/unpin_pages. For now, also update vfio_iommu_type1 to fit this new parameter too, while keeping its input intact (being user_iova) since we don't want to spend too much effort swapping its parameters and local variables at that level. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com> Acked-by: Eric Farman <farman@linux.ibm.com> Tested-by: Terrence Xu <terrence.xu@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Link: https://lore.kernel.org/r/20220723020256.30081-6-nicolinc@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-07-25block: add bdev_max_segments() helperNaohiro Aota
Add bdev_max_segments() like other queue parameters. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-25highmem: Make __kunmap_{local,atomic}() take const void pointerFabio M. De Francesco
__kunmap_ {local,atomic}() currently take pointers to void. However, this is semantically incorrect, since these functions do not change the memory their arguments point to. Therefore, make this semantics explicit by modifying the __kunmap_{local,atomic}() prototypes to take pointers to const void. As a side effect, compilers may produce more efficient code. Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Helge Deller <deller@gmx.de> # parisc Suggested-by: David Sterba <dsterba@suse.cz> Suggested-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2022-07-25SUNRPC create a function that probes only offline transportsOlga Kornievskaia
For only offline transports, attempt to check connectivity via a NULL call and, if that succeeds, call a provided session trunking detection function. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC export xprt_iter_rewind functionOlga Kornievskaia
Make xprt_iter_rewind callable outside of xprtmultipath.c Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC create an rpc function that allows xprt removal from rpc_clntOlga Kornievskaia
Expose a function that allows a removal of xprt from the rpc_clnt. When called from NFS that's running a trunked transport then don't decrement the active transport counter. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC enable back offline transports in trunking discoveryOlga Kornievskaia
When we are adding a transport to a xprt_switch that's already on the list but has been marked OFFLINE, then make the state ONLINE since it's been tested now. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC create an iterator to list only OFFLINE xprtsOlga Kornievskaia
Create a new iterator helper that will go thru the all the transports in the switch and return transports that are marked OFFLINE. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC add function to offline remove trunkable transportsOlga Kornievskaia
Iterate thru available transports in the xprt_switch for all trunkable transports offline and possibly remote them as well. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25SUNRPC expose functions for offline remote xprt functionalityOlga Kornievskaia
Re-arrange the code that make offline transport and delete transport callable functions. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-25random: handle archrandom with multiple longsJason A. Donenfeld
The archrandom interface was originally designed for x86, which supplies RDRAND/RDSEED for receiving random words into registers, resulting in one function to generate an int and another to generate a long. However, other architectures don't follow this. On arm64, the SMCCC TRNG interface can return between one and three longs. On s390, the CPACF TRNG interface can return arbitrary amounts, with four longs having the same cost as one. On UML, the os_getrandom() interface can return arbitrary amounts. So change the api signature to take a "max_longs" parameter designating the maximum number of longs requested, and then return the number of longs generated. Since callers need to check this return value and loop anyway, each arch implementation does not bother implementing its own loop to try again to fill the maximum number of longs. Additionally, all existing callers pass in a constant max_longs parameter. Taken together, these two things mean that the codegen doesn't really change much for one-word-at-a-time platforms, while performance is greatly improved on platforms such as s390. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-25Merge branch 'for-next/perf' into for-next/coreWill Deacon
* for-next/perf: drivers/perf: arm_spe: Fix consistency of SYS_PMSCR_EL1.CX perf: RISC-V: Add of_node_put() when breaking out of for_each_of_cpu_node() docs: perf: Include hns3-pmu.rst in toctree to fix 'htmldocs' WARNING drivers/perf: hisi: add driver for HNS3 PMU drivers/perf: hisi: Add description for HNS3 PMU driver drivers/perf: riscv_pmu_sbi: perf format perf/arm-cci: Use the bitmap API to allocate bitmaps drivers/perf: riscv_pmu: Add riscv pmu pm notifier perf: hisi: Extract hisi_pmu_init perf/marvell_cn10k: Fix TAD PMU register offset perf/marvell_cn10k: Remove useless license text when SPDX-License-Identifier is already used arm64: cpufeature: Allow different PMU versions in ID_DFR0_EL1 perf/arm-cci: fix typo in comment drivers/perf:Directly use ida_alloc()/free() drivers/perf: Directly use ida_alloc()/free()
2022-07-25Merge branch 'for-next/mte' into for-next/coreWill Deacon
* for-next/mte: arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags" mm: kasan: Skip page unpoisoning only if __GFP_SKIP_KASAN_UNPOISON mm: kasan: Skip unpoisoning of user pages mm: kasan: Ensure the tags are visible before the tag in page->flags
2022-07-25Merge branch irq/misc-5.20 into irq/irqchip-nextMarc Zyngier
* irq/misc-5.20: : . : Misc IRQ changes for 5.20: : : - Let irq_set_chip_handler_name_locked() take a const struct irq_chip * : : - Convert the ocelot irq_chip to being immutable (depends on the above) : : - Tidy-up the NOMAP irqdomain API variant : : - Teach action_show() to use for_each_action_of_desc() : : - Check ioremap() return value in the MIPS GIC driver : : - Move MMP driver init function declarations into the common .h : : - The obligatory typo fixes : . irqchip/mmp: Declare init functions in common header file irqchip/mips-gic: Check the return value of ioremap() in gic_of_init() genirq: Use for_each_action_of_desc in actions_show() irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains irqdomain: Report irq number for NOMAP domains irqchip/gic-v3: Fix comment typo pinctrl: ocelot: Make irq_chip immutable genirq: Allow irq_set_chip_handler_name_locked() to take a const irq_chip Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-07-25irqchip/mmp: Declare init functions in common header fileBen Dooks
The functions icu_init_irq and mmp2_init_icu are exported from this code, so declare them in the header file to avoid the following sparse warnings: drivers/irqchip/irq-mmp.c:248:13: warning: symbol 'icu_init_irq' was not declared. Should it be static? drivers/irqchip/irq-mmp.c:271:13: warning: symbol 'mmp2_init_icu' was not declared. Should it be static? Signed-off-by: Ben Dooks <ben-linux@fluff.org> [maz: fixup commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220724222152.551850-1-ben-linux@fluff.org
2022-07-24io_uring: cache struct io_notifPavel Begunkov
kmalloc'ing struct io_notif is too expensive when done frequently, cache them as many other resources in io_uring. Keep two list, the first one is from where we're getting notifiers, it's protected by ->uring_lock. The second is protected by ->completion_lock, to which we queue released notifiers. Then we splice one list into another when needed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dec18f7fcbab9f4bd40b96e5ae158b119945230.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: add zc notification infrastructurePavel Begunkov
Add internal part of send zerocopy notifications. There are two main structures, the first one is struct io_notif, which carries inside struct ubuf_info and maps 1:1 to it. io_uring will be binding a number of zerocopy send requests to it and ask to complete (aka flush) it. When flushed and all attached requests and skbs complete, it'll generate one and only one CQE. There are intended to be passed into the network layer as struct msghdr::msg_ubuf. The second concept is notification slots. The userspace will be able to register an array of slots and subsequently addressing them by the index in the array. Slots are independent of each other. Each slot can have only one notifier at a time (called active notifier) but many notifiers during the lifetime. When active, a notifier not going to post any completion but the userspace can attach requests to it by specifying the corresponding slot while issueing send zc requests. Eventually, the userspace will want to "flush" the notifier losing any way to attach new requests to it, however it can use the next atomatically added notifier of this slot or of any other slot. When the network layer is done with all enqueued skbs attached to a notifier and doesn't need the specified in them user data, the flushed notifier will post a CQE. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3ecf54c31a85762bf679b0a432c9f43ecf7e61cc.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: export io_put_task()Pavel Begunkov
Make io_put_task() available to non-core parts of io_uring, we'll need it for notification infrastructure. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3686807d4c03b72e389947b0e8692d4d44334ef0.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24Merge branch 'for-5.20/io_uring' into for-5.20/io_uring-zerocopy-sendJens Axboe
* for-5.20/io_uring: (716 commits) io_uring: ensure REQ_F_ISREG is set async offload net: fix compat pointer in get_compat_msghdr() io_uring: Don't require reinitable percpu_ref io_uring: fix types in io_recvmsg_multishot_overflow io_uring: Use atomic_long_try_cmpxchg in __io_account_mem io_uring: support multishot in recvmsg net: copy from user before calling __get_compat_msghdr net: copy from user before calling __copy_msghdr io_uring: support 0 length iov in buffer select in compat io_uring: fix multishot ending when not polled io_uring: add netmsg cache io_uring: impose max limit on apoll cache io_uring: add abstraction around apoll cache io_uring: move apoll cache to poll.c io_uring: consolidate hash_locked io-wq handling io_uring: clear REQ_F_HASH_LOCKED on hash removal io_uring: don't race double poll setting REQ_F_ASYNC_DATA io_uring: don't miss setting REQ_F_DOUBLE_POLL io_uring: disable multishot recvmsg io_uring: only trace one of complete or overflow ... Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24fs: Add async write file modification handling.Stefan Roesch
This adds a file_modified_async() function to return -EAGAIN if the request either requires to remove privileges or needs to update the file modification time. This is required for async buffered writes, so the request gets handled in the io worker of io-uring. Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20220623175157.1715274-11-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24fs: add a FMODE_BUF_WASYNC flags for f_modeStefan Roesch
This introduces the flag FMODE_BUF_WASYNC. If devices support async buffered writes, this flag can be set. It also modifies the check in generic_write_checks to take async buffered writes into consideration. Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20220623175157.1715274-8-shr@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24mm: Add balance_dirty_pages_ratelimited_flags() functionJan Kara
This adds the helper function balance_dirty_pages_ratelimited_flags(). It adds the parameter flags to balance_dirty_pages_ratelimited(). The flags parameter is passed to balance_dirty_pages(). For async buffered writes the flag value will be BDP_ASYNC. If balance_dirty_pages() gets called for async buffered write, we don't want to wait. Instead we need to indicate to the caller that throttling is needed so that it can stop writing and offload the rest of the write to a context that can block. The new helper function is also used by balance_dirty_pages_ratelimited(). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Stefan Roesch <shr@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220623175157.1715274-4-shr@fb.com [axboe: fix kerneltest bot 'ret' issue] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24net: copy from user before calling __copy_msghdrDylan Yudaken
this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: add netmsg cacheJens Axboe
For recvmsg/sendmsg, if they don't complete inline, we currently need to allocate a struct io_async_msghdr for each request. This is a somewhat large struct. Hook up sendmsg/recvmsg to use the io_alloc_cache. This reduces the alloc + free overhead considerably, yielding 4-5% of extra performance running netbench. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: impose max limit on apoll cacheJens Axboe
Caches like this tend to grow to the peak size, and then never get any smaller. Impose a max limit on the size, to prevent it from growing too big. A somewhat randomly chosen 512 is the max size we'll allow the cache to get. If a batch of frees come in and would bring it over that, we simply start kfree'ing the surplus. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: add abstraction around apoll cacheJens Axboe
In preparation for adding limits, and one more user, abstract out the core bits of the allocation+free cache. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: let to set a range for file slot allocationPavel Begunkov
From recently io_uring provides an option to allocate a file index for operation registering fixed files. However, it's utterly unusable with mixed approaches when for a part of files the userspace knows better where to place it, as it may race and users don't have any sane way to pick a slot and hoping it will not be taken. Let the userspace to register a range of fixed file slots in which the auto-allocation happens. The use case is splittting the fixed table in two parts, where on of them is used for auto-allocation and another for slot-specified operations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66ab0394e436f38437cf7c44676e1920d09687ad.1656154403.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: fuse fallback_node and normal tw nodePavel Begunkov
Now as both normal and fallback paths use llist, just keep one node head in struct io_task_work and kill off ->fallback_node. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d04ebde409f7b162fe247b361b4486b193293e46.1656153285.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: lockless task listDylan Yudaken
With networking use cases we see contention on the spinlock used to protect the task_list when multiple threads try and add completions at once. Instead we can use a lockless list, and assume that the first caller to add to the list is responsible for kicking off task work. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220622134028.2013417-4-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: hide eventfd assumptions in eventfd pathsPavel Begunkov
Some io_uring-eventfd users assume that there won't be spurious wakeups. That assumption has to be honoured by all io_cqring_ev_posted() callers, which is inconvenient and from time to time leads to problems but should be maintained to not break the userspace. Instead of making the callers track whether a CQE was posted or not, hide it inside io_eventfd_signal(). It saves ->cached_cq_tail it saw last time and triggers the eventfd only when ->cached_cq_tail changed since then. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0ffc66bae37a2513080b601e4370e147faaa72c5.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: remove ->flush_cqes optimisationPavel Begunkov
It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often ->flush_cqes flag prevents from completion being flushed. Sometimes it's high level of concurrency that enables it at least for one CQE, but sometimes it doesn't save much because nobody waiting on the CQ. Remove ->flush_cqes flag and the optimisation, it should benefit the normal use case. Note, that there is no spurious eventfd problem with that as checks for spuriousness were incorporated into io_eventfd_signal(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com [axboe: remove now dead state->flush_cqes variable] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: move a few private types to local headersJens Axboe
Commit 3a3d47fa9cfd ("io_uring: make io_uring_types.h public") moved a bunch of io_uring types to a kernel wide header, so we could make tracing a bit saner rather than pass in a ton of arguments. However, there are a few types in there that are not really needed to be system wide. Move the cancel data and mapped buffers back to the appropriate io_uring local headers. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-24io_uring: make io_uring_types.h publicPavel Begunkov
Move io_uring types to linux/include, need them public so tracing can see the definitions and we can clean trace/events/io_uring.h Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a15f12e8cb7289b2de0deaddcc7518d98a132d17.1655384063.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-07-23SUNRPC: Remove xdr_align_data() and xdr_expand_hole()Anna Schumaker
These functions are no longer needed now that the NFS client places data and hole segments directly. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23SUNRPC: Add a function for zeroing out a portion of an xdr_streamAnna Schumaker
This will be used during READ_PLUS decoding for handling HOLE segments. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23SUNRPC: Add a function for directly setting the xdr page lenAnna Schumaker
We need to do this step during READ_PLUS decoding so that we know pages are the right length and any extra data has been preserved in the tail. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23SUNRPC: Introduce xdr_stream_move_subsegment()Anna Schumaker
I do this by creating an xdr subsegment for the range we will be operating over. This lets me shift data to the correct place without potentially overwriting anything already there. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23nfs: only issue commit in DIO codepath if we have uncommitted dataJeff Layton
Currently, we try to determine whether to issue a commit based on nfs_write_need_commit which looks at the current verifier. In the case where we got a short write and then tried to follow it up with one that failed, the verifier can't be trusted. What we really want to know is whether the pgio request had any successful writes that came back as UNSTABLE. Add a new flag to the pgio request, and use that to indicate that we've had a successful unstable write. Only issue a commit if that flag is set. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-23SUNRPC: Shrink size of struct rpc_taskTrond Myklebust
Move the field 'tk_rpc_status' so that we eliminate a 4 byte hole in the structure. For x86_64, this shrinks the size of the struct by 8 bytes. 'pahole' output before the change: /* size: 232, cachelines: 4, members: 27 */ /* sum members: 222, holes: 1, sum holes: 4 */ /* sum bitfield members: 8 bits (1 bytes) */ /* padding: 5 */ /* last cacheline: 40 bytes */ 'pahole' output after the change: /* size: 224, cachelines: 4, members: 27 */ /* padding: 1 */ /* last cacheline: 32 bytes */ Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>