summaryrefslogtreecommitdiff
path: root/drivers/virtio
AgeCommit message (Collapse)Author
2022-04-05Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Fixes and cleanups: - A couple of mlx5 fixes related to cvq - A couple of reverts dropping useless code (code that used it got reverted earlier)" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa: mlx5: synchronize driver status with CVQ vdpa: mlx5: prevent cvq work from hogging CPU Revert "virtio_config: introduce a new .enable_cbs method" Revert "virtio: use virtio_device_ready() in virtio_device_restore()"
2022-03-31Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - vdpa generic device type support - more virtio hardening for broken devices (but on the same theme, revert some virtio hotplug hardening patches - they were misusing some interrupt flags and had to be reverted) - RSS support in virtio-net - max device MTU support in mlx5 vdpa - akcipher support in virtio-crypto - shared IRQ support in ifcvf vdpa - a minor performance improvement in vhost - enable virtio mem for ARM64 - beginnings of advance dma support - cleanups, fixes all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (33 commits) vdpa/mlx5: Avoid processing works if workqueue was destroyed vhost: handle error while adding split ranges to iotlb vdpa: support exposing the count of vqs to userspace vdpa: change the type of nvqs to u32 vdpa: support exposing the config size to userspace vdpa/mlx5: re-create forwarding rules after mac modified virtio: pci: check bar values read from virtio config space Revert "virtio_pci: harden MSI-X interrupts" Revert "virtio-pci: harden INTX interrupts" drivers/net/virtio_net: Added RSS hash report control. drivers/net/virtio_net: Added RSS hash report. drivers/net/virtio_net: Added basic RSS support. drivers/net/virtio_net: Fixed padded vheader to use v1 with hash. virtio: use virtio_device_ready() in virtio_device_restore() tools/virtio: compile with -pthread tools/virtio: fix after premapped buf support virtio_ring: remove flags check for unmap packed indirect desc virtio_ring: remove flags check for unmap split indirect desc virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed() net/mlx5: Add support for configuring max device MTU ...
2022-03-30Revert "virtio: use virtio_device_ready() in virtio_device_restore()"Michael S. Tsirkin
This reverts commit 8d65bc9a5be3f23c5e2ab36b6b8ef40095165b18. We reverted the problematic changes, no more need for work arounds on restore. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-28virtio: pci: check bar values read from virtio config spaceKeir Fraser
virtio pci config structures may in future have non-standard bar values in the bar field. We should anticipate this by skipping any structures containing such a reserved value. The bar value should never change: check for harmful modified values we re-read it from the config space in vp_modern_map_capability(). Also clean up an existing check to consistently use PCI_STD_NUM_BARS. Signed-off-by: Keir Fraser <keirf@google.com> Link: https://lore.kernel.org/r/20220323140727.3499235-1-keirf@google.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28Revert "virtio_pci: harden MSI-X interrupts"Jason Wang
This reverts commit 9e35276a5344f74d4a3600fc4100b3dd251d5c56. Issue were reported for the drivers that are using affinity managed IRQ where manually toggling IRQ status is not expected. And we forget to enable the interrupts in the restore path as well. In the future, we will rework on the interrupt hardening. Fixes: 9e35276a5344 ("virtio_pci: harden MSI-X interrupts") Reported-by: Marc Zyngier <maz@kernel.org> Reported-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220323031524.6555-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28Revert "virtio-pci: harden INTX interrupts"Jason Wang
This reverts commit 080cd7c3ac8701081d143a15ba17dd9475313188. Since the MSI-X interrupts hardening will be reverted in the next patch. We will rework the interrupt hardening in the future. Fixes: 080cd7c3ac87 ("virtio-pci: harden INTX interrupts") Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220323031524.6555-1-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28virtio: use virtio_device_ready() in virtio_device_restore()Stefano Garzarella
After waking up a suspended VM, the kernel prints the following trace for virtio drivers which do not directly call virtio_device_ready() in the .restore: PM: suspend exit irq 22: nobody cared (try booting with the "irqpoll" option) Call Trace: <IRQ> dump_stack_lvl+0x38/0x49 dump_stack+0x10/0x12 __report_bad_irq+0x3a/0xaf note_interrupt.cold+0xb/0x60 handle_irq_event+0x71/0x80 handle_fasteoi_irq+0x95/0x1e0 __common_interrupt+0x6b/0x110 common_interrupt+0x63/0xe0 asm_common_interrupt+0x1e/0x40 ? __do_softirq+0x75/0x2f3 irq_exit_rcu+0x93/0xe0 sysvec_apic_timer_interrupt+0xac/0xd0 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x12/0x20 arch_cpu_idle+0x12/0x20 default_idle_call+0x39/0xf0 do_idle+0x1b5/0x210 cpu_startup_entry+0x20/0x30 start_secondary+0xf3/0x100 secondary_startup_64_no_verify+0xc3/0xcb </TASK> handlers: [<000000008f9bac49>] vp_interrupt [<000000008f9bac49>] vp_interrupt Disabling IRQ #22 This happens because we don't invoke .enable_cbs callback in virtio_device_restore(). That callback is used by some transports (e.g. virtio-pci) to enable interrupts. Let's fix it, by calling virtio_device_ready() as we do in virtio_dev_probe(). This function calls .enable_cts callback and sets DRIVER_OK status bit. This fix also avoids setting DRIVER_OK twice for those drivers that call virtio_device_ready() in the .restore. Fixes: d50497eb4e55 ("virtio_config: introduce a new .enable_cbs method") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20220322114313.116516-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28virtio_ring: remove flags check for unmap packed indirect descXuan Zhuo
When calling vring_unmap_desc_packed(), it will not encounter the situation that the flags contains VRING_DESC_F_INDIRECT. So remove this logic. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220224110402.108161-4-xuanzhuo@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28virtio_ring: remove flags check for unmap split indirect descXuan Zhuo
When calling vring_unmap_one_split_indirect(), it will not encounter the situation that the flags contains VRING_DESC_F_INDIRECT. So remove this logic. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220224110402.108161-3-xuanzhuo@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed()Xuan Zhuo
The actual parameter handled by vring_unmap_state_packed() is that vring_desc_extra, so this function should use "extra" instead of "state". Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20220224110402.108161-2-xuanzhuo@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-28drivers/virtio: Enable virtio mem for ARM64Gavin Shan
This enables virtio-mem device support by allowing to enable the corresponding kernel config option (CONFIG_VIRTIO_MEM) on the architecture. Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220119010551.181405-1-gshan@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-22mm: enforce pageblock_order < MAX_ORDERDavid Hildenbrand
Some places in the kernel don't really expect pageblock_order >= MAX_ORDER, and it looks like this is only possible in corner cases: 1) CONFIG_DEFERRED_STRUCT_PAGE_INIT we'll end up freeing pageblock_order pages via __free_pages_core(), which cannot possibly work. 2) find_zone_movable_pfns_for_nodes() will roundup the ZONE_MOVABLE start PFN to MAX_ORDER_NR_PAGES. Consequently with a bigger pageblock_order, we could have a single pageblock partially managed by two zones. 3) compaction code runs into __fragmentation_index() with order >= MAX_ORDER, when checking WARN_ON_ONCE(order >= MAX_ORDER). [1] 4) mm/page_reporting.c won't be reporting any pages with default page_reporting_order == pageblock_order, as we'll be skipping the reporting loop inside page_reporting_process_zone(). 5) __rmqueue_fallback() will never be able to steal with ALLOC_NOFRAGMENT. pageblock_order >= MAX_ORDER is weird either way: it's a pure optimization for making alloc_contig_range(), as used for allcoation of gigantic pages, a little more reliable to succeed. However, if there is demand for somewhat reliable allocation of gigantic pages, affected setups should be using CMA or boottime allocations instead. So let's make sure that pageblock_order < MAX_ORDER and simplify. [1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20220214174132.219303-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Frank Rowand <frowand.list@gmail.com> Cc: John Garry via iommu <iommu@lists.linux-foundation.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Rob Herring <robh+dt@kernel.org> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-06virtio: drop default for virtio-memMichael S. Tsirkin
There's no special reason why virtio-mem needs a default that's different from what kconfig provides, any more than e.g. virtio blk. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David Hildenbrand <david@redhat.com>
2022-03-04vdpa: factor out vdpa_set_features_unlocked for vdpa internal useSi-Wei Liu
No functional change introduced. vdpa bus driver such as virtio_vdpa or vhost_vdpa is not supposed to take care of the locking for core by its own. The locked API vdpa_set_features should suffice the bus driver's need. Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/1642206481-30721-2-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-03-04virtio: document virtio_reset_deviceMichael S. Tsirkin
Looks like most callers get driver/device removal wrong. Document what's expected of callers. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04virtio: acknowledge all features before accessMichael S. Tsirkin
The feature negotiation was designed in a way that makes it possible for devices to know which config fields will be accessed by drivers. This is broken since commit 404123c2db79 ("virtio: allow drivers to validate features") with fallout in at least block and net. We have a partial work-around in commit 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") which at least lets devices find out which format should config space have, but this is a partial fix: guests should not access config space without acknowledging features since otherwise we'll never be able to change the config space format. To fix, split finalize_features from virtio_finalize_features and call finalize_features with all feature bits before validation, and then - if validation changed any bits - once again after. Since virtio_finalize_features no longer writes out features rename it to virtio_features_ok - since that is what it does: checks that features are ok with the device. As a side effect, this also reduces the amount of hypervisor accesses - we now only acknowledge features once unless we are clearing any features when validating (which is uncommon). IRC I think that this was more or less always the intent in the spec but unfortunately the way the spec is worded does not say this explicitly, I plan to address this at the spec level, too. Acked-by: Jason Wang <jasowang@redhat.com> Cc: stable@vger.kernel.org Fixes: 404123c2db79 ("virtio: allow drivers to validate features") Fixes: 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") Cc: "Halil Pasic" <pasic@linux.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-04virtio: unexport virtio_finalize_featuresMichael S. Tsirkin
virtio_finalize_features is only used internally within virtio. No reason to export it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14vdpa: Allow to configure max data virtqueuesEli Cohen
Add netlink support to configure the max virtqueue pairs for a device. At least one pair is required. The maximum is dictated by the device. Example: $ vdpa dev add name vdpa-a mgmtdev auxiliary/mlx5_core.sf.1 max_vqp 4 Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-6-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: Sync calls set/get config/status with cf_mutexEli Cohen
Add wrappers to get/set status and protect these operations with cf_mutex to serialize these operations with respect to get/set config operations. Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-4-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14vdpa: Provide interface to read driver featuresEli Cohen
Provide an interface to read the negotiated features. This is needed when building the netlink message in vdpa_dev_net_config_fill(). Also fix the implementation of vdpa_dev_net_config_fill() to use the negotiated features instead of the device features. To make APIs clearer, make the following name changes to struct vdpa_config_ops so they better describe their operations: get_features -> get_device_features set_features -> set_driver_features Finally, add get_driver_features to return the negotiated features and add implementation to all the upstream drivers. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-2-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14virtio_ring: mark ring unused on errorMichael S. Tsirkin
A recently added error path does not mark ring unused when exiting on OOM, which will lead to BUG on the next entry in debug builds. TODO: refactor code so we have START_USE and END_USE in the same function. Fixes: fc6d70f40b3d ("virtio_ring: check desc == NULL when using indirect with packed") Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com> Cc: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14virtio/virtio_pci_legacy_dev: ensure the correct return valuePeng Hao
When pci_iomap return NULL, the return value is zero. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20211222112014.87394-1-flyingpeng@tencent.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2022-01-14virtio/virtio_mem: handle a possible NULL as a memcpy parameterPeng Hao
There is a check for vm->sbm.sb_states before, and it should check it here as well. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Link: https://lore.kernel.org/r/20211222011225.40573-1-flyingpeng@tencent.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: 5f1f79bbc9e2 ("virtio-mem: Paravirtualized memory hotplug") Cc: stable@vger.kernel.org # v5.8+
2022-01-14virtio: fix a typo in function "vp_modern_remove" comments.Dapeng Mi
Function name "vp_modern_remove" in comments is written to "vp_modern_probe" incorrectly. Change it. Signed-off-by: Dapeng Mi <dapeng1.mi@intel.com> Link: https://lore.kernel.org/r/20211210073546.700783-1-dapeng1.mi@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2022-01-14virtio-pci: fix the confusing error message王贇
The error message on the failure of pfn check should tell virtio-pci rather than virtio-mmio, just fix it. Signed-off-by: Michael Wang <yun.wang@linux.alibaba.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/ae5e154e-ac59-f0fa-a7c7-091a2201f581@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-14virtio-mem: prepare fake page onlining code for granularity smaller than ↵David Hildenbrand
MAX_ORDER - 1 Let's prepare our fake page onlining code for subblock size smaller than MAX_ORDER - 1: we might get called for ranges not covering properly aligned MAX_ORDER - 1 pages. We have to detect the order to use dynamically. Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211126134209.17332-3-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Eric Ren <renzhengeek@gmail.com>
2022-01-14virtio-mem: prepare page onlining code for granularity smaller than ↵David Hildenbrand
MAX_ORDER - 1 Let's prepare our page onlining code for subblock size smaller than MAX_ORDER - 1: we'll get called for a MAX_ORDER - 1 page but might have some subblocks in the range plugged and some unplugged. In that case, fallback to subblock granularity to properly only expose the plugged parts to the buddy. Signed-off-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20211126134209.17332-2-david@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Eric Ren <renzhengeek@gmail.com>
2022-01-14virtio: wrap config->reset callsMichael S. Tsirkin
This will enable cleanups down the road. The idea is to disable cbs, then add "flush_queued_cbs" callback as a parameter, this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-12-08virtio_ring: Fix querying of maximum DMA mapping size for virtio deviceWill Deacon
virtio_max_dma_size() returns the maximum DMA mapping size of the virtio device by querying dma_max_mapping_size() for the device when the DMA API is in use for the vring. Unfortunately, the device passed is initialised by register_virtio_device() and does not inherit the DMA configuration from its parent, resulting in SWIOTLB errors when bouncing is enabled and the default 256K mapping limit (IO_TLB_SEGSIZE) is not respected: | virtio-pci 0000:00:01.0: swiotlb buffer is full (sz: 294912 bytes), total 1024 (slots), used 725 (slots) Follow the pattern used elsewhere in the virtio_ring code when calling into the DMA layer and pass the parent device to dma_max_mapping_size() instead. Cc: Marc Zyngier <maz@kernel.org> Cc: Quentin Perret <qperret@google.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20211201112018.25276-1-will@kernel.org Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Fixes: e6d6dd6c875e ("virtio: Introduce virtio_max_dma_size()") Cc: Joerg Roedel <jroedel@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-24Revert "virtio_ring: validate used buffer length"Michael S. Tsirkin
This reverts commit 939779f5152d161b34f612af29e7dc1ac4472fcf. Attempts to validate length in the core did not work out: there turn out to exist multiple broken devices, and in particular legacy devices are known to be broken in this respect. We have ideas for handling this better in the next version but for now let's revert to a known good state to make sure drivers work for people. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-10virtio-mem: support VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLEDavid Hildenbrand
The initial virtio-mem spec states that while unplugged memory should not be read, the device still has to allow for reading unplugged memory inside the usable region. The primary motivation for this default handling was to simplify bringup of virtio-mem, because there were corner cases where Linux might have accidentially read unplugged memory inside added Linux memory blocks. In the meantime, we: 1. Removed /dev/kmem in commit bbcd53c96071 ("drivers/char: remove /dev/kmem for good") 2. Disallowed access to virtio-mem device memory via /dev/mem in commit 2128f4e21aa2 ("virtio-mem: disallow mapping virtio-mem memory via /dev/mem") 3. Sanitized access to virtio-mem device memory via /proc/kcore in commit 0daa322b8ff9 ("fs/proc/kcore: don't read offline sections, logically offline pages and hwpoisoned pages") 4. Sanitized access to virtio-mem device memory via /proc/vmcore in commit ce2814622e84 ("virtio-mem: kdump mode to sanitize /proc/vmcore access") "Accidential" access to unplugged memory is no longer possible; we can support the new VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE feature that will be required by some hypervisors implementing virtio-mem in the near future. Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Marek Kedzierski <mkedzier@redhat.com> Cc: Hui Zhu <teawater@gmail.com> Cc: Sebastien Boeuf <sebastien.boeuf@intel.com> Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2021-11-09Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "87 patches. Subsystems affected by this patch series: mm (pagecache and hugetlb), procfs, misc, MAINTAINERS, lib, checkpatch, binfmt, kallsyms, ramfs, init, codafs, nilfs2, hfs, crash_dump, signals, seq_file, fork, sysvfs, kcov, gdb, resource, selftests, and ipc" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (87 commits) ipc/ipc_sysctl.c: remove fallback for !CONFIG_PROC_SYSCTL ipc: check checkpoint_restore_ns_capable() to modify C/R proc files selftests/kselftest/runner/run_one(): allow running non-executable files virtio-mem: disallow mapping virtio-mem memory via /dev/mem kernel/resource: disallow access to exclusive system RAM regions kernel/resource: clean up and optimize iomem_is_exclusive() scripts/gdb: handle split debug for vmlinux kcov: replace local_irq_save() with a local_lock_t kcov: avoid enable+disable interrupts if !in_task() kcov: allocate per-CPU memory on the relevant node Documentation/kcov: define `ip' in the example Documentation/kcov: include types.h in the example sysv: use BUILD_BUG_ON instead of runtime check kernel/fork.c: unshare(): use swap() to make code cleaner seq_file: fix passing wrong private data seq_file: move seq_escape() to a header signal: remove duplicate include in signal.h crash_dump: remove duplicate include in crash_dump.h crash_dump: fix boolreturn.cocci warning hfs/hfsplus: use WARN_ON for sanity check ...
2021-11-09virtio-mem: disallow mapping virtio-mem memory via /dev/memDavid Hildenbrand
We don't want user space to be able to map virtio-mem device memory directly (e.g., via /dev/mem) in order to have guarantees that in a sane setup we'll never accidentially access unplugged memory within the device-managed region of a virtio-mem device, just as required by the virtio-spec. As soon as the virtio-mem driver is loaded, the device region is visible in /proc/iomem via the parent device region. From that point on user space is aware of the device region and we want to disallow mapping anything inside that region (where we will dynamically (un)plug memory) until the driver has been unloaded cleanly and e.g., another driver might take over. By creating our parent IORESOURCE_SYSTEM_RAM resource with IORESOURCE_EXCLUSIVE, we will disallow any /dev/mem access to our device region until the driver was unloaded cleanly and removed the parent region. This will work even though only some memory blocks are actually currently added to Linux and appear as busy in the resource tree. So access to the region from user space is only possible a) if we don't load the virtio-mem driver. b) after unloading the virtio-mem driver cleanly. Don't build virtio-mem if access to /dev/mem cannot be restricticted -- if we have CONFIG_DEVMEM=y but CONFIG_STRICT_DEVMEM is not set. Link: https://lkml.kernel.org/r/20210920142856.17758-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09virtio-mem: kdump mode to sanitize /proc/vmcore accessDavid Hildenbrand
Although virtio-mem currently supports reading unplugged memory in the hypervisor, this will change in the future, indicated to the device via a new feature flag. We similarly sanitized /proc/kcore access recently. [1] Let's register a vmcore callback, to allow vmcore code to check if a PFN belonging to a virtio-mem device is either currently plugged and should be dumped or is currently unplugged and should not be accessed, instead mapping the shared zeropage or returning zeroes when reading. This is important when not capturing /proc/vmcore via tools like "makedumpfile" that can identify logically unplugged virtio-mem memory via PG_offline in the memmap, but simply by e.g., copying the file. Distributions that support virtio-mem+kdump have to make sure that the virtio_mem module will be part of the kdump kernel or the kdump initrd; dracut was recently [2] extended to include virtio-mem in the generated initrd. As long as no special kdump kernels are used, this will automatically make sure that virtio-mem will be around in the kdump initrd and sanitize /proc/vmcore access -- with dracut. With this series, we'll send one virtio-mem state request for every ~2 MiB chunk of virtio-mem memory indicated in the vmcore that we intend to read/map. In the future, we might want to allow building virtio-mem for kdump mode only, even without CONFIG_MEMORY_HOTPLUG and friends: this way, we could support special stripped-down kdump kernels that have many other config options disabled; we'll tackle that once required. Further, we might want to try sensing bigger blocks (e.g., memory sections) first before falling back to device blocks on demand. Tested with Fedora rawhide, which contains a recent kexec-tools version (considering "System RAM (virtio_mem)" when creating the vmcore header) and a recent dracut version (including the virtio_mem module in the kdump initrd). Link: https://lkml.kernel.org/r/20210526093041.8800-1-david@redhat.com [1] Link: https://github.com/dracutdevs/dracut/pull/1157 [2] Link: https://lkml.kernel.org/r/20211005121430.30136-10-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09virtio-mem: factor out hotplug specifics from virtio_mem_remove() into ↵David Hildenbrand
virtio_mem_deinit_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09virtio-mem: factor out hotplug specifics from virtio_mem_probe() into ↵David Hildenbrand
virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-8-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-09virtio-mem: factor out hotplug specifics from virtio_mem_init() into ↵David Hildenbrand
virtio_mem_init_hotplug() Let's prepare for a new virtio-mem kdump mode in which we don't actually hot(un)plug any memory but only observe the state of device blocks. Link: https://lkml.kernel.org/r/20211005121430.30136-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Young <dyoung@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-06Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "257 patches. Subsystems affected by this patch series: scripts, ocfs2, vfs, and mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache, gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc, pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools, memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm, vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram, cleanups, kfence, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits) mm/damon: remove return value from before_terminate callback mm/damon: fix a few spelling mistakes in comments and a pr_debug message mm/damon: simplify stop mechanism Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions Docs/admin-guide/mm/damon/start: simplify the content Docs/admin-guide/mm/damon/start: fix a wrong link Docs/admin-guide/mm/damon/start: fix wrong example commands mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on mm/damon: remove unnecessary variable initialization Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM) selftests/damon: support watermarks mm/damon/dbgfs: support watermarks mm/damon/schemes: activate schemes based on a watermarks mechanism tools/selftests/damon: update for regions prioritization of schemes mm/damon/dbgfs: support prioritization weights mm/damon/vaddr,paddr: support pageout prioritization mm/damon/schemes: prioritize regions within the quotas mm/damon/selftests: support schemes quotas mm/damon/dbgfs: support quotas of schemes ...
2021-11-06mm/memory_hotplug: remove CONFIG_MEMORY_HOTPLUG_SPARSEDavid Hildenbrand
CONFIG_MEMORY_HOTPLUG depends on CONFIG_SPARSEMEM, so there is no need for CONFIG_MEMORY_HOTPLUG_SPARSE anymore; adjust all instances to use CONFIG_MEMORY_HOTPLUG and remove CONFIG_MEMORY_HOTPLUG_SPARSE. Link: https://lkml.kernel.org/r/20210929143600.49379-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Shuah Khan <skhan@linuxfoundation.org> [kselftest] Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Alex Shi <alexs@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-04Merge tag 'char-misc-5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big set of char and misc and other tiny driver subsystem updates for 5.16-rc1. Loads of things in here, all of which have been in linux-next for a while with no reported problems (except for one called out below.) Included are: - habanana labs driver updates, including dma_buf usage, reviewed and acked by the dma_buf maintainers - iio driver update (going through this tree not staging as they really do not belong going through that tree anymore) - counter driver updates - hwmon driver updates that the counter drivers needed, acked by the hwmon maintainer - xillybus driver updates - binder driver updates - extcon driver updates - dma_buf module namespaces added (will cause a build error in arm64 for allmodconfig, but that change is on its way through the drm tree) - lkdtm driver updates - pvpanic driver updates - phy driver updates - virt acrn and nitr_enclaves driver updates - smaller char and misc driver updates" * tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (386 commits) comedi: dt9812: fix DMA buffers on stack comedi: ni_usb6501: fix NULL-deref in command paths arm64: errata: Enable TRBE workaround for write to out-of-range address arm64: errata: Enable workaround for TRBE overwrite in FILL mode coresight: trbe: Work around write to out of range coresight: trbe: Make sure we have enough space coresight: trbe: Add a helper to determine the minimum buffer size coresight: trbe: Workaround TRBE errata overwrite in FILL mode coresight: trbe: Add infrastructure for Errata handling coresight: trbe: Allow driver to choose a different alignment coresight: trbe: Decouple buffer base from the hardware base coresight: trbe: Add a helper to pad a given buffer area coresight: trbe: Add a helper to calculate the trace generated coresight: trbe: Defer the probe on offline CPUs coresight: trbe: Fix incorrect access of the sink specific data coresight: etm4x: Add ETM PID for Kryo-5XX coresight: trbe: Prohibit trace before disabling TRBE coresight: trbe: End the AUX handle on truncation coresight: trbe: Do not truncate buffer on IRQ coresight: trbe: Fix handling of spurious interrupts ...
2021-11-01vdpa: Introduce and use vdpa device get, set config helpersParav Pandit
Subsequent patches enable get and set configuration either via management device or via vdpa device' config ops. This requires synchronization between multiple callers to get and set config callbacks. Features setting also influence the layout of the configuration fields endianness. To avoid exposing synchronization primitives to callers, introduce helper for setting the configuration and use it. Signed-off-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Eli Cohen <elic@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211026175519.87795-2-parav@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio_ring: validate used buffer lengthJason Wang
This patch validate the used buffer length provided by the device before trying to use it. This is done by record the in buffer length in a new field in desc_state structure during virtqueue_add(), then we can fail the virtqueue_get_buf() when we find the device is trying to give us a used buffer length which is greater than the in buffer length. Since some drivers have already done the validation by themselves, this patch tries to makes the core validation optional. For the driver that doesn't want the validation, it can set the suppress_used_validation to be true (which could be overridden by force_used_validation module parameter). To be more efficient, a dedicate array is used for storing the validate used length, this helps to eliminate the cache stress if validation is done by the driver. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211027022107.14357-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio_ring: fix typos in vring_desc_extraJason Wang
We're actually tracking descriptor address and length instead of the buffer. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211019070152.8236-7-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio-pci: harden INTX interruptsJason Wang
This patch tries to make sure the virtio interrupt handler for INTX won't be called after a reset and before virtio_device_ready(). We can't use IRQF_NO_AUTOEN since we're using shared interrupt (IRQF_SHARED). So this patch tracks the INTX enabling status in a new intx_soft_enabled variable and toggle it during in vp_disable/enable_vectors(). The INTX interrupt handler will check intx_soft_enabled before processing the actual interrupt. Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211019070152.8236-6-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio_pci: harden MSI-X interruptsJason Wang
We used to synchronize pending MSI-X irq handlers via synchronize_irq(), this may not work for the untrusted device which may keep sending interrupts after reset which may lead unexpected results. Similarly, we should not enable MSI-X interrupt until the device is ready. So this patch fixes those two issues by: 1) switching to use disable_irq() to prevent the virtio interrupt handlers to be called after the device is reset. 2) using IRQF_NO_AUTOEN and enable the MSI-X irq during .ready() This can make sure the virtio interrupt handler won't be called before virtio_device_ready() and after reset. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211019070152.8236-5-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio_ring: check desc == NULL when using indirect with packedXuan Zhuo
When using indirect with packed, we don't check for allocation failures. This patch checks that and fall back on direct. Fixes: 1ce9e6055fa0 ("virtio_ring: introduce packed ring support") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://lore.kernel.org/r/20211020112323.67466-3-xuanzhuo@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio_ring: make virtqueue_add_indirect_packed prettierXuan Zhuo
Align the arguments of virtqueue_add_indirect_packed() to the open ( to make it look prettier. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20211020112323.67466-2-xuanzhuo@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio_vdpa: setup correct vq size with callbacks get_vq_num_{max,min}Wu Zongyong
For the devices which implement the get_vq_num_min callback, the driver should not negotiate with virtqueue size with the backend vdpa device if the value returned by get_vq_num_min equals to the value returned by get_vq_num_max. This is useful for vdpa devices based on legacy virtio specfication. Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com> Link: https://lore.kernel.org/r/bc0551cec6c3f3dd9424b678b7c22d882aebab3a.1635493219.git.wuzongyong@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-11-01virtio-pci: introduce legacy device moduleWu Zongyong
Split common codes from virtio-pci-legacy so vDPA driver can reuse it later. Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/71605acde5e97fcb2760a6973e406279fb1bbd33.1635493219.git.wuzongyong@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-10-27virtio-ring: fix DMA metadata flagsVincent Whitchurch
The flags are currently overwritten, leading to the wrong direction being passed to the DMA unmap functions. Fixes: 72b5e8958738aaa4 ("virtio-ring: store DMA metadata in desc_extra for split virtqueue") Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Link: https://lore.kernel.org/r/20211026133100.17541-1-vincent.whitchurch@axis.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>