|
This increases the coverage the fail_nth test gets, as well as the
coverage obtained via syzkaller.
Link: https://lore.kernel.org/r/17-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Provide a mock kernel module for the iommu_domain that allows it to run
without any HW, and the mocking provides a way to directly validate that
the PFNs loaded into the iommu_domain are correct. This exposes the access
kAPI toward userspace to allow userspace to explore the functionality of
pages.c and io_pagetable.c.
The mock also simulates the rare case of PAGE_SIZE > iommu page size as
the mock will operate at a 2K iommu page size. This allows exercising all
of the calculations to support this mismatch.
This is also intended to support syzkaller exploring the same space.
However, it is an unusually invasive config option to enable all of
this. The config option should not be enabled in a production kernel.
Link: https://lore.kernel.org/r/16-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Eric Auger <eric.auger@redhat.com> # aarch64
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into io_pagetable operations.
A userspace application can test against iommufd and confirm compatibility,
then simply make a small change to open /dev/iommu instead of
/dev/vfio/vfio.
For testing purposes /dev/vfio/vfio can be symlinked to /dev/iommu and
then all applications will use the compatibility path with no code
changes. A later series allows /dev/vfio/vfio to be directly provided by
iommufd, which allows the rlimit mode to work the same as well.
This series just provides the iommufd side of compatibility. Actually
linking this to VFIO_SET_CONTAINER is a followup series, with a link in
the cover letter.
Internally the compatibility API uses a normal IOAS object that, like
vfio, is automatically allocated when the first device is
attached.
Userspace can also query or set this IOAS object directly using the
IOMMU_VFIO_IOAS ioctl. This allows mixing and matching new iommufd only
features while still using the VFIO style map/unmap ioctls.
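As an illustration, a minimal userspace sketch of querying the compatibility
IOAS might look like the following; the struct layout and constants are
assumptions based on this series' proposed uAPI and should be checked against
include/uapi/linux/iommufd.h:
```
/* Hedged sketch: fetch the ID of the IOAS backing the VFIO compat path. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int get_compat_ioas_id(int iommufd, __u32 *ioas_id)
{
	struct iommu_vfio_ioas cmd = {
		.size = sizeof(cmd),
		.op = IOMMU_VFIO_IOAS_GET,
	};

	if (ioctl(iommufd, IOMMU_VFIO_IOAS, &cmd))
		return -1;
	*ioas_id = cmd.ioas_id;	/* usable with IOMMU_IOAS_MAP etc. */
	return 0;
}
```
The fd here would come from open("/dev/iommu", O_RDWR).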
While this is enough to operate qemu, it has a few differences:
- Resource limits rely on memory cgroups to bound what userspace can do
instead of the module parameter dma_entry_limit.
- VFIO P2P is not implemented. The DMABUF patches for vfio are a start at
a solution where iommufd would import a special DMABUF. This is to avoid
further propagating the follow_pfn() security problem.
- A full audit for pedantic compatibility details (eg errnos, etc) has
not yet been done.
- powerpc SPAPR is left out, as it is not connected to the iommu_domain
framework. It seems interest in SPAPR is minimal as it is currently
non-working in v6.1-rc1. They will have to convert to the iommu
subsystem framework to enjoy iommufd.
The following are not going to be implemented and we expect to remove them
from VFIO type1:
- SW access 'dirty tracking'. As discussed in the cover letter this will
be done in VFIO.
- VFIO_TYPE1_NESTING_IOMMU
https://lore.kernel.org/all/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/
- VFIO_DMA_MAP_FLAG_VADDR
https://lore.kernel.org/all/Yz777bJZjTyLrHEQ@nvidia.com/
Link: https://lore.kernel.org/r/15-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Kernel access is the mode that VFIO "mdevs" use. In this case there is no
struct device and no IOMMU connection. iommufd acts as a record keeper for
accesses and returns the actual struct pages back to the caller to use
however they need, eg with kmap or the DMA API.
Each caller must create a struct iommufd_access with
iommufd_access_create(), similar to how iommufd_device_bind() works. Using
this struct the caller can access blocks of IOVA using
iommufd_access_pin_pages() or iommufd_access_rw().
Callers must provide a callback that immediately unpins any IOVA being
used within a range. This happens if userspace unmaps the IOVA under the
pin.
The implementation forwards the access requests directly to the iopt
infrastructure that manages the iopt_pages_access.
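A rough sketch of how an mdev-like driver might use this kAPI follows; the
argument lists here are assumptions based on the description above, so treat
everything as illustrative rather than the authoritative signatures in
linux/iommufd.h:
```
/* Hedged sketch of an in-kernel access user. my_state and my_unpin_all()
 * are hypothetical driver constructs. */
static void my_unmap(void *data, unsigned long iova, unsigned long length)
{
	struct my_state *st = data;

	/* Must immediately stop using any pinned pages in [iova, iova+length) */
	my_unpin_all(st, iova, length);
	iommufd_access_unpin_pages(st->access, iova, length);
}

static const struct iommufd_access_ops my_ops = {
	.unmap = my_unmap,
};

/* At bind time, mirroring iommufd_device_bind():
 *   st->access = iommufd_access_create(ictx, ioas_id, &my_ops, st);
 * Then blocks of IOVA can be pinned or copied:
 *   iommufd_access_pin_pages(st->access, iova, length, pages, flags);
 *   iommufd_access_rw(st->access, iova, buffer, len, flags);
 */
```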
Link: https://lore.kernel.org/r/14-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Add the four functions external drivers need to connect physical DMA to
the IOMMUFD:
iommufd_device_bind() / iommufd_device_unbind()
Register the device with iommufd and establish security isolation.
iommufd_device_attach() / iommufd_device_detach()
Connect a bound device to a page table.
Binding a device creates a device object ID in the uAPI; however, the
generic API does not yet provide any IOCTLs to manipulate them.
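For orientation, the expected calling pattern from a driver might look
roughly like this sketch; the in/out pt_id convention and the error handling
shown are assumptions, so consult linux/iommufd.h for the authoritative
signatures:
```
/* Hedged sketch of the driver-facing flow. */
struct iommufd_device *idev;
u32 dev_id;
u32 pt_id = ioas_id;	/* IOAS (or hw_pagetable) chosen by userspace */
int rc;

idev = iommufd_device_bind(ictx, dev, &dev_id);	/* establish isolation */
if (IS_ERR(idev))
	return PTR_ERR(idev);

rc = iommufd_device_attach(idev, &pt_id);	/* connect to an IO page table */
if (rc) {
	iommufd_device_unbind(idev);
	return rc;
}

/* ... device DMA runs against the IOAS mappings ... */

iommufd_device_detach(idev);
iommufd_device_unbind(idev);
```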
Link: https://lore.kernel.org/r/13-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The hw_pagetable object exposes internal struct iommu_domain objects to
userspace. An iommu_domain is required when any DMA device attaches to an
IOAS to control the io page table through the iommu driver.
For compatibility with VFIO the hw_pagetable is automatically created when
a DMA device is attached to the IOAS. If a compatible iommu_domain already
exists then the hw_pagetable associated with it is used for the
attachment.
In the initial series there is no iommufd uAPI for the hw_pagetable
object. The next patch provides driver facing APIs for IO page table
attachment that allows drivers to accept either an IOAS or a hw_pagetable
ID and for the driver to return the hw_pagetable ID that was auto-selected
from an IOAS. The expectation is the driver will provide uAPI through its
own FD for attaching its device to iommufd. This allows userspace to learn
the mapping of devices to iommu_domains and to override the automatic
attachment.
The future HW specific interface will allow userspace to create
hw_pagetable objects using iommu_domains with IOMMU driver specific
parameters. This infrastructure will allow linking those domains to IOAS's
and devices.
Link: https://lore.kernel.org/r/12-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Connect the IOAS to its IOCTL interface. This exposes most of the
functionality in the io_pagetable to userspace.
This is intended to be the core of the generic interface that IOMMUFD will
provide. Every IOMMU driver should be able to implement an iommu_domain
that is compatible with this generic mechanism.
It is also designed to be easy to use for simple non-virtual-machine-monitor
users, like DPDK:
- Universal simple support for all IOMMUs (no PPC special path)
- An IOVA allocator that considers the aperture and the allowed/reserved
ranges
- io_pagetable allows any number of iommu_domains to be connected to the
IOAS
- Automatic allocation and re-use of iommu_domains
There is also room in the design to add non-generic features to cater to
specific HW functionality.
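A minimal userspace sketch of the core flow, assuming the uAPI names from
this series (verify against include/uapi/linux/iommufd.h before relying on
them):
```
/* Allocate an IOAS and map a buffer, letting the kernel pick the IOVA. */
struct iommu_ioas_alloc alloc_cmd = { .size = sizeof(alloc_cmd) };
struct iommu_ioas_map map_cmd = {
	.size = sizeof(map_cmd),
	.flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE,
	.user_va = (__u64)(uintptr_t)buf,
	.length = buf_len,
};

if (ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc_cmd))
	return -1;
map_cmd.ioas_id = alloc_cmd.out_ioas_id;
if (ioctl(iommufd, IOMMU_IOAS_MAP, &map_cmd))
	return -1;
/* Without IOMMU_IOAS_MAP_FIXED_IOVA the allocator chose the IOVA and
 * returned it in map_cmd.iova. */
```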
Link: https://lore.kernel.org/r/11-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This is the remainder of the IOAS data structure. Provide an object called
an io_pagetable that is composed of iopt_areas pointing at iopt_pages,
along with a list of iommu_domains that mirror the IOVA to PFN map.
At the top this is a simple interval tree of iopt_areas indicating the map
of IOVA to iopt_pages. An xarray keeps track of a list of domains. Based
on the attached domains there is a minimum alignment for areas (which may
be smaller than PAGE_SIZE), an interval tree of reserved IOVA that can't
be mapped, and an interval tree of allowed IOVA that can always be mapped.
The concept of an 'access' refers to something like a VFIO mdev that is
accessing the IOVA and using a 'struct page *' for CPU based access.
Externally an API is provided that matches the requirements of the IOCTL
interface for map/unmap and domain attachment.
The API provides a 'copy' primitive to establish a new IOVA map in a
different IOAS from an existing mapping by re-using the iopt_pages. This
is the basic mechanism to provide single pinning.
This is designed to support a pre-registration flow where userspace would
set up a dummy IOAS with no domains, map in memory and then establish an
access to pin all PFNs into the xarray.
Copy can then be used to create new IOVA mappings in a different IOAS,
with iommu_domains attached. Upon copy the PFNs will be read out of the
xarray and mapped into the iommu_domains, avoiding any pin_user_pages()
overheads.
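A hedged sketch of the copy step in that flow, using the IOMMU_IOAS_COPY
ioctl this series proposes (the struct layout is an assumption to verify
against the uAPI header):
```
/* Copy an already-pinned range from the pre-registration IOAS into an
 * IOAS that has iommu_domains attached; no pin_user_pages() happens. */
struct iommu_ioas_copy copy_cmd = {
	.size = sizeof(copy_cmd),
	.flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE,
	.dst_ioas_id = dma_ioas_id,	/* IOAS with domains attached */
	.src_ioas_id = prereg_ioas_id,	/* dummy IOAS holding the pins */
	.src_iova = src_iova,
	.length = length,
};

if (ioctl(iommufd, IOMMU_IOAS_COPY, &copy_cmd))
	return -1;
/* The destination IOVA comes back in copy_cmd.dst_iova unless
 * IOMMU_IOAS_MAP_FIXED_IOVA was set. */
```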
Link: https://lore.kernel.org/r/10-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
iopt_pages represents a logical linear list of full PFNs held in
different storage tiers. Each area points to a slice of exactly one
iopt_pages, and each iopt_pages can have multiple areas and accesses.
The three storage tiers are managed to meet these objectives:
- If no iommu_domain or in-kernel access exists then minimal memory
should be consumed by iommufd
- If a page has been pinned then an iopt_pages will not pin it again
- If an in-kernel access exists then the xarray must provide the backing
storage to avoid allocations on domain removals
- Otherwise any iommu_domain will be used for storage
In a common configuration with only an iommu_domain the iopt_pages does
not allocate significant memory itself.
The external interface for pages has several logical operations:
iopt_area_fill_domain() will load the PFNs from storage into a single
domain. This is used when attaching a new domain to an existing IOAS.
iopt_area_fill_domains() will load the PFNs from storage into multiple
domains. This is used when creating a new IOVA map in an existing IOAS.
iopt_pages_add_access() creates an iopt_pages_access that tracks an
in-kernel access of PFNs. This is some external driver that might be
accessing the IOVA using the CPU, or programming PFNs with the DMA
API. ie a VFIO mdev.
iopt_pages_rw_access() directly performs a memcpy on the PFNs, without
the overhead of iopt_pages_add_access().
iopt_pages_fill_xarray() will load PFNs into the xarray and return a
'struct page *' array. It is used by iopt_pages_accesses to extract PFNs
for in-kernel use. iopt_pages_fill_from_xarray() is a fast path when it
is known the xarray is already filled.
As an iopt_pages can be referred to in slices by many areas and accesses
it uses interval trees to keep track of which storage tiers currently hold
the PFNs. On a page-by-page basis any request for a PFN will be satisfied
from one of the storage tiers and the PFN copied to the target domain/array.
Unfill actions are similar: on a page-by-page basis, domains are unmapped,
xarray entries freed, or struct pages fully put back.
Significant complexity is required to fully optimize all of these data
motions. The implementation calculates the largest consecutive range of
same-storage indexes and operates in blocks. The accumulation of PFNs
always generates the largest contiguous PFN range possible, and this
gathering can cross storage tier boundaries. For cases like 'fill
domains' care is taken to avoid duplicated work and PFNs are read once and
pushed into all domains.
The map/unmap interaction with the iommu_domain always works in contiguous
PFN blocks. The implementation does not require or benefit from any
split/merge optimization in the iommu_domain driver.
This design suggests several possible improvements in the IOMMU API that
would greatly help performance, particularly a way for the driver to map
and read the PFN lists instead of working with one driver call per page
to read, and one driver call per contiguous range to store.
Link: https://lore.kernel.org/r/9-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
The top of the data structure provides an IO Address Space (IOAS) that is
similar to a VFIO container. The IOAS allows map/unmap of memory into
ranges of IOVA called iopt_areas. Multiple IOMMU domains (IO page tables)
and in-kernel accesses (like VFIO mdevs) can be attached to the IOAS to
access the PFNs that those IOVA areas cover.
The IO Address Space (IOAS) data structure is composed of:
- struct io_pagetable holding the IOVA map
- struct iopt_areas representing populated portions of IOVA
- struct iopt_pages representing the storage of PFNs
- struct iommu_domain representing each IO page table in the system IOMMU
- struct iopt_pages_access representing in-kernel accesses of PFNs (ie
VFIO mdevs)
- struct xarray pinned_pfns holding a list of pages pinned by in-kernel
accesses
This patch introduces the lowest part of the data structure - the movement
of PFNs in a tiered storage scheme:
1) iopt_pages::pinned_pfns xarray
2) Multiple iommu_domains
3) The origin of the PFNs, i.e. the userspace pointer
PFNs have to be copied between all combinations of tiers, depending on the
configuration.
The interface is an iterator called a 'pfn_reader' which determines in
which tier each PFN is stored and loads it into a list of PFNs held in a
struct pfn_batch.
Each step of the iterator will fill up the pfn_batch, then the caller can
use the pfn_batch to send the PFNs to the required destination. Repeating
this loop will read all the PFNs in an IOVA range.
The pfn_reader and pfn_batch also keep track of the pinned page accounting.
While PFNs are always stored and accessed as full PAGE_SIZE units, the
iommu_domain tier can store with a sub-page offset/length to support
IOMMUs with a smaller IOPTE size than PAGE_SIZE.
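The shape of the consuming loop is roughly the following sketch; the helper
and field names are assumptions drawn from the description above, and
drivers/iommu/iommufd/pages.c holds the real iterator:
```
/* Illustrative only: iterate an IOVA range, one filled pfn_batch at a
 * time, and push each batch into the destination (a domain here). */
struct pfn_reader pfns;
int rc;

for (rc = pfn_reader_first(&pfns, pages, start_index, last_index);
     !rc && !pfn_reader_done(&pfns); rc = pfn_reader_next(&pfns)) {
	/* pfns.batch now holds the largest contiguous run available */
	batch_to_domain(&pfns.batch, domain, area, pfns.batch_start_index);
}
```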
Link: https://lore.kernel.org/r/8-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
This is the basic infrastructure of a new miscdevice to hold the iommufd
IOCTL API.
It provides:
- A miscdevice to create file descriptors to run the IOCTL interface over
- A table based ioctl dispatch and centralized extendable pre-validation
step
- An xarray mapping userspace IDs to kernel objects. The design has
multiple inter-related objects held within a single IOMMUFD fd
- A simple usage count to build a graph of object relations and protect
against hostile userspace racing ioctls
The only IOCTL provided in this patch is the generic 'destroy any object
by handle' operation.
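The table-based dispatch reduces each ioctl to an entry like the following
sketch; the member names are illustrative rather than the exact iommufd
definitions:
```
/* One row per ioctl: the dispatcher copies in at least min_size bytes,
 * pre-validates the size, then calls execute(). */
struct iommufd_ioctl_op {
	unsigned int ioctl_num;
	int (*execute)(struct iommufd_ucmd *ucmd);
	size_t min_size;
};

static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
	{ IOMMU_DESTROY, iommufd_destroy, sizeof(struct iommu_destroy) },
};
```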
Link: https://lore.kernel.org/r/6-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Lixiao Yang <lixiao.yang@intel.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
If device_register() returns an error, the name allocated by dev_set_name()
needs to be freed. It should use put_device() to give up the reference in the
error path, so that the name can be freed in kobject_cleanup(), and
list_del() is called to delete the port from rio_mports.
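The general shape of the fix, with generic identifiers standing in for the
actual rapidio code:
```
dev_set_name(&mport->dev, "rapidio%d", mport->id);	/* allocates the name */
ret = device_register(&mport->dev);
if (ret) {
	list_del(&mport->node);		/* take the port back off rio_mports */
	put_device(&mport->dev);	/* kobject_cleanup() frees the name */
	return ret;
}
```
Note the error path must not kfree() the device directly; dropping the last
reference lets the core release callback run.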
Link: https://lkml.kernel.org/r/20221114152636.2939035-3-yangyingliang@huawei.com
Fixes: 2aaf308b95b2 ("rapidio: rework device hierarchy and introduce mport class of devices")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "rapidio: fix three possible memory leaks".
This patchset fixes three name leaks in error handling.
- patch #1 fixes two name leaks when rio_add_device() fails.
- patch #2 fixes a name leak when rio_register_mport() fails.
This patch (of 2):
If rio_add_device() returns an error, the name allocated by dev_set_name()
needs to be freed. It should use put_device() to give up the reference in the
error path, so that the name can be freed in kobject_cleanup(), and the
'rdev' can be freed in rio_release_dev().
Link: https://lkml.kernel.org/r/20221114152636.2939035-1-yangyingliang@huawei.com
Link: https://lkml.kernel.org/r/20221114152636.2939035-2-yangyingliang@huawei.com
Fixes: e8de370188d0 ("rapidio: add mport char device driver")
Fixes: 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Cc: Alexandre Bounine <alex.bou9@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
On some platforms, `char` is unsigned, but this driver, for the most part,
assumed it was signed. In other places, it uses `char` to mean an
unsigned number, but only in cases when the values are small. And in
still other places, `char` is used as a boolean. Put an end to this
confusion by declaring explicit types, depending on the context.
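A small illustration of the ambiguity being removed (not code from the
driver):
```
char c = (char)0x80;	/* -128 where char is signed, 128 where unsigned */

if (c < 0)		/* true on x86, never true on arm or s390 */
	do_something();

/* After the cleanup the intent is explicit: */
s8 signal_strength;	/* genuinely signed small number */
u8 channel;		/* small unsigned number */
bool muted;		/* boolean, not a char */
```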
Link: https://lkml.kernel.org/r/20221019155541.3410813-1-Jason@zx2c4.com
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We don't show num_reads and num_writes since we removed the corresponding
sysfs nodes in 2017. Block layer stats are exposed via
/sys/block/zramX/stat file.
However, we still increment those atomic vars and store them in zram
stats. Remove leftovers.
Link: https://lkml.kernel.org/r/20221117141326.1105181-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.
FOLL_FORCE in this case seems to be due to copy-and-paste from other
drivers. Let's just remove it.
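The change amounts to dropping the flag from the pin call, roughly
(illustrative, not the driver's exact line):
```
/* before: write access forced even where the VMA would forbid it */
rc = pin_user_pages_fast(addr, npages, FOLL_WRITE | FOLL_FORCE, pages);
/* after: pages are dirtied on unpin, so plain FOLL_WRITE is correct */
rc = pin_user_pages_fast(addr, npages, FOLL_WRITE, pages);
```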
Link: https://lkml.kernel.org/r/20221116102659.70287-20-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Oded Gabbay <ogabbay@kernel.org>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.
FOLL_FORCE in this case seems to be a legacy leftover. Let's just remove
it.
Link: https://lkml.kernel.org/r/20221116102659.70287-19-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
FOLL_FORCE is really only for ptrace access. As we unpin the pinned pages
using unpin_user_pages_dirty_lock(true), the assumption is that all these
pages are writable.
FOLL_FORCE in this case seems to be a legacy leftover. Let's just remove
it.
Link: https://lkml.kernel.org/r/20221116102659.70287-18-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
FOLL_FORCE is really only for ptrace access. According to commit
707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are always
writable"), get_vaddr_frames() currently pins all pages writable as a
workaround for issues with read-only buffers.
FOLL_FORCE, however, seems to be a legacy leftover as it predates
commit 707947247e95 ("media: videobuf2-vmalloc: get_userptr: buffers are
always writable"). Let's just remove it.
Once the read-only buffer issue has been resolved, FOLL_WRITE could
again be set depending on the DMA direction.
Link: https://lkml.kernel.org/r/20221116102659.70287-17-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Acked-by: Tomasz Figa <tfiga@chromium.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
FOLL_FORCE is really only for ptrace access. R/O pinning a page is
supposed to fail if the VMA misses proper access permissions (no VM_READ).
Let's just remove FOLL_FORCE usage here; there would have to be a pretty
good reason to allow arbitrary drivers to R/O pin pages in a PROT_NONE
VMA. Most probably, FOLL_FORCE usage is just some legacy leftover.
Link: https://lkml.kernel.org/r/20221116102659.70287-16-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
commit cd5297b0855f ("drm/etnaviv: Use FOLL_FORCE for userptr")
documents that FOLL_FORCE | FOLL_WRITE was really only used for reliable
R/O pinning.
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
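After this series the pin reduces to roughly the following (an illustrative
sketch, not etnaviv's exact code):
```
unsigned int gup_flags = FOLL_LONGTERM;	/* reliable R/O long-term pin */

if (userptr_write)
	gup_flags |= FOLL_WRITE;	/* only when the buffer is written */
rc = pin_user_pages_fast(start, npages, gup_flags, pages);
```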
Link: https://lkml.kernel.org/r/20221116102659.70287-15-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: David Airlie <airlied@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
Link: https://lkml.kernel.org/r/20221116102659.70287-14-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
Link: https://lkml.kernel.org/r/20221116102659.70287-13-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
Link: https://lkml.kernel.org/r/20221116102659.70287-12-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Nelson Escobar <neescoba@cisco.com>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
GUP now supports reliable R/O long-term pinning in COW mappings, such
that we break COW early. MAP_SHARED VMAs only use the shared zeropage so
far in one corner case (DAXFS file with holes), which can be ignored
because GUP does not support long-term pinning in fsdax (see
check_vma_flags()).
Consequently, FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM is no longer required
for reliable R/O long-term pinning: FOLL_LONGTERM is sufficient. So stop
using FOLL_FORCE, which is really only for ptrace access.
Link: https://lkml.kernel.org/r/20221116102659.70287-11-david@redhat.com
Tested-by: Leon Romanovsky <leonro@nvidia.com> [over mlx4 and mlx5]
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a new flag to zram block state that shows if the page is
incompressible: that is, none of the algorithms (including secondary ones)
could compress it.
Link: https://lkml.kernel.org/r/20221109115047.2921851-14-senozhatsky@chromium.org
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add support for incompressible pages writeback:
echo incompressible > /sys/block/zramX/writeback
Link: https://lkml.kernel.org/r/20221109115047.2921851-13-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Recompression iterates through all the registered secondary compression
algorithms in order of their priorities so that we have higher chances of
finding the algorithm that compresses a particular page. This, however,
may not always be the best approach and sometimes we may want to limit
recompression to only one particular algorithm. For instance, when a
higher priority algorithm uses too much power and the device has a relatively
low battery level, we may want to limit recompression to use only a lower
priority algorithm, which uses less power.
Introduce algo= parameter support to the recompression sysfs knob so that
user-space can request recompression with a particular algorithm only:
echo "type=idle algo=zstd" > /sys/block/zramX/recompress
Link: https://lkml.kernel.org/r/20221109115047.2921851-11-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Size class index comparison is sufficient, so we can remove object
size comparisons.
Link: https://lkml.kernel.org/r/20221109115047.2921851-10-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It makes no sense for us to recompress the object if it will end up in the
same size class: we don't get any memory gain anyway, but we do pay a CPU
time overhead for inserting this object into a zspage and for decompressing
it afterwards.
[senozhatsky: rebased and fixed conflicts]
Link: https://lkml.kernel.org/r/20221109115047.2921851-9-senozhatsky@chromium.org
Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Avoid typecasts that are needed for IS_ERR() and use IS_ERR_VALUE()
instead.
Link: https://lkml.kernel.org/r/20221109115047.2921851-8-senozhatsky@chromium.org
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Re-phrase writeback BIO error comment.
Link: https://lkml.kernel.org/r/20221109115047.2921851-7-senozhatsky@chromium.org
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a new flag to zram block state that shows if the page was recompressed
(using alternative compression algorithm).
Link: https://lkml.kernel.org/r/20221109115047.2921851-6-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Allow zram to recompress pages using secondary compression streams.
Re-compression algorithms (we support up to 3 at this stage)
are selected via recomp_algorithm:
echo "algo=zstd priority=1" > /sys/block/zramX/recomp_algorithm
Please read documentation for more details.
We support several recompression modes:
1) IDLE pages recompression is activated by `idle` mode
echo "type=idle" > /sys/block/zram0/recompress
2) Since there may be many idle pages, user-space may pass a size
threshold value (in bytes) and we will recompress only pages of equal
or greater size:
echo "threshold=888" > /sys/block/zram0/recompress
3) HUGE pages recompression is activated by `huge` mode
echo "type=huge" > /sys/block/zram0/recompress
4) HUGE_IDLE pages recompression is activated by `huge_idle` mode
echo "type=huge_idle" > /sys/block/zram0/recompress
[senozhatsky@chromium.org: we should always zero out err variable in recompress loop]
Link: https://lkml.kernel.org/r/20221110143423.3250790-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20221109115047.2921851-5-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We will use the non-WB variant in the ZRAM page recompression path.
Link: https://lkml.kernel.org/r/20221109115047.2921851-4-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce recomp_algorithm sysfs knob that controls secondary algorithm
selection used for recompression.
We will support up to 3 secondary compression algorithms which are sorted
in order of their priority. To select an algorithm, the user has to provide
its name and priority:
echo "algo=zstd priority=1" > /sys/block/zramX/recomp_algorithm
echo "algo=deflate priority=2" > /sys/block/zramX/recomp_algorithm
During recompression zram iterates through the list of registered
secondary algorithms in order of their priorities.
We also have a short version for cases when there is only
one secondary compression algorithm:
echo "algo=zstd" > /sys/block/zramX/recomp_algorithm
This will register zstd as the secondary algorithm with priority 1.
Link: https://lkml.kernel.org/r/20221109115047.2921851-3-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "zram: Support multiple compression streams", v5.
This series adds support for multiple compression streams. The main idea
is that different compression algorithms have different characteristics
and zram may benefit when it uses a combination of algorithms: a default
algorithm that is faster but has a lower compression ratio and a secondary
algorithm that achieves a higher compression ratio at the price of slower
compression/decompression.
There are several use-case for this functionality:
- huge pages re-compression: zstd or deflate can successfully compress
huge pages (~50% of huge pages on my synthetic ChromeOS tests), IOW
pages that lzo was not able to compress.
- idle pages re-compression: idle/cold pages sit in the memory and we
may reduce zsmalloc memory usage if we recompress those idle pages.
Userspace has a number of ways to control the behavior and impact of zram
recompression: what type of pages should be recompressed, size watermarks,
etc. Please refer to the documentation patch.
This patch (of 13):
The patch turns the compression stream and compressor algorithm name
members of struct zram into arrays, so that we can support multiple
compression streams (in the next patches).
The patch uses a rather explicit API for compressor selection:
- Get primary (default) compression stream
zcomp_stream_get(zram->comps[ZRAM_PRIMARY_COMP])
- Get secondary compression stream
zcomp_stream_get(zram->comps[ZRAM_SECONDARY_COMP])
We use similar API for compression streams put().
At this point we always have just one compression stream,
since CONFIG_ZRAM_MULTI_COMP is not yet defined.
Link: https://lkml.kernel.org/r/20221109115047.2921851-1-senozhatsky@chromium.org
Link: https://lkml.kernel.org/r/20221109115047.2921851-2-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Alexey Romanov <avromanov@sberdevices.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A set of clk driver fixes that resolve issues for various SoCs.
Most of these are incorrect clk data, like bad parent descriptions.
When the clk tree is improperly described things don't work, like USB
and UFS controllers, because clk frequencies are wonky. Here are the
extra details:
- Fix the parent of UFS reference clks on Qualcomm SC8280XP so that
UFS works properly
- Fix the clk ID for USB on AT91 RM9200 so the USB driver continues
to probe
- Stop using of_device_get_match_data() on the wrong device for a
Samsung Exynos driver so it gets the proper clk data
- Fix ExynosAutov9 binding
- Fix the parent of the div4 clk on Exynos7885
- Stop calling runtime PM APIs from the Qualcomm GDSC driver directly
as it leads to a lockdep splat and is just plain wrong because it
violates runtime PM semantics by calling runtime PM APIs when the
device has been runtime PM disabled"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: qcom: gcc-sc8280xp: add cxo as parent for three ufs ref clks
ARM: at91: rm9200: fix usb device clock id
clk: samsung: Revert "clk: samsung: exynos-clkout: Use of_device_get_match_data()"
dt-bindings: clock: exynosautov9: fix reference to CMU_FSYS1
clk: qcom: gdsc: Remove direct runtime PM calls
clk: samsung: exynos7885: Correct "div4" clock parents
|
|
The linux,keycodes property is optional.
Fix the driver not probing when it's not specified.
Fixes: c18ef50346f2 ("Input: msg2638 - add support for msg2138 key events")
Signed-off-by: Vincent Knecht <vincent.knecht@mailoo.org>
Link: https://lore.kernel.org/r/20221130210202.2069213-1-vincent.knecht@mailoo.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
The wistron_btns driver calls rtc_cmos_read(), which isn't
available with UML builds, so disable this driver on UML.
Prevents this build error:
ld: drivers/input/misc/wistron_btns.o: in function `poll_bios':
wistron_btns.c:(.text+0x4be): undefined reference to `rtc_cmos_read'
Fixes: 0bbadafdc49d ("um: allow disabling NO_IOMEM") # v5.14+
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20221130161604.1879-1-rdunlap@infradead.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Patch series "Fix a bunch of allmodconfig errors", v2.
Since b339ec9c229aa ("kbuild: Only default to -Werror if COMPILE_TEST")
WERROR now defaults to COMPILE_TEST meaning that it's enabled for
allmodconfig builds. This leads to some interesting build failures when
using Clang, each resolved in this set.
With this set applied, I am able to obtain a successful allmodconfig Arm
build.
This patch (of 2):
calculate_bandwidth() is presently broken on all !(X86_64 || SPARC64 ||
ARM64) architectures built with Clang (all released versions), whereby the
stack frame gets blown up to well over 5k. This would cause an immediate
kernel panic on most architectures. We'll revert this when the following
bug report has been resolved:
https://github.com/llvm/llvm-project/issues/41896.
Link: https://lkml.kernel.org/r/20221125120750.3537134-1-lee@kernel.org
Link: https://lkml.kernel.org/r/20221125120750.3537134-2-lee@kernel.org
Signed-off-by: Lee Jones <lee@kernel.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Lee Jones <lee@kernel.org>
Cc: Leo Li <sunpeng.li@amd.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Tom Rix <trix@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
ida_simple_{get,remove}() are deprecated and are just wrappers around
ida_{alloc_range,free}(). Replace ida_simple_{get,remove}() with their
corresponding counterparts.
No functional changes.
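The conversion follows this generic shape (identifiers illustrative):
```
#include <linux/idr.h>

static DEFINE_IDA(my_ida);

/* before: id = ida_simple_get(&my_ida, 0, 0, GFP_KERNEL);
 *         ida_simple_remove(&my_ida, id);
 * after: */
int id = ida_alloc(&my_ida, GFP_KERNEL);	/* any id >= 0 */

if (id < 0)
	return id;
/* ... use id ... */
ida_free(&my_ida, id);
```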
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Link: https://lore.kernel.org/r/20221130123001.25473-1-p.raghav@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Without this change, the interrupt test fails in an MSI-X environment:
$ sudo ethtool -t enp0s2 offline
[ 43.921783] igb 0000:00:02.0: offline testing starting
[ 44.855824] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Down
[ 44.961249] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 51.272202] igb 0000:00:02.0: testing shared interrupt
[ 56.996975] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
The test result is FAIL
The test extra info:
Register test (offline) 0
Eeprom test (offline) 0
Interrupt test (offline) 4
Loopback test (offline) 0
Link test (on/offline) 0
Here, "4" means an expected interrupt was not delivered.
To fix this, route IRQs correctly to the first MSI-X vector by setting
IVAR_MISC. Also, set bit 0 of EIMS so that the vector will not be
masked. The interrupt test now runs properly with this change:
$ sudo ethtool -t enp0s2 offline
[ 42.762985] igb 0000:00:02.0: offline testing starting
[ 50.141967] igb 0000:00:02.0: testing shared interrupt
[ 56.163957] igb 0000:00:02.0 enp0s2: igb: enp0s2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
The test result is PASS
The test extra info:
Register test (offline) 0
Eeprom test (offline) 0
Interrupt test (offline) 0
Loopback test (offline) 0
Link test (on/offline) 0
Fixes: 4eefa8f01314 ("igb: add single vector msi-x testing to interrupt test")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
e1000_xmit_frame is expected to stop the queue and dispatch frames to
hardware if there is not sufficient space for the next frame in the
buffer, but sometimes it failed to do so because the estimated maximum
size of a frame was wrong. As a consequence, a later invocation of
e1000_xmit_frame failed with NETDEV_TX_BUSY, and the frame in the buffer
remained forever, resulting in a watchdog failure.
This change fixes the estimated size by making it match with the
condition for NETDEV_TX_BUSY. Apparently, the old estimation failed to
account for the following lines, which determine the space requirement
for not causing NETDEV_TX_BUSY:
```
/* reserve a descriptor for the offload context */
if ((mss) || (skb->ip_summed == CHECKSUM_PARTIAL))
count++;
count++;
count += DIV_ROUND_UP(len, adapter->tx_fifo_limit);
```
This issue was found when running http-stress02 test included in Linux
Test Project 20220930 on QEMU with the following commandline:
```
qemu-system-x86_64 -M q35,accel=kvm -m 8G -smp 8
-drive if=virtio,format=raw,file=root.img,file.locking=on
-device e1000e,netdev=netdev
-netdev tap,script=ifup,downscript=no,id=netdev
```
Fixes: bc7f75fa9788 ("[E1000E]: New pci-express e1000 driver (currently for ICH9 devices only)")
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Users may disable HWP in firmware, in which case intel_pstate wouldn't load
unless the CPU model is explicitly supported.
See also the following past commits:
commit d8de7a44e11f ("cpufreq: intel_pstate: Add Skylake servers support")
commit 706c5328851d ("cpufreq: intel_pstate: Add Cometlake support in
no-HWP mode")
commit fbdc21e9b038 ("cpufreq: intel_pstate: Add Icelake servers support in
no-HWP mode")
commit 71bb5c82aaae ("cpufreq: intel_pstate: Add Tigerlake support in
no-HWP mode")
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
pci_get_device() will increase the reference count for the returned
pci_dev. We need to use pci_dev_put() to decrease the reference count
after using pci_get_device(). Let's add it.
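The pattern being fixed looks like this generic sketch (the device IDs are
placeholders):
```
struct pci_dev *pdev;

pdev = pci_get_device(PCI_VENDOR_ID_AMD, device_id, NULL);	/* takes a ref */
if (pdev) {
	/* ... read the needed config from pdev ... */
	pci_dev_put(pdev);	/* balance the pci_get_device() reference */
}
```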
Fixes: 59a3b3a8db16 ("cpufreq: AMD: Ignore the check for ProcFeedback in ST/CZ")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In cpufreq_policy_alloc(), an uninitialized completion is used in
cpufreq_sysfs_release() when kobject_init_and_add() fails, and
that will cause a crash such as the following page fault in complete():
BUG: unable to handle page fault for address: fffffffffffffff8
[..]
RIP: 0010:complete+0x98/0x1f0
[..]
Call Trace:
kobject_put+0x1be/0x4c0
cpufreq_online.cold+0xee/0x1fd
cpufreq_add_dev+0x183/0x1e0
subsys_interface_register+0x3f5/0x4e0
cpufreq_register_driver+0x3b7/0x670
acpi_cpufreq_init+0x56c/0x1000 [acpi_cpufreq]
do_one_initcall+0x13d/0x780
do_init_module+0x1c3/0x630
load_module+0x6e67/0x73b0
__do_sys_finit_module+0x181/0x240
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 4ebe36c94aed ("cpufreq: Fix kobject memleak")
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 5.2+ <stable@vger.kernel.org> # 5.2+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
dm-integrity also has the same UAF problem when dm_resume()
and dm_destroy() are concurrent.
Therefore, cancel the timer again in dm_integrity_dtr().
Cc: stable@vger.kernel.org
Fixes: 7eada909bfd7a ("dm: add integrity target")
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
dm-cache also has the same UAF problem when dm_resume()
and dm_destroy() are concurrent.
Therefore, cancel the timer again in destroy().
Cc: stable@vger.kernel.org
Fixes: c6b4fcbad044e ("dm: add cache target")
Signed-off-by: Luo Meng <luomeng12@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|