Age | Commit message (Collapse) | Author |
|
Return the bus id of the ccw proxy device. This makes 'ethtool -i'
show a more useful value than 'virtio' in the bus-info field.
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
A queue with a capacity of zero is clearly not a valid virtio queue.
Some emulators report zero queue size if queried with an invalid queue
index. Instead of crashing in this case let us just return -ENOENT. To
make that work properly, let us fix the notifier cleanup logic as well.
Cc: stable@vger.kernel.org
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
We've changed to kzalloc the vb struct, so no need to 0-initialize
this field one more time.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
|
|
There is no need to update the balloon actual register when there is no
ballooning request. This patch avoids update_balloon_size when diff is 0.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
Segments can't be larger than the maximum DMA mapping size
supported on the platform. Take that into account when
setting the maximum segment size for a block device.
Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
This function returns the maximum segment size for a single
dma transaction of a virtio device. The possible limit comes
from the SWIOTLB implementation in the Linux kernel, that
has an upper limit of (currently) 256kb of contiguous
memory it can map. Other DMA-API implementations might also
have limits.
Use the new dma_max_mapping_size() function to determine the
maximum mapping size when DMA-API is in use for virtio.
Cc: stable@vger.kernel.org
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
"Lots of tooling updates - too many to list, here's a few highlights:
- Various subcommand updates to 'perf trace', 'perf report', 'perf
record', 'perf annotate', 'perf script', 'perf test', etc.
- CPU and NUMA topology and affinity handling improvements,
- HW tracing and HW support updates:
- Intel PT updates
- ARM CoreSight updates
- vendor HW event updates
- BPF updates
- Tons of infrastructure updates, both on the build system and the
library support side
- Documentation updates.
- ... and lots of other changes, see the changelog for details.
Kernel side updates:
- Tighten up kprobes blacklist handling, reduce the number of places
where developers can install a kprobe and hang/crash the system.
- Fix/enhance vma address filter handling.
- Various PMU driver updates, small fixes and additions.
- refcount_t conversions
- BPF updates
- error code propagation enhancements
- misc other changes"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (238 commits)
perf script python: Add Python3 support to syscall-counts-by-pid.py
perf script python: Add Python3 support to syscall-counts.py
perf script python: Add Python3 support to stat-cpi.py
perf script python: Add Python3 support to stackcollapse.py
perf script python: Add Python3 support to sctop.py
perf script python: Add Python3 support to powerpc-hcalls.py
perf script python: Add Python3 support to net_dropmonitor.py
perf script python: Add Python3 support to mem-phys-addr.py
perf script python: Add Python3 support to failed-syscalls-by-pid.py
perf script python: Add Python3 support to netdev-times.py
perf tools: Add perf_exe() helper to find perf binary
perf script: Handle missing fields with -F +..
perf data: Add perf_data__open_dir_data function
perf data: Add perf_data__(create_dir|close_dir) functions
perf data: Fail check_backup in case of error
perf data: Make check_backup work over directories
perf tools: Add rm_rf_perf_data function
perf tools: Add pattern name checking to rm_rf
perf tools: Add depth checking to rm_rf
perf data: Add global path holder
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
"The main EFI changes in this cycle were:
- Use 32-bit alignment for efi_guid_t
- Allow the SetVirtualAddressMap() call to be omitted
- Implement earlycon=efifb based on existing earlyprintk code
- Various minor fixes and code cleanups from Sai, Ard and me"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Fix build error due to enum collision between efi.h and ima.h
efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation
x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol
efi/arm/arm64: Allow SetVirtualAddressMap() to be omitted
efi: Replace GPL license boilerplate with SPDX headers
efi/fdt: Apply more cleanups
efi: Use 32-bit alignment for efi_guid_t
efi/memattr: Don't bail on zero VA if it equals the region's PA
x86/efi: Mark can_free_region() as an __init function
|
|
When using dm-integrity underneath md-raid, some tests with raid
auto-correction trigger large amounts of integrity failures - and all
these failures print an error message. These messages can bring the
system to a halt if the system is using serial console.
Fix this by limiting the rate of error messages - it improves the speed
of raid recovery and avoids the hang.
Fixes: 7eada909bfd7a ("dm: add integrity target")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Patch series "mm: PG_reserved cleanups and documentation", v2.
I was recently going over all users of PG_reserved. Short story: it is
difficult and sometimes not really clear if setting/checking for
PG_reserved is only a relict from the past. Easy to break things. I
guess I now have a pretty good idea wh things are like that nowadays and
how they evolved.
I had way more cleanups in this series inititally, but some
architectures take PG_reserved as a way to apply a different caching
strategy (for MMIO pages). So I decided to only include the most
obvious changes (that are less likely to break something). So the big
chunk of manual SetPageReserved users are MMIO/DMA related things on
device buffers.
Most notably, for device memory we will hopefully soon stop setting
PG_reserved. Then the documentation has to be updated.
This patch (of 9):
The l1 GATT page table is kept in a special on-chip page with 64
entries. We allocate the l2 page table pages via get_zeroed_page() and
enter them into the table. These l2 pages are modified accordingly when
inserting/removing memory via efficeon_insert_memory and
efficeon_remove_memory.
Apart from that, these pages are not exposed or ioremap'ed. We can stop
setting them reserved (propably copied from generic code).
Link: http://lkml.kernel.org/r/20190114125903.24845-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The migration scanner is a linear scan of a zone with a potentiall large
search space. Furthermore, many pageblocks are unusable such as those
filled with reserved pages or partially filled with pages that cannot
migrate. These still get scanned in the common case of allocating a THP
and the cost accumulates.
The patch uses a partial search of the free lists to locate a migration
source candidate that is marked as MOVABLE when allocating a THP. It
prefers picking a block with a larger number of free pages already on
the basis that there are fewer pages to migrate to free the entire
block. The lowest PFN found during searches is tracked as the basis of
the start for the linear search after the first search of the free list
fails. After the search, the free list is shuffled so that the next
search will not encounter the same page. If the search fails then the
subsequent searches will be shorter and the linear scanner is used.
If this search fails, or if the request is for a small or
unmovable/reclaimable allocation then the linear scanner is still used.
It is somewhat pointless to use the list search in those cases. Small
free pages must be used for the search and there is no guarantee that
movable pages are located within that block that are contiguous.
5.0.0-rc1 5.0.0-rc1
noboost-v3r10 findmig-v3r15
Amean fault-both-3 3771.41 ( 0.00%) 3390.40 ( 10.10%)
Amean fault-both-5 5409.05 ( 0.00%) 5082.28 ( 6.04%)
Amean fault-both-7 7040.74 ( 0.00%) 7012.51 ( 0.40%)
Amean fault-both-12 11887.35 ( 0.00%) 11346.63 ( 4.55%)
Amean fault-both-18 16718.19 ( 0.00%) 15324.19 ( 8.34%)
Amean fault-both-24 21157.19 ( 0.00%) 16088.50 * 23.96%*
Amean fault-both-30 21175.92 ( 0.00%) 18723.42 * 11.58%*
Amean fault-both-32 21339.03 ( 0.00%) 18612.01 * 12.78%*
5.0.0-rc1 5.0.0-rc1
noboost-v3r10 findmig-v3r15
Percentage huge-3 86.50 ( 0.00%) 89.83 ( 3.85%)
Percentage huge-5 92.52 ( 0.00%) 91.96 ( -0.61%)
Percentage huge-7 92.44 ( 0.00%) 92.85 ( 0.44%)
Percentage huge-12 92.98 ( 0.00%) 92.74 ( -0.25%)
Percentage huge-18 91.70 ( 0.00%) 91.71 ( 0.02%)
Percentage huge-24 91.59 ( 0.00%) 92.13 ( 0.60%)
Percentage huge-30 90.14 ( 0.00%) 93.79 ( 4.04%)
Percentage huge-32 90.03 ( 0.00%) 91.27 ( 1.37%)
This shows an improvement in allocation latencies with similar
allocation success rates. While not presented, there was a 31%
reduction in migration scanning and a 8% reduction on system CPU usage.
A 2-socket machine showed similar benefits.
[mgorman@techsingularity.net: several fixes]
Link: http://lkml.kernel.org/r/20190204120111.GL9565@techsingularity.net
[vbabka@suse.cz: migrate block that was found-fast, some optimisations]
Link: http://lkml.kernel.org/r/20190118175136.31341-10-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <Vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "Replace all open encodings for NUMA_NO_NODE", v3.
All these places for replacement were found by running the following
grep patterns on the entire kernel code. Please let me know if this
might have missed some instances. This might also have replaced some
false positives. I will appreciate suggestions, inputs and review.
1. git grep "nid == -1"
2. git grep "node == -1"
3. git grep "nid = -1"
4. git grep "node = -1"
This patch (of 2):
At present there are multiple places where invalid node number is
encoded as -1. Even though implicitly understood it is always better to
have macros in there. Replace these open encodings for an invalid node
number with the global macro NUMA_NO_NODE. This helps remove NUMA
related assumptions like 'invalid node' from various places redirecting
them to a common definition.
Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe]
Acked-by: Jens Axboe <axboe@kernel.dk> [mtip32xx]
Acked-by: Vinod Koul <vkoul@kernel.org> [dmaengine.c]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Doug Ledford <dledford@redhat.com> [drivers/infiniband]
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Mark inflated and never onlined pages PG_offline, to tell the world that
the content is stale and should not be dumped.
[david@redhat.com: use vmballoon_page_in_frames more widely]
Link: http://lkml.kernel.org/r/20181122100627.5189-7-david@redhat.com
Link: http://lkml.kernel.org/r/20181119101616.8901-7-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Nadav Amit <namit@vmware.com>
Cc: Xavier Deguillard <xdeguillard@vmware.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julien Freche <jfreche@vmware.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Hansen <chansen3@cisco.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Mark inflated and never onlined pages PG_offline, to tell the world that
the content is stale and should not be dumped.
Link: http://lkml.kernel.org/r/20181119101616.8901-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Pankaj gupta <pagupta@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Christian Hansen <chansen3@cisco.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Juergen Gross <jgross@suse.com>
Cc: Julien Freche <jfreche@vmware.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Mark inflated and never onlined pages PG_offline, to tell the world that
the content is stale and should not be dumped.
Link: http://lkml.kernel.org/r/20181119101616.8901-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Hansen <chansen3@cisco.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Julien Freche <jfreche@vmware.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Pankaj gupta <pagupta@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When freeing pages are done with higher order, time spent on coalescing
pages by buddy allocator can be reduced. With section size of 256MB,
hot add latency of a single section shows improvement from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times. Modify
external providers of online callback to align with the change.
[arunks@codeaurora.org: v11]
Link: http://lkml.kernel.org/r/1547792588-18032-1-git-send-email-arunks@codeaurora.org
[akpm@linux-foundation.org: remove unused local, per Arun]
[akpm@linux-foundation.org: avoid return of void-returning __free_pages_core(), per Oscar]
[akpm@linux-foundation.org: fix it for mm-convert-totalram_pages-and-totalhigh_pages-variables-to-atomic.patch]
[arunks@codeaurora.org: v8]
Link: http://lkml.kernel.org/r/1547032395-24582-1-git-send-email-arunks@codeaurora.org
[arunks@codeaurora.org: v9]
Link: http://lkml.kernel.org/r/1547098543-26452-1-git-send-email-arunks@codeaurora.org
Link: http://lkml.kernel.org/r/1538727006-5727-1-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull year 2038 updates from Thomas Gleixner:
"Another round of changes to make the kernel ready for 2038. After lots
of preparatory work this is the first set of syscalls which are 2038
safe:
403 clock_gettime64
404 clock_settime64
405 clock_adjtime64
406 clock_getres_time64
407 clock_nanosleep_time64
408 timer_gettime64
409 timer_settime64
410 timerfd_gettime64
411 timerfd_settime64
412 utimensat_time64
413 pselect6_time64
414 ppoll_time64
416 io_pgetevents_time64
417 recvmmsg_time64
418 mq_timedsend_time64
419 mq_timedreceiv_time64
420 semtimedop_time64
421 rt_sigtimedwait_time64
422 futex_time64
423 sched_rr_get_interval_time64
The syscall numbers are identical all over the architectures"
* 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
riscv: Use latest system call ABI
checksyscalls: fix up mq_timedreceive and stat exceptions
unicore32: Fix __ARCH_WANT_STAT64 definition
asm-generic: Make time32 syscall numbers optional
asm-generic: Drop getrlimit and setrlimit syscalls from default list
32-bit userspace ABI: introduce ARCH_32BIT_OFF_T config option
compat ABI: use non-compat openat and open_by_handle_at variants
y2038: add 64-bit time_t syscalls to all 32-bit architectures
y2038: rename old time and utime syscalls
y2038: remove struct definition redirects
y2038: use time32 syscall names on 32-bit
syscalls: remove obsolete __IGNORE_ macros
y2038: syscalls: rename y2038 compat syscalls
x86/x32: use time64 versions of sigtimedwait and recvmmsg
timex: change syscalls to use struct __kernel_timex
timex: use __kernel_timex internally
sparc64: add custom adjtimex/clock_adjtime functions
time: fix sys_timer_settime prototype
time: Add struct __kernel_timex
time: make adjtime compat handling available for 32 bit
...
|
|
The commit referenced below introduced device locking around save and
restore of state for each device during a PCI bus "try" reset, making it
decidely non-"try" and prone to deadlock in the event that a device is
already locked. Restore __pci_reset_bus() and __pci_reset_slot() to their
advertised locking semantics by pushing the save and restore functions into
the branch where the entire tree is already locked. Extend the helper
function names with "_locked" and update the comment to reflect this
calling requirement.
Fixes: b014e96d1abb ("PCI: Protect pci_error_handlers->reset_notify() usage with device_lock()")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>
|
|
A warning is generated when a PCIe device is probed with a degraded link,
but there was no similar mechanism to warn when the link becomes degraded
after probing. The Link Bandwidth Notification provides this mechanism.
Use the Link Bandwidth Management Interrupt to detect bandwidth changes,
and rescan the bandwidth, looking for the weakest point. This is the same
logic used in probe().
Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The interrupt departement delivers this time:
- New infrastructure to manage NMIs on platforms which have a sane
NMI delivery, i.e. identifiable NMI vectors instead of a single
lump.
- Simplification of the interrupt affinity management so drivers
don't have to implement ugly loops around the PCI/MSI enablement.
- Speedup for interrupt statistics in /proc/stat
- Provide a function to retrieve the default irq domain
- A new interrupt controller for the Loongson LS1X platform
- Affinity support for the SiFive PLIC
- Better support for the iMX irqsteer driver
- NUMA aware memory allocations for GICv3
- The usual small fixes, improvements and cleanups all over the
place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
irqchip/imx-irqsteer: Add multi output interrupts support
irqchip/imx-irqsteer: Change to use reg_num instead of irq_group
dt-bindings: irq: imx-irqsteer: Add multi output interrupts support
dt-binding: irq: imx-irqsteer: Use irq number instead of group number
irqchip/brcmstb-l2: Use _irqsave locking variants in non-interrupt code
irqchip/gicv3-its: Use NUMA aware memory allocation for ITS tables
irqdomain: Allow the default irq domain to be retrieved
irqchip/sifive-plic: Implement irq_set_affinity() for SMP host
irqchip/sifive-plic: Differentiate between PLIC handler and context
irqchip/sifive-plic: Add warning in plic_init() if handler already present
irqchip/sifive-plic: Pre-compute context hart base and enable base
PCI/MSI: Remove obsolete sanity checks for multiple interrupt sets
genirq/affinity: Remove the leftovers of the original set support
nvme-pci: Simplify interrupt allocation
genirq/affinity: Add new callback for (re)calculating interrupt sets
genirq/affinity: Store interrupt sets size in struct irq_affinity
genirq/affinity: Code consolidation
irqchip/irq-sifive-plic: Check and continue in case of an invalid cpuid.
irqchip/i8259: Fix shutdown order by moving syscore_ops registration
dt-bindings: interrupt-controller: loongson ls1x intc
...
|
|
This condition needs to be fipped around because "err" is uninitialized
when "force" is set. The Smatch static analysis tool complains and
UBsan will also complain at runtime.
Fixes: 663586c0a892 ("ubi: Expose the bitrot interface")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and clockevent updates from Thomas Gleixner:
"The time(r) core and clockevent updates are mostly boring this time:
- A new driver for the Tegra210 timer
- Small fixes and improvements alll over the place
- Documentation updates and cleanups"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
soc/tegra: default select TEGRA_TIMER for Tegra210
clocksource/drivers/tegra: Add Tegra210 timer support
dt-bindings: timer: add Tegra210 timer
clocksource/drivers/timer-cs5535: Rename the file for consistency
clocksource/drivers/timer-pxa: Rename the file for consistency
clocksource/drivers/tango-xtal: Rename the file for consistency
dt-bindings: timer: gpt: update binding doc
clocksource/drivers/exynos_mct: Remove unused header includes
dt-bindings: timer: mediatek: update bindings for MT7629 SoC
clocksource/drivers/exynos_mct: Fix error path in timer resources initialization
clocksource/drivers/exynos_mct: Remove dead code
clocksource/drivers/riscv: Add required checks during clock source init
dt-bindings: timer: renesas: tmu: Document r8a774c0 bindings
dt-bindings: timer: renesas, cmt: Document r8a774c0 CMT support
clocksource/drivers/exynos_mct: Clear timer interrupt when shutdown
clocksource/drivers/exynos_mct: Move one-shot check from tick clear to ISR
clocksource/drivers/arch_timer: Workaround for Allwinner A64 timer instability
clocksource/drivers/sun5i: Fail gracefully when clock rate is unavailable
timers: Mark expected switch fall-throughs
timekeeping/debug: No need to check return value of debugfs_create functions
...
|
|
Don't define a direct_access function that fails, dm_dax_direct_access
already fails with -EIO if the pointer is zero;
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
DM cache now defaults to passing discards down to the origin device.
User may disable this using the "no_discard_passdown" feature when
creating the cache device.
If the cache's underlying origin device doesn't support discards then
passdown is disabled (with warning). Similarly, if the underlying
origin device's max_discard_sectors is less than a cache block discard
passdown will be disabled (this is required because sizing of the cache
internal discard bitset depends on it).
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
The workqueue's name should be "writecache-writeback" instead of
"writecache-writeabck".
Signed-off-by: Huaisheng Ye <yehs1@lenovo.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Add a "create" module parameter, which allows device-mapper targets to
be configured at boot time. This enables early use of DM targets in the
boot process (as the root device or otherwise) without the need of an
initramfs.
The syntax used in the boot param is based on the concise format from
the dmsetup tool to follow the rule of least surprise:
dmsetup table --concise /dev/mapper/lroot
Which is:
dm-mod.create=<name>,<uuid>,<minor>,<flags>,<table>[,<table>+][;<name>,<uuid>,<minor>,<flags>,<table>[,<table>+]+]
Where,
<name> ::= The device name.
<uuid> ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
<minor> ::= The device minor number | ""
<flags> ::= "ro" | "rw"
<table> ::= <start_sector> <num_sectors> <target_type> <target_args>
<target_type> ::= "verity" | "linear" | ...
For example, the following could be added in the boot parameters:
dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0
Only the targets that were tested are allowed and the ones that don't
change any block device when the device is create as read-only. For
example, mirror and cache targets are not allowed. The rationale behind
this is that if the user makes a mistake, choosing the wrong device to
be the mirror or the cache can corrupt data.
The only targets initially allowed are:
* crypt
* delay
* linear
* snapshot-origin
* striped
* verity
Co-developed-by: Will Drewry <wad@chromium.org>
Co-developed-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Invoking dm_get_device() twice on the same device path with different
modes is dangerous. Because in that case, upgrade_mode() will alloc a
new 'dm_dev' and free the old one, which may be referenced by a previous
caller. Dereferencing the dangling pointer will trigger kernel NULL
pointer dereference.
The following two cases can reproduce this issue. Actually, they are
invalid setups that must be disallowed, e.g.:
1. Creating a thin-pool with read_only mode, and the same device as
both metadata and data.
dmsetup create thinp --table \
"0 41943040 thin-pool /dev/vdb /dev/vdb 128 0 1 read_only"
BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
...
Call Trace:
new_read+0xfb/0x110 [dm_bufio]
dm_bm_read_lock+0x43/0x190 [dm_persistent_data]
? kmem_cache_alloc_trace+0x15c/0x1e0
__create_persistent_data_objects+0x65/0x3e0 [dm_thin_pool]
dm_pool_metadata_open+0x8c/0xf0 [dm_thin_pool]
pool_ctr.cold.79+0x213/0x913 [dm_thin_pool]
? realloc_argv+0x50/0x70 [dm_mod]
dm_table_add_target+0x14e/0x330 [dm_mod]
table_load+0x122/0x2e0 [dm_mod]
? dev_status+0x40/0x40 [dm_mod]
ctl_ioctl+0x1aa/0x3e0 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
do_vfs_ioctl+0xa2/0x600
? handle_mm_fault+0xda/0x200
? __do_page_fault+0x26c/0x4f0
ksys_ioctl+0x60/0x90
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
2. Creating a external snapshot using the same thin-pool device.
dmsetup create thinp --table \
"0 41943040 thin-pool /dev/vdc /dev/vdb 128 0 2 ignore_discard"
dmsetup message /dev/mapper/thinp 0 "create_thin 0"
dmsetup create snap --table \
"0 204800 thin /dev/mapper/thinp 0 /dev/mapper/thinp"
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
Call Trace:
? __alloc_pages_nodemask+0x13c/0x2e0
retrieve_status+0xa5/0x1f0 [dm_mod]
? dm_get_live_or_inactive_table.isra.7+0x20/0x20 [dm_mod]
table_status+0x61/0xa0 [dm_mod]
ctl_ioctl+0x1aa/0x3e0 [dm_mod]
dm_ctl_ioctl+0xa/0x10 [dm_mod]
do_vfs_ioctl+0xa2/0x600
ksys_ioctl+0x60/0x90
? ksys_write+0x4f/0xb0
__x64_sys_ioctl+0x16/0x20
do_syscall_64+0x55/0x150
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Jason Cai (Xiang Feng) <jason.cai@linux.alibaba.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
unlikely has already included in IS_ERR(),
so just remove redundant unlikely annotation.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
unlikely has already included in IS_ERR(),
so just remove redundant unlikely annotation.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
unlikely has already included in IS_ERR(),
so just remove redundant unlikely annotation.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Do not just call blk_queue_split() if the bio is_abnormal_io().
Fixes: 568c73a355e ("dm: update dm_process_bio() to split bio if in ->make_request_fn()")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:
struct foo {
int stuff;
void *entry[];
};
instance = kzalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL);
Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:
instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Also move dm_rq_target_io structure definition from dm-rq.h to dm-rq.c
Fixes: 6a23e05c2fe3c6 ("dm: remove legacy request-based IO path")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
Pull MIPS updates from Paul Burton:
- Support for the MIPSr6 MemoryMapID register & Global INValidate TLB
(GINVT) instructions, allowing for more efficient TLB maintenance
when running on a CPU such as the I6500 that supports these.
- Enable huge page support for MIPS64r6.
- Optimize post-DMA cache sync by removing that code entirely for
kernel configurations in which we know it won't be needed.
- The number of pages allocated for interrupt stacks is now calculated
correctly, where before we would wastefully allocate too much memory
in some configurations.
- The ath79 platform migrates to devicetree.
- The bcm47xx platform sees fixes for the Buffalo WHR-G54S board.
- The ingenic/jz4740 platform gains support for appended devicetrees.
- The cavium_octeon, lantiq, loongson32 & sgi-ip27 platforms all see
cleanups as do various pieces of core architecture code.
* tag 'mips_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (66 commits)
MIPS: lantiq: Remove separate GPHY Firmware loader
MIPS: ingenic: Add support for appended devicetree
MIPS: SGI-IP27: rework HUB interrupts
MIPS: SGI-IP27: do boot CPU init later
MIPS: SGI-IP27: do xtalk scanning later
MIPS: SGI-IP27: use pr_info/pr_emerg and pr_cont to fix output
MIPS: SGI-IP27: clean up bridge access and header files
MIPS: SGI-IP27: get rid of volatile and hubreg_t
MIPS: irq: Allocate accurate order pages for irq stack
MIPS: dma-noncoherent: Remove bogus condition in dma_sync_phys()
MIPS: eBPF: Remove REG_32BIT_ZERO_EX
MIPS: eBPF: Always return sign extended 32b values
MIPS: CM: Fix indentation
MIPS: BCM47XX: Fix/improve Buffalo WHR-G54S support
MIPS: OCTEON: program rx/tx-delay always from DT
MIPS: OCTEON: delete board-specific link status
MIPS: OCTEON: don't lie about interface type of CN3005 board
MIPS: OCTEON: warn if deprecated link status is being used
MIPS: OCTEON: add fixed-link nodes to in-kernel device tree
MIPS: Delete unused flush_cache_sigtramp()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"The most important changes in this patch set are:
- DMA-related cleanups for parisc with the aim to move anything not
required by drivers out of <asm/dma-mapping.h>, by Christoph
Hellwig
- Switch to memblock_alloc(), by Mike Rapoport
- Makefile cleanups by Masahiro Yamada
- Switch to bust_spinlocks(), by Sergey Senozhatsky
- Improved initial SMP affinity selection for IRQs
- Added IPI- and rescheduling interrupts in /proc/interrupts output"
* 'parisc-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: (21 commits)
parisc: use memblock_alloc() instead of custom get_memblock()
parisc: Add constants for various PDC firmware calls
parisc: Add constant for PDC_PAT_COMPLEX firmware call
parisc: Show machine product number during boot
parisc: Add constants for PDC_RELOCATE PDC call
parisc: Add PDC_CRASH_PREP PDC function number
parisc: Use F_EXTEND() macro in iosapic code
parisc: remove the HBA_DATA macro
parisc/lba_pci: use container_of in LBA_DEV
parisc/dino: use container_of in DINO_DEV
parisc: properly type the return value of parisc_walk_tree
parisc: properly type the iommu field in struct pci_hba_data
parisc: turn GET_IOC into an inline function
parisc: move internal implementation details out of <asm/dma-mapping.h>
parisc: don't include <asm/cacheflush.h> in <asm/dma-mapping.h>
parisc: remove meaningless ccflags-y in arch/parisc/boot/Makefile
parisc: replace oops_in_progress manipulation with bust_spinlocks()
parisc: Improve initial IRQ to CPU assignment
parisc: Count IPI function call interrupts
parisc: Show rescheduling interrupts on SMP machines only
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
- A copy of Arnds compat wrapper generation series
- Pass information about the KVM guest to the host in form the control
program code and the control program version code
- Map IOV resources to support PCI physical functions on s390
- Add vector load and store alignment hints to improve performance
- Use the "jdd" constraint with gcc 9 to make jump labels working again
- Remove amode workaround for old z/VM releases from the DCSS code
- Add support for in-kernel performance measurements using the CPU
measurement counter facility
- Introduce a new PMU device cpum_cf_diag to capture counters and store
thenn as event raw data.
- Bug fixes and cleanups
* tag 's390-5.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits)
Revert "s390/cpum_cf: Add kernel message exaplanations"
s390/dasd: fix read device characteristic with CONFIG_VMAP_STACK=y
s390/suspend: fix prefix register reset in swsusp_arch_resume
s390: warn about clearing als implied facilities
s390: allow overriding facilities via command line
s390: clean up redundant facilities list setup
s390/als: remove duplicated in-place implementation of stfle
s390/cio: Use cpa range elsewhere within vfio-ccw
s390/cio: Fix vfio-ccw handling of recursive TICs
s390: vfio_ap: link the vfio_ap devices to the vfio_ap bus subsystem
s390/cpum_cf: Handle EBUSY return code from CPU counter facility reservation
s390/cpum_cf: Add kernel message exaplanations
s390/cpum_cf_diag: Add support for s390 counter facility diagnostic trace
s390/cpum_cf: add ctr_stcctm() function
s390/cpum_cf: move common functions into a separate file
s390/cpum_cf: introduce kernel_cpumcf_avail() function
s390/cpu_mf: replace stcctm5() with the stcctm() function
s390/cpu_mf: add store cpu counter multiple instruction support
s390/cpum_cf: Add minimal in-kernel interface for counter measurements
s390/cpum_cf: introduce kernel_cpumcf_alert() to obtain measurement alerts
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- VLA removal
- gcc-8.x build fixes
- small improvements and cleanups
- defconfig updates
* tag 'm68k-for-v5.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
m68k: Add -ffreestanding to CFLAGS
m68k/apollo: Fix comment in Makefile
dio: Fix buffer overflow in case of unknown board
m68k/defconfig: Update defconfigs for v5.0-rc1
m68k/atari: Avoid VLA use in atari_switches_setup()
m68k: Avoid VLA use in mangle_kernel_stack()
m68k/mac: Use '030 reset method on SE/30
m68k/mac: Remove obsolete comment
m68k/mac: Skip VIA port setup unless RTC is connected
m68k/mac: Clean up unused timer definitions
m68k/defconfig: Drop NET_VENDOR_<FOO>=n
|
|
All copyups perform deep-copyup regardless of whether deep-flatten
feature is enabled. The feature bit is used to ensure that image is
written to only by new-enough clients that always perform deep-copyup.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Otherwise, once the parent snapshot is removed, the clone's snapshot
wouldn't reflect the state of the clone prior to whole-object write or
zeroout because a deep-copyup was never done ("rbd flatten" wouldn't do
it because the modified object would exist in HEAD).
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This is the core of deep-flatten feature: sending a copyup request
(i.e. a guarded write of the data read from the parent) with an empty
snapshot context (snaps = [], seq = 0) causes the OSD to reflect the
write in all existing snapshots. This allows "rbd flatten" to fully
disconnect the clone image and its snapshots from the parent and make
the parent snapshot removable.
The actual modification request is sent only after deep-copyup request
is completed. Waiting for deep-copyup reply is unnecessary, this will
be improved in the future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In preparation for deep-flatten feature, split rbd_obj_issue_copyup()
into two functions and add a new write state to make the state machine
slightly more clear. Make the copyup op optional and start using that
for when the overlap goes to 0.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
In preparation for deep-flatten feature, stop copying num_osd_ops from
the original request in rbd_obj_issue_copyup(). Split the calculation
into count_{write,zeroout}_ops() respectively and determine whether the
assert_exists guard is needed with the new rbd_obj_copyup_enabled().
As a nice side effect, we no longer guard in the writefull case as the
copyup'ed object is always fully overwritten.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Allow passing a custom snapshot context: NULL for read and an empty
snapshot context for deep-copyup.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Otherwise the assert in rbd_obj_end_request() is triggered.
Fixes: 3da691bf4366 ("rbd: new request handling code")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Support for kernel layering hasn't been considered experimental for
a few years now. All the issues that I'm aware of were shaken out in
2014 and early 2015. Moreover, most of that code was rewritten with
the addition of support for fancy striping.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
If, after rounding off, the discard request is smaller than alloc_size,
drop it on the floor in __rbd_img_fill_request().
Default alloc_size to 64k. This should cover both HDD and SSD based
bluestore OSDs and somewhat improve things for filestore. For OSDs on
filestore with filestore_punch_hole = false, alloc_size is best set to
object size in order to allow deletes and truncates and disallow zero
op.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
With discard_zeroes_data gone in commit 48920ff2a5a9 ("block: remove
the discard_zeroes_data flag"), continuing to provide this guarantee is
pointless: applications can't query it and discards can only be used
for deallocating.
Add OBJ_OP_ZEROOUT and move the existing logic under it. As the first
step to divorcing OBJ_OP_DISCARD, stop worrying about copyups but keep
special casing whole-object layered discards.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
It is effectively unused.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"API:
- Add helper for simple skcipher modes.
- Add helper to register multiple templates.
- Set CRYPTO_TFM_NEED_KEY when setkey fails.
- Require neither or both of export/import in shash.
- AEAD decryption test vectors are now generated from encryption
ones.
- New option CONFIG_CRYPTO_MANAGER_EXTRA_TESTS that includes random
fuzzing.
Algorithms:
- Conversions to skcipher and helper for many templates.
- Add more test vectors for nhpoly1305 and adiantum.
Drivers:
- Add crypto4xx prng support.
- Add xcbc/cmac/ecb support in caam.
- Add AES support for Exynos5433 in s5p.
- Remove sha384/sha512 from artpec7 as hardware cannot do partial
hash"
[ There is a merge of the Freescale SoC tree in order to pull in changes
required by patches to the caam/qi2 driver. ]
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (174 commits)
crypto: s5p - add AES support for Exynos5433
dt-bindings: crypto: document Exynos5433 SlimSSS
crypto: crypto4xx - add missing of_node_put after of_device_is_available
crypto: cavium/zip - fix collision with generic cra_driver_name
crypto: af_alg - use struct_size() in sock_kfree_s()
crypto: caam - remove redundant likely/unlikely annotation
crypto: s5p - update iv after AES-CBC op end
crypto: x86/poly1305 - Clear key material from stack in SSE2 variant
crypto: caam - generate hash keys in-place
crypto: caam - fix DMA mapping xcbc key twice
crypto: caam - fix hash context DMA unmap size
hwrng: bcm2835 - fix probe as platform device
crypto: s5p-sss - Use AES_BLOCK_SIZE define instead of number
crypto: stm32 - drop pointless static qualifier in stm32_hash_remove()
crypto: chelsio - Fixed Traffic Stall
crypto: marvell - Remove set but not used variable 'ivsize'
crypto: ccp - Update driver messages to remove some confusion
crypto: adiantum - add 1536 and 4096-byte test vectors
crypto: nhpoly1305 - add a test vector with len % 16 != 0
crypto: arm/aes-ce - update IV after partial final CTR block
...
|
|
Pull networking updates from David Miller:
"Here we go, another merge window full of networking and #ebpf changes:
1) Snoop DHCPACKS in batman-adv to learn MAC/IP pairs in the DHCP
range without dealing with floods of ARP traffic, from Linus
Lüssing.
2) Throttle buffered multicast packet transmission in mt76, from
Felix Fietkau.
3) Support adaptive interrupt moderation in ice, from Brett Creeley.
4) A lot of struct_size conversions, from Gustavo A. R. Silva.
5) Add peek/push/pop commands to bpftool, as well as bash completion,
from Stanislav Fomichev.
6) Optimize sk_msg_clone(), from Vakul Garg.
7) Add SO_BINDTOIFINDEX, from David Herrmann.
8) Be more conservative with local resends due to local congestion,
from Yuchung Cheng.
9) Allow vetoing of unsupported VXLAN FDBs, from Petr Machata.
10) Add health buffer support to devlink, from Eran Ben Elisha.
11) Add TXQ scheduling API to mac80211, from Toke Høiland-Jørgensen.
12) Add statistics to basic packet scheduler filter, from Cong Wang.
13) Add GRE tunnel support for mlxsw Spectrum-2, from Nir Dotan.
14) Lots of new IP tunneling forwarding tests, also from Nir Dotan.
15) Add 3ad stats to bonding, from Nikolay Aleksandrov.
16) Lots of probing improvements for bpftool, from Quentin Monnet.
17) Various nfp drive #ebpf JIT improvements from Jakub Kicinski.
18) Allow #ebpf programs to access gso_segs from skb shared info, from
Eric Dumazet.
19) Add sock_diag support for AF_XDP sockets, from Björn Töpel.
20) Support 22260 iwlwifi devices, from Luca Coelho.
21) Use rbtree for ipv6 defragmentation, from Peter Oskolkov.
22) Add JMP32 instruction class support to #ebpf, from Jiong Wang.
23) Add spinlock support to #ebpf, from Alexei Starovoitov.
24) Support 256-bit keys and TLS 1.3 in ktls, from Dave Watson.
25) Add device infomation API to devlink, from Jakub Kicinski.
26) Add new timestamping socket options which are y2038 safe, from
Deepa Dinamani.
27) Add RX checksum offloading for various sh_eth chips, from Sergei
Shtylyov.
28) Flow offload infrastructure, from Pablo Neira Ayuso.
29) Numerous cleanups, improvements, and bug fixes to the PHY layer
and many drivers from Heiner Kallweit.
30) Lots of changes to try and make packet scheduler classifiers run
lockless as much as possible, from Vlad Buslov.
31) Support BCM957504 chip in bnxt_en driver, from Erik Burrows.
32) Add concurrency tests to tc-tests infrastructure, from Vlad
Buslov.
33) Add hwmon support to aquantia, from Heiner Kallweit.
34) Allow 64-bit values for SO_MAX_PACING_RATE, from Eric Dumazet.
And I would be remiss if I didn't thank the various major networking
subsystem maintainers for integrating much of this work before I even
saw it. Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso,
Johannes Berg, Kalle Valo, and many others. Thank you!"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2207 commits)
net/sched: avoid unused-label warning
net: ignore sysctl_devconf_inherit_init_net without SYSCTL
phy: mdio-mux: fix Kconfig dependencies
net: phy: use phy_modify_mmd_changed in genphy_c45_an_config_aneg
net: dsa: mv88e6xxx: add call to mv88e6xxx_ports_cmode_init to probe for new DSA framework
selftest/net: Remove duplicate header
sky2: Disable MSI on Dell Inspiron 1545 and Gateway P-79
net/mlx5e: Update tx reporter status in case channels were successfully opened
devlink: Add support for direct reporter health state update
devlink: Update reporter state to error even if recover aborted
sctp: call iov_iter_revert() after sending ABORT
team: Free BPF filter when unregistering netdev
ip6mr: Do not call __IP6_INC_STATS() from preemptible context
isdn: mISDN: Fix potential NULL pointer dereference of kzalloc
net: dsa: mv88e6xxx: support in-band signalling on SGMII ports with external PHYs
cxgb4/chtls: Prefix adapter flags with CXGB4
net-sysfs: Switch to bitmap_zalloc()
mellanox: Switch to bitmap_zalloc()
bpf: add test cases for non-pointer sanitiation logic
mlxsw: i2c: Extend initialization by querying resources data
...
|