Age | Commit message (Collapse) | Author |
|
Recently the following BUG was reported:
Injecting memory failure for pfn 0x3c0000 at process virtual address 0x7fe300000000
Memory failure: 0x3c0000: recovery action for huge page: Recovered
BUG: unable to handle kernel paging request at ffff8dfcc0003000
IP: gup_pgd_range+0x1f0/0xc20
PGD 17ae72067 P4D 17ae72067 PUD 0
Oops: 0000 [#1] SMP PTI
...
CPU: 3 PID: 5467 Comm: hugetlb_1gb Not tainted 4.15.0-rc8-mm1-abc+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
You can easily reproduce this by calling madvise(MADV_HWPOISON) twice on
a 1GB hugepage. This happens because get_user_pages_fast() is not aware
of a migration entry on pud that was created in the 1st madvise() event.
I think that conversion to pud-aligned migration entry is working, but
other MM code walking over page table isn't prepared for it. We need
some time and effort to make all this work properly, so this patch
avoids the reported bug by just disabling error handling for 1GB
hugepage.
[n-horiguchi@ah.jp.nec.com: v2]
Link: http://lkml.kernel.org/r/1517284444-18149-1-git-send-email-n-horiguchi@ah.jp.nec.com
Link: http://lkml.kernel.org/r/1517207283-15769-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
During memory hotplugging we traverse struct pages three times:
1. memset(0) in sparse_add_one_section()
2. loop in __add_section() to set do: set_page_node(page, nid); and
SetPageReserved(page);
3. loop in memmap_init_zone() to call __init_single_pfn()
This patch removes the first two loops, and leaves only loop 3. All
struct pages are initialized in one place, the same as it is done during
boot.
The benefits:
- We improve memory hotplug performance because we are not evicting the
cache several times and also reduce loop branching overhead.
- Remove condition from hotpath in __init_single_pfn(), that was added
in order to fix the problem that was reported by Bharata in the above
email thread, thus also improve performance during normal boot.
- Make memory hotplug more similar to the boot memory initialization
path because we zero and initialize struct pages only in one
function.
- Simplifies memory hotplug struct page initialization code, and thus
enables future improvements, such as multi-threading the
initialization of struct pages in order to improve hotplug
performance even further on larger machines.
[pasha.tatashin@oracle.com: v5]
Link: http://lkml.kernel.org/r/20180228030308.1116-7-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180215165920.8570-7-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
During memory hotplugging the probe routine will leave struct pages
uninitialized, the same as it is currently done during boot. Therefore,
we do not want to access the inside of struct pages before
__init_single_page() is called during onlining.
Because during hotplug we know that pages in one memory block belong to
the same numa node, we can skip the checking. We should keep checking
for the boot case.
[pasha.tatashin@oracle.com: s/register_new_memory()/hotplug_memory_register()]
Link: http://lkml.kernel.org/r/20180228030308.1116-6-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180215165920.8570-6-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
During boot we poison struct page memory in order to ensure that no one
is accessing this memory until the struct pages are initialized in
__init_single_page().
This patch adds more scrutiny to this checking by making sure that flags
do not equal the poison pattern when they are accessed. The pattern is
all ones.
Since node id is also stored in struct page, and may be accessed quite
early, we add this enforcement into page_to_nid() function as well.
Note, this is applicable only when NODE_NOT_IN_PAGE_FLAGS=n
[pasha.tatashin@oracle.com: v4]
Link: http://lkml.kernel.org/r/20180215165920.8570-4-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180213193159.14606-4-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Deferred page initialization allows the boot cpu to initialize a small
subset of the system's pages early in boot, with other cpus doing the
rest later on.
It is, however, problematic to know how many pages the kernel needs
during boot. Different modules and kernel parameters may change the
requirement, so the boot cpu either initializes too many pages or runs
out of memory.
To fix that, initialize early pages on demand. This ensures the kernel
does the minimum amount of work to initialize pages during boot and
leaves the rest to be divided in the multithreaded initialization path
(deferred_init_memmap).
The on-demand code is permanently disabled using static branching once
deferred pages are initialized. After the static branch is changed to
false, the overhead is up-to two branch-always instructions if the zone
watermark check fails or if rmqueue fails.
Sergey Senozhatsky noticed that while deferred pages currently make
sense only on NUMA machines (we start one thread per latency node),
CONFIG_NUMA is not a requirement for CONFIG_DEFERRED_STRUCT_PAGE_INIT,
so that is also must be addressed in the patch.
[akpm@linux-foundation.org: fix typo in comment, make deferred_pages static]
[pasha.tatashin@oracle.com: fix min() type mismatch warning]
Link: http://lkml.kernel.org/r/20180212164543.26592-1-pasha.tatashin@oracle.com
[pasha.tatashin@oracle.com: use zone_to_nid() in deferred_grow_zone()]
Link: http://lkml.kernel.org/r/20180214163343.21234-2-pasha.tatashin@oracle.com
[pasha.tatashin@oracle.com: might_sleep warning]
Link: http://lkml.kernel.org/r/20180306192022.28289-1-pasha.tatashin@oracle.com
[akpm@linux-foundation.org: s/spin_lock/spin_lock_irq/ in page_alloc_init_late()]
[pasha.tatashin@oracle.com: v5]
Link: http://lkml.kernel.org/r/20180309220807.24961-3-pasha.tatashin@oracle.com
[akpm@linux-foundation.org: tweak comments]
[pasha.tatashin@oracle.com: v6]
Link: http://lkml.kernel.org/r/20180313182355.17669-3-pasha.tatashin@oracle.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20180209192216.20509-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Vlastimil Babka reported about a window issue during which when deferred
pages are initialized, and the current version of on-demand
initialization is finished, allocations may fail. While this is highly
unlikely scenario, since this kind of allocation request must be large,
and must come from interrupt handler, we still want to cover it.
We solve this by initializing deferred pages with interrupts disabled,
and holding node_size_lock spin lock while pages in the node are being
initialized. The on-demand deferred page initialization that comes
later will use the same lock, and thus synchronize with
deferred_init_memmap().
It is unlikely for threads that initialize deferred pages to be
interrupted. They run soon after smp_init(), but before modules are
initialized, and long before user space programs. This is why there is
no adverse effect of having these threads running with interrupts
disabled.
[pasha.tatashin@oracle.com: v6]
Link: http://lkml.kernel.org/r/20180313182355.17669-2-pasha.tatashin@oracle.com
Link: http://lkml.kernel.org/r/20180309220807.24961-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Gioh Kim <gi-oh.kim@profitbricks.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
alloc_contig_range() initiates compaction and eventual migration for the
purpose of either CMA or HugeTLB allocations. At present, the reason
code remains the same MR_CMA for either of these cases. Let's make it
MR_CONTIG_RANGE which will appropriately reflect the reason code in both
these cases.
Link: http://lkml.kernel.org/r/20180202091518.18798-1-khandual@linux.vnet.ibm.com
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
struct kmem_cache_order_objects is for mixing order and number of
objects, and orders aren't big enough to warrant 64-bit width.
Propagate unsignedness down so that everything fits.
!!! Patch assumes that "PAGE_SIZE << order" doesn't overflow. !!!
Link: http://lkml.kernel.org/r/20180305200730.15812-23-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If kmem case sizes are 32-bit, then usecopy region should be too.
Link: http://lkml.kernel.org/r/20180305200730.15812-21-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If SLAB doesn't support 4GB+ kmem caches (it never did), KASAN should
not do it as well.
Link: http://lkml.kernel.org/r/20180305200730.15812-20-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Linux doesn't support negative length objects (including meta data).
Link: http://lkml.kernel.org/r/20180305200730.15812-18-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Linux doesn't support negative length objects.
Link: http://lkml.kernel.org/r/20180305200730.15812-17-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
->offset is free pointer offset from the start of the object, can't be
negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-16-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
/*
* cpu_partial determined the maximum number of objects
* kept in the per cpu partial lists of a processor.
*/
Can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-15-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
->inuse is "the number of bytes in actual use by the object",
can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-14-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Kmem cache alignment can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-13-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
->reserved is either 0 or sizeof(struct rcu_head), can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-12-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Padding length can't be negative.
Link: http://lkml.kernel.org/r/20180305200730.15812-11-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
->max_attr_size is maximum length of every SLAB memcg attribute
ever written. VFS limits those to INT_MAX.
Link: http://lkml.kernel.org/r/20180305200730.15812-10-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
->remote_node_defrag_ratio is in range 0..1000.
This also adds a check and modifies the behavior to return an error
code. Before this patch invalid values were ignored.
Link: http://lkml.kernel.org/r/20180305200730.15812-9-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
struct kmem_cache::size and ::align were always 32-bit.
Out of curiosity I created 4GB kmem_cache, it oopsed with division by 0.
kmem_cache_create(1UL<<32+1) created 1-byte cache as expected.
size_t doesn't work and never did.
Link: http://lkml.kernel.org/r/20180305200730.15812-6-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
kmalloc_size() derives size of kmalloc cache from internal index, which
can't be negative.
Propagate unsignedness a bit.
Link: http://lkml.kernel.org/r/20180305200730.15812-3-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
kmalloc_index() return index into an array of kmalloc kmem caches,
therefore should be unsigned.
Space savings with SLUB on trimmed down .config:
add/remove: 0/1 grow/shrink: 6/56 up/down: 85/-557 (-472)
Function old new delta
calculate_sizes 924 983 +59
on_freelist 589 604 +15
init_cache_random_seq 122 127 +5
ext4_mb_init 1206 1210 +4
slab_pad_check.part 270 271 +1
cpu_partial_store 112 113 +1
usersize_show 28 27 -1
...
new_slab 1871 1837 -34
slab_order 204 - -204
This patch start a series of converting SLUB (mostly) to "unsigned int".
1) Most integers in the code are in fact unsigned entities: array
indexes, lengths, buffer sizes, allocation orders. It is therefore
better to use unsigned variables
2) Some integers in the code are either "size_t" or "unsigned long" for
no reason.
size_t usually comes from people trying to maintain type correctness
and figuring out that "sizeof" operator returns size_t or
memset/memcpy takes size_t so should everything passed to it.
However the number of 4GB+ objects in the kernel is very small. Most,
if not all, dynamically allocated objects with kmalloc() or
kmem_cache_create() aren't actually big. Maintaining wide types
doesn't do anything.
64-bit ops are bigger than 32-bit on our beloved x86_64,
so try to not use 64-bit where it isn't necessary
(read: everywhere where integers are integers not pointers)
3) in case of SLAB allocators, there are additional limitations
*) page->inuse, page->objects are only 16-/15-bit,
*) cache size was always 32-bit
*) slab orders are small, order 20 is needed to go 64-bit on x86_64
(PAGE_SIZE << order)
Basically everything is 32-bit except kmalloc(1ULL<<32) which gets
shortcut through page allocator.
Christoph said:
:
: That changes with large base page size on power and ARM64 f.e. but then
: we do not want to encourage larger allocations through slab anyways.
Link: http://lkml.kernel.org/r/20180305200730.15812-2-adobriyan@gmail.com
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver updates from Arnd Bergmann:
"The main addition this time around is the new ARM "SCMI" framework,
which is the latest in a series of standards coming from ARM to do
power management in a platform independent way.
This has been through many review cycles, and it relies on a rather
interesting way of using the mailbox subsystem, but in the end I
agreed that Sudeep's version was the best we could do after all.
Other changes include:
- the ARM CCN driver is moved out of drivers/bus into drivers/perf,
which makes more sense. Similarly, the performance monitoring
portion of the CCI driver are moved the same way and cleaned up a
little more.
- a series of updates to the SCPI framework
- support for the Mediatek mt7623a SoC in drivers/soc
- support for additional NVIDIA Tegra hardware in drivers/soc
- a new reset driver for Socionext Uniphier
- lesser bug fixes in drivers/soc, drivers/tee, drivers/memory, and
drivers/firmware and drivers/reset across platforms"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (87 commits)
reset: uniphier: add ethernet reset control support for PXs3
reset: stm32mp1: Enable stm32mp1 reset driver
dt-bindings: reset: add STM32MP1 resets
reset: uniphier: add Pro4/Pro5/PXs2 audio systems reset control
reset: imx7: add 'depends on HAS_IOMEM' to fix unmet dependency
reset: modify the way reset lookup works for board files
reset: add support for non-DT systems
clk: scmi: use devm_of_clk_add_hw_provider() API and drop scmi_clocks_remove
firmware: arm_scmi: prevent accessing rate_discrete uninitialized
hwmon: (scmi) return -EINVAL when sensor information is unavailable
amlogic: meson-gx-socinfo: Update soc ids
soc/tegra: pmc: Use the new reset APIs to manage reset controllers
soc: mediatek: update power domain data of MT2712
dt-bindings: soc: update MT2712 power dt-bindings
cpufreq: scmi: add thermal dependency
soc: mediatek: fix the mistaken pointer accessed when subdomains are added
soc: mediatek: add SCPSYS power domain driver for MediaTek MT7623A SoC
soc: mediatek: avoid hardcoded value with bus_prot_mask
dt-bindings: soc: add header files required for MT7623A SCPSYS dt-binding
dt-bindings: soc: add SCPSYS binding for MT7623 and MT7623A SoC
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC platform updates from Arnd Bergmann:
"This release brings up a new platform based on the old ARM9 core: the
Nuvoton NPCM is used as a baseboard management controller, competing
with the better known ASpeed AST2xx series.
Another important change is the addition of ARMv7-A based chips in
mach-stm32. The older parts in this platform are ARMv7-M based
microcontrollers, now they are expanding to general-purpose workloads.
The other changes are the usual defconfig updates to enable additional
drivers, lesser bugfixes. The largest updates as often are the ongoing
OMAP cleanups, but we also have a number of changes for the older PXA
and davinci platforms this time.
For the Renesas shmobile/r-car platform, some new infrastructure is
needed to make the watchdog work correctly.
Supporting Multiprocessing on Allwinner A80 required a significant
amount of new code, but is not doing anything unexpected"
* tag 'armsoc-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (179 commits)
arm: npcm: modify configuration for the NPCM7xx BMC.
MAINTAINERS: update entry for ARM/berlin
ARM: omap2: fix am43xx build without L2X0
ARM: davinci: da8xx: simplify CFGCHIP regmap_config
ARM: davinci: da8xx: fix oops in USB PHY driver due to stack allocated platform_data
ARM: multi_v7_defconfig: add NXP FlexCAN IP support
ARM: multi_v7_defconfig: enable thermal driver for i.MX devices
ARM: multi_v7_defconfig: add RN5T618 PMIC family support
ARM: multi_v7_defconfig: add NXP graphics drivers
ARM: multi_v7_defconfig: add GPMI NAND controller support
ARM: multi_v7_defconfig: add OCOTP driver for NXP SoCs
ARM: multi_v7_defconfig: configure I2C driver built-in
arm64: defconfig: add CONFIG_UNIPHIER_THERMAL and CONFIG_SNI_AVE
ARM: imx: fix imx6sll-only build
ARM: imx: select ARM_CPU_SUSPEND for CPU_IDLE as well
ARM: mxs_defconfig: Re-sync defconfig
ARM: imx_v4_v5_defconfig: Use the generic fsl-asoc-card driver
ARM: imx_v4_v5_defconfig: Re-sync defconfig
arm64: defconfig: enable stmmac ethernet to defconfig
ARM: EXYNOS: Simplify code in coupled CPU idle hot path
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
- Sync dtc to upstream version v1.4.6-9-gaadd0b65c987. This adds a
bunch more warnings (hidden behind W=1).
- Build dtc lexer and parser files instead of using shipped versions.
- Rework overlay apply API to take an FDT as input and apply overlays
in a single step.
- Add a phandle lookup cache. This improves boot time by hundreds of
msec on systems with large DT.
- Add trivial mcp4017/18/19 potentiometers bindings.
- Remove VLA stack usage in DT code.
* tag 'devicetree-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (26 commits)
of: unittest: fix an error code in of_unittest_apply_overlay()
of: unittest: move misplaced function declaration
of: unittest: Remove VLA stack usage
of: overlay: Fix forgotten reference to of_overlay_apply()
of: Documentation: Fix forgotten reference to of_overlay_apply()
of: unittest: local return value variable related cleanups
of: unittest: remove unneeded local return value variables
dt-bindings: trivial: add various mcp4017/18/19 potentiometers
of: unittest: fix an error test in of_unittest_overlay_8()
of: cache phandle nodes to reduce cost of of_find_node_by_phandle()
dt-bindings: rockchip-dw-mshc: use consistent clock names
MAINTAINERS: Add linux/of_*.h headers to appropriate subsystems
scripts: turn off some new dtc warnings by default
scripts/dtc: Update to upstream version v1.4.6-9-gaadd0b65c987
scripts/dtc: generate lexer and parser during build instead of shipping
powerpc: boot: add strrchr function
of: overlay: do not include path in full_name of added nodes
of: unittest: clean up changeset test
arm64/efi: Make strrchr() available to the EFI namespace
ARM: boot: add strrchr function
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo updates from Eric Biederman:
"The work on cleaning up and getting the bugs out of siginfo generation
was largely stalled this round. The progress that was made was the
definition of FPE_FLTUNK. Which is usable to fix many of the cases
where siginfo generation is erroneously generating SI_USER by setting
si_code to 0, that has recently been tagged as FPE_FIXME.
You already have the change by way of the arm64 tree as that
definition was pulled into the arm64 tree to allow fixing the problem
there.
What remains is the second round of fixing for what I thought was a
trivial change to the struct siginfo when put the union in _sigfault
where it belongs. Do to historical reasons 32bit m68k only ensures
that pointers are 2 byte aligned. So I have added a m68k test case
made of BUILD_BUG_ONs to verify I have this fix correct and possibly
catch problems, and I have computed the number of bytes of padding
needed for the _addr_bnd and _addr_pkey cases and just use an array of
characters that size.
For pure paranoia I have written the code so if there is an
architecture out there that does not perform any alignment of
structures it should still work.
With the removal of all of the stale arechitectures this cycle future
work on cleaning up struct siginfo should be much easier. Almost all
of the conflicting si_code definitions have been removed with the
removal of (blackfin, tile, and frv). Plus some of the most difficult
to test cases have simply been removed from the tree.
Which means that with a little luck copy_siginfo_to_user can become a
light weight wrapper around copy_to_user in the next cycle"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
m68k: Verify the offsets in struct siginfo never change.
signal: Correct the offset of si_pkey and si_lower in struct siginfo on m68k
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Add info about loaded kdump kernel into the dump stack header
- Move dump-stack related code from printk.c to lib/dump_stack.c
- Write message about suspending consoles in KERN_INFO log level
* 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: change message to pr_info
printk: move dump stack related code to lib/dump_stack.c
print kdump kernel loaded status in stack dump
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc filesystem updates from Jan Kara:
"udf, ext2, quota, fsnotify fixes & cleanups:
- udf fixes for handling of media without uid/gid
- udf fixes for some corner cases in parsing of volume recognition
sequence
- improvements of fsnotify handling of ENOMEM
- new ioctl to allow setting of watch descriptor id for inotify (for
checkpoint - restart)
- small ext2, reiserfs, quota cleanups"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: Kill an unused extern entry form quota.h
reiserfs: Remove VLA from fs/reiserfs/reiserfs.h
udf: fix potential refcnt problem of nls module
ext2: change return code to -ENOMEM when failing memory allocation
udf: Do not mark possibly inconsistent filesystems as closed
fsnotify: Let userspace know about lost events due to ENOMEM
fanotify: Avoid lost events due to ENOMEM for unlimited queues
udf: Remove never implemented mount options
udf: Update mount option documentation
udf: Provide saner default for invalid uid / gid
udf: Clean up handling of invalid uid/gid
udf: Apply uid/gid mount options also to new inodes & chown
udf: Ignore [ug]id=ignore mount options
udf: Fix handling of Partition Descriptors
udf: Unify common handling of descriptors
udf: Convert descriptor index definitions to enum
udf: Allow volume descriptor sequence to be terminated by unrecorded block
udf: Simplify handling of Volume Descriptor Pointers
udf: Fix off-by-one in volume descriptor sequence length
inotify: Extend ioctl to allow to request id of new watch descriptor
|
|
Pull nfsd updates from Bruce Fields:
"Chuck Lever did a bunch of work on nfsd tracepoints, on RDMA, and on
server xdr decoding (with an eye towards eliminating a data copy in
the RDMA case).
I did some refactoring of the delegation code in preparation for
eliminating some delegation self-conflicts and implementing write
delegations"
* tag 'nfsd-4.17' of git://linux-nfs.org/~bfields/linux: (40 commits)
nfsd: fix incorrect umasks
sunrpc: remove incorrect HMAC request initialization
NFSD: Clean up legacy NFS SYMLINK argument XDR decoders
NFSD: Clean up legacy NFS WRITE argument XDR decoders
nfsd: Trace NFSv4 COMPOUND execution
nfsd: Add I/O trace points in the NFSv4 read proc
nfsd: Add I/O trace points in the NFSv4 write path
nfsd: Add "nfsd_" to trace point names
nfsd: Record request byte count, not count of vectors
nfsd: Fix NFSD trace points
svc: Report xprt dequeue latency
sunrpc: Report per-RPC execution stats
sunrpc: Re-purpose trace_svc_process
sunrpc: Save remote presentation address in svc_xprt for trace events
sunrpc: Simplify trace_svc_recv
sunrpc: Simplify do_enqueue tracing
sunrpc: Move trace_svc_xprt_dequeue()
sunrpc: Update show_svc_xprt_flags() to include recently added flags
svc: Simplify ->xpo_secure_port
sunrpc: Remove unneeded pointer dereference
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs update from Jaegeuk Kim:
"In this round, we've mainly focused on performance tuning and critical
bug fixes occurred in low-end devices. Sheng Yong introduced
lost_found feature to keep missing files during recovery instead of
thrashing them. We're preparing coming fsverity implementation. And,
we've got more features to communicate with users for better
performance. In low-end devices, some memory-related issues were
fixed, and subtle race condtions and corner cases were addressed as
well.
Enhancements:
- large nat bitmaps for more free node ids
- add three block allocation policies to pass down write hints given by user
- expose extension list to user and introduce hot file extension
- tune small devices seamlessly for low-end devices
- set readdir_ra by default
- give more resources under gc_urgent mode regarding to discard and cleaning
- introduce fsync_mode to enforce posix or not
- nowait aio support
- add lost_found feature to keep dangling inodes
- reserve bits for future fsverity feature
- add test_dummy_encryption for FBE
Bug fixes:
- don't use highmem for dentry pages
- align memory boundary for bitops
- truncate preallocated blocks in write errors
- guarantee i_times on fsync call
- clear CP_TRIMMED_FLAG correctly
- prevent node chain loop during recovery
- avoid data race between atomic write and background cleaning
- avoid unnecessary selinux violation warnings on resgid option
- GFP_NOFS to avoid deadlock in quota and read paths
- fix f2fs_skip_inode_update to allow i_size recovery
In addition to the above, there are several minor bug fixes and clean-ups"
* tag 'f2fs-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (50 commits)
f2fs: remain written times to update inode during fsync
f2fs: make assignment of t->dentry_bitmap more readable
f2fs: truncate preallocated blocks in error case
f2fs: fix a wrong condition in f2fs_skip_inode_update
f2fs: reserve bits for fs-verity
f2fs: Add a segment type check in inplace write
f2fs: no need to initialize zero value for GFP_F2FS_ZERO
f2fs: don't track new nat entry in nat set
f2fs: clean up with F2FS_BLK_ALIGN
f2fs: check blkaddr more accuratly before issue a bio
f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read
f2fs: introduce a new mount option test_dummy_encryption
f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
f2fs: release locks before return in f2fs_ioc_gc_range()
f2fs: align memory boundary for bitops
f2fs: remove unneeded set_cold_node()
f2fs: add nowait aio support
f2fs: wrap all options with f2fs_sb_info.mount_opt
f2fs: Don't overwrite all types of node to keep node chain
f2fs: introduce mount option for fsync mode
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual drivers: arcmsr, qla2xx, lpfc,
ufs, mpt3sas, hisi_sas.
In addition we have removed several really old drivers: sym53c416,
NCR53c406a, fdomain, fdomain_cs and removed the old scsi_module.c
initialization from all remaining drivers.
Plus an assortment of bug fixes, initialization errors and other minor
fixes"
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (168 commits)
scsi: ufs: Add support for Auto-Hibernate Idle Timer
scsi: ufs: sysfs: reworking of the rpm_lvl and spm_lvl entries
scsi: qla2xxx: fx00 copypaste typo
scsi: qla2xxx: fix error message on <qla2400
scsi: smartpqi: update driver version
scsi: smartpqi: workaround fw bug for oq deletion
scsi: arcmsr: Change driver version to v1.40.00.05-20180309
scsi: arcmsr: Sleep to avoid CPU stuck too long for waiting adapter ready
scsi: arcmsr: Handle adapter removed due to thunderbolt cable disconnection.
scsi: arcmsr: Rename ACB_F_BUS_HANG_ON to ACB_F_ADAPTER_REMOVED for adapter hot-plug
scsi: qla2xxx: Update driver version to 10.00.00.06-k
scsi: qla2xxx: Fix Async GPN_FT for FCP and FC-NVMe scan
scsi: qla2xxx: Cleanup code to improve FC-NVMe error handling
scsi: qla2xxx: Fix FC-NVMe IO abort during driver reset
scsi: qla2xxx: Fix retry for PRLI RJT with reason of BUSY
scsi: qla2xxx: Remove nvme_done_list
scsi: qla2xxx: Return busy if rport going away
scsi: qla2xxx: Fix n2n_ae flag to prevent dev_loss on PDB change
scsi: qla2xxx: Add FC-NVMe abort processing
scsi: qla2xxx: Add changes for devloss timeout in driver
...
|
|
Pull block layer updates from Jens Axboe:
"It's a pretty quiet round this time, which is nice. This contains:
- series from Bart, cleaning up the way we set/test/clear atomic
queue flags.
- series from Bart, fixing races between gendisk and queue
registration and removal.
- set of bcache fixes and improvements from various folks, by way of
Michael Lyle.
- set of lightnvm updates from Matias, most of it being the 1.2 to
2.0 transition.
- removal of unused DIO flags from Nikolay.
- blk-mq/sbitmap memory ordering fixes from Omar.
- divide-by-zero fix for BFQ from Paolo.
- minor documentation patches from Randy.
- timeout fix from Tejun.
- Alpha "can't write a char atomically" fix from Mikulas.
- set of NVMe fixes by way of Keith.
- bsg and bsg-lib improvements from Christoph.
- a few sed-opal fixes from Jonas.
- cdrom check-disk-change deadlock fix from Maurizio.
- various little fixes, comment fixes, etc from various folks"
* tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits)
blk-mq: Directly schedule q->timeout_work when aborting a request
blktrace: fix comment in blktrace_api.h
lightnvm: remove function name in strings
lightnvm: pblk: remove some unnecessary NULL checks
lightnvm: pblk: don't recover unwritten lines
lightnvm: pblk: implement 2.0 support
lightnvm: pblk: implement get log report chunk
lightnvm: pblk: rename ppaf* to addrf*
lightnvm: pblk: check for supported version
lightnvm: implement get log report chunk helpers
lightnvm: make address conversions depend on generic device
lightnvm: add support for 2.0 address format
lightnvm: normalize geometry nomenclature
lightnvm: complete geo structure with maxoc*
lightnvm: add shorten OCSSD version in geo
lightnvm: add minor version to generic geometry
lightnvm: simplify geometry structure
lightnvm: pblk: refactor init/exit sequences
lightnvm: Avoid validation of default op value
lightnvm: centralize permission check for lightnvm ioctl
...
|
|
Pull EDAC updates from Borislav Petkov:
"Noteworthy is the NVDIMM support:
- NVDIMM support to EDAC (Tony Luck)
- misc fixes"
* tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC, sb_edac: Remove variable length array usage
EDAC, skx_edac: Detect non-volatile DIMMs
firmware, DMI: Add function to look up a handle and return DIMM size
acpi, nfit: Add function to look up nvdimm device and provide SMBIOS handle
EDAC: Add new memory type for non-volatile DIMMs
EDAC: Drop duplicated array of strings for memory type names
EDAC, layerscape: Allow building for LS1021A
|
|
In the effort to remove all VLAs from the kernel[1], it is desirable to
build with -Wvla. However, this warning is overly pessimistic, in that
it is only happy with stack array sizes that are declared as constant
expressions, and not constant values. One case of this is the
evaluation of the max() macro which, due to its construction, ends up
converting constant expression arguments into a constant value result.
All attempts to rewrite this macro with __builtin_constant_p() failed
with older compilers (e.g. gcc 4.4)[2]. However, Martin Uecker,
constructed[3] a mind-shattering solution that works everywhere.
Cthulhu fhtagn!
This patch updates the min()/max() macros to evaluate to a constant
expression when called on constant expression arguments. This removes
several false-positive stack VLA warnings from an x86 allmodconfig build
when -Wvla is added:
$ diff -u before.txt after.txt | grep ^-
-drivers/input/touchscreen/cyttsp4_core.c:871:2: warning: ISO C90 forbids variable length array ‘ids’ [-Wvla]
-fs/btrfs/tree-checker.c:344:4: warning: ISO C90 forbids variable length array ‘namebuf’ [-Wvla]
-lib/vsprintf.c:747:2: warning: ISO C90 forbids variable length array ‘sym’ [-Wvla]
-net/ipv4/proc.c:403:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
-net/ipv6/proc.c:198:2: warning: ISO C90 forbids variable length array ‘buff’ [-Wvla]
-net/ipv6/proc.c:218:2: warning: ISO C90 forbids variable length array ‘buff64’ [-Wvla]
This also updates two cases where different enums were being compared
and explicitly casts them to int (which matches the old side-effect of
the single-evaluation code): one in tpm/tpm_tis_core.h, and one in
drm/drm_color_mgmt.c.
[1] https://lkml.org/lkml/2018/3/7/621
[2] https://lkml.org/lkml/2018/3/10/170
[3] https://lkml.org/lkml/2018/3/20/845
Co-Developed-by: Linus Torvalds <torvalds@linux-foundation.org>
Co-Developed-by: Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- new driver for PhoenixRC Flight Controller Adapter
- new driver for RAVE SP Power button
- fixes for autosuspend-related deadlocks in a few unput USB dirvers
- support for 2nd wheel in ATech PS/2 mouse
- fix for ALPS trackpoint detection on Thinkpad L570 and Latitude 7370
- bunch of cleanups in various in PS/2 protocols
- other assorted changes and fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (35 commits)
Input: i8042 - enable MUX on Sony VAIO VGN-CS series to fix touchpad
Input: stmfts, s6sy761 - update my e-mail
Input: stmfts - use async probe & suspend/resume to avoid 2s delay
Input: ALPS - fix TrackStick detection on Thinkpad L570 and Latitude 7370
Input: xpad - add PDP device id 0x02a4
Input: alps - report pressure of v3 and v7 trackstick
Input: pxrc - new driver for PhoenixRC Flight Controller Adapter
Input: usbtouchscreen - do not rely on input_dev->users
Input: usbtouchscreen - fix deadlock in autosuspend
Input: pegasus_notetaker - do not rely on input_dev->users
Input: pagasus_notetaker - fix deadlock in autosuspend
Input: synaptics_usb - do not rely on input_dev->users
Input: synaptics_usb - fix deadlock in autosuspend
Input: gpio-keys - add support for wakeup event action
Input: appletouch - use true and false for boolean values
Input: silead - add Chuwi Hi8 support
Input: analog - use get_cycles() on PPC
Input: stmpe-keypad - remove VLA usage
Input: i8042 - add Lenovo ThinkPad L460 to i8042 reset list
Input: add RAVE SP Powerbutton driver
...
|
|
This change updates the mlx5 interface to create mkey
on the device.
The updates in the command mailbox include increasing the
access mode type field to 5 bits in order to support additional
types such as MLX5_MKC_ACCESS_MODE_MEMIC which represents device
memory access type and will be used when registering MR on allocated
device memory.
All the places that use the old access mode format are adjusted as
well.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This patch adds the mlx5_ib driver implementation for the device
memory allocation API.
It implements the ib_device callbacks for allocation and deallocation
operations as well as a new mmap command support which allows mapping
an allocated device memory to a VMA.
The change also adds reporting of device memory maximum size and
alignment parameters reported in device capabilities.
The allocation/deallocation operations are using new firmware
commands to allocate MEMIC memory on the device.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
This patch adds querying of device memory capabilities by the mlx5_core
driver during initialization.
Device memory capabilities is a new capability type and structure
which contains the necessary data that is needed for future device
memory allocation.
The presence of this new capabilities struct is indicated in the
general capabilities struct which is queried first by the driver.
If the presence bit is set, the driver will also query the new
capabilities struct and save it in the device context.
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID updates from Jiri Kosina:
- 3rd generation Wacom Intuos BT device support from Aaron Armstrong
Skomra
- support for NSG-MR5U and NSG-MR7U devices from Todd Kelner
- multitouch Razer Blade Stealth support from Benjamin Tissoires
- Elantech touchpad support from Alexandrov Stansilav
- a few other scattered-around fixes and cleanups to drivers and
generic code
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (31 commits)
HID: google: Enable PM Full On mode when adjusting backlight
HID: google: add google hammer HID driver
HID: core: reset the quirks before calling probe again
HID: multitouch: do not set HID_QUIRK_NO_INIT_REPORTS
HID: core: remove the need for HID_QUIRK_NO_EMPTY_INPUT
HID: use BIT() macro for quirks too
HID: use BIT macro instead of plain integers for flags
HID: multitouch: remove dead zones of Razer Blade Stealth
HID: multitouch: export a quirk for the button handling of touchpads
HID: usbhid: extend the polling interval configuration to keyboards
HID: ntrig: document sysfs interface
HID: wacom: wacom_wac_collection() is local to wacom_wac.c
HID: wacom: generic: add the "Report Valid" usage
HID: wacom: generic: Support multiple tools per report
HID: wacom: Add support for 3rd generation Intuos BT
HID: core: rewrite the hid-generic automatic unbind
HID: sony: Add touchpad support for NSG-MR5U and NSG-MR7U remotes
HID: hid-multitouch: Use true and false for boolean values
HID: hid-ntrig: use true and false for boolean values
HID: logitech-hidpp: document sysfs interface
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"This became a large update. The changes are scattered widely, and the
majority of them are attributed to ASoC componentization. The gitk
output made me dizzy, but it's slightly better than London tube.
OK, below are some highlights:
- Continued hardening works in ALSA PCM core; most of the existing
syzkaller reports should have been covered.
- USB-audio got the initial USB Audio Class 3 support, as well as
UAC2 jack detection support and more DSD-device support.
- ASoC componentization: finally each individual driver was converted
to components framework, which is more future-proof for further
works. Most of conversations were systematic.
- Lots of fixes for Intel Baytrail / Cherrytrail devices with Realtek
codecs, typically tablets and small PCs.
- Fixes / cleanups for Samsung Odroid systems
- Cleanups in Freescale SSI driver
- New ASoC drivers:
* AKM AK4458 and AK5558 codecs
* A few AMD based machine drivers
* Intel Kabylake machine drivers
* Maxim MAX9759 codec
* Motorola CPCAP codec
* Socionext Uniphier SoCs
* TI PCM1789 and TDA7419 codecs
- Retirement of Blackfin drivers along with architecture removal"
* tag 'sound-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (497 commits)
ALSA: pcm: Fix UAF at PCM release via PCM timer access
ALSA: usb-audio: silence a static checker warning
ASoC: tscs42xx: Remove owner assignment from i2c_driver
ASoC: mediatek: remove "simple-mfd" in the example
ASoC: cpcap: replace codec to component
ASoC: Intel: bytcr_rt5651: don't use codec anymore
ASoC: amd: don't use codec anymore
ALSA: usb-audio: fix memory leak on cval
ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
ASoC: topology: Fix kcontrol name string handling
ALSA: aloop: Mark paused device as inactive
ALSA: usb-audio: update clock valid control
ALSA: usb-audio: UAC2 jack detection
ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
ALSA: pcm: Avoid potential races between OSS ioctls and read/write
ALSA: usb-audio: Integrate native DSD support for ITF-USB based DACs.
ALSA: usb-audio: FIX native DSD support for TEAC UD-501 DAC
ALSA: usb-audio: Add native DSD support for Luxman DA-06
ALSA: usb-audio: fix uac control query argument
ASoC: nau8824: recover system clock when device changes
...
|
|
Pull dma-mapping updates from Christoph Hellwig:
"Very light this round as the interesting dma mapping changes went
through the x86 tree.
This just provides proper stubs for architectures not supporting dma
(Geert Uytterhoeven)"
* tag 'dma-mapping-4.17' of git://git.infradead.org/users/hch/dma-mapping:
usb: gadget: Add NO_DMA dummies for DMA mapping API
scsi: Add NO_DMA dummies for SCSI DMA mapping API
mm: Add NO_DMA dummies for DMA pool API
dma-coherent: Add NO_DMA dummies for managed DMA API
dma-mapping: Convert NO_DMA get_dma_ops() into a real dummy
|
|
Push the decision whether or not to stop the tick somewhat deeper
into the idle loop.
Stopping the tick upfront leads to unpleasant outcomes in case the
idle governor doesn't agree with the nohz code on the duration of the
upcoming idle period. Specifically, if the tick has been stopped and
the idle governor predicts short idle, the situation is bad regardless
of whether or not the prediction is accurate. If it is accurate, the
tick has been stopped unnecessarily which means excessive overhead.
If it is not accurate, the CPU is likely to spend too much time in
the (shallow, because short idle has been predicted) idle state
selected by the governor [1].
As the first step towards addressing this problem, change the code
to make the tick stopping decision inside of the loop in do_idle().
In particular, do not stop the tick in the cpu_idle_poll() code path.
Also don't do that in tick_nohz_irq_exit() which doesn't really have
enough information on whether or not to stop the tick.
Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1]
Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
Prepare the scheduler tick code for reworking the idle loop to
avoid stopping the tick in some cases.
The idea is to split the nohz idle entry call to decouple the idle
time stats accounting and preparatory work from the actual tick stop
code, in order to later be able to delay the tick stop once we reach
more power-knowledgeable callers.
Move away the tick_nohz_start_idle() invocation from
__tick_nohz_idle_enter(), rename the latter to
__tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick()
as a wrapper around it for calling it from the outside.
Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead
of calling the entire __tick_nohz_idle_enter(), add another wrapper
disabling and enabling interrupts around tick_nohz_idle_stop_tick()
and make the current callers of tick_nohz_idle_enter() call it too
to retain their current functionality.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij:
"This is the bulk of GPIO changes for the v4.17 kernel cycle:
New drivers:
- Nintendo Wii GameCube GPIO, known as "Hollywood"
- Raspberry Pi mailbox service GPIO expander
- Spreadtrum main SC9860 SoC and IEC GPIO controllers.
Improvements:
- Implemented .get_multiple() callback for most of the
high-performance industrial GPIO cards for the ISA bus.
- ISA GPIO drivers now select the ISA_BUS_API instead of depending on
it. This is merged with the same pattern for all the ISA drivers
and some other Kconfig cleanups related to this.
Cleanup:
- Delete the TZ1090 GPIO drivers following the deletion of this SoC
from the ARM tree.
- Move the documentation over to driver-api to conform with the rest
of the kernel documentation build.
- Continue to make the GPIO drivers include only
<linux/gpio/driver.h> and not the too broad <linux/gpio.h> that we
want to get rid of.
- Managed to remove VLA allocation from two drivers pending more
fixes in this area for the next merge window.
- Misc janitorial fixes"
* tag 'gpio-v4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (77 commits)
gpio: Add Spreadtrum PMIC EIC driver support
gpio: Add Spreadtrum EIC driver support
dt-bindings: gpio: Add Spreadtrum EIC controller documentation
gpio: ath79: Fix potential NULL dereference in ath79_gpio_probe()
pinctrl: qcom: Don't allow protected pins to be requested
gpiolib: Support 'gpio-reserved-ranges' property
gpiolib: Change bitmap allocation to kmalloc_array
gpiolib: Extract mask allocation into subroutine
dt-bindings: gpio: Add a gpio-reserved-ranges property
gpio: mockup: fix a potential crash when creating debugfs entries
gpio: pca953x: add compatibility for pcal6524 and pcal9555a
gpio: dwapb: Add support for a bus clock
gpio: Remove VLA from xra1403 driver
gpio: Remove VLA from MAX3191X driver
gpio: ws16c48: Implement get_multiple callback
gpio: gpio-mm: Implement get_multiple callback
gpio: 104-idi-48: Implement get_multiple callback
gpio: 104-dio-48e: Implement get_multiple callback
gpio: pcie-idio-24: Implement get_multiple/set_multiple callbacks
gpio: pci-idio-16: Implement get_multiple callback
...
|
|
It may be useful for an architecture to override the definitions of the
COMPAT_SYSCALL_DEFINE0() and __COMPAT_SYSCALL_DEFINEx() macros in
<linux/compat.h>, in particular to use a different calling convention
for syscalls. This patch provides a mechanism to do so, based on the
previously introduced CONFIG_ARCH_HAS_SYSCALL_WRAPPER. If it is enabled,
<asm/sycall_wrapper.h> is included in <linux/compat.h> and may be used
to define the macros mentioned above. Moreover, as the syscall calling
convention may be different if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is set,
the compat syscall function prototypes in <linux/compat.h> are #ifndef'd
out in that case.
As some of the syscalls and/or compat syscalls may not be present,
the COND_SYSCALL() and COND_SYSCALL_COMPAT() macros in kernel/sys_ni.c
as well as the SYS_NI() and COMPAT_SYS_NI() macros in
kernel/time/posix-stubs.c can be re-defined in <asm/syscall_wrapper.h> iff
CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-5-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
64-bit syscalls
Let's make use of ARCH_HAS_SYSCALL_WRAPPER=y on pure 64-bit x86-64 systems:
Each syscall defines a stub which takes struct pt_regs as its only
argument. It decodes just those parameters it needs, e.g:
asmlinkage long sys_xyzzy(const struct pt_regs *regs)
{
return SyS_xyzzy(regs->di, regs->si, regs->dx);
}
This approach avoids leaking random user-provided register content down
the call chain.
For example, for sys_recv() which is a 4-parameter syscall, the assembly
now is (in slightly reordered fashion):
<sys_recv>:
callq <__fentry__>
/* decode regs->di, ->si, ->dx and ->r10 */
mov 0x70(%rdi),%rdi
mov 0x68(%rdi),%rsi
mov 0x60(%rdi),%rdx
mov 0x38(%rdi),%rcx
[ SyS_recv() is automatically inlined by the compiler,
as it is not [yet] used anywhere else ]
/* clear %r9 and %r8, the 5th and 6th args */
xor %r9d,%r9d
xor %r8d,%r8d
/* do the actual work */
callq __sys_recvfrom
/* cleanup and return */
cltq
retq
The only valid place in an x86-64 kernel which rightfully calls
a syscall function on its own -- vsyscall -- needs to be modified
to pass struct pt_regs onwards as well.
To keep the syscall table generation working independent of
SYSCALL_PTREGS being enabled, the stubs are named the same as the
"original" syscall stubs, i.e. sys_*().
This patch is based on an original proof-of-concept
| From: Linus Torvalds <torvalds@linux-foundation.org>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
and was split up and heavily modified by me, in particular to base it on
ARCH_HAS_SYSCALL_WRAPPER, to limit it to 64-bit-only for the time being,
and to update the vsyscall to the new calling convention.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
It may be useful for an architecture to override the definitions of the
SYSCALL_DEFINE0() and __SYSCALL_DEFINEx() macros in <linux/syscalls.h>,
in particular to use a different calling convention for syscalls.
This patch provides a mechanism to do so: It introduces
CONFIG_ARCH_HAS_SYSCALL_WRAPPER. If it is enabled, <asm/sycall_wrapper.h>
is included in <linux/syscalls.h> and may be used to define the macros
mentioned above. Moreover, as the syscall calling convention may be
different if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is set, the syscall function
prototypes in <linux/syscalls.h> are #ifndef'd out in that case.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-3-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull a few small generic code cleanups.
|
|
Pull Razer Blade Stealth support improvement and a few generic cleanups
|