Age | Commit message (Collapse) | Author |
|
Clean up unmap_and_move_huge_page() by converting move_hugetlb_state() to
take in folios.
[akpm@linux-foundation.org: fix CONFIG_HUGETLB_PAGE=n build]
Link: https://lkml.kernel.org/r/20221101223059.460937-10-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bui Quang Minh <minhquangbui99@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Continue to use a folio inside free_huge_page() by converting
hugetlb_cgroup_uncharge_page*() to folios.
Link: https://lkml.kernel.org/r/20221101223059.460937-8-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bui Quang Minh <minhquangbui99@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Cleans up intermediate page to folio conversion code in
hugetlb_cgroup_migrate() by changing its arguments from pages to folios.
Link: https://lkml.kernel.org/r/20221101223059.460937-5-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bui Quang Minh <minhquangbui99@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Allows __prep_new_huge_page() to operate on a folio by converting
set_hugetlb_cgroup*() to take in a folio.
Link: https://lkml.kernel.org/r/20221101223059.460937-4-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bui Quang Minh <minhquangbui99@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Introduce folios in __remove_hugetlb_page() by converting
hugetlb_cgroup_from_page() to use folios.
Also gets rid of unsed hugetlb_cgroup_from_page_resv() function.
Link: https://lkml.kernel.org/r/20221101223059.460937-3-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bui Quang Minh <minhquangbui99@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "convert hugetlb_cgroup helper functions to folios", v2.
This patch series continues the conversion of hugetlb code from being
managed in pages to folios by converting many of the hugetlb_cgroup helper
functions to use folios. This allows the core hugetlb functions to pass
in a folio to these helper functions.
This patch (of 9);
Change __set_hugetlb_cgroup() to use folios so it is explicit that the
function operates on a head page.
Link: https://lkml.kernel.org/r/20221101223059.460937-1-sidhartha.kumar@oracle.com
Link: https://lkml.kernel.org/r/20221101223059.460937-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bui Quang Minh <minhquangbui99@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Direct reclaim stats are useful for identifying a potential source for
application latency, as well as spotting issues with kswapd. However,
khugepaged currently distorts the picture: as a kernel thread it doesn't
impose allocation latencies on userspace, and it explicitly opts out of
kswapd reclaim. Its activity showing up in the direct reclaim stats is
misleading. Counting it as kswapd reclaim could also cause confusion when
trying to understand actual kswapd behavior.
Break out khugepaged from the direct reclaim counters into new
pgsteal_khugepaged, pgdemote_khugepaged, pgscan_khugepaged counters.
Test with a huge executable (CONFIG_READ_ONLY_THP_FOR_FS):
pgsteal_kswapd 1342185
pgsteal_direct 0
pgsteal_khugepaged 3623
pgscan_kswapd 1345025
pgscan_direct 0
pgscan_khugepaged 3623
Link: https://lkml.kernel.org/r/20221026180133.377671-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Eric Bergen <ebergen@meta.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Cannot call memory_failure() directly from the fault handler because
mmap_lock (and others) are held.
It is important, but not urgent, to mark the source page as h/w poisoned
and unmap it from other tasks.
Use memory_failure_queue() to request a call to memory_failure() for the
page with the error.
Also provide a stub version for CONFIG_MEMORY_FAILURE=n
Link: https://lkml.kernel.org/r/20221021200120.175753-3-tony.luck@intel.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Copy-on-write poison recovery", v3.
Part 1 deals with the process that triggered the copy on write fault with
a store to a shared read-only page. That process is send a SIGBUS with
the usual machine check decoration to specify the virtual address of the
lost page, together with the scope.
Part 2 sets up to asynchronously take the page with the uncorrected error
offline to prevent additional machine check faults. H/t to Miaohe Lin
<linmiaohe@huawei.com> and Shuai Xue <xueshuai@linux.alibaba.com> for
pointing me to the existing function to queue a call to memory_failure().
On x86 there is some duplicate reporting (because the error is also
signalled by the memory controller as well as by the core that triggered
the machine check). Console logs look like this:
This patch (of 2):
If the kernel is copying a page as the result of a copy-on-write
fault and runs into an uncorrectable error, Linux will crash because
it does not have recovery code for this case where poison is consumed
by the kernel.
It is easy to set up a test case. Just inject an error into a private
page, fork(2), and have the child process write to the page.
I wrapped that neatly into a test at:
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/ras-tools.git
just enable ACPI error injection and run:
# ./einj_mem-uc -f copy-on-write
Add a new copy_user_highpage_mc() function that uses copy_mc_to_kernel()
on architectures where that is available (currently x86 and powerpc).
When an error is detected during the page copy, return VM_FAULT_HWPOISON
to caller of wp_page_copy(). This propagates up the call stack. Both x86
and powerpc have code in their fault handler to deal with this code by
sending a SIGBUS to the application.
Note that this patch avoids a system crash and signals the process that
triggered the copy-on-write action. It does not take any action for the
memory error that is still in the shared page. To handle that a call to
memory_failure() is needed. But this cannot be done from wp_page_copy()
because it holds mmap_lock(). Perhaps the architecture fault handlers
can deal with this loose end in a subsequent patch?
On Intel/x86 this loose end will often be handled automatically because
the memory controller provides an additional notification of the h/w
poison in memory, the handler for this will call memory_failure(). This
isn't a 100% solution. If there are multiple errors, not all may be
logged in this way.
[tony.luck@intel.com: add call to kmsan_unpoison_memory(), per Miaohe Lin]
Link: https://lkml.kernel.org/r/20221031201029.102123-2-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-1-tony.luck@intel.com
Link: https://lkml.kernel.org/r/20221021200120.175753-2-tony.luck@intel.com
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The percpu_counter is used for scenarios where performance is more
important than the accuracy. For percpu_counter users, who want more
accurate information in their slowpath, percpu_counter_sum is provided
which traverses all the online CPUs to accumulate the data. The reason it
only needs to traverse online CPUs is because percpu_counter does
implement CPU offline callback which syncs the local data of the offlined
CPU.
However there is a small race window between the online CPUs traversal of
percpu_counter_sum and the CPU offline callback. The offline callback has
to traverse all the percpu_counters on the system to flush the CPU local
data which can be a lot. During that time, the CPU which is going offline
has already been published as offline to all the readers. So, as the
offline callback is running, percpu_counter_sum can be called for one
counter which has some state on the CPU going offline. Since
percpu_counter_sum only traverses online CPUs, it will skip that specific
CPU and the offline callback might not have flushed the state for that
specific percpu_counter on that offlined CPU.
Normally this is not an issue because percpu_counter users can deal with
some inaccuracy for small time window. However a new user i.e. mm_struct
on the cleanup path wants to check the exact state of the percpu_counter
through check_mm(). For such users, this patch introduces
percpu_counter_sum_all() which traverses all possible CPUs and it is used
in fork.c:check_mm() to avoid the potential race.
This issue is exposed by the later patch "mm: convert mm's rss stats into
percpu_counter".
Link: https://lkml.kernel.org/r/20221109012011.881058-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently mm_struct maintains rss_stats which are updated on page fault
and the unmapping codepaths. For page fault codepath the updates are
cached per thread with the batch of TASK_RSS_EVENTS_THRESH which is 64.
The reason for caching is performance for multithreaded applications
otherwise the rss_stats updates may become hotspot for such applications.
However this optimization comes with the cost of error margin in the rss
stats. The rss_stats for applications with large number of threads can be
very skewed. At worst the error margin is (nr_threads * 64) and we have a
lot of applications with 100s of threads, so the error margin can be very
high. Internally we had to reduce TASK_RSS_EVENTS_THRESH to 32.
Recently we started seeing the unbounded errors for rss_stats for specific
applications which use TCP rx0cp. It seems like vm_insert_pages()
codepath does not sync rss_stats at all.
This patch converts the rss_stats into percpu_counter to convert the error
margin from (nr_threads * 64) to approximately (nr_cpus ^ 2). However
this conversion enable us to get the accurate stats for situations where
accuracy is more important than the cpu cost.
This patch does not make such tradeoffs - we can just use
percpu_counter_add_local() for the updates and percpu_counter_sum() (or
percpu_counter_sync() + percpu_counter_read) for the readers. At the
moment the readers are either procfs interface, oom_killer and memory
reclaim which I think are not performance critical and should be ok with
slow read. However I think we can make that change in a separate patch.
Link: https://lkml.kernel.org/r/20221024052841.3291983-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The networking programs typically don't require CAP_PERFMON, but through kfuncs
like bpf_cast_to_kern_ctx() they can access memory through PTR_TO_BTF_ID. In
such case enforce CAP_PERFMON.
Also make sure that only GPL programs can access kernel data structures.
All kfuncs require GPL already.
Also remove allow_ptr_to_map_access. It's the same as allow_ptr_leaks and
different name for the same check only causes confusion.
Fixes: fd264ca02094 ("bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx")
Fixes: 50c6b8a9aea2 ("selftests/bpf: Add a test for btf_type_tag "percpu"")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221125220617.26846-1-alexei.starovoitov@gmail.com
|
|
Merge my series [1] to deprecate the SLOB allocator.
- Renames CONFIG_SLOB to CONFIG_SLOB_DEPRECATED with deprecation notice.
- The recommended replacement is CONFIG_SLUB, optionally with the new
CONFIG_SLUB_TINY tweaks for systems with 16MB or less RAM.
- Use cases that stopped working with CONFIG_SLUB_TINY instead of SLOB
should be reported to linux-mm@kvack.org and slab maintainers,
otherwise SLOB will be removed in few cycles.
[1] https://lore.kernel.org/all/20221121171202.22080-1-vbabka@suse.cz/
|
|
SLUB gets most of its scalability by percpu slabs. However for
CONFIG_SLUB_TINY the goal is minimal memory overhead, not scalability.
Thus, #ifdef out the whole kmem_cache_cpu percpu structure and
associated code. Additionally to the slab page savings, this reduces
percpu allocator usage, and code size.
This change builds on recent commit c7323a5ad078 ("mm/slub: restrict
sysfs validation to debug caches and make it safe"), as caches with
enabled debugging also avoid percpu slabs and all allocations and
freeing ends up working with the partial list. With a bit more
refactoring by the preceding patches, use the same code paths with
CONFIG_SLUB_TINY.
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
|
|
|
|
license_is_gpl_compatible"
It causes build failures with unusual CC/HOSTCC combinations.
Quoting
https://lkml.kernel.org/r/A222B1E6-69B8-4085-AD1B-27BDB72CA971@goldelico.com:
HOSTCC scripts/mod/modpost.o - due to target missing
In file included from include/linux/string.h:5,
from scripts/mod/../../include/linux/license.h:5,
from scripts/mod/modpost.c:24:
include/linux/compiler.h:246:10: fatal error: asm/rwonce.h: No such file or directory
246 | #include <asm/rwonce.h>
| ^~~~~~~~~~~~~~
compilation terminated.
...
The problem is that HOSTCC is not necessarily the same compiler or even
architecture as CC and pulling in <linux/compiler.h> or <asm/rwonce.h>
files indirectly isn't a good idea then.
My toolchain is providing HOSTCC = gcc (MacPorts) and CC = arm-linux-gnueabihf
(built from gcc source) and all running on Darwin.
If I change the include to <string.h> I can then "HOSTCC scripts/mod/modpost.c"
but then it fails for "CC kernel/module/main.c" not finding <string.h>:
CC kernel/module/main.o - due to target missing
In file included from kernel/module/main.c:43:0:
./include/linux/license.h:5:20: fatal error: string.h: No such file or directory
#include <string.h>
^
compilation terminated.
Reported-by: "H. Nikolaus Schaller" <hns@goldelico.com>
Cc: Sam James <sam@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since commit 70cbc3cc78a99 ("mm: gup: fix the fast GUP race against THP
collapse"), the lockless_pages_from_mm() fastpath rechecks the pmd_t to
ensure that the page table was not removed by khugepaged in between.
However, lockless_pages_from_mm() still requires that the page table is
not concurrently freed. Fix it by sending IPIs (if the architecture uses
semi-RCU-style page table freeing) before freeing/reusing page tables.
Link: https://lkml.kernel.org/r/20221129154730.2274278-2-jannh@google.com
Link: https://lkml.kernel.org/r/20221128180252.1684965-2-jannh@google.com
Link: https://lkml.kernel.org/r/20221125213714.4115729-2-jannh@google.com
Fixes: ba76149f47d8 ("thp: khugepaged")
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When running as a Xen PV guests commit eed9a328aa1a ("mm: x86: add
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in
pmdp_test_and_clear_young():
BUG: unable to handle page fault for address: ffff8880083374d0
#PF: supervisor write access in kernel mode
#PF: error_code(0x0003) - permissions violation
PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065
Oops: 0003 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1
RIP: e030:pmdp_test_and_clear_young+0x25/0x40
This happens because the Xen hypervisor can't emulate direct writes to
page table entries other than PTEs.
This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young()
similar to arch_has_hw_pte_young() and test that instead of
CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG.
Link: https://lkml.kernel.org/r/20221123064510.16225-1-jgross@suse.com
Fixes: eed9a328aa1a ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Yu Zhao <yuzhao@google.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: David Hildenbrand <david@redhat.com> [core changes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In order to avoid #ifdeffery add a dummy pmd_young() implementation as a
fallback. This is required for the later patch "mm: introduce
arch_has_hw_nonleaf_pmd_young()".
Link: https://lkml.kernel.org/r/fd3ac3cd-7349-6bbd-890a-71a9454ca0b3@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page
tables associated with the address range. For hugetlb vmas,
zap_page_range will call __unmap_hugepage_range_final. However,
__unmap_hugepage_range_final assumes the passed vma is about to be removed
and deletes the vma_lock to prevent pmd sharing as the vma is on the way
out. In the case of madvise(MADV_DONTNEED) the vma remains, but the
missing vma_lock prevents pmd sharing and could potentially lead to issues
with truncation/fault races.
This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap. Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
prevent pmd sharing. Subsequent faults on this vma were confused as
VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping was
not set in new pages added to the page table. This resulted in pages that
appeared anonymous in a VM_SHARED vma and triggered the BUG.
Address issue by adding a new zap flag ZAP_FLAG_UNMAP to indicate an unmap
call from unmap_vmas(). This is used to indicate the 'final' unmapping of
a hugetlb vma. When called via MADV_DONTNEED, this flag is not set and
the vm_lock is not deleted.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/
Link: https://lkml.kernel.org/r/20221114235507.294320-3-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Wei Chen <harperchen1110@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This series addresses the issue first reported in [1], and fully described
in patch 2. Patches 1 and 2 address the user visible issue and are tagged
for stable backports.
While exploring solutions to this issue, related problems with mmu
notification calls were discovered. This is addressed in the patch
"hugetlb: remove duplicate mmu notifications:". Since there are no user
visible effects, this third is not tagged for stable backports.
Previous discussions suggested further cleanup by removing the
routine zap_page_range. This is possible because zap_page_range_single
is now exported, and all callers of zap_page_range pass ranges entirely
within a single vma. This work will be done in a later patch so as not
to distract from this bug fix.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/
This patch (of 2):
Expose the routine zap_page_range_single to zap a range within a single
vma. The madvise routine madvise_dontneed_single_vma can use this routine
as it explicitly operates on a single vma. Also, update the mmu
notification range in zap_page_range_single to take hugetlb pmd sharing
into account. This is required as MADV_DONTNEED supports hugetlb vmas.
Link: https://lkml.kernel.org/r/20221114235507.294320-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20221114235507.294320-2-mike.kravetz@oracle.com
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Wei Chen <harperchen1110@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Syzbot reported the below splat:
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 __alloc_pages_node
include/linux/gfp.h:221 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221
hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221
alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Modules linked in:
CPU: 1 PID: 3646 Comm: syz-executor210 Not tainted
6.1.0-rc1-syzkaller-00454-ga70385240892 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 10/11/2022
RIP: 0010:__alloc_pages_node include/linux/gfp.h:221 [inline]
RIP: 0010:hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
RIP: 0010:alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
Code: e5 01 4c 89 ee e8 6e f9 ae ff 4d 85 ed 0f 84 28 fc ff ff e8 70 fc
ae ff 48 8d 6b ff 4c 8d 63 07 e9 16 fc ff ff e8 5e fc ae ff <0f> 0b e9
96 fa ff ff 41 bc 1a 00 00 00 e9 86 fd ff ff e8 47 fc ae
RSP: 0018:ffffc90003fdf7d8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888077f457c0 RSI: ffffffff81cd8f42 RDI: 0000000000000001
RBP: ffff888079388c0c R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f6b48ccf700(0000) GS:ffff8880b9b00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6b48a819f0 CR3: 00000000171e7000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
collapse_file+0x1ca/0x5780 mm/khugepaged.c:1715
hpage_collapse_scan_file+0xd6c/0x17a0 mm/khugepaged.c:2156
madvise_collapse+0x53a/0xb40 mm/khugepaged.c:2611
madvise_vma_behavior+0xd0a/0x1cc0 mm/madvise.c:1066
madvise_walk_vmas+0x1c7/0x2b0 mm/madvise.c:1240
do_madvise.part.0+0x24a/0x340 mm/madvise.c:1419
do_madvise mm/madvise.c:1432 [inline]
__do_sys_madvise mm/madvise.c:1432 [inline]
__se_sys_madvise mm/madvise.c:1430 [inline]
__x64_sys_madvise+0x113/0x150 mm/madvise.c:1430
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f6b48a4eef9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 15 00 00 90 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6b48ccf318 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007f6b48af0048 RCX: 00007f6b48a4eef9
RDX: 0000000000000019 RSI: 0000000000600003 RDI: 0000000020000000
RBP: 00007f6b48af0040 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6b48aa53a4
R13: 00007f6b48bffcbf R14: 00007f6b48ccf400 R15: 0000000000022000
</TASK>
It is because khugepaged allocates pages with __GFP_THISNODE, but the
preferred node is bogus. The previous patch fixed the khugepaged code to
avoid allocating page from non-existing node. But it is still racy
against memory hotremove. There is no synchronization with the memory
hotplug so it is possible that memory gets offline during a longer taking
scanning.
So this warning still seems not quite helpful because:
* There is no guarantee the node is online for __GFP_THISNODE context
for all the callsites.
* Kernel just fails the allocation regardless the warning, and it looks
all callsites handle the allocation failure gracefully.
Although while the warning has helped to identify a buggy code, it is not
safe in general and this warning could panic the system with panic-on-warn
configuration which tends to be used surprisingly often. So replace
VM_WARN_ON to pr_warn(). And the warning will be triggered if
__GFP_NOWARN is set since the allocator would print out warning for such
case if __GFP_NOWARN is not set.
[shy828301@gmail.com: rename nid to this_node and gfp to warn_gfp]
Link: https://lkml.kernel.org/r/20221123193014.153983-1-shy828301@gmail.com
[akpm@linux-foundation.org: fix whitespace]
[akpm@linux-foundation.org: print gfp_mask instead of warn_gfp, per Michel]
Link: https://lkml.kernel.org/r/20221108184357.55614-3-shy828301@gmail.com
Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reported-by: <syzbot+0044b22d177870ee974f@syzkaller.appspotmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
'srcunmisafe.2022.11.09a', 'torture.2022.10.18c' and 'torturescript.2022.10.20a' into HEAD
doc.2022.10.20a: Documentation updates.
fixes.2022.10.21a: Miscellaneous fixes.
lazy.2022.11.30a: Lazy call_rcu() and NOCB updates.
srcunmisafe.2022.11.09a: NMI-safe SRCU readers.
torture.2022.10.18c: Torture-test updates.
torturescript.2022.10.20a: Torture-test scripting updates.
|
|
The kobject embedded into the request_queue is used for the queue
directory in sysfs, but that is a child of the gendisks directory and is
intimately tied to it. Move this kobject to the gendisk and use a
refcount_t in the request_queue for the actual request_queue refcounting
that is completely unrelated to the device model.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20221114042637.1009333-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a new parameter to complement the existing 'netmask' option. The
main difference between netmask and bitmask is that bitmask takes any
arbitrary ip address as input, it does not have to be a valid netmask.
The name of the new parameter is 'bitmask'. This lets us mask out
arbitrary bits in the ip address, for example:
ipset create set1 hash:ip bitmask 255.128.255.0
ipset create set2 hash:ip,port family inet6 bitmask ffff::ff80
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
No need to have distinct functions. After merge, ipv6 can avoid
protooff computation if the connection neither needs sequence adjustment
nor helper invocation -- this is the normal case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Reshuffle issue flags to keep normal flags separate from the uring_cmd
ctx-setup like flags. Shift the second type to the second byte so it's
easier to add new ones in the future.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d6e4696c883943082d248716f4cd568f37b17a74.1669821213.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
SCTP conntrack currently assumes that the SCTP endpoints will
probe secondary paths using HEARTBEAT before sending traffic.
But, according to RFC 9260, SCTP endpoints can send any traffic
on any of the confirmed paths after SCTP association is up.
SCTP endpoints that sends INIT will confirm all peer addresses
that upper layer configures, and the SCTP endpoint that receives
COOKIE_ECHO will only confirm the address it sent the INIT_ACK to.
So, we can have a situation where the INIT sender can start to
use secondary paths without the need to send HEARTBEAT. This patch
allows DATA/SACK packets to create new connection tracking entry.
A new state has been added to indicate that a DATA/SACK chunk has
been seen in the original direction - SCTP_CONNTRACK_DATA_SENT.
State transitions mostly follows the HEARTBEAT_SENT, except on
receiving HEARTBEAT/HEARTBEAT_ACK/DATA/SACK in the reply direction.
State transitions in original direction:
- DATA_SENT behaves similar to HEARTBEAT_SENT for all chunks,
except that it remains in DATA_SENT on receving HEARTBEAT,
HEARTBEAT_ACK/DATA/SACK chunks
State transitions in reply direction:
- DATA_SENT behaves similar to HEARTBEAT_SENT for all chunks,
except that it moves to HEARTBEAT_ACKED on receiving
HEARTBEAT/HEARTBEAT_ACK/DATA/SACK chunks
Note: This patch still doesn't solve the problem when the SCTP
endpoint decides to use primary paths for association establishment
but uses a secondary path for association shutdown. We still have
to depend on timeout for connections to expire in such a case.
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
into soc/drivers
This pull request contains Broadcom SoCs driver changes for 6.2, please
pull the following:
- Yuan uses dev_err_probe() in the Raspberry Pi firmware provider to
simplify the error handling code
- Rafal adds support for initialiazing the BCM47xx NVMEM/NVRAM firmware
provider out of memory-mapped flash devices.
* tag 'arm-soc/for-6.2/drivers' of https://github.com/Broadcom/stblinux:
firmware/nvram: bcm47xx: support init from IO memory
firmware: raspberrypi: Use dev_err_probe() to simplify code
Link: https://lore.kernel.org/r/20221129191755.542584-3-f.fainelli@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
This reverts commit dae590a6c96c799434e0ff8156ef29b88c257e60.
We've had a few reports on this causing a crash at boot time, because
of a reference issue. While this problem seemginly did exist before
the patch and needs solving separately, this patch makes it a lot
easier to trigger.
Link: https://lore.kernel.org/linux-block/CA+QYu4oxiRKC6hJ7F27whXy-PRBx=Tvb+-7TQTONN8qTtV3aDA@mail.gmail.com/
Link: https://lore.kernel.org/linux-block/69af7ccb-6901-c84c-0e95-5682ccfb750c@acm.org/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into soc/dt
Qualcomm ARM64 DTS updates for 6.2
This introduces support for SM4250, SM6115, SM6375 and SDM670 platforms
and Sony Xperia 10 IV, Google Pixel 3a, OnePlus 3, OnePlus 3T, Google
Pazquel and OnePlus Nord N100.
A wide variety of updates to align with DeviceTree bindings across
many/most platforms is introduced, and incorrectly styled comments are
adjusted across the tree.
Apps RSC is added to the cluster-idle power-domain across SM8150,
SM8250, SM8350 and SM8450, to ensure sleep and wake votes are flushed as
the last core is being powered down.
Remoteproc firmware patches are aligned with agreed upon structure used
in linux-firmware across Inforce 6560, Lenovo Miix 630, various Sony
Xperia devices and Samsung Galaxy Book2 (although these are not
available in linux-firmware today).
On IPQ8074 CPU clocks are added, thermal zones are introduced and vqmmc
supply is specified for the HK01 board.
Alcatel OneTouch Idol 3 gains LED nodes and Samsung Galaxy A3U gained
vibrator support.
The application subsystem's IOMMU and the display subsystem is enabled
for MSM8953.
A new CPU frequency table is introduced for MSM8996Pro, to properly
describe it separate of MSM8996. The GPU opp-table is extended as well.
On SC7180 USB is marked as a wakeup source, USB gains required-opps to
ensure that the core voltage rail is voted for as needed. The
description of the fingerprint sensor in Trogdor is corrected.
On SC7280 Wake-on-WLAN is introduced, and PHY parameters for the SNPS
USB PHY is defined across SC7280.
The memory map across Google Herobrine is adjusted, to regain unused
memory on the WiFi SKUs. A LTE SKU of the Evoker board is introduced
and the bard gains touchscreen.
NVME support is disabled on Villager boards, as it's not used.
PCIe support is introduced on SC8280XP, with NVMe, SDX55 (5G) and WiFi
enabled on the Lenovo Thinkpad X13s and Compute Reference Device. ADCs
and thermal zones are intrduced for the same. Lenovo Thinkpad X13s
gains LID switch support.
Fairphone FP3 gains touchscreen support.
Support for Xiaomi Poco F1 variant with EBBG panel.
The round-robin ADC is enabled across DB845c, OnePlus devices and
Pocophone F1 devices.
The displayport controller on SDM845 is introduced.
SM6350 gains SDHCI support and on Sony Xperia 10 III sd-card,
touchscreen and GPI DMA is enabled.
Fairphone FP4 got SD-card support.
UFS PHY register ranges are corrected across SM8150, SM8250, SM8350 and
SM8450.
Sony Xperia 1 II got NFC support and Sony Xperia 5 III got PMIC
regulators defined and USB definition corrected, to enable USB3.
The SDHCI controller is described for SM8450 and microSD support is
enabled for the HDK and QRD devices.
SM8450 also gains camera CCI interface and display clock controller.
* tag 'qcom-arm64-for-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (261 commits)
arm64: dts: qcom: sdm845-polaris: Don't duplicate DMA assignment
arm64: dts: qcom: sm8350-sagami: Wire up USB regulators and fix USB3
arm64: dts: qcom: sm8350-sagami: Add most RPMh regulators
arm64: dts: qcom: sc7280: Make herobrine-audio-rt5682 mic dtsi's match more
arm64: dts: qcom: trim addresses to 8 digits
arm64: dts: msm8998: unify PCIe clock order withMSM8996
arm64: dts: msm8998: add MSM8998 specific compatible
arm64: dts: qcom: sc8280xp-x13s: enable WiFi controller
arm64: dts: qcom: sc8280xp-x13s: enable modem
arm64: dts: qcom: sc8280xp-x13s: enable NVMe SSD
arm64: dts: qcom: sc8280xp-crd: enable WiFi controller
arm64: dts: qcom: sc8280xp-crd: enable SDX55 modem
arm64: dts: qcom: sc8280xp-crd: enable NVMe SSD
arm64: dts: qcom: sc8280xp-crd: rename backlight and misc regulators
arm64: dts: qcom: sa8295p-adp: enable PCIe
arm64: dts: qcom: sc8280xp/sa8540p: add PCIe2-4 nodes
arm64: dts: qcom: add sdm670 and pixel 3a device trees
arm64: dts: qcom: sc7280: Add Google Herobrine WIFI SKU dts fragment
arm64: dts: qcom: sc7280: Mark all Qualcomm reference boards as LTE
arm64: dts: qcom: sm7225-fairphone-fp4: Enable SD card
...
Link: https://lore.kernel.org/r/20221124100650.1982448-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
There are a number of places in the kernel that are accessing the
inode->i_flctx field without smp_load_acquire. This is required to
ensure that the caller doesn't see a partially-initialized structure.
Add a new accessor function for it to make this clear and convert all of
the relevant accesses in locks.c to use it. Also, convert
locks_free_lock_context to use the helper as well instead of just doing
a "bare" assignment.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
|
|
Ceph has a need to know whether a particular inode has any locks set on
it. It's currently tracking that by a num_locks field in its
filp->private_data, but that's problematic as it tries to decrement this
field when releasing locks and that can race with the file being torn
down.
Add a new vfs_inode_has_locks helper that just returns whether any locks
are currently held on the inode.
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
|
|
https://gitlab.freedesktop.org/lumag/msm into drm-next
drm/msm updates for 6.2
Core:
- MSM_INFO_GET_FLAGS support
- Cleaned up MSM IOMMU wrapper code
DPU:
- Added support for XR30 and P010 image formats
- Reworked MDSS/DPU schema, added SM8250 MDSS bindings
- Added Qualcomm SM6115 support
DP:
- Dropped unsane sanity checks
DSI:
- Fix calculation of DSC pps payload
DSI PHY:
- DSI PHY support for QCM2290
HDMI:
- Reworked dev init path
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221126102141.721353-1-dmitry.baryshkov@linaro.org
|
|
MLX5_UMR_KLM_ALIGNMENT is in units of number of entries, while
MLX5_UMR_MTT_ALIGNMENT (generalized and renamed to
MLX5_UMR_FLEX_ALIGNMENT) is in byte units. This is misleading and
confusing.
Replace this KLM definition with one based on the generic definition.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Per the device spec, MLX5_UMR_MTT_ALIGNMENT is good not only for UMR MTT
entries, but for all other entries as well, like KLMs and KSMs.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Defines MLX5_UMR_MTT_MASK and MLX5_UMR_MTT_MIN_CHUNK_SIZE are not in
use. Remove them.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Per the device spec, MTTs/KLMs list in a UMR WQE must be aligned to 64B.
Per our SW design, the MTT/KLMs list would need alignment only if it's
too small, for example on PPC when PAGE_SIZE is 64KB, and only 4 pages
are needed to cover a MPWQE of size 256KB.
Padding, if needed, is taken into account when calculating the UMR WQE
fields (ds_cnt and xlt_octowords), however no entries are provided,
instead garbage is passed.
No real harm though, as these parts act as gaps between the RX MPWQEs
and not used by any of them. Hence, in practice, device does not try to
write any incoming packet to them. Still, prefer providing clean padding
marking the end of the list, and do not map garbage into the RQ memory
region.
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
Remove mlx5_priv.ctx_list and ctx_lock which are no longer used after
commit 601c10c89cbb ("net/mlx5: Delete custom device management logic").
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next 2022-11-26
1) Remove redundant variable in esp6.
From Colin Ian King.
2) Update x->lastused for every packet. It was used only for
outgoing mobile IPv6 packets, but showed to be usefull
to check if the a SA is still in use in general.
From Antony Antony.
3) Remove unused variable in xfrm_byidx_resize.
From Leon Romanovsky.
4) Finalize extack support for xfrm.
From Sabrina Dubroca.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: add extack to xfrm_set_spdinfo
xfrm: add extack to xfrm_alloc_userspi
xfrm: add extack to xfrm_do_migrate
xfrm: add extack to xfrm_new_ae and xfrm_replay_verify_len
xfrm: add extack to xfrm_del_sa
xfrm: add extack to xfrm_add_sa_expire
xfrm: a few coding style clean ups
xfrm: Remove not-used total variable
xfrm: update x->lastused for every packet
esp6: remove redundant variable err
====================
Link: https://lore.kernel.org/r/20221126110303.1859238-1-steffen.klassert@secunet.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://gitlab.freedesktop.org/drm/tegra into drm-next
drm/tegra: Changes for v6.2-rc1
This contains a bunch of cleanups across the board as well as support
for the NVDEC hardware found on the Tegra234 SoC.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thierry Reding <thierry.reding@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221125155219.3352952-1-thierry.reding@gmail.com
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/accel into drm-next
This tag contains the patches that add the new compute acceleration
subsystem, which is part of the DRM subsystem.
The patches:
- Add a new directory at drivers/accel.
- Add a new major (261) for compute accelerators.
- Add a new DRM minor type for compute accelerators.
- Integrate the accel core code with DRM core code.
- Add documentation for the accel subsystem.
Signed-off-by: Dave Airlie <airlied@redhat.com>
some acks from the list (some are in the patch series):
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Sonal Santan <sonal.santan@amd.com>
Acked-by: Maxime Ripard <maxime@cerno.tech>
Acked-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Tested-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
From: Oded Gabbay <ogabbay@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20221122112222.GA352082@ogabbay-vm-u20.habana-labs.com
|
|
Kernels configured with CONFIG_PRINTK=n and CONFIG_SRCU=n get build
failures. This causes trouble for deep embedded systems. But given
that there are more than 25 instances of "select SRCU" in the kernel,
it is hard to believe that there are many kernels running in production
without SRCU. This commit therefore makes SRCU mandatory. The SRCU
Kconfig option remains for backwards compatibility, and will be removed
when it is no longer used.
[ paulmck: Update per kernel test robot feedback. ]
Reported-by: John Ogness <john.ogness@linutronix.de>
Reported-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Reviewed-by: John Ogness <john.ogness@linutronix.de>
|
|
Implement timer-based RCU callback batching (also known as lazy
callbacks). With this we save about 5-10% of power consumed due
to RCU requests that happen when system is lightly loaded or idle.
By default, all async callbacks (queued via call_rcu) are marked
lazy. An alternate API call_rcu_hurry() is provided for the few users,
for example synchronize_rcu(), that need the old behavior.
The batch is flushed whenever a certain amount of time has passed, or
the batch on a particular CPU grows too big. Also memory pressure will
flush it in a future patch.
To handle several corner cases automagically (such as rcu_barrier() and
hotplug), we re-use bypass lists which were originally introduced to
address lock contention, to handle lazy CBs as well. The bypass list
length has the lazy CB length included in it. A separate lazy CB length
counter is also introduced to keep track of the number of lazy CBs.
[ paulmck: Fix formatting of inline call_rcu_lazy() definition. ]
[ paulmck: Apply Zqiang feedback. ]
[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]
Suggested-by: Paul McKenney <paulmck@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
|
|
tools/lib/bpf/ringbuf.c
927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap")
b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c")
https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into asahi-wip
New boards:
- Model A and blade baseboards for the SOQuartz (rk3568) SoM,
- Anberic RG351M, RG353V, RG353VS; Odroid Go Super, Advance gaming devices
- Odroid M1
- Theobroma px30 SoM with baseboard
- Rockchip's own rk3566 demo board
Some core support for per SoC specifics:
- crypto support for rk3399 and rk3328
- second I2S controller for rk3568
- Cache properties for follow the binding for rk3308 and rk3328
Bigger device support updates for:
- SOQuartz: PCIe2, video output, gpu, HDMI sound
- Rock 3A: eth regulator, eth clock input, Wifi+Bt, I2S, PCIe3
As well as some minor extensions for Rock960 (hdmi supplies),
rk3566-roc-pc (PCIe2), Rock 4C+ (thermal support), Pinephone Pro (Wifi+Bt)
* tag 'v6.2-rockchip-dts64-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (51 commits)
arm64: dts: rockchip: update cache properties for rk3308 and rk3328
arm64: dts: rockchip: Add SOQuartz Model A baseboard
dt-bindings: arm: rockchip: Add SOQuartz Model A
arm64: dts: rockchip: Add SOQuartz blade board
dt-bindings: arm: rockchip: Add SOQuartz Blade
arm64: dts: rockchip: Add Anbernic RG351M
arm64: dts: rockchip: Add Odroid Go Super
arm64: dts: rockchip: Add Odroid Go Advance Black Edition
dt-bindings: arm: rockchip: Add more RK3326 devices
arm64: dts: rockchip: Move most of Odroid Go Advance DTS into a DTSI
arm64: dts: rockchip: Add support of regulator for ethernet node on Rock 3A SBC
arm64: dts: rockchip: Add support of external clock to ethernet node on Rock 3A SBC
arm64: dts: rockchip: Add HDMI supplies on Rock960
arm64: dts: rockchip: Add dts for rockchip rk3566 box demo board
dt-bindings: rockchip: Add Rockchip rk3566 box demo board
arm64: dts: rockchip: Enable PCIe 2 on SOQuartz CM4IO
arm64: dts: rockchip: Enable HDMI sound on SOQuartz
arm64: dts: rockchip: Enable video output and HDMI on SOQuartz
arm64: dts: rockchip: Enable GPU on SOQuartz CM4
arm64: dts: rockchip: enable pcie2 on rk3566-roc-pc
...
Link: https://lore.kernel.org/r/4716610.aeNJFYEL58@phil
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
* for-6.2/io_uring: (41 commits)
io_uring: keep unlock_post inlined in hot path
io_uring: don't use complete_post in kbuf
io_uring: spelling fix
io_uring: remove io_req_complete_post_tw
io_uring: allow multishot polled reqs to defer completion
io_uring: remove overflow param from io_post_aux_cqe
io_uring: add lockdep assertion in io_fill_cqe_aux
io_uring: make io_fill_cqe_aux static
io_uring: add io_aux_cqe which allows deferred completion
io_uring: allow defer completion for aux posted cqes
io_uring: defer all io_req_complete_failed
io_uring: always lock in io_apoll_task_func
io_uring: remove iopoll spinlock
io_uring: iopoll protect complete_post
io_uring: inline __io_req_complete_put()
io_uring: remove io_req_tw_post_queue
io_uring: use io_req_task_complete() in timeout
io_uring: hold locks for io_req_complete_failed
io_uring: add completion locking for iopoll
io_uring: kill io_cqring_ev_posted() and __io_cq_unlock_post()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and wifi.
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF"
* tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
MAINTAINERS: Update maintainer list for chelsio drivers
ionic: update MAINTAINERS entry
sctp: fix memory leak in sctp_stream_outq_migrate()
packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
net/mlx5: Lag, Fix for loop when checking lag
Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path"
net: marvell: prestera: Fix a NULL vs IS_ERR() check in some functions
net: tun: Fix use-after-free in tun_detach()
net: mdiobus: fix unbalanced node reference count
net: hsr: Fix potential use-after-free
tipc: re-fetch skb cb after tipc_msg_validate
mptcp: fix sleep in atomic at close time
mptcp: don't orphan ssk in mptcp_close()
dsa: lan9303: Correct stat name
ipv4: Fix route deletion when nexthop info is not specified
net: wwan: iosm: fix incorrect skb length
net: wwan: iosm: fix crash in peek throughput test
net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type
net: wwan: iosm: fix kernel test robot reported error
...
|
|
When sctp_stream_outq_migrate() is called to release stream out resources,
the memory pointed to by prio_head in stream out is not released.
The memory leak information is as follows:
unreferenced object 0xffff88801fe79f80 (size 64):
comm "sctp_repo", pid 7957, jiffies 4294951704 (age 36.480s)
hex dump (first 32 bytes):
80 9f e7 1f 80 88 ff ff 80 9f e7 1f 80 88 ff ff ................
90 9f e7 1f 80 88 ff ff 90 9f e7 1f 80 88 ff ff ................
backtrace:
[<ffffffff81b215c6>] kmalloc_trace+0x26/0x60
[<ffffffff88ae517c>] sctp_sched_prio_set+0x4cc/0x770
[<ffffffff88ad64f2>] sctp_stream_init_ext+0xd2/0x1b0
[<ffffffff88aa2604>] sctp_sendmsg_to_asoc+0x1614/0x1a30
[<ffffffff88ab7ff1>] sctp_sendmsg+0xda1/0x1ef0
[<ffffffff87f765ed>] inet_sendmsg+0x9d/0xe0
[<ffffffff8754b5b3>] sock_sendmsg+0xd3/0x120
[<ffffffff8755446a>] __sys_sendto+0x23a/0x340
[<ffffffff87554651>] __x64_sys_sendto+0xe1/0x1b0
[<ffffffff89978b49>] do_syscall_64+0x39/0xb0
[<ffffffff89a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Link: https://syzkaller.appspot.com/bug?exrid=29c402e56c4760763cc0
Fixes: 637784ade221 ("sctp: introduce priority based stream scheduler")
Reported-by: syzbot+29c402e56c4760763cc0@syzkaller.appspotmail.com
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20221126031720.378562-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit c0071be0e16c461680d87b763ba1ee5e46548fde.
The cited commit removed the validity checks which initialized the
window_sz and never removed the use of the now uninitialized variable,
so now we are left with wrong value in the window size and the following
clang warning: [-Wuninitialized]
drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c:232:45:
warning: variable 'window_sz' is uninitialized when used here
MLX5_SET(macsec_aso, aso_ctx, window_size, window_sz);
Revet at this time to address the clang issue due to lack of time to
test the proper solution.
Fixes: c0071be0e16c ("net/mlx5e: MACsec, remove replay window size limitation in offload path")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Reported-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20221129093006.378840-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|