Age | Commit message (Collapse) | Author |
|
These files seem best suited to shmem.
Link: https://lkml.kernel.org/r/20250617161359.166955-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
page_vma_mapped_walk() is used to traverse page tables from a VMA, used by
rmap logic once the reverse mapping has been traversed to the VMA level.
It is also used by other users (migration, damon, etc.) but is primarily
used by the reverse mapping and is a key part of its logic, so it seems
appropriate to place it here.
Link: https://lkml.kernel.org/r/20250617165142.173716-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This file is clearly specific to hugetlb so this seems the most
appropriate place for it.
Link: https://lkml.kernel.org/r/20250617171538.178042-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
These files comprise the bootmem info logic which is initialised on
startup and also memory tests that are run on startup and as such this
seems the most appropriate section for them.
Link: https://lkml.kernel.org/r/20250617174538.188977-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Change my role to Maintainer as I am quite involved in HugeTLB
development, and will be more so with the upcoming HugetLB-pagewalk
unification, so I would like to help Munchun take care of the code.
Besides, having two people will help in offloading some pressure.
Also add David as a Reviewer since he has quite some knowledge in the
field and has already provided valuable feedback.
Link: https://lkml.kernel.org/r/20250617185910.471406-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Muchun Song <muchun.song@linux.dev>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Temporarily clear the preallocation flag when explicitly requesting
allocations. Pre-existing allocations are already counted against the
request through mas_node_count_gfp(), but the allocations will not happen
if the MA_STATE_PREALLOC flag is set. This flag is meant to avoid
re-allocating in bulk allocation mode, and to detect issues with
preallocation calculations.
The MA_STATE_PREALLOC flag should also always be set on zero allocations
so that detection of underflow allocations will print a WARN_ON() during
consumption.
User visible effect of this flaw is a WARN_ON() followed by a null pointer
dereference when subsequent requests for larger number of nodes is
ignored, such as the vma merge retry in mmap_region() caused by drivers
altering the vma flags (which happens in v6.6, at least)
Link: https://lkml.kernel.org/r/20250616184521.3382795-3-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Reported-by: Hailong Liu <hailong.liu@oppo.com>
Link: https://lore.kernel.org/all/1652f7eb-a51b-4fee-8058-c73af63bacd1@oppo.com/
Link: https://lore.kernel.org/all/20250428184058.1416274-1-Liam.Howlett@oracle.com/
Link: https://lore.kernel.org/all/20250429014754.1479118-1-Liam.Howlett@oracle.com/
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Hailong Liu <hailong.liu@oppo.com>
Cc: zhangpeng.00@bytedance.com <zhangpeng.00@bytedance.com>
Cc: Steve Kang <Steve.Kang@unisoc.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We previously overlooked GUP test files that sensibly should belong to the
GUP section, include them now.
Link: https://lkml.kernel.org/r/20250616200844.560225-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The working set logic belongs very much to the reclaim section and is
otherwise not assigned to any other MAINTAINERS section so add it here.
Link: https://lkml.kernel.org/r/20250616201643.561626-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If uprobes are not enabled, the test currently fails with:
7151 12:46:54.627936 # # # RUN merge.handle_uprobe_upon_merged_vma ...
7152 12:46:54.639014 # # f /sys/bus/event_source/devices/uprobe/type
7153 12:46:54.639306 # # fopen: No such file or directory
7154 12:46:54.650451 # # # merge.c:473:handle_uprobe_upon_merged_vma:Expected
read_sysfs("/sys/bus/event_source/devices/uprobe/type", &type) (1) == 0 (0)
7155 12:46:54.650730 # # # handle_uprobe_upon_merged_vma: Test terminated by assertion
7156 12:46:54.661750 # # # FAIL merge.handle_uprobe_upon_merged_vma
7157 12:46:54.662030 # # not ok 8 merge.handle_uprobe_upon_merged_vma
Skipping is a more sane and friendly behavior here.
Link: https://lkml.kernel.org/r/20250610122209.3177587-1-pfalcato@suse.de
Fixes: efe99fabeb11 ("selftests/mm: add test about uprobe pte be orphan during vma merge")
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
Reported-by: Aishwarya <aishwarya.tcv@arm.com>
Closes: https://lore.kernel.org/linux-mm/20250610103729.72440-1-aishwarya.tcv@arm.com/
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by : Donet Tom <donettom@linux.ibm.com>
Reviewed-by : Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After reverting the transition to the generic min heap library, bcache no
longer depends on MIN_HEAP. The select entry can be removed to reduce
code size and shrink the kernel's attack surface.
This change effectively reverts the bcache-related part of commit
92a8b224b833 ("lib/min_heap: introduce non-inline versions of min heap API
functions").
This is part of a series of changes to address a performance regression
caused by the use of the generic min_heap implementation.
As reported by Robert, bcache now suffers from latency spikes, with P100
(max) latency increasing from 600 ms to 2.4 seconds every 5 minutes.
These regressions degrade bcache's effectiveness as a low-latency cache
layer and lead to frequent timeouts and application stalls in production
environments.
Link: https://lore.kernel.org/lkml/CAJhEC05+0S69z+3+FB2Cd0hD+pCRyWTKLEOsc8BOmH73p1m+KQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20250614202353.1632957-4-visitorckw@gmail.com
Fixes: 866898efbb25 ("bcache: remove heap-related macros and switch to generic min_heap")
Fixes: 92a8b224b833 ("lib/min_heap: introduce non-inline versions of min heap API functions")
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reported-by: Robert Pang <robertpang@google.com>
Closes: https://lore.kernel.org/linux-bcache/CAJhEC06F_AtrPgw2-7CvCqZgeStgCtitbD-ryuPpXQA-JG5XXw@mail.gmail.com
Acked-by: Coly Li <colyli@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit 866898efbb25bb44fd42848318e46db9e785973a.
The generic bottom-up min_heap implementation causes performance
regression in invalidate_buckets_lru(), a hot path in bcache. Before the
cache is fully populated, new_bucket_prio() often returns zero, leading to
many equal comparisons. In such cases, bottom-up sift_down performs up to
2 * log2(n) comparisons, while the original top-down approach completes
with just O() comparisons, resulting in a measurable performance gap.
The performance degradation is further worsened by the non-inlined
min_heap API functions introduced in commit 92a8b224b833 ("lib/min_heap:
introduce non-inline versions of min heap API functions"), adding function
call overhead to this critical path.
As reported by Robert, bcache now suffers from latency spikes, with P100
(max) latency increasing from 600 ms to 2.4 seconds every 5 minutes.
These regressions degrade bcache's effectiveness as a low-latency cache
layer and lead to frequent timeouts and application stalls in production
environments.
This revert aims to restore bcache's original low-latency behavior.
Link: https://lore.kernel.org/lkml/CAJhEC05+0S69z+3+FB2Cd0hD+pCRyWTKLEOsc8BOmH73p1m+KQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20250614202353.1632957-3-visitorckw@gmail.com
Fixes: 866898efbb25 ("bcache: remove heap-related macros and switch to generic min_heap")
Fixes: 92a8b224b833 ("lib/min_heap: introduce non-inline versions of min heap API functions")
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reported-by: Robert Pang <robertpang@google.com>
Closes: https://lore.kernel.org/linux-bcache/CAJhEC06F_AtrPgw2-7CvCqZgeStgCtitbD-ryuPpXQA-JG5XXw@mail.gmail.com
Acked-by: Coly Li <colyli@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "bcache: Revert min_heap migration due to performance
regression".
This patch series reverts the migration of bcache from its original heap
implementation to the generic min_heap library. While the original change
aimed to simplify the code and improve maintainability, it introduced a
severe performance regression in real-world scenarios.
As reported by Robert, systems using bcache now suffer from periodic
latency spikes, with P100 (max) latency increasing from 600 ms to 2.4
seconds every 5 minutes. This degrades bcache's value as a low-latency
caching layer, and leads to frequent timeouts and application stalls in
production environments.
The primary cause of this regression is the behavior of the generic
min_heap implementation's bottom-up sift_down, which performs up to 2 *
log2(n) comparisons when many elements are equal. The original top-down
variant used by bcache only required O(1) comparisons in such cases. The
issue was further exacerbated by commit 92a8b224b833 ("lib/min_heap:
introduce non-inline versions of min heap API functions"), which
introduced non-inlined versions of the min_heap API, adding function call
overhead to a performance-critical hot path.
This patch (of 3):
This reverts commit 3d8a9a1c35227c3f1b0bd132c9f0a80dbda07b65.
Although removing the custom swap function simplified the code, this
change is part of a broader migration to the generic min_heap API that
introduced significant performance regressions in bcache.
As reported by Robert, bcache now suffers from latency spikes, with P100
(max) latency increasing from 600 ms to 2.4 seconds every 5 minutes.
These regressions degrade bcache's effectiveness as a low-latency cache
layer and lead to frequent timeouts and application stalls in production
environments.
This revert is part of a series of changes to restore previous performance
by undoing the min_heap transition.
Link: https://lkml.kernel.org/r/20250614202353.1632957-1-visitorckw@gmail.com
Link: https://lore.kernel.org/lkml/CAJhEC05+0S69z+3+FB2Cd0hD+pCRyWTKLEOsc8BOmH73p1m+KQ@mail.gmail.com
Link: https://lkml.kernel.org/r/20250614202353.1632957-2-visitorckw@gmail.com
Fixes: 866898efbb25 ("bcache: remove heap-related macros and switch to generic min_heap")
Fixes: 92a8b224b833 ("lib/min_heap: introduce non-inline versions of min heap API functions")
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reported-by: Robert Pang <robertpang@google.com>
Closes: https://lore.kernel.org/linux-bcache/CAJhEC06F_AtrPgw2-7CvCqZgeStgCtitbD-ryuPpXQA-JG5XXw@mail.gmail.com
Acked-by: Coly Li <colyli@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If CONFIG_UPROBES is not set, a merge subtest fails:
Failure log:
7151 12:46:54.627936 # # # RUN merge.handle_uprobe_upon_merged_vma ...
7152 12:46:54.639014 # # f /sys/bus/event_source/devices/uprobe/type
7153 12:46:54.639306 # # fopen: No such file or directory
7154 12:46:54.650451 # # # merge.c:473:handle_uprobe_upon_merged_vma:Expected read_sysfs("/sys/bus/event_source/devices/uprobe/type", &type) (1) == 0 (0)
7155 12:46:54.650730 # # # handle_uprobe_upon_merged_vma: Test terminated by assertion
7156 12:46:54.661750 # # # FAIL merge.handle_uprobe_upon_merged_vma
7157 12:46:54.662030 # # not ok 8 merge.handle_uprobe_upon_merged_vma
CONFIG_UPROBES is enabled by CONFIG_UPROBE_EVENTS, which gets enabled by
CONFIG_FTRACE. Therefore add these configs to selftests/mm/config so that
CI systems can include this config in the kernel build. To be completely
safe, add CONFIG_PROFILING too, to enable the dependency chain
PROFILING -> PERF_EVENTS -> UPROBE_EVENTS -> UPROBES.
Link: https://lkml.kernel.org/r/20250613034912.53791-1-dev.jain@arm.com
Fixes: efe99fabeb11 ("selftests/mm: add test about uprobe pte be orphan during vma merge")
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reported-by: Aishwarya <aishwarya.tcv@arm.com>
Closes: https://lore.kernel.org/all/20250610103729.72440-1-aishwarya.tcv@arm.com/
Tested-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Tested-by : Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Pu Lehui <pulehui@huawei.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Currently, when restoring higher order folios, kho_restore_folio() only
calls prep_compound_page() on all the pages. That is not enough to
properly initialize the folios. The managed page count does not get
updated, the reserved flag does not get dropped, and page count does not
get initialized properly.
Restoring a higher order folio with it results in the following BUG with
CONFIG_DEBUG_VM when attempting to free the folio:
BUG: Bad page state in process test pfn:104e2b
page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x104e2b
flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff)
raw: 002fffff80000000 0000000000000000 00000000ffffffff 0000000000000000
raw: ffffffffffffffff 0000000000000000 00000001ffffffff 0000000000000000
page dumped because: nonzero _refcount
[...]
Call Trace:
<TASK>
dump_stack_lvl+0x4b/0x70
bad_page.cold+0x97/0xb2
__free_frozen_pages+0x616/0x850
[...]
Combine the path for 0-order and higher order folios, initialize the tail
pages with a count of zero, and call adjust_managed_page_count() to
account for all the pages instead of just missing them.
In addition, since all the KHO-preserved pages get marked with
MEMBLOCK_RSRV_NOINIT by deserialize_bitmap(), the reserved flag is not
actually set (as can also be seen from the flags of the dumped page in the
logs above). So drop the ClearPageReserved() calls.
[ptyadav@amazon.de: declare i in the loop instead of at the top]
Link: https://lkml.kernel.org/r/20250613125916.39272-1-pratyush@kernel.org
Link: https://lkml.kernel.org/r/20250605171143.76963-1-pratyush@kernel.org
Fixes: fc33e4b44b27 ("kexec: enable KHO support for memory preservation")
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Along with kexec, KHO also has parts dealing with memory management, like
page/folio initialization, memblock, and preserving/unpreserving memory
for next kernel. Copy linux-mm@ to KHO patches so the right set of eyes
can look at changes to those parts.
Link: https://lkml.kernel.org/r/20250613131917.4488-1-pratyush@kernel.org
Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: SeongJae Park <sj@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Changyuan Lyu <changyuanl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This commit fixes two kinds of races, they may have different results:
Barry reported a BUG_ON in commit c50f8e6053b0, we may see the same
BUG_ON if the filemap lookup returned NULL and folio is added to swap
cache after that.
If another kind of race is triggered (folio changed after lookup) we
may see RSS counter is corrupted:
[ 406.893936] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0
type:MM_ANONPAGES val:-1
[ 406.894071] BUG: Bad rss-counter state mm:ffff0000c5a9ddc0
type:MM_SHMEMPAGES val:1
Because the folio is being accounted to the wrong VMA.
I'm not sure if there will be any data corruption though, seems no.
The issues above are critical already.
On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache
lookup, and tries to move the found folio to the faulting vma. Currently,
it relies on checking the PTE value to ensure that the moved folio still
belongs to the src swap entry and that no new folio has been added to the
swap cache, which turns out to be unreliable.
While working and reviewing the swap table series with Barry, following
existing races are observed and reproduced [1]:
In the example below, move_pages_pte is moving src_pte to dst_pte, where
src_pte is a swap entry PTE holding swap entry S1, and S1 is not in the
swap cache:
CPU1 CPU2
userfaultfd_move
move_pages_pte()
entry = pte_to_swp_entry(orig_src_pte);
// Here it got entry = S1
... < interrupted> ...
<swapin src_pte, alloc and use folio A>
// folio A is a new allocated folio
// and get installed into src_pte
<frees swap entry S1>
// src_pte now points to folio A, S1
// has swap count == 0, it can be freed
// by folio_swap_swap or swap
// allocator's reclaim.
<try to swap out another folio B>
// folio B is a folio in another VMA.
<put folio B to swap cache using S1 >
// S1 is freed, folio B can use it
// for swap out with no problem.
...
folio = filemap_get_folio(S1)
// Got folio B here !!!
... < interrupted again> ...
<swapin folio B and free S1>
// Now S1 is free to be used again.
<swapout src_pte & folio A using S1>
// Now src_pte is a swap entry PTE
// holding S1 again.
folio_trylock(folio)
move_swap_pte
double_pt_lock
is_pte_pages_stable
// Check passed because src_pte == S1
folio_move_anon_rmap(...)
// Moved invalid folio B here !!!
The race window is very short and requires multiple collisions of multiple
rare events, so it's very unlikely to happen, but with a deliberately
constructed reproducer and increased time window, it can be reproduced
easily.
This can be fixed by checking if the folio returned by filemap is the
valid swap cache folio after acquiring the folio lock.
Another similar race is possible: filemap_get_folio may return NULL, but
folio (A) could be swapped in and then swapped out again using the same
swap entry after the lookup. In such a case, folio (A) may remain in the
swap cache, so it must be moved too:
CPU1 CPU2
userfaultfd_move
move_pages_pte()
entry = pte_to_swp_entry(orig_src_pte);
// Here it got entry = S1, and S1 is not in swap cache
folio = filemap_get_folio(S1)
// Got NULL
... < interrupted again> ...
<swapin folio A and free S1>
<swapout folio A re-using S1>
move_swap_pte
double_pt_lock
is_pte_pages_stable
// Check passed because src_pte == S1
folio_move_anon_rmap(...)
// folio A is ignored !!!
Fix this by checking the swap cache again after acquiring the src_pte
lock. And to avoid the filemap overhead, we check swap_map directly [2].
The SWP_SYNCHRONOUS_IO path does make the problem more complex, but so far
we don't need to worry about that, since folios can only be exposed to the
swap cache in the swap out path, and this is covered in this patch by
checking the swap cache again after acquiring the src_pte lock.
Testing with a simple C program that allocates and moves several GB of
memory did not show any observable performance change.
Link: https://lkml.kernel.org/r/20250604151038.21968-1-ryncsn@gmail.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Kairui Song <kasong@tencent.com>
Closes: https://lore.kernel.org/linux-mm/CAMgjq7B1K=6OOrK2OUZ0-tqCzi+EJt+2_K97TPGoSt=9+JwP7Q@mail.gmail.com/ [1]
Link: https://lore.kernel.org/all/CAGsJ_4yJhJBo16XhiC-nUzSheyX-V3-nFE+tAi=8Y560K8eT=A@mail.gmail.com/ [2]
Reviewed-by: Lokesh Gidra <lokeshgidra@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Chris Li <chrisl@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After commit 1aaf8c122918 ("mm: gup: fix infinite loop within
__get_longterm_locked") we are able to longterm pin folios that are not
supposed to get longterm pinned, simply because they temporarily have the
LRU flag cleared (esp. temporarily isolated).
For example, two __get_longterm_locked() callers can race, or
__get_longterm_locked() can race with anything else that temporarily
isolates folios.
The introducing commit mentions the use case of a driver that uses
vm_ops->fault to insert pages allocated through cma_alloc() into the page
tables, assuming they can later get longterm pinned. These pages/ folios
would never have the LRU flag set and consequently cannot get isolated.
There is no known in-tree user making use of that so far, fortunately.
To handle that in the future -- and avoid retrying forever to
isolate/migrate them -- we will need a different mechanism for the CMA
area *owner* to indicate that it actually already allocated the page and
is fine with longterm pinning it. The LRU flag is not suitable for that.
Probably we can lookup the relevant CMA area and query the bitmap; we only
have have to care about some races, probably. If already allocated, we
could just allow longterm pinning)
Anyhow, let's fix the "must not be longterm pinned" problem first by
reverting the original commit.
Link: https://lkml.kernel.org/r/20250611131314.594529-1-david@redhat.com
Fixes: 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked")
Signed-off-by: David Hildenbrand <david@redhat.com>
Closes: https://lore.kernel.org/all/20250522092755.GA3277597@tiffany/
Reported-by: Hyesoo Yu <hyesoo.yu@samsung.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Peter Xu <peterx@redhat.com>
Cc: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Cc: Aijun Sun <aijun.sun@unisoc.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The mm selftests are timing out with the current 180-second limit.
Testing shows that run_vmtests.sh takes approximately 11 minutes
(664 seconds) to complete.
Increase the timeout to 900 seconds (15 minutes) to provide sufficient
buffer for the tests to complete successfully.
Link: https://lkml.kernel.org/r/20250609120606.73145-2-shivankg@amd.com
Signed-off-by: Shivank Garg <shivankg@amd.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Following softlockup can be easily reproduced on my test machine with:
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
swapon /dev/zram0 # zram0 is a 48G swap device
mkdir -p /sys/fs/cgroup/memory/test
echo 1G > /sys/fs/cgroup/test/memory.max
echo $BASHPID > /sys/fs/cgroup/test/cgroup.procs
while true; do
dd if=/dev/zero of=/tmp/test.img bs=1M count=5120
cat /tmp/test.img > /dev/null
rm /tmp/test.img
done
Then after a while:
watchdog: BUG: soft lockup - CPU#0 stuck for 763s! [cat:5787]
Modules linked in: zram virtiofs
CPU: 0 UID: 0 PID: 5787 Comm: cat Kdump: loaded Tainted: G L 6.15.0.orig-gf3021d9246bc-dirty #118 PREEMPT(voluntary)·
Tainted: [L]=SOFTLOCKUP
Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
RIP: 0010:mpol_shared_policy_lookup+0xd/0x70
Code: e9 b8 b4 ff ff 31 c0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 54 55 53 <48> 8b 1f 48 85 db 74 41 4c 8d 67 08 48 89 fb 48 89 f5 4c 89 e7 e8
RSP: 0018:ffffc90002b1fc28 EFLAGS: 00000202
RAX: 00000000001c20ca RBX: 0000000000724e1e RCX: 0000000000000001
RDX: ffff888118e214c8 RSI: 0000000000057d42 RDI: ffff888118e21518
RBP: 000000000002bec8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000bf4 R11: 0000000000000000 R12: 0000000000000001
R13: 00000000001c20ca R14: 00000000001c20ca R15: 0000000000000000
FS: 00007f03f995c740(0000) GS:ffff88a07ad9a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f03f98f1000 CR3: 0000000144626004 CR4: 0000000000770eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
shmem_alloc_folio+0x31/0xc0
shmem_swapin_folio+0x309/0xcf0
? filemap_get_entry+0x117/0x1e0
? xas_load+0xd/0xb0
? filemap_get_entry+0x101/0x1e0
shmem_get_folio_gfp+0x2ed/0x5b0
shmem_file_read_iter+0x7f/0x2e0
vfs_read+0x252/0x330
ksys_read+0x68/0xf0
do_syscall_64+0x4c/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f03f9a46991
Code: 00 48 8b 15 81 14 10 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 20 ad 01 00 f3 0f 1e fa 80 3d 35 97 10 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec
RSP: 002b:00007fff3c52bd28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000040000 RCX: 00007f03f9a46991
RDX: 0000000000040000 RSI: 00007f03f98ba000 RDI: 0000000000000003
RBP: 00007fff3c52bd50 R08: 0000000000000000 R09: 00007f03f9b9a380
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000040000
R13: 00007f03f98ba000 R14: 0000000000000003 R15: 0000000000000000
</TASK>
The reason is simple, readahead brought some order 0 folio in swap cache,
and the swapin mTHP folio being allocated is in conflict with it, so
swapcache_prepare fails and causes shmem_swap_alloc_folio to return
-EEXIST, and shmem simply retries again and again causing this loop.
Fix it by applying a similar fix for anon mTHP swapin.
The performance change is very slight, time of swapin 10g zero folios
with shmem (test for 12 times):
Before: 2.47s
After: 2.48s
[kasong@tencent.com: add comment]
Link: https://lkml.kernel.org/r/20250610181645.45922-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250610181645.45922-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250609171751.36305-1-ryncsn@gmail.com
Fixes: 1dd44c0af4fa ("mm: shmem: skip swapcache for swapin of synchronous swap device")
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
dma_map_XXX() can fail and should be tested for errors with
dma_mapping_error().
Fixes: a63e78eb2b0f ("scsi: fnic: Add support for fabric based solicited requests and responses")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://lore.kernel.org/r/20250618065715.14740-2-fourier.thomas@gmail.com
Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com>
Reviewed-by: John Menghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Replace KERN_INFO with KERN_DEBUG for a log message.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Gian Carlo Boffa <gcboffa@cisco.com>
Reviewed-by: Arun Easi <aeasi@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/stable/20250612002212.4144-1-kartilak%40cisco.com
Link: https://lore.kernel.org/r/20250618003431.6314-4-kartilak@cisco.com
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add logs in FDMI and FDMI ABTS paths.
Modify log text in these paths.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Gian Carlo Boffa <gcboffa@cisco.com>
Reviewed-by: Arun Easi <aeasi@cisco.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20250618003431.6314-3-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When the link goes down and comes up, FDMI requests are not sent out
anymore.
Fix bug by turning off FNIC_FDMI_ACTIVE when the link goes down.
Fixes: 09c1e6ab4ab2 ("scsi: fnic: Add and integrate support for FDMI")
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Gian Carlo Boffa <gcboffa@cisco.com>
Reviewed-by: Arun Easi <aeasi@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Cc: stable@vger.kernel.org
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20250618003431.6314-2-kartilak@cisco.com
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When both the RHBA and RPA FDMI requests time out, fnic reuses a frame to
send ABTS for each of them. On send completion, this causes an attempt to
free the same frame twice that leads to a crash.
Fix crash by allocating separate frames for RHBA and RPA, and modify ABTS
logic accordingly.
Tested by checking MDS for FDMI information.
Tested by using instrumented driver to:
- Drop PLOGI response
- Drop RHBA response
- Drop RPA response
- Drop RHBA and RPA response
- Drop PLOGI response + ABTS response
- Drop RHBA response + ABTS response
- Drop RPA response + ABTS response
- Drop RHBA and RPA response + ABTS response for both of them
Fixes: 09c1e6ab4ab2 ("scsi: fnic: Add and integrate support for FDMI")
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Gian Carlo Boffa <gcboffa@cisco.com>
Tested-by: Arun Easi <aeasi@cisco.com>
Co-developed-by: Arun Easi <aeasi@cisco.com>
Signed-off-by: Arun Easi <aeasi@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Cc: stable@vger.kernel.org
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20250618003431.6314-1-kartilak@cisco.com
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In ufshcd_host_reset_and_restore(), scale up clocks only when clock
scaling is supported. Without this change CPU latency is voted for 0
(ufshcd_pm_qos_update) during resume unconditionally.
Signed-off-by: anvithdosapati <anvithdosapati@google.com>
Link: https://lore.kernel.org/r/20250616085734.2133581-1-anvithdosapati@google.com
Fixes: a3cd5ec55f6c ("scsi: ufs: add load based scaling of UFS gear")
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
On a system with DRAM interleave enabled, out-of-bound access is
detected:
megaraid_sas 0000:3f:00.0: requested/available msix 128/128 poll_queue 0
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28
index -1 is out of range for type 'cpumask *[1024]'
dump_stack_lvl+0x5d/0x80
ubsan_epilogue+0x5/0x2b
__ubsan_handle_out_of_bounds.cold+0x46/0x4b
megasas_alloc_irq_vectors+0x149/0x190 [megaraid_sas]
megasas_probe_one.cold+0xa4d/0x189c [megaraid_sas]
local_pci_probe+0x42/0x90
pci_device_probe+0xdc/0x290
really_probe+0xdb/0x340
__driver_probe_device+0x78/0x110
driver_probe_device+0x1f/0xa0
__driver_attach+0xba/0x1c0
bus_for_each_dev+0x8b/0xe0
bus_add_driver+0x142/0x220
driver_register+0x72/0xd0
megasas_init+0xdf/0xff0 [megaraid_sas]
do_one_initcall+0x57/0x310
do_init_module+0x90/0x250
init_module_from_file+0x85/0xc0
idempotent_init_module+0x114/0x310
__x64_sys_finit_module+0x65/0xc0
do_syscall_64+0x82/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fix it accordingly.
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Link: https://lore.kernel.org/r/20250604042556.3731059-1-yu.c.chen@intel.com
Fixes: 8049da6f3943 ("scsi: megaraid_sas: Use irq_set_affinity_and_hint()")
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fix from Mark Brown:
"One fix here from Thierry, fixing crashes caused by attempting to do
cache sync operations on uncached memory on Tegra platforms"
* tag 'spi-fix-v6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: tegra210-qspi: Remove cache operations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fix from Mark Brown:
"One patch here from Heiko which fixes stability issues on some
Rockchip platforms by implementing soft start support and providing
startup time information for their regulators"
* tag 'regulator-fix-v6.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fan53555: add enable_time support and soft-start times
|
|
When num_parents is 4, __clk_register() occurs an out-of-bounds
when accessing parent_names member. Use ARRAY_SIZE() instead of
hardcode number here.
BUG: KASAN: global-out-of-bounds in __clk_register+0x1844/0x20d8
Read of size 8 at addr ffff800086988e78 by task kworker/u24:3/59
Hardware name: NXP i.MX95 19X19 board (DT)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
dump_backtrace+0x94/0xec
show_stack+0x18/0x24
dump_stack_lvl+0x8c/0xcc
print_report+0x398/0x5fc
kasan_report+0xd4/0x114
__asan_report_load8_noabort+0x20/0x2c
__clk_register+0x1844/0x20d8
clk_hw_register+0x44/0x110
__clk_hw_register_mux+0x284/0x3a8
imx95_bc_probe+0x4f4/0xa70
Fixes: 5224b189462f ("clk: imx: add i.MX95 BLK CTL clk driver")
Cc: stable@vger.kernel.org
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Link: https://lore.kernel.org/r/20250619062108.2016511-1-xiaolei.wang@windriver.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
The SCMI clock driver currently assumes that parent clocks are always
initialized before their children. However, this assumption can fail if
a child clock is encountered before its parent during probe.
This leads to an issue during initialization of the parent_data array:
sclk->parent_data[i].hw = hws[sclk->info->parents[i]];
If the parent clock's hardware structure has not been initialized yet,
this assignment results in invalid data.
To resolve this, allocate all struct scmi_clk instances as a contiguous
array at the beginning of the probe and populate the hws[] array
upfront. This ensures that any parent referenced later is already
initialized, regardless of the order in which clocks are processed.
Note that we can no longer free individual scmi_clk instances if
scmi_clk_ops_init() fails which shouldn't be a problem if the SCMI
platform has proper per-agent clock discovery.
Fixes: 65a8a3dd3b95f ("clk: scmi: Add support for clock {set,get}_parent")
Reviewed-by: peng.fan@nxp.com
Reviewed-by: Cristian Marussi <cristian.marussi@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Link: https://lore.kernel.org/r/20250612-clk-scmi-children-parent-fix-v3-1-7de52a27593d@pengutronix.de
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- A workaround update (Vinay)
- Fix memset on iomem (Lucas)
- Fix early wedge on GuC Load failure (Daniele)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/aFQ03kNzhbiNK7gW@fedora
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
drm-misc-fixes for v6.16-rc3:
- vivante scheduler fix.
- v3d null pointer crash fix.
- fix backlight, booting GSP-RM, and potential integer shift overflow in nouveau.
- fix compiler warnings about unused linux/export.h
- fix malidp unknown modifier spam.
- fix for ssd130x.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/d44bab7b-01f8-45a8-a7f4-5d3d563d2f9d@linux.intel.com
|
|
MANA supports RDMA in PF mode. The driver should record the doorbell
physical address when in PF mode.
The doorbell physical address is used by the RDMA driver to map
doorbell pages of the device to user-mode applications through RDMA
verbs interface. In the past, they have been mapped to user-mode while
the device is in VF mode. With the support for PF mode implemented,
also expose those pages in PF mode.
Support for PF mode is implemented in
290e5d3c49f6 ("net: mana: Add support for Multi Vports on Bare metal")
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1750210606-12167-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
List both CPU supply regulators which drive the little and big CPU
clusters, respectively, so that cpufreq can pick them up.
Without this patch the cpufreq governor attempts to raise the big CPU
frequency under high load, while its supply voltage stays at 850000 uV.
This causes system instability and, in my case, random reboots.
With this patch, supply voltages are adjusted in step with frequency
changes from 700000-737000 uV in idle to 950000 uV under full load,
and the system appears to be stable.
While at this, list all CPU supplies for completeness.
Cc: stable@vger.kernel.org
Fixes: 40f742b07ab2 ("arm64: dts: rockchip: Add rk3576-armsom-sige5 board")
Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Tested-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Alexey Charkov <alchark@gmail.com>
Link: https://lore.kernel.org/r/20250614-sige5-updates-v2-1-3bb31b02623c@gmail.com
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
|
|
If the end result of a security_compute_sid() computation matches the
ssid or tsid, return that SID rather than looking it up again. This
avoids the problem of multiple initial SIDs that map to the same
context.
Cc: stable@vger.kernel.org
Reported-by: Guido Trentalancia <guido@trentalancia.com>
Fixes: ae254858ce07 ("selinux: introduce an initial SID for early boot processes")
Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com>
Tested-by: Guido Trentalancia <guido@trentalancia.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
|
|
For now we only have one key type in these btrees, but forward
compatibility means we do have to check.
Reported-by: syzbot+b4cb4a6988aced0cec4b@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from wireless.
The ath12k fix to avoid FW crashes requires adding support for a
number of new FW commands so it's quite large in terms of LoC. The
rest is relatively small.
Current release - fix to a fix:
- ptp: fix breakage after ptp_vclock_in_use() rework
Current release - regressions:
- openvswitch: allocate struct ovs_pcpu_storage dynamically, static
allocation may exhaust module loader limit on smaller systems
Previous releases - regressions:
- tcp: fix tcp_packet_delayed() for peers with no selective ACK
support
Previous releases - always broken:
- wifi: ath12k: don't activate more links than firmware supports
- tcp: make sure sockets open via passive TFO have valid NAPI ID
- eth: bnxt_en: update MRU and RSS table of RSS contexts on queue
reset, prevent Rx queues from silently hanging after queue reset
- NFC: uart: set tty->disc_data only in success path"
* tag 'net-6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (59 commits)
net: airoha: Differentiate hwfd buffer size for QDMA0 and QDMA1
net: airoha: Compute number of descriptors according to reserved memory size
tools: ynl: fix mixing ops and notifications on one socket
net: atm: fix /proc/net/atm/lec handling
net: atm: add lec_mutex
mlxbf_gige: return EPROBE_DEFER if PHY IRQ is not available
net: airoha: Always check return value from airoha_ppe_foe_get_entry()
NFC: nci: uart: Set tty->disc_data only in success path
calipso: Fix null-ptr-deref in calipso_req_{set,del}attr().
MAINTAINERS: Remove Shannon Nelson from MAINTAINERS file
net: lan743x: fix potential out-of-bounds write in lan743x_ptp_io_event_clock_get()
eth: fbnic: avoid double free when failing to DMA-map FW msg
tcp: fix passive TFO socket having invalid NAPI ID
selftests: net: add test for passive TFO socket NAPI ID
selftests: net: add passive TFO test binary
selftests: netdevsim: improve lib.sh include in peer.sh
tipc: fix null-ptr-deref when acquiring remote ip of ethernet bearer
Octeontx2-pf: Fix Backpresure configuration
net: ftgmac100: select FIXED_PHY
net: ethtool: remove duplicate defines for family info
...
|
|
Memory allocated for the ECC engine conf is not released during spinand
cleanup. Below kmemleak trace is seen for this memory leak:
unreferenced object 0xffffff80064f00e0 (size 8):
comm "swapper/0", pid 1, jiffies 4294937458
hex dump (first 8 bytes):
00 00 00 00 00 00 00 00 ........
backtrace (crc 0):
kmemleak_alloc+0x30/0x40
__kmalloc_cache_noprof+0x208/0x3c0
spinand_ondie_ecc_init_ctx+0x114/0x200
nand_ecc_init_ctx+0x70/0xa8
nanddev_ecc_engine_init+0xec/0x27c
spinand_probe+0xa2c/0x1620
spi_mem_probe+0x130/0x21c
spi_probe+0xf0/0x170
really_probe+0x17c/0x6e8
__driver_probe_device+0x17c/0x21c
driver_probe_device+0x58/0x180
__device_attach_driver+0x15c/0x1f8
bus_for_each_drv+0xec/0x150
__device_attach+0x188/0x24c
device_initial_probe+0x10/0x20
bus_probe_device+0x11c/0x160
Fix the leak by calling nanddev_ecc_engine_cleanup() inside
spinand_cleanup().
Signed-off-by: Pablo Martin-Gomez <pmartin-gomez@freebox.fr>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
|
|
This fixes exceeding the bump allocator limit when the allocator finds
many buckets that need repair - they're repaired asynchronously, which
means that every error logged a message in the bump allocator, without
committing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
Don't change buf->size on error - this would usually be a transaction
restart, but it could also be -ENOMEM - when we've exceeded the bump
allocator max).
Fixes: 247abee6ae6d ("bcachefs: btree_trans_subbuf")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- ltc4282: Avoid repeated register write operation
- occ: Fix unaligned accesses, and rework attribute registration to
reduce stack usage
- ftsteutates: Fix TOCTOU race
* tag 'hwmon-for-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ltc4282) avoid repeated register write
hwmon: (occ) fix unaligned accesses
hwmon: (occ) Rework attribute registration for stack usage
hwmon: (ftsteutates) Fix TOCTOU race in fts_read()
|
|
Lorenzo Bianconi says:
====================
net: airoha: Improve hwfd buffer/descriptor queues setup
Compute the number of hwfd buffers/descriptors according to the reserved
memory size if provided via DTS.
Reduce the required hwfd buffers queue size for QDMA1.
v3: https://lore.kernel.org/20250618-airoha-hw-num-desc-v3-0-18a6487cd75e@kernel.org
v1: https://lore.kernel.org/20250615-airoha-hw-num-desc-v1-0-8f88daa4abd7@kernel.org
====================
Link: https://patch.msgid.link/20250619-airoha-hw-num-desc-v4-0-49600a9b319a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
EN7581 SoC allows configuring the size and the number of buffers in
hwfd payload queue for both QDMA0 and QDMA1.
In order to reduce the required DRAM used for hwfd buffers queues and
decrease the memory footprint, differentiate hwfd buffer size for QDMA0
and QDMA1 and reduce hwfd buffer size to 1KB for QDMA1 (WAN) while
maintaining 2KB for QDMA0 (LAN).
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250619-airoha-hw-num-desc-v4-2-49600a9b319a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In order to not exceed the reserved memory size for hwfd buffers,
compute the number of hwfd buffers/descriptors according to the
reserved memory size and the size of each hwfd buffer (2KB).
Fixes: 3a1ce9e3d01b ("net: airoha: Add the capability to allocate hwfd buffers via reserved-memory")
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250619-airoha-hw-num-desc-v4-1-49600a9b319a@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless
Johannes Berg says:
====================
More fixes:
- ath12k
- avoid busy-waiting
- activate correct number of links
- iwlwifi
- iwldvm regression (lots of warnings)
- iwlmld merge damage regression (crash)
- fix build with some old gcc versions
- carl9170: don't talk to device w/o FW [syzbot]
- ath6kl: remove bad FW WARN [syzbot]
- ieee80211: use variable-length arrays [syzbot]
- mac80211
- remove WARN on delayed beacon update [syzbot]
- drop OCB frames with invalid source [syzbot]
* tag 'wireless-2025-06-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
wifi: iwlwifi: Fix incorrect logic on cmd_ver range checking
wifi: iwlwifi: dvm: restore n_no_reclaim_cmds setting
wifi: iwlwifi: cfg: Limit cb_size to valid range
wifi: iwlwifi: restore missing initialization of async_handlers_list (again)
wifi: ath6kl: remove WARN on bad firmware input
wifi: carl9170: do not ping device which has failed to load firmware
wifi: ath12k: don't wait when there is no vdev started
wifi: ath12k: don't use static variables in ath12k_wmi_fw_stats_process()
wifi: ath12k: avoid burning CPU while waiting for firmware stats
wifi: ath12k: fix documentation on firmware stats
wifi: ath12k: don't activate more links than firmware supports
wifi: ath12k: update link active in case two links fall on the same MAC
wifi: ath12k: support WMI_MLO_LINK_SET_ACTIVE_CMDID command
wifi: ath12k: update freq range for each hardware mode
wifi: ath12k: parse and save sbs_lower_band_end_freq from WMI_SERVICE_READY_EXT2_EVENTID event
wifi: ath12k: parse and save hardware mode info from WMI_SERVICE_READY_EXT_EVENTID event for later use
wifi: ath12k: Avoid CPU busy-wait by handling VDEV_STAT and BCN_STAT
wifi: mac80211: don't WARN for late channel/color switch
wifi: mac80211: drop invalid source address OCB frames
wifi: remove zero-length arrays
====================
Link: https://patch.msgid.link/20250618210642.35805-6-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The multi message support loosened the connection between the request
and response handling, as we can now submit multiple requests before
we start processing responses. Passing the attr set to NlMsgs decoding
no longer makes sense (if it ever did), attr set may differ message
by messsage. Isolate the part of decoding responsible for attr-set
specific interpretation and call it once we identified the correct op.
Without this fix performing SET operation on an ethtool socket, while
being subscribed to notifications causes:
# File "tools/net/ynl/pyynl/lib/ynl.py", line 1096, in _op
# Exception| return self._ops(ops)[0]
# Exception| ~~~~~~~~~^^^^^
# File "tools/net/ynl/pyynl/lib/ynl.py", line 1040, in _ops
# Exception| nms = NlMsgs(reply, attr_space=op.attr_set)
# Exception| ^^^^^^^^^^^
The value of op we use on line 1040 is stale, it comes form the previous
loop. If a notification comes before a response we will update op to None
and the next iteration thru the loop will break with the trace above.
Fixes: 6fda63c45fe8 ("tools/net/ynl: fix cli.py --subscribe feature")
Fixes: ba8be00f68f5 ("tools/net/ynl: Add multi message support to ynl")
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://patch.msgid.link/20250618171746.1201403-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Eric Dumazet says:
====================
net: atm: protect dev_lec[] with a mutex
Based on an initial syzbot report.
First patch is adding lec_mutex to address the report.
Second patch protects /proc/net/atm/lec operations.
We probably should delete this driver, it seems quite broken.
====================
Link: https://patch.msgid.link/20250618140844.1686882-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
/proc/net/atm/lec must ensure safety against dev_lec[] changes.
It appears it had dev_put() calls without prior dev_hold(),
leading to imbalance and UAF.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Francois Romieu <romieu@fr.zoreil.com> # Minor atm contributor
Link: https://patch.msgid.link/20250618140844.1686882-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
syzbot found its way in net/atm/lec.c, and found an error path
in lecd_attach() could leave a dangling pointer in dev_lec[].
Add a mutex to protect dev_lecp[] uses from lecd_attach(),
lec_vcc_attach() and lec_mcast_attach().
Following patch will use this mutex for /proc/net/atm/lec.
BUG: KASAN: slab-use-after-free in lecd_attach net/atm/lec.c:751 [inline]
BUG: KASAN: slab-use-after-free in lane_ioctl+0x2224/0x23e0 net/atm/lec.c:1008
Read of size 8 at addr ffff88807c7b8e68 by task syz.1.17/6142
CPU: 1 UID: 0 PID: 6142 Comm: syz.1.17 Not tainted 6.16.0-rc1-syzkaller-00239-g08215f5486ec #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:408 [inline]
print_report+0xcd/0x680 mm/kasan/report.c:521
kasan_report+0xe0/0x110 mm/kasan/report.c:634
lecd_attach net/atm/lec.c:751 [inline]
lane_ioctl+0x2224/0x23e0 net/atm/lec.c:1008
do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159
sock_do_ioctl+0x118/0x280 net/socket.c:1190
sock_ioctl+0x227/0x6b0 net/socket.c:1311
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl fs/ioctl.c:893 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
Allocated by task 6132:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
__kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
kasan_kmalloc include/linux/kasan.h:260 [inline]
__do_kmalloc_node mm/slub.c:4328 [inline]
__kvmalloc_node_noprof+0x27b/0x620 mm/slub.c:5015
alloc_netdev_mqs+0xd2/0x1570 net/core/dev.c:11711
lecd_attach net/atm/lec.c:737 [inline]
lane_ioctl+0x17db/0x23e0 net/atm/lec.c:1008
do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159
sock_do_ioctl+0x118/0x280 net/socket.c:1190
sock_ioctl+0x227/0x6b0 net/socket.c:1311
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl fs/ioctl.c:893 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 6132:
kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
kasan_save_track+0x14/0x30 mm/kasan/common.c:68
kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:576
poison_slab_object mm/kasan/common.c:247 [inline]
__kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
kasan_slab_free include/linux/kasan.h:233 [inline]
slab_free_hook mm/slub.c:2381 [inline]
slab_free mm/slub.c:4643 [inline]
kfree+0x2b4/0x4d0 mm/slub.c:4842
free_netdev+0x6c5/0x910 net/core/dev.c:11892
lecd_attach net/atm/lec.c:744 [inline]
lane_ioctl+0x1ce8/0x23e0 net/atm/lec.c:1008
do_vcc_ioctl+0x12c/0x930 net/atm/ioctl.c:159
sock_do_ioctl+0x118/0x280 net/socket.c:1190
sock_ioctl+0x227/0x6b0 net/socket.c:1311
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:907 [inline]
__se_sys_ioctl fs/ioctl.c:893 [inline]
__x64_sys_ioctl+0x18e/0x210 fs/ioctl.c:893
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+8b64dec3affaed7b3af5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6852c6f6.050a0220.216029.0018.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250618140844.1686882-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The message "Error getting PHY irq. Use polling instead"
is emitted when the mlxbf_gige driver is loaded by the
kernel before the associated gpio-mlxbf driver, and thus
the call to get the PHY IRQ fails since it is not yet
available. The driver probe() must return -EPROBE_DEFER
if acpi_dev_gpio_irq_get_by() returns the same.
Fixes: 6c2a6ddca763 ("net: mellanox: mlxbf_gige: Replace non-standard interrupt handling")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250618135902.346-1-davthompson@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|