Age | Commit message (Collapse) | Author |
|
Patch series "Rework find_get_entries() and find_lock_entries()", v3.
Originally the callers of find_get_entries() and find_lock_entries() were
keeping track of the start index themselves as they traverse the search
range.
This resulted in hacky code such as in shmem_undo_range():
index = folio->index + folio_nr_pages(folio) - 1;
where the - 1 is only present to stay in the right spot after incrementing
index later. This sort of calculation was also being done on every folio
despite not even using index later within that function.
These patches change find_get_entries() and find_lock_entries() to
calculate the new index instead of leaving it to the callers so we can
avoid all these complications.
This patch (of 2):
Initially, find_lock_entries() was being passed in the start offset as a
value. That left the calculation of the offset to the callers. This led
to complexity in the callers trying to keep track of the index.
Now find_lock_entries() takes in a pointer to the start offset and updates
the value to be directly after the last entry found. If no entry is
found, the offset is not changed. This gets rid of multiple hacky
calculations that kept track of the start offset.
Link: https://lkml.kernel.org/r/20221017161800.2003-1-vishal.moola@gmail.com
Link: https://lkml.kernel.org/r/20221017161800.2003-2-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 2555283eb40d ("mm/rmap: Fix anon_vma->degree ambiguity leading to
double-reuse") use num_children and num_active_vmas to replace the origin
degree to fix anon_vma UAF problem. Update the comment in anon_vma_clone
to fit this change.
Link: https://lkml.kernel.org/r/20221014013931.1565969-1-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Helper function to retrieve hstate information from a hugetlb folio.
Link: https://lkml.kernel.org/r/20220922154207.1575343-6-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Cross <ccross@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Remove the last caller of delete_from_page_cache() by converting the code
to its folio equivalent.
Link: https://lkml.kernel.org/r/20220922154207.1575343-5-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Cross <ccross@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Allow hugetlbfs_migrate_folio to check and read subpool information by
passing in a folio.
Link: https://lkml.kernel.org/r/20220922154207.1575343-4-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Cross <ccross@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Allow struct folio to store hugetlb metadata that is contained in the
private field of the first tail page. On 32-bit, _private_1 aligns with
page[1].private.
Link: https://lkml.kernel.org/r/20220922154207.1575343-3-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Cross <ccross@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "begin converting hugetlb code to folios", v4.
This patch series starts the conversion of the hugetlb code to operate on
struct folios rather than struct pages. This removes the ambiguitiy of
whether functions are operating on head pages, tail pages of compound
pages, or base pages.
This series passes the linux test project hugetlb test cases.
Patch 1 adds hugeltb specific page macros that can operate on folios.
Patch 2 adds the private field of the first tail page to struct page. For
32-bit, _private_1 alinging with page[1].private was confirmed by using
pahole.
Patch 3 introduces hugetlb subpool helper functions which operate on
struct folios. These patches were tested using the hugepage-mmap.c
selftest along with the migratepages command.
Patch 4 converts hugetlb_delete_from_page_cache() to use folios.
Patch 5 adds a folio_hstate() function to get hstate information from a
folio and adds a user of folio_hstate().
Bpftrace was used to track time spent in the free_huge_pages function
during the ltp test cases as it is a caller of the hugetlb subpool
functions. From the histogram, the performance is similar before and
after the patch series.
Time spent in 'free_huge_page'
6.0.0-rc2.master.20220823
@nsecs:
[256, 512) 14770 |@@@@@@@@@@@@@@@@@@@@@@@@@@@
|@@@@@@@@@@@@@@@@@@@@@@@@@ |
[512, 1K) 155 | |
[1K, 2K) 169 | |
[2K, 4K) 50 | |
[4K, 8K) 14 | |
[8K, 16K) 3 | |
[16K, 32K) 3 | |
6.0.0-rc2.master.20220823 + patch series
@nsecs:
[256, 512) 13678 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
|@@@@@@@@@@@@@@@@@@@@@@@@@ |
[512, 1K) 142 | |
[1K, 2K) 199 | |
[2K, 4K) 44 | |
[4K, 8K) 13 | |
[8K, 16K) 4 | |
[16K, 32K) 1 | |
This patch (of 5):
Allow the macros which test, set, and clear hugetlb specific page flags to
take a hugetlb folio as an input. The macrros are generated as
folio_{test, set, clear}_hugetlb_{restore_reserve, migratable, temporary,
freed, vmemmap_optimized, raw_hwp_unreliable}.
Link: https://lkml.kernel.org/r/20220922154207.1575343-1-sidhartha.kumar@oracle.com
Link: https://lkml.kernel.org/r/20220922154207.1575343-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Colin Cross <ccross@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After converting all the three relevant testcases (uffd, madvise, mremap)
to use memfd, no test will need the hugetlb mount point anymore. Drop the
code.
Link: https://lkml.kernel.org/r/20221014144015.94039-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For dropping the hugetlb mountpoint in run_vmtests.sh. Cleaned it up a
little bit around the changed codes.
Link: https://lkml.kernel.org/r/20221014144013.94027-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For dropping the hugetlb mountpoint in run_vmtests.sh. Since no parameter
is needed, drop USAGE too.
Link: https://lkml.kernel.org/r/20221014143921.93887-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "selftests/vm: Drop hugetlb mntpoint in run_vmtests.sh", v2.
Clean the code up so we can use the same memfd for both hugetlb and shmem
which is cleaner.
This patch (of 4):
We already used memfd for shmem test, move it forward with hugetlb too so
that we don't need user to specify the hugetlb file path explicitly when
running hugetlb shared tests.
Link: https://lkml.kernel.org/r/20221014143921.93887-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20221014143921.93887-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We noticed a 2% webserver throughput regression after upgrading from 5.6.
This could be tracked down to a shift in the anon/file reclaim balance
(confirmed with swappiness) that resulted in worse reclaim efficiency and
thus more kswapd activity for the same outcome.
The change that exposed the problem is aae466b0052e ("mm/swap: implement
workingset detection for anonymous LRU"). By qualifying swapins based on
their refault distance, it lowered the cost of anon reclaim in this
workload, in turn causing (much) more anon scanning than before. Scanning
the anon list is more expensive due to the higher ratio of mmapped pages
that may rotate during reclaim, and so the result was an increase in %sys
time.
Right now, rotations aren't considered a cost when balancing scan pressure
between LRUs. We can end up with very few file refaults putting all the
scan pressure on hot anon pages that are rotated en masse, don't get
reclaimed, and never push back on the file LRU again. We still only
reclaim file cache in that case, but we burn a lot CPU rotating anon
pages. It's "fair" from an LRU age POV, but doesn't reflect the real cost
it imposes on the system.
Consider rotations as a secondary factor in balancing the LRUs. This
doesn't attempt to make a precise comparison between IO cost and CPU cost,
it just says: if reloads are about comparable between the lists, or
rotations are overwhelmingly different, adjust for CPU work.
This fixed the regression on our webservers. It has since been deployed
to the entire Meta fleet and hasn't caused any problems.
Link: https://lkml.kernel.org/r/20221013193113.726425-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
During discussions of this series [1], it was suggested that hugetlb
handling code in follow_page_mask could be simplified. At the beginning
of follow_page_mask, there currently is a call to follow_huge_addr which
'may' handle hugetlb pages. ia64 is the only architecture which provides
a follow_huge_addr routine that does not return error. Instead, at each
level of the page table a check is made for a hugetlb entry. If a hugetlb
entry is found, a call to a routine associated with that entry is made.
Currently, there are two checks for hugetlb entries at each page table
level. The first check is of the form:
if (p?d_huge())
page = follow_huge_p?d();
the second check is of the form:
if (is_hugepd())
page = follow_huge_pd().
We can replace these checks, as well as the special handling routines such
as follow_huge_p?d() and follow_huge_pd() with a single routine to handle
hugetlb vmas.
A new routine hugetlb_follow_page_mask is called for hugetlb vmas at the
beginning of follow_page_mask. hugetlb_follow_page_mask will use the
existing routine huge_pte_offset to walk page tables looking for hugetlb
entries. huge_pte_offset can be overwritten by architectures, and already
handles special cases such as hugepd entries.
[1] https://lore.kernel.org/linux-mm/cover.1661240170.git.baolin.wang@linux.alibaba.com/
[mike.kravetz@oracle.com: remove vma (pmd sharing) per Peter]
Link: https://lkml.kernel.org/r/20221028181108.119432-1-mike.kravetz@oracle.com
[mike.kravetz@oracle.com: remove left over hugetlb_vma_unlock_read()]
Link: https://lkml.kernel.org/r/20221030225825.40872-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20220919021348.22151-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a blank line to make the sentence before the list render as a separate
paragraph, not a definition.
Link: https://lkml.kernel.org/r/20221107142255.4038811-1-glider@google.com
Fixes: 93858ae70cf4 ("kmsan: add ReST documentation")
Signed-off-by: Alexander Potapenko <glider@google.com>
Suggested-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A user could write a name of a file under 'damon/' debugfs directory,
which is not a user-created context, to 'rm_contexts' file. In the case,
'dbgfs_rm_context()' just assumes it's the valid DAMON context directory
only if a file of the name exist. As a result, invalid memory access
could happen as below. Fix the bug by checking if the given input is for
a directory. This check can filter out non-context inputs because
directories under 'damon/' debugfs directory can be created via only
'mk_contexts' file.
This bug has found by syzbot[1].
[1] https://lore.kernel.org/damon/000000000000ede3ac05ec4abf8e@google.com/
Link: https://lkml.kernel.org/r/20221107165001.5717-2-sj@kernel.org
Fixes: 75c1c2b53c78 ("mm/damon/dbgfs: support multiple contexts")
Signed-off-by: SeongJae Park <sj@kernel.org>
Reported-by: syzbot+6087eafb76a94c4ac9eb@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org> [5.15.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In RCU mode, the node limits were being updated to the last pivot which
may not be correct and would cause the metadata to be set when it
shouldn't. Fix this by not setting a new limit in this case.
Link: https://lkml.kernel.org/r/20221107163857.867377-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It is possible to confuse the depth tracking in the maple state by
searching the same node for values. Fix the depth tracking by moving
where the depth is incremented closer to where the node changes level.
Also change the initial depth setting when using the root node.
Link: https://lkml.kernel.org/r/20221107163814.866612-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The following bug is reported to be triggered when starting X on x86-32
system with i915:
[ 225.777375] kernel BUG at mm/memory.c:2664!
[ 225.777391] invalid opcode: 0000 [#1] PREEMPT SMP
[ 225.777405] CPU: 0 PID: 2402 Comm: Xorg Not tainted 6.1.0-rc3-bdg+ #86
[ 225.777415] Hardware name: /8I865G775-G, BIOS F1 08/29/2006
[ 225.777421] EIP: __apply_to_page_range+0x24d/0x31c
[ 225.777437] Code: ff ff 8b 55 e8 8b 45 cc e8 0a 11 ec ff 89 d8 83 c4 28 5b 5e 5f 5d c3 81 7d e0 a0 ef 96 c1 74 ad 8b 45 d0 e8 2d 83 49 00 eb a3 <0f> 0b 25 00 f0 ff ff 81 eb 00 00 00 40 01 c3 8b 45 ec 8b 00 e8 76
[ 225.777446] EAX: 00000001 EBX: c53a3b58 ECX: b5c00000 EDX: c258aa00
[ 225.777454] ESI: b5c00000 EDI: b5900000 EBP: c4b0fdb4 ESP: c4b0fd80
[ 225.777462] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010202
[ 225.777470] CR0: 80050033 CR2: b5900000 CR3: 053a3000 CR4: 000006d0
[ 225.777479] Call Trace:
[ 225.777486] ? i915_memcpy_init_early+0x63/0x63 [i915]
[ 225.777684] apply_to_page_range+0x21/0x27
[ 225.777694] ? i915_memcpy_init_early+0x63/0x63 [i915]
[ 225.777870] remap_io_mapping+0x49/0x75 [i915]
[ 225.778046] ? i915_memcpy_init_early+0x63/0x63 [i915]
[ 225.778220] ? mutex_unlock+0xb/0xd
[ 225.778231] ? i915_vma_pin_fence+0x6d/0xf7 [i915]
[ 225.778420] vm_fault_gtt+0x2a9/0x8f1 [i915]
[ 225.778644] ? lock_is_held_type+0x56/0xe7
[ 225.778655] ? lock_is_held_type+0x7a/0xe7
[ 225.778663] ? 0xc1000000
[ 225.778670] __do_fault+0x21/0x6a
[ 225.778679] handle_mm_fault+0x708/0xb21
[ 225.778686] ? mt_find+0x21e/0x5ae
[ 225.778696] exc_page_fault+0x185/0x705
[ 225.778704] ? doublefault_shim+0x127/0x127
[ 225.778715] handle_exception+0x130/0x130
[ 225.778723] EIP: 0xb700468a
Recently pud_huge() got aware of non-present entry by commit 3a194f3f8ad0
("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present
pud entry") to handle some special states of gigantic page. However, it's
overlooked that pud_none() always returns false when running with 2-level
paging, and as a result pud_huge() can return true pointlessly.
Introduce "#if CONFIG_PGTABLE_LEVELS > 2" to pud_huge() to deal with this.
Link: https://lkml.kernel.org/r/20221107021010.2449306-1-naoya.horiguchi@linux.dev
Fixes: 3a194f3f8ad0 ("mm/hugetlb: make pud_huge() and follow_huge_pud() aware of non-present pud entry")
Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Liu Shixin <liushixin2@huawei.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When psi annotations were added to to btrfs compression reads, the psi
state tracking over add_ra_bio_pages and btrfs_submit_compressed_read was
faulty. A pressure state, once entered, is never left. This results in
incorrectly elevated pressure, which triggers OOM kills.
pflags record the *previous* memstall state when we enter a new one. The
code tried to initialize pflags to 1, and then optimize the leave call
when we either didn't enter a memstall, or were already inside a nested
stall. However, there can be multiple PageWorkingset pages in the bio, at
which point it's that path itself that enters repeatedly and overwrites
pflags. This causes us to miss the exit.
Enter the stall only once if needed, then unwind correctly.
erofs has the same problem, fix that up too. And move the memstall exit
past submit_bio() to restore submit accounting originally added by
b8e24a9300b0 ("block: annotate refault stalls from IO submission").
Link: https://lkml.kernel.org/r/Y2UHRqthNUwuIQGS@cmpxchg.org
Fixes: 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads")
Fixes: 99486c511f68 ("erofs: add manual PSI accounting for the compressed address space")
Fixes: 118f3663fbc6 ("block: remove PSI accounting from the bio layer")
Link: https://lore.kernel.org/r/d20a0a85-e415-cf78-27f9-77dd7a94bc8d@leemhuis.info/
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Thorsten Leemhuis <linux@leemhuis.info>
Tested-by: Thorsten Leemhuis <linux@leemhuis.info>
Cc: Chao Yu <chao@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Sterba <dsterba@suse.com>
Cc: Gao Xiang <xiang@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1 Task2
-------------------------------- ------------------------------
nilfs_construct_segment
nilfs_segctor_sync
init_wait
init_waitqueue_entry
add_wait_queue
schedule
nilfs_remount (R/W remount case)
nilfs_attach_log_writer
nilfs_detach_log_writer
nilfs_segctor_destroy
kfree
finish_wait
_raw_spin_lock_irqsave
__raw_spin_lock_irqsave
do_raw_spin_lock
debug_spin_lock_before <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
waked up, Task1 accesses nilfs->ns_writer which is already freed. This
scenario diagram is based on the Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df8b
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb@syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida@redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There is a case in exc_invalid_op handler that is executed outside the
irqentry_enter()/irqentry_exit() region when an UD2 instruction is used to
encode a call to __warn().
In that case the `struct pt_regs` passed to the interrupt handler is never
unpoisoned by KMSAN (this is normally done in irqentry_enter()), which
leads to false positives inside handle_bug().
Use kmsan_unpoison_entry_regs() to explicitly unpoison those registers
before using them.
Link: https://lkml.kernel.org/r/20221102110611.1085175-5-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As pointed out by Peter Zijlstra, __msan_poison_alloca() does not play
well with IRQ code when PREEMPT_RT is on, because in that mode even
GFP_ATOMIC allocations cannot be performed.
Fixing this would require making stackdepot completely lockless, which is
quite challenging and may be excessive for the time being.
Instead, make sure KMSAN is incompatible with PREEMPT_RT, like other debug
configs are.
Link: https://lkml.kernel.org/r/20221102110611.1085175-4-glider@google.com
Link: https://lore.kernel.org/lkml/20221025221755.3810809-1-glider@google.com/
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
As pointed out by Masahiro Yamada, Kconfig picks up the first default
entry which has true 'if' condition. Hence, the previously added check
for KMSAN was never used, because it followed the checks for 64BIT and
!64BIT.
Put KMSAN check before others to ensure it is always applied.
Link: https://lkml.kernel.org/r/20221102110611.1085175-3-glider@google.com
Link: https://github.com/google/kmsan/issues/89
Link: https://lore.kernel.org/linux-mm/20221024212144.2852069-3-glider@google.com/
Fixes: 921757bc9b61 ("Kconfig.debug: disable CONFIG_FRAME_WARN for KMSAN by default")
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Make sure usercopy hooks from linux/instrumented.h are invoked for
copy_from_user_nmi(). This fixes KMSAN false positives reported when
dumping opcodes for a stack trace.
Link: https://lkml.kernel.org/r/20221102110611.1085175-2-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Marco Elver <elver@google.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Without that, every call to __msan_poison_alloca() in NMI may end up
allocating memory, which is NMI-unsafe.
Link: https://lkml.kernel.org/r/20221102110611.1085175-1-glider@google.com
Link: https://lore.kernel.org/lkml/20221025221755.3810809-1-glider@google.com/
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The kernel test robot reported build failures with a 'randconfig' on s390:
>> mm/hugetlb_vmemmap.c:421:11: error: a function declaration without a
prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
core_param(hugetlb_free_vmemmap, vmemmap_optimize_enabled, bool, 0);
^
Link: https://lore.kernel.org/linux-mm/202210300751.rG3UDsuc-lkp@intel.com/
Link: https://lkml.kernel.org/r/patch.git-296b83ca939b.your-ad-here.call-01667411912-ext-5073@work.hours
Fixes: 30152245c63b ("mm: hugetlb_vmemmap: replace early_param() with core_param()")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
mfill_atomic_install_pte() checks page->mapping to detect whether one page
is used in the page cache. However as pointed out by Matthew, the page
can logically be a tail page rather than always the head in the case of
uffd minor mode with UFFDIO_CONTINUE. It means we could wrongly install
one pte with shmem thp tail page assuming it's an anonymous page.
It's not that clear even for anonymous page, since normally anonymous
pages also have page->mapping being setup with the anon vma. It's safe
here only because the only such caller to mfill_atomic_install_pte() is
always passing in a newly allocated page (mcopy_atomic_pte()), whose
page->mapping is not yet setup. However that's not extremely obvious
either.
For either of above, use page_mapping() instead.
Link: https://lkml.kernel.org/r/Y2K+y7wnhC4vbnP2@x1n
Fixes: 153132571f02 ("userfaultfd/shmem: support UFFDIO_CONTINUE for shmem")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Matthew Wilcox <willy@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
virtio_pmem use devm_memremap_pages() to map the device memory. By
default this memory is mapped as encrypted with SEV. Guest reboot changes
the current encryption key and guest no longer properly decrypts the FSDAX
device meta data.
Mark the corresponding device memory region for FSDAX devices (mapped with
memremap_pages) as decrypted to retain the persistent memory property.
Link: https://lkml.kernel.org/r/20221102160728.3184016-1-pankaj.gupta@amd.com
Fixes: b7b3c01b19159 ("mm/memremap_pages: support multiple ranges per invocation")
Signed-off-by: Pankaj Gupta <pankaj.gupta@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Anatoly Pugachev reported sparc64 breakage on the patch:
https://lore.kernel.org/r/20221021160603.GA23307@u164.east.ru
The sparc64 impl of pte_mkdirty() is definitely slightly special in that
it leverages a code patching mechanism for sun4u/sun4v on relevant pgtable
entry operations.
Before having a clue of why the sparc64 is special and caused the patch to
SIGSEGV the processes, revert the patch for now. The swap path of dirty
bit inheritage is kept because that's using the swap shared code so we
assume it'll not be affected.
Link: https://lkml.kernel.org/r/Y1Wbi4yyVvDtg4zN@x1n
Fixes: 0ccf7f168e17 ("mm/thp: carry over dirty bit when thp splits on pmd")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A semaphore deadlock can occur if nilfs_get_block() detects metadata
corruption while locating data blocks and a superblock writeback occurs at
the same time:
task 1 task 2
------ ------
* A file operation *
nilfs_truncate()
nilfs_get_block()
down_read(rwsem A) <--
nilfs_bmap_lookup_contig()
... generic_shutdown_super()
nilfs_put_super()
* Prepare to write superblock *
down_write(rwsem B) <--
nilfs_cleanup_super()
* Detect b-tree corruption * nilfs_set_log_cursor()
nilfs_bmap_convert_error() nilfs_count_free_blocks()
__nilfs_error() down_read(rwsem A) <--
nilfs_set_error()
down_write(rwsem B) <--
*** DEADLOCK ***
Here, nilfs_get_block() readlocks rwsem A (= NILFS_MDT(dat_inode)->mi_sem)
and then calls nilfs_bmap_lookup_contig(), but if it fails due to metadata
corruption, __nilfs_error() is called from nilfs_bmap_convert_error()
inside the lock section.
Since __nilfs_error() calls nilfs_set_error() unless the filesystem is
read-only and nilfs_set_error() attempts to writelock rwsem B (=
nilfs->ns_sem) to write back superblock exclusively, hierarchical lock
acquisition occurs in the order rwsem A -> rwsem B.
Now, if another task starts updating the superblock, it may writelock
rwsem B during the lock sequence above, and can deadlock trying to
readlock rwsem A in nilfs_count_free_blocks().
However, there is actually no need to take rwsem A in
nilfs_count_free_blocks() because it, within the lock section, only reads
a single integer data on a shared struct with
nilfs_sufile_get_ncleansegs(). This has been the case after commit
aa474a220180 ("nilfs2: add local variable to cache the number of clean
segments"), that is, even before this bug was introduced.
So, this resolves the deadlock problem by just not taking the semaphore in
nilfs_count_free_blocks().
Link: https://lkml.kernel.org/r/20221029044912.9139-1-konishi.ryusuke@gmail.com
Fixes: e828949e5b42 ("nilfs2: call nilfs_error inside bmap routines")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: syzbot+45d6ce7b7ad7ef455d03@syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org> [2.6.38+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There is a memory leak reported by kmemleak:
unreferenced object 0xffff88817231ce40 (size 224):
comm "mount.cifs", pid 19308, jiffies 4295917571 (age 405.880s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
60 c0 b2 00 81 88 ff ff 98 83 01 42 81 88 ff ff `..........B....
backtrace:
[<ffffffff81936171>] __alloc_file+0x21/0x250
[<ffffffff81937051>] alloc_empty_file+0x41/0xf0
[<ffffffff81937159>] alloc_file+0x59/0x710
[<ffffffff81937964>] alloc_file_pseudo+0x154/0x210
[<ffffffff81741dbf>] __shmem_file_setup+0xff/0x2a0
[<ffffffff817502cd>] shmem_zero_setup+0x8d/0x160
[<ffffffff817cc1d5>] mmap_region+0x1075/0x19d0
[<ffffffff817cd257>] do_mmap+0x727/0x1110
[<ffffffff817518b2>] vm_mmap_pgoff+0x112/0x1e0
[<ffffffff83adf955>] do_syscall_64+0x35/0x80
[<ffffffff83c0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
The root cause was traced to an error handing path in mmap_region() when
arch_validate_flags() or mas_preallocate() fails. In the shared anonymous
mapping sence, vma will be setuped and mapped with a new shared anonymous
file via shmem_zero_setup(). So in this case, the file resource needs to
be released.
Fix it by calling fput(vma->vm_file) and unmap_region() when
arch_validate_flags() or mas_preallocate() returns an error in the shared
anonymous mapping sence.
Link: https://lkml.kernel.org/r/20221028073717.1179380-1-lizetao1@huawei.com
Fixes: d4af56c5c7c6 ("mm: start tracking VMAs with maple tree")
Fixes: c462ac288f2c ("mm: Introduce arch_validate_flags()")
Signed-off-by: Li Zetao <lizetao1@huawei.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Along the development cycle, the testing code support for module/in-kernel
compiles was removed. Restore this functionality by moving any internal
API tests to the userspace side, as well as threading tests. Fix the
lockdep issues and add a way to reduce memory usage so the tests can
complete with KASAN + memleak detection. Make the tests work on 32 bit
hosts where possible and detect 32 bit hosts in the radix test suite.
[akpm@linux-foundation.org: fix module export]
[akpm@linux-foundation.org: fix it some more]
[liam.howlett@oracle.com: fix compile warnings on 32bit build in check_find()]
Link: https://lkml.kernel.org/r/20221107203816.1260327-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20221028180415.3074673-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
clang-analyzer reported some Dead Stores in mas_anode_descend(). Upon
inspection, there were a few clean ups that would make the code cleaner:
The count variable was set from the mt_slots array and then updated but
never used again. Just use the array reference directly.
Also stop updating the type since it isn't used after the update.
Stop setting the gaps pointer to NULL at the start since it is always
set before the loop begins.
Link: https://lkml.kernel.org/r/20221026151413.4032730-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
There is a more direct and cleaner way of implementing the same functional
code. Remove the confusing and unnecessary use of pointers here.
Link: https://lkml.kernel.org/r/20221026151241.4031117-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Suggested-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Dan Williams:
"Several fixes for CXL region creation crashes, leaks and failures.
This is mainly fallout from the original implementation of dynamic CXL
region creation (instantiate new physical memory pools) that arrived
in v6.0-rc1.
Given the theme of "failures in the presence of pass-through decoders"
this also includes new regression test infrastructure for that case.
Summary:
- Fix region creation crash with pass-through decoders
- Fix region creation crash when no decoder allocation fails
- Fix region creation crash when scanning regions to enforce the
increasing physical address order constraint that CXL mandates
- Fix a memory leak for cxl_pmem_region objects, track 1:N instead of
1:1 memory-device-to-region associations.
- Fix a memory leak for cxl_region objects when regions with active
targets are deleted
- Fix assignment of NUMA nodes to CXL regions by CFMWS (CXL Window)
emulated proximity domains.
- Fix region creation failure for switch attached devices downstream
of a single-port host-bridge
- Fix false positive memory leak of cxl_region objects by recycling
recently used region ids rather than freeing them
- Add regression test infrastructure for a pass-through decoder
configuration
- Fix some mailbox payload handling corner cases"
* tag 'cxl-fixes-for-6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/region: Recycle region ids
cxl/region: Fix 'distance' calculation with passthrough ports
tools/testing/cxl: Add a single-port host-bridge regression config
tools/testing/cxl: Fix some error exits
cxl/pmem: Fix cxl_pmem_region and cxl_memdev leak
cxl/region: Fix cxl_region leak, cleanup targets at region delete
cxl/region: Fix region HPA ordering validation
cxl/pmem: Use size_add() against integer overflow
cxl/region: Fix decoder allocation crash
ACPI: NUMA: Add CXL CFMWS 'nodes' to the possible nodes set
cxl/pmem: Fix failure to account for 8 byte header for writes to the device LSA.
cxl/region: Fix null pointer dereference due to pass through decoder commit
cxl/mbox: Add a check on input payload size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix two regressions:
- Commit 54cc3dbfc10d ("hwmon: (pmbus) Add regulator supply into
macro") resulted in regulator undercount when disabling regulators.
Revert it.
- The thermal subsystem rework caused the scmi driver to no longer
register with the thermal subsystem because index values no longer
match. To fix the problem, the scmi driver now directly registers
with the thermal subsystem, no longer through the hwmon core"
* tag 'hwmon-for-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
Revert "hwmon: (pmbus) Add regulator supply into macro"
hwmon: (scmi) Register explicitly with Thermal Framework
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Add Cooper Lake's stepping to the PEBS guest/host events isolation
fixed microcode revisions checking quirk
- Update Icelake and Sapphire Rapids events constraints
- Use the standard energy unit for Sapphire Rapids in RAPL
- Fix the hw_breakpoint test to fail more graciously on !SMP configs
* tag 'perf_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add Cooper Lake stepping to isolation_ucodes[]
perf/x86/intel: Fix pebs event constraints for SPR
perf/x86/intel: Fix pebs event constraints for ICL
perf/x86/rapl: Use standard Energy Unit for SPR Dram RAPL domain
perf/hw_breakpoint: test: Skip the test if dependencies unmet
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add new Intel CPU models
- Enforce that TDX guests are successfully loaded only on TDX hardware
where virtualization exception (#VE) delivery on kernel memory is
disabled because handling those in all possible cases is "essentially
impossible"
- Add the proper include to the syscall wrappers so that BTF can see
the real pt_regs definition and not only the forward declaration
* tag 'x86_urgent_for_v6.1_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Add several Intel server CPU model numbers
x86/tdx: Panic on bad configs that #VE on "private" memory access
x86/tdx: Prepare for using "INFO" call for a second purpose
x86/syscall: Include asm/ptrace.h in syscall_wrapper header
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Use POSIX-compatible grep options
- Document git-related tips for reproducible builds
- Fix a typo in the modpost rule
- Suppress SIGPIPE error message from gcc-ar and llvm-ar
- Fix segmentation fault in the menuconfig search
* tag 'kbuild-fixes-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix segmentation fault in menuconfig search
kbuild: fix SIGPIPE error message for AR=gcc-ar and AR=llvm-ar
kbuild: fix typo in modpost
Documentation: kbuild: Add description of git for reproducible builds
kbuild: use POSIX-compatible grep option
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the pKVM stage-1 walker erronously using the stage-2 accessor
- Correctly convert vcpu->kvm to a hyp pointer when generating an
exception in a nVHE+MTE configuration
- Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them
- Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
- Document the boot requirements for FGT when entering the kernel at
EL1
x86:
- Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
- Make argument order consistent for kvcalloc()
- Userspace API fixes for DEBUGCTL and LBRs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix a typo about the usage of kvcalloc()
KVM: x86: Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
KVM: VMX: Ignore guest CPUID for host userspace writes to DEBUGCTL
KVM: VMX: Fold vmx_supported_debugctl() into vcpu_supported_debugctl()
KVM: VMX: Advertise PMU LBRs if and only if perf supports LBRs
arm64: booting: Document our requirements for fine grained traps with SME
KVM: arm64: Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
KVM: Check KVM_CAP_DIRTY_LOG_{RING, RING_ACQ_REL} prior to enabling them
KVM: arm64: Fix bad dereference on MTE-enabled systems
KVM: arm64: Use correct accessor to parse stage-1 PTEs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"One fix for silencing a smatch warning, and a small cleanup patch"
* tag 'for-linus-6.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen: simplify sysenter and syscall setup
x86/xen: silence smatch warning in pmu_msr_chk_emulated()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix a number of bugs, including some regressions, the most serious of
which was one which would cause online resizes to fail with file
systems with metadata checksums enabled.
Also fix a warning caused by the newly added fortify string checker,
plus some bugs that were found using fuzzed file systems"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix fortify warning in fs/ext4/fast_commit.c:1551
ext4: fix wrong return err in ext4_load_and_init_journal()
ext4: fix warning in 'ext4_da_release_space'
ext4: fix BUG_ON() when directory entry has invalid rec_len
ext4: update the backup superblock's at the end of the online resize
|
|
Pull cifs fixes from Steve French:
"One symlink handling fix and two fixes foir multichannel issues with
iterating channels, including for oplock breaks when leases are
disabled"
* tag '6.1-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix use-after-free on the link name
cifs: avoid unnecessary iteration of tcp sessions
cifs: always iterate smb sessions using primary channel
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull `lTracing fixes for 6.1-rc3:
- Fixed NULL pointer dereference in the ring buffer wait-waiters code
for machines that have less CPUs than what nr_cpu_ids returns.
The buffer array is of size nr_cpu_ids, but only the online CPUs get
initialized.
- Fixed use after free call in ftrace_shutdown.
- Fix accounting of if a kprobe is enabled
- Fix NULL pointer dereference on error path of fprobe rethook_alloc().
- Fix unregistering of fprobe_kprobe_handler
- Fix memory leak in kprobe test module
* tag 'trace-v6.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: kprobe: Fix memory leak in test_gen_kprobe/kretprobe_cmd()
tracing/fprobe: Fix to check whether fprobe is registered correctly
fprobe: Check rethook_alloc() return in rethook initialization
kprobe: reverse kp->flags when arm_kprobe failed
ftrace: Fix use-after-free for dynamic ftrace_ops
ring-buffer: Check for NULL cpu_buffer in ring_buffer_wake_waiters()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
* Fix the pKVM stage-1 walker erronously using the stage-2 accessor
* Correctly convert vcpu->kvm to a hyp pointer when generating
an exception in a nVHE+MTE configuration
* Check that KVM_CAP_DIRTY_LOG_* are valid before enabling them
* Fix SMPRI_EL1/TPIDR2_EL0 trapping on VHE
* Document the boot requirements for FGT when entering the kernel
at EL1
|
|
x86:
* Use SRCU to protect zap in __kvm_set_or_clear_apicv_inhibit()
* Make argument order consistent for kvcalloc()
* Userspace API fixes for DEBUGCTL and LBRs
|
|
With the new fortify string system, rework the memcpy to avoid this
warning:
memcpy: detected field-spanning write (size 60) of single field "&raw_inode->i_generation" at fs/ext4/fast_commit.c:1551 (size 4)
Cc: stable@kernel.org
Fixes: 54d9469bc515 ("fortify: Add run-time WARN for cross-field memcpy()")
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|
The return value is wrong in ext4_load_and_init_journal(). The local
variable 'err' need to be initialized before goto out. The original code
in __ext4_fill_super() is fine because it has two return values 'ret'
and 'err' and 'ret' is initialized as -EINVAL. After we factor out
ext4_load_and_init_journal(), this code is broken. So fix it by directly
returning -EINVAL in the error handler path.
Cc: stable@kernel.org
Fixes: 9c1dd22d7422 ("ext4: factor out ext4_load_and_init_journal()")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221025040206.3134773-1-yanaijie@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|