Age | Commit message (Collapse) | Author |
|
Some drivers (bnxt but I think also mlx5 from ML discussions) change
the size of the indirection table depending on the number of Rx rings.
Decouple the max table size from the size of the currently used table,
so that we can reserve space in the context for table growth.
Static members in ethtool_ops are good enough for now, we can add
callbacks to read the max size more dynamically if someone needs
that.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
RSS contexts may get lost from a device, in various extreme circumstances.
Specifically if the firmware leaks resources and resets, or crashes and
either recovers in partially working state or the crash causes a
different FW version to run - creating the context again may fail.
Drivers should do their absolute best to prevent this from happening.
When it does, however, telling user that a context exists, when it can't
possibly be used any more is counter productive. Add a helper for
drivers to discard contexts. Print an error, in the future netlink
notification will also be sent.
More robust approaches were proposed, like keeping the contexts
but marking them as "dead" (but possibly resurrected by next reset).
That may be better but it's unclear at this stage whether the
effort is worth the benefits.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://patch.msgid.link/20240711220713.283778-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit eb8f689046b8 ("Use separate sections for __dev/
_cpu/__mem code/data").
Check section mismatch to __meminit* only when CONFIG_MEMORY_HOTPLUG=n.
With this change, the linker script and modpost become simpler, and we
can get rid of the __ref annotations from the memory hotplug code.
[sfr@canb.auug.org.au: remove MEM_KEEP from arch/powerpc/kernel/vmlinux.lds.S]
Link: https://lkml.kernel.org/r/20240710093213.2aefb25f@canb.auug.org.au
Link: https://lkml.kernel.org/r/20240706160511.2331061-2-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
These macros are not used anywhere.
Link: https://lkml.kernel.org/r/20240706160511.2331061-1-masahiroy@kernel.org
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages. This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours. However, we are
stuck with it since it constitutes a user ABI.
Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names. But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.
So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable. While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Daniel Gomez <da.gomez@samsung.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
After the commit 7fb1b252afb5 ("mm: shmem: add mTHP support for anonymous
shmem"), we can configure different policies through the multi-size THP
sysfs interface for anonymous shmem. But currently "THPeligible"
indicates only whether the mapping is eligible for allocating THP-pages as
well as the THP is PMD mappable or not for anonymous shmem, we need to
support semantics for mTHP with anonymous shmem similar to those for mTHP
with anonymous memory.
Link: https://lkml.kernel.org/r/20240705032309.24933-1-libang.li@antgroup.com
Signed-off-by: Bang Li <libang.li@antgroup.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since the introduction of mTHP, the docuementation has stated that
khugepaged would be enabled when any mTHP size is enabled, and disabled
when all mTHP sizes are disabled. There are 2 problems with this; 1.
this is not what was implemented by the code and 2. this is not the
desirable behavior.
Desirable behavior is for khugepaged to be enabled when any PMD-sized THP
is enabled, anon or file. (Note that file THP is still controlled by the
top-level control so we must always consider that, as well as the PMD-size
mTHP control for anon). khugepaged only supports collapsing to PMD-sized
THP so there is no value in enabling it when PMD-sized THP is disabled.
So let's change the code and documentation to reflect this policy.
Further, per-size enabled control modification events were not previously
forwarded to khugepaged to give it an opportunity to start or stop.
Consequently the following was resulting in khugepaged eroneously not
being activated:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
[ryan.roberts@arm.com: v3]
Link: https://lkml.kernel.org/r/20240705102849.2479686-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20240705102849.2479686-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20240704091051.2411934-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
Closes: https://lore.kernel.org/linux-mm/7a0bbe69-1e3d-4263-b206-da007791a5c4@redhat.com/
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The current memory tier initialization process is distributed across two
different functions, memory_tier_init() and memory_tier_late_init(). This
design is hard to maintain. Thus, this patch is proposed to reduce the
possible code paths by consolidating different initialization patches into
one.
The earlier discussion with Jonathan and Ying is listed here:
https://lore.kernel.org/lkml/20240405150244.00004b49@Huawei.com/
If we want to put these two initializations together, they must be placed
together in the later function. Because only at that time, the HMAT
information will be ready, adist between nodes can be calculated, and
memory tiering can be established based on the adist. So we position the
initialization at memory_tier_init() to the memory_tier_late_init() call.
Moreover, it's natural to keep memory_tier initialization in drivers at
device_initcall() level.
If we simply move the set_node_memory_tier() from memory_tier_init() to
late_initcall(), it will result in HMAT not registering the
mt_adistance_algorithm callback function, because set_node_memory_tier()
is not performed during the memory tiering initialization phase, leading
to a lack of correct default_dram information.
Therefore, we introduced a nodemask to pass the information of the default
DRAM nodes. The reason for not choosing to reuse default_dram_type->nodes
is that it is not clean enough. So in the end, we use a __initdata
variable, which is a variable that is released once initialization is
complete, including both CPU and memory nodes for HMAT to iterate through.
Link: https://lkml.kernel.org/r/20240704072646.437579-1-horen.chuang@linux.dev
Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Gregory Price <gourry.memverge@gmail.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Ravi Jonnalagadda <ravis.opensrc@micron.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
It's a lot of math, and there is nothing memcontrol specific about it.
This makes it easier to use inside of the drm cgroup controller.
[akpm@linux-foundation.org: fix kerneldoc, per Jeff Johnson]
Link: https://lkml.kernel.org/r/20240703112510.36424-1-maarten.lankhorst@linux.intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A number of allocation helper functions were converted into macros to
account them at the call sites. Add a comment for each converted
allocation helper explaining why it has to be a macro and why we typecast
the return value wherever required. The patch also moves
acpi_os_acquire_object() closer to other allocation helpers to group them
together under the same comment. The patch has no functional changes.
Link: https://lkml.kernel.org/r/20240703174225.3891393-1-surenb@google.com
Fixes: 2c321f3f70bc ("mm: change inlined allocation helpers to account at the call site")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Christian König <christian.koenig@amd.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Thorsten Blum <thorsten.blum@toblux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
powerpc was the only user of CONFIG_ARCH_HAS_HUGEPD and doesn't use it
anymore, so remove all related code.
Link: https://lkml.kernel.org/r/4b10c54c794780b955f3ad6c657d0199dd792146.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
On powerpc 8xx huge_ptep_get() will need to know whether the given ptep is
a PTE entry or a PMD entry. This cannot be known with the PMD entry
itself because there is no easy way to know it from the content of the
entry.
So huge_ptep_get() will need to know either the size of the page or get
the pmd.
In order to be consistent with huge_ptep_get_and_clear(), give mm and
address to huge_ptep_get().
Link: https://lkml.kernel.org/r/cc00c70dd384298796a4e1b25d6c4eb306d3af85.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
On powerpc 8xx, when a page is 8M size, the information is in the PMD
entry. So allow architectures to provide __pte_leaf_size() instead of
pte_leaf_size() and provide the PMD entry to that function.
When __pte_leaf_size() is not defined, define it as a pte_leaf_size() so
that architectures not interested in the PMD arguments are not impacted.
Only define a default pte_leaf_size() when __pte_leaf_size() is not
defined to make sure nobody adds new calls to pte_leaf_size() in the core.
Link: https://lkml.kernel.org/r/c7c008f0a314bf8029ad7288fdc908db1ec7e449.1719928057.git.christophe.leroy@csgroup.eu
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: introduce per-order mTHP split counters", v3.
At present, the split counters in THP statistics no longer include
PTE-mapped mTHP. Therefore, we want to introduce per-order mTHP split
counters to monitor the frequency of mTHP splits. This will assist
developers in better analyzing and optimizing system performance.
/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
split
split_failed
split_deferred
This patch (of 2):
Currently, the split counters in THP statistics no longer include
PTE-mapped mTHP. Therefore, we propose introducing per-order mTHP split
counters to monitor the frequency of mTHP splits. This will help
developers better analyze and optimize system performance.
/sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats
split
split_failed
split_deferred
[ioworker0@gmail.com: make things more readable, per Barry and Baolin]
Link: https://lkml.kernel.org/r/20240704012905.42971-2-ioworker0@gmail.com
[ioworker0@gmail.com: use == for `order' test, per David]
Link: https://lkml.kernel.org/r/20240705113119.82210-1-ioworker0@gmail.com
Link: https://lkml.kernel.org/r/20240704012905.42971-1-ioworker0@gmail.com
Link: https://lkml.kernel.org/r/20240704012905.42971-2-ioworker0@gmail.com
Link: https://lkml.kernel.org/r/20240628130750.73097-1-ioworker0@gmail.com
Link: https://lkml.kernel.org/r/20240628130750.73097-2-ioworker0@gmail.com
Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Barry Song <baohua@kernel.org>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Bang Li <libang.li@antgroup.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For drivers that would like to longterm-pin the folios associated with a
memfd, the memfd_pin_folios() API provides an option to not only pin the
folios via FOLL_PIN but also to check and migrate them if they reside in
movable zone or CMA block. This API currently works with memfds but it
should work with any files that belong to either shmemfs or hugetlbfs.
Files belonging to other filesystems are rejected for now.
The folios need to be located first before pinning them via FOLL_PIN. If
they are found in the page cache, they can be immediately pinned.
Otherwise, they need to be allocated using the filesystem specific APIs
and then pinned.
[akpm@linux-foundation.org: improve the CONFIG_MMU=n situation, per SeongJae]
[vivek.kasireddy@intel.com: return -EINVAL if the end offset is greater than the size of memfd]
Link: https://lkml.kernel.org/r/IA0PR11MB71850525CBC7D541CAB45DF1F8DB2@IA0PR11MB7185.namprd11.prod.outlook.com
Link: https://lkml.kernel.org/r/20240624063952.1572359-4-vivek.kasireddy@intel.com
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> (v2)
Reviewed-by: David Hildenbrand <david@redhat.com> (v3)
Reviewed-by: Christoph Hellwig <hch@lst.de> (v6)
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm/gup: Introduce memfd_pin_folios() for pinning memfd
folios", v16.
Currently, some drivers (e.g, Udmabuf) that want to longterm-pin the
pages/folios associated with a memfd, do so by simply taking a reference
on them. This is not desirable because the pages/folios may reside in
Movable zone or CMA block.
Therefore, having drivers use memfd_pin_folios() API ensures that the
folios are appropriately pinned via FOLL_PIN for longterm DMA.
This patchset also introduces a few helpers and converts the Udmabuf
driver to use folios and memfd_pin_folios() API to longterm-pin the folios
for DMA. Two new Udmabuf selftests are also included to test the driver
and the new API.
This patch (of 9):
These helpers are the folio versions of unpin_user_page/unpin_user_pages.
They are currently only useful for unpinning folios pinned by
memfd_pin_folios() or other associated routines. However, they could find
new uses in the future, when more and more folio-only helpers are added to
GUP.
We should probably sanity check the folio as part of unpin similar to how
it is done in unpin_user_page/unpin_user_pages but we cannot cleanly do
that at the moment without also checking the subpage. Therefore, sanity
checking needs to be added to these routines once we have a way to
determine if any given folio is anon-exclusive (via a per folio
AnonExclusive flag).
Link: https://lkml.kernel.org/r/20240624063952.1572359-1-vivek.kasireddy@intel.com
Link: https://lkml.kernel.org/r/20240624063952.1572359-2-vivek.kasireddy@intel.com
Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dongwon Kim <dongwon.kim@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Junxiao Chang <junxiao.chang@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When loading a EXT program without specifying `attr->attach_prog_fd`,
the `prog->aux->dst_prog` will be null. At this time, calling
resolve_prog_type() anywhere will result in a null pointer dereference.
Example stack trace:
[ 8.107863] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
[ 8.108262] Mem abort info:
[ 8.108384] ESR = 0x0000000096000004
[ 8.108547] EC = 0x25: DABT (current EL), IL = 32 bits
[ 8.108722] SET = 0, FnV = 0
[ 8.108827] EA = 0, S1PTW = 0
[ 8.108939] FSC = 0x04: level 0 translation fault
[ 8.109102] Data abort info:
[ 8.109203] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 8.109399] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 8.109614] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 8.109836] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101354000
[ 8.110011] [0000000000000004] pgd=0000000000000000, p4d=0000000000000000
[ 8.112624] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 8.112783] Modules linked in:
[ 8.113120] CPU: 0 PID: 99 Comm: may_access_dire Not tainted 6.10.0-rc3-next-20240613-dirty #1
[ 8.113230] Hardware name: linux,dummy-virt (DT)
[ 8.113390] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 8.113429] pc : may_access_direct_pkt_data+0x24/0xa0
[ 8.113746] lr : add_subprog_and_kfunc+0x634/0x8e8
[ 8.113798] sp : ffff80008283b9f0
[ 8.113813] x29: ffff80008283b9f0 x28: ffff800082795048 x27: 0000000000000001
[ 8.113881] x26: ffff0000c0bb2600 x25: 0000000000000000 x24: 0000000000000000
[ 8.113897] x23: ffff0000c1134000 x22: 000000000001864f x21: ffff0000c1138000
[ 8.113912] x20: 0000000000000001 x19: ffff0000c12b8000 x18: ffffffffffffffff
[ 8.113929] x17: 0000000000000000 x16: 0000000000000000 x15: 0720072007200720
[ 8.113944] x14: 0720072007200720 x13: 0720072007200720 x12: 0720072007200720
[ 8.113958] x11: 0720072007200720 x10: 0000000000f9fca4 x9 : ffff80008021f4e4
[ 8.113991] x8 : 0101010101010101 x7 : 746f72705f6d656d x6 : 000000001e0e0f5f
[ 8.114006] x5 : 000000000001864f x4 : ffff0000c12b8000 x3 : 000000000000001c
[ 8.114020] x2 : 0000000000000002 x1 : 0000000000000000 x0 : 0000000000000000
[ 8.114126] Call trace:
[ 8.114159] may_access_direct_pkt_data+0x24/0xa0
[ 8.114202] bpf_check+0x3bc/0x28c0
[ 8.114214] bpf_prog_load+0x658/0xa58
[ 8.114227] __sys_bpf+0xc50/0x2250
[ 8.114240] __arm64_sys_bpf+0x28/0x40
[ 8.114254] invoke_syscall.constprop.0+0x54/0xf0
[ 8.114273] do_el0_svc+0x4c/0xd8
[ 8.114289] el0_svc+0x3c/0x140
[ 8.114305] el0t_64_sync_handler+0x134/0x150
[ 8.114331] el0t_64_sync+0x168/0x170
[ 8.114477] Code: 7100707f 54000081 f9401c00 f9403800 (b9400403)
[ 8.118672] ---[ end trace 0000000000000000 ]---
One way to fix it is by forcing `attach_prog_fd` non-empty when
bpf_prog_load(). But this will lead to `libbpf_probe_bpf_prog_type`
API broken which use verifier log to probe prog type and will log
nothing if we reject invalid EXT prog before bpf_check().
Another way is by adding null check in resolve_prog_type().
The issue was introduced by commit 4a9c7bbe2ed4 ("bpf: Resolve to
prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT") which wanted
to correct type resolution for BPF_PROG_TYPE_TRACING programs. Before
that, the type resolution of BPF_PROG_TYPE_EXT prog actually follows
the logic below:
prog->aux->dst_prog ? prog->aux->dst_prog->type : prog->type;
It implies that when EXT program is not yet attached to `dst_prog`,
the prog type should be EXT itself. This code worked fine in the past.
So just keep using it.
Fix this by returning `prog->type` for BPF_PROG_TYPE_EXT if `dst_prog`
is not present in resolve_prog_type().
Fixes: 4a9c7bbe2ed4 ("bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT")
Signed-off-by: Tengda Wu <wutengda@huaweicloud.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20240711145819.254178-2-wutengda@huaweicloud.com
|
|
Currently the event counting provided by misc.events is hierarchical,
it's not practical if user is only concerned with events of a
specified cgroup. Therefore, introduce misc.events.local collect events
specific to the given cgroup.
This is analogous to memory.events.local and pids.events.local.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This commit introduces the dm target flag mempool_needs_integrity. When
the flag is set, device mapper will call bioset_integrity_create on it's
bio sets. The target can then call bio_integrity_alloc on the bios
allocated from the table's mempool.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
|
|
* iommu/iommufd/paging-domain-alloc:
RDMA/usnic: Use iommu_paging_domain_alloc()
wifi: ath11k: Use iommu_paging_domain_alloc()
wifi: ath10k: Use iommu_paging_domain_alloc()
drm/msm: Use iommu_paging_domain_alloc()
vhost-vdpa: Use iommu_paging_domain_alloc()
vfio/type1: Use iommu_paging_domain_alloc()
iommufd: Use iommu_paging_domain_alloc()
iommu: Add iommu_paging_domain_alloc() interface
|
|
* iommu/iommufd/attach-handles:
iommu: Extend domain attach group with handle support
iommu: Add attach handle to struct iopf_group
iommu: Remove sva handle list
iommu: Introduce domain attachment handle
|
|
* iommu/fwspec-ops-removal:
iommu: Remove iommu_fwspec ops
OF: Simplify of_iommu_configure()
ACPI: Retire acpi_iommu_fwspec_ops()
iommu: Resolve fwspec ops automatically
iommu/mediatek-v1: Clean up redundant fwspec checks
[will: Fixed conflict in drivers/iommu/tegra-smmu.c between fwspec ops
removal and fwspec driver fix as per Robin and Jon]
|
|
* iommu/core:
docs: iommu: Remove outdated Documentation/userspace-api/iommu.rst
iommufd: Use atomic_long_try_cmpxchg() in incr_user_locked_vm()
iommu/iova: Add missing MODULE_DESCRIPTION() macro
iommu/dma: Prune redundant pgprot arguments
iommu: Make iommu_sva_domain_alloc() static
|
|
* iommu/arm/smmu: (32 commits)
iommu: Move IOMMU_DIRTY_NO_CLEAR define
iommu/arm-smmu-qcom: Register the TBU driver in qcom_smmu_impl_init
iommu/arm-smmu-v3: Enable HTTU for stage1 with io-pgtable mapping
iommu/arm-smmu-v3: Add support for dirty tracking in domain alloc
iommu/io-pgtable-arm: Add read_and_clear_dirty() support
iommu/arm-smmu-v3: Add feature detection for HTTU
iommu/arm-smmu-v3: Add support for domain_alloc_user fn
iommu/arm-smmu-qcom: record reason for deferring probe
iommu/arm-smmu: Pretty-print context fault related regs
iommu/arm-smmu-qcom-debug: Do not print for handled faults
iommu/arm-smmu: Add CB prefix to register bitfields
dt-bindings: arm-smmu: Add X1E80100 GPU SMMU
iommu/arm-smmu-v3: add missing MODULE_DESCRIPTION() macro
iommu/arm-smmu-v3: Shrink the strtab l1_desc array
iommu/arm-smmu-v3: Do not zero the strtab twice
iommu/arm-smmu-v3: Allow setting a S1 domain to a PASID
iommu/arm-smmu-v3: Allow a PASID to be set when RID is IDENTITY/BLOCKED
iommu/arm-smmu-v3: Test the STE S1DSS functionality
iommu/arm-smmu-v3: Allow IDENTITY/BLOCKED to be set while PASID is used
iommu/arm-smmu-v3: Put the SVA mmu notifier in the smmu_domain
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.11
1. Add ParaVirt steal time support.
2. Add some VM migration enhancement.
3. Add perf kvm-stat support for loongarch.
|
|
Fixes the compile issue when CONFIG_IOMMU_API is not set.
Fixes: 4fe88fd8b4ae ("iommu/io-pgtable-arm: Add read_and_clear_dirty() support")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407121602.HL9ih1it-lkp@intel.com/
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20240712113132.45100-1-shameerali.kolothum.thodi@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Pre-population has been requested several times to mitigate KVM page faults
during guest boot or after live migration. It is also required by TDX
before filling in the initial guest memory with measured contents.
Introduce it as a generic API.
|
|
Add a new ioctl KVM_PRE_FAULT_MEMORY in the KVM common code. It iterates on the
memory range and calls the arch-specific function. The implementation is
optional and enabled by a Kconfig symbol.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Message-ID: <819322b8f25971f2b9933bfa4506e618508ad782.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The flags AS_UNMOVABLE and AS_INACCESSIBLE were both added just for guest_memfd;
AS_UNMOVABLE is already in existing versions of Linux, while AS_INACCESSIBLE was
acked for inclusion in 6.11.
But really, they are the same thing: only guest_memfd uses them, at least for
now, and guest_memfd pages are unmovable because they should not be
accessed by the CPU.
So merge them into one; use the AS_INACCESSIBLE name which is more comprehensive.
At the same time, this fixes an embarrassing bug where AS_INACCESSIBLE was used
as a bit mask, despite it being just a bit index.
The bug was mostly benign, because AS_INACCESSIBLE's bit representation (1010)
corresponded to setting AS_UNEVICTABLE (which is already set) and AS_ENOSPC
(except no async writes can happen on the guest_memfd). So the AS_INACCESSIBLE
flag simply had no effect.
Fixes: 1d23040caa8b ("KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode")
Fixes: c72ceafbd12c ("mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory")
Cc: linux-mm@kvack.org
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Reviewed-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The timer/counter block on the Realtek SoCs provides up to 5 timers. It
also includes a watchdog timer which is handled by the
realtek_otto_wdt.c driver.
One timer will be used per CPU as a local clock event generator. An
additional timer will be used as an overal stable clocksource.
Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Signed-off-by: Sander Vanheule <sander@svanheule.net>
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Link: https://lore.kernel.org/r/20240710043524.1535151-8-chris.packham@alliedtelesis.co.nz
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
|
|
Because __thermal_zone_get_trip() is only called by thermal_zone_get_trip()
now, fold the former into the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/22339769.EfDdHjke4D@rjwysocki.net
|
|
Out of several drivers implementing the .set_trip_temp() thermal zone
operation, three don't actually use the trip ID argument passed to it,
two call __thermal_zone_get_trip() to get a struct thermal_trip
corresponding to the given trip ID, and the other use the trip ID as an
index into their own data structures with the assumption that it will
always match the ordering of entries in the trips table passed to the
core during thermal zone registration, which is fragile and not really
guaranteed.
Even though the trip IDs used by the core are in fact their indices in the
trips table passed to it by the thermal zone creator, that is purely a
matter of convenience and should not be relied on for correctness.
For this reason, modify trip_point_temp_store() to pass a (const) trip
pointer to .set_trip_temp() and adjust the drivers implementing it
accordingly.
This helps to simplify the drivers invoking __thermal_zone_get_trip()
from their .set_trip_temp() callback functions because they will not
need to do it now and the other drivers can store their internal
trip indices in the priv field in struct thermal_trip and their
.set_trip_temp() callback functions can get those indices from there.
The intel_quark_dts thermal driver can instead use the trip type to
determine the requisite trip index.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/8392906.T7Z3S40VBb@rjwysocki.net
[ rjw: Add missing colon and 2 empty code lines ]
[ rjw: Add missing change in imx_thermal.c and adjust the changelog ]
[ rjw: Drop an unused local variable ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
While efi_memory_attributes_table_t::entry isn't used directly as an
array, it is used as a base for pointer arithmetic. The type is wrong
as it's not technically an array of efi_memory_desc_t's; they could be
larger. Regardless, leave the type unchanged and remove the old style
"0" array size. Additionally replace the open-coded entry offset code
with the existing efi_memdesc_ptr() helper.
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The "early" part of the helper's name isn't accurate[1]. Drop it in
preparation for adding a new (not early) usage.
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/lkml/CAMj1kXEyDjH0uu3Z4eBesV3PEnKGi5ArXXMp7R-hn8HdRytiPg@mail.gmail.com [1]
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
In fact, this structure contains a flexible array at the end, but
historically its size, alignment etc., is calculated manually.
There are several instances of the structure embedded into other
structures, but also there's ongoing effort to remove them and we
could in the meantime declare &net_device properly.
Declare the array explicitly, use struct_size() and store the array
size inside the structure, so that __counted_by() can be applied.
Don't use PTR_ALIGN(), as SLUB itself tries its best to ensure the
allocated buffer is aligned to what the user expects.
Also, change its alignment from %NETDEV_ALIGN to the cacheline size
as per several suggestions on the netdev ML.
bloat-o-meter for vmlinux:
free_netdev 445 440 -5
netdev_freemem 24 - -24
alloc_netdev_mqs 1481 1450 -31
On x86_64 with several NICs of different vendors, I was never able to
get a &net_device pointer not aligned to the cacheline size after the
change.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://patch.msgid.link/20240710113036.2125584-1-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The only managed mapping function currently is pcim_iomap() which doesn't
allow for mapping an area starting at a certain offset, which many drivers
want.
Add pcim_iomap_range() as an exported function.
Link: https://lore.kernel.org/r/20240613115032.29098-13-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
PCI subfunctions (SF) are anchored on the auxiliary bus. PCI physical
and virtual functions are anchored on the PCI bus. The irq information
of each such function is visible to users via sysfs directory "msi_irqs"
containing files for each irq entry. However, for PCI SFs such
information is unavailable. Due to this users have no visibility on IRQs
used by the SFs.
Secondly, an SF can be multi function device supporting rdma, netdevice
and more. Without irq information at the bus level, the user is unable
to view or use the affinity of the SF IRQs.
Hence to match to the equivalent PCI PFs and VFs, add "irqs" directory,
for supporting auxiliary devices, containing file for each irq entry.
For example:
$ ls /sys/bus/auxiliary/devices/mlx5_core.sf.1/irqs/
50 51 52 53 54 55 56 57 58
Cc: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
v9-v10:
- remove Przemek RB
- add name field to auxiliary_irq_info (Greg and Przemek)
- handle bogus IRQ in auxiliary_device_sysfs_irq_remove (Greg)
v8-v9:
- add Przemek RB
- use guard() in auxiliary_irq_dir_prepare (Paolo)
v7-v8:
- use cleanup.h for info and name fields (Greg)
- correct error flow in auxiliary_irq_dir_prepare (Przemek)
- add documentation for new fields of auxiliary_device (Simon)
v6-v7:
- dynamically creating irqs directory when first irq file created (Greg)
- removed irqs flag and simplified the dev_add() API (Greg)
- move sysfs related new code to a new auxiliary_sysfs.c file (Greg)
v5-v6:
- removed concept of shared and exclusive and hence global xarray (Greg)
v4-v5:
- restore global mutex and replace refcount_t with simple integer (Greg)
v3->4:
- remove global mutex (Przemek)
v2->v3:
- fix function declaration in case SYSFS isn't defined
v1->v2:
- move #ifdefs from drivers/base/auxiliary.c to
include/linux/auxiliary_bus.h (Greg)
- use EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL (Greg)
- Fix kzalloc(ref) to kzalloc(*ref) (Simon)
- Add return description in auxiliary_device_sysfs_irq_add() kdoc (Simon)
- Fix auxiliary_irq_mode_show doc (kernel test boot)
|
|
Cross-merge networking fixes after downstream PR.
Conflicts:
net/sched/act_ct.c
26488172b029 ("net/sched: Fix UAF when resolving a clash")
3abbd7ed8b76 ("act_ct: prepare for stolen verdict coming from conntrack and nat engine")
No adjacent changes.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"This fixes two regressions that have been bubbling along for a large
part of this release.
One is a revert of the multi mode support for the OMAP SPI controller,
this introduced regressions on a number of systems and while there has
been progress on fixing those we've not got something that works for
everyone yet so let's just drop the change for now.
The other is a series of fixes from David Lechner for his recent
message optimisation work, this interacted badly with spi-mux which
is altogether too clever with recursive use of the bus and creates
situations that hadn't been considered.
There are also a couple of small driver specific fixes, including one
more patch from David for sleep duration calculations in the AXI
driver"
* tag 'spi-fix-v6.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: mux: set ctlr->bits_per_word_mask
spi: add defer_optimize_message controller flag
spi: don't unoptimize message in spi_async()
spi: omap2-mcspi: Revert multi mode support
spi: davinci: Unset POWERDOWN bit when releasing resources
spi: axi-spi-engine: fix sleep calculation
spi: imx: Don't expect DMA for i.MX{25,35,50,51,53} cspi devices
|
|
* for-next/vcpu-hotplug: (21 commits)
: arm64 support for virtual CPU hotplug (ACPI)
irqchip/gic-v3: Fix 'broken_rdists' unused warning when !SMP and !ACPI
arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU
cpumask: Add enabled cpumask for present CPUs that can be brought online
arm64: document virtual CPU hotplug's expectations
arm64: Kconfig: Enable hotplug CPU on arm64 if ACPI_PROCESSOR is enabled.
arm64: arch_register_cpu() variant to check if an ACPI handle is now available.
arm64: psci: Ignore DENIED CPUs
irqchip/gic-v3: Add support for ACPI's disabled but 'online capable' CPUs
irqchip/gic-v3: Don't return errors from gic_acpi_match_gicc()
arm64: acpi: Harden get_cpu_for_acpi_id() against missing CPU entry
arm64: acpi: Move get_cpu_for_acpi_id() to a header
ACPI: Add post_eject to struct acpi_scan_handler for cpu hotplug
ACPI: scan: switch to flags for acpi_scan_check_and_detach()
ACPI: processor: Register deferred CPUs from acpi_processor_get_info()
ACPI: processor: Add acpi_get_processor_handle() helper
ACPI: processor: Move checks and availability of acpi_processor earlier
ACPI: processor: Fix memory leaks in error paths of processor_add()
ACPI: processor: Return an error if acpi_processor_get_info() fails in processor_add()
ACPI: processor: Drop duplicated check on _STA (enabled + present)
cpu: Do not warn on arch_register_cpu() returning -EPROBE_DEFER
...
|
|
'for-next/mte', 'for-next/errata', 'for-next/acpi', 'for-next/gic-v3-pmr' and 'for-next/doc', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
perf: add missing MODULE_DESCRIPTION() macros
perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
perf: arm_v6/7_pmu: Drop non-DT probe support
perf/arm: Move 32-bit PMU drivers to drivers/perf/
perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
perf: imx_perf: add support for i.MX95 platform
perf: imx_perf: fix counter start and config sequence
perf: imx_perf: refactor driver for imx93
perf: imx_perf: let the driver manage the counter usage rather the user
perf: imx_perf: add macro definitions for parsing config attr
dt-bindings: perf: fsl-imx-ddr: Add i.MX95 compatible
perf: pmuv3: Add new Cortex and Neoverse PMUs
dt-bindings: arm: pmu: Add new Cortex and Neoverse cores
perf/arm-cmn: Enable support for tertiary match group
perf/arm-cmn: Decouple wp_config registers from filter group number
* for-next/cpufeature:
: Various cpufeature infrastructure patches
arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1
KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1
arm64/cpufeatures/kvm: Add ARMv8.9 FEAT_ECBHB bits in ID_AA64MMFR1 register
* for-next/misc:
: Miscellaneous patches
arm64: smp: Fix missing IPI statistics
arm64: Cleanup __cpu_set_tcr_t0sz()
arm64/mm: Stop using ESR_ELx_FSC_TYPE during fault
arm64: Kconfig: fix typo in __builtin_return_adddress
ARM64: reloc_test: add missing MODULE_DESCRIPTION() macro
arm64: implement raw_smp_processor_id() using thread_info
arm64/arch_timer: include <linux/percpu.h>
* for-next/kselftest:
: arm64 kselftest updates
selftests: arm64: tags: remove the result script
selftests: arm64: tags_test: conform test to TAP output
kselftest/arm64: Fix a couple of spelling mistakes
kselftest/arm64: Fix redundancy of a testcase
kselftest/arm64: Include kernel mode NEON in fp-stress
* for-next/mte:
: MTE updates
arm64: mte: Make mte_check_tfsr_*() conditional on KASAN instead of MTE
* for-next/errata:
: Arm CPU errata workarounds
arm64: errata: Expand speculative SSBS workaround
arm64: errata: Unify speculative SSBS errata logic
arm64: cputype: Add Cortex-X925 definitions
arm64: cputype: Add Cortex-A720 definitions
arm64: cputype: Add Cortex-X3 definitions
* for-next/acpi:
: arm64 ACPI patches
ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64
ACPI / amba: Drop unnecessary check for registered amba_dummy_clk
arm64: FFH: Move ACPI specific code into drivers/acpi/arm64/
arm64: cpuidle: Move ACPI specific code into drivers/acpi/arm64/
ACPI: arm64: Sort entries alphabetically
* for-next/gic-v3-pmr:
: arm64: irqchip/gic-v3: Use compiletime constant PMR values
arm64: irqchip/gic-v3: Select priorities at boot time
irqchip/gic-v3: Detect GICD_CTRL.DS and SCR_EL3.FIQ earlier
irqchip/gic-v3: Make distributor priorities variables
irqchip/gic-common: Remove sync_access callback
wordpart.h: Add REPEAT_BYTE_U32()
* for-next/doc:
: arm64 documentation updates
Documentation: arm64: Update memory.rst for TBI
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"cachefiles:
- Export an existing and add a new cachefile helper to be used in
filesystems to fix reference count bugs
- Use the newly added fscache_ty_get_volume() helper to get a
reference count on an fscache_volume to handle volumes that are
about to be removed cleanly
- After withdrawing a fscache_cache via FSCACHE_CACHE_IS_WITHDRAWN
wait for all ongoing cookie lookups to complete and for the object
count to reach zero
- Propagate errors from vfs_getxattr() to avoid an infinite loop in
cachefiles_check_volume_xattr() because it keeps seeing ESTALE
- Don't send new requests when an object is dropped by raising
CACHEFILES_ONDEMAND_OJBSTATE_DROPPING
- Cancel all requests for an object that is about to be dropped
- Wait for the ondemand_boject_worker to finish before dropping a
cachefiles object to prevent use-after-free
- Use cyclic allocation for message ids to better handle id recycling
- Add missing lock protection when iterating through the xarray when
polling
netfs:
- Use standard logging helpers for debug logging
VFS:
- Fix potential use-after-free in file locks during
trace_posix_lock_inode(). The tracepoint could fire while another
task raced it and freed the lock that was requested to be traced
- Only increment the nr_dentry_negative counter for dentries that are
present on the superblock LRU. Currently, DCACHE_LRU_LIST list is
used to detect this case. However, the flag is also raised in
combination with DCACHE_SHRINK_LIST to indicate that dentry->d_lru
is used. So checking only DCACHE_LRU_LIST will lead to wrong
nr_dentry_negative count. Fix the check to not count dentries that
are on a shrink related list
Misc:
- hfsplus: fix an uninitialized value issue in copy_name
- minix: fix minixfs_rename with HIGHMEM. It still uses kunmap() even
though we switched it to kmap_local_page() a while ago"
* tag 'vfs-6.10-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
minixfs: Fix minixfs_rename with HIGHMEM
hfsplus: fix uninit-value in copy_name
vfs: don't mod negative dentry count when on shrinker list
filelock: fix potential use-after-free in posix_lock_inode
cachefiles: add missing lock protection when polling
cachefiles: cyclic allocation of msg_id to avoid reuse
cachefiles: wait for ondemand_object_worker to finish when dropping object
cachefiles: cancel all requests for the object that is being dropped
cachefiles: stop sending new request when dropping object
cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop
cachefiles: fix slab-use-after-free in cachefiles_withdraw_cookie()
cachefiles: fix slab-use-after-free in fscache_withdraw_volume()
netfs, fscache: export fscache_put_volume() and add fscache_try_get_volume()
netfs: Switch debug logging to pr_debug()
|
|
Start changing the wording of the I2C main header wrt. the newest I2C
v7 and SMBus 3.2 specifications and replace "master/slave" with more
appropriate terms. The first step renames the members of struct
i2c_algorithm. Once all in-tree users are converted, the anonymous union
will go away again. All this work will also pave the way for finally
seperating the monolithic header into more fine-grained headers like
"i2c/clients.h" etc.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
|
|
the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Merge series from Richard Fitzgerald <rf@opensource.cirrus.com>:
Commit series that makes some small improvements to code and the
kernel log messages.
|
|
Introduce a new help addrs_per_page() to wrap common code
from addrs_per_inode() and addrs_per_block() for cleanup.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Merge series from Guenter Roeck <linux@roeck-us.net>:
regmap_multi_reg_read() is similar to regmap_bilk_read() but reads from
an array of non-sequential registers. It is helpful if multiple non-
sequential registers need to be read in a single operation which would
otherwise have to be mutex protected.
The name of the new function was chosen to match the existing function
regmap_multi_reg_write().
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"21 hotfixes, 15 of which are cc:stable.
No identifiable theme here - all are singleton patches, 19 are for MM"
* tag 'mm-hotfixes-stable-2024-07-10-13-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits)
mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio
mm/hugetlb: fix potential race in __update_and_free_hugetlb_folio()
filemap: replace pte_offset_map() with pte_offset_map_nolock()
arch/xtensa: always_inline get_current() and current_thread_info()
sched.h: always_inline alloc_tag_{save|restore} to fix modpost warnings
MAINTAINERS: mailmap: update Lorenzo Stoakes's email address
mm: fix crashes from deferred split racing folio migration
lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat
mm: gup: stop abusing try_grab_folio
nilfs2: fix kernel bug on rename operation of broken directory
mm/hugetlb_vmemmap: fix race with speculative PFN walkers
cachestat: do not flush stats in recency check
mm/shmem: disable PMD-sized page cache if needed
mm/filemap: skip to create PMD-sized page cache if needed
mm/readahead: limit page cache size in page_cache_ra_order()
mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray
mm/damon/core: merge regions aggressively when max_nr_regions is unmet
Fix userfaultfd_api to return EINVAL as expected
mm: vmalloc: check if a hash-index is in cpu_possible_mask
mm: prevent derefencing NULL ptr in pfn_section_valid()
...
|
|
As suggested by the B-ext spec, the Zbc (carry-less multiplication)
instructions can be used to accelerate CRC calculations. Currently, the
crc32 is the most widely used crc function inside kernel, so this patch
focuses on the optimization of just the crc32 APIs.
Compared with the current table-lookup based optimization, Zbc based
optimization can also achieve large stride during CRC calculation loop,
meantime, it avoids the memory access latency of the table-lookup based
implementation and it reduces memory footprint.
If Zbc feature is not supported in a runtime environment, then the
table-lookup based implementation would serve as fallback via alternative
mechanism.
By inspecting the vmlinux built by gcc v12.2.0 with default optimization
level (-O2), we can see below instruction count change for each 8-byte
stride in the CRC32 loop:
rv64: crc32_be (54->31), crc32_le (54->13), __crc32c_le (54->13)
rv32: crc32_be (50->32), crc32_le (50->16), __crc32c_le (50->16)
The compile target CPU is little endian, extra effort is needed for byte
swapping for the crc32_be API, thus, the instruction count change is not
as significant as that in the *_le cases.
This patch is tested on QEMU VM with the kernel CRC32 selftest for both
rv64 and rv32. Running the CRC32 selftest on a real hardware (SpacemiT K1)
with Zbc extension shows 65% and 125% performance improvement respectively
on crc32_test() and crc32c_test().
Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Link: https://lore.kernel.org/r/20240621054707.1847548-1-xiao.w.wang@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
|
|
If CONFIG_ZSWAP is set to N, it means zswap cannot be enabled.
zswap_never_enabled() should return true.
The only effect of this issue is that with Barry's latest large folio
swapin patches for zram ("mm: support mTHP swap-in for zRAM-like
swapfile"), we will always fallback to order-0 swapin, even mistakenly
when !CONFIG_ZSWAP.
Basically this bug makes Barry's in progress patches not work at all.
The API was created to inform the mm core that zswap has never been
enabled, allowing the mm core to perform mTHP swap-in. This is a
transitional solution until zswap supports mTHP. If zswap has been
enabled, performing mTHP swap-in will result in corrupted data. You
may find the answer in the mTHP swap-in series:
https://lore.kernel.org/linux-mm/CAJD7tkZ4FQr6HZpduOdvmqgg_-whuZYE-Bz5O2t6yzw6Yg+v1A@mail.gmail.com/
Link: https://lkml.kernel.org/r/20240629232231.42394-1-21cnbao@gmail.com
Fixes: 0300e17d67c3 ("mm: zswap: add zswap_never_enabled()")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Chris Li <chrisl@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|