There are no users of HAVE_ARCH_NODEDATA_EXTENSION left, so
arch_alloc_nodedata() and arch_refresh_nodedata() are not needed anymore.
Replace the call to arch_alloc_nodedata() in free_area_init() with a new
helper alloc_offline_node_data(), remove arch_refresh_nodedata() and
clean up the associated ifdefery in include/linux/memory_hotplug.h.
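A minimal userspace sketch of what a helper like alloc_offline_node_data() is expected to do, assuming it only allocates zeroed node data for a node without memory and stores it in node_data[]. calloc() stands in for the zeroing boot-time allocator, abort() for panic(), and MAX_NUMNODES and struct pglist_data are simplified stand-ins, not the kernel definitions.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_NUMNODES 4 /* simplified stand-in for the kernel constant */

/* Simplified stand-in for the kernel's per-node descriptor. */
struct pglist_data {
	int node_id;
};

struct pglist_data *node_data[MAX_NUMNODES];

/*
 * Allocate node data for an offline (memoryless) node so that generic code
 * can still dereference node_data[nid]. The memory is zeroed; filling in the
 * fields is left to later generic initialization.
 */
static void alloc_offline_node_data(int nid)
{
	struct pglist_data *pgdat = calloc(1, sizeof(*pgdat));

	if (!pgdat) {
		fprintf(stderr, "Cannot allocate %zuB for node %d.\n",
			sizeof(*pgdat), nid);
		abort(); /* the kernel would panic() here */
	}
	node_data[nid] = pgdat;
}
```

Because the helper is the only remaining caller-side hook, no architecture needs its own arch_alloc_nodedata() variant.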
Link: https://lkml.kernel.org/r/20240807064110.1003856-9-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Every architecture that supports NUMA defines node_data in the same way:
struct pglist_data *node_data[MAX_NUMNODES];
There is no reason to keep multiple copies of this definition and its
forward declarations, especially when such a forward declaration is the
only thing in include/asm/mmzone.h for many architectures.
Add the definition and declaration of node_data to generic code and drop
the architecture-specific versions.
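A sketch of the consolidated pattern, collapsed into one translation unit: a single declaration plus accessor (normally in a shared header) and a single definition (normally in common code). MAX_NUMNODES and the struct body are simplified stand-ins; the NODE_DATA() accessor mirrors how the array is typically indexed, and count_initialized_nodes() is a hypothetical user.

```c
#include <stddef.h>

#define MAX_NUMNODES 4 /* stand-in; the kernel derives it from CONFIG_NODES_SHIFT */

struct pglist_data {
	int node_id; /* simplified: the real struct has many more fields */
};

/* Shared header part: one declaration and accessor for every architecture. */
extern struct pglist_data *node_data[MAX_NUMNODES];
#define NODE_DATA(nid) (node_data[(nid)])

/* Common code part: the single definition replacing the per-arch copies. */
struct pglist_data *node_data[MAX_NUMNODES];

/* Hypothetical user: count nodes whose descriptor has been set up. */
static int count_initialized_nodes(void)
{
	int nid, n = 0;

	for (nid = 0; nid < MAX_NUMNODES; nid++)
		if (NODE_DATA(nid))
			n++;
	return n;
}
```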
Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU]
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Rob Herring (Arm) <robh@kernel.org>
Cc: Samuel Holland <samuel.holland@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Link all full clusters into one full list, and reclaim from it when the
allocator has run out of all usable clusters.
There are many reasons a folio can end up in the swap cache while
having no swap count reference, so the best way to search for such slots
is still by iterating the swap clusters.
With the list kept as an LRU, iterating from the oldest cluster and
rotating the clusters is a workable and clean way to free up clusters that
are potentially not in use.
On any allocation failure, try to reclaim and rotate only one cluster.
This is adaptive for high order allocations, which can tolerate fallback.
It avoids latency and gives every cluster on the full list a fair chance
to be reclaimed, which relieves pressure on the fallback order 0
allocation or on subsequent high order allocations.
If the swap device is getting very full, reclaim more aggressively to
ensure no OOM will happen. This ensures an order 0 heavy workload won't go
OOM, as order 0 allocations won't fail while any cluster still has space.
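A simplified sketch of the rotation policy described above, assuming a hypothetical model in which each cluster tracks how many of its used slots are held only by the swap cache. A fixed-size ring of cluster indices stands in for the kernel's full-cluster list, and reclaim_one_full() and reclaim_full() are illustrative names, not kernel functions.

```c
#include <stdbool.h>

#define NR_CLUSTERS 4
#define CLUSTER_SLOTS 8

/* `used` slots in total, `cache_only` of them reclaimable (swap cache only). */
struct cluster {
	int used;
	int cache_only;
};

/* LRU ring of cluster indices: the head is the oldest full cluster. */
struct full_list {
	int idx[NR_CLUSTERS];
	int head, len;
};

static int full_pop(struct full_list *l)
{
	int c = l->idx[l->head];

	l->head = (l->head + 1) % NR_CLUSTERS;
	l->len--;
	return c;
}

static void full_push_tail(struct full_list *l, int c)
{
	l->idx[(l->head + l->len) % NR_CLUSTERS] = c;
	l->len++;
}

/* Reclaim the oldest full cluster; rotate it to the tail if still full. */
static bool reclaim_one_full(struct cluster *cl, struct full_list *full)
{
	int c;

	if (full->len == 0)
		return false;
	c = full_pop(full);
	cl[c].used -= cl[c].cache_only;
	cl[c].cache_only = 0;
	if (cl[c].used < CLUSTER_SLOTS)
		return true;            /* has space again: leaves the full list */
	full_push_tail(full, c);        /* nothing freed: rotate */
	return false;
}

/* One cluster per failure normally; the whole list when nearly full. */
static int reclaim_full(struct cluster *cl, struct full_list *full, bool nearly_full)
{
	int budget = nearly_full ? full->len : 1;
	int freed = 0;

	while (budget-- > 0)
		if (reclaim_one_full(cl, full))
			freed++;
	return freed;
}
```

Rotating unproductive clusters to the tail is what keeps the per-failure cost bounded while still visiting every cluster eventually.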
[ryncsn@gmail.com: fix discard of full cluster]
Link: https://lkml.kernel.org/r/CAMgjq7CWwK75_2Zi5P40K08pk9iqOcuWKL6khu=x4Yg_nXaQag@mail.gmail.com
Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-9-cb9c148b9297@kernel.org
Signed-off-by: Kairui Song <kasong@tencent.com>
Reported-by: Barry Song <21cnbao@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kairui Song <ryncsn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This commit implements reclaim during scan for the cluster allocator.
Cluster scanning was unable to reuse SWAP_HAS_CACHE slots, which could
result in a low allocation success rate or an early OOM.
So, to ensure the maximum allocation success rate, integrate reclaiming
with scanning. If a range of suitable swap slots is found but is
fragmented due to HAS_CACHE, just try to reclaim those slots.
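A simplified sketch of reclaim-during-scan over a plain swap_map-like byte array, assuming 0 means free, a value of exactly SWAP_HAS_CACHE means the slot is held only by the swap cache, and anything else is in use. try_reclaim() is a hypothetical hook that always succeeds here; in the kernel, dropping a folio from the swap cache can fail.

```c
#include <stdbool.h>

#define SWAP_HAS_CACHE 0x40 /* cache flag value used in swap_map entries */

static bool slot_usable(unsigned char v)
{
	return v == 0 || v == SWAP_HAS_CACHE;
}

/* Hypothetical reclaim hook: always succeeds in this model. */
static bool try_reclaim(unsigned char *map, int i)
{
	map[i] = 0;
	return true;
}

/* Find an aligned run of `nr` slots, reclaiming cache-only slots in it. */
static int scan_with_reclaim(unsigned char *map, int n, int nr)
{
	for (int start = 0; start + nr <= n; start += nr) {
		bool ok = true;

		for (int i = start; i < start + nr && ok; i++)
			ok = slot_usable(map[i]);
		for (int i = start; i < start + nr && ok; i++)
			if (map[i] == SWAP_HAS_CACHE)
				ok = try_reclaim(map, i);
		if (!ok)
			continue;
		for (int i = start; i < start + nr; i++)
			map[i] = 1; /* allocated: one swap count reference */
		return start;
	}
	return -1;
}
```

Without the reclaim step, the run starting at slot 4 in the test below would be rejected even though no swap count references it.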
Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-8-cb9c148b9297@kernel.org
Signed-off-by: Kairui Song <kasong@tencent.com>
Reported-by: Barry Song <21cnbao@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Now that the swap cluster allocator arranges the clusters in LRU style,
the "cold" clusters staying at the head of the nonfull lists are the ones
that were used for allocation a long time ago and are still partially
occupied. So if the allocator can't find enough contiguous slots to
satisfy a high order allocation, it's unlikely that slots will be freed
on them to satisfy the allocation, at least in a short period.
As a result, nonfull cluster scanning wastes time repeatedly scanning
the unusable head of the list.
Also, multiple CPUs could contend on the same head cluster of the nonfull
list. Unlike free clusters, which are removed from the list when any CPU
starts using them, a nonfull cluster stays at the head.
So introduce a new list, the frag list: all scanned nonfull clusters will
be moved to this list, both to avoid repeated scanning and to avoid
contention.
The frag list is still used as a fallback for allocations, so if one CPU
fails to allocate slots of one order, it can still steal other CPUs'
clusters. And order 0 will favor the fragmented clusters to better protect
nonfull clusters.
If any slots in a cluster on a frag list are freed, move the cluster back
to the nonfull list, indicating it is worth another scan. Compared to
scanning upon freeing a slot, this keeps the scanning lazy and saves some
CPU time if there are still other clusters to use.
It may seem unnecessary to keep the fragmented clusters on a list at all
if they can't be used for a specific order allocation, but this will start
to make sense once reclaim during scanning is ready.
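A simplified sketch of the frag-list bookkeeping, assuming each cluster records which per-order list holds it instead of being linked into real lists. Every scanned cluster moves to the frag list, and freeing any slot moves it back to nonfull for one more lazy scan. The names are illustrative, not kernel code.

```c
#include <stdbool.h>

#define CLUSTER_SLOTS 16

enum cl_list { LIST_NONFULL, LIST_FRAG };

struct cluster {
	unsigned char used[CLUSTER_SLOTS]; /* 1 = slot in use */
	enum cl_list list;                 /* which per-order list holds it */
};

/* Scan for `nr` aligned free slots; a scanned cluster leaves the nonfull head. */
static int alloc_from_nonfull(struct cluster *c, int nr)
{
	c->list = LIST_FRAG;
	for (int s = 0; s + nr <= CLUSTER_SLOTS; s += nr) {
		int i;

		for (i = s; i < s + nr && !c->used[i]; i++)
			;
		if (i == s + nr) {
			for (i = s; i < s + nr; i++)
				c->used[i] = 1;
			return s;
		}
	}
	return -1;
}

/* Freeing a slot makes a fragmented cluster worth another scan. */
static void free_slot(struct cluster *c, int slot)
{
	c->used[slot] = 0;
	if (c->list == LIST_FRAG)
		c->list = LIST_NONFULL;
}
```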
Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-7-cb9c148b9297@kernel.org
Signed-off-by: Kairui Song <kasong@tencent.com>
Reported-by: Barry Song <21cnbao@gmail.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Track the nonfull clusters as well as the empty clusters on lists. Each
order has one nonfull cluster list.
The cluster will remember which order it was used for during new cluster
allocation.
When the cluster has a free entry, add it to the nonfull[order] list.
When the free cluster list is empty, also allocate from the nonfull list
of that order.
This improves the mTHP swap allocation success rate.
There are limitations if the distribution of numbers of different orders
of mTHP changes a lot, e.g. there are a lot of nonfull clusters assigned
to order A, while later there are a lot of order B allocations and very
few order A allocations. Currently, a cluster used by order A will not be
reused by order B unless the cluster is 100% empty.
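A minimal sketch of the per-order selection, assuming simple stacks of cluster ids in place of the kernel's lists. A fresh free cluster is preferred and remembers the order it was claimed for; only when no free cluster is left does the allocator reuse a nonfull cluster of the same order. The struct and function names are hypothetical.

```c
#include <stdbool.h>

#define NR_ORDERS 4
#define MAX_CL 8

/* Hypothetical bookkeeping: stacks of cluster ids stand in for lists. */
struct lists {
	int free[MAX_CL], nr_free;
	int nonfull[NR_ORDERS][MAX_CL], nr_nonfull[NR_ORDERS];
	int order_of[MAX_CL]; /* order a cluster was claimed for */
};

static int pick_cluster(struct lists *l, int order)
{
	int c;

	if (l->nr_free > 0) {
		c = l->free[--l->nr_free];
		l->order_of[c] = order;
		return c;
	}
	if (l->nr_nonfull[order] > 0)
		return l->nonfull[order][--l->nr_nonfull[order]];
	return -1; /* clusters of other orders are not borrowed here */
}

/* A cluster that still has free entries goes back on its order's list. */
static void put_nonfull(struct lists *l, int c)
{
	int o = l->order_of[c];

	l->nonfull[o][l->nr_nonfull[o]++] = c;
}
```

The `-1` path is where the cross-order limitation shows up: an order 3 request fails even if an order 2 cluster still has room.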
Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-2-cb9c148b9297@kernel.org
Signed-off-by: Chris Li <chrisl@kernel.org>
Reported-by: Barry Song <21cnbao@gmail.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: swap: mTHP swap allocator base on swap cluster order",
v5.
This is the short-term solution, "swap cluster order", listed on slide 8
of my "Swap Abstraction" discussion at the recent LSF/MM conference.
When commit 845982eb264bc ("mm: swap: allow storage of all mTHP orders")
was introduced, it only allocated mTHP swap entries from the new empty
cluster list. This has a fragmentation issue, reported by Barry.
https://lore.kernel.org/all/CAGsJ_4zAcJkuW016Cfi6wicRr8N9X+GJJhgMQdSMp+Ah+NSgNQ@mail.gmail.com/
The reason is that all the empty clusters have been exhausted while there
are plenty of free swap entries in clusters that are not 100% free.
Remember the swap allocation order in the cluster. Keep track of the
per-order nonfull cluster lists for later allocation.
This series gives the swap SSD allocation a new code path, separate from
the HDD allocation. The new allocator uses only the cluster lists and no
longer does a global, lockless scan of swap_map[].
This streamlines the swap allocation for SSD. The code matches the
execution flow much better.
User impact: for users that allocate and free mixed-order mTHP swap
entries, it greatly improves the success rate of the mTHP swap allocation
after the initial phase.
It also performs faster when the swapfile is close to full, because the
allocator can get a nonfull cluster from a list rather than scanning a
lot of swap_map entries.
With Barry's mTHP test program v2:
Without:
$ ./thp_swap_allocator_test -a
Iteration 1: swpout inc: 32, swpout fallback inc: 192, Fallback percentage: 85.71%
Iteration 2: swpout inc: 0, swpout fallback inc: 231, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 227, Fallback percentage: 100.00%
...
Iteration 98: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 215, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
$ ./thp_swap_allocator_test -a -s
Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
..
Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%
$ ./thp_swap_allocator_test -s
Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
..
Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%
$ ./thp_swap_allocator_test
Iteration 1: swpout inc: 0, swpout fallback inc: 224, Fallback percentage: 100.00%
Iteration 2: swpout inc: 0, swpout fallback inc: 218, Fallback percentage: 100.00%
Iteration 3: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
..
Iteration 98: swpout inc: 0, swpout fallback inc: 228, Fallback percentage: 100.00%
Iteration 99: swpout inc: 0, swpout fallback inc: 230, Fallback percentage: 100.00%
Iteration 100: swpout inc: 0, swpout fallback inc: 229, Fallback percentage: 100.00%
With: # with all 0.00% results filtered out
$ ./thp_swap_allocator_test -a | grep -v "0.00%"
$ # all results are 0.00%
$ ./thp_swap_allocator_test -a -s | grep -v "0.00%"
Iteration 14: swpout inc: 223, swpout fallback inc: 3, Fallback percentage: 1.33%
Iteration 19: swpout inc: 219, swpout fallback inc: 7, Fallback percentage: 3.10%
Iteration 28: swpout inc: 225, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 29: swpout inc: 227, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 34: swpout inc: 220, swpout fallback inc: 8, Fallback percentage: 3.51%
Iteration 35: swpout inc: 222, swpout fallback inc: 11, Fallback percentage: 4.72%
Iteration 38: swpout inc: 217, swpout fallback inc: 4, Fallback percentage: 1.81%
Iteration 40: swpout inc: 222, swpout fallback inc: 6, Fallback percentage: 2.63%
Iteration 42: swpout inc: 221, swpout fallback inc: 2, Fallback percentage: 0.90%
Iteration 43: swpout inc: 215, swpout fallback inc: 7, Fallback percentage: 3.15%
Iteration 47: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 49: swpout inc: 217, swpout fallback inc: 1, Fallback percentage: 0.46%
Iteration 52: swpout inc: 221, swpout fallback inc: 8, Fallback percentage: 3.49%
Iteration 56: swpout inc: 224, swpout fallback inc: 4, Fallback percentage: 1.75%
Iteration 58: swpout inc: 214, swpout fallback inc: 5, Fallback percentage: 2.28%
Iteration 62: swpout inc: 220, swpout fallback inc: 3, Fallback percentage: 1.35%
Iteration 64: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 67: swpout inc: 221, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 75: swpout inc: 220, swpout fallback inc: 9, Fallback percentage: 3.93%
Iteration 82: swpout inc: 227, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 86: swpout inc: 211, swpout fallback inc: 12, Fallback percentage: 5.38%
Iteration 89: swpout inc: 226, swpout fallback inc: 2, Fallback percentage: 0.88%
Iteration 93: swpout inc: 220, swpout fallback inc: 1, Fallback percentage: 0.45%
Iteration 94: swpout inc: 224, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 96: swpout inc: 221, swpout fallback inc: 6, Fallback percentage: 2.64%
Iteration 98: swpout inc: 227, swpout fallback inc: 1, Fallback percentage: 0.44%
Iteration 99: swpout inc: 227, swpout fallback inc: 3, Fallback percentage: 1.30%
$ ./thp_swap_allocator_test
Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 131, swpout fallback inc: 101, Fallback percentage: 43.53%
Iteration 3: swpout inc: 71, swpout fallback inc: 155, Fallback percentage: 68.58%
Iteration 4: swpout inc: 55, swpout fallback inc: 168, Fallback percentage: 75.34%
Iteration 5: swpout inc: 35, swpout fallback inc: 191, Fallback percentage: 84.51%
Iteration 6: swpout inc: 25, swpout fallback inc: 199, Fallback percentage: 88.84%
Iteration 7: swpout inc: 23, swpout fallback inc: 205, Fallback percentage: 89.91%
Iteration 8: swpout inc: 9, swpout fallback inc: 219, Fallback percentage: 96.05%
Iteration 9: swpout inc: 13, swpout fallback inc: 213, Fallback percentage: 94.25%
Iteration 10: swpout inc: 12, swpout fallback inc: 216, Fallback percentage: 94.74%
Iteration 11: swpout inc: 16, swpout fallback inc: 213, Fallback percentage: 93.01%
Iteration 12: swpout inc: 10, swpout fallback inc: 210, Fallback percentage: 95.45%
Iteration 13: swpout inc: 16, swpout fallback inc: 212, Fallback percentage: 92.98%
Iteration 14: swpout inc: 12, swpout fallback inc: 212, Fallback percentage: 94.64%
Iteration 15: swpout inc: 15, swpout fallback inc: 211, Fallback percentage: 93.36%
Iteration 16: swpout inc: 15, swpout fallback inc: 200, Fallback percentage: 93.02%
Iteration 17: swpout inc: 9, swpout fallback inc: 220, Fallback percentage: 96.07%
$ ./thp_swap_allocator_test -s
Iteration 1: swpout inc: 233, swpout fallback inc: 0, Fallback percentage: 0.00%
Iteration 2: swpout inc: 97, swpout fallback inc: 135, Fallback percentage: 58.19%
Iteration 3: swpout inc: 42, swpout fallback inc: 192, Fallback percentage: 82.05%
Iteration 4: swpout inc: 19, swpout fallback inc: 214, Fallback percentage: 91.85%
Iteration 5: swpout inc: 12, swpout fallback inc: 213, Fallback percentage: 94.67%
Iteration 6: swpout inc: 11, swpout fallback inc: 217, Fallback percentage: 95.18%
Iteration 7: swpout inc: 9, swpout fallback inc: 214, Fallback percentage: 95.96%
Iteration 8: swpout inc: 8, swpout fallback inc: 213, Fallback percentage: 96.38%
Iteration 9: swpout inc: 2, swpout fallback inc: 223, Fallback percentage: 99.11%
Iteration 10: swpout inc: 2, swpout fallback inc: 228, Fallback percentage: 99.13%
Iteration 11: swpout inc: 4, swpout fallback inc: 214, Fallback percentage: 98.17%
Iteration 12: swpout inc: 5, swpout fallback inc: 226, Fallback percentage: 97.84%
Iteration 13: swpout inc: 3, swpout fallback inc: 212, Fallback percentage: 98.60%
Iteration 14: swpout inc: 0, swpout fallback inc: 222, Fallback percentage: 100.00%
Iteration 15: swpout inc: 3, swpout fallback inc: 222, Fallback percentage: 98.67%
Iteration 16: swpout inc: 4, swpout fallback inc: 223, Fallback percentage: 98.24%
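In these logs the "Fallback percentage" column appears to be fallback / (swpout + fallback); for example, iteration 1 without the series gives 192 / (32 + 192) = 85.71%. A small helper that reproduces the column under that assumption (fallback_pct() is an illustrative name, not part of the test program):

```c
/* Share of mTHP swap-out attempts that fell back, in percent. */
static double fallback_pct(unsigned long swpout, unsigned long fallback)
{
	unsigned long total = swpout + fallback;

	return total ? 100.0 * (double)fallback / (double)total : 0.0;
}
```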
=========
Kernel compile under tmpfs with cgroup memory.max = 470M.
12 cores, 24 hyperthreads, 32 jobs, 10 runs in each group.
SSD swap, 10-run average, 20G swap partition:
With:
user 2929.064
system 1479.381 : 1376.89 1398.22 1444.64 1477.39 1479.04 1497.27
1504.47 1531.4 1532.92 1551.57
real 1441.324
Without:
user 2910.872
system 1482.732 : 1440.01 1451.4 1462.01 1467.47 1467.51 1469.3
1470.19 1496.32 1544.1 1559.01
real 1580.822
Two zram swap devices: zram0 3.0G, zram1 20G.
The idea is to force zram0 to be almost full and then overflow to zram1:
With:
user 4320.301
system 4272.403 : 4236.24 4262.81 4264.75 4269.13 4269.44 4273.06
4279.85 4285.98 4289.64 4293.13
real 431.759
Without:
user 4301.393
system 4387.672 : 4374.47 4378.3 4380.95 4382.84 4383.06 4388.05
4389.76 4397.16 4398.23 4403.9
real 433.979
------ more test results from Kairui ----------
Test building the Linux kernel using a 4G ZRAM with a 1G memory.max limit, on top of shmem:
System info: 32 Core AMD Zen2, 64G total memory.
Test 3 times using only 4K pages:
=================================
With:
-----
1838.74user 2411.21system 2:37.86elapsed 2692%CPU (0avgtext+0avgdata 847060maxresident)k
1839.86user 2465.77system 2:39.35elapsed 2701%CPU (0avgtext+0avgdata 847060maxresident)k
1840.26user 2454.68system 2:39.43elapsed 2693%CPU (0avgtext+0avgdata 847060maxresident)k
Summary (~4.6% improvement of system time):
User: 1839.62
System: 2443.89: 2465.77 2454.68 2411.21
Real: 158.88
Without:
--------
1837.99user 2575.95system 2:43.09elapsed 2706%CPU (0avgtext+0avgdata 846520maxresident)k
1838.32user 2555.15system 2:42.52elapsed 2709%CPU (0avgtext+0avgdata 846520maxresident)k
1843.02user 2561.55system 2:43.35elapsed 2702%CPU (0avgtext+0avgdata 846520maxresident)k
Summary:
User: 1839.78
System: 2564.22: 2575.95 2555.15 2561.55
Real: 162.99
Test 5 times with all mTHP sizes enabled:
==========================================
With:
-----
1796.44user 2937.33system 2:59.09elapsed 2643%CPU (0avgtext+0avgdata 846936maxresident)k
1802.55user 3002.32system 2:54.68elapsed 2750%CPU (0avgtext+0avgdata 847072maxresident)k
1806.59user 2986.53system 2:55.17elapsed 2736%CPU (0avgtext+0avgdata 847092maxresident)k
1803.27user 2982.40system 2:54.49elapsed 2742%CPU (0avgtext+0avgdata 846796maxresident)k
1807.43user 3036.08system 2:56.06elapsed 2751%CPU (0avgtext+0avgdata 846488maxresident)k
Summary (~8.4% improvement of system time):
User: 1803.25
System: 2988.93: 2937.33 3002.32 2986.53 2982.40 3036.08
Real: 175.90
mTHP swapout status:
/sys/kernel/mm/transparent_hugepage/hugepages-32kB/stats/swpout:347721
/sys/kernel/mm/transparent_hugepage/hugepages-32kB/stats/swpout_fallback:3110
/sys/kernel/mm/transparent_hugepage/hugepages-512kB/stats/swpout:3365
/sys/kernel/mm/transparent_hugepage/hugepages-512kB/stats/swpout_fallback:8269
/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/swpout:24
/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/stats/swpout_fallback:3341
/sys/kernel/mm/transparent_hugepage/hugepages-1024kB/stats/swpout:145
/sys/kernel/mm/transparent_hugepage/hugepages-1024kB/stats/swpout_fallback:5038
/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout:322737
/sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout_fallback:36808
/sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/swpout:380455
/sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/swpout_fallback:1010
/sys/kernel/mm/transparent_hugepage/hugepages-256kB/stats/swpout:24973
/sys/kernel/mm/transparent_hugepage/hugepages-256kB/stats/swpout_fallback:13223
/sys/kernel/mm/transparent_hugepage/hugepages-128kB/stats/swpout:197348
/sys/kernel/mm/transparent_hugepage/hugepages-128kB/stats/swpout_fallback:80541
Without:
--------
1794.41user 3151.29system 3:05.97elapsed 2659%CPU (0avgtext+0avgdata 846704maxresident)k
1810.27user 3304.48system 3:05.38elapsed 2759%CPU (0avgtext+0avgdata 846636maxresident)k
1809.84user 3254.85system 3:03.83elapsed 2755%CPU (0avgtext+0avgdata 846952maxresident)k
1813.54user 3259.56system 3:04.28elapsed 2752%CPU (0avgtext+0avgdata 846848maxresident)k
1829.97user 3338.40system 3:07.32elapsed 2759%CPU (0avgtext+0avgdata 847024maxresident)k
Summary:
User: 1811.61
System: 3261.72 : 3151.29 3304.48 3254.85 3259.56 3338.40
Real: 185.356
mTHP swapout status:
hugepages-32kB/stats/swpout:35630
hugepages-32kB/stats/swpout_fallback:1809908
hugepages-512kB/stats/swpout:523
hugepages-512kB/stats/swpout_fallback:55235
hugepages-2048kB/stats/swpout:53
hugepages-2048kB/stats/swpout_fallback:17264
hugepages-1024kB/stats/swpout:85
hugepages-1024kB/stats/swpout_fallback:24979
hugepages-64kB/stats/swpout:30117
hugepages-64kB/stats/swpout_fallback:1825399
hugepages-16kB/stats/swpout:42775
hugepages-16kB/stats/swpout_fallback:1951123
hugepages-256kB/stats/swpout:2326
hugepages-256kB/stats/swpout_fallback:170165
hugepages-128kB/stats/swpout:17925
hugepages-128kB/stats/swpout_fallback:1309757
This patch (of 9):
Previously, the swap cluster used a cluster index as a pointer to
construct a custom singly linked list type, "swap_cluster_list". The next
cluster pointer is shared with cluster->count, which prevents putting a
non-free cluster into a list.
Change the cluster to use the standard doubly linked list instead. This
allows tracking the nonfull clusters in the follow-up patch, so that it is
faster to get to a nonfull cluster of a given order.
Remove the cluster getters/setters for accessing the cluster struct members.
The list operations are protected by swap_info_struct->lock.
Change the cluster code to use "struct swap_cluster_info *" to reference a
cluster rather than an index. That is more consistent with the list
manipulation, avoids repeatedly adding the index to cluster_info, and makes
the code easier to understand.
Remove the "cluster next pointer is NULL" flag; the doubly linked list
handles the empty list well.
The "swap_cluster_info" struct is two pointers bigger, but because 512
swap entries share one swap_cluster_info struct, this has very little
impact on the average memory usage per swap entry. For a 1TB swapfile, the
swap cluster data structure increases from 8MB to 24MB.
Other than the list conversion, there is no real functional change in this
patch.
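A minimal userspace sketch of why the conversion helps: with an intrusive doubly linked list (a trimmed copy of the kernel's list_head pattern, using list_del_init() semantics), the count field only holds the usage count, and any cluster, full or not, can sit on a list. struct swap_cluster_info here is a simplified stand-in and cluster_pop() is a hypothetical helper.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

/* Unlink an entry and point it at itself, like the kernel's list_del_init(). */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	entry->next = entry->prev = entry;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Simplified cluster: count no longer doubles as a "next" index. */
struct swap_cluster_info {
	unsigned int count;    /* slots in use */
	struct list_head list; /* links into a free or nonfull list */
};

/* Take the first cluster off a list, or NULL if the list is empty. */
static struct swap_cluster_info *cluster_pop(struct list_head *head)
{
	struct swap_cluster_info *ci;

	if (list_empty(head))
		return NULL;
	ci = container_of(head->next, struct swap_cluster_info, list);
	list_del_init(&ci->list);
	return ci;
}
```

The empty-list case needs no special flag: a head that points at itself is empty.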
Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-0-cb9c148b9297@kernel.org
Link: https://lkml.kernel.org/r/20240730-swap-allocator-v5-1-cb9c148b9297@kernel.org
Signed-off-by: Chris Li <chrisl@kernel.org>
Reported-by: Barry Song <21cnbao@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
DMA ops are a helper for architectures and not for drivers to override
the DMA implementation.
Unfortunately, driver authors keep ignoring this. Make this clearer by
renaming the symbol to ARCH_HAS_DMA_OPS and making the two drivers that
override their dma_ops depend on it. These drivers should probably be
marked broken, but we can give them a bit of a grace period for that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> # for IPU6
Acked-by: Robin Murphy <robin.murphy@arm.com>
|
|
- Resolve trivial context conflicts from dl_server clearing being moved
around.
- Add @next to put_prev_task_scx() and @prev to pick_next_task_scx() to
match sched/core.
- Merge sched_class->switch_class() addition from sched_ext with
tip/sched/core changes in __pick_next_task().
- Make pick_next_task_scx() call put_prev_task_scx() to emulate the previous
behavior where sched_class->put_prev_task() was called before
sched_class->pick_next_task().
While this makes sched_ext build and function, the behavior is not in line
with other sched classes. The follow-up patches will address the
discrepancies and remove sched_class->switch_class().
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
If we trigger the bus rescan from sysfs, we'll try to lock the PCI rescan
mutex recursively and deadlock - the platform device will be populated and
probed on the same thread that handles the sysfs write.
Add a workqueue to the pwrctl code, on which we schedule the rescan for
controlled PCI devices. While at it, add a new interface for initializing
the pwrctl context, which now assigns the parent device address and
initializes the workqueue.
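A single-threaded model of the deadlock and the fix, assuming a boolean in place of the PCI rescan mutex (a recursive acquisition is recorded instead of hanging) and a tiny array in place of a workqueue. All names are hypothetical; the point is only that the probe path defers the rescan until the sysfs-triggered rescan has dropped the lock.

```c
#include <stdbool.h>

static bool rescan_locked;          /* stand-in for the rescan mutex */
static bool deadlocked;             /* set on a recursive lock attempt */
static bool device_to_probe = true; /* a device the rescan will populate */
static bool use_workqueue;          /* false: old behaviour */

typedef void (*work_fn)(void);
static work_fn pending[4];          /* hypothetical work queue */
static int nr_pending;

static void schedule_work_fn(work_fn fn)
{
	if (nr_pending < 4)
		pending[nr_pending++] = fn;
}

static void pwrctl_probe(void);

static void rescan_bus(void)
{
	if (rescan_locked) {
		deadlocked = true;
		return;
	}
	rescan_locked = true;
	if (device_to_probe) {      /* the rescan probes the new device */
		device_to_probe = false;
		pwrctl_probe();
	}
	rescan_locked = false;
}

/* Power-control probe wants the controlled PCI devices rescanned. */
static void pwrctl_probe(void)
{
	if (use_workqueue)
		schedule_work_fn(rescan_bus); /* runs later, lock not held */
	else
		rescan_bus();                 /* same thread, lock held */
}

/* Worker: runs queued items after the sysfs write has returned. */
static void run_pending_work(void)
{
	while (nr_pending > 0)
		pending[--nr_pending]();
}
```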
Link: https://lore.kernel.org/r/20240823093323.33450-3-brgl@bgdev.pl
Fixes: 4565d2652a37 ("PCI/pwrctl: Add PCI power control core code")
Reported-by: Konrad Dybcio <konradybcio@kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
|
|
Immutable branch between MFD, IIO and power-supply providing the
register definitions needed for AXP717 support in the axp20x,
axp20x_battery and axp20x_usb_power drivers.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
Changing the type of usb_types in struct power_supply_desc from an array
to a bitmap requires updating power-supply drivers living in different
subsystems, so it is handled via an immutable branch.
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
The usb_types array just holds a list of valid enum power_supply_usb_type
values, which map to 0 - 9. This can easily be represented as a bitmap.
This reduces the size of struct power_supply_desc and further reduces
the data section size, since drivers no longer need to store the array.
This also unifies how usb_types are handled with charge_behaviours,
which allows power_supply_show_usb_type() to be removed.
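A userspace sketch of the array-to-bitmap change, assuming a local stand-in enum with ten values (0 - 9) shaped like enum power_supply_usb_type. A driver's list of supported types becomes one unsigned int built with BIT(), and a membership test is a single AND. types_to_bitmap() and usb_type_supported() are hypothetical helpers.

```c
#include <stdbool.h>

#define BIT(n) (1U << (n))

/* Local stand-in for enum power_supply_usb_type: values 0..9. */
enum usb_type {
	USB_TYPE_UNKNOWN = 0, USB_TYPE_SDP, USB_TYPE_DCP, USB_TYPE_CDP,
	USB_TYPE_ACA, USB_TYPE_C, USB_TYPE_PD, USB_TYPE_PD_DRP,
	USB_TYPE_PD_PPS, USB_TYPE_APPLE_BRICK_ID,
};

/* Before: an array (plus a count) stored by every driver. */
static const enum usb_type old_style_types[] = {
	USB_TYPE_SDP, USB_TYPE_DCP, USB_TYPE_C,
};

/* After: the same information as a bitmap. */
static const unsigned int new_style_types =
	BIT(USB_TYPE_SDP) | BIT(USB_TYPE_DCP) | BIT(USB_TYPE_C);

static unsigned int types_to_bitmap(const enum usb_type *types, int n)
{
	unsigned int mask = 0;

	for (int i = 0; i < n; i++)
		mask |= BIT(types[i]);
	return mask;
}

static bool usb_type_supported(unsigned int usb_types, enum usb_type t)
{
	return usb_types & BIT(t);
}
```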
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240831142039.28830-7-hdegoede@redhat.com
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
|
|
e01c9797c0eb ("PCI: endpoint: Clean up hardware description for BARs")
added enum pci_epc_bar_type with incomplete kerneldoc. Add the missing
piece.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
There's a potential race when `cgroup_bpf_enabled(CGROUP_GETSOCKOPT)` is
false during the execution of `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN`, but
becomes true when `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is called.
This inconsistency can lead to `BPF_CGROUP_RUN_PROG_GETSOCKOPT` receiving
an "-EFAULT" from `__cgroup_bpf_run_filter_getsockopt(max_optlen=0)`.
The scenario is shown below:
  `process A`                               `process B`
  -----------                               ------------
  BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN
                                            enable CGROUP_GETSOCKOPT
  BPF_CGROUP_RUN_PROG_GETSOCKOPT (-EFAULT)
To resolve this, remove the `BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN` macro and
directly use `copy_from_sockptr` to ensure that `max_optlen` is always
set before `BPF_CGROUP_RUN_PROG_GETSOCKOPT` is invoked.
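A single-threaded model of the race, assuming a boolean stand-in for cgroup_bpf_enabled(CGROUP_GETSOCKOPT) and a hypothetical run_filter() that rejects a zero max_optlen. The `race` callback simulates another CPU enabling the hook between the two checks; the fixed variant reads optlen unconditionally, which is the effect of calling copy_from_sockptr() up front.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

static bool hook_enabled; /* stand-in for the static-key check */

/* Hypothetical filter run: a zero max_optlen is rejected. */
static int run_filter(int max_optlen)
{
	return max_optlen > 0 ? 0 : -EFAULT;
}

/* Old pattern: the flag is sampled twice. */
static int getsockopt_old(const int *user_optlen, void (*race)(void))
{
	int max_optlen = 0;

	if (hook_enabled)
		max_optlen = *user_optlen;
	if (race)
		race();
	return hook_enabled ? run_filter(max_optlen) : 0;
}

/* Fixed pattern: max_optlen is always set before the filter can run. */
static int getsockopt_fixed(const int *user_optlen, void (*race)(void))
{
	int max_optlen = *user_optlen;

	if (race)
		race();
	return hook_enabled ? run_filter(max_optlen) : 0;
}

static void enable_hook(void)
{
	hook_enabled = true;
}
```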
Fixes: 0d01da6afc54 ("bpf: implement getsockopt and setsockopt hooks")
Co-developed-by: Yanghui Li <yanghui.li@mediatek.com>
Signed-off-by: Yanghui Li <yanghui.li@mediatek.com>
Co-developed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Link: https://patch.msgid.link/20240830082518.23243-1-Tze-nan.Wu@mediatek.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2024-09-01
Simon Horman caught two typos in our headers. No functional change.
* tag 'ieee802154-for-net-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan:
ieee802154: Correct spelling in nl802154.h
mac802154: Correct spelling in mac802154.h
====================
Link: https://patch.msgid.link/20240901184213.2303047-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
There are no longer any users of the platform data struct. Remove
support for it from the driver.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Link: https://patch.msgid.link/20240814092629.9862-1-brgl@bgdev.pl
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
Store the new timeout and expiration in the transaction object, and use
them to update elements from the .commit path. Otherwise, discard the
update if the .abort path is exercised.
Use update_flags in the transaction to note whether the timeout,
expiration, or both need to be updated.
Annotate accesses to the timeout extension now that it can be updated
while lockless reads are possible.
Reject timeout updates on elements with no timeout extension.
The element transaction remains in the 96-byte kmalloc slab on x86_64
after this update.
This patch requires ("netfilter: nf_tables: use timestamp to check for
set element timeout") to make sure an element does not expire while the
transaction is ongoing.
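A simplified sketch of the deferred-update pattern, assuming hypothetical flag names (UPD_TIMEOUT, UPD_EXPIRATION), simplified structs, and -EOPNOTSUPP as the rejection code; the commit message does not say which error is returned. Prepare only records the new values, commit copies the flagged ones into the element, and abort leaves the element untouched. In the kernel the commit-time stores would be annotated with WRITE_ONCE().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define UPD_TIMEOUT    (1U << 0) /* hypothetical flag names */
#define UPD_EXPIRATION (1U << 1)

struct elem_timeout {            /* simplified timeout extension */
	uint64_t timeout;
	uint64_t expiration;
};

struct elem_update_trans {       /* simplified element transaction */
	struct elem_timeout *ext;
	uint64_t timeout;
	uint64_t expiration;
	unsigned int update_flags;
};

static int trans_prepare(struct elem_update_trans *t, struct elem_timeout *ext,
			 unsigned int flags, uint64_t timeout, uint64_t expiration)
{
	if (!ext)
		return -EOPNOTSUPP; /* no timeout extension: reject */
	t->ext = ext;
	t->update_flags = flags;
	t->timeout = timeout;
	t->expiration = expiration;
	return 0;                   /* element untouched until commit */
}

static void trans_commit(const struct elem_update_trans *t)
{
	if (t->update_flags & UPD_TIMEOUT)
		t->ext->timeout = t->timeout;
	if (t->update_flags & UPD_EXPIRATION)
		t->ext->expiration = t->expiration;
}

static void trans_abort(struct elem_update_trans *t)
{
	t->update_flags = 0;        /* discard: nothing was written */
}
```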
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch uses zero as the timeout marker for elements that never
expire, set when the element is created.
If userspace provides no timeout for an element, the default set
timeout applies. However, if no default set timeout is specified and the
timeout flag is set, the timeout extension is allocated and the timeout
is set to zero to allow for future updates.
Using zero as a never-timeout marker was suggested by Phil Sutter.
Note that, in older kernels, it is already possible to define elements
that never expire by declaring a set with the timeout flag set and no
global set timeout. In this case, new elements with no explicit timeout
do not allocate the timeout extension; hence, they never expire. This
approach makes it complicated to accommodate element timeout updates,
because element extensions do not support reallocation.
Therefore, allocate the timeout extension and use the new marker for
this case, but do not expose it to userspace, to retain backward
compatibility in the set listing.
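The zero-marker rules can be sketched as follows, assuming a simplified element type. The names `ELEM_TIMEOUT_NEVER`, `elem_initial_timeout()`, `elem_expired()` and `elem_dump_timeout()` are hypothetical. The sketch encodes only the rules stated above:

- with no explicit timeout, the element falls back to the set default, which may be zero;
- a stored zero never expires;
- a zero timeout is not reported in the listing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker: a stored timeout of 0 means "never expires". */
#define ELEM_TIMEOUT_NEVER 0

struct elem_timeout {
	uint64_t timeout;     /* 0 == never expires */
	uint64_t expiration;  /* absolute time; ignored when timeout == 0 */
};

/* Timeout stored at creation: explicit value, else the set default (maybe 0). */
static uint64_t elem_initial_timeout(bool has_user_timeout, uint64_t user_timeout,
				     uint64_t set_default_timeout)
{
	if (has_user_timeout)
		return user_timeout;
	return set_default_timeout;
}

static bool elem_expired(const struct elem_timeout *t, uint64_t now)
{
	if (t->timeout == ELEM_TIMEOUT_NEVER)
		return false;
	return now >= t->expiration;
}

/* Listing: hide the internal marker so userspace sees no timeout attribute. */
static bool elem_dump_timeout(const struct elem_timeout *t)
{
	return t->timeout != ELEM_TIMEOUT_NEVER;
}
```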
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Expiration and timeout are stored in separate set element extensions,
but they are tightly coupled. Consolidate them in a single extension to
simplify and prepare for set element updates.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Element expiration can be read and written locklessly: it can be written
by dynset and read from netlink dump. Add annotations.
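The sketch below is a rough userspace analogue of the kernel's READ_ONCE()/WRITE_ONCE() annotations. It uses C11 relaxed atomics for a field that one path writes and another reads without a lock. The struct and helper names are hypothetical, and relaxed atomics only approximate the semantics of the kernel macros.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified element whose expiration is shared by a writer and lockless readers. */
struct elem_expiry {
	_Atomic uint64_t expiration;
};

/* Writer side, e.g. a dynset-style refresh (WRITE_ONCE() analogue). */
static void elem_set_expiration(struct elem_expiry *e, uint64_t exp)
{
	atomic_store_explicit(&e->expiration, exp, memory_order_relaxed);
}

/* Lockless reader side, e.g. a dump (READ_ONCE() analogue). */
static uint64_t elem_get_expiration(struct elem_expiry *e)
{
	return atomic_load_explicit(&e->expiration, memory_order_relaxed);
}
```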
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Abide by the simple rule:
pick_next_task() := pick_task() + set_next_task(.first = true)
This allows us to trivially get rid of server_pick_next() and things
collapse nicely.
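The rule above composes two callbacks. This sketch uses a heavily reduced, hypothetical ops structure and a toy runqueue. It is not the kernel's `struct sched_class`; it only shows pick_next_task() built from pick_task() plus set_next_task(.first = true).

```c
#include <stdbool.h>
#include <stddef.h>

struct task {
	int id;
	bool running;
};

/* Hypothetical, heavily reduced scheduling-class callbacks. */
struct sched_ops {
	struct task *(*pick_task)(void *rq);
	void (*set_next_task)(void *rq, struct task *p, bool first);
};

/* pick_next_task() := pick_task() + set_next_task(.first = true) */
static struct task *pick_next_task(const struct sched_ops *ops, void *rq)
{
	struct task *p = ops->pick_task(rq);

	if (p)
		ops->set_next_task(rq, p, true);
	return p;
}

/* Toy runqueue holding at most one queued task, for demonstration. */
struct toy_rq {
	struct task *queued;
	struct task *curr;
	int first_calls;  /* how often set_next_task() ran with first == true */
};

static struct task *toy_pick_task(void *rq)
{
	return ((struct toy_rq *)rq)->queued;
}

static void toy_set_next_task(void *rq, struct task *p, bool first)
{
	struct toy_rq *r = rq;

	r->curr = p;
	p->running = true;
	if (first)
		r->first_calls++;
}

static const struct sched_ops toy_ops = {
	.pick_task = toy_pick_task,
	.set_next_task = toy_set_next_task,
};
```

With this split, a class only implements pick_task() and set_next_task(). The combined operation then falls out generically, so no separate helper per class is needed.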
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20240813224015.837303391@infradead.org
|
|
Stephen reported that there is a kernel build warning due to a missing
description of a parameter in mapping_align_index().
Add the missing index parameter in the comment description.
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20240827084206.106347-2-kernel@pankajraghav.com
Fixes: ab95d23bab22 ("filemap: allocate mapping_min_order folios in the page cache")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In order to switch fuse over to using iomap for buffered writes we need
to be able to have the struct file for the original write, in case we
have to read in the page to make it uptodate. Handle this by using the
existing private field in the iomap_iter, and add the argument to
iomap_file_buffered_write. This will allow us to pass the file in
through the iomap buffered write path, and is flexible enough for any other
file system's needs.
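The sketch below shows a filesystem-owned pointer threaded through an iterator. It uses hypothetical stand-ins (`struct iter`, `buffered_write()`, `fs_read_folio_for_write()`) rather than the real iomap types. The caller passes its file as an opaque `void *private`. The hook that must read in a page then recovers the file from the iterator.

```c
#include <stddef.h>

/* One-field stand-in for the caller's open file. */
struct file {
	const char *name;
};

/* Reduced, hypothetical iterator: only the fields this sketch needs. */
struct iter {
	long long pos;
	long long len;
	void *private;  /* opaque, filesystem-owned cookie */
};

/* Hypothetical filesystem hook that needs the original file to read a page in. */
static const char *fs_read_folio_for_write(const struct iter *it)
{
	const struct file *file = it->private;

	return file ? file->name : NULL;
}

/* Hypothetical buffered-write entry point with the extra private argument. */
static const char *buffered_write(long long pos, long long len, void *private)
{
	struct iter it = { .pos = pos, .len = len, .private = private };

	return fs_read_folio_for_write(&it);
}
```

The generic layer never interprets `private`, so other filesystems can pass whatever context they need.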
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/7f55c7c32275004ba00cddf862d970e6e633f750.1724755651.git.josef@toxicpanda.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Page cache now has the ability to have a minimum order when allocating
a folio which is a prerequisite to add support for block size > page
size.
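When the block size exceeds the page size, the minimum folio order is the smallest order whose folio covers one block. For example, 16 KiB blocks on 4 KiB pages need order 2. A hypothetical helper could compute it like this, assuming power-of-two sizes:

```c
/*
 * Hypothetical helper: smallest order such that a folio of
 * (page_size << order) bytes covers one filesystem block.
 * Assumes power-of-two sizes.
 */
static unsigned int min_folio_order_for(unsigned long block_size,
					unsigned long page_size)
{
	unsigned int order = 0;

	while ((page_size << order) < block_size)
		order++;
	return order;
}
```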
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20240827-xfs-fix-wformat-bs-gt-ps-v1-1-aec6717609e0@kernel.org # fix folded
Link: https://lore.kernel.org/r/20240822135018.1931258-11-kernel@pankajraghav.com
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Replace with already defined values for readability. While at it, let's
also change the mode-parameter from an int to bool, as the only used values
are 0 or 1.
Signed-off-by: Chanwoo Lee <cw9316.lee@samsung.com>
Link: https://lore.kernel.org/r/20240829024709.402285-1-cw9316.lee@samsung.com
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
|
|
Since commit d492cc2573a0 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type, so move the fsl_mc_bus_type variable to be a constant
structure as well, placing it into read-only memory which cannot be
modified at runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Acked-by: Christophe Leroy <christophe.leroy@csgroup.eu> # for
Link: https://lore.kernel.org/r/20240823062440.113628-1-kunwu.chan@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add code to allow asynchronous shutdown of devices, ensuring that each
device is shut down before its parents & suppliers.
Only devices with drivers that have async_shutdown_enable enabled will be
shut down asynchronously.
This can dramatically reduce system shutdown/reboot time on systems that
have multiple devices that take many seconds to shut down (like certain
NVMe drives). On one system tested, the shutdown time went from 11 minutes
without this patch to 55 seconds with the patch.
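The ordering constraint (children before parents) can be sketched with a per-device counter of outstanding children. A device may shut down once its count reaches zero, and the last child to finish releases its parent. The sketch has some deliberate limits:

- It runs sequentially, although the patch itself shuts devices down asynchronously.
- It models only parent links, not suppliers.
- It uses hypothetical types and assumes at most `MAX_DEVS` devices.

```c
#include <stddef.h>

#define MAX_DEVS 16

struct sdev {
	int id;                 /* index used in the log, for the demo */
	struct sdev *parent;
	int pending_children;   /* children not yet shut down */
	int shut_down;
};

struct shutdown_log {
	int order[MAX_DEVS];
	int n;
};

static void shutdown_one(struct sdev *d, struct shutdown_log *log)
{
	d->shut_down = 1;       /* the driver's shutdown callback would run here */
	log->order[log->n++] = d->id;

	/* The last child to finish releases its parent. */
	if (d->parent && --d->parent->pending_children == 0)
		shutdown_one(d->parent, log);
}

static void shutdown_all(struct sdev *devs, size_t n, struct shutdown_log *log)
{
	size_t i;

	log->n = 0;
	for (i = 0; i < n; i++) {
		devs[i].pending_children = 0;
		devs[i].shut_down = 0;
	}
	for (i = 0; i < n; i++)
		if (devs[i].parent)
			devs[i].parent->pending_children++;

	/* Leaves have nothing to wait for; in the real code they could run in parallel. */
	for (i = 0; i < n; i++)
		if (!devs[i].shut_down && devs[i].pending_children == 0)
			shutdown_one(&devs[i], log);
}
```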
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20240822202805.6379-4-stuart.w.hayes@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Since commit d492cc2573a0 ("driver core: device.h: make struct
bus_type a const *"), the driver core can properly handle constant
struct bus_type, so move the platform_bus_type variable to be a constant
structure as well, placing it into read-only memory which cannot be
modified at runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Link: https://lore.kernel.org/r/20240823075544.144426-1-kunwu.chan@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are several drivers/base APIs for finding a specific device, and
they currently use the following good type for the @match parameter:
int (*match)(struct device *dev, const void *data)
Since these operations do not modify the caller-provided @*data, this
type is worthy of a dedicated typedef:
typedef int (*device_match_t)(struct device *dev, const void *data)
Advantages of using device_match_t:
- Shorter API declarations and definitions
- Prevent further APIs from using a bad type for @match
So introduce device_match_t and apply it to the existing
(bus|class|driver|auxiliary)_find_device() APIs.
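Below is a self-contained sketch of the typedef in use. `struct device` here is a one-field stand-in. `find_device()` is a hypothetical array-based analogue of the (bus|class|driver|auxiliary)_find_device() APIs. The matcher receives `const void *data`, so it cannot modify the caller's data without a cast.

```c
#include <stddef.h>
#include <string.h>

struct device {
	const char *name;
};

/* The typedef described above. */
typedef int (*device_match_t)(struct device *dev, const void *data);

/* Hypothetical stand-in for a *_find_device() API over a plain array. */
static struct device *find_device(struct device *devs, size_t n,
				  const void *data, device_match_t match)
{
	for (size_t i = 0; i < n; i++)
		if (match(&devs[i], data))
			return &devs[i];
	return NULL;
}

/* Example matcher: compare the device name with a caller-provided string. */
static int match_name(struct device *dev, const void *data)
{
	return strcmp(dev->name, data) == 0;
}
```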
Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com>
Link: https://lore.kernel.org/r/20240813-dev_match_api-v3-1-6c6878a99b9f@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into soc/dt
Samsung DTS ARM64 changes for v6.12
1. Exynos7885: Correct amount of RAM on Samsung Galaxy A8.
2. ExynosAutov9: Add new DPUM clock controller and DPUM IOMMU (SysMMU).
3. ExynosAutov920: Add initial (incomplete) clock controllers: TOP and
PERIC0 controllers.
4. Google GS101: Add reboot and poweroff support.
5. Add binding headers with clock IDs for several devices, used by the
DTS.
* tag 'samsung-dt64-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
arm64: dts: exynosautov920: add initial CMU clock nodes in ExynosAuto v920
dt-bindings: clock: add ExynosAuto v920 SoC CMU bindings
arm64: dts: exynosautov9: Add dpum SysMMU
arm64: dts: exynosautov9: add dpum clock DT nodes
dt-bindings: clock: exynosautov9: add dpum clock
dt-bindings: clock: exynos7885: Add indices for USB clocks
dt-bindings: clock: exynos7885: Add CMU_TOP PLL MUX indices
dt-bindings: clock: exynos7885: Fix duplicated binding
dt-bindings: clock: exynos850: Add TMU clock
arm64: dts: exynos: gs101: add syscon-poweroff and syscon-reboot nodes
arm64: dts: exynos: exynos7885-jackpotlte: Correct RAM amount to 4GB
Link: https://lore.kernel.org/r/20240827121638.29707-2-krzysztof.kozlowski@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux into char-misc-next
Suzuki writes:
coresight: updates for Linux v6.12
CoreSight/hwtracing subsystem updates targeting Linux v6.12:
- Miscellaneous fixes and cleanups
- TraceID allocation per sink, allowing systems with > 110 cores to use
perf tracing.
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
* tag 'coresight-next-v6.12' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/coresight/linux:
coresight: Make trace ID map spinlock local to the map
coresight: Emit sink ID in the HW_ID packets
coresight: Remove pending trace ID release mechanism
coresight: Use per-sink trace ID maps for Perf sessions
coresight: Make CPU id map a property of a trace ID map
coresight: Expose map arguments in trace ID API
coresight: Move struct coresight_trace_id_map to common header
coresight: Clarify comments around the PID of the sink owner
coresight: Remove unused ETM Perf stubs
coresight: tmc: sg: Do not leak sg_table
Coresight: Set correct cs_mode for dummy source to fix disable issue
Coresight: Set correct cs_mode for TPDM to fix disable issue
coresight: cti: use device_* to iterate over device child nodes
|
|
Correct spelling in iw_handler.h.
As reported by codespell.
Also, while the "few shortcomings" line is being updated,
correct its grammar.
Signed-off-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20240903-wifi-spell-v2-1-bfcf7062face@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
NETIF_F_ALL_FCOE is used only in vlan_dev.c, 2 times. Now that it's only
2 bits, open-code it and remove the definition from netdev_features.h.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The ability to handle maximum-size FCoE frames of 2158 bytes can never be
changed, so it is more of an attribute than a toggleable feature.
Move it from netdev_features_t to "cold" priv flags (bitfield bool) and
free yet another feature bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
"Interface can't change network namespaces" is rather an attribute,
not a feature, and it can't be changed via Ethtool.
Make it a "cold" private flag instead of a netdev_feature and free
one more bit.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
NETIF_F_LLTX can't be changed via Ethtool and is not a feature but
rather an attribute, very similar to IFF_NO_QUEUE (and hot).
Free one netdev_features_t bit and make it a "hot" private flag.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Make dev->priv_flags `u32` again and define bits higher than 31 as
bitfield booleans, as per Jakub's suggestion. This simplifies code
that accesses these bits with no optimization loss (testb both
before/after) and avoids extending &netdev_priv_flags each time. It
also scales better: bits > 63 in the future would only add a new u64
to the structure with no complications, whereas extending ::priv_flags
would require converting it to a bitmap.
Note that I picked `unsigned long :1` to not lose any potential
optimizations compared to `bool :1` etc.
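The layout described above can be sketched with a simplified `struct netdev_sketch`. Low flags stay in a 32-bit word, and attributes moved out of netdev_features_t become one-bit `unsigned long` bitfields. The field names follow the LLTX, netns-local and FCoE MTU attributes mentioned in the neighbouring commits. The structure itself and `PRIV_DEMO_LOW_BIT` are hypothetical, and the sketch does not model the separate hot and cold cachelines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag that still lives in the 32-bit word. */
#define PRIV_DEMO_LOW_BIT (1u << 3)

struct netdev_sketch {
	uint32_t priv_flags;          /* bits 0..31 only */

	/* Attributes stored as independent one-bit fields. */
	unsigned long lltx:1;         /* "hot": consulted on the xmit path */
	unsigned long netns_local:1;  /* "cold": can't change netns */
	unsigned long fcoe_mtu:1;     /* "cold": handles 2158-byte FCoE frames */
};

static bool netdev_can_change_netns(const struct netdev_sketch *dev)
{
	return !dev->netns_local;
}
```

Adding another attribute means adding another `:1` field rather than widening `priv_flags`.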
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio into char-misc-testing
Jonathan writes:
IIO: 1st set of new device support, features and cleanup for 6.12
Includes a merge of spi-mos-config branch from spi.git that brings
support needed for the AD4000 driver.
Lots of new device support this time including 9 new drivers and substantial
changes to add new support to several more.
New device support
------------------
Given we have a lot of new support, I've subcategorized them:
Substantial changes, or new driver
**********************************
adi,ad4000
- New driver for this high speed ADC.
adi,ad4695
- New driver supporting AD4690, AD4696, AD4697 and AD4698 ADCs.
- A follow-up series added triggered buffer support.
adi,ad7380
- Add support for single-ended parts, AD7386, AD7387, AD7388 and -4 variants
(the driver previously only supported differential parts).
These variants have an additional front-end MUX, so only half the channels
can be sampled efficiently.
adi,ad9467
- Refactor and extend driver to support ad9643, ad9449 and ad9652 high speed
ADCs.
adi,adxl380
- New driver for this low power accelerometer.
adi,ltc2664
- New driver supporting LTC2664 and LTC2672 DACs.
microchip,pac1921
- New driver for this power/current monitor chip.
rohm,bh1745
- New driver for this RGBC colour sensor.
rohm,bu27034anuc
- The original bu27034 was canceled before mass production, so the
driver is modified to support the BU27034ANUC which had some significant
differences. DT compatible changed to avoid chance of old driver ever
binding to real hardware.
sciosense,ens210
- New driver for ens210, ens210a, ens211, ens212, ens213a, and ens215
temperature and humidity sensors (all register compatible up to some
conversion time differences)
sensirion,sdp500
- New driver for this differential pressure sensor.
tyhx,hx9023s
- New driver to support this capacitive proximity sensor.
Minor changes to support new devices
************************************
adi,adf4377
- Add support for the single output adf4378.
kionix,kxcjk-1013
- Add support for KX022-1020 accelerometer (binding and ID table only)
liteon,ltrf216a
- Add support for ltr-308. A few minor differences in feature set.
rockchip,saradc
- Add ID for rk3576-saradc
sensortek,stk3310
- Add ID for stk3013 proximity sensor which (despite documentation) has
an ambient light sensor and is compatible with existing parts.
Documentation updates
---------------------
Generalize ABI docs for shunt resistor attribute
Improve calibscale and calibbias related documentation, plus a couple of
follow-up patches to resolve the duplicate documentation that resulted.
New core features
-----------------
backend
- Add option for debugfs - useful for test pattern control
- Use this for both adi-axi-adc and adi-axi-dac
trigger suspend
- Add functions to allow triggers to be suspended. This avoids problems
when a device enters suspend to idle with a sysfs trigger. Use it for now
in the bmi323 only.
New driver features
-------------------
adi,ad7192
- Add option to be a clock provider (+ additional clock config options)
adi,ad7380
- Add documentation for this fairly new driver.
adi,ad9461
- Provide control of test modes and backend validation blocks used
to identify problems (via debugfs)
adi,ad9739
- Add backend debugfs and docs for what is provided via adi-axi-dac
avago,apds9960
- Add proximity and gesture calibration offset control
bosch,bmp280
- Triggered buffer support including adding raw+scale output for sysfs.
liteon,ltr390
- Add configuration of integration time and scale.
stm,dfsdm
- Convert this SD modulator driver to backend framework and add support
for channel scaling + modern channel bindings.
Treewide cleanup
----------------
iio_dev->masklength: Making it private.
- Provide access function to read the core compute channel mask length
and a macro to iterate over elements in the active_scan_mask.
- Enables marking masklength __private preventing drivers from
writing it without triggering a build warning whilst minimizing overhead
in what are typically hot paths.
- Convert all drivers and finally mark it private.
Merge conflicts resolved in drivers applied after this point.
Constify regmap_bus
- These are never modified, so mark them const.
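The masklength accessor and active-channel iteration from the treewide cleanup above can be sketched in userspace. All names here are hypothetical stand-ins, not the IIO core helpers. The sketch assumes masklength never exceeds the number of bits in `active_scan_mask`.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Reduced stand-in for struct iio_dev. */
struct iio_dev_sketch {
	unsigned long active_scan_mask[2];
	unsigned int masklength;  /* drivers should only read it via the accessor */
};

/* Read-only accessor, so drivers never write masklength directly. */
static unsigned int sketch_get_masklength(const struct iio_dev_sketch *d)
{
	return d->masklength;
}

static bool sketch_test_bit(unsigned int bit, const unsigned long *map)
{
	return (map[bit / SKETCH_BITS_PER_LONG] >> (bit % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* First active channel at or after @from, or masklength if none remain. */
static unsigned int sketch_next_active(const struct iio_dev_sketch *d, unsigned int from)
{
	while (from < sketch_get_masklength(d) &&
	       !sketch_test_bit(from, d->active_scan_mask))
		from++;
	return from;
}

/* Iterate over the channels set in active_scan_mask. */
#define sketch_for_each_active_channel(d, chan)                       \
	for ((chan) = sketch_next_active((d), 0);                     \
	     (chan) < sketch_get_masklength(d);                       \
	     (chan) = sketch_next_active((d), (chan) + 1))
```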
Core cleanup
------------
backend
- A few late breaking bits of feedback (unused variable, error messages)
dma-buffer
- Namespace exports.
core
- Drop unused assignment.
Driver cleanup
--------------
adi,ad4695
- Fix binding to reflect that common-mode-channel is a scalar.
adi,ad7280a
- Use __free(kfree) to simplify freeing of receive buffer.
adi,ad7606
- Various dt-binding cleanup and improvements.
- Fix oversampling related gpio handling.
- Make polarity of standby gpio match documentation.
- use guard() to simplify lock handling.
adi,ad7768
- Use device_for_each_child_node_scoped() instead of fwnode equivalent.
adi,ad7124
- Reduce SPI transfers by avoiding separate writes to different fields
in the same register.
- Start the ADC in idle mode.
adi,adis
- Drop ifdefs in favor of IS_ENABLED.
adi,admv8818
- Fix wrong ABI docs.
asahi-kasei,ak8975
- Drop a prefix-free compatible accidentally added recently.
aspeed,adc
- Use of_property_present() instead of of_find_property() to see if the
property is there or not.
atmel,at91
- Use __free(kfree) to simplify freeing of channel related array.
bosch,bma400
- Use __free(kfree) to simplify freeing a locally allocated string.
bosch,bmc150
- Add missing mount-matrix binding docs.
bosch,bme680
- Fix read/write to ensure multiple necessary sequential reads without
device configuration change.
- Drop unnecessary type casts and use more appropriate data types.
- Drop some left over ACPI code as ACPI support was removed due to invalid
IDs (and no known users).
- Sort headers consistently.
- Avoid unnecessary duplicate read and redundant read of gas config.
- Use bulk reads to get calibration data.
- Reorder allocation of IIO device to be prior to device init.
- Add remaining read/write buffers to the union used already for all others.
- Tidy up error checks for consistency of style, including dev_err_probe()
- Bring the device startup procedure in line with the vendor code.
- Reorder code so it is more obvious that mode forcing occurs where needed.
- Tidy up data locality in reading functions so no magic data is stored
in state structures just to get it across function calls.
- Make a local lookup table static to avoid placing it on the stack.
bosch,bmp280
- Fix BME280 regmap to not include registers it doesn't have.
- Wait a little longer after config to allow for maximum possible necessary
wait.
- Reorganize headers.
- Make conversion_time_max array static to avoid placing it on the stack.
maxim,max1363
- Use __free(kfree) to simplify freeing transmission buffer.
microchip,mcp3964
- Use devm_regulator_get_enable_read_voltage()
microchip,mcp3911
- Use devm_regulator_get_enable_read_voltage()
microchip,mcp4728
- Use devm_regulator_get_enable_read_voltage()
microchip,mcp4922
- Use devm_regulator_get_enable_read_voltage() and devm_* to allow
dropping of explicit remove() callback.
onnn,noa1305
- Various tidy up.
- Provide available scale values.
- Make integration time configurable.
- Fix up integration time look up (/2 error)
ti,dac7311
- Check if spi_setup() succeeded.
ti,tsc2046
- Use __free(kfree) to simplify freeing rx and tx buffers.
- Use devm_regulator_get_enable_read_voltage()
Various minor fixes not called out explicitly.
* tag 'iio-for-6.12a' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jic23/iio: (250 commits)
drivers:iio:Fix the NULL vs IS_ERR() bug for debugfs_create_dir()
iio: sgp40: retain documentation in driver
iio: ABI: remove duplicate in_resistance_calibbias
dt-bindings: iio: st,stm32-adc: add top-level constraints
iio: ABI: add missing calibbias attributes
iio: ABI: add missing calibscale attributes
iio: ABI: sort calibscale attributes
iio: ABI: document calibscale_available attributes
iio: light: ltr390: Calculate 'counts_per_uvi' dynamically
iio: light: ltr390: Add ALS channel and support for gain and resolution
doc: iio: ad4695: document buffered read
iio: adc: ad4695: implement triggered buffer
iio: proximity: hx9023s: Fix error code in hx9023s_property_get()
iio: light: noa1305: Fix up integration time look up
iio: humidity: Add support for ENS210
dt-bindings: iio: humidity: add ENS210 sensor family
iio: imu: adis16460: drop ifdef around CONFIG_DEBUG_FS
iio: imu: adis16400: drop ifdef around CONFIG_DEBUG_FS
iio: imu: adis16480: drop ifdef around CONFIG_DEBUG_FS
iio: imu: adis16475: drop ifdef around CONFIG_DEBUG_FS
...
|
|
- Add missing documentation of struct field and enum items.
- Add missing documentation of function parameter.
Flagged by ./scripts/kernel-doc -none.
No functional change intended.
Compile tested only.
Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Correct spelling in nf_tables.h.
As reported by codespell.
Signed-off-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Since commit a654de8fdc18 ("netfilter: nf_tables: fix chain dependency validation")
the validate() callback no longer needs the return pointer argument.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
We move the func_utils.h header to include/linux/usb to be
able to compile function drivers outside of the
drivers/usb/gadget/function directory.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Link: https://lore.kernel.org/r/20240116-ml-topic-u9p-v12-1-9a27de5160e0@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add 'tunnel_mode' enum to usb device structure to describe if a USB3
link is tunneled over USB4, or connected directly using native USB2/USB3
protocols.
Tunneled devices depend on the USB4 NHI host to maintain the tunnel.
Knowledge about tunneled devices is important to ensure the correct
suspend and resume order between USB4 hosts and tunneled devices,
i.e. to make sure the tunnel is up before the USB device using it resumes.
USB hosts such as xHCI may have vendor-specific ways to detect tunneled
connections. The 'tunnel_mode' field can be set by the USB3 host driver
during the hcd->driver->update_device(hcd, udev) callback.
tunnel_mode can be set to:
USB_LINK_UNKNOWN = 0
USB_LINK_NATIVE
USB_LINK_TUNNELED
USB_LINK_UNKNOWN is used in case the host is not capable of detecting
tunneled links.
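The sketch below shows one way the enum might be set and consulted. The enum values come from the text above. Everything else is an assumption for illustration:

- the enum tag `usb_link_tunnel_mode`;
- `struct usb_device_sketch`, `host_update_device()` and `resume_depends_on_usb4_host()`;
- the choice to treat USB_LINK_UNKNOWN as having no known dependency on the USB4 host.

```c
#include <stdbool.h>

enum usb_link_tunnel_mode {
	USB_LINK_UNKNOWN = 0,   /* host cannot tell */
	USB_LINK_NATIVE,        /* native USB2/USB3 link */
	USB_LINK_TUNNELED,      /* USB3 tunneled over USB4 */
};

struct usb_device_sketch {
	enum usb_link_tunnel_mode tunnel_mode;
};

/* Hypothetical host-driver hook, in the spirit of ->update_device(). */
static void host_update_device(struct usb_device_sketch *udev,
			       bool hw_says_tunneled, bool hw_can_detect)
{
	if (!hw_can_detect)
		udev->tunnel_mode = USB_LINK_UNKNOWN;
	else
		udev->tunnel_mode = hw_says_tunneled ? USB_LINK_TUNNELED
						     : USB_LINK_NATIVE;
}

/* Resume ordering: only known-tunneled devices wait for the USB4 host. */
static bool resume_depends_on_usb4_host(const struct usb_device_sketch *udev)
{
	return udev->tunnel_mode == USB_LINK_TUNNELED;
}
```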
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Link: https://lore.kernel.org/r/20240830152630.3943215-3-mathias.nyman@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
- A series from Hervé Codina that brings support for the newer version
of QMC (QUICC Multi-channel Controller) and TSA (Time Slots Assigner)
found on MPC 83xx micro-controllers.
- Misc changes for qbman freescale drivers for removing a redundant
warning and using iommu_paging_domain_alloc()
* tag 'soc_fsl-6.12-2' of https://github.com/chleroy/linux: (38 commits)
soc: fsl: qbman: Remove redundant warnings
soc: fsl: qbman: Use iommu_paging_domain_alloc()
MAINTAINERS: Add QE files related to the Freescale QMC controller
soc: fsl: cpm1: qmc: Handle QUICC Engine (QE) soft-qmc firmware
soc: fsl: cpm1: qmc: Add support for QUICC Engine (QE) implementation
soc: fsl: qe: Add missing PUSHSCHED command
soc: fsl: qe: Add resource-managed muram allocators
soc: fsl: cpm1: qmc: Introduce qmc_version
soc: fsl: cpm1: qmc: Rename SCC_GSMRL_MODE_QMC
soc: fsl: cpm1: qmc: Handle RPACK initialization
soc: fsl: cpm1: qmc: Rename qmc_chan_command()
soc: fsl: cpm1: qmc: Introduce qmc_{init,exit}_xcc() and their CPM1 version
soc: fsl: cpm1: qmc: Introduce qmc_init_resource() and its CPM1 version
soc: fsl: cpm1: qmc: Re-order probe() operations
soc: fsl: cpm1: qmc: Introduce qmc_data structure
dt-bindings: soc: fsl: cpm_qe: Add QUICC Engine (QE) QMC controller
soc: fsl: cpm1: qmc: Add missing spinlock comment
soc: fsl: cpm1: qmc: Fix 'transmiter' typo
soc: fsl: cpm1: qmc: Remove unneeded parenthesis
soc: fsl: cpm1: qmc: Fix blank line and spaces
...
Link: https://lore.kernel.org/r/326d9a7d-7674-4c28-aa40-dd2c190244dd@csgroup.eu
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
The PUSHSCHED command is missing in the QE header file.
This command is supported on MPC8321 and is used to modify the start
address for the task running on a given peripheral. It is needed for the
QMC in order to perform the re-initialization procedure and so ensure
the correct UCC setup in that case.
Simply add the missing command in the commands list available in the QE
header file.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20240808071132.149251-34-herve.codina@bootlin.com
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
|
|
Introduce devm_cpm_muram_alloc() and devm_cpm_muram_alloc_fixed(), the
resource-managed versions of cpm_muram_alloc() and cpm_muram_alloc_fixed().
These resource-managed versions simplify users by avoiding the need to
call cpm_muram_free(): the allocated area returned by these
functions is automatically freed on driver detach.
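The resource-managed pattern can be sketched as a per-device list of release actions that runs at detach. Everything here is hypothetical. `muram_sketch_alloc()` and `muram_sketch_free()` stand in for cpm_muram_alloc() and cpm_muram_free(); the real allocator returns an offset into MURAM, not a heap pointer. The devres list is likewise a simplified model.

```c
#include <stddef.h>
#include <stdlib.h>

/* One registered cleanup action. */
struct devres_node {
	void (*release)(void *res);
	void *res;
	struct devres_node *next;
};

/* Reduced stand-in for struct device: just its resource list. */
struct device_sketch {
	struct devres_node *devres;
};

/* Remember @res so it is released at detach; returns 0 on success. */
static int devres_sketch_add(struct device_sketch *dev,
			     void (*release)(void *), void *res)
{
	struct devres_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->release = release;
	n->res = res;
	n->next = dev->devres;
	dev->devres = n;
	return 0;
}

/* Driver detach: release everything, newest first. */
static void devres_sketch_release_all(struct device_sketch *dev)
{
	while (dev->devres) {
		struct devres_node *n = dev->devres;

		dev->devres = n->next;
		n->release(n->res);
		free(n);
	}
}

/* Hypothetical stand-ins for cpm_muram_alloc()/cpm_muram_free(). */
static int muram_live;  /* outstanding allocations, for the demo */

static void *muram_sketch_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		muram_live++;
	return p;
}

static void muram_sketch_free(void *p)
{
	free(p);
	muram_live--;
}

/* devm_ variant: same allocation, but freed automatically at detach. */
static void *devm_muram_sketch_alloc(struct device_sketch *dev, size_t size)
{
	void *p = muram_sketch_alloc(size);

	if (p && devres_sketch_add(dev, muram_sketch_free, p)) {
		muram_sketch_free(p);
		return NULL;
	}
	return p;
}
```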
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20240808071132.149251-33-herve.codina@bootlin.com
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
|
|
Add support for the time slot assigner (TSA) available in some
PowerQUICC SoCs that use a QUICC Engine (QE) block, such as the MPC8321.
This QE TSA is similar to the CPM TSA except that it uses UCCs (Unified
Communication Controllers) instead of SCCs (Serial Communication
Controllers). Also, compared with the CPM TSA, this QE TSA can handle
up to 4 TDMs instead of 2 and allows configuring the logic level of
sync signals.
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Rob Herring (Arm) <robh@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/r/20240808071132.149251-8-herve.codina@bootlin.com
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
|
|
We need the USB fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
split_folio() and split_folio_to_list() assume order 0. To support
min order for non-anonymous folios, we must expand these to check the
folio mapping order and use that.
Set new_order to be at least minimum folio order if it is set in
split_huge_page_to_list() so that we can maintain minimum folio order
requirement in the page cache.
Update the debugfs write files used for testing to ensure the order
is respected as well. We simply enforce the min order when a file
mapping is used.
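The clamping of the split order can be sketched with simplified, hypothetical folio and mapping types. Anonymous folios keep a minimum of order 0. File-backed folios are clamped up to the mapping's minimum order. The sketch returns -1 when the clamped order leaves nothing to split.

```c
#include <stdbool.h>
#include <stddef.h>

struct mapping_sketch {
	unsigned int min_folio_order;
};

struct folio_sketch {
	unsigned int order;
	bool anon;
	const struct mapping_sketch *mapping;
};

/* Smallest order a folio may be split to: 0 for anon, mapping minimum otherwise. */
static unsigned int split_min_order(const struct folio_sketch *folio)
{
	if (folio->anon || !folio->mapping)
		return 0;
	return folio->mapping->min_folio_order;
}

/* Clamp a requested split order; returns -1 if no split is possible. */
static int split_target_order(const struct folio_sketch *folio, unsigned int new_order)
{
	unsigned int min = split_min_order(folio);

	if (new_order < min)
		new_order = min;
	if (new_order >= folio->order)
		return -1;  /* nothing left to split */
	return (int)new_order;
}
```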
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20240902124931.506061-2-kernel@pankajraghav.com # folded fix
Link: https://lore.kernel.org/r/20240822135018.1931258-5-kernel@pankajraghav.com
Tested-by: David Howells <dhowells@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|