2024-11-11  mm: swap: count successful large folio zswap stores in hugepage zswpout stats  (Kanchana P Sridhar)

Added a new MTHP_STAT_ZSWPOUT entry to the sysfs transparent_hugepage
stats so that successful large folio zswap stores can be accounted under
the per-order sysfs "zswpout" stats:

  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout

Other non-zswap swap device swap-out events will be counted under the
existing sysfs "swpout" stats:

  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout

Also, added documentation for the newly added sysfs per-order hugepage
"zswpout" stats. The documentation clarifies that only non-zswap
swapouts will be accounted in the existing "swpout" stats.

Link: https://lkml.kernel.org/r/20241001053222.6944-8-kanchana.p.sridhar@intel.com
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Wajdi Feghali <wajdi.k.feghali@intel.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

2024-11-11  mm: zswap: support large folios in zswap_store()  (Kanchana P Sridhar)

This series enables zswap_store() to accept and store large folios. The
most significant contribution in this series is from the earlier RFC
submitted by Ryan Roberts [1]. Ryan's original RFC has been migrated to
mm-unstable as of 9-30-2024 in patch 6 of this series, and adapted based
on code review comments received for the current patch-series.

[1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
     https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u

The first few patches do the prep work for supporting large folios in
zswap_store. Patch 6 provides the main functionality to swap-out large
folios in zswap. Patch 7 adds sysfs per-order hugepages "zswpout"
counters that get incremented upon successful zswap_store of large
folios, and also updates the documentation for this:

  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout

This series is a prerequisite for zswap compress batching of large folio
swap-out and decompress batching of swap-ins based on
swapin_readahead(), using Intel IAA hardware acceleration, which we
would like to submit in subsequent patch-series, with performance
improvement data.

Thanks to Ying Huang for pre-posting review feedback and suggestions!
Thanks also to Nhat, Yosry, Johannes, Barry, Chengming, Usama, Ying and
Matthew for their helpful feedback, code/data reviews and suggestions! I
would like to thank Ryan Roberts for his original RFC [1].

System setup for testing:
=========================
Testing of this series was done with mm-unstable as of 9-27-2024, commit
de2fbaa6d9c3576ec7133ed02a370ec9376bf000 (without this patch-series) and
mm-unstable 9-30-2024 commit c121617e3606be6575cdacfdb63cc8d67b46a568
(with this patch-series). Data was gathered on an Intel Sapphire Rapids
server, dual-socket, 56 cores per socket, 4 IAA devices per socket, 503
GiB RAM and a 525G SSD disk partition for swap. Core frequency was fixed
at 2500MHz.

The vm-scalability "usemem" test was run in a cgroup whose memory.high
was fixed at 150G. There is no swap limit set for the cgroup. 30 usemem
processes were run, each allocating and writing 10G of memory, and
sleeping for 10 sec before exiting:

  usemem --init-time -w -O -s 10 -n 30 10g

Other kernel configuration parameters:

  zswap compressors : zstd, deflate-iaa
  zswap allocator   : zsmalloc
  vm.page-cluster   : 2

In the experiments where "deflate-iaa" is used as the zswap compressor,
IAA "compression verification" is enabled by default
(cat /sys/bus/dsa/drivers/crypto/verify_compress). Hence each IAA
compression will be decompressed internally by the "iaa_crypto" driver,
the CRCs returned by the hardware will be compared, and errors reported
in case of mismatches. Thus "deflate-iaa" helps ensure better data
integrity as compared to the software compressors, and the experimental
data listed below is with verify_compress set to "1".

Metrics reporting methodology:
==============================
Total and average throughput are derived from the individual 30
processes' throughputs reported by usemem. elapsed/sys times are
measured with perf.

All percentage changes are "new" vs. "old"; hence a positive value
denotes an increase in the metric, whether it is throughput or latency,
and a negative value denotes a reduction in the metric. Positive
throughput change percentages and negative latency change percentages
denote improvements.

The vm stats and sysfs hugepages stats included with the performance
data provide details on the swapout activity to zswap/swap device.

Testing labels used in data summaries:
======================================
The data refers to these test configurations and the before/after
comparisons that they do:

before-case1:
-------------
mm-unstable 9-27-2024, CONFIG_THP_SWAP=N (compares zswap 4K vs. zswap
64K). In this scenario, CONFIG_THP_SWAP=N results in 64K/2M folios being
split into 4K folios that get processed by zswap.

before-case2:
-------------
mm-unstable 9-27-2024, CONFIG_THP_SWAP=Y (compares SSD swap large folios
vs. zswap large folios). In this scenario, CONFIG_THP_SWAP=Y results in
zswap rejecting large folios, which will then be stored by the SSD swap
device.

after:
------
v10 of this patch-series, CONFIG_THP_SWAP=Y. The "after" is
CONFIG_THP_SWAP=Y and v10 of this patch-series, which results in 64K/2M
folios not being split, and being processed by zswap_store.

Regression Testing:
===================
I ran vm-scalability usemem without large folios, i.e., only 4K folios
with mm-unstable and this patch-series. The main goal was to make sure
that there is no functional or performance regression wrt the earlier
zswap behavior for 4K folios, now that 4K folios will be processed by
the new zswap_store() code. The data indicates there is no significant
regression.

-------------------------------------------------------------------------------
4K folios:
==========

zswap compressor                  zstd          zstd          zstd   zstd v10
                          before-case1  before-case2         after  vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      4,793,363     4,880,978     4,853,074    1%   -1%
Average throughput (KB/s)      159,778       162,699       161,769    1%   -1%
elapsed time (sec)              130.14        123.17        126.29   -3%    3%
sys time (sec)                3,135.53      2,985.64      3,083.18   -2%    3%
memcg_high                     446,826       444,626       452,930
memcg_swap_fail                      0             0             0
zswpout                     48,932,107    48,931,971    48,931,820
zswpin                             383           386           397
pswpout                              0             0             0
pswpin                               0             0             0
thp_swpout                           0             0             0
thp_swpout_fallback                  0             0             0
64kB-mthp_swpout_fallback            0             0             0
pgmajfault                       3,063         3,077         3,479
swap_ra                             93            94            96
swap_ra_hit                         47            47            50
ZSWPOUT-64kB                       n/a           n/a             0
SWPOUT-64kB                          0             0             0
-------------------------------------------------------------------------------

Performance Testing:
====================
We list the data for 64K folios with before/after data per-compressor,
followed by the same for 2M pmd-mappable folios.

-------------------------------------------------------------------------------
64K folios: zstd:
=================

zswap compressor                  zstd          zstd          zstd   zstd v10
                          before-case1  before-case2         after  vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      5,222,213     1,076,611     6,159,776   18%  472%
Average throughput (KB/s)      174,073        35,887       205,325   18%  472%
elapsed time (sec)              120.50        347.16        108.33  -10%  -69%
sys time (sec)                2,930.33        248.16      2,549.65  -13%  927%
memcg_high                     416,773       552,200       465,874
memcg_swap_fail              3,192,906         1,293         1,012
zswpout                     48,931,583        20,903    48,931,218
zswpin                             384           363           410
pswpout                              0    40,778,448             0
pswpin                               0            16             0
thp_swpout                           0             0             0
thp_swpout_fallback                  0             0             0
64kB-mthp_swpout_fallback    3,192,906         1,293         1,012
pgmajfault                       3,452         3,072         3,061
swap_ra                             90            87           107
swap_ra_hit                         42            43            57
ZSWPOUT-64kB                       n/a           n/a     3,057,173
SWPOUT-64kB                          0     2,548,653             0
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
64K folios: deflate-iaa:
========================

zswap compressor           deflate-iaa   deflate-iaa   deflate-iaa deflate-iaa
                          before-case1  before-case2         after     v10
                                                                    vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      5,652,608     1,089,180     7,189,778   27%  560%
Average throughput (KB/s)      188,420        36,306       239,659   27%  560%
elapsed time (sec)              102.90        343.35         87.05  -15%  -75%
sys time (sec)                2,246.86        213.53      1,864.16  -17%  773%
memcg_high                     576,104       502,907       642,083
memcg_swap_fail              4,016,117         1,407         1,478
zswpout                     61,163,423        22,444    57,798,716
zswpin                             401           368           454
pswpout                              0    40,862,080             0
pswpin                               0            20             0
thp_swpout                           0             0             0
thp_swpout_fallback                  0             0             0
64kB-mthp_swpout_fallback    4,016,117         1,407         1,478
pgmajfault                       3,063         3,153         3,122
swap_ra                             96            93           156
swap_ra_hit                         46            45            83
ZSWPOUT-64kB                       n/a           n/a     3,611,032
SWPOUT-64kB                          0     2,553,880             0
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
2M folios: zstd:
================

zswap compressor                  zstd          zstd          zstd   zstd v10
                          before-case1  before-case2         after  vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      5,895,500     1,109,694     6,484,224   10%  484%
Average throughput (KB/s)      196,516        36,989       216,140   10%  484%
elapsed time (sec)              108.77        334.28        106.33   -2%  -68%
sys time (sec)                2,657.14         94.88      2,376.13  -11% 2404%
memcg_high                      64,200        66,316        56,898
memcg_swap_fail                101,182            70            27
zswpout                     48,931,499        36,507    48,890,640
zswpin                             380           379           377
pswpout                              0    40,166,400             0
pswpin                               0             0             0
thp_swpout                           0        78,450             0
thp_swpout_fallback            101,182            70            27
2MB-mthp_swpout_fallback             0             0            27
pgmajfault                       3,067         3,417         3,311
swap_ra                             91            90           854
swap_ra_hit                         45            45           810
ZSWPOUT-2MB                        n/a           n/a        95,459
SWPOUT-2MB                           0        78,450             0
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
2M folios: deflate-iaa:
=======================

zswap compressor           deflate-iaa   deflate-iaa   deflate-iaa deflate-iaa
                          before-case1  before-case2         after     v10
                                                                    vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      6,286,587     1,126,785     7,073,464   13%  528%
Average throughput (KB/s)      209,552        37,559       235,782   13%  528%
elapsed time (sec)               96.19        333.03         85.79  -11%  -74%
sys time (sec)                2,141.44         99.96      1,826.67  -15% 1727%
memcg_high                      99,253        64,666        79,718
memcg_swap_fail                129,074            53           165
zswpout                     61,312,794        28,321    56,045,120
zswpin                             383           406           403
pswpout                              0    40,048,128             0
pswpin                               0             0             0
thp_swpout                           0        78,219             0
thp_swpout_fallback            129,074            53           165
2MB-mthp_swpout_fallback             0             0           165
pgmajfault                       3,430         3,077        31,468
swap_ra                             91           103        84,373
swap_ra_hit                         47            46        84,317
ZSWPOUT-2MB                        n/a           n/a       109,229
SWPOUT-2MB                           0        78,219             0
-------------------------------------------------------------------------------

And finally, this is a comparison of deflate-iaa vs. zstd with v10 of
this patch-series:

---------------------------------------------
zswap_store large folios v10
Impr w/ deflate-iaa vs. zstd

                     64K folios    2M folios
---------------------------------------------
Throughput (KB/s)          17%           9%
elapsed time (sec)        -20%         -19%
sys time (sec)            -27%         -23%
---------------------------------------------

Conclusions based on the performance results:
=============================================

v10 wrt before-case1:
---------------------
We see significant improvements in throughput, elapsed and sys time for
zstd and deflate-iaa, when comparing before-case1 (THP_SWAP=N) vs. after
(THP_SWAP=Y) with zswap_store large folios.

v10 wrt before-case2:
---------------------
We see even more significant improvements in throughput and elapsed time
for zstd and deflate-iaa, when comparing before-case2 (large-folio-SSD)
vs. after (large-folio-zswap). The sys time increases with
large-folio-zswap as expected, due to the CPU compression time vs.
asynchronous disk write times, as pointed out by Ying and Yosry.

In before-case2, when zswap does not store large folios, only
allocations and cgroup charging due to 4K folio zswap stores count
towards the cgroup memory limit. However, in the after scenario, with
the introduction of zswap_store() of large folios, there is an added
component of the zswap compressed pool usage from large folio stores
from potentially all 30 processes, that gets counted towards the memory
limit. As a result, we see higher swapout activity in the "after" data.

Summary:
========
The v10 data presented above shows that zswap_store of large folios
demonstrates good throughput/performance improvements compared to
conventional SSD swap of large folios with a sufficiently large 525G SSD
swap device. Hence, it seems reasonable for zswap_store to support large
folios, so that further performance improvements can be implemented.

In the experimental setup used in this patchset, we have enabled IAA
compress verification to ensure additional hardware data integrity CRC
checks not currently done by the software compressors. We see good
throughput/latency improvements with deflate-iaa vs. zstd with
zswap_store of large folios.

Some of the ideas for further reducing latency that have shown promise
in our experiments are:

1) IAA compress/decompress batching.
2) Distributing compress jobs across all IAA devices on the socket.

The tests run for this patchset use only 1 IAA device per core, which
avails of 2 compress engines on the device. In our experiments with IAA
batching, we distribute compress jobs from all cores to the 8 compress
engines available per socket. We further compress the pages in each
folio in parallel in the accelerator. As a result, we improve compress
latency and reclaim throughput.

In decompress batching, we use swapin_readahead to generate a prefetch
batch of 4K folios that we decompress in parallel in IAA.

------------------------------------------------------------------------------
IAA compress/decompress batching
Further improvements wrt v10 zswap_store Sequential
subpage store using "deflate-iaa":

                      "deflate-iaa"             "deflate-iaa-canned" [2]
                      Batching                  Batching
                      Additional Impr           Additional Impr
                   64K folios   2M folios    64K folios   2M folios
------------------------------------------------------------------------------
Throughput (KB/s)        19%         43%           26%         55%
elapsed time (sec)       -5%        -14%          -10%        -21%
sys time (sec)            4%         -7%           -4%        -18%
------------------------------------------------------------------------------

With zswap IAA compress/decompress batching, we are able to demonstrate
significant performance improvements and memory savings in server
scalability experiments in highly contended system scenarios under
significant memory pressure, as compared to software compressors. We
hope to submit this work in subsequent patch series. The current
patch-series is a prerequisite for these future submissions.

This patch (of 7):

zswap_store() will store large folios by compressing them page by page.
This patch provides a sequential implementation of storing a large folio
in zswap_store() by iterating through each page in the folio to compress
and store it in the zswap zpool.

zswap_store() calls the newly added zswap_store_page() function for each
page in the folio. zswap_store_page() handles compressing and storing
each page.

We check the global and per-cgroup limits once at the beginning of
zswap_store(), and only check that the limit is not reached yet. This is
racy and inaccurate, but it should be sufficient for now. We also obtain
initial references to the relevant objcg and pool to guarantee that
subsequent references can be acquired by zswap_store_page(). A new
function zswap_pool_get() is added to facilitate this.

If these one-time checks pass, we compress the pages of the folio, while
maintaining a running count of compressed bytes for all the folio's
pages. If all pages are successfully compressed and stored, we do the
cgroup zswap charging with the total compressed bytes, and batch update
the zswap_stored_pages atomic/zswpout event stats with folio_nr_pages()
once, before returning from zswap_store().

If an error is encountered during the store of any page in the folio,
all pages in that folio currently stored in zswap will be invalidated.
Thus, a folio is either entirely stored in zswap, or entirely not stored
in zswap.

The most important value provided by this patch is that it enables
swapping out large folios to zswap without splitting them. Furthermore,
it batches some operations while doing so (cgroup charging, stats
updates).

This patch also forms the basis for building compress batching of pages
in a large folio in zswap_store() by compressing up to, say, 8 pages of
the folio in parallel in hardware using the Intel In-Memory Analytics
Accelerator (Intel IAA).

This change reuses and adapts the functionality in Ryan Roberts' RFC
patch [1]:

  "[RFC,v1] mm: zswap: Store large folios without splitting"

[1] https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u

Link: https://lkml.kernel.org/r/20241001053222.6944-1-kanchana.p.sridhar@intel.com
Link: https://lkml.kernel.org/r/20241001053222.6944-7-kanchana.p.sridhar@intel.com
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Originally-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Wajdi Feghali <wajdi.k.feghali@intel.com>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

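To illustrate the all-or-nothing control flow described in this entry,
here is a minimal sketch (helper names and return conventions are
paraphrased from the message above, not copied from the final patch):

    bool zswap_store(struct folio *folio)
    {
        long nr_pages = folio_nr_pages(folio);
        size_t total_bytes = 0;
        long i;

        /* One-time (racy, approximate) global and per-cgroup limit
         * checks, plus initial objcg/pool references so that
         * zswap_store_page() can take further references. */

        for (i = 0; i < nr_pages; i++) {
            /* returns the compressed size, or negative on error */
            ssize_t bytes = zswap_store_page(folio_page(folio, i),
                                             objcg, pool);

            if (bytes < 0)
                goto invalidate;    /* drop every page already stored */
            total_bytes += bytes;
        }

        obj_cgroup_charge_zswap(objcg, total_bytes);     /* charged once */
        atomic_long_add(nr_pages, &zswap_stored_pages);  /* updated once */
        return true;
        ...
    }
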
2024-11-11  mm: zswap: modify zswap_stored_pages to be atomic_long_t  (Kanchana P Sridhar)

For zswap_store() to support large folios, we need to be able to do a
batch update of zswap_stored_pages upon successful store of all pages in
the folio. For this, we need to add folio_nr_pages(), which returns a
long, to zswap_stored_pages.

Link: https://lkml.kernel.org/r/20241001053222.6944-6-kanchana.p.sridhar@intel.com
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Wajdi Feghali <wajdi.k.feghali@intel.com>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

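A minimal sketch of the batched update this conversion enables (the
counter name is from this series; the surrounding code is omitted):

    static atomic_long_t zswap_stored_pages = ATOMIC_LONG_INIT(0);

    /* one long-sized add per folio instead of one atomic_inc() per page */
    atomic_long_add(folio_nr_pages(folio), &zswap_stored_pages);
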
2024-11-11  mm: zswap: rename zswap_pool_get() to zswap_pool_tryget()  (Kanchana P Sridhar)

Modify the name of the existing zswap_pool_get() to zswap_pool_tryget()
to be representative of the call it makes to percpu_ref_tryget(). A
subsequent patch will introduce a new zswap_pool_get() that calls
percpu_ref_get().

The intent behind this change is for higher level zswap API such as
zswap_store() to call zswap_pool_tryget() to check upfront if the pool's
refcount is "0" (which means it could be getting destroyed) and to
handle this as an error condition. zswap_store() would proceed only if
zswap_pool_tryget() returns success, and any additional pool refcounts
that need to be obtained for compressing sub-pages in a large folio
could simply call zswap_pool_get().

Link: https://lkml.kernel.org/r/20241001053222.6944-4-kanchana.p.sridhar@intel.com
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Wajdi Feghali <wajdi.k.feghali@intel.com>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

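A sketch of the intended pairing (zswap pool references are percpu_ref
based; the exact signatures here are paraphrased from the message above,
not the literal patch):

    /* renamed: may fail if the pool refcount already dropped to zero */
    static struct zswap_pool *zswap_pool_tryget(struct zswap_pool *pool)
    {
        return percpu_ref_tryget(&pool->ref) ? pool : NULL;
    }

    /* added by a later patch: only valid while a reference is held */
    static void zswap_pool_get(struct zswap_pool *pool)
    {
        percpu_ref_get(&pool->ref);
    }
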
2024-11-11  mm: zswap: modify zswap_compress() to accept a page instead of a folio  (Kanchana P Sridhar)

For zswap_store() to be able to store a large folio by compressing it
one page at a time, zswap_compress() needs to accept a page as input.
This will allow us to iterate through each page in the folio in
zswap_store(), compress it and store it in the zpool.

Link: https://lkml.kernel.org/r/20241001053222.6944-3-kanchana.p.sridhar@intel.com
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Wajdi Feghali <wajdi.k.feghali@intel.com>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

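The interface change amounts to roughly the following (a paraphrased
sketch; argument lists are abbreviated, not the literal hunk):

    -static bool zswap_compress(struct folio *folio, struct zswap_entry *entry)
    +static bool zswap_compress(struct page *page, struct zswap_entry *entry)
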
2024-11-11  mm: define obj_cgroup_get() if CONFIG_MEMCG is not defined  (Kanchana P Sridhar)

Patch series "mm: zswap swap-out of large folios", v10.

This patch series enables zswap_store() to accept and store large
folios. The most significant contribution in this series is from the
earlier RFC submitted by Ryan Roberts [1]. Ryan's original RFC has been
migrated to mm-unstable as of 9-30-2024 in patch 6 of this series, and
adapted based on code review comments received for the current
patch-series.

[1]: [RFC PATCH v1] mm: zswap: Store large folios without splitting
     https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u

The first few patches do the prep work for supporting large folios in
zswap_store. Patch 6 provides the main functionality to swap-out large
folios in zswap. Patch 7 adds sysfs per-order hugepages "zswpout"
counters that get incremented upon successful zswap_store of large
folios, and also updates the documentation for this:

  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout

This patch series is a prerequisite for zswap compress batching of large
folio swap-out and decompress batching of swap-ins based on
swapin_readahead(), using Intel IAA hardware acceleration, which we
would like to submit in subsequent patch-series, with performance
improvement data.

Thanks to Ying Huang for pre-posting review feedback and suggestions!
Thanks also to Nhat, Yosry, Johannes, Barry, Chengming, Usama, Ying and
Matthew for their helpful feedback, code/data reviews and suggestions!

Co-development signoff request:
===============================
I would like to thank Ryan Roberts for his original RFC [1] and request
his co-developer signoff on patch 6 in this series. Thanks Ryan!

System setup for testing:
=========================
Testing of this patch series was done with mm-unstable as of 9-27-2024,
commit de2fbaa6d9c3576ec7133ed02a370ec9376bf000 (without this
patch-series) and mm-unstable 9-30-2024 commit
c121617e3606be6575cdacfdb63cc8d67b46a568 (with this patch-series). Data
was gathered on an Intel Sapphire Rapids server, dual-socket, 56 cores
per socket, 4 IAA devices per socket, 503 GiB RAM and a 525G SSD disk
partition for swap. Core frequency was fixed at 2500MHz.

The vm-scalability "usemem" test was run in a cgroup whose memory.high
was fixed at 150G. There is no swap limit set for the cgroup. 30 usemem
processes were run, each allocating and writing 10G of memory, and
sleeping for 10 sec before exiting:

  usemem --init-time -w -O -s 10 -n 30 10g

Other kernel configuration parameters:

  zswap compressors : zstd, deflate-iaa
  zswap allocator   : zsmalloc
  vm.page-cluster   : 2

In the experiments where "deflate-iaa" is used as the zswap compressor,
IAA "compression verification" is enabled by default
(cat /sys/bus/dsa/drivers/crypto/verify_compress). Hence each IAA
compression will be decompressed internally by the "iaa_crypto" driver,
the CRCs returned by the hardware will be compared, and errors reported
in case of mismatches. Thus "deflate-iaa" helps ensure better data
integrity as compared to the software compressors, and the experimental
data listed below is with verify_compress set to "1".

Metrics reporting methodology:
==============================
Total and average throughput are derived from the individual 30
processes' throughputs reported by usemem. elapsed/sys times are
measured with perf.

All percentage changes are "new" vs. "old"; hence a positive value
denotes an increase in the metric, whether it is throughput or latency,
and a negative value denotes a reduction in the metric. Positive
throughput change percentages and negative latency change percentages
denote improvements.

The vm stats and sysfs hugepages stats included with the performance
data provide details on the swapout activity to zswap/swap device.

Testing labels used in data summaries:
======================================
The data refers to these test configurations and the before/after
comparisons that they do:

before-case1:
-------------
mm-unstable 9-27-2024, CONFIG_THP_SWAP=N (compares zswap 4K vs. zswap
64K). In this scenario, CONFIG_THP_SWAP=N results in 64K/2M folios being
split into 4K folios that get processed by zswap.

before-case2:
-------------
mm-unstable 9-27-2024, CONFIG_THP_SWAP=Y (compares SSD swap large folios
vs. zswap large folios). In this scenario, CONFIG_THP_SWAP=Y results in
zswap rejecting large folios, which will then be stored by the SSD swap
device.

after:
------
v10 of this patch-series, CONFIG_THP_SWAP=Y. The "after" is
CONFIG_THP_SWAP=Y and v10 of this patch-series, which results in 64K/2M
folios not being split, and being processed by zswap_store.

Regression Testing:
===================
I ran vm-scalability usemem without large folios, i.e., only 4K folios
with mm-unstable and this patch-series. The main goal was to make sure
that there is no functional or performance regression wrt the earlier
zswap behavior for 4K folios, now that 4K folios will be processed by
the new zswap_store() code. The data indicates there is no significant
regression.

-------------------------------------------------------------------------------
4K folios:
==========

zswap compressor                  zstd          zstd          zstd   zstd v10
                          before-case1  before-case2         after  vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      4,793,363     4,880,978     4,853,074    1%   -1%
Average throughput (KB/s)      159,778       162,699       161,769    1%   -1%
elapsed time (sec)              130.14        123.17        126.29   -3%    3%
sys time (sec)                3,135.53      2,985.64      3,083.18   -2%    3%
memcg_high                     446,826       444,626       452,930
memcg_swap_fail                      0             0             0
zswpout                     48,932,107    48,931,971    48,931,820
zswpin                             383           386           397
pswpout                              0             0             0
pswpin                               0             0             0
thp_swpout                           0             0             0
thp_swpout_fallback                  0             0             0
64kB-mthp_swpout_fallback            0             0             0
pgmajfault                       3,063         3,077         3,479
swap_ra                             93            94            96
swap_ra_hit                         47            47            50
ZSWPOUT-64kB                       n/a           n/a             0
SWPOUT-64kB                          0             0             0
-------------------------------------------------------------------------------

Performance Testing:
====================
We list the data for 64K folios with before/after data per-compressor,
followed by the same for 2M pmd-mappable folios.

-------------------------------------------------------------------------------
64K folios: zstd:
=================

zswap compressor                  zstd          zstd          zstd   zstd v10
                          before-case1  before-case2         after  vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      5,222,213     1,076,611     6,159,776   18%  472%
Average throughput (KB/s)      174,073        35,887       205,325   18%  472%
elapsed time (sec)              120.50        347.16        108.33  -10%  -69%
sys time (sec)                2,930.33        248.16      2,549.65  -13%  927%
memcg_high                     416,773       552,200       465,874
memcg_swap_fail              3,192,906         1,293         1,012
zswpout                     48,931,583        20,903    48,931,218
zswpin                             384           363           410
pswpout                              0    40,778,448             0
pswpin                               0            16             0
thp_swpout                           0             0             0
thp_swpout_fallback                  0             0             0
64kB-mthp_swpout_fallback    3,192,906         1,293         1,012
pgmajfault                       3,452         3,072         3,061
swap_ra                             90            87           107
swap_ra_hit                         42            43            57
ZSWPOUT-64kB                       n/a           n/a     3,057,173
SWPOUT-64kB                          0     2,548,653             0
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
64K folios: deflate-iaa:
========================

zswap compressor           deflate-iaa   deflate-iaa   deflate-iaa deflate-iaa
                          before-case1  before-case2         after     v10
                                                                    vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      5,652,608     1,089,180     7,189,778   27%  560%
Average throughput (KB/s)      188,420        36,306       239,659   27%  560%
elapsed time (sec)              102.90        343.35         87.05  -15%  -75%
sys time (sec)                2,246.86        213.53      1,864.16  -17%  773%
memcg_high                     576,104       502,907       642,083
memcg_swap_fail              4,016,117         1,407         1,478
zswpout                     61,163,423        22,444    57,798,716
zswpin                             401           368           454
pswpout                              0    40,862,080             0
pswpin                               0            20             0
thp_swpout                           0             0             0
thp_swpout_fallback                  0             0             0
64kB-mthp_swpout_fallback    4,016,117         1,407         1,478
pgmajfault                       3,063         3,153         3,122
swap_ra                             96            93           156
swap_ra_hit                         46            45            83
ZSWPOUT-64kB                       n/a           n/a     3,611,032
SWPOUT-64kB                          0     2,553,880             0
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
2M folios: zstd:
================

zswap compressor                  zstd          zstd          zstd   zstd v10
                          before-case1  before-case2         after  vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      5,895,500     1,109,694     6,484,224   10%  484%
Average throughput (KB/s)      196,516        36,989       216,140   10%  484%
elapsed time (sec)              108.77        334.28        106.33   -2%  -68%
sys time (sec)                2,657.14         94.88      2,376.13  -11% 2404%
memcg_high                      64,200        66,316        56,898
memcg_swap_fail                101,182            70            27
zswpout                     48,931,499        36,507    48,890,640
zswpin                             380           379           377
pswpout                              0    40,166,400             0
pswpin                               0             0             0
thp_swpout                           0        78,450             0
thp_swpout_fallback            101,182            70            27
2MB-mthp_swpout_fallback             0             0            27
pgmajfault                       3,067         3,417         3,311
swap_ra                             91            90           854
swap_ra_hit                         45            45           810
ZSWPOUT-2MB                        n/a           n/a        95,459
SWPOUT-2MB                           0        78,450             0
-------------------------------------------------------------------------------

-------------------------------------------------------------------------------
2M folios: deflate-iaa:
=======================

zswap compressor           deflate-iaa   deflate-iaa   deflate-iaa deflate-iaa
                          before-case1  before-case2         after     v10
                                                                    vs.   vs.
                                                                   case1 case2
-------------------------------------------------------------------------------
Total throughput (KB/s)      6,286,587     1,126,785     7,073,464   13%  528%
Average throughput (KB/s)      209,552        37,559       235,782   13%  528%
elapsed time (sec)               96.19        333.03         85.79  -11%  -74%
sys time (sec)                2,141.44         99.96      1,826.67  -15% 1727%
memcg_high                      99,253        64,666        79,718
memcg_swap_fail                129,074            53           165
zswpout                     61,312,794        28,321    56,045,120
zswpin                             383           406           403
pswpout                              0    40,048,128             0
pswpin                               0             0             0
thp_swpout                           0        78,219             0
thp_swpout_fallback            129,074            53           165
2MB-mthp_swpout_fallback             0             0           165
pgmajfault                       3,430         3,077        31,468
swap_ra                             91           103        84,373
swap_ra_hit                         47            46        84,317
ZSWPOUT-2MB                        n/a           n/a       109,229
SWPOUT-2MB                           0        78,219             0
-------------------------------------------------------------------------------

And finally, this is a comparison of deflate-iaa vs. zstd with v10 of
this patch-series:

---------------------------------------------
zswap_store large folios v10
Impr w/ deflate-iaa vs. zstd

                     64K folios    2M folios
---------------------------------------------
Throughput (KB/s)          17%           9%
elapsed time (sec)        -20%         -19%
sys time (sec)            -27%         -23%
---------------------------------------------

Conclusions based on the performance results:
=============================================

v10 wrt before-case1:
---------------------
We see significant improvements in throughput, elapsed and sys time for
zstd and deflate-iaa, when comparing before-case1 (THP_SWAP=N) vs. after
(THP_SWAP=Y) with zswap_store large folios.

v10 wrt before-case2:
---------------------
We see even more significant improvements in throughput and elapsed time
for zstd and deflate-iaa, when comparing before-case2 (large-folio-SSD)
vs. after (large-folio-zswap). The sys time increases with
large-folio-zswap as expected, due to the CPU compression time vs.
asynchronous disk write times, as pointed out by Ying and Yosry.

In before-case2, when zswap does not store large folios, only
allocations and cgroup charging due to 4K folio zswap stores count
towards the cgroup memory limit. However, in the after scenario, with
the introduction of zswap_store() of large folios, there is an added
component of the zswap compressed pool usage from large folio stores
from potentially all 30 processes, that gets counted towards the memory
limit. As a result, we see higher swapout activity in the "after" data.

Summary:
========
The v10 data presented above shows that zswap_store of large folios
demonstrates good throughput/performance improvements compared to
conventional SSD swap of large folios with a sufficiently large 525G SSD
swap device. Hence, it seems reasonable for zswap_store to support large
folios, so that further performance improvements can be implemented.

In the experimental setup used in this patchset, we have enabled IAA
compress verification to ensure additional hardware data integrity CRC
checks not currently done by the software compressors. We see good
throughput/latency improvements with deflate-iaa vs. zstd with
zswap_store of large folios.

Some of the ideas for further reducing latency that have shown promise
in our experiments are:

1) IAA compress/decompress batching.
2) Distributing compress jobs across all IAA devices on the socket.

The tests run for this patchset use only 1 IAA device per core, which
avails of 2 compress engines on the device. In our experiments with IAA
batching, we distribute compress jobs from all cores to the 8 compress
engines available per socket. We further compress the pages in each
folio in parallel in the accelerator. As a result, we improve compress
latency and reclaim throughput.

In decompress batching, we use swapin_readahead to generate a prefetch
batch of 4K folios that we decompress in parallel in IAA.

------------------------------------------------------------------------------
IAA compress/decompress batching
Further improvements wrt v10 zswap_store Sequential
subpage store using "deflate-iaa":

                      "deflate-iaa"             "deflate-iaa-canned" [2]
                      Batching                  Batching
                      Additional Impr           Additional Impr
                   64K folios   2M folios    64K folios   2M folios
------------------------------------------------------------------------------
Throughput (KB/s)        19%         43%           26%         55%
elapsed time (sec)       -5%        -14%          -10%        -21%
sys time (sec)            4%         -7%           -4%        -18%
------------------------------------------------------------------------------

With zswap IAA compress/decompress batching, we are able to demonstrate
significant performance improvements and memory savings in server
scalability experiments in highly contended system scenarios under
significant memory pressure, as compared to software compressors. We
hope to submit this work in subsequent patch series. The current
patch-series is a prerequisite for these future submissions.

[1] https://lore.kernel.org/linux-mm/20231019110543.3284654-1-ryan.roberts@arm.com/T/#u
[2] https://patchwork.kernel.org/project/linux-crypto/cover/cover.1710969449.git.andre.glover@linux.intel.com/

This patch (of 6):

This resolves an issue with obj_cgroup_get() not being defined if
CONFIG_MEMCG is not defined. Before this patch, we would see build
errors if obj_cgroup_get() is called from code that is agnostic of
CONFIG_MEMCG.

The zswap_store() changes for large folios in subsequent commits will
require the use of obj_cgroup_get() in zswap code that falls into this
category.

Link: https://lkml.kernel.org/r/20241001053222.6944-1-kanchana.p.sridhar@intel.com
Link: https://lkml.kernel.org/r/20241001053222.6944-2-kanchana.p.sridhar@intel.com
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Wajdi Feghali <wajdi.k.feghali@intel.com>
Cc: "Zou, Nanhai" <nanhai.zou@intel.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

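The fix is the usual no-op stub pattern for !CONFIG_MEMCG builds (a
sketch of the shape of the change; the real hunk sits among the other
memcontrol.h stubs):

    #ifdef CONFIG_MEMCG
    /* existing definition that takes a reference on the objcg */
    #else
    static inline void obj_cgroup_get(struct obj_cgroup *objcg)
    {
    }
    #endif
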
2024-11-11  Merge drm/drm-fixes into drm-misc-fixes  (Thomas Zimmermann)

Backmerging to get fixes from v6.12-rc7.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>

2024-11-11  Merge branch 'mm-hotfixes-stable' into mm-stable  (Andrew Morton)

Pick up e7ac4daeed91 ("mm: count zeromap read and set for swapout and
swapin") in order to move

  mm: define obj_cgroup_get() if CONFIG_MEMCG is not defined
  mm: zswap: modify zswap_compress() to accept a page instead of a folio
  mm: zswap: rename zswap_pool_get() to zswap_pool_tryget()
  mm: zswap: modify zswap_stored_pages to be atomic_long_t
  mm: zswap: support large folios in zswap_store()
  mm: swap: count successful large folio zswap stores in hugepage zswpout stats
  mm: zswap: zswap_store_page() will initialize entry after adding to xarray.
  mm: add per-order mTHP swpin counters

from mm-unstable into mm-stable.

2024-11-11  mm: count zeromap read and set for swapout and swapin  (Barry Song)

When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling. However, it's easy to
construct a scenario where this becomes an issue—for example, allocating
1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT,
and then swapping it back in. In this case, the swap-out and swap-in
counts seem to vanish into a black hole, potentially causing semantic
ambiguity.

On the other hand, Usama reported that zero-filled pages can exceed 10%
in workloads utilizing zswap, while Hailong noted that some apps on
Android have more than 6% zero-filled pages.

Before commit 0ca0c24e3211 ("mm: store zero pages to be swapped out in a
bitmap"), both zswap and zRAM implemented similar optimizations, leading
to these optimized-out pages being counted in either zswap or zRAM
counters (with pswpin/pswpout also increasing for zRAM). With zeromap
functioning prior to both zswap and zRAM, userspace will no longer
detect these swap-out and swap-in actions.

We have three ways to address this:

1. Introduce a dedicated counter specifically for the zeromap.

2. Use pswpin/pswpout accounting, treating the zero map as a standard
   backend. This approach aligns with zRAM's current handling of
   same-page fills at the device level. However, it would mean losing
   the optimized-out page counters previously available in zRAM and
   would not align with systems using zswap. Additionally, as noted by
   Nhat Pham, pswpin/pswpout counters apply only to I/O done directly
   to the backend device.

3. Count zeromap pages under zswap, aligning with system behavior when
   zswap is enabled. However, this would not be consistent with zRAM,
   nor would it align with systems lacking both zswap and zRAM.

Given the complications with options 2 and 3, this patch selects option
1. We can find these counters from /proc/vmstat (counters for the whole
system) and memcg's memory.stat (counters for the interested memcg).

For example:

  $ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
  swpin_zero 1648
  swpout_zero 33536

  $ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
  swpin_zero 3905
  swpout_zero 3985

This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing
and may mislead user-space agents that rely on changes in these counters
as indicators. Therefore, we add a Fixes tag to encourage the inclusion
of this counter in any kernel versions with zeromap.

Many thanks to Kanchana for the contribution of changing
count_objcg_event() to count_objcg_events() to support large folios [1],
which has now been incorporated into this patch.

[1] https://lkml.kernel.org/r/20241001053222.6944-5-kanchana.p.sridhar@intel.com

Link: https://lkml.kernel.org/r/20241107011246.59137-1-21cnbao@gmail.com
Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Hailong Liu <hailong.liu@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

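A hedged sketch of where the swap-out side of this accounting lands
(event names are inferred from the /proc/vmstat output above; the
zero-detection helpers follow the zeromap code added by commit
0ca0c24e3211, and the surrounding control flow is abbreviated):

    /* swap-out path: the folio is all zeroes, so it is short-circuited
     * into the zeromap bitmap instead of being written to any backend */
    if (is_folio_zero_filled(folio)) {
        swap_zeromap_folio_set(folio);
        count_vm_events(SWPOUT_ZERO, folio_nr_pages(folio));
        ...
    }
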
2024-11-11  cpufreq: sun50i: add a100 cpufreq support  (Shuosheng Huang)

Add nvmem-based cpufreq support for the Allwinner A100 SoC. It's similar
to the H6, so use efuse_xlate to extract the differentiated part.

Signed-off-by: Shuosheng Huang <huangshuosheng@allwinnertech.com>
[masterr3c0rd@epochal.quest: add A100 to opp_match_list]
Signed-off-by: Cody Eksal <masterr3c0rd@epochal.quest>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Parthiban Nallathambi <parthiban@linumiz.com>
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

2024-11-11  bcachefs: Allow for unknown key types in backpointers fsck  (Kent Overstreet)

We can't assume that btrees only contain keys of a given type - even if
they only have a single key type listed in the allowed key types for
that btree; this is a forwards compatibility issue.

Reported-by: syzbot+a27c3aaa3640dd3e1dfb@syzkaller.appspotmail.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2024-11-11  bcachefs: Fix assertion pop in topology repair  (Kent Overstreet)

Fixes: baefd3f849ed ("bcachefs: btree_cache.freeable list fixes")
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

2024-11-11  cpufreq: mediatek-hw: Fix wrong return value in mtk_cpufreq_get_cpu_power()  (Jinjie Ruan)

mtk_cpufreq_get_cpu_power() returns 0 if the policy is NULL. Then in
em_create_perf_table(), the later zero check for power is not valid, as
power is uninitialized. As Lukasz suggested, it must return -EINVAL when
the 'policy' is not found. So return -EINVAL to fix it.

Cc: stable@vger.kernel.org
Fixes: 4855e26bcf4d ("cpufreq: mediatek-hw: Add support for CPUFREQ HW")
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Suggested-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

2024-11-11  cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_power()  (Jinjie Ruan)

cppc_get_cpu_power() returns 0 if the policy is NULL. Then in
em_create_perf_table(), the later zero check for power is not valid, as
power is uninitialized. As Quentin pointed out, the kernel energy model
core checks the return value of active_power() first, so if the callback
fails it should tell the core. So return -EINVAL to fix it.

Fixes: a78e72075642 ("cpufreq: CPPC: Fix possible null-ptr-deref for cpufreq_cpu_get_raw()")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

2024-11-11  cpufreq: CPPC: Fix wrong return value in cppc_get_cpu_cost()  (Jinjie Ruan)

cppc_get_cpu_cost() returns 0 if the policy is NULL. Then in
em_compute_costs(), the later zero check for cost is not valid, as cost
is uninitialized. As Quentin pointed out, the kernel energy model core
checks the return value of get_cost() first, so if the callback fails it
should tell the core. Return -EINVAL to fix it.

Fixes: 1a1374bb8c59 ("cpufreq: CPPC: Fix possible null-ptr-deref for cppc_get_cpu_cost()")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/c4765377-7830-44c2-84fa-706b6e304e10@stanley.mountain/
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

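The mediatek-hw and the two CPPC fixes above share the same shape,
sketched here against the energy-model callback signature (the policy
lookup shown is an assumption based on the commit text, not the literal
code; the function body is abbreviated):

    static int cppc_get_cpu_cost(struct device *cpu_dev, unsigned long KHz,
                                 unsigned long *cost)
    {
        struct cpufreq_policy *policy;

        policy = cpufreq_cpu_get_raw(cpu_dev->id);
        if (!policy)
            return -EINVAL; /* was "return 0" with *cost left unset */
        ...
    }
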
2024-11-11  cpufreq: loongson3: Check for error code from devm_mutex_init() call  (Andy Shevchenko)

Even if it's not critical, skipping the error code check of the
devm_mutex_init() call today diminishes the point of using the devm
variant of it. Tomorrow it may even leak something. Add the missed
check.

Fixes: ccf51454145b ("cpufreq: Add Loongson-3 CPUFreq driver support")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

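The missed check amounts to something like this (a sketch with
hypothetical variable names):

    ret = devm_mutex_init(&pdev->dev, &data->lock);
    if (ret)
        return ret;
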
2024-11-11  cpufreq: scmi: Fix cleanup path when boost enablement fails  (Sibi Sankar)

Include free_cpufreq_table in the cleanup path when boost enablement
fails.

cc: stable@vger.kernel.org
Fixes: a8e949d41c72 ("cpufreq: scmi: Enable boost support")
Signed-off-by: Sibi Sankar <quic_sibis@quicinc.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

2024-11-11  Merge tag 'drm-misc-next-2024-11-08' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next  (Dave Airlie)

drm-misc-next for v6.13:

UAPI Changes:
- Add 1X7X5 media-bus formats.

Cross-subsystem Changes:
- Maintainer updates for VKMS and IT6263.
- Add media-bus-fmt for MEDIA_BUS_FMT_RGB101010_1X7X5_*.
- Add IT6263 DT bindings and driver.

Core Changes:
- Add ABGR210101010 support to panic handler.
- Use ATOMIC64_INIT in drm_file.c
- Improve scheduler teardown documentation.

Driver Changes:
- Make mediatek compile on ARM again.
- Add missing drm/drm_bridge.h header include, already in drm-next.
- Small fixes and cleanups to vkms, bridge/it6505, panfrost, panthor.
- Add panic support to nouveau for nv50+.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/344afe41-d27b-408a-8542-bfecfd3555f6@linux.intel.com

2024-11-10  rust: use custom FFI integer types  (Gary Guo)

Currently FFI integer types are defined in libcore. This commit creates
the `ffi` crate and asks bindgen to use that crate for FFI integer types
instead of `core::ffi`.

This commit is preparatory and no type changes are made in this commit
yet.

Signed-off-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/r/20240913213041.395655-4-gary@garyguo.net
[ Added `rustdoc`, `rusttest` and KUnit tests support. Rebased on top of
  `rust-next` (e.g. migrated more `core::ffi` cases). Reworded crate
  docs slightly and formatted. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

2024-11-10  rust: map `__kernel_size_t` and friends also to usize/isize  (Gary Guo)

Currently bindgen has special logic to recognise `size_t` and `ssize_t`
and map them to Rust `usize` and `isize`. Similarly, `ptrdiff_t` is
mapped to `isize`. However this falls short for `__kernel_size_t`,
`__kernel_ssize_t` and `__kernel_ptrdiff_t`. To ensure that they are
mapped to usize/isize rather than to 32/64-bit integers depending on the
platform, blocklist them in bindgen parameters and manually provide
their definition.

Signed-off-by: Gary Guo <gary@garyguo.net>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Link: https://lore.kernel.org/r/20240913213041.395655-3-gary@garyguo.net
[ Formatted comment. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

2024-11-11  m68k: coldfire/device.c: only build FEC when HW macros are defined  (Antonio Quartulli)

When CONFIG_FEC is set (due to COMPILE_TEST) along with CONFIG_M54xx,
coldfire/device.c has compile errors due to missing MCFEC_* and
MCF_IRQ_FEC_* symbols.

Make the whole FEC blocks dependent on having the HW macros defined,
rather than on CONFIG_FEC itself.

This fix is very similar to commit e6e1e7b19fa1 ("m68k:
coldfire/device.c: only build for MCF_EDMA when h/w macros are defined")

Fixes: b7ce7f0d0efc ("m68knommu: merge common ColdFire FEC platform setup code")
To: Greg Ungerer <gerg@linux-m68k.org>
To: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Antonio Quartulli <antonio@mandelbit.com>
Signed-off-by: Greg Ungerer <gerg@kernel.org>

2024-11-11  m68k: mcfgpio: Fix incorrect register offset for CONFIG_M5441x  (Jean-Michel Hautbois)

Fix a typo in the CONFIG_M5441x preprocessor condition, where the GPIO
register offset was incorrectly set to 8 instead of 0. This prevented
proper GPIO configuration for m5441x targets.

Fixes: bea8bcb12da0 ("m68knommu: Add support for the Coldfire m5441x.")
Signed-off-by: Jean-Michel Hautbois <jeanmichel.hautbois@yoseli.org>
Signed-off-by: Greg Ungerer <gerg@kernel.org>

2024-11-10  rust: fix size_t in bindgen prototypes of C builtins  (Gary Guo)

Without `-fno-builtin`, for functions like memcpy/memmove (and many
others), bindgen seems to be using the clang-provided prototype. This
prototype is ABI-wise compatible, but the issue is that it does not have
the same information as the source code w.r.t. typedefs.

For example, bindgen generates the following:

    extern "C" {
        pub fn strlen(s: *const core::ffi::c_char) -> core::ffi::c_ulong;
    }

note that the return type is `c_ulong` (i.e. unsigned long), despite the
size_t-is-usize behavior (this is default, and we have not opted out
from it using --no-size_t-is-usize).

Similarly, memchr's size argument should be of type `__kernel_size_t`,
but bindgen generates `c_ulong` directly. We want to ensure any `size_t`
is translated to Rust `usize` so that we can avoid having them be
different types on 32-bit and 64-bit architectures, which would
otherwise require a lot of excessive type casts when calling FFI
functions.

I found that this bindgen behavior (which probably is caused by
libclang) can be disabled by `-fno-builtin`. Using the flag for compiled
code can result in less optimisation because the compiler can no longer
make assumptions about these functions' properties, but this should not
affect bindgen.

[ Trevor asked: "I wonder how reliable this behavior is. Maybe bindgen
  could do a better job controlling this, is there an open issue?".

  Gary replied: ..."apparently this is indeed the suggested approach in
  https://github.com/rust-lang/rust-bindgen/issues/1770". - Miguel ]

Signed-off-by: Gary Guo <gary@garyguo.net>
Link: https://lore.kernel.org/r/20240913213041.395655-2-gary@garyguo.net
[ Formatted comment. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

2024-11-10  hwmon: (pmbus) add documentation for existing flags  (Jerome Brunet)

PMBUS_NO_WRITE_PROTECT and PMBUS_USE_COEFFICIENTS_CMD flags have been
added to pmbus, but the corresponding documentation was not updated.
Update the documentation before adding new flags.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Message-ID: <20241105-tps25990-v4-1-0e312ac70b62@baylibre.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  hwmon: (ina226) Add support for SY24655  (Wenliang Yan)

SY24655: Support for current and voltage detection as well as power
calculation.

Signed-off-by: Wenliang Yan <wenliang202407@163.com>
Message-ID: <20241106150547.2538-1-wenliang202407@163.com>
[groeck: Changed order of compatible entries; dropped spurious extra
 return statement in is_visible(); fixed code problems]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  dt-bindings: Add SY24655 to ina2xx devicetree bindings  (Wenliang Yan)

SY24655 is similar to INA226. Its supply voltage and pin definitions are
therefore the same. Compared to INA226, SY24655 has two additional
registers for configuring and calculating average power.

Signed-off-by: Wenliang Yan <wenliang202407@163.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Message-ID: <20241106150547.2538-2-wenliang202407@163.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  hwmon: (pmbus/ltc2978) add support for ltc7841  (Mariel Tinaco)

Add support for LTC7841. The LTC7841 is a high performance PolyPhase®
single output synchronous boost converter controller. Multiphase
operation reduces input and output capacitor requirements and allows the
use of smaller inductors than the single-phase equivalent.

The relevant registers in the LTC7841 are similar to the LTC7880, only
reduced by some amount, so it's just a matter of adding the chip id. The
device also doesn't support polling, on top of the reduced register set,
so a separate case for setting the chip info is added.

Signed-off-by: Mariel Tinaco <Mariel.Tinaco@analog.com>
Message-ID: <20241029013734.293024-4-Mariel.Tinaco@analog.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  hwmon: (pmbus/ltc7841) add support for LTC7841 - docs  (Mariel Tinaco)

Add LTC7841 to the compatible devices of LTC2978.

Signed-off-by: Mariel Tinaco <Mariel.Tinaco@analog.com>
Message-ID: <20241029013734.293024-3-Mariel.Tinaco@analog.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  dt-bindings: hwmon: ltc2978: add support for ltc7841  (Mariel Tinaco)

Add LTC7841 to the supported devices of LTC2978. It has a similar set of
registers to the LTC7880, differing only in the number of output
channels and some unimplemented PMBUS status and functionalities.

Signed-off-by: Mariel Tinaco <Mariel.Tinaco@analog.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Message-ID: <20241029013734.293024-2-Mariel.Tinaco@analog.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  hwmon: Add driver for I2C chip Nuvoton NCT7363Y  (Ban Feng)

The NCT7363Y is a fan controller which provides up to 16 independent FAN
input monitors and can report the count value of each FAN input. The
NCT7363Y also provides up to 16 independent PWM outputs; each can output
a specific PWM signal in manual mode to control an external fan's duty
cycle.

Signed-off-by: Ban Feng <kcfeng0@nuvoton.com>
Message-ID: <20241022052905.4062682-3-kcfeng0@nuvoton.com>
[groeck: Dropped unnecessary variable initialization, and , after { }]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  dt-bindings: hwmon: Add NCT7363Y documentation  (Ban Feng)

Add bindings for the Nuvoton NCT7363Y Fan Controller.

Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Ban Feng <kcfeng0@nuvoton.com>
Message-ID: <20241022052905.4062682-2-kcfeng0@nuvoton.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  dt-bindings: hwmon: pmbus: Add bindings for Vicor pli1209bc  (Naresh Solanki)

Remove vicor,pli1209bc from trivial-devices as it requires additional
properties and does not fit into the trivial devices category.

Add new bindings for the Vicor pli1209bc, a Digital Supervisor with
Isolation for use with BCM Bus Converter Modules. VR rails are defined
under the regulator node, as expected by the pmbus driver.

Signed-off-by: Naresh Solanki <naresh.solanki@9elements.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-ID: <20241021123044.3648960-1-naresh.solanki@9elements.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  dt-bindings: hwmon: pmbus: Add bindings for MPS MP297x  (Naresh Solanki)

Remove mps297x from trivial-devices as it requires additional properties
and does not fit into the trivial devices category.

Add new bindings for the MPS mp2971, mp2973 & mp2975, dual-loop, digital
multi-phase controllers with a PMBus interface.

Signed-off-by: Naresh Solanki <naresh.solanki@9elements.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Message-ID: <20241022103750.572677-1-naresh.solanki@9elements.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10  MAINTAINERS: Remove Aleksandr Mezin as NZXT-SMART2 driver maintainer  (Guenter Roeck)

Per his request, remove Aleksandr Mezin as maintainer of the NZXT-SMART2
hardware monitoring driver.

Cc: Aleksandr Mezin <mezin.alexander@gmail.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

2024-11-10hwmon: (nct6775) Add 665-ACE/600M-CL to ASUS WMI monitoring listSarah Maedel
Boards such as * Pro WS 665-ACE * Pro WS 600M-CL have an nct6775 chip, but by default it is not used because of a resource conflict with the WMI method. Add the affected boards to the WMI monitoring list. Link: https://bugzilla.kernel.org/show_bug.cgi?id=204807 Co-developed-by: Tommy Giesler <tommy.giesler@hetzner.com> Signed-off-by: Tommy Giesler <tommy.giesler@hetzner.com> Signed-off-by: Sarah Maedel <sarah.maedel@hetzner-cloud.de> Message-ID: <20241018074611.358619-1-sarah.maedel@hetzner-cloud.de> [groeck: Change commit message to imperative mood] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: (isl28022) new driver for ISL28022 power monitorYikai Tsai
Driver for the Renesas ISL28022 power monitor with an I2C interface. The device monitors voltage and current via a shunt resistor, and reports the calculated power. Signed-off-by: Carsten Spieß <mail@carsten-spiess.de> Signed-off-by: Yikai Tsai <yikai.tsai.wiwynn@gmail.com> Message-ID: <20241002081133.13123-3-yikai.tsai.wiwynn@gmail.com> [groeck: Fixed alignment issues, dropped noise at end of probe] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
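For reference, the measurement arithmetic behind such a monitor reduces to Ohm's law. A minimal sketch with illustrative values, not the driver's actual register handling:

    /* Illustrative shunt math only, not isl28022 driver code. */
    long shunt_uV = 2500;        /* measured shunt voltage, microvolts */
    long rshunt_uohm = 10000;    /* shunt resistance, microohms (10 mOhm) */
    long bus_mV = 12000;         /* measured bus voltage, millivolts */

    long current_mA = shunt_uV * 1000 / rshunt_uohm;  /* I = V/R: 250 mA */
    long power_mW = bus_mV * current_mA / 1000;       /* P = V*I: 3000 mW */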
2024-11-10dt-bindings: hwmon: add renesas,isl28022Yikai Tsai
Add dt-bindings for Renesas ISL28022 power monitor. Signed-off-by: Carsten Spieß <mail@carsten-spiess.de> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Yikai Tsai <yikai.tsai.wiwynn@gmail.com> Message-ID: <20241002081133.13123-2-yikai.tsai.wiwynn@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: Switch back to struct platform_driver::remove()Uwe Kleine-König
After commit 0edb555a65d1 ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/hwmon to use .remove(), with the eventual goal of dropping struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, the conversion is done by just changing the structure member name in the driver initializer. While touching these files, make the indentation of the struct initializers consistent in several files. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Message-ID: <20241017155900.137357-2-u.kleine-koenig@baylibre.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
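As the commit message notes, the conversion is a one-line change per driver; schematically, with placeholder names:

    /* Placeholder driver: only the initializer member name changes. */
    static struct platform_driver foo_driver = {
            .probe  = foo_probe,
            .remove = foo_remove,   /* was: .remove_new = foo_remove */
            .driver = {
                    .name = "foo-hwmon",
            },
    };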
2024-11-10hwmon: (sht4x): add heater supportAntoni Pokusinski
Add support for manipulating the internal heater of sht4x devices. Enabling the heater removes condensed water from the sensor surface; such condensation disturbs the relative humidity measurements. The heater can operate at three heating levels (20, 110 or 200 milliwatts), and one of two heating durations may be selected (0.1 or 1 s). Once the heating time elapses, the heater is automatically switched off. Changes since v3: * struct sht4x_data: add heating_complete timestamp * struct sht4x_data: add data_pending flag * heater_enable_store: return -EINVAL if input != 1 * heater_enable_store: check for data->heating_complete and update it * heater_enable_store: set data_pending flag after heating request * sht4x_read_values: msleep if heating in progress * sht4x_read_values: don't send measurement request if data_pending * heater_enable attr: make it RW * Documentation: update info about heater_enable attr Changes since v2: * heater_enable_store: remove unnecessary if * Documentation: remove incorrect info about turning off the heater * be more specific in the patch description Changes since v1: * explain the use case of the new attributes set * heater_enable attr: make it write-only * heater_enable_store: define cmd as u8 instead of u8* * heater_enable_store: remove unreachable data path * heater_enable_store: remove unnecessary lock * heater_enable_store: call i2c_master_send only if status==true * define attributes as DEVICE_ATTR_* instead of SENSOR_DEVICE_ATTR_* Signed-off-by: Antoni Pokusinski <apokusinski01@gmail.com> Message-ID: <20240930205346.2147-1-apokusinski01@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
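The three power levels and two durations map naturally onto a small command table. A hedged sketch follows; the command codes are assumptions based on the Sensirion SHT4x datasheet, not necessarily the driver's internal representation:

    /* Hypothetical heater command table; codes assumed from the SHT4x
     * datasheet, and the struct layout is illustrative only.
     */
    struct sht4x_heater_cmd {
            u16 power_mw;      /* 20, 110 or 200 */
            u16 duration_ms;   /* 100 or 1000 */
            u8  cmd;
    };

    static const struct sht4x_heater_cmd heater_cmds[] = {
            { 200, 1000, 0x39 }, { 200, 100, 0x32 },
            { 110, 1000, 0x2f }, { 110, 100, 0x24 },
            {  20, 1000, 0x1e }, {  20, 100, 0x15 },
    };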
2024-11-10hwmon: (nzxt-kraken2) Simplify specifying static visibility attributeHeiner Kallweit
Use new member visible of struct hwmon_ops to simplify specifying the static attribute visibility. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Message-ID: <1ac2be2d-df4f-455a-900d-821fc7bd12c4@gmail.com> Acked-by: Jonas Malaco <jonas@protocubo.io> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: (intel-m10-bmc) Simplify specifying static visibility attributeHeiner Kallweit
Use new member visible of struct hwmon_ops to simplify specifying the static attribute visibility. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Message-ID: <8ef99170-b37d-4c9a-b3bf-59f4ea76cf29@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: (raspberrypi) Simplify specifying static visibility attributeHeiner Kallweit
Use new member visible of struct hwmon_ops to simplify specifying the static attribute visibility. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Message-ID: <4e8893a1-b080-4676-97b9-a48ac9ead28a@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: (powerz) Simplify specifying static visibility attributeHeiner Kallweit
Use new member visible of struct hwmon_ops to simplify specifying the static attribute visibility. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Acked-by: Thomas Weißschuh <linux@weissschuh.net> Message-ID: <c4b4568b-59f6-43ac-8281-536a82ecd6ab@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: (gsc) Simplify specifying static visibility attributeHeiner Kallweit
Use new member visible of struct hwmon_ops to simplify specifying the static attribute visibility. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Message-ID: <bd909a2c-23ee-437d-9bd4-858119f6f266@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: (sl28cpld) Simplify specifying static visibility attributeHeiner Kallweit
Use new member visible of struct hwmon_ops to simplify specifying the static attribute visibility. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Message-ID: <5c26d8cf-d6dc-46c5-be7c-fd8207b3f177@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: (surface_fan) Simplify specifying static visibility attributeHeiner Kallweit
Use new member visible of struct hwmon_ops to simplify specifying the static attribute visibility. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Message-ID: <d5d2570c-dfd9-4be5-ad9f-e721be477131@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: (i5500_temp) Simplify specifying static visibility attributeHeiner Kallweit
Use new member visible of struct hwmon_ops to simplify specifying the static attribute visibility. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Message-ID: <2b1f2778-1127-4979-b02d-f75e16497ad7@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2024-11-10hwmon: (max6639) : Configure based on DT propertyNaresh Solanki
Remove platform data, initialize with a default configuration, and overwrite it based on DT properties. Signed-off-by: Naresh Solanki <naresh.solanki@9elements.com> Message-ID: <20241007090426.811736-1-naresh.solanki@9elements.com> [groeck: Dropped some unnecessary empty lines] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
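The usual firmware-property pattern applies: set defaults first, then let DT override them. A minimal sketch; "maxim,fan-ppr" is a hypothetical property name, not necessarily the binding's:

    /* Sketch: device_property_read_u32() leaves "ppr" untouched when the
     * (hypothetical) property is absent, so the default survives.
     */
    u32 ppr = 2;    /* default pulses per revolution */
    device_property_read_u32(dev, "maxim,fan-ppr", &ppr);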
2024-11-10hwmon: Add static visibility member to struct hwmon_opsHeiner Kallweit
Several drivers return the same static value in their is_visible callback, which results in code duplication. Therefore add an option for drivers to specify a static visibility directly. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Message-ID: <89690b81-2c73-47ae-9ae9-45c77b45ca0c@gmail.com> [groeck: Renamed hwmon_ops_is_visible -> hwmon_is_visible] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
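With the new member, a driver whose attributes all share one mode can drop its is_visible callback. An illustrative use, with a placeholder driver name:

    /* Sketch: constant visibility via .visible instead of an is_visible
     * callback that returns the same mode for every attribute.
     */
    static const struct hwmon_ops foo_hwmon_ops = {
            .visible = 0444,    /* all attributes read-only */
            .read    = foo_read,
    };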
2024-11-10hwmon: (max31827) Fix spelling errors reported by codespellEverest K.C.
The following spelling errors reported by codespell were fixed: respresents ==> represents signifcant ==> significant bandwitdh ==> bandwidth Signed-off-by: Everest K.C. <everestkc@everestkc.com.np> Message-ID: <20241001011521.80982-1-everestkc@everestkc.com.np> Signed-off-by: Guenter Roeck <linux@roeck-us.net>