summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-07-20Merge tag 'staging-6.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are some small driver fixes for the vchiq_arm staging driver: - reverts of previous changes that turned out to caused problems. - change to prevent a resource leak All of these have been in linux-next this week with no reported problems" * tag 'staging-6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: vchiq_arm: Make vchiq_shutdown never fail Revert "staging: vchiq_arm: Create keep-alive thread during probe" Revert "staging: vchiq_arm: Improve initial VCHIQ connect"
2025-07-20Merge tag 'char-misc-6.16-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc / IIO fixes from Greg KH: "Here are some char/misc/iio and other driver fixes for 6.16-rc7. Included in here are: - IIO driver fixes for reported problems - Interconnect driver fixes for reported problems - nvmem driver fixes - bunch of comedi driver fixes for long-term bugs - Kconfig dependancy fixes for mux drivers - other small driver fixes for reported problems. All of these have been in linux-next for a while with no reported problems" * tag 'char-misc-6.16-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (35 commits) nvmem: layouts: u-boot-env: remove crc32 endianness conversion misc: amd-sbi: Explicitly clear in/out arg "mb_in_out" misc: amd-sbi: Address copy_to/from_user() warning reported in smatch misc: amd-sbi: Address potential integer overflow issue reported in smatch comedi: comedi_test: Fix possible deletion of uninitialized timers comedi: Fix initialization of data for instructions that write to subdevice comedi: Fix use of uninitialized data in insn_rw_emulate_bits() comedi: das6402: Fix bit shift out of bounds comedi: aio_iiro_16: Fix bit shift out of bounds comedi: pcl812: Fix bit shift out of bounds comedi: das16m1: Fix bit shift out of bounds comedi: Fix some signed shift left operations comedi: Fail COMEDI_INSNLIST ioctl if n_insns is too large nvmem: imx-ocotp: fix MAC address byte length MAINTAINERS: add miscdevice Rust abstractions mux: mmio: Fix missing CONFIG_REGMAP_MMIO iio: dac: ad3530r: Fix incorrect masking for channels 4-7 in powerdown mode iio: adc: ad7380: fix adi,gain-milli property parsing iio: adc: ad7949: use spi_is_bpw_supported() iio: accel: fxls8962af: Fix use after free in fxls8962af_fifo_flush ...
2025-07-20Merge tag 'spi-fix-v6.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fix from Mark Brown: "A fix adding missing validation that 8 bit I/O mode is actually supported for the specific device when attempting to use it. Anything that runs into this should already have been having problems, enforcing this should just make things safer and more obvious" * tag 'spi-fix-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: Add check for 8-bit transfer with 8 IO mode support
2025-07-20Merge tag 'regmap-fix-v6.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap Pull regmap fix from Mark Brown: "A fix for a memory leak when we get an error during regmap init for a bus that uses free_on_exit to clean up device specific data" * tag 'regmap-fix-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: regmap: fix potential memory leak of regmap_bus
2025-07-20Merge tag 'input-for-v6.16-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fix from Dmitry Torokhov: - just a small fixup to the xpad driver correcting the recent addition of the Acer NGR200 controller * tag 'input-for-v6.16-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: xpad - set correct controller type for Acer NGR200
2025-07-20apparmor: fix: accept2 being specifie even when permission table is presntJohn Johansen
The transition to the perms32 permission table dropped the need for the accept2 table as permissions. However accept2 can be used for flags and may be present even when the perms32 table is present. So instead of checking on version, check whether the table is present. Fixes: 2e12c5f06017 ("apparmor: add additional flags to extended permission.") Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20apparmor: transition from a list of rules to a vector of rulesJohn Johansen
The set of rules on a profile is not dynamically extended, instead if a new ruleset is needed a new version of the profile is created. This allows us to use a vector of rules instead of a list, slightly reducing memory usage and simplifying the code. Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20apparmor: fix documentation mismatches in val_mask_to_str and socket functionsPeng Jiang
This patch fixes kernel-doc warnings: 1. val_mask_to_str: - Added missing descriptions for `size` and `table` parameters. - Removed outdated str_size and chrs references. 2. Socket Functions: - Makes non-null requirements clear for socket/address args. - Standardizes return values per kernel conventions. - Adds Unix domain socket protocol details. These changes silence doc validation warnings and improve accuracy for AppArmor LSM docs. Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn> Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20apparmor: remove redundant perms.allow MAY_EXEC bitflag setRyan Lee
This section of profile_transition that occurs after x_to_label only happens if perms.allow already has the MAY_EXEC bit set, so we don't need to set it again. Fixes: 16916b17b4f8 ("apparmor: force auditing of conflicting attachment execs from confined") Signed-off-by: Ryan Lee <ryan.lee@canonical.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20apparmor: fix kernel doc warnings for kernel test robotJohn Johansen
Fix kernel doc warnings for the functions - apparmor_socket_bind - apparmor_unix_may_send - apparmor_unix_stream_connect - val_mask_to_str Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506070127.B1bc3da4-lkp@intel.com/ Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20apparmor: Fix unaligned memory accesses in KUnit testHelge Deller
The testcase triggers some unnecessary unaligned memory accesses on the parisc architecture: Kernel: unaligned access to 0x12f28e27 in policy_unpack_test_init+0x180/0x374 (iir 0x0cdc1280) Kernel: unaligned access to 0x12f28e67 in policy_unpack_test_init+0x270/0x374 (iir 0x64dc00ce) Use the existing helper functions put_unaligned_le32() and put_unaligned_le16() to avoid such warnings on architectures which prefer aligned memory accesses. Signed-off-by: Helge Deller <deller@gmx.de> Fixes: 98c0cc48e27e ("apparmor: fix policy_unpack_test on big endian systems") Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20apparmor: Fix 8-byte alignment for initial dfa blob streamsHelge Deller
The dfa blob stream for the aa_dfa_unpack() function is expected to be aligned on a 8 byte boundary. The static nulldfa_src[] and stacksplitdfa_src[] arrays store the initial apparmor dfa blob streams, but since they are declared as an array-of-chars the compiler and linker will only ensure a "char" (1-byte) alignment. Add an __aligned(8) annotation to the arrays to tell the linker to always align them on a 8-byte boundary. This avoids runtime warnings at startup on alignment-sensitive platforms like parisc such as: Kernel: unaligned access to 0x7f2a584a in aa_dfa_unpack+0x124/0x788 (iir 0xca0109f) Kernel: unaligned access to 0x7f2a584e in aa_dfa_unpack+0x210/0x788 (iir 0xca8109c) Kernel: unaligned access to 0x7f2a586a in aa_dfa_unpack+0x278/0x788 (iir 0xcb01090) Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org Fixes: 98b824ff8984 ("apparmor: refcount the pdb") Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20apparmor: shift uid when mediating af_unix in usernsGabriel Totev
Avoid unshifted ouids for socket file operations as observed when using AppArmor profiles in unprivileged containers with LXD or Incus. For example, root inside container and uid 1000000 outside, with `owner /root/sock rw,` profile entry for nc: /root$ nc -lkU sock & nc -U sock ==> dmesg apparmor="DENIED" operation="connect" class="file" namespace="root//lxd-podia_<var-snap-lxd-common-lxd>" profile="sockit" name="/root/sock" pid=3924 comm="nc" requested_mask="wr" denied_mask="wr" fsuid=1000000 ouid=0 [<== should be 1000000] Fix by performing uid mapping as per common_perm_cond() in lsm.c Signed-off-by: Gabriel Totev <gabriel.totev@zetier.com> Fixes: c05e705812d1 ("apparmor: add fine grained af_unix mediation") Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20apparmor: shift ouid when mediating hard links in usernsGabriel Totev
When using AppArmor profiles inside an unprivileged container, the link operation observes an unshifted ouid. (tested with LXD and Incus) For example, root inside container and uid 1000000 outside, with `owner /root/link l,` profile entry for ln: /root$ touch chain && ln chain link ==> dmesg apparmor="DENIED" operation="link" class="file" namespace="root//lxd-feet_<var-snap-lxd-common-lxd>" profile="linkit" name="/root/link" pid=1655 comm="ln" requested_mask="l" denied_mask="l" fsuid=1000000 ouid=0 [<== should be 1000000] target="/root/chain" Fix by mapping inode uid of old_dentry in aa_path_link() rather than using it directly, similarly to how it's mapped in __file_path_perm() later in the file. Signed-off-by: Gabriel Totev <gabriel.totev@zetier.com> Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20apparmor: make sure unix socket labeling is correctly updated.John Johansen
When a unix socket is passed into a different confinement domain make sure its cached mediation labeling is updated to correctly reflect which domains are using the socket. Fixes: c05e705812d1 ("apparmor: add fine grained af_unix mediation") Signed-off-by: John Johansen <john.johansen@canonical.com>
2025-07-20net/mlx5: Expose cable_length field in PFCC registerOren Sidi
Introduce new "cable_length" field in PFCC register and related fields to enhance rx buffer configuration management: 1. cable_length: Shifts cable length handling to fw by storing a manually entered length from user in PFCC.cable_length 2. lane_rate_oper: In a case where PFCC.cable_length is not supported, helps compute a default cable length Signed-off-by: Oren Sidi <osidi@nvidia.com> Reviewed-by: Alex Lazar <alazar@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752734895-257735-4-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-20net/mlx5: Add IFC bits and enums for buf_ownershipOren Sidi
Extend structure layouts and defines buf_ownership. buf_ownership indicates whether the buffer is managed by SW or FW. Signed-off-by: Oren Sidi <osidi@nvidia.com> Reviewed-by: Alex Lazar <alazar@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752734895-257735-3-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-20net/mlx5: Add IFC bits to support RSS for IPSec offloadJianbo Liu
This adds the capabilities, ipsec_next_header and inner/outer l4_type_ext fields to support RSS for the decrypted packets. These fields are specifically for firmware steering. HWS validation logic is updated to correctly handle the changes, ensuring the unsupported fields are not set. Besides, reserved_at_c4 is fixed to reserved_at_d4 to reflect the accurate offset within the structure. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1752734895-257735-2-git-send-email-tariqt@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-07-19seq_buf: Introduce KUnit testsKees Cook
Add KUnit tests for the seq_buf API to ensure its correctness and prevent future regressions, covering the following functions: - seq_buf_init() - DECLARE_SEQ_BUF() - seq_buf_clear() - seq_buf_puts() - seq_buf_putc() - seq_buf_printf() - seq_buf_get_buf() - seq_buf_commit() $ tools/testing/kunit/kunit.py run seq_buf =================== seq_buf (9 subtests) =================== [PASSED] seq_buf_init_test [PASSED] seq_buf_declare_test [PASSED] seq_buf_clear_test [PASSED] seq_buf_puts_test [PASSED] seq_buf_puts_overflow_test [PASSED] seq_buf_putc_test [PASSED] seq_buf_printf_test [PASSED] seq_buf_printf_overflow_test [PASSED] seq_buf_get_buf_commit_test ===================== [PASSED] seq_buf ===================== Link: https://lore.kernel.org/r/20250717085156.work.363-kees@kernel.org Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Kees Cook <kees@kernel.org>
2025-07-19Input: xpad - set correct controller type for Acer NGR200Nilton Perim Neto
The controller should have been set as XTYPE_XBOX360 and not XTYPE_XBOX. Also the entry is in the wrong place. Fix it. Reported-by: Vicki Pfau <vi@endrift.com> Signed-off-by: Nilton Perim Neto <niltonperimneto@gmail.com> Link: https://lore.kernel.org/r/20250708033126.26216-2-niltonperimneto@gmail.com Fixes: 22c69d786ef8 ("Input: xpad - support Acer NGR 200 Controller") Cc: stable@vger.kernel.org Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
2025-07-19dt-bindings: clock: qcom,sm4450-dispcc: Reference qcom,gcc.yamlSatya Priya Kakitapalli
Reference the common qcom,gcc.yaml schema to unify the common parts of the binding. Suggested-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250717-gcc-ref-fixes-v2-4-a2a571d2be28@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-19dt-bindings: clock: qcom,sm4450-camcc: Reference qcom,gcc.yamlSatya Priya Kakitapalli
Reference the common qcom,gcc.yaml schema to unify the common parts of the binding. Suggested-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250717-gcc-ref-fixes-v2-3-a2a571d2be28@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-19dt-bindings: clock: qcom,mmcc: Reference qcom,gcc.yamlSatya Priya Kakitapalli
Reference the common qcom,gcc.yaml schema to unify the common parts of the binding. Suggested-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250717-gcc-ref-fixes-v2-2-a2a571d2be28@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-19dt-bindings: clock: qcom,sm8150-camcc: Reference qcom,gcc.yamlSatya Priya Kakitapalli
Reference the common qcom,gcc.yaml schema to unify the common parts of the binding. Suggested-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250717-gcc-ref-fixes-v2-1-a2a571d2be28@quicinc.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-19dt-bindings: clock: qcom: Remove double colon from descriptionLuca Weiss
No double colon is necessary in the description. Fix it for all bindings so future bindings won't have the same copy-paste mistake. Reported-by: Rob Herring <robh@kernel.org> Closes: https://lore.kernel.org/lkml/20250625150458.GA1182597-robh@kernel.org/ Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20250717-bindings-double-colon-v1-1-c04abc180fcd@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2025-07-19kasan: use vmalloc_dump_obj() for vmalloc error reportsMarco Elver
Since 6ee9b3d84775 ("kasan: remove kasan_find_vm_area() to prevent possible deadlock"), more detailed info about the vmalloc mapping and the origin was dropped due to potential deadlocks. While fixing the deadlock is necessary, that patch was too quick in killing an otherwise useful feature, and did no due-diligence in understanding if an alternative option is available. Restore printing more helpful vmalloc allocation info in KASAN reports with the help of vmalloc_dump_obj(). Example report: | BUG: KASAN: vmalloc-out-of-bounds in vmalloc_oob+0x4c9/0x610 | Read of size 1 at addr ffffc900002fd7f3 by task kunit_try_catch/493 | | CPU: [...] | Call Trace: | <TASK> | dump_stack_lvl+0xa8/0xf0 | print_report+0x17e/0x810 | kasan_report+0x155/0x190 | vmalloc_oob+0x4c9/0x610 | [...] | | The buggy address belongs to a 1-page vmalloc region starting at 0xffffc900002fd000 allocated at vmalloc_oob+0x36/0x610 | The buggy address belongs to the physical page: | page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x126364 | flags: 0x200000000000000(node=0|zone=2) | raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000 | raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 | page dumped because: kasan: bad access detected | | [..] Link: https://lkml.kernel.org/r/20250716152448.3877201-1-elver@google.com Fixes: 6ee9b3d84775 ("kasan: remove kasan_find_vm_area() to prevent possible deadlock") Signed-off-by: Marco Elver <elver@google.com> Suggested-by: Uladzislau Rezki <urezki@gmail.com> Acked-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Yunseong Kim <ysk@kzalloc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/ksm: fix -Wsometimes-uninitialized from clang-21 in advisor_mode_show()Nathan Chancellor
After a recent change in clang to expose uninitialized warnings from const variables [1], there is a false positive warning from the if statement in advisor_mode_show(). mm/ksm.c:3687:11: error: variable 'output' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] 3687 | else if (ksm_advisor == KSM_ADVISOR_SCAN_TIME) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ mm/ksm.c:3690:33: note: uninitialized use occurs here 3690 | return sysfs_emit(buf, "%s\n", output); | ^~~~~~ Rewrite the if statement to implicitly make KSM_ADVISOR_NONE the else branch so that it is obvious to the compiler that ksm_advisor can only be KSM_ADVISOR_NONE or KSM_ADVISOR_SCAN_TIME due to the assignments in advisor_mode_store(). Link: https://lkml.kernel.org/r/20250715-ksm-fix-clang-21-uninit-warning-v1-1-f443feb4bfc4@kernel.org Fixes: 66790e9a735b ("mm/ksm: add sysfs knobs for advisor") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/2100 Link: https://github.com/llvm/llvm-project/commit/2464313eef01c5b1edf0eccf57a32cdee01472c7 [1] Acked-by: David Hildenbrand <david@redhat.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Stefan Roesch <shr@devkernel.io> Cc: xu xin <xu.xin16@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm: update MAINTAINERS entry for HMMJason Gunthorpe
Jérôme has moved on from RH and has not been looking at HMM patches for some time. I've made the most changes to the core code in the recent period and Leon is now working on the HMM side from the RDMA ODP. Link: https://lkml.kernel.org/r/0-v1-a1df5219c7a3+1d981-hmm_maintainers_jgg@nvidia.com Closes: https://lore.kernel.org/all/39d43309-9f34-48bc-a9ad-108c607ba175@samsung.com/ Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Leon Romanovsky <leon@kernel.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19nilfs2: reject invalid file types when reading inodesRyusuke Konishi
To prevent inodes with invalid file types from tripping through the vfs and causing malfunctions or assertion failures, add a missing sanity check when reading an inode from a block device. If the file type is not valid, treat it as a filesystem error. Link: https://lkml.kernel.org/r/20250710134952.29862-1-konishi.ryusuke@gmail.com Fixes: 05fe58fdc10d ("nilfs2: inode operations") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+895c23f6917da440ed0d@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=895c23f6917da440ed0d Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19selftests/mm: fix split_huge_page_test for folio_split() testsZi Yan
PID_FMT does not have an offset field, so folio_split() tests are not performed. Add PID_FMT_OFFSET with an offset field and use it to perform folio_split() tests. Link: https://lkml.kernel.org/r/20250709012800.3225727-1-ziy@nvidia.com Fixes: 80a5c494c89f ("selftests/mm: add tests for folio_split(), buddy allocator like split") Signed-off-by: Zi Yan <ziy@nvidia.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Tested-by : Donet Tom <donettom@linux.ibm.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mariano Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shuah Khan <shuah@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mailmap: add entry for SenozhatskySergey Senozhatsky
Consolidate and map all addresses to a single one. Link: https://lkml.kernel.org/r/20250707075243.858895-1-senozhatsky@chromium.org Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/zsmalloc: do not pass __GFP_MOVABLE if CONFIG_COMPACTION=nHarry Yoo
Commit 48b4800a1c6a ("zsmalloc: page migration support") added support for migrating zsmalloc pages using the movable_operations migration framework. However, the commit did not take into account that zsmalloc supports migration only when CONFIG_COMPACTION is enabled. Tracing shows that zsmalloc was still passing the __GFP_MOVABLE flag even when compaction is not supported. This can result in unmovable pages being allocated from movable page blocks (even without stealing page blocks), ZONE_MOVABLE and CMA area. Possible user visible effects: - Some ZONE_MOVABLE memory can be not actually movable - CMA allocation can fail because of this - Increased memory fragmentation due to ignoring the page mobility grouping feature I'm not really sure who uses kernels without compaction support, though :( To fix this, clear the __GFP_MOVABLE flag when !IS_ENABLED(CONFIG_COMPACTION). Link: https://lkml.kernel.org/r/20250704103053.6913-1-harry.yoo@oracle.com Fixes: 48b4800a1c6a ("zsmalloc: page migration support") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19mm/vmscan: fix hwpoisoned large folio handling in shrink_folio_listJinjiang Tu
In shrink_folio_list(), the hwpoisoned folio may be large folio, which can't be handled by unmap_poisoned_folio(). For THP, try_to_unmap_one() must be passed with TTU_SPLIT_HUGE_PMD to split huge PMD first and then retry. Without TTU_SPLIT_HUGE_PMD, we will trigger null-ptr deref of pvmw.pte. Even we passed TTU_SPLIT_HUGE_PMD, we will trigger a WARN_ON_ONCE due to the page isn't in swapcache. Since UCE is rare in real world, and race with reclaimation is more rare, just skipping the hwpoisoned large folio is enough. memory_failure() will handle it if the UCE is triggered again. This happens when memory reclaim for large folio races with memory_failure(), and will lead to kernel panic. The race is as follows: cpu0 cpu1 shrink_folio_list memory_failure TestSetPageHWPoison unmap_poisoned_folio --> trigger BUG_ON due to unmap_poisoned_folio couldn't handle large folio [tujinjiang@huawei.com: add comment to unmap_poisoned_folio()] Link: https://lkml.kernel.org/r/69fd4e00-1b13-d5f7-1c82-705c7d977ea4@huawei.com Link: https://lkml.kernel.org/r/20250627125747.3094074-2-tujinjiang@huawei.com Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com> Fixes: 1b0449544c64 ("mm/vmscan: don't try to reclaim hwpoison folio") Reported-by: syzbot+3b220254df55d8ca8a61@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/68412d57.050a0220.2461cf.000e.GAE@google.com/ Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19lib/raid6: update recov_rvv.c zero page usageHerbert Xu
Update lib/raid6/recov_rvv.c, for 1857fcc84744 ("lib/raid6: replace custom zero page with ZERO_PAGE"), per Klara. Link: https://lkml.kernel.org/r/aFkUnXWtxcgOTVkw@gondor.apana.org.au Fixes: 1857fcc84744 ("lib/raid6: replace custom zero page with ZERO_PAGE") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Song Liu <song@kernel.org> Cc: Yu Kuai <yukuai3@huawei.com> Cc: Klara Modin <klarasmodin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19docs: update docs after introducing delaytopWang Yaxin
The "getdelays" can only display the latency of a single task by specifying a PID, we introduce the "delaytop" with the following capabilities: 1. system view: monitors latency metrics (CPU, I/O, memory, IRQ, etc.) for all system processes 2. supports field-based sorting (e.g., default sort by CPU latency in descending order) 3. dynamic interactive interface: focus on specific processes with --pid; limit displayed entries with --processes 20; control monitoring duration with --iterations; Link: https://lkml.kernel.org/r/2025071013514177028RdjISjqeIOnTCRvGAwy@zte.com.cn Signed-off-by: Fan Yu <fan.yu9@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Peilin He <he.peilin@zte.com.cn> Cc: Qiang Tu <tu.qiang35@zte.com.cn> Cc: wangyong <wang.yong12@zte.com.cn> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19delaytop: add psi info to show system delayWang Yaxin
Support showing whole delay of system by reading PSI, just like the first few lines of information output by the top command. the output of delaytop includes both system-wide delay and delay of individual tasks, providing a more comprehensive reflection of system latency status. Use case ======== bash# ./delaytop System Pressure Information: (avg10/avg60/avg300/total) CPU: full: 0.0%/ 0.0%/ 0.0%/0 some: 0.1%/ 0.0%/ 0.0%/14216596 Memory: full: 0.0%/ 0.0%/ 0.0%/34010659 some: 0.0%/ 0.0%/ 0.0%/35406492 IO: full: 0.1%/ 0.0%/ 0.0%/51029453 some: 0.1%/ 0.0%/ 0.0%/55330465 IRQ: full: 0.0%/ 0.0%/ 0.0%/0 Top 20 processes (sorted by CPU delay): PID TGID COMMAND CPU(ms) IO(ms) SWAP(ms) RCL(ms) THR(ms) CMP(ms) WP(ms) IRQ(ms) --------------------------------------------------------------------------------------------- 32 32 kworker/2:0H-sy 23.65 0.00 0.00 0.00 0.00 0.00 0.00 0.00 497 497 kworker/R-scsi_ 1.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 495 495 kworker/R-scsi_ 1.13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 494 494 scsi_eh_0 1.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 485 485 kworker/R-ata_s 0.90 0.00 0.00 0.00 0.00 0.00 0.00 0.00 574 574 kworker/R-kdmfl 0.36 0.00 0.00 0.00 0.00 0.00 0.00 0.00 34 34 idle_inject/3 0.33 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1123 1123 nde-netfilter 0.28 0.00 0.00 0.00 0.00 0.00 0.00 0.00 60 60 ksoftirqd/7 0.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 114 114 kworker/0:2-cgr 0.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 496 496 scsi_eh_1 0.24 0.00 0.00 0.00 0.00 0.00 0.00 0.00 51 51 cpuhp/6 0.24 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1667 1667 atd 0.24 0.00 0.00 0.00 0.00 0.00 0.00 0.00 45 45 cpuhp/5 0.23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1102 1102 nde-backupservi 0.22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1098 1098 systemsettings 0.21 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1100 1100 audit-monitor 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 53 53 migration/6 0.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1482 1482 sshd 0.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 39 39 cpuhp/4 0.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 Link: https://lkml.kernel.org/r/20250710135451340_5pOgpIFi0M5AE7H44W1D@zte.com.cn Co-developed-by: Fan Yu <fan.yu9@zte.com.cn> Signed-off-by: Fan Yu <fan.yu9@zte.com.cn> Signed-off-by: Wang Yaxin <wang.yaxin@zte.com.cn> Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Peilin He <he.peilin@zte.com.cn> Cc: Qiang Tu <tu.qiang35@zte.com.cn> Cc: wangyong <wang.yong12@zte.com.cn> Cc: xu xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Yunkai Zhang <zhang.yunkai@zte.com.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19selftests/thermal: remove duplicate newlines in perror callsWangYuli
perror() automatically appends a newline character, so the explicit '\n' in the format strings is redundant and results in duplicate newlines in the output. Remove the redundant '\n' characters from perror() calls in workload_hint_test.c to fix the formatting. Link: https://lkml.kernel.org/r/F482FB1EC020000C+20250710134751.306096-1-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Cc: Guan Wentao <guanwentao@uniontech.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19selftests/thermal: remove duplicate sprintf() call in workload_hint_testWangYuli
Remove redundant sprintf() call that was duplicating the same operation of formatting delay_str with argv[1]. Link: https://lkml.kernel.org/r/6338CD0E839B770B+20250710130412.284531-1-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Cc: Guan Wentao <guanwentao@uniontech.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19riscv: optimize gcd() performance on RISC-V without Zbb extensionKuan-Wei Chiu
The binary GCD implementation uses FFS (find first set), which benefits from hardware support for the ctz instruction, provided by the Zbb extension on RISC-V. Without Zbb, this results in slower software-emulated behavior. Previously, RISC-V always used the binary GCD, regardless of actual hardware support. This patch improves runtime efficiency by disabling the efficient_ffs_key static branch when Zbb is either not enabled in the kernel (config) or not supported on the executing CPU. This selects the odd-even GCD implementation, which is faster in the absence of efficient FFS. This change ensures the most suitable GCD algorithm is chosen dynamically based on actual hardware capabilities. Link: https://lkml.kernel.org/r/20250606134758.1308400-4-visitorckw@gmail.com Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19riscv: optimize gcd() code size when CONFIG_RISCV_ISA_ZBB is disabledKuan-Wei Chiu
The binary GCD implementation depends on efficient ffs(), which on RISC-V requires hardware support for the Zbb extension. When CONFIG_RISCV_ISA_ZBB is not enabled, the kernel will never use binary GCD, as runtime logic will always fall back to the odd-even implementation. To avoid compiling unused code and reduce code size, select CONFIG_CPU_NO_EFFICIENT_FFS when CONFIG_RISCV_ISA_ZBB is not set. $ ./scripts/bloat-o-meter ./lib/math/gcd.o.old ./lib/math/gcd.o.new add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-274 (-274) Function old new delta gcd 360 86 -274 Total: Before=384, After=110, chg -71.35% Link: https://lkml.kernel.org/r/20250606134758.1308400-3-visitorckw@gmail.com Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Acked-by: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19lib/math/gcd: use static key to select implementation at runtimeKuan-Wei Chiu
Patch series "Optimize GCD performance on RISC-V by selecting implementation at runtime", v3. The current implementation of gcd() selects between the binary GCD and the odd-even GCD algorithm at compile time, depending on whether CONFIG_CPU_NO_EFFICIENT_FFS is set. On platforms like RISC-V, however, this compile-time decision can be misleading: even when the compiler emits ctz instructions based on the assumption that they are efficient (as is the case when CONFIG_RISCV_ISA_ZBB is enabled), the actual hardware may lack support for the Zbb extension. In such cases, ffs() falls back to a software implementation at runtime, making the binary GCD algorithm significantly slower than the odd-even variant. To address this, we introduce a static key to allow runtime selection between the binary and odd-even GCD implementations. On RISC-V, the kernel now checks for Zbb support during boot. If Zbb is unavailable, the static key is disabled so that gcd() consistently uses the more efficient odd-even algorithm in that scenario. Additionally, to further reduce code size, we select CONFIG_CPU_NO_EFFICIENT_FFS automatically when CONFIG_RISCV_ISA_ZBB is not enabled, avoiding compilation of the unused binary GCD implementation entirely on systems where it would never be executed. This series ensures that the most efficient GCD algorithm is used in practice and avoids compiling unnecessary code based on hardware capabilities and kernel configuration. This patch (of 3): On platforms like RISC-V, the compiler may generate hardware FFS instructions even if the underlying CPU does not actually support them. Currently, the GCD implementation is chosen at compile time based on CONFIG_CPU_NO_EFFICIENT_FFS, which can result in suboptimal behavior on such systems. Introduce a static key, efficient_ffs_key, to enable runtime selection between the binary GCD (using ffs) and the odd-even GCD implementation. This allows the kernel to default to the faster binary GCD when FFS is efficient, while retaining the ability to fall back when needed. Link: https://lkml.kernel.org/r/20250606134758.1308400-1-visitorckw@gmail.com Link: https://lkml.kernel.org/r/20250606134758.1308400-2-visitorckw@gmail.com Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com> Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19ocfs2: avoid potential ABBA deadlock by reordering tl_inode lockIvan Pravdin
In ocfs2_move_extent(), tl_inode is currently locked after the global bitmap inode. However, in ocfs2_flush_truncate_log(), the lock order is reversed: tl_inode is locked first, followed by the global bitmap inode. This creates a classic ABBA deadlock scenario if two threads attempt these operations concurrently and acquire the locks in different orders. To prevent this, move the tl_inode locking earlier in ocfs2_move_extent(), so that it always precedes the global bitmap inode lock. No functional changes beyond lock ordering. Link: https://lkml.kernel.org/r/20250708020640.387741-1-ipravdin.official@gmail.com Reported-by: syzbot+6bf948e47f9bac7aacfa@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67d5645c.050a0220.1dc86f.0004.GAE@google.com/ Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19squashfs: fix incorrect argument to sizeof in kmalloc_array callColin Ian King
The sizeof(void *) is the incorrect argument in the kmalloc_array call, it best to fix this by using sizeof(*cache_folios) instead. Fortunately the sizes of void* and folio* happen to be the same, so this has not shown up as a run time issue. [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/20250708142604.1891156-1-colin.i.king@gmail.com Fixes: 2e227ff5e272 ("squashfs: add optional full compressed block caching") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Chanho Min <chanho.min@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19squashfs: replace ;; with ; and end of fi declarationColin Ian King
There is an extraneous ; after a declaration, remove it. Link: https://lkml.kernel.org/r/20250708114900.1883130-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19ocfs2: avoid NULL pointer dereference in dx_dir_lookup_rec()Ivan Pravdin
When a directory entry is not found, ocfs2_dx_dir_lookup_rec() prints an error message that unconditionally dereferences the 'rec' pointer. However, if 'rec' is NULL, this leads to a NULL pointer dereference and a kernel panic. Add an explicit check empty extent list to avoid dereferencing NULL 'rec' pointer. Link: https://lkml.kernel.org/r/20250708001009.372263-1-ipravdin.official@gmail.com Reported-by: syzbot+20282c1b2184a857ac4c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67cd7e29.050a0220.e1a89.0007.GAE@google.com/ Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19ocfs2/dlm: fix "take a while" typoAhelenia Ziemiańska
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19init/main.c: add warning when file specified in rdinit is inaccessibleLillian Berry
Avoid silently ignoring the initramfs when the file specified in rdinit is not usable. This prints an error that clearly explains the issue (file was not found, vs initramfs was not found). Link: https://lkml.kernel.org/r/20250707091411.1412681-1-lillian@star-ark.net Signed-off-by: Lillian Berry <lillian@star-ark.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19samples: enhance hung_task detector test with read-write semaphore supportZi Li
Extend the hung_task detector test module to include read-write semaphore support alongside existing mutex and semaphore tests. This module now creates additional debugfs files under <debugfs>/hung_task, namely 'rw_semaphore_read' and 'rw_semaphore_write', in addition to 'mutex' and 'semaphore'. Reading these files with multiple processes triggers a prolonged sleep (256 seconds) while holding the respective lock, enabling hung_task detector testing for various locking mechanisms. This change builds on the extensible hung_task_tests module, adding read-write semaphore functionality to improve test coverage for kernel locking primitives. The implementation ensures proper lock handling and includes checks to prevent redundant data reads. Usage is: > cd /sys/kernel/debug/hung_task > cat mutex & cat mutex # Test mutex blocking > cat semaphore & cat semaphore # Test semaphore blocking > cat rw_semaphore_write \ & cat rw_semaphore_read # Test rwsem blocking > cat rw_semaphore_write \ & cat rw_semaphore_write # Test rwsem blocking Update the Kconfig description to reflect the addition of read-write semaphore debugfs files. Link: https://lkml.kernel.org/r/20250627072924.36567-4-lance.yang@linux.dev Signed-off-by: Zi Li <zi.li@linux.dev> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Mingzhe Yang <mingzhe.yang@ly.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19hung_task: extend hung task blocker tracking to rwsemsLance Yang
Inspired by mutex blocker tracking[1], and having already extended it to semaphores, let's now add support for reader-writer semaphores (rwsems). The approach is simple: when a task enters TASK_UNINTERRUPTIBLE while waiting for an rwsem, we just call hung_task_set_blocker(). The hung task detector can then query the rwsem's owner to identify the lock holder. Tracking works reliably for writers, as there can only be a single writer holding the lock, and its task struct is stored in the owner field. The main challenge lies with readers. The owner field points to only one of many concurrent readers, so we might lose track of the blocker if that specific reader unlocks, even while others remain. This is not a significant issue, however. In practice, long-lasting lock contention is almost always caused by a writer. Therefore, reliably tracking the writer is the primary goal of this patch series ;) With this change, the hung task detector can now show blocker task's info like below: [Fri Jun 27 15:21:34 2025] INFO: task cat:28631 blocked for more than 122 seconds. [Fri Jun 27 15:21:34 2025] Tainted: G S 6.16.0-rc3 #8 [Fri Jun 27 15:21:34 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Fri Jun 27 15:21:34 2025] task:cat state:D stack:0 pid:28631 tgid:28631 ppid:28501 task_flags:0x400000 flags:0x00004000 [Fri Jun 27 15:21:34 2025] Call Trace: [Fri Jun 27 15:21:34 2025] <TASK> [Fri Jun 27 15:21:34 2025] __schedule+0x7c7/0x1930 [Fri Jun 27 15:21:34 2025] ? __pfx___schedule+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? policy_nodemask+0x215/0x340 [Fri Jun 27 15:21:34 2025] ? _raw_spin_lock_irq+0x8a/0xe0 [Fri Jun 27 15:21:34 2025] ? __pfx__raw_spin_lock_irq+0x10/0x10 [Fri Jun 27 15:21:34 2025] schedule+0x6a/0x180 [Fri Jun 27 15:21:34 2025] schedule_preempt_disabled+0x15/0x30 [Fri Jun 27 15:21:34 2025] rwsem_down_read_slowpath+0x55e/0xe10 [Fri Jun 27 15:21:34 2025] ? __pfx_rwsem_down_read_slowpath+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __pfx___might_resched+0x10/0x10 [Fri Jun 27 15:21:34 2025] down_read+0xc9/0x230 [Fri Jun 27 15:21:34 2025] ? __pfx_down_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __debugfs_file_get+0x14d/0x700 [Fri Jun 27 15:21:34 2025] ? __pfx___debugfs_file_get+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? handle_pte_fault+0x52a/0x710 [Fri Jun 27 15:21:34 2025] ? selinux_file_permission+0x3a9/0x590 [Fri Jun 27 15:21:34 2025] read_dummy_rwsem_read+0x4a/0x90 [Fri Jun 27 15:21:34 2025] full_proxy_read+0xff/0x1c0 [Fri Jun 27 15:21:34 2025] ? rw_verify_area+0x6d/0x410 [Fri Jun 27 15:21:34 2025] vfs_read+0x177/0xa50 [Fri Jun 27 15:21:34 2025] ? __pfx_vfs_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? fdget_pos+0x1cf/0x4c0 [Fri Jun 27 15:21:34 2025] ksys_read+0xfc/0x1d0 [Fri Jun 27 15:21:34 2025] ? __pfx_ksys_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] do_syscall_64+0x66/0x2d0 [Fri Jun 27 15:21:34 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Fri Jun 27 15:21:34 2025] RIP: 0033:0x7f3f8faefb40 [Fri Jun 27 15:21:34 2025] RSP: 002b:00007ffdeda5ab98 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Fri Jun 27 15:21:34 2025] RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f3f8faefb40 [Fri Jun 27 15:21:34 2025] RDX: 0000000000010000 RSI: 00000000010fa000 RDI: 0000000000000003 [Fri Jun 27 15:21:34 2025] RBP: 00000000010fa000 R08: 0000000000000000 R09: 0000000000010fff [Fri Jun 27 15:21:34 2025] R10: 00007ffdeda59fe0 R11: 0000000000000246 R12: 00000000010fa000 [Fri Jun 27 15:21:34 2025] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000fff [Fri Jun 27 15:21:34 2025] </TASK> [Fri Jun 27 15:21:34 2025] INFO: task cat:28631 <reader> blocked on an rw-semaphore likely owned by task cat:28630 <writer> [Fri Jun 27 15:21:34 2025] task:cat state:S stack:0 pid:28630 tgid:28630 ppid:28501 task_flags:0x400000 flags:0x00004000 [Fri Jun 27 15:21:34 2025] Call Trace: [Fri Jun 27 15:21:34 2025] <TASK> [Fri Jun 27 15:21:34 2025] __schedule+0x7c7/0x1930 [Fri Jun 27 15:21:34 2025] ? __pfx___schedule+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __mod_timer+0x304/0xa80 [Fri Jun 27 15:21:34 2025] schedule+0x6a/0x180 [Fri Jun 27 15:21:34 2025] schedule_timeout+0xfb/0x230 [Fri Jun 27 15:21:34 2025] ? __pfx_schedule_timeout+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __pfx_process_timeout+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? down_write+0xc4/0x140 [Fri Jun 27 15:21:34 2025] msleep_interruptible+0xbe/0x150 [Fri Jun 27 15:21:34 2025] read_dummy_rwsem_write+0x54/0x90 [Fri Jun 27 15:21:34 2025] full_proxy_read+0xff/0x1c0 [Fri Jun 27 15:21:34 2025] ? rw_verify_area+0x6d/0x410 [Fri Jun 27 15:21:34 2025] vfs_read+0x177/0xa50 [Fri Jun 27 15:21:34 2025] ? __pfx_vfs_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? fdget_pos+0x1cf/0x4c0 [Fri Jun 27 15:21:34 2025] ksys_read+0xfc/0x1d0 [Fri Jun 27 15:21:34 2025] ? __pfx_ksys_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] do_syscall_64+0x66/0x2d0 [Fri Jun 27 15:21:34 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Fri Jun 27 15:21:34 2025] RIP: 0033:0x7f8f288efb40 [Fri Jun 27 15:21:34 2025] RSP: 002b:00007ffffb631038 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Fri Jun 27 15:21:34 2025] RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f8f288efb40 [Fri Jun 27 15:21:34 2025] RDX: 0000000000010000 RSI: 000000002a4b5000 RDI: 0000000000000003 [Fri Jun 27 15:21:34 2025] RBP: 000000002a4b5000 R08: 0000000000000000 R09: 0000000000010fff [Fri Jun 27 15:21:34 2025] R10: 00007ffffb630460 R11: 0000000000000246 R12: 000000002a4b5000 [Fri Jun 27 15:21:34 2025] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000fff [Fri Jun 27 15:21:34 2025] </TASK> [1] https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/ Link: https://lkml.kernel.org/r/20250627072924.36567-3-lance.yang@linux.dev Signed-off-by: Lance Yang <lance.yang@linux.dev> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Mingzhe Yang <mingzhe.yang@ly.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Cc: Zi Li <zi.li@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-19locking/rwsem: make owner helpers globally availableLance Yang
Patch series "extend hung task blocker tracking to rwsems". Inspired by mutex blocker tracking[1], and having already extended it to semaphores, let's now add support for reader-writer semaphores (rwsems). The approach is simple: when a task enters TASK_UNINTERRUPTIBLE while waiting for an rwsem, we just call hung_task_set_blocker(). The hung task detector can then query the rwsem's owner to identify the lock holder. Tracking works reliably for writers, as there can only be a single writer holding the lock, and its task struct is stored in the owner field. The main challenge lies with readers. The owner field points to only one of many concurrent readers, so we might lose track of the blocker if that specific reader unlocks, even while others remain. This is not a significant issue, however. In practice, long-lasting lock contention is almost always caused by a writer. Therefore, reliably tracking the writer is the primary goal of this patch series ;) With this change, the hung task detector can now show blocker task's info like below: [Fri Jun 27 15:21:34 2025] INFO: task cat:28631 blocked for more than 122 seconds. [Fri Jun 27 15:21:34 2025] Tainted: G S 6.16.0-rc3 #8 [Fri Jun 27 15:21:34 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [Fri Jun 27 15:21:34 2025] task:cat state:D stack:0 pid:28631 tgid:28631 ppid:28501 task_flags:0x400000 flags:0x00004000 [Fri Jun 27 15:21:34 2025] Call Trace: [Fri Jun 27 15:21:34 2025] <TASK> [Fri Jun 27 15:21:34 2025] __schedule+0x7c7/0x1930 [Fri Jun 27 15:21:34 2025] ? __pfx___schedule+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? policy_nodemask+0x215/0x340 [Fri Jun 27 15:21:34 2025] ? _raw_spin_lock_irq+0x8a/0xe0 [Fri Jun 27 15:21:34 2025] ? __pfx__raw_spin_lock_irq+0x10/0x10 [Fri Jun 27 15:21:34 2025] schedule+0x6a/0x180 [Fri Jun 27 15:21:34 2025] schedule_preempt_disabled+0x15/0x30 [Fri Jun 27 15:21:34 2025] rwsem_down_read_slowpath+0x55e/0xe10 [Fri Jun 27 15:21:34 2025] ? __pfx_rwsem_down_read_slowpath+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __pfx___might_resched+0x10/0x10 [Fri Jun 27 15:21:34 2025] down_read+0xc9/0x230 [Fri Jun 27 15:21:34 2025] ? __pfx_down_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __debugfs_file_get+0x14d/0x700 [Fri Jun 27 15:21:34 2025] ? __pfx___debugfs_file_get+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? handle_pte_fault+0x52a/0x710 [Fri Jun 27 15:21:34 2025] ? selinux_file_permission+0x3a9/0x590 [Fri Jun 27 15:21:34 2025] read_dummy_rwsem_read+0x4a/0x90 [Fri Jun 27 15:21:34 2025] full_proxy_read+0xff/0x1c0 [Fri Jun 27 15:21:34 2025] ? rw_verify_area+0x6d/0x410 [Fri Jun 27 15:21:34 2025] vfs_read+0x177/0xa50 [Fri Jun 27 15:21:34 2025] ? __pfx_vfs_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? fdget_pos+0x1cf/0x4c0 [Fri Jun 27 15:21:34 2025] ksys_read+0xfc/0x1d0 [Fri Jun 27 15:21:34 2025] ? __pfx_ksys_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] do_syscall_64+0x66/0x2d0 [Fri Jun 27 15:21:34 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Fri Jun 27 15:21:34 2025] RIP: 0033:0x7f3f8faefb40 [Fri Jun 27 15:21:34 2025] RSP: 002b:00007ffdeda5ab98 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Fri Jun 27 15:21:34 2025] RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f3f8faefb40 [Fri Jun 27 15:21:34 2025] RDX: 0000000000010000 RSI: 00000000010fa000 RDI: 0000000000000003 [Fri Jun 27 15:21:34 2025] RBP: 00000000010fa000 R08: 0000000000000000 R09: 0000000000010fff [Fri Jun 27 15:21:34 2025] R10: 00007ffdeda59fe0 R11: 0000000000000246 R12: 00000000010fa000 [Fri Jun 27 15:21:34 2025] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000fff [Fri Jun 27 15:21:34 2025] </TASK> [Fri Jun 27 15:21:34 2025] INFO: task cat:28631 <reader> blocked on an rw-semaphore likely owned by task cat:28630 <writer> [Fri Jun 27 15:21:34 2025] task:cat state:S stack:0 pid:28630 tgid:28630 ppid:28501 task_flags:0x400000 flags:0x00004000 [Fri Jun 27 15:21:34 2025] Call Trace: [Fri Jun 27 15:21:34 2025] <TASK> [Fri Jun 27 15:21:34 2025] __schedule+0x7c7/0x1930 [Fri Jun 27 15:21:34 2025] ? __pfx___schedule+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __mod_timer+0x304/0xa80 [Fri Jun 27 15:21:34 2025] schedule+0x6a/0x180 [Fri Jun 27 15:21:34 2025] schedule_timeout+0xfb/0x230 [Fri Jun 27 15:21:34 2025] ? __pfx_schedule_timeout+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? __pfx_process_timeout+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? down_write+0xc4/0x140 [Fri Jun 27 15:21:34 2025] msleep_interruptible+0xbe/0x150 [Fri Jun 27 15:21:34 2025] read_dummy_rwsem_write+0x54/0x90 [Fri Jun 27 15:21:34 2025] full_proxy_read+0xff/0x1c0 [Fri Jun 27 15:21:34 2025] ? rw_verify_area+0x6d/0x410 [Fri Jun 27 15:21:34 2025] vfs_read+0x177/0xa50 [Fri Jun 27 15:21:34 2025] ? __pfx_vfs_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] ? fdget_pos+0x1cf/0x4c0 [Fri Jun 27 15:21:34 2025] ksys_read+0xfc/0x1d0 [Fri Jun 27 15:21:34 2025] ? __pfx_ksys_read+0x10/0x10 [Fri Jun 27 15:21:34 2025] do_syscall_64+0x66/0x2d0 [Fri Jun 27 15:21:34 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Fri Jun 27 15:21:34 2025] RIP: 0033:0x7f8f288efb40 [Fri Jun 27 15:21:34 2025] RSP: 002b:00007ffffb631038 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [Fri Jun 27 15:21:34 2025] RAX: ffffffffffffffda RBX: 0000000000010000 RCX: 00007f8f288efb40 [Fri Jun 27 15:21:34 2025] RDX: 0000000000010000 RSI: 000000002a4b5000 RDI: 0000000000000003 [Fri Jun 27 15:21:34 2025] RBP: 000000002a4b5000 R08: 0000000000000000 R09: 0000000000010fff [Fri Jun 27 15:21:34 2025] R10: 00007ffffb630460 R11: 0000000000000246 R12: 000000002a4b5000 [Fri Jun 27 15:21:34 2025] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000fff [Fri Jun 27 15:21:34 2025] </TASK> This patch (of 3): In preparation for extending blocker tracking to support rwsems, make the rwsem_owner() and is_rwsem_reader_owned() helpers globally available for determining if the blocker is a writer or one of the readers. Additionally, a stale owner pointer in a reader-owned rwsem can lead to false positives in blocker tracking when CONFIG_DETECT_HUNG_TASK_BLOCKER is enabled. To mitigate this, clear the owner field on the reader unlock path, similar to what CONFIG_DEBUG_RWSEMS does. A NULL owner is better than a stale one for diagnostics. Link: https://lkml.kernel.org/r/20250627072924.36567-1-lance.yang@linux.dev Link: https://lkml.kernel.org/r/20250627072924.36567-2-lance.yang@linux.dev Link: https://lore.kernel.org/all/174046694331.2194069.15472952050240807469.stgit@mhiramat.tok.corp.google.com/ [1] Signed-off-by: Lance Yang <lance.yang@linux.dev> Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Stultz <jstultz@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Mingzhe Yang <mingzhe.yang@ly.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tomasz Figa <tfiga@chromium.org> Cc: Waiman Long <longman@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: Yongliang Gao <leonylgao@tencent.com> Cc: Zi Li <zi.li@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>