linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-05-11	nilfs2: remove calls to folio_set_error() and folio_clear_error()	Matthew Wilcox (Oracle)
	Nobody checks this flag on nilfs2 folios, stop setting and clearing it. That lets us simplify nilfs_end_folio_io() slightly. Link: https://lkml.kernel.org/r/20240420025029.2166544-17-willy@infradead.org Link: https://lkml.kernel.org/r/20240430050901.3239-1-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order	Xiu Jianfeng
	Since commit 857f21397f71 ("memcg, oom: remove unnecessary check in mem_cgroup_oom_synchronize()"), memcg_oom_gfp_mask and memcg_oom_order are no longer used any more. Link: https://lkml.kernel.org/r/20240509032628.1217652-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Benjamin Segall <bsegall@google.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage ↵	Dev Jain
	size at runtime Currently, the size used in mmap() is statically defined, leading to skipping of the test on a hugepage size other than 2 MB, since munmap() won't free the hugepage for a size greater than 2 MB. Hence, query the size at runtime. Also, there is no reason why a hugepage allocation should fail, since we are using a simple mmap() using MAP_HUGETLB; hence, instead of skipping the test, make it fail. Link: https://lkml.kernel.org/r/20240509095447.3791573-1-dev.jain@arm.com Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp	Oscar Salvador
	commit 1cb9dc4b475c ("mm: hwpoison: support recovery from HugePage copy-on-write faults") added support to use the mc variants when coping hugetlb pages on CoW faults. Add the missing VM_FAULT_SET_HINDEX, so the right si_addr_lsb will be passed to userspace to report the extension of the faulty area. Link: https://lkml.kernel.org/r/20240509100148.22384-3-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault	Oscar Salvador
	Patch series "Minor fixups for hugetlb fault path". This series contains a couple of fixups for hugetlb_fault and hugetlb_wp respectively, where a VM_FAULT_SET_HINDEX call was missing. I did not bother with a Fixes tag because the missing piece here is that we will not report to userspace the right extension of the faulty area by adjusting struct kernel_siginfo.si_addr_lsb, but I do not consider that to be a big issue because I assume that userspace already knows the size of the mapping anyway. This patch (of 2): commit af19487f00f3 ("mm: make PTE_MARKER_SWAPIN_ERROR more general") added the code to handle pte_markers in hugetlb faulting path. In case of an UFFD_POISON event, a PTE_MARKER_POISONED will be created and we will return VM_FAULT_HWPOISON_LARGE upon detecting that in the fault path. Add the missing VM_FAULT_SET_HINDEX, so the right si_addr_lsb will be passed to userspace to report the extension of the faulty area. Link: https://lkml.kernel.org/r/20240509100148.22384-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20240509100148.22384-2-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Liu Shixin <liushixin2@huawei.com> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests: cgroup: add tests to verify the zswap writeback path	Usama Arif
	Attempt writeback with the below steps and check using memory.stat.zswpwb if zswap writeback occurred: 1. Allocate memory. 2. Reclaim memory equal to the amount that was allocated in step 1. This will move it into zswap. 3. Save current zswap usage. 4. Move the memory allocated in step 1 back in from zswap. 5. Set zswap.max to half the amount that was recorded in step 3. 6. Attempt to reclaim memory equal to the amount that was allocated, this will either trigger writeback if it's enabled, or reclamation will fail if writeback is disabled as there isn't enough zswap space. Link: https://lkml.kernel.org/r/20240508171359.1545744-1-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Suggested-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Nhat Pham <nphamcs@gmail.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	mm: memcg: make alloc_mem_cgroup_per_node_info() return bool	Xiu Jianfeng
	alloc_mem_cgroup_per_node_info() returns int that doesn't map to any errno error code. The only existing caller doesn't really need an error code so change the function to return bool (true on success) because this is slightly less confusing and more consistent with the other code. Link: https://lkml.kernel.org/r/20240507132324.1158510-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	mm/damon/core: fix return value from damos_wmark_metric_value	Alex Rusuf
	damos_wmark_metric_value's return value is 'unsigned long', so returning -EINVAL as 'unsigned long' may turn out to be very different from the expected one (using 2's complement) and treat as usual matric's value. So, fix that, checking if returned value is not 0. Link: https://lkml.kernel.org/r/20240506180238.53842-1-sj@kernel.org Fixes: ee801b7dd782 ("mm/damon/schemes: activate schemes based on a watermarks mechanism") Signed-off-by: Alex Rusuf <yorha.op@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED	Yosry Ahmed
	Previously, all NR_VM_EVENT_ITEMS stats were maintained per-memcg, although some of those fields are not exposed anywhere. Commit 14e0f6c957e39 ("memcg: reduce memory for the lruvec and memcg stats") changed this such that we only maintain the stats we actually expose per-memcg via a translation table. Additionally, commit 514462bbe927b ("memcg: warn for unexpected events and stats") added a warning if a per-memcg stat update is attempted for a stat that is not in the translation table. The warning started firing for the NR_{FILE/SHMEM}_PMDMAPPED stat updates in the rmap code. These stats are not maintained per-memcg, and hence are not in the translation table. Do not use __lruvec_stat_mod_folio() when updating NR_FILE_PMDMAPPED and NR_SHMEM_PMDMAPPED. Use __mod_node_page_state() instead, which updates the global per-node stats only. Link: https://lkml.kernel.org/r/20240506192924.271999-1-yosryahmed@google.com Fixes: 514462bbe927 ("memcg: warn for unexpected events and stats") Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reported-by: syzbot+9319a4268a640e26b72b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/0000000000001b9d500617c8b23c@google.com Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests: cgroup: remove redundant enabling of memory controller	Usama Arif
	Memory controller is already enabled in main which invokes the test, hence this does not need to be done in test_no_kmem_bypass. Link: https://lkml.kernel.org/r/20240502200529.4193651-2-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree	SeongJae Park
	The document mentions any patches for review should based on mm-unstable instead of damon/next. It should be the recommended process, but sometimes patches based on damon/next could be posted for some reasons. Actually, the DAMON-based tiered memory management patchset[1] was written on top of 'young page' DAMOS filter patchset, which was in damon/next tree as of the writing. Allow such case and just ask such things to be clearly specified. [1] https://lore.kernel.org/20240405060858.2818-1-honggyu.kim@sk.com Link: https://lkml.kernel.org/r/20240503180318.72798-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST ↵	SeongJae Park
	to PT The document says the maintainer is working on only PST. The maintainer respects daylight saving system, though. Update the time zone to PT. Link: https://lkml.kernel.org/r/20240503180318.72798-10-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	Docs/mm/damon/design: use a list for supported filters	SeongJae Park
	Filters section is listing currently supported filter types in a normal paragraph. Since the number of types are higher than four, it is not easy to read for only specific types. Use a list for easier finding of specific types. [sj@kernel.org: fix build warning] Link: https://lkml.kernel.org/r/20240507161747.52430-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240503180318.72798-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update ↵	SeongJae Park
	command To update effective size quota of DAMOS schemes on DAMON sysfs file interface, user should write 'update_schemes_effective_quotas' to the kdamond 'state' file. But the document is mistakenly saying the input string as 'update_schemes_effective_bytes'. Fix it (s/bytes/quotas/). Link: https://lkml.kernel.org/r/20240503180318.72798-8-sj@kernel.org Fixes: a6068d6dfa2f ("Docs/admin-guide/mm/damon/usage: document effective_bytes file") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.9.x] Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching ↵	SeongJae Park
	sysfs file The example usage of DAMOS filter sysfs files, specifically the part of 'matching' file writing for memcg type filter, is wrong. The intention is to exclude pages of a memcg that already getting enough care from a given scheme, but the example is setting the filter to apply the scheme to only the pages of the memcg. Fix it. Link: https://lkml.kernel.org/r/20240503180318.72798-7-sj@kernel.org Fixes: 9b7f9322a530 ("Docs/admin-guide/mm/damon/usage: document DAMOS filters of sysfs") Closes: https://lore.kernel.org/r/20240317191358.97578-1-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.3.x] Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests/damon: classify tests for functionalities and regressions	SeongJae Park
	DAMON selftests can be classified into two categories: functionalities and regressions. Functionality tests are for checking if the function is working as specified, while the regression tests are basically reproducers of previously reported and fixed bugs. The tests of the categories are mixed in the selftests Makefile. Separate those for easier understanding of the types of tests. Link: https://lkml.kernel.org/r/20240503180318.72798-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'	SeongJae Park
	_damon_sysfs.py is using '==' or '!=' for 'None'. Since 'None' is a singleton, using 'is' or 'is not' is more efficient. Use the more efficient one. Link: https://lkml.kernel.org/r/20240503180318.72798-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts	SeongJae Park
	_damon_sysfs.py assumes sysfs is mounted at /sys. In some systems, that might not be true. Find the mount point from /proc/mounts file content. Link: https://lkml.kernel.org/r/20240503180318.72798-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests/damon/_damon_sysfs: check errors from nr_schemes file reads	SeongJae Park
	DAMON context staging method in _damon_sysfs.py is not checking the returned error from nr_schemes file read. Check it. Link: https://lkml.kernel.org/r/20240503180318.72798-3-sj@kernel.org Fixes: f5f0e5a2bef9 ("selftests/damon/_damon_sysfs: implement kdamonds start function") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()	SeongJae Park
	Patch series "mm/damon: misc fixes and improvements". Add miscelleneous and non-urgent fixes and improvements for DAMON code, selftests, and documents. This patch (of 10): damos_quota_init_priv() function should initialize all private fields of struct damos_quota. However, it is not initializing ->esz_bp field. This could result in use of uninitialized variable from damon_feed_loop_next_input() function. There is no such issue at the moment because every caller of the function is passing damos_quota object that already having the field zero value. But we cannot guarantee the future, and the function is not doing what it is promising. A bug is a bug. This fix is for preventing possible future issues. Link: https://lkml.kernel.org/r/20240503180318.72798-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240503180318.72798-2-sj@kernel.org Fixes: 9294a037c015 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests/damon: add a test for DAMOS quota goal	SeongJae Park
	Add a selftest for DAMOS quota goal. It tests the feature by setting a user_input metric based goal, change the current feedback, and check if the effective quota size is increased and decreased as expected. Link: https://lkml.kernel.org/r/20240502172718.74166-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests/damon/_damon_sysfs: support quota goals	SeongJae Park
	Patch series "selftests/damon: add DAMOS quota goal test". Extend DAMON selftest-purpose sysfs wrapper to support DAMOS quota goal, and implement a simple selftest for the feature using it. This patch (of 2): The DAMON sysfs test purpose wrapper, _damon_sysfs.py, is not supporting quota goals. Implement the support for testing the feature. The test will be implemented and added by the following commit. Link: https://lkml.kernel.org/r/20240502172718.74166-1-sj@kernel.org Link: https://lkml.kernel.org/r/20240502172718.74166-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-05-11	selftests/harness: Handle TEST_F()'s explicit exit codes	Mickaël Salaün
	If TEST_F() explicitly calls exit(code) with code different than 0, then _metadata->exit_code is set to this code (e.g. KVM_ONE_VCPU_TEST()). We need to keep in mind that _metadata->exit_code can be KSFT_SKIP while the process exit code is 0. Cc: Jakub Kicinski <kuba@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Reported-by: Sean Christopherson <seanjc@google.com> Tested-by: Sean Christopherson <seanjc@google.com> Closes: https://lore.kernel.org/r/ZjPelW6-AbtYvslu@google.com Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()") Link: https://lore.kernel.org/r/20240511171445.904356-11-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	selftests/harness: Fix vfork() side effects	Mickaël Salaün
	Setting the time namespace with CLONE_NEWTIME returns -EUSERS if the calling thread shares memory with another thread (because of the shared vDSO), which is the case when it is created with vfork(). Fix pidfd_setns_test by replacing test harness's vfork() call with a clone3() call with CLONE_VFORK, and an explicit sharing of the _metadata and self objects. Replace _metadata->teardown_parent with a new FIXTURE_TEARDOWN_PARENT() helper that can replace FIXTURE_TEARDOWN(). This is a cleaner approach and it enables to selectively share the fixture data between the child process running tests and the parent process running the fixture teardown. This also avoids updating several tests to not rely on the self object's copy-on-write property (e.g. storing the returned value of a fork() call). Cc: Christian Brauner <brauner@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Günther Noack <gnoack@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202403291015.1fcfa957-oliver.sang@intel.com Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-10-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	selftests/harness: Share _metadata between forked processes	Mickaël Salaün
	Unconditionally share _metadata between all forked processes, which enables to actually catch errors which were previously ignored. This is required for a following commit replacing vfork() with clone3() and CLONE_VFORK (i.e. not sharing the full memory) . It should also be useful to share _metadata to extend expectations to test process's forks. For instance, this change identified a wrong expectation in pidfd_setns_test. Because this _metadata is used by the new XFAIL_ADD(), use a global pointer initialized in TEST_F(). This is OK because only XFAIL_ADD() use it, and XFAIL_ADD() already depends on TEST_F(). Cc: Jakub Kicinski <kuba@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Will Drewry <wad@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-9-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	selftests/pidfd: Fix wrong expectation	Mickaël Salaün
	Replace a wrong EXPECT_GT(self->child_pid_exited, 0) with EXPECT_GE(), which will be actually tested on the parent and child sides with a following commit. Cc: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240511171445.904356-8-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	selftests/harness: Constify fixture variants	Mickaël Salaün
	FIXTURE_VARIANT_ADD() types are passed as const pointers to FIXTURE_TEARDOWN(). Make that explicit by constifying the variants declarations. Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Will Drewry <wad@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-7-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	selftests/landlock: Do not allocate memory in fixture data	Mickaël Salaün
	Do not allocate self->dir_path in the test process because this would not be visible in the FIXTURE_TEARDOWN() process when relying on fork()/clone3() instead of vfork(). This change is required for a following commit removing vfork() call to not break the layout3_fs.* test cases. Cc: Günther Noack <gnoack@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-6-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	selftests/harness: Fix interleaved scheduling leading to race conditions	Mickaël Salaün
	Fix a race condition when running several FIXTURE_TEARDOWN() managing the same resource. This fixes a race condition in the Landlock file system tests when creating or unmounting the same directory. Using clone3() with CLONE_VFORK guarantees that the child and grandchild test processes are sequentially scheduled. This is implemented with a new clone3_vfork() helper replacing the fork() call. This avoids triggering this error in __wait_for_test(): Test ended in some other way [127] Cc: Christian Brauner <brauner@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Günther Noack <gnoack@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Will Drewry <wad@chromium.org> Fixes: 41cca0542d7c ("selftests/harness: Fix TEST_F()'s vfork handling") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-5-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	selftests/harness: Fix fixture teardown	Mickaël Salaün
	Make sure fixture teardowns are run when test cases failed, including when _metadata->teardown_parent is set to true. Make sure only one fixture teardown is run per test case, handling the case where the test child forks. Cc: Jakub Kicinski <kuba@kernel.org> Cc: Shengyu Li <shengyu.li.evgeny@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Fixes: 72d7cb5c190b ("selftests/harness: Prevent infinite loop due to Assert in FIXTURE_TEARDOWN") Fixes: 0710a1a73fb4 ("selftests/harness: Merge TEST_F_FORK() into TEST_F()") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-4-mic@digikod.net Rule: add Link: https://lore.kernel.org/stable/20240506165518.474504-4-mic%40digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	selftests/landlock: Fix FS tests when run on a private mount point	Mickaël Salaün
	According to the test environment, the mount point of the test's working directory may be shared or not, which changes the visibility of the nested "tmp" mount point for the test's parent process calling umount("tmp"). This was spotted while running tests in containers [1], where mount points are private. Cc: Günther Noack <gnoack@google.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Link: https://github.com/landlock-lsm/landlock-test-tools/pull/4 [1] Fixes: 41cca0542d7c ("selftests/harness: Fix TEST_F()'s vfork handling") Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20240511171445.904356-3-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	selftests/pidfd: Fix config for pidfd_setns_test	Mickaël Salaün
	Required by switch_timens() to open /proc/self/ns/time_for_children. CONFIG_GENERIC_VDSO_TIME_NS is not available on UML, so pidfd_setns_test cannot be run successfully on this architecture. Cc: Shuah Khan <skhan@linuxfoundation.org> Fixes: 2b40c5db73e2 ("selftests/pidfd: add pidfd setns tests") Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240511171445.904356-2-mic@digikod.net Signed-off-by: Mickaël Salaün <mic@digikod.net>
2024-05-11	perf pmu: Count sys and cpuid JSON events separately	Ian Rogers
	Sys events are eagerly loaded as each event has a compat option that may mean the event is or isn't associated with the PMU. These shouldn't be counted as loaded_json_events as that is used for JSON events matching the CPUID that may or may not have been loaded. The mismatch causes issues on ARM64 that uses sys events. Fixes: e6ff1eed3584362d ("perf pmu: Lazily add JSON events") Closes: https://lore.kernel.org/lkml/20240510024729.1075732-1-justin.he@arm.com/ Reported-by: Jia He <justin.he@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240511003601.2666907-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-11	perf stat: Don't display metric header for non-leader uncore events	Ian Rogers
	On an Intel tigerlake laptop a metric like: { "BriefDescription": "Test", "MetricExpr": "imc_free_running@data_read@ + imc_free_running@data_write@", "MetricGroup": "Test", "MetricName": "Test", "ScaleUnit": "6.103515625e-5MiB" }, Will have 4 events: uncore_imc_free_running_0/data_read/ uncore_imc_free_running_0/data_write/ uncore_imc_free_running_1/data_read/ uncore_imc_free_running_1/data_write/ If aggregration is disabled with metric-only 2 column headers are needed: $ perf stat -M test --metric-only -A -a sleep 1 Performance counter stats for 'system wide': MiB Test MiB Test CPU0 1821.0 1820.5 But when not, the counts aggregated in the metric leader and only 1 column should be shown: $ perf stat -M test --metric-only -a sleep 1 Performance counter stats for 'system wide': MiB Test 5909.4 1.001258915 seconds time elapsed Achieve this by skipping events that aren't metric leaders when printing column headers and aggregation isn't disabled. The bug is long standing, the fixes tag is set to a refactor as that is as far back as is reasonable to backport. Fixes: 088519f318be3a41 ("perf stat: Move the display functions to stat-display.c") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kaige Ye <ye@kaige.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240510051309.2452468-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-11	perf annotate-data: Ensure the number of type histograms	Namhyung Kim
	Arnaldo reported that there is a case where nr_histograms and histograms don't agree each other. It ended up in a segfault trying to access a NULL histograms array. Let's make sure to update the nr_histograms when the histograms array is changed. Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510210452.2449944-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-11	perf annotate: Fix segfault on sample histogram	Namhyung Kim
	A symbol can have no samples, then accessing the annotated_source->samples hashmap will result in a segfault. Fixes: a3f7768bcf48281d ("perf annotate: Fix memory leak in annotated_source") Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240510210452.2449944-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-05-11	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into ↵	Jens Axboe
	net-accept-more * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1557 commits) net: qede: use extack in qede_parse_actions() net: qede: propagate extack through qede_flow_spec_validate() net: qede: use faked extack in qede_flow_spec_to_rule() net: qede: use extack in qede_parse_flow_attr() net: qede: add extack in qede_add_tc_flower_fltr() net: qede: use extack in qede_flow_parse_udp_v4() net: qede: use extack in qede_flow_parse_udp_v6() net: qede: use extack in qede_flow_parse_tcp_v4() net: qede: use extack in qede_flow_parse_tcp_v6() net: qede: use extack in qede_flow_parse_v4_common() net: qede: use extack in qede_flow_parse_v6_common() net: qede: use extack in qede_set_v4_tuple_to_profile() net: qede: use extack in qede_set_v6_tuple_to_profile() net: qede: use extack in qede_flow_parse_ports() net: usb: smsc95xx: stop lying about skb->truesize net: dsa: microchip: Fix spellig mistake "configur" -> "configure" af_unix: Add dead flag to struct scm_fp_list. net: ethernet: adi: adin1110: Replace linux/gpio.h by proper one octeontx2-pf: Reuse Transmit queue/Send queue index of HTB class gve: Use ethtool_sprintf/puts() to fill stats strings ...
2024-05-11	Merge branch 'for-6.10/io_uring' into net-accept-more	Jens Axboe
	* for-6.10/io_uring: (97 commits) io_uring: support to inject result for NOP io_uring: fail NOP if non-zero op flags is passed in io_uring/net: add IORING_ACCEPT_POLL_FIRST flag io_uring/net: add IORING_ACCEPT_DONTWAIT flag io_uring/filetable: don't unnecessarily clear/reset bitmap io_uring/io-wq: Use set_bit() and test_bit() at worker->flags io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring io_uring: Require zeroed sqe->len on provided-buffers send io_uring/notif: disable LAZY_WAKE for linked notifs io_uring/net: fix sendzc lazy wake polling io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it io_uring/rw: reinstate thread check for retries io_uring/notif: implement notification stacking io_uring/notif: simplify io_notif_flush() net: add callback for setting a ubuf_info to skb net: extend ubuf_info callback to ops structure io_uring/net: support bundles for recv io_uring/net: support bundles for send io_uring/kbuf: add helpers for getting/peeking multiple buffers io_uring/net: add provided buffer support for IORING_OP_SEND ...
2024-05-11	csky: Emulate one-byte cmpxchg	Paul E. McKenney
	Use the new cmpxchg_emu_u8() to emulate one-byte cmpxchg() on csky. [ paulmck: Apply kernel test robot feedback. ] [ paulmck: Drop two-byte support per Arnd Bergmann feedback. ] Co-developed-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yujie Liu <yujie.liu@intel.com> Reviewed-by: Guo Ren <guoren@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: <linux-csky@vger.kernel.org>
2024-05-11	drm/bridge: aux-hpd-bridge: correct devm_drm_dp_hpd_bridge_add() stub	Dmitry Baryshkov
	If CONFIG_DRM_AUX_HPD_BRIDGE is not enabled, the aux-bridge.h header provides a stub for the bridge's functions. Correct the arguments list of one of those stubs to match the argument list of the non-stubbed function. Fixes: e5ca263508f7 ("drm/bridge: aux-hpd: separate allocation and registration") Reported-by: kernel test robot <lkp@intel.com> Cc: stable <stable@kernel.org> Closes: https://lore.kernel.org/oe-kbuild-all/202405110428.TMCfb1Ut-lkp@intel.com/ Cc: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20240511-fix-aux-hpd-stubs-v1-1-98dae71dfaec@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-11	Merge branch 'acpica'	Rafael J. Wysocki
	Merge ACPICA material for v6.10. This is mostly new material included in the 20240322 upstream ACPICA release. - Disable -Wstringop-truncation for some ACPICA code in the kernel to avoid a compiler warning that is not very useful (Arnd Bergmann). - Add EINJ CXL error types to actbl1.h (Ben Cheatham). - Add support for RAS2 table to ACPICA (Shiju Jose). - Fix various spelling mistakes in text files and code comments in ACPICA (Colin Ian King). - Fix spelling and typos in ACPICA (Saket Dumbre). - Modify ACPI_OBJECT_COMMON_HEADER (lijun). - Add RISC-V RINTC affinity structure support to ACPICA (Haibo Xu). - Fix CXL 3.0 structure (RDPAS) in the CEDT table (Hojin Nam). - Add missin increment of registered GPE count to ACPICA (Daniil Tatianin). - Mark new ACPICA release 20240322 (Saket Dumbre). - Add support for the AEST V2 table to ACPICA (Ruidong Tian). * acpica: ACPICA: AEST: Add support for the AEST V2 table ACPICA: Update acpixf.h for new ACPICA release 20240322 ACPICA: events/evgpeinit: don't forget to increment registered GPE count ACPICA: Fix CXL 3.0 structure (RDPAS) in the CEDT table ACPICA: SRAT: Add dump and compiler support for RINTC affinity structure ACPICA: SRAT: Add RISC-V RINTC affinity structure ACPICA: Modify ACPI_OBJECT_COMMON_HEADER ACPICA: Fix spelling and typos ACPICA: Clean up the fix for Issue #900 ACPICA: Fix various spelling mistakes in text files and code comments ACPICA: Attempt 1 to fix issue #900 ACPICA: ACPI 6.5: RAS2: Add support for RAS2 table ACPICA: actbl1.h: Add EINJ CXL error types ACPI: disable -Wstringop-truncation
2024-05-11	watchdog: LENOVO_SE10_WDT should depend on X86 && DMI	Geert Uytterhoeven
	The Lenovo SE10 watchdog is only present on Lenovo ThinkEdge SE10 platforms, which are based on Intel Atom SoCs, and its driver relies on DMI tables. Hence add dependencies on X86 && DMI, to prevent asking the user about this driver when configuring a kernel without Intel Atom or DMI support. While at it, fix the odd indentation (spaces instead of TABs). Fixes: 1f6602c8ed1eccac ("watchdog: lenovo_se10_wdt: Watchdog driver for Lenovo SE10 platform") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Mark Pearson <mpearson-lenovo@squebb.ca> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/58005595a05ef803b454b78d3ae9b8ee0675bd5d.1715076440.git.geert+renesas@glider.be Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@linux-watchdog.org>
2024-05-11	usb: fotg210: Add missing kernel doc description	Andy Shevchenko
	kernel-doc validator is not happy: warning: Function parameter or struct member 'fotg' not described in 'fotg210_vbus' Add missing description. Fixes: 3e679bde529e ("usb: fotg210-udc: Implement VBUS session") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/r/20240510152641.2421298-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-11	usb: dwc3: core: Fix unused variable warning in core driver	Krishna Kurapati
	While fixing a merge conflict in linux-next, hw_mode variable was left unused. Remove the unused variable in hs_phy_setup call. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/all/202405030439.AH8NR0Mg-lkp@intel.com/ Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/20240506074939.1833835-1-quic_kriskura@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-10	Merge branch 'mlx5-misc-fixes'	Jakub Kicinski
	Tariq Toukan says: ==================== mlx5 misc fixes This patchset provides bug fixes to mlx5 driver. Patch 1 by Shay fixes the error flow in mlx5e_suspend(). Patch 2 by Shay aligns the peer devlink set logic with the register devlink flow. Patch 3 by Maher solves a deadlock in lag enable/disable. Patches 4 and 5 by Akiva address issues in command interface corner cases. Series generated against: commit 393ceeb9211e ("Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'") ==================== Link: https://lore.kernel.org/r/20240509112951.590184-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10	net/mlx5: Discard command completions in internal error	Akiva Goldberger
	Fix use after free when FW completion arrives while device is in internal error state. Avoid calling completion handler in this case, since the device will flush the command interface and trigger all completions manually. Kernel log: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. ... RIP: 0010:refcount_warn_saturate+0xd8/0xe0 ... Call Trace: <IRQ> ? __warn+0x79/0x120 ? refcount_warn_saturate+0xd8/0xe0 ? report_bug+0x17c/0x190 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0xd8/0xe0 cmd_ent_put+0x13b/0x160 [mlx5_core] mlx5_cmd_comp_handler+0x5f9/0x670 [mlx5_core] cmd_comp_notifier+0x1f/0x30 [mlx5_core] notifier_call_chain+0x35/0xb0 atomic_notifier_call_chain+0x16/0x20 mlx5_eq_async_int+0xf6/0x290 [mlx5_core] notifier_call_chain+0x35/0xb0 atomic_notifier_call_chain+0x16/0x20 irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x4b/0x160 handle_irq_event+0x2e/0x80 handle_edge_irq+0x98/0x230 __common_interrupt+0x3b/0xa0 common_interrupt+0x7b/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240509112951.590184-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10	net/mlx5: Add a timeout to acquire the command queue semaphore	Akiva Goldberger
	Prevent forced completion handling on an entry that has not yet been assigned an index, causing an out of bounds access on idx = -22. Instead of waiting indefinitely for the sem, blocking flow now waits for index to be allocated or a sem acquisition timeout before beginning the timer for FW completion. Kernel log example: mlx5_core 0000:06:00.0: wait_func_handle_exec_timeout:1128:(pid 185911): cmd[-22]: CREATE_UCTX(0xa04) No done completion Fixes: 8e715cd613a1 ("net/mlx5: Set command entry semaphore up once got index free") Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20240509112951.590184-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10	net/mlx5: Reload only IB representors upon lag disable/enable	Maher Sanalla
	On lag disable, the bond IB device along with all of its representors are destroyed, and then the slaves' representors get reloaded. In case the slave IB representor load fails, the eswitch error flow unloads all representors, including ethernet representors, where the netdevs get detached and removed from lag bond. Such flow is inaccurate as the lag driver is not responsible for loading/unloading ethernet representors. Furthermore, the flow described above begins by holding lag lock to prevent bond changes during disable flow. However, when reaching the ethernet representors detachment from lag, the lag lock is required again, triggering the following deadlock: Call trace: __switch_to+0xf4/0x148 __schedule+0x2c8/0x7d0 schedule+0x50/0xe0 schedule_preempt_disabled+0x18/0x28 __mutex_lock.isra.13+0x2b8/0x570 __mutex_lock_slowpath+0x1c/0x28 mutex_lock+0x4c/0x68 mlx5_lag_remove_netdev+0x3c/0x1a0 [mlx5_core] mlx5e_uplink_rep_disable+0x70/0xa0 [mlx5_core] mlx5e_detach_netdev+0x6c/0xb0 [mlx5_core] mlx5e_netdev_change_profile+0x44/0x138 [mlx5_core] mlx5e_netdev_attach_nic_profile+0x28/0x38 [mlx5_core] mlx5e_vport_rep_unload+0x184/0x1b8 [mlx5_core] mlx5_esw_offloads_rep_load+0xd8/0xe0 [mlx5_core] mlx5_eswitch_reload_reps+0x74/0xd0 [mlx5_core] mlx5_disable_lag+0x130/0x138 [mlx5_core] mlx5_lag_disable_change+0x6c/0x70 [mlx5_core] // hold ldev->lock mlx5_devlink_eswitch_mode_set+0xc0/0x410 [mlx5_core] devlink_nl_cmd_eswitch_set_doit+0xdc/0x180 genl_family_rcv_msg_doit.isra.17+0xe8/0x138 genl_rcv_msg+0xe4/0x220 netlink_rcv_skb+0x44/0x108 genl_rcv+0x40/0x58 netlink_unicast+0x198/0x268 netlink_sendmsg+0x1d4/0x418 sock_sendmsg+0x54/0x60 __sys_sendto+0xf4/0x120 __arm64_sys_sendto+0x30/0x40 el0_svc_common+0x8c/0x120 do_el0_svc+0x30/0xa0 el0_svc+0x20/0x30 el0_sync_handler+0x90/0xb8 el0_sync+0x160/0x180 Thus, upon lag enable/disable, load and unload only the IB representors of the slaves preventing the deadlock mentioned above. While at it, refactor the mlx5_esw_offloads_rep_load() function to have a static helper method for its internal logic, in symmetry with the representor unload design. Fixes: 598fe77df855 ("net/mlx5: Lag, Create shared FDB when in switchdev mode") Co-developed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240509112951.590184-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10	net/mlx5: Fix peer devlink set for SF representor devlink port	Shay Drory
	The cited patch change register devlink flow, and neglect to reflect the changes for peer devlink set logic. Peer devlink set is triggering a call trace if done after devl_register.[1] Hence, align peer devlink set logic with register devlink flow. [1] WARNING: CPU: 4 PID: 3394 at net/devlink/core.c:155 devlink_rel_nested_in_add+0x177/0x180 CPU: 4 PID: 3394 Comm: kworker/u40:1 Not tainted 6.9.0-rc4_for_linust_min_debug_2024_04_16_14_08 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_vhca_event0 mlx5_vhca_state_work_handler [mlx5_core] RIP: 0010:devlink_rel_nested_in_add+0x177/0x180 Call Trace: <TASK> ? __warn+0x78/0x120 ? devlink_rel_nested_in_add+0x177/0x180 ? report_bug+0x16d/0x180 ? handle_bug+0x3c/0x60 ? exc_invalid_op+0x14/0x70 ? asm_exc_invalid_op+0x16/0x20 ? devlink_port_init+0x30/0x30 ? devlink_port_type_clear+0x50/0x50 ? devlink_rel_nested_in_add+0x177/0x180 ? devlink_rel_nested_in_add+0xdd/0x180 mlx5_sf_mdev_event+0x74/0xb0 [mlx5_core] notifier_call_chain+0x35/0xb0 blocking_notifier_call_chain+0x3d/0x60 mlx5_blocking_notifier_call_chain+0x22/0x30 [mlx5_core] mlx5_sf_dev_probe+0x185/0x3e0 [mlx5_core] auxiliary_bus_probe+0x38/0x80 ? driver_sysfs_add+0x51/0x80 really_probe+0xc5/0x3a0 ? driver_probe_device+0x90/0x90 __driver_probe_device+0x80/0x160 driver_probe_device+0x1e/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xbc/0x1f0 bus_probe_device+0x86/0xa0 device_add+0x64f/0x860 __auxiliary_device_add+0x3b/0xa0 mlx5_sf_dev_add+0x139/0x330 [mlx5_core] mlx5_sf_dev_state_change_handler+0x1e4/0x250 [mlx5_core] notifier_call_chain+0x35/0xb0 blocking_notifier_call_chain+0x3d/0x60 mlx5_vhca_state_work_handler+0x151/0x200 [mlx5_core] process_one_work+0x13f/0x2e0 worker_thread+0x2bd/0x3c0 ? rescuer_thread+0x410/0x410 kthread+0xc4/0xf0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x2d/0x50 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> Fixes: bf729988303a ("net/mlx5: Restore mistakenly dropped parts in register devlink flow") Fixes: c6e77aa9dd82 ("net/mlx5: Register devlink first under devlink lock") Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240509112951.590184-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-10	net/mlx5e: Fix netif state handling	Shay Drory
	mlx5e_suspend cleans resources only if netif_device_present() returns true. However, mlx5e_resume changes the state of netif, via mlx5e_nic_enable, only if reg_state == NETREG_REGISTERED. In the below case, the above leads to NULL-ptr Oops[1] and memory leaks: mlx5e_probe _mlx5e_resume mlx5e_attach_netdev mlx5e_nic_enable <-- netdev not reg, not calling netif_device_attach() register_netdev <-- failed for some reason. ERROR_FLOW: _mlx5e_suspend <-- netif_device_present return false, resources aren't freed :( Hence, clean resources in this case as well. [1] BUG: kernel NULL pointer dereference, address: 0000000000000000 PGD 0 P4D 0 Oops: 0010 [#1] SMP CPU: 2 PID: 9345 Comm: test-ovs-ct-gen Not tainted 6.5.0_for_upstream_min_debug_2023_09_05_16_01 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:0x0 Code: Unable to access opcode bytes at0xffffffffffffffd6. RSP: 0018:ffff888178aaf758 EFLAGS: 00010246 Call Trace: <TASK> ? __die+0x20/0x60 ? page_fault_oops+0x14c/0x3c0 ? exc_page_fault+0x75/0x140 ? asm_exc_page_fault+0x22/0x30 notifier_call_chain+0x35/0xb0 blocking_notifier_call_chain+0x3d/0x60 mlx5_blocking_notifier_call_chain+0x22/0x30 [mlx5_core] mlx5_core_uplink_netdev_event_replay+0x3e/0x60 [mlx5_core] mlx5_mdev_netdev_track+0x53/0x60 [mlx5_ib] mlx5_ib_roce_init+0xc3/0x340 [mlx5_ib] __mlx5_ib_add+0x34/0xd0 [mlx5_ib] mlx5r_probe+0xe1/0x210 [mlx5_ib] ? auxiliary_match_id+0x6a/0x90 auxiliary_bus_probe+0x38/0x80 ? driver_sysfs_add+0x51/0x80 really_probe+0xc9/0x3e0 ? driver_probe_device+0x90/0x90 __driver_probe_device+0x80/0x160 driver_probe_device+0x1e/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xbc/0x1f0 bus_probe_device+0x86/0xa0 device_add+0x637/0x840 __auxiliary_device_add+0x3b/0xa0 add_adev+0xc9/0x140 [mlx5_core] mlx5_rescan_drivers_locked+0x22a/0x310 [mlx5_core] mlx5_register_device+0x53/0xa0 [mlx5_core] mlx5_init_one_devl_locked+0x5c4/0x9c0 [mlx5_core] mlx5_init_one+0x3b/0x60 [mlx5_core] probe_one+0x44c/0x730 [mlx5_core] local_pci_probe+0x3e/0x90 pci_device_probe+0xbf/0x210 ? kernfs_create_link+0x5d/0xa0 ? sysfs_do_create_link_sd+0x60/0xc0 really_probe+0xc9/0x3e0 ? driver_probe_device+0x90/0x90 __driver_probe_device+0x80/0x160 driver_probe_device+0x1e/0x90 __device_attach_driver+0x7d/0x100 bus_for_each_drv+0x80/0xd0 __device_attach+0xbc/0x1f0 pci_bus_add_device+0x54/0x80 pci_iov_add_virtfn+0x2e6/0x320 sriov_enable+0x208/0x420 mlx5_core_sriov_configure+0x9e/0x200 [mlx5_core] sriov_numvfs_store+0xae/0x1a0 kernfs_fop_write_iter+0x10c/0x1a0 vfs_write+0x291/0x3c0 ksys_write+0x5f/0xe0 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 CR2: 0000000000000000 ---[ end trace 0000000000000000 ]--- Fixes: 2c3b5beec46a ("net/mlx5e: More generic netdev management API") Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240509112951.590184-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>