linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-11-11	drm/fourcc: add AMD_FMT_MOD_TILE_GFX9_4K_D_X	Qiang Yu
	This is used when radeonsi export small texture's modifier to user with eglExportDMABUFImageQueryMESA(). mesa changes is available here: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31658 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Qiang Yu <qiang.yu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amdgpu/mes12: correct kiq unmap latency	Jack Xiao
	Correct kiq unmap queue timeout value. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amdgpu: Support vcn and jpeg error info parsing	Stanley.Yang
	Add vcn and jpeg error count parsing. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amd : Update MES API header file for v11 & v12	Shaoyun Liu
	New features require the new fields defines Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling	Shaoyun Liu
	Add back kfd queues in start scheduling that originally been removed on stop scheduling. Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	Linus Torvalds
	Pull virtio fixes from Michael Tsirkin: "Several small bugfixes all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Fix error path during device add vp_vdpa: fix id_table array not null terminated error virtio_pci: Fix admin vq cleanup by using correct info pointer vDPA/ifcvf: Fix pci_read_config_byte() return code handling Fix typo in vringh_test.c vdpa: solidrun: Fix UB bug with devres vsock/virtio: Initialization of the dangling pointer occurring in vsk->trans
2024-11-11	sched_ext: Rename scx_bpf_dispatch[_vtime]_from_dsq*() -> ↵	Tejun Heo
	scx_bpf_dsq_move[_vtime]() In sched_ext API, a repeatedly reported pain point is the overuse of the verb "dispatch" and confusion around "consume": - ops.dispatch() - scx_bpf_dispatch[_vtime]() - scx_bpf_consume() - scx_bpf_dispatch[_vtime]_from_dsq() This overloading of the term is historical. Originally, there were only built-in DSQs and moving a task into a DSQ always dispatched it for execution. Using the verb "dispatch" for the kfuncs to move tasks into these DSQs made sense. Later, user DSQs were added and scx_bpf_dispatch[_vtime]() updated to be able to insert tasks into any DSQ. The only allowed DSQ to DSQ transfer was from a non-local DSQ to a local DSQ and this operation was named "consume". This was already confusing as a task could be dispatched to a user DSQ from ops.enqueue() and then the DSQ would have to be consumed in ops.dispatch(). Later addition of scx_bpf_dispatch_from_dsq() made the confusion even worse as "dispatch" in this context meant moving a task to an arbitrary DSQ from a user DSQ. Clean up the API with the following renames: 1. scx_bpf_dispatch[_vtime]() -> scx_bpf_dsq_insert[_vtime]() 2. scx_bpf_consume() -> scx_bpf_dsq_move_to_local() 3. scx_bpf_dispatch[_vtime]_from_dsq() -> scx_bpf_dsq_move[_vtime]() This patch performs the third set of renames. Compatibility is maintained by: - The previous kfunc names are still provided by the kernel so that old binaries can run. Kernel generates a warning when the old names are used. - compat.bpf.h provides wrappers for the new names which automatically fall back to the old names when running on older kernels. They also trigger build error if old names are used for new builds. - scx_bpf_dispatch[_vtime]_from_dsq() were already wrapped in __COMPAT macros as they were introduced during v6.12 cycle. Wrap new API in __COMPAT macros too and trigger build errors on both __COMPAT prefixed and naked usages of the old names. The compat features will be dropped after v6.15. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com> Acked-by: Johannes Bechberger <me@mostlynerdless.de> Acked-by: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Dan Schatzberg <dschatzberg@meta.com> Cc: Ming Yang <yougmark94@gmail.com>
2024-11-11	sched_ext: Rename scx_bpf_consume() to scx_bpf_dsq_move_to_local()	Tejun Heo
	In sched_ext API, a repeatedly reported pain point is the overuse of the verb "dispatch" and confusion around "consume": - ops.dispatch() - scx_bpf_dispatch[_vtime]() - scx_bpf_consume() - scx_bpf_dispatch[_vtime]_from_dsq() This overloading of the term is historical. Originally, there were only built-in DSQs and moving a task into a DSQ always dispatched it for execution. Using the verb "dispatch" for the kfuncs to move tasks into these DSQs made sense. Later, user DSQs were added and scx_bpf_dispatch[_vtime]() updated to be able to insert tasks into any DSQ. The only allowed DSQ to DSQ transfer was from a non-local DSQ to a local DSQ and this operation was named "consume". This was already confusing as a task could be dispatched to a user DSQ from ops.enqueue() and then the DSQ would have to be consumed in ops.dispatch(). Later addition of scx_bpf_dispatch_from_dsq() made the confusion even worse as "dispatch" in this context meant moving a task to an arbitrary DSQ from a user DSQ. Clean up the API with the following renames: 1. scx_bpf_dispatch[_vtime]() -> scx_bpf_dsq_insert[_vtime]() 2. scx_bpf_consume() -> scx_bpf_dsq_move_to_local() 3. scx_bpf_dispatch[_vtime]_from_dsq() -> scx_bpf_dsq_move[_vtime]() This patch performs the second rename. Compatibility is maintained by: - The previous kfunc names are still provided by the kernel so that old binaries can run. Kernel generates a warning when the old names are used. - compat.bpf.h provides wrappers for the new names which automatically fall back to the old names when running on older kernels. They also trigger build error if old names are used for new builds. The compat features will be dropped after v6.15. v2: Comment and documentation updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com> Acked-by: Johannes Bechberger <me@mostlynerdless.de> Acked-by: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Dan Schatzberg <dschatzberg@meta.com> Cc: Ming Yang <yougmark94@gmail.com>
2024-11-11	sched_ext: Rename scx_bpf_dispatch[_vtime]() to scx_bpf_dsq_insert[_vtime]()	Tejun Heo
	In sched_ext API, a repeatedly reported pain point is the overuse of the verb "dispatch" and confusion around "consume": - ops.dispatch() - scx_bpf_dispatch[_vtime]() - scx_bpf_consume() - scx_bpf_dispatch[_vtime]_from_dsq() This overloading of the term is historical. Originally, there were only built-in DSQs and moving a task into a DSQ always dispatched it for execution. Using the verb "dispatch" for the kfuncs to move tasks into these DSQs made sense. Later, user DSQs were added and scx_bpf_dispatch[_vtime]() updated to be able to insert tasks into any DSQ. The only allowed DSQ to DSQ transfer was from a non-local DSQ to a local DSQ and this operation was named "consume". This was already confusing as a task could be dispatched to a user DSQ from ops.enqueue() and then the DSQ would have to be consumed in ops.dispatch(). Later addition of scx_bpf_dispatch_from_dsq() made the confusion even worse as "dispatch" in this context meant moving a task to an arbitrary DSQ from a user DSQ. Clean up the API with the following renames: 1. scx_bpf_dispatch[_vtime]() -> scx_bpf_dsq_insert[_vtime]() 2. scx_bpf_consume() -> scx_bpf_dsq_move_to_local() 3. scx_bpf_dispatch[_vtime]_from_dsq() -> scx_bpf_dsq_move[_vtime]() This patch performs the first set of renames. Compatibility is maintained by: - The previous kfunc names are still provided by the kernel so that old binaries can run. Kernel generates a warning when the old names are used. - compat.bpf.h provides wrappers for the new names which automatically fall back to the old names when running on older kernels. They also trigger build error if old names are used for new builds. The compat features will be dropped after v6.15. v2: Documentation updates. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com> Acked-by: Johannes Bechberger <me@mostlynerdless.de> Acked-by: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Dan Schatzberg <dschatzberg@meta.com> Cc: Ming Yang <yougmark94@gmail.com>
2024-11-11	ASoC: max98088: Add headphone mixer switch	Marek Vasut
	Add enable/disable headphone output mixers knob. Signed-off-by: Marek Vasut <marex@denx.de> Link: https://patch.msgid.link/20241111165523.113142-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-11	ASoC: max98088: Add left/right DAC volume control	Marek Vasut
	Add left/right DAC digital volume control knob. Signed-off-by: Marek Vasut <marex@denx.de> Link: https://patch.msgid.link/20241111165435.113086-1-marex@denx.de Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-11	drm/amdkfd: change kfd process kref count at creation	Xiaogang Chen
	kfd process kref count(process->ref) is initialized to 1 by kref_init. After it is created not need to increase its kref. Instad add kfd process kref at kfd process mmu notifier allocation since we already decrease the kref at free_notifier of mmu_notifier_ops, so pair them. When user process opens kfd node multiple times the kfd process kref is increased each time to balance with kfd node close operation. Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amdgpu: Cleanup shift coding style	Advait Dhamorikar
	Improves the coding style by updating bit-shift operations in the amdgpu_jpeg.c driver file. It ensures consistency and avoids potential issues by explicitly using 1U and 1ULL for unsigned and unsigned long long shifts in all relevant instances. Signed-off-by: Advait Dhamorikar <advaitdhamorikar@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amd/amdgpu: Increase MES log buffer to dump mes scratch data	shaoyunl
	MES internal scratch data is useful for mes debug, it can only located in VRAM, change the allocation type and increase size for mes 11 Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Acked-by: Feifei Xu <Feifei.Xu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amdgpu: Implement virt req_ras_err_count	Victor Skvortsov
	Enable RAS late init if VF RAS Telemetry is supported. When enabled, the VF can use this interface to query total RAS error counts from the host. The VF FB access may abruptly end due to a fatal error, therefore the VF must cache and sanitize the input. The Host allows 15 Telemetry messages every 60 seconds, afterwhich the host will ignore any more in-coming telemetry messages. The VF will rate limit its msg calling to once every 5 seconds (12 times in 60 seconds). While the VF is rate limited, it will continue to report the last good cached data. v2: Flip generate report & update statistics order for VF Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Acked-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amdgpu: VF Query RAS Caps from Host if supported	Victor Skvortsov
	If VF RAS Capability support is enabled, guest is able to retrieve the real RAS support from the host. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amdgpu: Add msg handlers for SRIOV RAS Telemetry	Victor Skvortsov
	Add message handlers for RAS telemetry. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amdgpu: Update SRIOV Exchange Headers for RAS Telemetry Support	Victor Skvortsov
	The SRIOV PF/VF Data exchange is extended by 64KB for VF RAS Telemetry data. Add Host RAS Telemetry enable capabilities bitfields. Add a new VF msg REQ_RAS_ERROR_COUNT, the host response data will be populated in the RAS Telemetry region. Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amd/display: 3.2.309	Aric Cyr
	This version brings along the following: - DML2 fixes - DP fixes - DPMS fix - HPD fixes - Misc cleanup - ODM fix - Replay fix - SPL fix Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amd/display: Adjust VSDB parser for replay feature	Rodrigo Siqueira
	At some point, the IEEE ID identification for the replay check in the AMD EDID was added. However, this check causes the following out-of-bounds issues when using KASAN: [ 27.804016] BUG: KASAN: slab-out-of-bounds in amdgpu_dm_update_freesync_caps+0xefa/0x17a0 [amdgpu] [ 27.804788] Read of size 1 at addr ffff8881647fdb00 by task systemd-udevd/383 ... [ 27.821207] Memory state around the buggy address: [ 27.821215] ffff8881647fda00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821224] ffff8881647fda80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821234] >ffff8881647fdb00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 27.821243] ^ [ 27.821250] ffff8881647fdb80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 27.821259] ffff8881647fdc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 27.821268] ================================================================== This is caused because the ID extraction happens outside of the range of the edid lenght. This commit addresses this issue by considering the amd_vsdb_block size. Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com> Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amd/display: Remove unused code	Rodrigo Siqueira
	This commit removes a legacy debug_defaults_diags struct. Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	drm/amd/display: Require minimum VBlank size for stutter optimization	Dillon Varone
	If the nominal VBlank is too small, optimizing for stutter can cause the prefetch bandwidth to increase drasticaly, resulting in higher clock and power requirements. Only optimize if it is >3x the stutter latency. Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11	nvme-multipath: don't bother clearing max_hw_zone_append_sectors	Christoph Hellwig
	The limits stacking now properly zeroes it if at least one of the underlying limits clears it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241108154657.845768-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11	block: pre-calculate max_zone_append_sectors	Christoph Hellwig
	max_zone_append_sectors differs from all other queue limits in that the final value used is not stored in the queue_limits but needs to be obtained using queue_limits_max_zone_append_sectors helper. This not only adds (tiny) extra overhead to the I/O path, but also can be easily forgotten in file system code. Add a new max_hw_zone_append_sectors value to queue_limits which is set by the driver, and calculate max_zone_append_sectors from that and the other inputs in blk_validate_zoned_limits, similar to how max_sectors is calculated to fix this. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241104073955.112324-3-hch@lst.de Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20241108154657.845768-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-11	Merge branch 'refactor-lock-management'	Alexei Starovoitov
	Kumar Kartikeya Dwivedi says: ==================== Refactor lock management This set refactors lock management in the verifier in preparation for spin locks that can be acquired multiple times. In addition to this, unnecessary code special case reference leak logic for callbacks is also dropped, that is no longer necessary. See patches for details. Changelog: ---------- v5 -> v6 v5: https://lore.kernel.org/bpf/20241109225243.2306756-1-memxor@gmail.com * Move active_locks mutation to {acquire,release}_lock_state (Alexei) v4 -> v5 v4: https://lore.kernel.org/bpf/20241109074347.1434011-1-memxor@gmail.com * Make active_locks part of bpf_func_state (Alexei) * Remove unneeded in_callback_fn logic for references v3 -> v4 v3: https://lore.kernel.org/bpf/20241104151716.2079893-1-memxor@gmail.com * Address comments from Alexei * Drop struct bpf_active_lock definition * Name enum type, expand definition to multiple lines * s/REF_TYPE_BPF_LOCK/REF_TYPE_LOCK/g * Change active_lock type to int * Fix type of 'type' in acquire_lock_state * Filter by taking type explicitly in find_lock_state * WARN for default case in refsafe switch statement v2 -> v3 v2: https://lore.kernel.org/bpf/20241103212252.547071-1-memxor@gmail.com * Rebase on bpf-next to resolve merge conflict v1 -> v2 v1: https://lore.kernel.org/bpf/20241103205856.345580-1-memxor@gmail.com * Fix refsafe state comparison to check callback_ref and ptr separately. ==================== Link: https://lore.kernel.org/r/20241109231430.2475236-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	bpf: Drop special callback reference handling	Kumar Kartikeya Dwivedi
	Logic to prevent callbacks from acquiring new references for the program (i.e. leaving acquired references), and releasing caller references (i.e. those acquired in parent frames) was introduced in commit 9d9d00ac29d0 ("bpf: Fix reference state management for synchronous callbacks"). This was necessary because back then, the verifier simulated each callback once (that could potentially be executed N times, where N can be zero). This meant that callbacks that left lingering resources or cleared caller resources could do it more than once, operating on undefined state or leaking memory. With the fixes to callback verification in commit ab5cfac139ab ("bpf: verify callbacks as if they are called unknown number of times"), all of this extra logic is no longer necessary. Hence, drop it as part of this commit. Cc: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241109231430.2475236-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	bpf: Refactor active lock management	Kumar Kartikeya Dwivedi
	When bpf_spin_lock was introduced originally, there was deliberation on whether to use an array of lock IDs, but since bpf_spin_lock is limited to holding a single lock at any given time, we've been using a single ID to identify the held lock. In preparation for introducing spin locks that can be taken multiple times, introduce support for acquiring multiple lock IDs. For this purpose, reuse the acquired_refs array and store both lock and pointer references. We tag the entry with REF_TYPE_PTR or REF_TYPE_LOCK to disambiguate and find the relevant entry. The ptr field is used to track the map_ptr or btf (for bpf_obj_new allocations) to ensure locks can be matched with protected fields within the same "allocation", i.e. bpf_obj_new object or map value. The struct active_lock is changed to an int as the state is part of the acquired_refs array, and we only need active_lock as a cheap way of detecting lock presence. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241109231430.2475236-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: skip the timer_lockup test for single-CPU nodes	Viktor Malik
	The timer_lockup test needs 2 CPUs to work, on single-CPU nodes it fails to set thread affinity to CPU 1 since it doesn't exist: # ./test_progs -t timer_lockup test_timer_lockup:PASS:timer_lockup__open_and_load 0 nsec test_timer_lockup:PASS:pthread_create thread1 0 nsec test_timer_lockup:PASS:pthread_create thread2 0 nsec timer_lockup_thread:PASS:cpu affinity 0 nsec timer_lockup_thread:FAIL:cpu affinity unexpected error: 22 (errno 0) test_timer_lockup:PASS: 0 nsec #406 timer_lockup:FAIL Skip the test if only 1 CPU is available. Signed-off-by: Viktor Malik <vmalik@redhat.com> Fixes: 50bd5a0c658d1 ("selftests/bpf: Add timer lockup selftest") Tested-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241107115231.75200-1-vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	Merge branch 'fix-lockdep-warning-for-htab-of-map'	Alexei Starovoitov
	Hou Tao says: ==================== The patch set fixes a lockdep warning for htab of map. The warning is found when running test_maps. The warning occurs when htab_put_fd_value() attempts to acquire map_idr_lock to free the map id of the inner map while already holding the bucket lock (raw_spinlock_t). The fix moves the invocation of free_htab_elem() after htab_unlock_bucket() and adds a test case to verify the solution. ==================== Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: Test the update operations for htab of maps	Hou Tao
	Add test cases to verify the following four update operations on htab of maps don't trigger lockdep warning: (1) add then delete (2) add, overwrite, then delete (3) add, then lookup_and_delete (4) add two elements, then lookup_and_delete_batch Test cases are added for pre-allocated and non-preallocated htab of maps respectively. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241106063542.357743-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: Move ENOTSUPP from bpf_util.h	Hou Tao
	Moving the definition of ENOTSUPP into bpf_util.h to remove the duplicated definitions in multiple files. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241106063542.357743-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	bpf: Call free_htab_elem() after htab_unlock_bucket()	Hou Tao
	For htab of maps, when the map is removed from the htab, it may hold the last reference of the map. bpf_map_fd_put_ptr() will invoke bpf_map_free_id() to free the id of the removed map element. However, bpf_map_fd_put_ptr() is invoked while holding a bucket lock (raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock (spinlock_t), triggering the following lockdep warning: ============================= [ BUG: Invalid wait context ] 6.11.0-rc4+ #49 Not tainted ----------------------------- test_maps/4881 is trying to lock: ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70 other info that might help us debug this: context-{5:5} 2 locks held by test_maps/4881: #0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270 #1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80 stack backtrace: CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... Call Trace: <TASK> dump_stack_lvl+0x6e/0xb0 dump_stack+0x10/0x20 __lock_acquire+0x73e/0x36c0 lock_acquire+0x182/0x450 _raw_spin_lock_irqsave+0x43/0x70 bpf_map_free_id.part.0+0x21/0x70 bpf_map_put+0xcf/0x110 bpf_map_fd_put_ptr+0x9a/0xb0 free_htab_elem+0x69/0xe0 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 htab_map_update_elem+0x50f/0xa80 bpf_fd_htab_map_update_elem+0x131/0x270 bpf_map_update_value+0x266/0x380 __sys_bpf+0x21bb/0x36b0 __x64_sys_bpf+0x45/0x60 x64_sys_call+0x1b2a/0x20d0 do_syscall_64+0x5d/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e One way to fix the lockdep warning is using raw_spinlock_t for map_idr_lock as well. However, bpf_map_alloc_id() invokes idr_alloc_cyclic() after acquiring map_idr_lock, it will trigger a similar lockdep warning because the slab's lock (s->cpu_slab->lock) is still a spinlock. Instead of changing map_idr_lock's type, fix the issue by invoking htab_put_fd_value() after htab_unlock_bucket(). However, only deferring the invocation of htab_put_fd_value() is not enough, because the old map pointers in htab of maps can not be saved during batched deletion. Therefore, also defer the invocation of free_htab_elem(), so these to-be-freed elements could be linked together similar to lru map. There are four callers for ->map_fd_put_ptr: (1) alloc_htab_elem() (through htab_put_fd_value()) It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of htab_put_fd_value() can not simply move after htab_unlock_bucket(), because the old element has already been stashed in htab->extra_elems. It may be reused immediately after htab_unlock_bucket() and the invocation of htab_put_fd_value() after htab_unlock_bucket() may release the newly-added element incorrectly. Therefore, saving the map pointer of the old element for htab of maps before unlocking the bucket and releasing the map_ptr after unlock. Beside the map pointer in the old element, should do the same thing for the special fields in the old element as well. (2) free_htab_elem() (through htab_put_fd_value()) Its caller includes __htab_map_lookup_and_delete_elem(), htab_map_delete_elem() and __htab_map_lookup_and_delete_batch(). For htab_map_delete_elem(), simply invoke free_htab_elem() after htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just like lru map, linking the to-be-freed element into node_to_free list and invoking free_htab_elem() for these element after unlock. It is safe to reuse batch_flink as the link for node_to_free, because these elements have been removed from the hash llist. Because htab of maps doesn't support lookup_and_delete operation, __htab_map_lookup_and_delete_elem() doesn't have the problem, so kept it as is. (3) fd_htab_map_free() It invokes ->map_fd_put_ptr without raw_spinlock_t. (4) bpf_fd_htab_map_update_elem() It invokes ->map_fd_put_ptr without raw_spinlock_t. After moving free_htab_elem() outside htab bucket lock scope, using pcpu_freelist_push() instead of __pcpu_freelist_push() to disable the irq before freeing elements, and protecting the invocations of bpf_mem_cache_free() with migrate_{disable\|enable} pair. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241106063542.357743-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	Merge branch 'bpf-add-uprobe-session-support'	Andrii Nakryiko
	Jiri Olsa says: ==================== bpf: Add uprobe session support hi, this patchset is adding support for session uprobe attachment and using it through bpf link for bpf programs. The session means that the uprobe consumer is executed on entry and return of probed function with additional control: - entry callback can control execution of the return callback - entry and return callbacks can share data/cookie Uprobe changes (on top of perf/core [1] are posted in here [2]. This patchset is based on bpf-next/master and will be merged once we pull [2] in bpf-next/master. v9 changes: - rebased on bpf-next/master with perf/core tag merged (thanks Peter!) thanks, jirka [1] git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/core [2] https://lore.kernel.org/bpf/20241018202252.693462-1-jolsa@kernel.org/T/#ma43c549c4bf684ca1b17fa638aa5e7cbb46893e9 --- ==================== Link: https://lore.kernel.org/r/20241108134544.480660-1-jolsa@kernel.org Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: Add threads to consumer test	Jiri Olsa
	With recent uprobe fix [1] the sync time after unregistering uprobe is much longer and prolongs the consumer test which creates and destroys hundreds of uprobes. This change adds 16 threads (which fits the test logic) and speeds up the test. Before the change: # perf stat --null ./test_progs -t uprobe_multi_test/consumers #421/9 uprobe_multi_test/consumers:OK #421 uprobe_multi_test:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Performance counter stats for './test_progs -t uprobe_multi_test/consumers': 28.818778973 seconds time elapsed 0.745518000 seconds user 0.919186000 seconds sys After the change: # perf stat --null ./test_progs -t uprobe_multi_test/consumers 2>&1 #421/9 uprobe_multi_test/consumers:OK #421 uprobe_multi_test:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Performance counter stats for './test_progs -t uprobe_multi_test/consumers': 3.504790814 seconds time elapsed 0.012141000 seconds user 0.751760000 seconds sys [1] commit 87195a1ee332 ("uprobes: switch to RCU Tasks Trace flavor for better performance") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-14-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe sessions to consumer test	Jiri Olsa
	Adding uprobe session consumers to the consumer test, so we get the session into the test mix. In addition scaling down the test to have just 1 uprobe and 1 uretprobe, otherwise the test time grows and is unsuitable for CI even with threads. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-13-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session single consumer test	Jiri Olsa
	Testing that the session ret_handler bypass works on single uprobe with multiple consumers, each with different session ignore return value. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-12-jolsa@kernel.org
2024-11-11	selftests/bpf: Add kprobe session verifier test for return value	Jiri Olsa
	Making sure kprobe.session program can return only [0,1] values. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-11-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session verifier test for return value	Jiri Olsa
	Making sure uprobe.session program can return only [0,1] values. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-10-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session recursive test	Jiri Olsa
	Adding uprobe session test that verifies the cookie value is stored properly when single uprobe-ed function is executed recursively. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-9-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session cookie test	Jiri Olsa
	Adding uprobe session test that verifies the cookie value get properly propagated from entry to return program. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-8-jolsa@kernel.org
2024-11-11	selftests/bpf: Add uprobe session test	Jiri Olsa
	Adding uprobe session test and testing that the entry program return value controls execution of the return probe program. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-7-jolsa@kernel.org
2024-11-11	libbpf: Add support for uprobe multi session attach	Jiri Olsa
	Adding support to attach program in uprobe session mode with bpf_program__attach_uprobe_multi function. Adding session bool to bpf_uprobe_multi_opts struct that allows to load and attach the bpf program via uprobe session. the attachment to create uprobe multi session. Also adding new program loader section that allows: SEC("uprobe.session/bpf_fentry_test*") and loads/attaches uprobe program as uprobe session. Adding sleepable hook (uprobe.session.s) as well. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-6-jolsa@kernel.org
2024-11-11	bpf: Add support for uprobe multi session context	Jiri Olsa
	Placing bpf_session_run_ctx layer in between bpf_run_ctx and bpf_uprobe_multi_run_ctx, so the session data can be retrieved from uprobe_multi link. Plus granting session kfuncs access to uprobe session programs. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-5-jolsa@kernel.org
2024-11-11	bpf: Add support for uprobe multi session attach	Jiri Olsa
	Adding support to attach BPF program for entry and return probe of the same function. This is common use case which at the moment requires to create two uprobe multi links. Adding new BPF_TRACE_UPROBE_SESSION attach type that instructs kernel to attach single link program to both entry and exit probe. It's possible to control execution of the BPF program on return probe simply by returning zero or non zero from the entry BPF program execution to execute or not the BPF program on return probe respectively. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-4-jolsa@kernel.org
2024-11-11	bpf: Force uprobe bpf program to always return 0	Jiri Olsa
	As suggested by Andrii make uprobe multi bpf programs to always return 0, so they can't force uprobe removal. Keeping the int return type for uprobe_prog_run, because it will be used in following session changes. Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-3-jolsa@kernel.org
2024-11-11	bpf: Allow return values 0 and 1 for kprobe session	Jiri Olsa
	The kprobe session program can return only 0 or 1, instruct verifier to check for that. Fixes: 535a3692ba72 ("bpf: Add support for kprobe session attach") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241108134544.480660-2-jolsa@kernel.org
2024-11-11	selftests/bpf: Fix uprobe consumer test (again)	Jiri Olsa
	The new uprobe changes bring some new behaviour that we need to reflect in the consumer test. Now pending uprobe instance in the kernel can survive longer and thus might call uretprobe consumer callbacks in some situations in which, previously, such callback would be omitted. We now need to take that into account in uprobe-multi consumer tests. The idea being that uretprobe under test either stayed from before to after (uret_stays + test_bit) or uretprobe instance survived and we have uretprobe active in after (uret_survives + test_bit). uret_survives just states that uretprobe survives if there are any uretprobes both before and after (overlapping or not, doesn't matter) and uprobe was attached before. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241107094337.3848210-1-jolsa@kernel.org
2024-11-11	bpf: Remove trailing whitespace in verifier.rst	Abhinav Saxena
	Remove trailing whitespace in Documentation/bpf/verifier.rst. Signed-off-by: Abhinav Saxena <xandfury@gmail.com> Link: https://lore.kernel.org/r/20241107063708.106340-2-xandfury@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	selftests/bpf: Allow building with extra flags	Viktor Malik
	In order to specify extra compilation or linking flags to BPF selftests, it is possible to set EXTRA_CFLAGS and EXTRA_LDFLAGS from the command line. The problem is that they are not propagated to sub-make calls (runqslower, bpftool, libbpf) and in the better case are not applied, in the worse case cause the entire build fail. Propagate EXTRA_CFLAGS and EXTRA_LDFLAGS to the sub-makes. This, for instance, allows to build selftests as PIE with $ make EXTRA_CFLAGS='-fPIE' EXTRA_LDFLAGS='-pie' Without this change, the command would fail because libbpf.a would not be built with -fPIE and other PIE binaries would not link against it. The only problem is that we have to explicitly provide empty EXTRA_CFLAGS='' and EXTRA_LDFLAGS='' to the builds of kernel modules as we don't want to build modules with flags used for userspace (the above example would fail as kernel doesn't support PIE). Signed-off-by: Viktor Malik <vmalik@redhat.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2024-11-11	block: lift bio_is_zone_append to bio.h	Christoph Hellwig
	Make bio_is_zone_append globally available, because file systems need to use to check for a zone append bio in their end_io handlers to deal with the block layer emulation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20241104062647.91160-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>