summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2025-05-21cgroup: avoid per-cpu allocation of size zero rstat cpu locksJP Kobryn
Subsystem rstat locks are dynamically allocated per-cpu. It was discovered that a panic can occur during this allocation when the lock size is zero. This is the case on non-smp systems, since arch_spinlock_t is defined as an empty struct. Prevent this allocation when !CONFIG_SMP by adding a pre-processor conditional around the affected block. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Reported-by: Klara Modin <klarasmodin@gmail.com> Fixes: 748922dcfabd ("cgroup: use subsystem-specific rstat locks to avoid contention") Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-21cgroup, docs: be specific about bandwidth control of rt processesShashank Balaji
Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19cgroup: document the rstat per-cpu initializationJP Kobryn
The calls to css_rstat_init() occur at different places depending on the context. Document the conditions that determine which point of initialization is used. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19cgroup: helper for checking rstat participation of cssJP Kobryn
There are a few places where a conditional check is performed to validate a given css on its rstat participation. This new helper tries to make the code more readable where this check is performed. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19cgroup: use subsystem-specific rstat locks to avoid contentionJP Kobryn
It is possible to eliminate contention between subsystems when updating/flushing stats by using subsystem-specific locks. Let the existing rstat locks be dedicated to the cgroup base stats and rename them to reflect that. Add similar locks to the cgroup_subsys struct for use with individual subsystems. Lock initialization is done in the new function ss_rstat_init(ss) which replaces cgroup_rstat_boot(void). If NULL is passed to this function, the global base stat locks will be initialized. Otherwise, the subsystem locks will be initialized. Change the existing lock helper functions to accept a reference to a css. Then within these functions, conditionally select the appropriate locks based on the subsystem affiliation of the given css. Add helper functions for this selection routine to avoid repeated code. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19cgroup: use separate rstat trees for each subsystemJP Kobryn
Different subsystems may call cgroup_rstat_updated() within the same cgroup, resulting in a tree of pending updates from multiple subsystems. When one of these subsystems is flushed via cgroup_rstat_flushed(), all other subsystems with pending updates on the tree will also be flushed. Change the paradigm of having a single rstat tree for all subsystems to having separate trees for each subsystem. This separation allows for subsystems to perform flushes without the side effects of other subsystems. As an example, flushing the cpu stats will no longer cause the memory stats to be flushed and vice versa. In order to achieve subsystem-specific trees, change the tree node type from cgroup to cgroup_subsys_state pointer. Then remove those pointers from the cgroup and instead place them on the css. Finally, change update/flush functions to make use of the different node type (css). These changes allow a specific subsystem to be associated with an update or flush. Separate rstat trees will now exist for each unique subsystem. Since updating/flushing will now be done at the subsystem level, there is no longer a need to keep track of updated css nodes at the cgroup level. The list management of these nodes done within the cgroup (rstat_css_list and related) has been removed accordingly. Conditional guards for checking validity of a given css were placed within css_rstat_updated/flush() to prevent undefined behavior occuring from kfunc usage in bpf programs. Guards were also placed within css_rstat_init/exit() in order to help consolidate calls to them. At call sites for all four functions, the existing guards were removed. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19cgroup: compare css to cgroup::self in helper for distingushing cssJP Kobryn
Adjust the implementation of css_is_cgroup() so that it compares the given css to cgroup::self. Rename the function to css_is_self() in order to reflect that. Change the existing css->ss NULL check to a warning in the true branch. Finally, adjust call sites to use the new function name. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-19cgroup: warn on rstat usage by early init subsystemsJP Kobryn
An early init subsystem that attempts to make use of rstat can lead to failures during early boot. The reason for this is the timing in which the css's of the root cgroup have css_online() invoked on them. At the point of this call, there is a stated assumption that a cgroup has "successfully completed all allocations" [0]. An example of a subsystem that relies on the previously mentioned assumption [0] is the memory subsystem. Within its implementation of css_online(), work is queued to asynchronously begin flushing via rstat. In the early init path for a given subsystem, having rstat enabled leads to this sequence: cgroup_init_early() for_each_subsys(ss, ssid) if (ss->early_init) cgroup_init_subsys(ss, true) cgroup_init_subsys(ss, early_init) css = ss->css_alloc(...) init_and_link_css(css, ss, ...) ... online_css(css) online_css(css) ss = css->ss ss->css_online(css) Continuing to use the memory subsystem as an example, the issue with this sequence is that css_rstat_init() has not been called yet. This means there is now a race between the pending async work to flush rstat and the call to css_rstat_init(). So a flush can occur within the given cgroup while the rstat fields are not initialized. Since we are in the early init phase, the rstat fields cannot be initialized because they require per-cpu allocations. So it's not possible to have css_rstat_init() called early enough (before online_css()). This patch treats the combination of early init and rstat the same as as other invalid conditions. [0] Documentation/admin-guide/cgroup-v1/cgroups.rst (section: css_online) Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-05-09cgroup/cpuset: drop useless cpumask_empty() in ↵Yury Norov
compute_effective_exclusive_cpumask() Empty cpumasks can't intersect with any others. Therefore, testing for non-emptyness is useless. Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-25cgroup/rstat: Improve cgroup_rstat_push_children() documentationWaiman Long
The cgroup_rstat_push_children() function converts a set of updated_children lists from different cgroups into a single ordered list of cgroups to be flushed via the rstat_flush_next pointer. The algorithm used isn't that well illustrated and it takes time to grasp what it is doing. Improve the embedded documentation and variable names to better illustrate the transformation process and make the code easier to understand. Also cgroup_rstat_lock must be held for the whole duration from where the rstat_flush_next list is being constructed in cgroup_rstat_push_children() to when it is consumed later in css_rstat_flush(). Otherwise, list corruption can happen leading to system crash as reported in [1]. In this particular case, the branch being used has commit 093c8812de2d ("cgroup: rstat: Cleanup flushing functions and locking") which breaks this rule, but is missing the fix commit 7d6c63c31914 ("cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lock") that fixes it. This patch has no functional change. [1] https://lore.kernel.org/lkml/BY5PR04MB68495E9E8A46CA9614D62669BCBB2@BY5PR04MB6849.namprd04.prod.outlook.com/ Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-24cgroup: fix goto ordering in cgroup_init()JP Kobryn
Go to the appropriate section labels when css_rstat_init() or psi_cgroup_alloc() fails. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Fixes: a97915559f5c ("cgroup: change rstat function signatures from cgroup-based to css-based") Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-21cgroup: fix pointer check in css_rstat_init()JP Kobryn
In css_rstat_init() allocations are done for the cgroup's pointers rstat_cpu and rstat_base_cpu. Make sure the allocation checks are consistent with what they are allocating. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-07cgroup/cpuset: Add warnings to catch inconsistency in exclusive CPUsWaiman Long
Add WARN_ON_ONCE() statements whenever new exclusive CPUs are being added to a partition root to catch inconsistency in the way exclusive CPUs are being handled in the cpuset code. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-07cgroup/cpuset: Fix obsolete comment in cpuset_css_offline()Waiman Long
Move the partition disablement comment from cpuset_css_offline() to cpuset_css_killed() and reword it to remove the remnant of sched.partition. Suggested-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-07cgroup/cpuset: Always use cpu_active_maskWaiman Long
The current cpuset code uses both cpu_active_mask and cpu_online_mask and it can be confusing which one should be used if we need to update the code. The top_cpuset is always synchronized to cpu_active_mask and we should avoid using cpu_online_mask as much as possible. An active CPU is always an online CPU, but not vice versa. cpu_active_mask and cpu_online_mask can differ during hotplug operations. A CPU is marked active at the last stage of CPU bringup (CPUHP_AP_ACTIVE). It is also the stage where cpuset hotplug code will be called to update the sched domains so that the scheduler can move a normal task to a newly active CPU or remove tasks away from a newly inactivated CPU. The online bit is set much earlier in the CPU bringup process and cleared much later in CPU teardown. If cpu_online_mask is used while a hotunplug operation is happening in parallel, we may leave an offline CPU in cpu_allowed or have a higher chance of leaving an offline CPU in some other masks. Avoid this problem by always using cpu_active_mask in the cpuset code and leave a comment as to why the use of cpu_online_mask is discouraged. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-04cgroup: change rstat function signatures from cgroup-based to css-basedJP Kobryn
This non-functional change serves as preparation for moving to subsystem-based rstat trees. To simplify future commits, change the signatures of existing cgroup-based rstat functions to become css-based and rename them to reflect that. Though the signatures have changed, the implementations have not. Within these functions use the css->cgroup pointer to obtain the associated cgroup and allow code to function the same just as it did before this patch. At applicable call sites, pass the subsystem-specific css pointer as an argument or pass a pointer to cgroup::self if not in subsystem context. Note that cgroup_rstat_updated_list() and cgroup_rstat_push_children() are not altered yet since there would be a larger amount of css to cgroup conversions which may overcomplicate the code at this intermediate phase. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-04cgroup: add helper for checking when css is cgroup::selfJP Kobryn
The cgroup struct has a css field called "self". The main difference between this css and the others found in the cgroup::subsys array is that cgroup::self has a NULL subsystem pointer. There are several places where checks are performed to determine whether the css in question is cgroup::self or not. Instead of accessing css->ss directly, introduce a helper function that shows the intent and use where applicable. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-04cgroup: move rstat base stat objects into their own structJP Kobryn
This non-functional change serves as preparation for moving to subsystem-based rstat trees. The base stats are not an actual subsystem, but in future commits they will have exclusive rstat trees just as other subsystems will. Moving the base stat objects into a new struct allows the cgroup_rstat_cpu struct to become more compact since it now only contains the minimum amount of pointers needed for rstat participation. Subsystems will (in future commits) make use of the compact cgroup_rstat_cpu struct while avoiding the memory overhead of the base stat objects which they will not use. An instance of the new struct cgroup_rstat_base_cpu was placed on the cgroup struct so it can retain ownership of these base stats common to all cgroups. A helper function was added for looking up the cpu-specific base stats of a given cgroup. Finally, initialization and variable names were adjusted where applicable. Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-01cgroup/cpuset: Fix race between newly created partition and dying oneWaiman Long
There is a possible race between removing a cgroup diectory that is a partition root and the creation of a new partition. The partition to be removed can be dying but still online, it doesn't not currently participate in checking for exclusive CPUs conflict, but the exclusive CPUs are still there in subpartitions_cpus and isolated_cpus. These two cpumasks are global states that affect the operation of cpuset partitions. The exclusive CPUs in dying cpusets will only be removed when cpuset_css_offline() function is called after an RCU delay. As a result, it is possible that a new partition can be created with exclusive CPUs that overlap with those of a dying one. When that dying partition is finally offlined, it removes those overlapping exclusive CPUs from subpartitions_cpus and maybe isolated_cpus resulting in an incorrect CPU configuration. This bug was found when a warning was triggered in remote_partition_disable() during testing because the subpartitions_cpus mask was empty. One possible way to fix this is to iterate the dying cpusets as well and avoid using the exclusive CPUs in those dying cpusets. However, this can still cause random partition creation failures or other anomalies due to racing. A better way to fix this race is to reset the partition state at the moment when a cpuset is being killed. Introduce a new css_killed() CSS function pointer and call it, if defined, before setting CSS_DYING flag in kill_css(). Also update the css_is_dying() helper to use the CSS_DYING flag introduced by commit 33c35aa48178 ("cgroup: Prevent kill_css() from being called more than once") for proper synchronization. Add a new cpuset_css_killed() function to reset the partition state of a valid partition root if it is being killed. Fixes: ee8dde0cd2ce ("cpuset: Add new v2 cpuset.sched.partition flag") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-04-01cgroup: rstat: call cgroup_rstat_updated_list with cgroup_rstat_lockShakeel Butt
The commit 093c8812de2d ("cgroup: rstat: Cleanup flushing functions and locking") during cleanup accidentally changed the code to call cgroup_rstat_updated_list() without cgroup_rstat_lock which is required. Fix it. Fixes: 093c8812de2d ("cgroup: rstat: Cleanup flushing functions and locking") Reported-by: Jakub Kicinski <kuba@kernel.org> Reported-by: Breno Leitao <leitao@debian.org> Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/all/6564c3d6-9372-4352-9847-1eb3aea07ca4@linux.ibm.com/ Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31selftest/cgroup: Add a remote partition transition test to test_cpuset_prs.shWaiman Long
The current cgroup directory layout for running the partition state transition tests is mainly suitable for testing local partitions as well as with a mix of local and remote partitions. It is not that suitable for doing extensive remote partition and nested remote/local partition testing. Add a new set of remote partition tests REMOTE_TEST_MATRIX with another cgroup directory structure more tailored for remote partition testing to provide better code coverage. Also add a few new test cases as well as adjusting existig ones for the original TEST_MATRIX. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31selftest/cgroup: Clean up and restructure test_cpuset_prs.shWaiman Long
Cleaning up the test_cpuset_prs.sh script and restructure some of the functions so that a new test matrix with a different cgroup directory structure can be added in the next patch. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31selftest/cgroup: Update test_cpuset_prs.sh to use | as effective CPUs and ↵Waiman Long
state separator Currently, ',' is used as the cgroup separator of the expected effective CPUs and partition root states in the test matrix. However, ',' can be part of the output of the cpuset.cpus*.effective and cpuset.cpus.isolated files. Change the separator to '|' so that ',' can appear as part of the expected values. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31cgroup/cpuset: Remove unneeded goto in sched_partition_write() and rename itWaiman Long
The goto statement in sched_partition_write() is not needed. Remove it and rename sched_partition_write()/sched_partition_show() to cpuset_partition_write()/cpuset_partition_show(). Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31cgroup/cpuset: Code cleanup and comment updateWaiman Long
Rename partition_xcpus_newstate() to isolated_cpus_update(), update_partition_exclusive() to update_partition_exclusive_flag() and the new_xcpus_state variable to isolcpus_updated to make their meanings more explicit. Also add some comments to further clarify the code. No functional change is expected. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31cgroup/cpuset: Don't allow creation of local partition over a remote oneWaiman Long
Currently, we don't allow the creation of a remote partition underneath another local or remote partition. However, it is currently possible to create a new local partition with an existing remote partition underneath it if top_cpuset is the parent. However, the current cpuset code does not set the effective exclusive CPUs correctly to account for those that are taken by the remote partition. Changing the code to properly account for those remote partition CPUs under all possible circumstances can be complex. It is much easier to not allow such a configuration which is not that useful. So forbid that by making sure that exclusive_cpus mask doesn't overlap with subpartitions_cpus and invalidate the partition if that happens. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31cgroup/cpuset: Remove remote_partition_check() & make update_cpumasks_hier() ↵Waiman Long
handle remote partition Currently, changes in exclusive CPUs are being handled in remote_partition_check() by disabling conflicting remote partitions. However, that may lead to results unexpected by the users. Fix this problem by removing remote_partition_check() and making update_cpumasks_hier() handle changes in descendant remote partitions properly. The compute_effective_exclusive_cpumask() function is enhanced to check the exclusive_cpus and effective_xcpus from siblings and excluded them in its effective exclusive CPUs computation and return a value to show if there is any sibling conflicts. This is somewhat like the cpu_exclusive flag check in validate_change(). This is the initial step to enable us to retire the use of cpu_exclusive flag in cgroup v2 in the future. One of the tests in the TEST_MATRIX of the test_cpuset_prs.sh script has to be updated due to changes in the way a child remote partition root is being handled (updated instead of invalidation) in update_cpumasks_hier(). Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31cgroup/cpuset: Fix error handling in remote_partition_disable()Waiman Long
When remote_partition_disable() is called to disable a remote partition, it always sets the partition to an invalid partition state. It should only do so if an error code (prs_err) has been set. Correct that and add proper error code in places where remote_partition_disable() is called due to error. Fixes: 181c8e091aae ("cgroup/cpuset: Introduce remote partition") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31cgroup/cpuset: Fix incorrect isolated_cpus update in ↵Waiman Long
update_parent_effective_cpumask() Before commit f0af1bfc27b5 ("cgroup/cpuset: Relax constraints to partition & cpus changes"), a cpuset partition cannot be enabled if not all the requested CPUs can be granted from the parent cpuset. After that commit, a cpuset partition can be created even if the requested exclusive CPUs contain CPUs not allowed its parent. The delmask containing exclusive CPUs to be removed from its parent wasn't adjusted accordingly. That is not a problem until the introduction of a new isolated_cpus mask in commit 11e5f407b64a ("cgroup/cpuset: Keep track of CPUs in isolated partitions") as the CPUs in the delmask may be added directly into isolated_cpus. As a result, isolated_cpus may incorrectly contain CPUs that are not isolated leading to incorrect data reporting. Fix this by adjusting the delmask to reflect the actual exclusive CPUs for the creation of the partition. Fixes: 11e5f407b64a ("cgroup/cpuset: Keep track of CPUs in isolated partitions") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2025-03-31x86: don't re-generate cpufeaturemasks.h so eagerlyLinus Torvalds
It turns out the code to generate the x86 cpufeaturemasks.h header was way too aggressive, and would re-generate it whenever the timestamp on the kernel config file changed. Now, the regular 'make *config' tools are fairly careful to not rewrite the kernel config file unless the contents change, but other usecases aren't that careful. Michael Kelley reports that 'make-kpkg' ends up doing "make syncconfig" multiple times in prepping to build, and will modify the config file in the process (and then modify it back, but by then the timestamps have changed). Jakub Kicinski reports that the netdev CI does something similar in how it generates the config file in multiple steps. In both cases, the config file timestamp updates then cause the cpufeaturemasks.h file to be regenerated, and that in turn then causes lots of unnecessary rebuilds due to all the normal dependencies. Fix it by using our 'filechk' infrastructure in the Makefile to generate the header file. That will only write a new version of the file if the contents of the file have actually changed. Fixes: 841326332bcb ("x86/cpufeatures: Generate the <asm/cpufeaturemasks.h> header based on build config") Reported-by: Michael Kelley <mhklinux@outlook.com> Reported-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/all/SN6PR02MB415756D1829740F6E8AC11D1D4D82@SN6PR02MB4157.namprd02.prod.outlook.com/ Link: https://lore.kernel.org/all/20250328162311.08134fa6@kernel.org/ Cc: Peter Anvin <hpa@zytor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-03-31Merge tag 'trace-ringbuffer-v6.15-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer updates from Steven Rostedt: - Restructure the persistent memory to have a "scratch" area Instead of hard coding the KASLR offset in the persistent memory by the ring buffer, push that work up to the callers of the persistent memory as they are the ones that need this information. The offsets and such is not important to the ring buffer logic and it should not be part of that. A scratch pad is now created when the caller allocates a ring buffer from persistent memory by stating how much memory it needs to save. - Allow where modules are loaded to be saved in the new scratch pad Save the addresses of modules when they are loaded into the persistent memory scratch pad. - A new module_for_each_mod() helper function was created With the acknowledgement of the module maintainers a new module helper function was created to iterate over all the currently loaded modules. This has a callback to be called for each module. This is needed for when tracing is started in the persistent buffer and the currently loaded modules need to be saved in the scratch area. - Expose the last boot information where the kernel and modules were loaded The last_boot_info file is updated to print out the addresses of where the kernel "_text" location was loaded from a previous boot, as well as where the modules are loaded. If the buffer is recording the current boot, it only prints "# Current" so that it does not expose the KASLR offset of the currently running kernel. - Allow the persistent ring buffer to be released (freed) To have this in production environments, where the kernel command line can not be changed easily, the ring buffer needs to be freed when it is not going to be used. The memory for the buffer will always be allocated at boot up, but if the system isn't going to enable tracing, the memory needs to be freed. Allow it to be freed and added back to the kernel memory pool. - Allow stack traces to print the function names in the persistent buffer Now that the modules are saved in the persistent ring buffer, if the same modules are loaded, the printing of the function names will examine the saved modules. If the module is found in the scratch area and is also loaded, then it will do the offset shift and use kallsyms to display the function name. If the address is not found, it simply displays the address from the previous boot in hex. * tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Use _text and the kernel offset in last_boot_info tracing: Show last module text symbols in the stacktrace ring-buffer: Remove the unused variable bmeta tracing: Skip update_last_data() if cleared and remove active check for save_mod() tracing: Initialize scratch_size to zero to prevent UB tracing: Fix a compilation error without CONFIG_MODULES tracing: Freeable reserved ring buffer mm/memblock: Add reserved memory release function tracing: Update modules to persistent instances when loaded tracing: Show module names and addresses of last boot tracing: Have persistent trace instances save module addresses module: Add module_for_each_mod() function tracing: Have persistent trace instances save KASLR offset ring-buffer: Add ring_buffer_meta_scratch() ring-buffer: Add buffer meta data for persistent ring buffer ring-buffer: Use kaslr address instead of text delta ring-buffer: Fix bytes_dropped calculation issue
2025-03-31Merge tag 'trace-latency-v6.15-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing documentation fix from Steven Rostedt: "Documentation fix for runtime verifier The runtime verifier documents that were created were not referenced in the indices, which caused warning when building the documentation tree. Those documents are now added to the rv indices" * tag 'trace-latency-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation/rv: Add sched pages to the indices
2025-03-31Merge tag 'perf-tools-for-v6.15-2025-03-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools updates from Namhyung Kim: "perf record: - Introduce latency profiling using scheduler information. The latency profiling is to show impacts on wall-time rather than cpu-time. By tracking context switches, it can weight samples and find which part of the code contributed more to the execution latency. The value (period) of the sample is weighted by dividing it by the number of parallel execution at the moment. The parallelism is tracked in perf report with sched-switch records. This will reduce the portion that are run in parallel and in turn increase the portion of serial executions. For now, it's limited to profile processes, IOW system-wide profiling is not supported. You can add --latency option to enable this. $ perf record --latency -- make -C tools/perf I've run the above command for perf build which adds -j option to make with the number of CPUs in the system internally. Normally it'd show something like below: $ perf report -F overhead,comm ... # # Overhead Command # ........ ............... # 78.97% cc1 6.54% python3 4.21% shellcheck 3.28% ld 1.80% as 1.37% cc1plus 0.80% sh 0.62% clang 0.56% gcc 0.44% perl 0.39% make ... The cc1 takes around 80% of the overhead as it's the actual compiler. However it runs in parallel so its contribution to latency may be less than that. Now, perf report will show both overhead and latency (if --latency was given at record time) like below: $ perf report -s comm ... # # Overhead Latency Command # ........ ........ ............... # 78.97% 48.66% cc1 6.54% 25.68% python3 4.21% 0.39% shellcheck 3.28% 13.70% ld 1.80% 2.56% as 1.37% 3.08% cc1plus 0.80% 0.98% sh 0.62% 0.61% clang 0.56% 0.33% gcc 0.44% 1.71% perl 0.39% 0.83% make ... You can see latency of cc1 goes down to around 50% and python3 and ld contribute a lot more than their overhead. You can use --latency option in perf report to get the same result but ordered by latency. $ perf report --latency -s comm perf report: - As a side effect of the latency profiling work, it adds a new output field 'latency' and a sort key 'parallelism'. The below is a result from my system with 64 CPUs. The build was well-parallelized but contained some serial portions. $ perf report -s parallelism ... # # Overhead Latency Parallelism # ........ ........ ........... # 16.95% 1.54% 62 13.38% 1.24% 61 12.50% 70.47% 1 11.81% 1.06% 63 7.59% 0.71% 60 4.33% 12.20% 2 3.41% 0.33% 59 2.05% 0.18% 64 1.75% 1.09% 9 1.64% 1.85% 5 ... - Support Feodra mini-debuginfo which is a LZMA compressed symbol table inside ".gnu_debugdata" ELF section. perf annotate: - Add --code-with-type option to enable data-type profiling with the usual annotate output. Instead of focusing on data structure, it shows code annotation together with data type it accesses in case the instruction refers to a memory location (and it was able to resolve the target data type). Currently it only works with --stdio. $ perf annotate --stdio --code-with-type ... Percent | Source code & Disassembly of vmlinux for cpu/mem-loads,ldlat=30/pp (18 samples, percent: local period) ---------------------------------------------------------------------------------------------------------------------- : 0 0xffffffff81050610 <__fdget>: 0.00 : ffffffff81050610: callq 0xffffffff81c01b80 <__fentry__> # data-type: (stack operation) 0.00 : ffffffff81050615: pushq %rbp # data-type: (stack operation) 0.00 : ffffffff81050616: movq %rsp, %rbp 0.00 : ffffffff81050619: pushq %r15 # data-type: (stack operation) 0.00 : ffffffff8105061b: pushq %r14 # data-type: (stack operation) 0.00 : ffffffff8105061d: pushq %rbx # data-type: (stack operation) 0.00 : ffffffff8105061e: subq $0x10, %rsp 0.00 : ffffffff81050622: movl %edi, %ebx 0.00 : ffffffff81050624: movq %gs:0x7efc4814(%rip), %rax # 0x14e40 <current_task> # data-type: struct task_struct* +0 0.00 : ffffffff8105062c: movq 0x8d0(%rax), %r14 # data-type: struct task_struct +0x8d0 (files) 0.00 : ffffffff81050633: movl (%r14), %eax # data-type: struct files_struct +0 (count.counter) 0.00 : ffffffff81050636: cmpl $0x1, %eax 0.00 : ffffffff81050639: je 0xffffffff810506a9 <__fdget+0x99> 0.00 : ffffffff8105063b: movq 0x20(%r14), %rcx # data-type: struct files_struct +0x20 (fdt) 0.00 : ffffffff8105063f: movl (%rcx), %eax # data-type: struct fdtable +0 (max_fds) 0.00 : ffffffff81050641: cmpl %ebx, %eax 0.00 : ffffffff81050643: jbe 0xffffffff810506ef <__fdget+0xdf> 0.00 : ffffffff81050649: movl %ebx, %r15d 5.56 : ffffffff8105064c: movq 0x8(%rcx), %rdx # data-type: struct fdtable +0x8 (fd) ... The "# data-type:" part was added with this change. The first few entries are not very interesting. But later you can it accesses a couple of fields in the task_struct, files_struct and fdtable. perf trace: - Support syscall tracing for different ABI. For example it can trace system calls for 32-bit applications on 64-bit kernel transparently. - Add --summary-mode=total option to show global syscall summary. The default is 'thread' to show per-thread syscall summary. Python support: - Add more interfaces to 'perf' module to parse events, and config, enable or disable the event list properly so that it can implement basic functionalities purely in Python. There is an example code for these new interfaces in python/tracepoint.py. - Add mypy and pylint support to enable build time checking. Fix some code based on the findings from these tools. Internals: - Introduce io_dir__readdir() API to make directory traveral (usually for proc or sysfs) efficient with less memory footprint. JSON vendor events: - Add events and metrics for ARM Neoverse N3 and V3 - Update events and metrics on various Intel CPUs - Add/update events for a number of SiFive processors" * tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (229 commits) perf bpf-filter: Fix a parsing error with comma perf report: Fix a memory leak for perf_env on AMD perf trace: Fix wrong size to bpf_map__update_elem call perf tools: annotate asm_pure_loop.S perf python: Fix setup.py mypy errors perf test: Address attr.py mypy error perf build: Add pylint build tests perf build: Add mypy build tests perf build: Rename TEST_LOGS to SHELL_TEST_LOGS tools/build: Don't pass test log files to linker perf bench sched pipe: fix enforced blocking reads in worker_thread perf tools: Fix is_compat_mode build break in ppc64 perf build: filter all combinations of -flto for libperl perf vendor events arm64 AmpereOneX: Fix frontend_bound calculation perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by errata perf trace: Fix evlist memory leak perf trace: Fix BTF memory leak perf trace: Make syscall table stable perf syscalltbl: Mask off ABI type for MIPS system calls perf build: Remove Makefile.syscalls ...
2025-03-30Merge tag 'rust-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux Pull Rust updates from Miguel Ojeda: "Toolchain and infrastructure: - Extract the 'pin-init' API from the 'kernel' crate and make it into a standalone crate. In order to do this, the contents are rearranged so that they can easily be kept in sync with the version maintained out-of-tree that other projects have started to use too (or plan to, like QEMU). This will reduce the maintenance burden for Benno, who will now have his own sub-tree, and will simplify future expected changes like the move to use 'syn' to simplify the implementation. - Add '#[test]'-like support based on KUnit. We already had doctests support based on KUnit, which takes the examples in our Rust documentation and runs them under KUnit. Now, we are adding the beginning of the support for "normal" tests, similar to those the '#[test]' tests in userspace Rust. For instance: #[kunit_tests(my_suite)] mod tests { #[test] fn my_test() { assert_eq!(1 + 1, 2); } } Unlike with doctests, the 'assert*!'s do not map to the KUnit assertion APIs yet. - Check Rust signatures at compile time for functions called from C by name. In particular, introduce a new '#[export]' macro that can be placed in the Rust function definition. It will ensure that the function declaration on the C side matches the signature on the Rust function: #[export] pub unsafe extern "C" fn my_function(a: u8, b: i32) -> usize { // ... } The macro essentially forces the compiler to compare the types of the actual Rust function and the 'bindgen'-processed C signature. These cases are rare so far. In the future, we may consider introducing another tool, 'cbindgen', to generate C headers automatically. Even then, having these functions explicitly marked may be a good idea anyway. - Enable the 'raw_ref_op' Rust feature: it is already stable, and allows us to use the new '&raw' syntax, avoiding a couple macros. After everyone has migrated, we will disallow the macros. - Pass the correct target to 'bindgen' on Usermode Linux. - Fix 'rusttest' build in macOS. 'kernel' crate: - New 'hrtimer' module: add support for setting up intrusive timers without allocating when starting the timer. Add support for 'Pin<Box<_>>', 'Arc<_>', 'Pin<&_>' and 'Pin<&mut _>' as pointer types for use with timer callbacks. Add support for setting clock source and timer mode. - New 'dma' module: add a simple DMA coherent allocator abstraction and a test sample driver. - 'list' module: make the linked list 'Cursor' point between elements, rather than at an element, which is more convenient to us and allows for cursors to empty lists; and document it with examples of how to perform common operations with the provided methods. - 'str' module: implement a few traits for 'BStr' as well as the 'strip_prefix()' method. - 'sync' module: add 'Arc::as_ptr'. - 'alloc' module: add 'Box::into_pin'. - 'error' module: extend the 'Result' documentation, including a few examples on different ways of handling errors, a warning about using methods that may panic, and links to external documentation. 'macros' crate: - 'module' macro: add the 'authors' key to support multiple authors. The original key will be kept until everyone has migrated. Documentation: - Add error handling sections. MAINTAINERS: - Add Danilo Krummrich as reviewer of the Rust "subsystem". - Add 'RUST [PIN-INIT]' entry with Benno Lossin as maintainer. It has its own sub-tree. - Add sub-tree for 'RUST [ALLOC]'. - Add 'DMA MAPPING HELPERS DEVICE DRIVER API [RUST]' entry with Abdiel Janulgue as primary maintainer. It will go through the sub-tree of the 'RUST [ALLOC]' entry. - Add 'HIGH-RESOLUTION TIMERS [RUST]' entry with Andreas Hindborg as maintainer. It has its own sub-tree. And a few other cleanups and improvements" * tag 'rust-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (71 commits) rust: dma: add `Send` implementation for `CoherentAllocation` rust: macros: fix `make rusttest` build on macOS rust: block: refactor to use `&raw mut` rust: enable `raw_ref_op` feature rust: uaccess: name the correct function rust: rbtree: fix comments referring to Box instead of KBox rust: hrtimer: add maintainer entry rust: hrtimer: add clocksource selection through `ClockId` rust: hrtimer: add `HrTimerMode` rust: hrtimer: implement `HrTimerPointer` for `Pin<Box<T>>` rust: alloc: add `Box::into_pin` rust: hrtimer: implement `UnsafeHrTimerPointer` for `Pin<&mut T>` rust: hrtimer: implement `UnsafeHrTimerPointer` for `Pin<&T>` rust: hrtimer: add `hrtimer::ScopedHrTimerPointer` rust: hrtimer: add `UnsafeHrTimerPointer` rust: hrtimer: allow timer restart from timer handler rust: str: implement `strip_prefix` for `BStr` rust: str: implement `AsRef<BStr>` for `[u8]` and `BStr` rust: str: implement `Index` for `BStr` rust: str: implement `PartialEq` for `BStr` ...
2025-03-30Merge tag 'modules-6.15-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull modules updates from Petr Pavlu: - Use RCU instead of RCU-sched The mix of rcu_read_lock(), rcu_read_lock_sched() and preempt_disable() in the module code and its users has been replaced with just rcu_read_lock() - The rest of changes are smaller fixes and updates * tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (32 commits) MAINTAINERS: Update the MODULE SUPPORT section module: Remove unnecessary size argument when calling strscpy() module: Replace deprecated strncpy() with strscpy() params: Annotate struct module_param_attrs with __counted_by() bug: Use RCU instead RCU-sched to protect module_bug_list. static_call: Use RCU in all users of __module_text_address(). kprobes: Use RCU in all users of __module_text_address(). bpf: Use RCU in all users of __module_text_address(). jump_label: Use RCU in all users of __module_text_address(). jump_label: Use RCU in all users of __module_address(). x86: Use RCU in all users of __module_address(). cfi: Use RCU while invoking __module_address(). powerpc/ftrace: Use RCU in all users of __module_text_address(). LoongArch: ftrace: Use RCU in all users of __module_text_address(). LoongArch/orc: Use RCU in all users of __module_address(). arm64: module: Use RCU in all users of __module_text_address(). ARM: module: Use RCU in all users of __module_text_address(). module: Use RCU in all users of __module_text_address(). module: Use RCU in all users of __module_address(). module: Use RCU in search_module_extables(). ...
2025-03-30Merge tag 'x86-urgent-2025-03-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes and updates from Ingo Molnar: - Fix a large number of x86 Kconfig dependency and help text accuracy bugs/problems, by Mateusz Jończyk and David Heideberg - Fix a VM_PAT interaction with fork() crash. This also touches core kernel code - Fix an ORC unwinder bug for interrupt entries - Fixes and cleanups - Fix an AMD microcode loader bug that can promote verification failures into success - Add early-printk support for MMIO based UARTs on an x86 board that had no other serial debugging facility and also experienced early boot crashes * tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode/AMD: Fix __apply_microcode_amd()'s return value x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range() x86/fpu: Update the outdated comment above fpstate_init_user() x86/early_printk: Add support for MMIO-based UARTs x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1 x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text x86/Kconfig: Correct X86_X2APIC help text x86/speculation: Remove the extra #ifdef around CALL_NOSPEC x86/Kconfig: Document release year of glibc 2.3.3 x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32 x86/Kconfig: Document CONFIG_PCI_MMCONFIG x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE x86/Kconfig: Enable X86_X2APIC by default and improve help text
2025-03-30Merge tag 'locking-urgent-2025-03-28' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc locking fixes and updates from Ingo Molnar: - Fix a locking self-test FAIL on PREEMPT_RT kernels - Fix nr_unused_locks accounting bug - Simplify the split-lock debugging feature's fast-path * tag 'locking-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class() lockdep: Fix wait context check on softirq for PREEMPT_RT x86/split_lock: Simplify reenabling
2025-03-30Merge tag 'bpf_try_alloc_pages' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf try_alloc_pages() support from Alexei Starovoitov: "The pull includes work from Sebastian, Vlastimil and myself with a lot of help from Michal and Shakeel. This is a first step towards making kmalloc reentrant to get rid of slab wrappers: bpf_mem_alloc, kretprobe's objpool, etc. These patches make page allocator safe from any context. Vlastimil kicked off this effort at LSFMM 2024: https://lwn.net/Articles/974138/ and we continued at LSFMM 2025: https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@mail.gmail.com/ Why: SLAB wrappers bind memory to a particular subsystem making it unavailable to the rest of the kernel. Some BPF maps in production consume Gbytes of preallocated memory. Top 5 in Meta: 1.5G, 1.2G, 1.1G, 300M, 200M. Once we have kmalloc that works in any context BPF map preallocation won't be necessary. How: Synchronous kmalloc/page alloc stack has multiple stages going from fast to slow: cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages -> rmqueue_pcplist -> __rmqueue, where rmqueue_pcplist was already relying on trylock. This set changes rmqueue_bulk/rmqueue_buddy to attempt a trylock and return ENOMEM if alloc_flags & ALLOC_TRYLOCK. It then wraps this functionality into try_alloc_pages() helper. We make sure that the logic is sane in PREEMPT_RT. End result: try_alloc_pages()/free_pages_nolock() are safe to call from any context. try_kmalloc() for any context with similar trylock approach will follow. It will use try_alloc_pages() when slab needs a new page. Though such try_kmalloc/page_alloc() is an opportunistic allocator, this design ensures that the probability of successful allocation of small objects (up to one page in size) is high. Even before we have try_kmalloc(), we already use try_alloc_pages() in BPF arena implementation and it's going to be used more extensively in BPF" * tag 'bpf_try_alloc_pages' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: mm: Fix the flipped condition in gfpflags_allow_spinning() bpf: Use try_alloc_pages() to allocate pages for bpf needs. mm, bpf: Use memcg in try_alloc_pages(). memcg: Use trylock to access memcg stock_lock. mm, bpf: Introduce free_pages_nolock() mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation locking/local_lock: Introduce localtry_lock_t
2025-03-30Merge tag 'bpf_res_spin_lock' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf relisient spinlock support from Alexei Starovoitov: "This patch set introduces Resilient Queued Spin Lock (or rqspinlock with res_spin_lock() and res_spin_unlock() APIs). This is a qspinlock variant which recovers the kernel from a stalled state when the lock acquisition path cannot make forward progress. This can occur when a lock acquisition attempt enters a deadlock situation (e.g. AA, or ABBA), or more generally, when the owner of the lock (which we’re trying to acquire) isn’t making forward progress. Deadlock detection is the main mechanism used to provide instant recovery, with the timeout mechanism acting as a final line of defense. Detection is triggered immediately when beginning the waiting loop of a lock slow path. Additionally, BPF programs attached to different parts of the kernel can introduce new control flow into the kernel, which increases the likelihood of deadlocks in code not written to handle reentrancy. There have been multiple syzbot reports surfacing deadlocks in internal kernel code due to the diverse ways in which BPF programs can be attached to different parts of the kernel. By switching the BPF subsystem’s lock usage to rqspinlock, all of these issues are mitigated at runtime. This spin lock implementation allows BPF maps to become safer and remove mechanisms that have fallen short in assuring safety when nesting programs in arbitrary ways in the same context or across different contexts. We run benchmarks that stress locking scalability and perform comparison against the baseline (qspinlock). For the rqspinlock case, we replace the default qspinlock with it in the kernel, such that all spin locks in the kernel use the rqspinlock slow path. As such, benchmarks that stress kernel spin locks end up exercising rqspinlock. More details in the cover letter in commit 6ffb9017e932 ("Merge branch 'resilient-queued-spin-lock'")" * tag 'bpf_res_spin_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (24 commits) selftests/bpf: Add tests for rqspinlock bpf: Maintain FIFO property for rqspinlock unlock bpf: Implement verifier support for rqspinlock bpf: Introduce rqspinlock kfuncs bpf: Convert lpm_trie.c to rqspinlock bpf: Convert percpu_freelist.c to rqspinlock bpf: Convert hashtab.c to rqspinlock rqspinlock: Add locktorture support rqspinlock: Add entry to Makefile, MAINTAINERS rqspinlock: Add macros for rqspinlock usage rqspinlock: Add basic support for CONFIG_PARAVIRT rqspinlock: Add a test-and-set fallback rqspinlock: Add deadlock detection and recovery rqspinlock: Protect waiters in trylock fallback from stalls rqspinlock: Protect waiters in queue from stalls rqspinlock: Protect pending bit owners from stalls rqspinlock: Hardcode cond_acquire loops for arm64 rqspinlock: Add support for timeouts rqspinlock: Drop PV and virtualization support rqspinlock: Add rqspinlock.h header ...
2025-03-30Merge tag 'bpf-next-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: "For this merge window we're splitting BPF pull request into three for higher visibility: main changes, res_spin_lock, try_alloc_pages. These are the main BPF changes: - Add DFA-based live registers analysis to improve verification of programs with loops (Eduard Zingerman) - Introduce load_acquire and store_release BPF instructions and add x86, arm64 JIT support (Peilin Ye) - Fix loop detection logic in the verifier (Eduard Zingerman) - Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet) - Add kfunc for populating cpumask bits (Emil Tsalapatis) - Convert various shell based tests to selftests/bpf/test_progs format (Bastien Curutchet) - Allow passing referenced kptrs into struct_ops callbacks (Amery Hung) - Add a flag to LSM bpf hook to facilitate bpf program signing (Blaise Boscaccy) - Track arena arguments in kfuncs (Ihor Solodrai) - Add copy_remote_vm_str() helper for reading strings from remote VM and bpf_copy_from_user_task_str() kfunc (Jordan Rome) - Add support for timed may_goto instruction (Kumar Kartikeya Dwivedi) - Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy) - Reduce bpf_cgrp_storage_busy false positives when accessing cgroup local storage (Martin KaFai Lau) - Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko) - Allow retrieving BTF data with BTF token (Mykyta Yatsenko) - Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix (Song Liu) - Reject attaching programs to noreturn functions (Yafang Shao) - Introduce pre-order traversal of cgroup bpf programs (Yonghong Song)" * tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits) selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid bpf: Fix out-of-bounds read in check_atomic_load/store() libbpf: Add namespace for errstr making it libbpf_errstr bpf: Add struct_ops context information to struct bpf_prog_aux selftests/bpf: Sanitize pointer prior fclose() selftests/bpf: Migrate test_xdp_vlan.sh into test_progs selftests/bpf: test_xdp_vlan: Rename BPF sections bpf: clarify a misleading verifier error message selftests/bpf: Add selftest for attaching fexit to __noreturn functions bpf: Reject attaching fexit/fmod_ret to __noreturn functions bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage bpf: Make perf_event_read_output accessible in all program types. bpftool: Using the right format specifiers bpftool: Add -Wformat-signedness flag to detect format errors selftests/bpf: Test freplace from user namespace libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID bpf: Return prog btf_id without capable check bpf: BPF token support for BPF_BTF_GET_FD_BY_ID bpf, x86: Fix objtool warning for timed may_goto bpf: Check map->record at the beginning of check_and_free_fields() ...
2025-03-29Merge tag 'mailbox-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox Pull mailbox updates from Jassi Brar: "Core: - misc rejig of header includes - minor const fixes Misc: - constify amba_id table pcc: - cleanup and refactoring of shmem and irq handling qcom: - add MSM8226 compatible fsl,mu: - add i.MX94 compatible mediatek: - remove cl in struct cmdq_pkt tegra: - define dimensioning masks in SoC data" * tag 'mailbox-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: (25 commits) mailbox: Remove unneeded semicolon mailbox: pcc: Refactor and simplify check_and_ack() mailbox: pcc: Always map the shared memory communication address mailbox: pcc: Refactor error handling in irq handler into separate function mailbox: pcc: Use acpi_os_ioremap() instead of ioremap() mailbox: pcc: Return early if no GAS register from pcc_mbox_cmd_complete_check mailbox: pcc: Drop unnecessary endianness conversion of pcc_hdr.flags mailbox: pcc: Always clear the platform ack interrupt first mailbox: pcc: Fix the possible race in updation of chan_in_use flag dt-bindings: mailbox: qcom: add compatible for MSM8226 SoC dt-bindings: mailbox: fsl,mu: Add i.MX94 compatible MAINTAINERS: add mailbox API's tree type and location mailbox: remove unused header files mailbox: explicitly include <linux/bits.h> mailbox: sort headers alphabetically mailbox: don't protect of_parse_phandle_with_args with con_mutex mailbox: use error ret code of of_parse_phandle_with_args() mailbox: arm_mhuv2: Constify amba_id table mailbox: arm_mhu_db: Constify amba_id table mailbox: arm_mhu: Constify amba_id table ...
2025-03-29Merge tag 'hsi-for-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi Pull HSI update from Sebastian Reichel: - ssi_protocol: fix potential use after free after module removal * tag 'hsi-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: HSI: ssi_protocol: Fix use after free vulnerability in ssi_protocol Driver Due to Race Condition
2025-03-29Merge tag 'for-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply Pull power supply and reset updates from Sebastian Reichel: "Power-supply core: - remove unused set_charged infrastructure - drop of_node from power_supply struct Power-supply drivers: - axp717: support devices without thermistors - bq27xxx: support max design voltage for bq270x0 and bq27x10 - pcf50633: drop charger driver - max1720x: add battery health support - switch all power-supply devices from of_node to fwnode - convert regmap users to maple tree register cache - convert drivers to devm_kmemdup_array - misc cleanups and fixes Reset drivers: - at91-sama5d2_shdwc: add sama7d65 support * tag 'for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (30 commits) power: supply: mt6370: Remove redundant 'flush_workqueue()' calls Revert "power: supply: bq27xxx: do not report bogus zero values" power: supply: max77693: Fix wrong conversion of charge input threshold value power: supply: pcf50633: Remove charger power: supply: all: switch psy_cfg from of_node to fwnode power: supply: core: get rid of of_node power: reset: at91-sama5d2_shdwc: Add sama7d65 PMC power: supply: smb347: convert to use maple tree register cache power: supply: rt9455: convert to use maple tree register cache power: supply: max1720x: convert to use maple tree register cache power: supply: ltc4162l: convert to use maple tree register cache power: supply: bq25980: convert to use maple tree register cache power: supply: bq25890: convert to use maple tree register cache power: supply: bq2515x: convert to use maple tree register cache power: supply: bq24257: convert to use maple tree register cache power: supply: bd99954: convert to use maple tree register cache power: supply: Remove unused set_charged method power: supply: ds2760: Remove unused ds2760_battery_set_charged power: supply: core: Remove unused power_supply_set_battery_charged power: supply: sc27xx: use devm_kmemdup_array() ...
2025-03-29Merge tag 'clk-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "Here's the pile of clk driver patches. The usual suspects^Wsilicon vendors are all here, adding new SoC support and fixing existing code. There are a few patches to the clk framework here as well. They've been baking in linux-next for weeks so I'm hoping we don't have to revert them. The disable OF node patch is probably the scariest one although it seems unlikely that a system would be relying on a driver _not_ probing because the clk never appeared, but you never know. Nothing looks out of the ordinary on the driver side but that's because it's mostly a bunch of data. Core: - Use dev_err_probe() in the clk registration path (Peering into the crystal ball shows many patches that remove printks) - Check for disabled OF nodes in of_clk_get_hw_from_clkspec() New Drivers: - Allwinner A523/T527 clk driver - Qualcomm IPQ9574 NSS clk driver - Qualcomm QCS8300 GPU and video clk drivers - Qualcomm SDM429 RPM clks - Qualcomm QCM6490 LPASS (low power audio) resets - Samsung Exynos2200: driver for several clock controllers (Alive, CMGP, HSI, PERIC/PERIS, TOP, UFS and VFS) - Samsung Exynos7870: Driver for several clock controllers (Alive, MIF, DISP AUD, FSYS, G3D, ISP, MFC and PERI) - Rockchip rk3528 and rk3562 clk driver Updates: - Various fixes to SoC clk drivers for incorrect data, avoid touching protected registers, etc. - Additions for some missing clks in existing SoC clk drivers - DT schema conversions from text to YAML - Kconfig cleanups to allow drivers to be compiled on moar architectures" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits) clk: qcom: Add NSS clock Controller driver for IPQ9574 clk: qcom: gcc-ipq9574: Add support for gpll0_out_aux clock dt-bindings: clock: Add ipq9574 NSSCC clock and reset definitions dt-bindings: clock: gcc-ipq9574: Add definition for GPLL0_OUT_AUX clk: qcom: gcc-msm8953: fix stuck venus0_core0 clock clk: qcom: mmcc-sdm660: fix stuck video_subcore0 clock dt-bindings: clock: qcom,x1e80100-camcc: Fix the list of required-opps clk: amlogic: a1: fix a typo clk: amlogic: gxbb: drop non existing 32k clock parent clk: amlogic: gxbb: drop incorrect flag on 32k clock clk: amlogic: g12b: fix cluster A parent data clk: amlogic: g12a: fix mmc A peripheral clock dt-bindings: clocks: atmel,at91rm9200-pmc: add missing compatibles dt-bindings: reset: fix double id on rk3562-cru reset ids drivers: clk: qcom: ipq5424: fix the freq table of sdcc1_apps clock clk: qcom: lpassaudiocc-sc7280: Add support for LPASS resets for QCM6490 dt-bindings: clock: qcom: Add compatible for QCM6490 boards clk: qcom: gdsc: Update the status poll timeout for GDSC clk: qcom: gdsc: Set retain_ff before moving to HW CTRL clk: davinci: remove support for da830 ...
2025-03-29Merge tag 'rproc-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux Pull remoteproc updates from Bjorn Andersson: - Transition the i.MX8MP DSP remoteproc driver to use the reset framework for driving the run/stall reset bits - Add support for managing the modem remoteprocessor on the Qualcomm MSM8226, MSM8926, and SM8750 platforms * tag 'rproc-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (28 commits) remoteproc: qcom_q6v5_pas: Make single-PD handling more robust remoteproc: qcom_q6v5_pas: Use resource with CX PD for MSM8226 remoteproc: core: Clear table_sz when rproc_shutdown remoteproc: sysmon: Update qcom_add_sysmon_subdev() comment dt-bindings: remoteproc: Consolidate SC8180X and SM8150 PAS files irqdomain: remoteproc: Switch to of_fwnode_handle() remoteproc: qcom: pas: add minidump_id to SC7280 WPSS remoteproc: imx_dsp_rproc: Document run_stall struct member remoteproc: qcom: pas: Add SM8750 MPSS dt-bindings: remoteproc: Add SM8750 MPSS imx_dsp_rproc: Use reset controller API to control the DSP reset: imx8mp-audiomix: Add support for DSP run/stall reset: imx8mp-audiomix: Introduce active_low configuration option reset: imx8mp-audiomix: Prepare the code for more reset bits reset: imx8mp-audiomix: Add prefix for internal macro dt-bindings: dsp: fsl,dsp: Add resets property dt-bindings: reset: audiomix: Add reset ids for EARC and DSP remoteproc: qcom_wcnss: Handle platforms with only single power domain dt-bindings: remoteproc: qcom,wcnss-pil: Add support for single power-domain platforms remoteproc: qcom_q6v5_mss: Add modem support on MSM8926 ...
2025-03-29Merge tag 'hwlock-v6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux Pull hwspinlock updates from Bjorn Andersson: "Drop a few unused functions from the hwspinlock framework" * tag 'hwlock-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: hwspinlock: Remove unused hwspin_lock_get_id() hwspinlock: Remove unused (devm_)hwspin_lock_request()
2025-03-29Merge tag 'pinctrl-v6.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control updates from Linus Walleij: "Core changes: - None really. New drivers: - AMD ISP411 "AMD ISP" driver - Exynos 2200 and 7870 SoC subdrivers - Sophgo RISC-V SG2042 and SG2044 subdrivers - Amlogic A4 subdriver - Rockchip RK3528 subdriver - Broadcom BCM21664 subdriver - Allwinner A523/T527 subdriver - Ingenic X1600 subdriver - Microchip SAMA7D65 subdriver, essentially a re-branded Atmel AT91 PIO4 driver, but nowadays a Microschip SoC line Improvements: - Bring in the devm_kmemdup_array() helper and use it throughout, also bring in changes to other subsystems for this to establish this helper - Support EGPIO on the Qualcomm SA8775P SoC - Extend EINT support in the Mediatek driver" * tag 'pinctrl-v6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (101 commits) pinctrl: mediatek: Add EINT support for multiple addresses pinctrl: amlogic-a4: Drop surplus semicolon pinctrl: nuvoton: Reduce use of OF-specific APIs pinctrl: nuvoton: Convert to use struct group_desc pinctrl: nuvoton: Make use of struct pinfunction and PINCTRL_PINFUNCTION() pinctrl: nuvoton: Convert to use struct pingroup and PINCTRL_PINGROUP() pinctrl: npcm8xx: Fix incorrect struct npcm8xx_pincfg assignment pinctrl: tegra: Fix off by one in tegra_pinctrl_get_group() pinctrl: PINCTRL_AMDISP should depend on DRM_AMD_ISP pinctrl: qcom: sa8775p: Enable egpio function dt-bindings: pinctrl: qcom: Add egpio function for sa8775p pinctrl: qcom: tlmm-test: Validate irq_enable delivers edge irqs pinctrl: qcom: Clear latched interrupt status when changing IRQ type dt-bindings: pinctrl: airoha: Add missing gpio-ranges property pinctrl: bcm281xx: Add missing assignment in bcm21664_pinctrl_lock_all() pinctrl: amd: isp411: Fix IS_ERR() vs NULL check in probe() dt-bindings: pinctrl: at91-pio4: add microchip,sama7d65-pinctrl pinctrl: tegra: Set SFIO mode to Mux Register pinctrl-tegra: Restore SFSEL bit when freeing pins pinctrl: tegra: Add descriptions for SoC data fields ...
2025-03-29Merge tag 'backlight-next-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight Pull backlight updates from Lee Jones: - Apple DWI Backlight: - Added devicetree bindings for backlight controllers on Apple's DWI interface. - Added a new driver (apple_dwi_bl) for these controllers found on some Apple mobile devices. - Added MAINTAINERS entries for the new driver. - led_bl: Fixed a locking issue by holding the led_access lock when calling led_sysfs_disable() during device removal to prevent potential warnings. - Removed unnecessary <linux/fb.h> includes from a bunch of drivers. - tdo24m: Removed redundant whitespace in Kconfig description. - pcf50633-backlight: Removed the driver as the underlying pcf50633 MFD and s3c24xx platform support were removed. * tag 'backlight-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: (22 commits) backlight: pcf50633-backlight: Remove unused driver backlight: tdo24m: Eliminate redundant whitespace MAINTAINERS: Add entries for Apple DWI backlight controller backlight: apple_dwi_bl: Add Apple DWI backlight driver dt-bindings: leds: backlight: apple,dwi-bl: Add Apple DWI backlight backlight: led_bl: Hold led_access lock when calling led_sysfs_disable() backlight: wm831x_bl: Do not include <linux/fb.h> backlight: vgg2432a4: Do not include <linux/fb.h> backlight: tps65217_bl: Do not include <linux/fb.h> backlight: max8925_bl: Do not include <linux/fb.h> backlight: lv5207lp: Do not include <linux/fb.h> backlight: locomolcd: Do not include <linux/fb.h> backlight: hp680_bl: Do not include <linux/fb.h> backlight: ep93xx_bl: Do not include <linux/fb.h> backlight: da9052_bl: Do not include <linux/fb.h> backlight: da903x_bl: Do not include <linux/fb.h> backlight: bd6107_bl: Do not include <linux/fb.h> backlight: as3711_bl: Do not include <linux/fb.h> backlight: adp8870_bl: Do not include <linux/fb.h> backlight: adp8860_bl: Do not include <linux/fb.h> ...
2025-03-29Merge tag 'leds-next-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds Pull LED updates from Lee Jones: - pca955x: Add HW blink support, utilizing PWM0. It supports one frequency across all blinking LEDs and falls back to software blink if different frequencies are requested. - trigger: netdev: Allow configuring LED blink interval via .blink_set even when HW offload (.hw_control) is enabled. - led-core: Fix a race condition where a quick LED_OFF followed by another brightness set could leave the LED off incorrectly, especially noticeable after the introduction of the ordered workqueue. - qcom-lpg: Add support for 6-bit PWM resolution alongside the existing 9-bit support. - qcom-lpg: Fix PWM value capping to respect the selected resolution (6-bit or 9-bit) for normal PWMs. - qcom-lpg: Fix PWM value capping to respect the selected resolution for Hi-Res PWMs. - qcom-lpg: Fix calculation of the best period for Hi-Res PWMs to prevent requested duty cycles from exceeding the maximum allowed by the selected resolution. - st1202: Add a check for the error code returned by devm_mutex_init(). - pwm-multicolor: Add a check for the return value of fwnode_property_read_u32(). - st1202: Ensure hardware initialization (st1202_setup) happens before DT node processing (st1202_dt_init). - Kconfig: leds-st1202: Add select LEDS_TRIGGER_PATTERN as it's required by the driver. - lp8860: Drop unneeded explicit assignment to REGCACHE_NONE. - pca955x: Refactor code with helper functions and rename some functions/variables for clarity. - pca955x: Pass driver data pointers instead of the I2C client to helper functions. - pca955x: Optimize probe LED selection logic to reduce I2C operations. - pca955x: Revert the removal of pca95xx_num_led_regs() (renaming it to pca955x_num_led_regs) as it's needed for HW blink support. - st1202: Refactor st1202_led_set() to use the !! operator for boolean conversion. - st1202: Minor spacing and proofreading edits in comments. - Directory Rename: Rename the drivers/leds/simple directory to drivers/leds/simatic as the drivers within are not simple. - mlxcpld: Remove unused include of acpi.h. - nic78bx: Tidy up the ACPI ID table (remove ACPI_PTR, use mod_devicetable.h, remove explicit driver_data initializer). - tlc591xx: Convert text binding to YAML format, add child node constraints, and fix typos/formatting in the example. - qcom-lpg: Document the qcom,pm8937-pwm compatible string as a fallback for qcom,pm8916-pwm. * tag 'leds-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (23 commits) leds: nic78bx: Tidy up ACPI ID table leds: mlxcpld: Remove unused ACPI header inclusion leds: rgb: leds-qcom-lpg: Fix calculation of best period Hi-Res PWMs leds: rgb: leds-qcom-lpg: Fix pwm resolution max for Hi-Res PWMs leds: rgb: leds-qcom-lpg: Fix pwm resolution max for normal PWMs leds: Rename simple directory to simatic leds: Kconfig: leds-st1202: Add select for required LEDS_TRIGGER_PATTERN leds: leds-st1202: Spacing and proofreading editing leds: leds-st1202: Initialize hardware before DT node child operations leds: pwm-multicolor: Add check for fwnode_property_read_u32 leds: rgb: leds-qcom-lpg: Add support for 6-bit PWM resolution leds: Fix LED_OFF brightness race Revert "leds-pca955x: Remove the unused function pca95xx_num_led_regs()" leds: st1202: Refactor st1202_led_set() to use !! operator for boolean conversion dt-bindings: leds: qcom-lpg: Document PM8937 PWM compatible leds: pca955x: Add HW blink support leds: pca955x: Optimize probe LED selection leds: pca955x: Use pointers to driver data rather than I2C client leds: pca955x: Refactor with helper functions and renaming dt-bindings: leds: Convert leds-tlc591xx.txt to yaml format ...
2025-03-29Merge tag 'mfd-next-6.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "Maxim MAX77705: - Added core MFD driver. - Added charger driver. - Added devicetree bindings for the charger and MFD core. - Added Haptic controller support via the input subsystem. - Added LED support. - Added support to simple-mfd-i2c for fuel gauge and hwmon. Samsung S2MPU05 (Exynos7870 PMIC): - Added core MFD support. - Added Regulator support for 21 LDOs and 5 BUCKs. - Added devicetree bindings for regulators and the PMIC core. TI TPS65215 & TPS65214: - Added support to the existing TPS65219 driver. - Added devicetree bindings. STMicroelectronics STM32MP25: - Added support to the stm32-timers MFD driver. - Added devicetree bindings. Congatec Board Controller (CGBC): - Added HWMON support for internal sensors. - Added support for the conga-SA8 module. Microchip LAN969X: - Enabled the at91-usart MFD driver for this architecture. MediaTek MT6359: - Added mfd_cell for mt6359-accdet to allow its driver to probe. Other misc driver updates: - AXP20X (AXP717): Added AXP717_TS_PIN_CFG register to writeable regs for temperature sensor configuration. - SM501: Switched to using BIT() macro to mitigate potential integer overflows in GPIO functions. - ENE KB3930: Added a NULL pointer check for off_gpios during probe to prevent potential dereference. - SYSCON: Added a check for invalid resource size to prevent issues from DT misconfiguration. - CGBC: Corrected signedness issues in cgbc_session_request - intel_soc_pmic_chtdc_ti / intel_soc_pmic_crc: Removed unneeded explicit assignment to REGCACHE_NONE. - ipaq-micro / tps65010: Switched to using str_enable_disable() helpers for clarity and potential size reduction. - upboard-fpga: Removed unnecessary ACPI_PTR() annotation. - max8997: Removed unused max8997_irq_exit() function, using devm_* helpers instead. - lp3943: Dropped unused #include <linux/pwm.h> from the header file. - db8500-prcmu: Removed needless return statements in void APIs. - qnap-mcu: Replaced commas with semicolons between expressions for correctness. - STA2X11: Removed the core MFD driver as the underlying platform support was removed. - EZX-PCAP: Removed the unused pcap_adc_sync function. - PCF50633 (OpenMoko PMIC): Removed the entire driver (core, adc, gpio, irq) as the underlying s3c24xx platform support was removed. Devicetree updates: - Converted fsl,mcu-mpc8349emitx binding to YAML - Added qcom,msm8937-tcsr compatible - Added microchip,sama7d65-flexcom compatible - Added rockchip,rk3528-qos syscon compatible - Added airoha,en7581-pbus-csr syscon compatible - Added microchip,sama7d65-ddr3phy syscon compatible - Added microchip,sama7d65-sfrbu syscon compatible" * tag 'mfd-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (49 commits) mfd: cgbc-core: Add support for conga-SA8 dt-bindings: mfd: syscon: Add microchip,sama7d65-sfrbu dt-bindings: mfd: syscon: Add microchip,sama7d65-ddr3phy mfd: cgbc: Add support for HWMON dt-bindings: mfd: syscon: Add the pbus-csr node for Airoha EN7581 SoC mfd: cgbc-core: Cleanup signedness in cgbc_session_request() mfd: pcf50633: Remove remaining PCF50633 support mfd: pcf50633: Remove unused platform IRQ code mfd: pcF50633-gpio: Remove unused driver mfd: pcf50633-adc: Remove unused driver mfd: qnap-mcu: Convert commas to semicolons in qnap_mcu_exec() mfd: mt6397-core: Add mfd_cell for mt6359-accdet dt-bindings: mfd: syscon: Add rk3528 QoS register compatible dt-bindings: mfd: atmel,sama5d2-flexcom: Add microchip,sama7d65-flexcom mfd: ezx-pcap: Remove unused pcap_adc_sync mfd: db8500-prcmu: Remove needless return in three void APIs mfd: Remove STA2x11 core driver mfd: max77620: Allow building as a module mfd: ene-kb3930: Fix a potential NULL pointer dereference dt-bindings: mfd: qcom,tcsr: Add compatible for MSM8937 ...