Age | Commit message (Collapse) | Author |
|
The calls to css_rstat_init() occur at different places depending on the
context. Document the conditions that determine which point of
initialization is used.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
There are a few places where a conditional check is performed to validate a
given css on its rstat participation. This new helper tries to make the
code more readable where this check is performed.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
It is possible to eliminate contention between subsystems when
updating/flushing stats by using subsystem-specific locks. Let the existing
rstat locks be dedicated to the cgroup base stats and rename them to
reflect that. Add similar locks to the cgroup_subsys struct for use with
individual subsystems.
Lock initialization is done in the new function ss_rstat_init(ss) which
replaces cgroup_rstat_boot(void). If NULL is passed to this function, the
global base stat locks will be initialized. Otherwise, the subsystem locks
will be initialized.
Change the existing lock helper functions to accept a reference to a css.
Then within these functions, conditionally select the appropriate locks
based on the subsystem affiliation of the given css. Add helper functions
for this selection routine to avoid repeated code.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Different subsystems may call cgroup_rstat_updated() within the same
cgroup, resulting in a tree of pending updates from multiple subsystems.
When one of these subsystems is flushed via cgroup_rstat_flushed(), all
other subsystems with pending updates on the tree will also be flushed.
Change the paradigm of having a single rstat tree for all subsystems to
having separate trees for each subsystem. This separation allows for
subsystems to perform flushes without the side effects of other subsystems.
As an example, flushing the cpu stats will no longer cause the memory stats
to be flushed and vice versa.
In order to achieve subsystem-specific trees, change the tree node type
from cgroup to cgroup_subsys_state pointer. Then remove those pointers from
the cgroup and instead place them on the css. Finally, change update/flush
functions to make use of the different node type (css). These changes allow
a specific subsystem to be associated with an update or flush. Separate
rstat trees will now exist for each unique subsystem.
Since updating/flushing will now be done at the subsystem level, there is
no longer a need to keep track of updated css nodes at the cgroup level.
The list management of these nodes done within the cgroup (rstat_css_list
and related) has been removed accordingly.
Conditional guards for checking validity of a given css were placed within
css_rstat_updated/flush() to prevent undefined behavior occuring from kfunc
usage in bpf programs. Guards were also placed within css_rstat_init/exit()
in order to help consolidate calls to them. At call sites for all four
functions, the existing guards were removed.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Adjust the implementation of css_is_cgroup() so that it compares the given
css to cgroup::self. Rename the function to css_is_self() in order to
reflect that. Change the existing css->ss NULL check to a warning in the
true branch. Finally, adjust call sites to use the new function name.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
An early init subsystem that attempts to make use of rstat can lead to
failures during early boot. The reason for this is the timing in which the
css's of the root cgroup have css_online() invoked on them. At the point of
this call, there is a stated assumption that a cgroup has "successfully
completed all allocations" [0]. An example of a subsystem that relies on
the previously mentioned assumption [0] is the memory subsystem. Within its
implementation of css_online(), work is queued to asynchronously begin
flushing via rstat. In the early init path for a given subsystem, having
rstat enabled leads to this sequence:
cgroup_init_early()
for_each_subsys(ss, ssid)
if (ss->early_init)
cgroup_init_subsys(ss, true)
cgroup_init_subsys(ss, early_init)
css = ss->css_alloc(...)
init_and_link_css(css, ss, ...)
...
online_css(css)
online_css(css)
ss = css->ss
ss->css_online(css)
Continuing to use the memory subsystem as an example, the issue with this
sequence is that css_rstat_init() has not been called yet. This means there
is now a race between the pending async work to flush rstat and the call to
css_rstat_init(). So a flush can occur within the given cgroup while the
rstat fields are not initialized.
Since we are in the early init phase, the rstat fields cannot be
initialized because they require per-cpu allocations. So it's not possible
to have css_rstat_init() called early enough (before online_css()). This
patch treats the combination of early init and rstat the same as as other
invalid conditions.
[0] Documentation/admin-guide/cgroup-v1/cgroups.rst (section: css_online)
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
compute_effective_exclusive_cpumask()
Empty cpumasks can't intersect with any others. Therefore, testing for
non-emptyness is useless.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The cgroup_rstat_push_children() function converts a set of
updated_children lists from different cgroups into a single ordered
list of cgroups to be flushed via the rstat_flush_next pointer.
The algorithm used isn't that well illustrated and it takes time to
grasp what it is doing. Improve the embedded documentation and variable
names to better illustrate the transformation process and make the code
easier to understand.
Also cgroup_rstat_lock must be held for the whole duration
from where the rstat_flush_next list is being constructed in
cgroup_rstat_push_children() to when it is consumed later in
css_rstat_flush(). Otherwise, list corruption can happen leading to
system crash as reported in [1]. In this particular case, the branch
being used has commit 093c8812de2d ("cgroup: rstat: Cleanup flushing
functions and locking") which breaks this rule, but is missing the fix
commit 7d6c63c31914 ("cgroup: rstat: call cgroup_rstat_updated_list
with cgroup_rstat_lock") that fixes it.
This patch has no functional change.
[1] https://lore.kernel.org/lkml/BY5PR04MB68495E9E8A46CA9614D62669BCBB2@BY5PR04MB6849.namprd04.prod.outlook.com/
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Go to the appropriate section labels when css_rstat_init() or
psi_cgroup_alloc() fails.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Fixes: a97915559f5c ("cgroup: change rstat function signatures from cgroup-based to css-based")
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
In css_rstat_init() allocations are done for the cgroup's pointers
rstat_cpu and rstat_base_cpu. Make sure the allocation checks are
consistent with what they are allocating.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Add WARN_ON_ONCE() statements whenever new exclusive CPUs are being
added to a partition root to catch inconsistency in the way exclusive
CPUs are being handled in the cpuset code.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Move the partition disablement comment from cpuset_css_offline()
to cpuset_css_killed() and reword it to remove the remnant of
sched.partition.
Suggested-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The current cpuset code uses both cpu_active_mask and cpu_online_mask
and it can be confusing which one should be used if we need to update
the code.
The top_cpuset is always synchronized to cpu_active_mask and we should
avoid using cpu_online_mask as much as possible. An active CPU is always
an online CPU, but not vice versa. cpu_active_mask and cpu_online_mask
can differ during hotplug operations.
A CPU is marked active at the last stage of CPU bringup (CPUHP_AP_ACTIVE).
It is also the stage where cpuset hotplug code will be called to update
the sched domains so that the scheduler can move a normal task to a
newly active CPU or remove tasks away from a newly inactivated CPU. The
online bit is set much earlier in the CPU bringup process and cleared
much later in CPU teardown.
If cpu_online_mask is used while a hotunplug operation is happening in
parallel, we may leave an offline CPU in cpu_allowed or have a higher
chance of leaving an offline CPU in some other masks. Avoid this
problem by always using cpu_active_mask in the cpuset code and leave
a comment as to why the use of cpu_online_mask is discouraged.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This non-functional change serves as preparation for moving to
subsystem-based rstat trees. To simplify future commits, change the
signatures of existing cgroup-based rstat functions to become css-based and
rename them to reflect that.
Though the signatures have changed, the implementations have not. Within
these functions use the css->cgroup pointer to obtain the associated cgroup
and allow code to function the same just as it did before this patch. At
applicable call sites, pass the subsystem-specific css pointer as an
argument or pass a pointer to cgroup::self if not in subsystem context.
Note that cgroup_rstat_updated_list() and cgroup_rstat_push_children()
are not altered yet since there would be a larger amount of css to
cgroup conversions which may overcomplicate the code at this
intermediate phase.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The cgroup struct has a css field called "self". The main difference
between this css and the others found in the cgroup::subsys array is that
cgroup::self has a NULL subsystem pointer. There are several places where
checks are performed to determine whether the css in question is
cgroup::self or not. Instead of accessing css->ss directly, introduce a
helper function that shows the intent and use where applicable.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
This non-functional change serves as preparation for moving to
subsystem-based rstat trees. The base stats are not an actual subsystem,
but in future commits they will have exclusive rstat trees just as other
subsystems will.
Moving the base stat objects into a new struct allows the cgroup_rstat_cpu
struct to become more compact since it now only contains the minimum amount
of pointers needed for rstat participation. Subsystems will (in future
commits) make use of the compact cgroup_rstat_cpu struct while avoiding the
memory overhead of the base stat objects which they will not use.
An instance of the new struct cgroup_rstat_base_cpu was placed on the
cgroup struct so it can retain ownership of these base stats common to all
cgroups. A helper function was added for looking up the cpu-specific base
stats of a given cgroup. Finally, initialization and variable names were
adjusted where applicable.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
There is a possible race between removing a cgroup diectory that is
a partition root and the creation of a new partition. The partition
to be removed can be dying but still online, it doesn't not currently
participate in checking for exclusive CPUs conflict, but the exclusive
CPUs are still there in subpartitions_cpus and isolated_cpus. These
two cpumasks are global states that affect the operation of cpuset
partitions. The exclusive CPUs in dying cpusets will only be removed
when cpuset_css_offline() function is called after an RCU delay.
As a result, it is possible that a new partition can be created with
exclusive CPUs that overlap with those of a dying one. When that dying
partition is finally offlined, it removes those overlapping exclusive
CPUs from subpartitions_cpus and maybe isolated_cpus resulting in an
incorrect CPU configuration.
This bug was found when a warning was triggered in
remote_partition_disable() during testing because the subpartitions_cpus
mask was empty.
One possible way to fix this is to iterate the dying cpusets as well and
avoid using the exclusive CPUs in those dying cpusets. However, this
can still cause random partition creation failures or other anomalies
due to racing. A better way to fix this race is to reset the partition
state at the moment when a cpuset is being killed.
Introduce a new css_killed() CSS function pointer and call it, if
defined, before setting CSS_DYING flag in kill_css(). Also update the
css_is_dying() helper to use the CSS_DYING flag introduced by commit
33c35aa48178 ("cgroup: Prevent kill_css() from being called more than
once") for proper synchronization.
Add a new cpuset_css_killed() function to reset the partition state of
a valid partition root if it is being killed.
Fixes: ee8dde0cd2ce ("cpuset: Add new v2 cpuset.sched.partition flag")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The commit 093c8812de2d ("cgroup: rstat: Cleanup flushing functions and
locking") during cleanup accidentally changed the code to call
cgroup_rstat_updated_list() without cgroup_rstat_lock which is required.
Fix it.
Fixes: 093c8812de2d ("cgroup: rstat: Cleanup flushing functions and locking")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: Breno Leitao <leitao@debian.org>
Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com>
Closes: https://lore.kernel.org/all/6564c3d6-9372-4352-9847-1eb3aea07ca4@linux.ibm.com/
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The current cgroup directory layout for running the partition state
transition tests is mainly suitable for testing local partitions as
well as with a mix of local and remote partitions. It is not that
suitable for doing extensive remote partition and nested remote/local
partition testing.
Add a new set of remote partition tests REMOTE_TEST_MATRIX with another
cgroup directory structure more tailored for remote partition testing
to provide better code coverage.
Also add a few new test cases as well as adjusting existig ones for
the original TEST_MATRIX.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Cleaning up the test_cpuset_prs.sh script and restructure some of the
functions so that a new test matrix with a different cgroup directory
structure can be added in the next patch.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
state separator
Currently, ',' is used as the cgroup separator of the expected effective
CPUs and partition root states in the test matrix. However, ',' can be
part of the output of the cpuset.cpus*.effective and cpuset.cpus.isolated
files. Change the separator to '|' so that ',' can appear as part of
the expected values.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
The goto statement in sched_partition_write() is not needed. Remove
it and rename sched_partition_write()/sched_partition_show() to
cpuset_partition_write()/cpuset_partition_show().
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Rename partition_xcpus_newstate() to isolated_cpus_update(),
update_partition_exclusive() to update_partition_exclusive_flag() and
the new_xcpus_state variable to isolcpus_updated to make their meanings
more explicit. Also add some comments to further clarify the code.
No functional change is expected.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Currently, we don't allow the creation of a remote partition underneath
another local or remote partition. However, it is currently possible to
create a new local partition with an existing remote partition underneath
it if top_cpuset is the parent. However, the current cpuset code does
not set the effective exclusive CPUs correctly to account for those
that are taken by the remote partition.
Changing the code to properly account for those remote partition CPUs
under all possible circumstances can be complex. It is much easier to
not allow such a configuration which is not that useful. So forbid
that by making sure that exclusive_cpus mask doesn't overlap with
subpartitions_cpus and invalidate the partition if that happens.
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
handle remote partition
Currently, changes in exclusive CPUs are being handled in
remote_partition_check() by disabling conflicting remote partitions.
However, that may lead to results unexpected by the users. Fix
this problem by removing remote_partition_check() and making
update_cpumasks_hier() handle changes in descendant remote partitions
properly.
The compute_effective_exclusive_cpumask() function is enhanced to check
the exclusive_cpus and effective_xcpus from siblings and excluded them
in its effective exclusive CPUs computation and return a value to show if
there is any sibling conflicts. This is somewhat like the cpu_exclusive
flag check in validate_change(). This is the initial step to enable us
to retire the use of cpu_exclusive flag in cgroup v2 in the future.
One of the tests in the TEST_MATRIX of the test_cpuset_prs.sh
script has to be updated due to changes in the way a child remote
partition root is being handled (updated instead of invalidation)
in update_cpumasks_hier().
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
When remote_partition_disable() is called to disable a remote partition,
it always sets the partition to an invalid partition state. It should
only do so if an error code (prs_err) has been set. Correct that and
add proper error code in places where remote_partition_disable() is
called due to error.
Fixes: 181c8e091aae ("cgroup/cpuset: Introduce remote partition")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
update_parent_effective_cpumask()
Before commit f0af1bfc27b5 ("cgroup/cpuset: Relax constraints to
partition & cpus changes"), a cpuset partition cannot be enabled if not
all the requested CPUs can be granted from the parent cpuset. After
that commit, a cpuset partition can be created even if the requested
exclusive CPUs contain CPUs not allowed its parent. The delmask
containing exclusive CPUs to be removed from its parent wasn't
adjusted accordingly.
That is not a problem until the introduction of a new isolated_cpus
mask in commit 11e5f407b64a ("cgroup/cpuset: Keep track of CPUs in
isolated partitions") as the CPUs in the delmask may be added directly
into isolated_cpus.
As a result, isolated_cpus may incorrectly contain CPUs that are not
isolated leading to incorrect data reporting. Fix this by adjusting
the delmask to reflect the actual exclusive CPUs for the creation of
the partition.
Fixes: 11e5f407b64a ("cgroup/cpuset: Keep track of CPUs in isolated partitions")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
It turns out the code to generate the x86 cpufeaturemasks.h header was
way too aggressive, and would re-generate it whenever the timestamp on
the kernel config file changed.
Now, the regular 'make *config' tools are fairly careful to not rewrite
the kernel config file unless the contents change, but other usecases
aren't that careful.
Michael Kelley reports that 'make-kpkg' ends up doing "make syncconfig"
multiple times in prepping to build, and will modify the config file in
the process (and then modify it back, but by then the timestamps have
changed).
Jakub Kicinski reports that the netdev CI does something similar in how
it generates the config file in multiple steps.
In both cases, the config file timestamp updates then cause the
cpufeaturemasks.h file to be regenerated, and that in turn then causes
lots of unnecessary rebuilds due to all the normal dependencies.
Fix it by using our 'filechk' infrastructure in the Makefile to generate
the header file. That will only write a new version of the file if the
contents of the file have actually changed.
Fixes: 841326332bcb ("x86/cpufeatures: Generate the <asm/cpufeaturemasks.h> header based on build config")
Reported-by: Michael Kelley <mhklinux@outlook.com>
Reported-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/all/SN6PR02MB415756D1829740F6E8AC11D1D4D82@SN6PR02MB4157.namprd02.prod.outlook.com/
Link: https://lore.kernel.org/all/20250328162311.08134fa6@kernel.org/
Cc: Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:
- Restructure the persistent memory to have a "scratch" area
Instead of hard coding the KASLR offset in the persistent memory by
the ring buffer, push that work up to the callers of the persistent
memory as they are the ones that need this information. The offsets
and such is not important to the ring buffer logic and it should not
be part of that.
A scratch pad is now created when the caller allocates a ring buffer
from persistent memory by stating how much memory it needs to save.
- Allow where modules are loaded to be saved in the new scratch pad
Save the addresses of modules when they are loaded into the
persistent memory scratch pad.
- A new module_for_each_mod() helper function was created
With the acknowledgement of the module maintainers a new module
helper function was created to iterate over all the currently loaded
modules. This has a callback to be called for each module. This is
needed for when tracing is started in the persistent buffer and the
currently loaded modules need to be saved in the scratch area.
- Expose the last boot information where the kernel and modules were
loaded
The last_boot_info file is updated to print out the addresses of
where the kernel "_text" location was loaded from a previous boot, as
well as where the modules are loaded. If the buffer is recording the
current boot, it only prints "# Current" so that it does not expose
the KASLR offset of the currently running kernel.
- Allow the persistent ring buffer to be released (freed)
To have this in production environments, where the kernel command
line can not be changed easily, the ring buffer needs to be freed
when it is not going to be used. The memory for the buffer will
always be allocated at boot up, but if the system isn't going to
enable tracing, the memory needs to be freed. Allow it to be freed
and added back to the kernel memory pool.
- Allow stack traces to print the function names in the persistent
buffer
Now that the modules are saved in the persistent ring buffer, if the
same modules are loaded, the printing of the function names will
examine the saved modules. If the module is found in the scratch area
and is also loaded, then it will do the offset shift and use kallsyms
to display the function name. If the address is not found, it simply
displays the address from the previous boot in hex.
* tag 'trace-ringbuffer-v6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Use _text and the kernel offset in last_boot_info
tracing: Show last module text symbols in the stacktrace
ring-buffer: Remove the unused variable bmeta
tracing: Skip update_last_data() if cleared and remove active check for save_mod()
tracing: Initialize scratch_size to zero to prevent UB
tracing: Fix a compilation error without CONFIG_MODULES
tracing: Freeable reserved ring buffer
mm/memblock: Add reserved memory release function
tracing: Update modules to persistent instances when loaded
tracing: Show module names and addresses of last boot
tracing: Have persistent trace instances save module addresses
module: Add module_for_each_mod() function
tracing: Have persistent trace instances save KASLR offset
ring-buffer: Add ring_buffer_meta_scratch()
ring-buffer: Add buffer meta data for persistent ring buffer
ring-buffer: Use kaslr address instead of text delta
ring-buffer: Fix bytes_dropped calculation issue
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing documentation fix from Steven Rostedt:
"Documentation fix for runtime verifier
The runtime verifier documents that were created were not referenced
in the indices, which caused warning when building the documentation
tree. Those documents are now added to the rv indices"
* tag 'trace-latency-v6.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
Documentation/rv: Add sched pages to the indices
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
Pull perf tools updates from Namhyung Kim:
"perf record:
- Introduce latency profiling using scheduler information.
The latency profiling is to show impacts on wall-time rather than
cpu-time. By tracking context switches, it can weight samples and
find which part of the code contributed more to the execution
latency.
The value (period) of the sample is weighted by dividing it by the
number of parallel execution at the moment. The parallelism is
tracked in perf report with sched-switch records. This will reduce
the portion that are run in parallel and in turn increase the
portion of serial executions.
For now, it's limited to profile processes, IOW system-wide
profiling is not supported. You can add --latency option to enable
this.
$ perf record --latency -- make -C tools/perf
I've run the above command for perf build which adds -j option to
make with the number of CPUs in the system internally. Normally
it'd show something like below:
$ perf report -F overhead,comm
...
#
# Overhead Command
# ........ ...............
#
78.97% cc1
6.54% python3
4.21% shellcheck
3.28% ld
1.80% as
1.37% cc1plus
0.80% sh
0.62% clang
0.56% gcc
0.44% perl
0.39% make
...
The cc1 takes around 80% of the overhead as it's the actual
compiler. However it runs in parallel so its contribution to
latency may be less than that. Now, perf report will show both
overhead and latency (if --latency was given at record time) like
below:
$ perf report -s comm
...
#
# Overhead Latency Command
# ........ ........ ...............
#
78.97% 48.66% cc1
6.54% 25.68% python3
4.21% 0.39% shellcheck
3.28% 13.70% ld
1.80% 2.56% as
1.37% 3.08% cc1plus
0.80% 0.98% sh
0.62% 0.61% clang
0.56% 0.33% gcc
0.44% 1.71% perl
0.39% 0.83% make
...
You can see latency of cc1 goes down to around 50% and python3 and
ld contribute a lot more than their overhead. You can use --latency
option in perf report to get the same result but ordered by
latency.
$ perf report --latency -s comm
perf report:
- As a side effect of the latency profiling work, it adds a new
output field 'latency' and a sort key 'parallelism'. The below is a
result from my system with 64 CPUs. The build was well-parallelized
but contained some serial portions.
$ perf report -s parallelism
...
#
# Overhead Latency Parallelism
# ........ ........ ...........
#
16.95% 1.54% 62
13.38% 1.24% 61
12.50% 70.47% 1
11.81% 1.06% 63
7.59% 0.71% 60
4.33% 12.20% 2
3.41% 0.33% 59
2.05% 0.18% 64
1.75% 1.09% 9
1.64% 1.85% 5
...
- Support Feodra mini-debuginfo which is a LZMA compressed symbol
table inside ".gnu_debugdata" ELF section.
perf annotate:
- Add --code-with-type option to enable data-type profiling with the
usual annotate output.
Instead of focusing on data structure, it shows code annotation
together with data type it accesses in case the instruction refers
to a memory location (and it was able to resolve the target data
type). Currently it only works with --stdio.
$ perf annotate --stdio --code-with-type
...
Percent | Source code & Disassembly of vmlinux for cpu/mem-loads,ldlat=30/pp (18 samples, percent: local period)
----------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff81050610 <__fdget>:
0.00 : ffffffff81050610: callq 0xffffffff81c01b80 <__fentry__> # data-type: (stack operation)
0.00 : ffffffff81050615: pushq %rbp # data-type: (stack operation)
0.00 : ffffffff81050616: movq %rsp, %rbp
0.00 : ffffffff81050619: pushq %r15 # data-type: (stack operation)
0.00 : ffffffff8105061b: pushq %r14 # data-type: (stack operation)
0.00 : ffffffff8105061d: pushq %rbx # data-type: (stack operation)
0.00 : ffffffff8105061e: subq $0x10, %rsp
0.00 : ffffffff81050622: movl %edi, %ebx
0.00 : ffffffff81050624: movq %gs:0x7efc4814(%rip), %rax # 0x14e40 <current_task> # data-type: struct task_struct* +0
0.00 : ffffffff8105062c: movq 0x8d0(%rax), %r14 # data-type: struct task_struct +0x8d0 (files)
0.00 : ffffffff81050633: movl (%r14), %eax # data-type: struct files_struct +0 (count.counter)
0.00 : ffffffff81050636: cmpl $0x1, %eax
0.00 : ffffffff81050639: je 0xffffffff810506a9 <__fdget+0x99>
0.00 : ffffffff8105063b: movq 0x20(%r14), %rcx # data-type: struct files_struct +0x20 (fdt)
0.00 : ffffffff8105063f: movl (%rcx), %eax # data-type: struct fdtable +0 (max_fds)
0.00 : ffffffff81050641: cmpl %ebx, %eax
0.00 : ffffffff81050643: jbe 0xffffffff810506ef <__fdget+0xdf>
0.00 : ffffffff81050649: movl %ebx, %r15d
5.56 : ffffffff8105064c: movq 0x8(%rcx), %rdx # data-type: struct fdtable +0x8 (fd)
...
The "# data-type:" part was added with this change. The first few
entries are not very interesting. But later you can it accesses a
couple of fields in the task_struct, files_struct and fdtable.
perf trace:
- Support syscall tracing for different ABI. For example it can trace
system calls for 32-bit applications on 64-bit kernel
transparently.
- Add --summary-mode=total option to show global syscall summary. The
default is 'thread' to show per-thread syscall summary.
Python support:
- Add more interfaces to 'perf' module to parse events, and config,
enable or disable the event list properly so that it can implement
basic functionalities purely in Python. There is an example code
for these new interfaces in python/tracepoint.py.
- Add mypy and pylint support to enable build time checking. Fix some
code based on the findings from these tools.
Internals:
- Introduce io_dir__readdir() API to make directory traveral (usually
for proc or sysfs) efficient with less memory footprint.
JSON vendor events:
- Add events and metrics for ARM Neoverse N3 and V3
- Update events and metrics on various Intel CPUs
- Add/update events for a number of SiFive processors"
* tag 'perf-tools-for-v6.15-2025-03-27' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (229 commits)
perf bpf-filter: Fix a parsing error with comma
perf report: Fix a memory leak for perf_env on AMD
perf trace: Fix wrong size to bpf_map__update_elem call
perf tools: annotate asm_pure_loop.S
perf python: Fix setup.py mypy errors
perf test: Address attr.py mypy error
perf build: Add pylint build tests
perf build: Add mypy build tests
perf build: Rename TEST_LOGS to SHELL_TEST_LOGS
tools/build: Don't pass test log files to linker
perf bench sched pipe: fix enforced blocking reads in worker_thread
perf tools: Fix is_compat_mode build break in ppc64
perf build: filter all combinations of -flto for libperl
perf vendor events arm64 AmpereOneX: Fix frontend_bound calculation
perf vendor events arm64: AmpereOne/AmpereOneX: Mark LD_RETIRED impacted by errata
perf trace: Fix evlist memory leak
perf trace: Fix BTF memory leak
perf trace: Make syscall table stable
perf syscalltbl: Mask off ABI type for MIPS system calls
perf build: Remove Makefile.syscalls
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Extract the 'pin-init' API from the 'kernel' crate and make it into
a standalone crate.
In order to do this, the contents are rearranged so that they can
easily be kept in sync with the version maintained out-of-tree that
other projects have started to use too (or plan to, like QEMU).
This will reduce the maintenance burden for Benno, who will now
have his own sub-tree, and will simplify future expected changes
like the move to use 'syn' to simplify the implementation.
- Add '#[test]'-like support based on KUnit.
We already had doctests support based on KUnit, which takes the
examples in our Rust documentation and runs them under KUnit.
Now, we are adding the beginning of the support for "normal" tests,
similar to those the '#[test]' tests in userspace Rust. For
instance:
#[kunit_tests(my_suite)]
mod tests {
#[test]
fn my_test() {
assert_eq!(1 + 1, 2);
}
}
Unlike with doctests, the 'assert*!'s do not map to the KUnit
assertion APIs yet.
- Check Rust signatures at compile time for functions called from C
by name.
In particular, introduce a new '#[export]' macro that can be placed
in the Rust function definition. It will ensure that the function
declaration on the C side matches the signature on the Rust
function:
#[export]
pub unsafe extern "C" fn my_function(a: u8, b: i32) -> usize {
// ...
}
The macro essentially forces the compiler to compare the types of
the actual Rust function and the 'bindgen'-processed C signature.
These cases are rare so far. In the future, we may consider
introducing another tool, 'cbindgen', to generate C headers
automatically. Even then, having these functions explicitly marked
may be a good idea anyway.
- Enable the 'raw_ref_op' Rust feature: it is already stable, and
allows us to use the new '&raw' syntax, avoiding a couple macros.
After everyone has migrated, we will disallow the macros.
- Pass the correct target to 'bindgen' on Usermode Linux.
- Fix 'rusttest' build in macOS.
'kernel' crate:
- New 'hrtimer' module: add support for setting up intrusive timers
without allocating when starting the timer. Add support for
'Pin<Box<_>>', 'Arc<_>', 'Pin<&_>' and 'Pin<&mut _>' as pointer
types for use with timer callbacks. Add support for setting clock
source and timer mode.
- New 'dma' module: add a simple DMA coherent allocator abstraction
and a test sample driver.
- 'list' module: make the linked list 'Cursor' point between
elements, rather than at an element, which is more convenient to us
and allows for cursors to empty lists; and document it with
examples of how to perform common operations with the provided
methods.
- 'str' module: implement a few traits for 'BStr' as well as the
'strip_prefix()' method.
- 'sync' module: add 'Arc::as_ptr'.
- 'alloc' module: add 'Box::into_pin'.
- 'error' module: extend the 'Result' documentation, including a few
examples on different ways of handling errors, a warning about
using methods that may panic, and links to external documentation.
'macros' crate:
- 'module' macro: add the 'authors' key to support multiple authors.
The original key will be kept until everyone has migrated.
Documentation:
- Add error handling sections.
MAINTAINERS:
- Add Danilo Krummrich as reviewer of the Rust "subsystem".
- Add 'RUST [PIN-INIT]' entry with Benno Lossin as maintainer. It has
its own sub-tree.
- Add sub-tree for 'RUST [ALLOC]'.
- Add 'DMA MAPPING HELPERS DEVICE DRIVER API [RUST]' entry with
Abdiel Janulgue as primary maintainer. It will go through the
sub-tree of the 'RUST [ALLOC]' entry.
- Add 'HIGH-RESOLUTION TIMERS [RUST]' entry with Andreas Hindborg as
maintainer. It has its own sub-tree.
And a few other cleanups and improvements"
* tag 'rust-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (71 commits)
rust: dma: add `Send` implementation for `CoherentAllocation`
rust: macros: fix `make rusttest` build on macOS
rust: block: refactor to use `&raw mut`
rust: enable `raw_ref_op` feature
rust: uaccess: name the correct function
rust: rbtree: fix comments referring to Box instead of KBox
rust: hrtimer: add maintainer entry
rust: hrtimer: add clocksource selection through `ClockId`
rust: hrtimer: add `HrTimerMode`
rust: hrtimer: implement `HrTimerPointer` for `Pin<Box<T>>`
rust: alloc: add `Box::into_pin`
rust: hrtimer: implement `UnsafeHrTimerPointer` for `Pin<&mut T>`
rust: hrtimer: implement `UnsafeHrTimerPointer` for `Pin<&T>`
rust: hrtimer: add `hrtimer::ScopedHrTimerPointer`
rust: hrtimer: add `UnsafeHrTimerPointer`
rust: hrtimer: allow timer restart from timer handler
rust: str: implement `strip_prefix` for `BStr`
rust: str: implement `AsRef<BStr>` for `[u8]` and `BStr`
rust: str: implement `Index` for `BStr`
rust: str: implement `PartialEq` for `BStr`
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux
Pull modules updates from Petr Pavlu:
- Use RCU instead of RCU-sched
The mix of rcu_read_lock(), rcu_read_lock_sched() and
preempt_disable() in the module code and its users has
been replaced with just rcu_read_lock()
- The rest of changes are smaller fixes and updates
* tag 'modules-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (32 commits)
MAINTAINERS: Update the MODULE SUPPORT section
module: Remove unnecessary size argument when calling strscpy()
module: Replace deprecated strncpy() with strscpy()
params: Annotate struct module_param_attrs with __counted_by()
bug: Use RCU instead RCU-sched to protect module_bug_list.
static_call: Use RCU in all users of __module_text_address().
kprobes: Use RCU in all users of __module_text_address().
bpf: Use RCU in all users of __module_text_address().
jump_label: Use RCU in all users of __module_text_address().
jump_label: Use RCU in all users of __module_address().
x86: Use RCU in all users of __module_address().
cfi: Use RCU while invoking __module_address().
powerpc/ftrace: Use RCU in all users of __module_text_address().
LoongArch: ftrace: Use RCU in all users of __module_text_address().
LoongArch/orc: Use RCU in all users of __module_address().
arm64: module: Use RCU in all users of __module_text_address().
ARM: module: Use RCU in all users of __module_text_address().
module: Use RCU in all users of __module_text_address().
module: Use RCU in all users of __module_address().
module: Use RCU in search_module_extables().
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 fixes and updates from Ingo Molnar:
- Fix a large number of x86 Kconfig dependency and help text accuracy
bugs/problems, by Mateusz Jończyk and David Heideberg
- Fix a VM_PAT interaction with fork() crash. This also touches core
kernel code
- Fix an ORC unwinder bug for interrupt entries
- Fixes and cleanups
- Fix an AMD microcode loader bug that can promote verification
failures into success
- Add early-printk support for MMIO based UARTs on an x86 board that
had no other serial debugging facility and also experienced early
boot crashes
* tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Fix __apply_microcode_amd()'s return value
x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
x86/fpu: Update the outdated comment above fpstate_init_user()
x86/early_printk: Add support for MMIO-based UARTs
x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment
x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1
x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text
x86/Kconfig: Correct X86_X2APIC help text
x86/speculation: Remove the extra #ifdef around CALL_NOSPEC
x86/Kconfig: Document release year of glibc 2.3.3
x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32
x86/Kconfig: Document CONFIG_PCI_MMCONFIG
x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM
x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together
x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE
x86/Kconfig: Enable X86_X2APIC by default and improve help text
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc locking fixes and updates from Ingo Molnar:
- Fix a locking self-test FAIL on PREEMPT_RT kernels
- Fix nr_unused_locks accounting bug
- Simplify the split-lock debugging feature's fast-path
* tag 'locking-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Decrease nr_unused_locks if lock unused in zap_class()
lockdep: Fix wait context check on softirq for PREEMPT_RT
x86/split_lock: Simplify reenabling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf try_alloc_pages() support from Alexei Starovoitov:
"The pull includes work from Sebastian, Vlastimil and myself with a lot
of help from Michal and Shakeel.
This is a first step towards making kmalloc reentrant to get rid of
slab wrappers: bpf_mem_alloc, kretprobe's objpool, etc. These patches
make page allocator safe from any context.
Vlastimil kicked off this effort at LSFMM 2024:
https://lwn.net/Articles/974138/
and we continued at LSFMM 2025:
https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@mail.gmail.com/
Why:
SLAB wrappers bind memory to a particular subsystem making it
unavailable to the rest of the kernel. Some BPF maps in production
consume Gbytes of preallocated memory. Top 5 in Meta: 1.5G, 1.2G,
1.1G, 300M, 200M. Once we have kmalloc that works in any context BPF
map preallocation won't be necessary.
How:
Synchronous kmalloc/page alloc stack has multiple stages going from
fast to slow: cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages ->
rmqueue_pcplist -> __rmqueue, where rmqueue_pcplist was already
relying on trylock.
This set changes rmqueue_bulk/rmqueue_buddy to attempt a trylock and
return ENOMEM if alloc_flags & ALLOC_TRYLOCK. It then wraps this
functionality into try_alloc_pages() helper. We make sure that the
logic is sane in PREEMPT_RT.
End result: try_alloc_pages()/free_pages_nolock() are safe to call
from any context.
try_kmalloc() for any context with similar trylock approach will
follow. It will use try_alloc_pages() when slab needs a new page.
Though such try_kmalloc/page_alloc() is an opportunistic allocator,
this design ensures that the probability of successful allocation of
small objects (up to one page in size) is high.
Even before we have try_kmalloc(), we already use try_alloc_pages() in
BPF arena implementation and it's going to be used more extensively in
BPF"
* tag 'bpf_try_alloc_pages' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next:
mm: Fix the flipped condition in gfpflags_allow_spinning()
bpf: Use try_alloc_pages() to allocate pages for bpf needs.
mm, bpf: Use memcg in try_alloc_pages().
memcg: Use trylock to access memcg stock_lock.
mm, bpf: Introduce free_pages_nolock()
mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
locking/local_lock: Introduce localtry_lock_t
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf relisient spinlock support from Alexei Starovoitov:
"This patch set introduces Resilient Queued Spin Lock (or rqspinlock
with res_spin_lock() and res_spin_unlock() APIs).
This is a qspinlock variant which recovers the kernel from a stalled
state when the lock acquisition path cannot make forward progress.
This can occur when a lock acquisition attempt enters a deadlock
situation (e.g. AA, or ABBA), or more generally, when the owner of the
lock (which we’re trying to acquire) isn’t making forward progress.
Deadlock detection is the main mechanism used to provide instant
recovery, with the timeout mechanism acting as a final line of
defense. Detection is triggered immediately when beginning the waiting
loop of a lock slow path.
Additionally, BPF programs attached to different parts of the kernel
can introduce new control flow into the kernel, which increases the
likelihood of deadlocks in code not written to handle reentrancy.
There have been multiple syzbot reports surfacing deadlocks in
internal kernel code due to the diverse ways in which BPF programs can
be attached to different parts of the kernel. By switching the BPF
subsystem’s lock usage to rqspinlock, all of these issues are
mitigated at runtime.
This spin lock implementation allows BPF maps to become safer and
remove mechanisms that have fallen short in assuring safety when
nesting programs in arbitrary ways in the same context or across
different contexts.
We run benchmarks that stress locking scalability and perform
comparison against the baseline (qspinlock). For the rqspinlock case,
we replace the default qspinlock with it in the kernel, such that all
spin locks in the kernel use the rqspinlock slow path. As such,
benchmarks that stress kernel spin locks end up exercising rqspinlock.
More details in the cover letter in commit 6ffb9017e932 ("Merge branch
'resilient-queued-spin-lock'")"
* tag 'bpf_res_spin_lock' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (24 commits)
selftests/bpf: Add tests for rqspinlock
bpf: Maintain FIFO property for rqspinlock unlock
bpf: Implement verifier support for rqspinlock
bpf: Introduce rqspinlock kfuncs
bpf: Convert lpm_trie.c to rqspinlock
bpf: Convert percpu_freelist.c to rqspinlock
bpf: Convert hashtab.c to rqspinlock
rqspinlock: Add locktorture support
rqspinlock: Add entry to Makefile, MAINTAINERS
rqspinlock: Add macros for rqspinlock usage
rqspinlock: Add basic support for CONFIG_PARAVIRT
rqspinlock: Add a test-and-set fallback
rqspinlock: Add deadlock detection and recovery
rqspinlock: Protect waiters in trylock fallback from stalls
rqspinlock: Protect waiters in queue from stalls
rqspinlock: Protect pending bit owners from stalls
rqspinlock: Hardcode cond_acquire loops for arm64
rqspinlock: Add support for timeouts
rqspinlock: Drop PV and virtualization support
rqspinlock: Add rqspinlock.h header
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
"For this merge window we're splitting BPF pull request into three for
higher visibility: main changes, res_spin_lock, try_alloc_pages.
These are the main BPF changes:
- Add DFA-based live registers analysis to improve verification of
programs with loops (Eduard Zingerman)
- Introduce load_acquire and store_release BPF instructions and add
x86, arm64 JIT support (Peilin Ye)
- Fix loop detection logic in the verifier (Eduard Zingerman)
- Drop unnecesary lock in bpf_map_inc_not_zero() (Eric Dumazet)
- Add kfunc for populating cpumask bits (Emil Tsalapatis)
- Convert various shell based tests to selftests/bpf/test_progs
format (Bastien Curutchet)
- Allow passing referenced kptrs into struct_ops callbacks (Amery
Hung)
- Add a flag to LSM bpf hook to facilitate bpf program signing
(Blaise Boscaccy)
- Track arena arguments in kfuncs (Ihor Solodrai)
- Add copy_remote_vm_str() helper for reading strings from remote VM
and bpf_copy_from_user_task_str() kfunc (Jordan Rome)
- Add support for timed may_goto instruction (Kumar Kartikeya
Dwivedi)
- Allow bpf_get_netns_cookie() int cgroup_skb programs (Mahe Tardy)
- Reduce bpf_cgrp_storage_busy false positives when accessing cgroup
local storage (Martin KaFai Lau)
- Introduce bpf_dynptr_copy() kfunc (Mykyta Yatsenko)
- Allow retrieving BTF data with BTF token (Mykyta Yatsenko)
- Add BPF kfuncs to set and get xattrs with 'security.bpf.' prefix
(Song Liu)
- Reject attaching programs to noreturn functions (Yafang Shao)
- Introduce pre-order traversal of cgroup bpf programs (Yonghong
Song)"
* tag 'bpf-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (186 commits)
selftests/bpf: Add selftests for load-acquire/store-release when register number is invalid
bpf: Fix out-of-bounds read in check_atomic_load/store()
libbpf: Add namespace for errstr making it libbpf_errstr
bpf: Add struct_ops context information to struct bpf_prog_aux
selftests/bpf: Sanitize pointer prior fclose()
selftests/bpf: Migrate test_xdp_vlan.sh into test_progs
selftests/bpf: test_xdp_vlan: Rename BPF sections
bpf: clarify a misleading verifier error message
selftests/bpf: Add selftest for attaching fexit to __noreturn functions
bpf: Reject attaching fexit/fmod_ret to __noreturn functions
bpf: Only fails the busy counter check in bpf_cgrp_storage_get if it creates storage
bpf: Make perf_event_read_output accessible in all program types.
bpftool: Using the right format specifiers
bpftool: Add -Wformat-signedness flag to detect format errors
selftests/bpf: Test freplace from user namespace
libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
bpf: Return prog btf_id without capable check
bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
bpf, x86: Fix objtool warning for timed may_goto
bpf: Check map->record at the beginning of check_and_free_fields()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox
Pull mailbox updates from Jassi Brar:
"Core:
- misc rejig of header includes
- minor const fixes
Misc:
- constify amba_id table
pcc:
- cleanup and refactoring of shmem and irq handling
qcom:
- add MSM8226 compatible
fsl,mu:
- add i.MX94 compatible
mediatek:
- remove cl in struct cmdq_pkt
tegra:
- define dimensioning masks in SoC data"
* tag 'mailbox-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: (25 commits)
mailbox: Remove unneeded semicolon
mailbox: pcc: Refactor and simplify check_and_ack()
mailbox: pcc: Always map the shared memory communication address
mailbox: pcc: Refactor error handling in irq handler into separate function
mailbox: pcc: Use acpi_os_ioremap() instead of ioremap()
mailbox: pcc: Return early if no GAS register from pcc_mbox_cmd_complete_check
mailbox: pcc: Drop unnecessary endianness conversion of pcc_hdr.flags
mailbox: pcc: Always clear the platform ack interrupt first
mailbox: pcc: Fix the possible race in updation of chan_in_use flag
dt-bindings: mailbox: qcom: add compatible for MSM8226 SoC
dt-bindings: mailbox: fsl,mu: Add i.MX94 compatible
MAINTAINERS: add mailbox API's tree type and location
mailbox: remove unused header files
mailbox: explicitly include <linux/bits.h>
mailbox: sort headers alphabetically
mailbox: don't protect of_parse_phandle_with_args with con_mutex
mailbox: use error ret code of of_parse_phandle_with_args()
mailbox: arm_mhuv2: Constify amba_id table
mailbox: arm_mhu_db: Constify amba_id table
mailbox: arm_mhu: Constify amba_id table
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi
Pull HSI update from Sebastian Reichel:
- ssi_protocol: fix potential use after free after module removal
* tag 'hsi-for-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
HSI: ssi_protocol: Fix use after free vulnerability in ssi_protocol Driver Due to Race Condition
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Power-supply core:
- remove unused set_charged infrastructure
- drop of_node from power_supply struct
Power-supply drivers:
- axp717: support devices without thermistors
- bq27xxx: support max design voltage for bq270x0 and bq27x10
- pcf50633: drop charger driver
- max1720x: add battery health support
- switch all power-supply devices from of_node to fwnode
- convert regmap users to maple tree register cache
- convert drivers to devm_kmemdup_array
- misc cleanups and fixes
Reset drivers:
- at91-sama5d2_shdwc: add sama7d65 support
* tag 'for-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (30 commits)
power: supply: mt6370: Remove redundant 'flush_workqueue()' calls
Revert "power: supply: bq27xxx: do not report bogus zero values"
power: supply: max77693: Fix wrong conversion of charge input threshold value
power: supply: pcf50633: Remove charger
power: supply: all: switch psy_cfg from of_node to fwnode
power: supply: core: get rid of of_node
power: reset: at91-sama5d2_shdwc: Add sama7d65 PMC
power: supply: smb347: convert to use maple tree register cache
power: supply: rt9455: convert to use maple tree register cache
power: supply: max1720x: convert to use maple tree register cache
power: supply: ltc4162l: convert to use maple tree register cache
power: supply: bq25980: convert to use maple tree register cache
power: supply: bq25890: convert to use maple tree register cache
power: supply: bq2515x: convert to use maple tree register cache
power: supply: bq24257: convert to use maple tree register cache
power: supply: bd99954: convert to use maple tree register cache
power: supply: Remove unused set_charged method
power: supply: ds2760: Remove unused ds2760_battery_set_charged
power: supply: core: Remove unused power_supply_set_battery_charged
power: supply: sc27xx: use devm_kmemdup_array()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk updates from Stephen Boyd:
"Here's the pile of clk driver patches. The usual suspects^Wsilicon
vendors are all here, adding new SoC support and fixing existing code.
There are a few patches to the clk framework here as well. They've
been baking in linux-next for weeks so I'm hoping we don't have to
revert them. The disable OF node patch is probably the scariest one
although it seems unlikely that a system would be relying on a driver
_not_ probing because the clk never appeared, but you never know.
Nothing looks out of the ordinary on the driver side but that's
because it's mostly a bunch of data.
Core:
- Use dev_err_probe() in the clk registration path (Peering into the
crystal ball shows many patches that remove printks)
- Check for disabled OF nodes in of_clk_get_hw_from_clkspec()
New Drivers:
- Allwinner A523/T527 clk driver
- Qualcomm IPQ9574 NSS clk driver
- Qualcomm QCS8300 GPU and video clk drivers
- Qualcomm SDM429 RPM clks
- Qualcomm QCM6490 LPASS (low power audio) resets
- Samsung Exynos2200: driver for several clock controllers (Alive,
CMGP, HSI, PERIC/PERIS, TOP, UFS and VFS)
- Samsung Exynos7870: Driver for several clock controllers (Alive,
MIF, DISP AUD, FSYS, G3D, ISP, MFC and PERI)
- Rockchip rk3528 and rk3562 clk driver
Updates:
- Various fixes to SoC clk drivers for incorrect data, avoid touching
protected registers, etc.
- Additions for some missing clks in existing SoC clk drivers
- DT schema conversions from text to YAML
- Kconfig cleanups to allow drivers to be compiled on moar
architectures"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
clk: qcom: Add NSS clock Controller driver for IPQ9574
clk: qcom: gcc-ipq9574: Add support for gpll0_out_aux clock
dt-bindings: clock: Add ipq9574 NSSCC clock and reset definitions
dt-bindings: clock: gcc-ipq9574: Add definition for GPLL0_OUT_AUX
clk: qcom: gcc-msm8953: fix stuck venus0_core0 clock
clk: qcom: mmcc-sdm660: fix stuck video_subcore0 clock
dt-bindings: clock: qcom,x1e80100-camcc: Fix the list of required-opps
clk: amlogic: a1: fix a typo
clk: amlogic: gxbb: drop non existing 32k clock parent
clk: amlogic: gxbb: drop incorrect flag on 32k clock
clk: amlogic: g12b: fix cluster A parent data
clk: amlogic: g12a: fix mmc A peripheral clock
dt-bindings: clocks: atmel,at91rm9200-pmc: add missing compatibles
dt-bindings: reset: fix double id on rk3562-cru reset ids
drivers: clk: qcom: ipq5424: fix the freq table of sdcc1_apps clock
clk: qcom: lpassaudiocc-sc7280: Add support for LPASS resets for QCM6490
dt-bindings: clock: qcom: Add compatible for QCM6490 boards
clk: qcom: gdsc: Update the status poll timeout for GDSC
clk: qcom: gdsc: Set retain_ff before moving to HW CTRL
clk: davinci: remove support for da830
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull remoteproc updates from Bjorn Andersson:
- Transition the i.MX8MP DSP remoteproc driver to use the reset
framework for driving the run/stall reset bits
- Add support for managing the modem remoteprocessor on the Qualcomm
MSM8226, MSM8926, and SM8750 platforms
* tag 'rproc-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux: (28 commits)
remoteproc: qcom_q6v5_pas: Make single-PD handling more robust
remoteproc: qcom_q6v5_pas: Use resource with CX PD for MSM8226
remoteproc: core: Clear table_sz when rproc_shutdown
remoteproc: sysmon: Update qcom_add_sysmon_subdev() comment
dt-bindings: remoteproc: Consolidate SC8180X and SM8150 PAS files
irqdomain: remoteproc: Switch to of_fwnode_handle()
remoteproc: qcom: pas: add minidump_id to SC7280 WPSS
remoteproc: imx_dsp_rproc: Document run_stall struct member
remoteproc: qcom: pas: Add SM8750 MPSS
dt-bindings: remoteproc: Add SM8750 MPSS
imx_dsp_rproc: Use reset controller API to control the DSP
reset: imx8mp-audiomix: Add support for DSP run/stall
reset: imx8mp-audiomix: Introduce active_low configuration option
reset: imx8mp-audiomix: Prepare the code for more reset bits
reset: imx8mp-audiomix: Add prefix for internal macro
dt-bindings: dsp: fsl,dsp: Add resets property
dt-bindings: reset: audiomix: Add reset ids for EARC and DSP
remoteproc: qcom_wcnss: Handle platforms with only single power domain
dt-bindings: remoteproc: qcom,wcnss-pil: Add support for single power-domain platforms
remoteproc: qcom_q6v5_mss: Add modem support on MSM8926
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux
Pull hwspinlock updates from Bjorn Andersson:
"Drop a few unused functions from the hwspinlock framework"
* tag 'hwlock-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/remoteproc/linux:
hwspinlock: Remove unused hwspin_lock_get_id()
hwspinlock: Remove unused (devm_)hwspin_lock_request()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control updates from Linus Walleij:
"Core changes:
- None really.
New drivers:
- AMD ISP411 "AMD ISP" driver
- Exynos 2200 and 7870 SoC subdrivers
- Sophgo RISC-V SG2042 and SG2044 subdrivers
- Amlogic A4 subdriver
- Rockchip RK3528 subdriver
- Broadcom BCM21664 subdriver
- Allwinner A523/T527 subdriver
- Ingenic X1600 subdriver
- Microchip SAMA7D65 subdriver, essentially a re-branded Atmel AT91
PIO4 driver, but nowadays a Microschip SoC line
Improvements:
- Bring in the devm_kmemdup_array() helper and use it throughout,
also bring in changes to other subsystems for this to establish
this helper
- Support EGPIO on the Qualcomm SA8775P SoC
- Extend EINT support in the Mediatek driver"
* tag 'pinctrl-v6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (101 commits)
pinctrl: mediatek: Add EINT support for multiple addresses
pinctrl: amlogic-a4: Drop surplus semicolon
pinctrl: nuvoton: Reduce use of OF-specific APIs
pinctrl: nuvoton: Convert to use struct group_desc
pinctrl: nuvoton: Make use of struct pinfunction and PINCTRL_PINFUNCTION()
pinctrl: nuvoton: Convert to use struct pingroup and PINCTRL_PINGROUP()
pinctrl: npcm8xx: Fix incorrect struct npcm8xx_pincfg assignment
pinctrl: tegra: Fix off by one in tegra_pinctrl_get_group()
pinctrl: PINCTRL_AMDISP should depend on DRM_AMD_ISP
pinctrl: qcom: sa8775p: Enable egpio function
dt-bindings: pinctrl: qcom: Add egpio function for sa8775p
pinctrl: qcom: tlmm-test: Validate irq_enable delivers edge irqs
pinctrl: qcom: Clear latched interrupt status when changing IRQ type
dt-bindings: pinctrl: airoha: Add missing gpio-ranges property
pinctrl: bcm281xx: Add missing assignment in bcm21664_pinctrl_lock_all()
pinctrl: amd: isp411: Fix IS_ERR() vs NULL check in probe()
dt-bindings: pinctrl: at91-pio4: add microchip,sama7d65-pinctrl
pinctrl: tegra: Set SFIO mode to Mux Register
pinctrl-tegra: Restore SFSEL bit when freeing pins
pinctrl: tegra: Add descriptions for SoC data fields
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
- Apple DWI Backlight:
- Added devicetree bindings for backlight controllers on Apple's DWI
interface.
- Added a new driver (apple_dwi_bl) for these controllers found on
some Apple mobile devices.
- Added MAINTAINERS entries for the new driver.
- led_bl: Fixed a locking issue by holding the led_access lock when
calling led_sysfs_disable() during device removal to prevent
potential warnings.
- Removed unnecessary <linux/fb.h> includes from a bunch of drivers.
- tdo24m: Removed redundant whitespace in Kconfig description.
- pcf50633-backlight: Removed the driver as the underlying pcf50633 MFD
and s3c24xx platform support were removed.
* tag 'backlight-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight: (22 commits)
backlight: pcf50633-backlight: Remove unused driver
backlight: tdo24m: Eliminate redundant whitespace
MAINTAINERS: Add entries for Apple DWI backlight controller
backlight: apple_dwi_bl: Add Apple DWI backlight driver
dt-bindings: leds: backlight: apple,dwi-bl: Add Apple DWI backlight
backlight: led_bl: Hold led_access lock when calling led_sysfs_disable()
backlight: wm831x_bl: Do not include <linux/fb.h>
backlight: vgg2432a4: Do not include <linux/fb.h>
backlight: tps65217_bl: Do not include <linux/fb.h>
backlight: max8925_bl: Do not include <linux/fb.h>
backlight: lv5207lp: Do not include <linux/fb.h>
backlight: locomolcd: Do not include <linux/fb.h>
backlight: hp680_bl: Do not include <linux/fb.h>
backlight: ep93xx_bl: Do not include <linux/fb.h>
backlight: da9052_bl: Do not include <linux/fb.h>
backlight: da903x_bl: Do not include <linux/fb.h>
backlight: bd6107_bl: Do not include <linux/fb.h>
backlight: as3711_bl: Do not include <linux/fb.h>
backlight: adp8870_bl: Do not include <linux/fb.h>
backlight: adp8860_bl: Do not include <linux/fb.h>
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds
Pull LED updates from Lee Jones:
- pca955x: Add HW blink support, utilizing PWM0. It supports one
frequency across all blinking LEDs and falls back to software blink
if different frequencies are requested.
- trigger: netdev: Allow configuring LED blink interval via .blink_set
even when HW offload (.hw_control) is enabled.
- led-core: Fix a race condition where a quick LED_OFF followed by
another brightness set could leave the LED off incorrectly,
especially noticeable after the introduction of the ordered
workqueue.
- qcom-lpg: Add support for 6-bit PWM resolution alongside the existing
9-bit support.
- qcom-lpg: Fix PWM value capping to respect the selected resolution
(6-bit or 9-bit) for normal PWMs.
- qcom-lpg: Fix PWM value capping to respect the selected resolution
for Hi-Res PWMs.
- qcom-lpg: Fix calculation of the best period for Hi-Res PWMs to
prevent requested duty cycles from exceeding the maximum allowed by
the selected resolution.
- st1202: Add a check for the error code returned by devm_mutex_init().
- pwm-multicolor: Add a check for the return value of
fwnode_property_read_u32().
- st1202: Ensure hardware initialization (st1202_setup) happens before
DT node processing (st1202_dt_init).
- Kconfig: leds-st1202: Add select LEDS_TRIGGER_PATTERN as it's
required by the driver.
- lp8860: Drop unneeded explicit assignment to REGCACHE_NONE.
- pca955x: Refactor code with helper functions and rename some
functions/variables for clarity.
- pca955x: Pass driver data pointers instead of the I2C client to
helper functions.
- pca955x: Optimize probe LED selection logic to reduce I2C operations.
- pca955x: Revert the removal of pca95xx_num_led_regs() (renaming it to
pca955x_num_led_regs) as it's needed for HW blink support.
- st1202: Refactor st1202_led_set() to use the !! operator for boolean
conversion.
- st1202: Minor spacing and proofreading edits in comments.
- Directory Rename: Rename the drivers/leds/simple directory to
drivers/leds/simatic as the drivers within are not simple.
- mlxcpld: Remove unused include of acpi.h.
- nic78bx: Tidy up the ACPI ID table (remove ACPI_PTR, use
mod_devicetable.h, remove explicit driver_data initializer).
- tlc591xx: Convert text binding to YAML format, add child node
constraints, and fix typos/formatting in the example.
- qcom-lpg: Document the qcom,pm8937-pwm compatible string as a
fallback for qcom,pm8916-pwm.
* tag 'leds-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/leds: (23 commits)
leds: nic78bx: Tidy up ACPI ID table
leds: mlxcpld: Remove unused ACPI header inclusion
leds: rgb: leds-qcom-lpg: Fix calculation of best period Hi-Res PWMs
leds: rgb: leds-qcom-lpg: Fix pwm resolution max for Hi-Res PWMs
leds: rgb: leds-qcom-lpg: Fix pwm resolution max for normal PWMs
leds: Rename simple directory to simatic
leds: Kconfig: leds-st1202: Add select for required LEDS_TRIGGER_PATTERN
leds: leds-st1202: Spacing and proofreading editing
leds: leds-st1202: Initialize hardware before DT node child operations
leds: pwm-multicolor: Add check for fwnode_property_read_u32
leds: rgb: leds-qcom-lpg: Add support for 6-bit PWM resolution
leds: Fix LED_OFF brightness race
Revert "leds-pca955x: Remove the unused function pca95xx_num_led_regs()"
leds: st1202: Refactor st1202_led_set() to use !! operator for boolean conversion
dt-bindings: leds: qcom-lpg: Document PM8937 PWM compatible
leds: pca955x: Add HW blink support
leds: pca955x: Optimize probe LED selection
leds: pca955x: Use pointers to driver data rather than I2C client
leds: pca955x: Refactor with helper functions and renaming
dt-bindings: leds: Convert leds-tlc591xx.txt to yaml format
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"Maxim MAX77705:
- Added core MFD driver.
- Added charger driver.
- Added devicetree bindings for the charger and MFD core.
- Added Haptic controller support via the input subsystem.
- Added LED support.
- Added support to simple-mfd-i2c for fuel gauge and hwmon.
Samsung S2MPU05 (Exynos7870 PMIC):
- Added core MFD support.
- Added Regulator support for 21 LDOs and 5 BUCKs.
- Added devicetree bindings for regulators and the PMIC core.
TI TPS65215 & TPS65214:
- Added support to the existing TPS65219 driver.
- Added devicetree bindings.
STMicroelectronics STM32MP25:
- Added support to the stm32-timers MFD driver.
- Added devicetree bindings.
Congatec Board Controller (CGBC):
- Added HWMON support for internal sensors.
- Added support for the conga-SA8 module.
Microchip LAN969X:
- Enabled the at91-usart MFD driver for this architecture.
MediaTek MT6359:
- Added mfd_cell for mt6359-accdet to allow its driver to probe.
Other misc driver updates:
- AXP20X (AXP717): Added AXP717_TS_PIN_CFG register to writeable regs
for temperature sensor configuration.
- SM501: Switched to using BIT() macro to mitigate potential integer
overflows in GPIO functions.
- ENE KB3930: Added a NULL pointer check for off_gpios during probe
to prevent potential dereference.
- SYSCON: Added a check for invalid resource size to prevent issues
from DT misconfiguration.
- CGBC: Corrected signedness issues in cgbc_session_request
- intel_soc_pmic_chtdc_ti / intel_soc_pmic_crc: Removed unneeded
explicit assignment to REGCACHE_NONE.
- ipaq-micro / tps65010: Switched to using str_enable_disable()
helpers for clarity and potential size reduction.
- upboard-fpga: Removed unnecessary ACPI_PTR() annotation.
- max8997: Removed unused max8997_irq_exit() function, using devm_*
helpers instead.
- lp3943: Dropped unused #include <linux/pwm.h> from the header file.
- db8500-prcmu: Removed needless return statements in void APIs.
- qnap-mcu: Replaced commas with semicolons between expressions for
correctness.
- STA2X11: Removed the core MFD driver as the underlying platform
support was removed.
- EZX-PCAP: Removed the unused pcap_adc_sync function.
- PCF50633 (OpenMoko PMIC): Removed the entire driver (core, adc,
gpio, irq) as the underlying s3c24xx platform support was removed.
Devicetree updates:
- Converted fsl,mcu-mpc8349emitx binding to YAML
- Added qcom,msm8937-tcsr compatible
- Added microchip,sama7d65-flexcom compatible
- Added rockchip,rk3528-qos syscon compatible
- Added airoha,en7581-pbus-csr syscon compatible
- Added microchip,sama7d65-ddr3phy syscon compatible
- Added microchip,sama7d65-sfrbu syscon compatible"
* tag 'mfd-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (49 commits)
mfd: cgbc-core: Add support for conga-SA8
dt-bindings: mfd: syscon: Add microchip,sama7d65-sfrbu
dt-bindings: mfd: syscon: Add microchip,sama7d65-ddr3phy
mfd: cgbc: Add support for HWMON
dt-bindings: mfd: syscon: Add the pbus-csr node for Airoha EN7581 SoC
mfd: cgbc-core: Cleanup signedness in cgbc_session_request()
mfd: pcf50633: Remove remaining PCF50633 support
mfd: pcf50633: Remove unused platform IRQ code
mfd: pcF50633-gpio: Remove unused driver
mfd: pcf50633-adc: Remove unused driver
mfd: qnap-mcu: Convert commas to semicolons in qnap_mcu_exec()
mfd: mt6397-core: Add mfd_cell for mt6359-accdet
dt-bindings: mfd: syscon: Add rk3528 QoS register compatible
dt-bindings: mfd: atmel,sama5d2-flexcom: Add microchip,sama7d65-flexcom
mfd: ezx-pcap: Remove unused pcap_adc_sync
mfd: db8500-prcmu: Remove needless return in three void APIs
mfd: Remove STA2x11 core driver
mfd: max77620: Allow building as a module
mfd: ene-kb3930: Fix a potential NULL pointer dereference
dt-bindings: mfd: qcom,tcsr: Add compatible for MSM8937
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
Pull regmap updates from Mark Brown:
"Only a couple of small patches this release, one refactoring struct
regmap to pack it more efficiently and another which makes our way of
setting all bits consistent in the regmap-irq code"
* tag 'regmap-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
regmap: irq: Use one way of setting all bits in the register
regmap: Reorder 'struct regmap'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
- drop parisc specific memcpy_fromio() function
- clean up coding style and fix compile warnings
* tag 'parisc-for-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: led: Use scnprintf() to avoid string truncation warning
Input: gscps2 - Describe missing function parameters
parisc: perf: use named initializers for struct miscdevice
parisc: PDT: Fix missing prototype warning
parisc: Remove memcpy_fromio
parisc: Fix formatting errors in io.c
|