diff options
author | Andrea Righi <arighi@nvidia.com> | 2025-04-05 15:39:24 +0200 |
---|---|---|
committer | Tejun Heo <tj@kernel.org> | 2025-04-07 07:13:52 -1000 |
commit | 683d2d0faba12a0e7d4c3b85a62ac8298977e17b (patch) | |
tree | 874ab03422042b9c9178369529c25dc04bd2981e /kernel/sched/ext.c | |
parent | c2d8b2a57cd461f99eb8ebfaf08cded812d4396f (diff) |
sched_ext: idle: Introduce scx_bpf_select_cpu_and()
Provide a new kfunc, scx_bpf_select_cpu_and(), that can be used to apply
the built-in idle CPU selection policy to a subset of allowed CPU.
This new helper is basically an extension of scx_bpf_select_cpu_dfl().
However, when an idle CPU can't be found, it returns a negative value
instead of @prev_cpu, aligning its behavior more closely with
scx_bpf_pick_idle_cpu().
It also accepts %SCX_PICK_IDLE_* flags, which can be used to enforce
strict selection to @prev_cpu's node (%SCX_PICK_IDLE_IN_NODE), or to
request only a full-idle SMT core (%SCX_PICK_IDLE_CORE), while applying
the built-in selection logic.
With this helper, BPF schedulers can apply the built-in idle CPU
selection policy restricted to any arbitrary subset of CPUs.
Example usage
=============
Possible usage in ops.select_cpu():
s32 BPF_STRUCT_OPS(foo_select_cpu, struct task_struct *p,
s32 prev_cpu, u64 wake_flags)
{
const struct cpumask *cpus = task_allowed_cpus(p) ?: p->cpus_ptr;
s32 cpu;
cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, cpus, 0);
if (cpu >= 0) {
scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0);
return cpu;
}
return prev_cpu;
}
Results
=======
Load distribution on a 4 sockets, 4 cores per socket system, simulated
using virtme-ng, running a modified version of scx_bpfland that uses
scx_bpf_select_cpu_and() with 0xff00 as the allowed subset of CPUs:
$ vng --cpu 16,sockets=4,cores=4,threads=1
...
$ stress-ng -c 16
...
$ htop
...
0[ 0.0%] 8[||||||||||||||||||||||||100.0%]
1[ 0.0%] 9[||||||||||||||||||||||||100.0%]
2[ 0.0%] 10[||||||||||||||||||||||||100.0%]
3[ 0.0%] 11[||||||||||||||||||||||||100.0%]
4[ 0.0%] 12[||||||||||||||||||||||||100.0%]
5[ 0.0%] 13[||||||||||||||||||||||||100.0%]
6[ 0.0%] 14[||||||||||||||||||||||||100.0%]
7[ 0.0%] 15[||||||||||||||||||||||||100.0%]
With scx_bpf_select_cpu_dfl() tasks would be distributed evenly across
all the available CPUs.
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Diffstat (limited to 'kernel/sched/ext.c')
-rw-r--r-- | kernel/sched/ext.c | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c index ac3fd7a409e9..eae16108f8d6 100644 --- a/kernel/sched/ext.c +++ b/kernel/sched/ext.c @@ -465,6 +465,7 @@ struct sched_ext_ops { * idle CPU tracking and the following helpers become unavailable: * * - scx_bpf_select_cpu_dfl() + * - scx_bpf_select_cpu_and() * - scx_bpf_test_and_clear_cpu_idle() * - scx_bpf_pick_idle_cpu() * |