scx_layered: Implement sticky modulation optimization #1690
base: main
Conversation
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
```c
	if (!active_sticky_mod)
		return cpu;

	cpu_ctx = lookup_cpu_ctx(prev_cpu);
```
Nit: Factoring out into a small inline function can avoid all the gotos.
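A minimal sketch of what that could look like; the helper name `sticky_mod_pick_cpu` and the trailing checks are illustrative, not the PR's actual code:

```c
/* Illustrative only: wrapping the sticky-mod decision in a small helper
 * lets each failure path return directly instead of jumping to a label. */
static __always_inline s32 sticky_mod_pick_cpu(s32 prev_cpu, s32 cpu)
{
	struct cpu_ctx *cpu_ctx;

	if (!active_sticky_mod)
		return cpu;
	if (!(cpu_ctx = lookup_cpu_ctx(prev_cpu)))
		return cpu;
	/* ... remaining sticky-mod checks, each returning directly ... */
	return prev_cpu;
}
```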
```diff
@@ -1195,6 +1245,55 @@ static void layer_kick_idle_cpu(struct layer *layer)
 	scx_bpf_put_idle_cpumask(idle_smtmask);
 }
 
+SEC("tp_btf/sched_switch")
```
What is the overhead of the extra probe, both from having another probe attached and from the code itself? Can we roll this into our starting/stopping methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The probe itself should be pretty fast, but I think longer term hooking into starting/stopping methods is the better way, so I will do that.
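For context, a `tp_btf/sched_switch` probe has roughly this shape (a hedged sketch; the program name is hypothetical and the body is a placeholder, not the PR's accounting logic):

```c
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

/* Fires on every context switch system-wide, which is why it also sees
 * tasks that belong to other schedulers. */
SEC("tp_btf/sched_switch")
int BPF_PROG(sticky_mod_sched_switch, bool preempt,
	     struct task_struct *prev, struct task_struct *next)
{
	/* Placeholder: the real probe would update per-CPU sticky-mod
	 * stats for prev/next here. */
	return 0;
}
```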
```c
llc:
	if (!(cpumask = cast_mask(llc->cpumask)))
		goto out;
	bpf_for(i, 0, nr_possible_cpus) {
```
(Comment: I'm surprised we don't have an FFS-based for-each-CPU-in-cpumask primitive.)
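For illustration, FFS-style iteration over a single 64-bit mask word looks like the plain-C sketch below; a real cpumask spans multiple words, and a BPF version would additionally need a verifier-friendly bounded loop:

```c
/* Plain-C sketch of FFS-style bit iteration over one mask word. */
u64 word = mask_word;	/* assumed: one 64-bit word of the cpumask */

while (word) {
	u32 cpu = __builtin_ctzll(word);	/* index of lowest set bit */
	word &= word - 1;			/* clear that bit */
	/* visit `cpu` ... */
}
```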
Yeah, if there's a better way to do this, I'm all ears. It's suboptimal, especially on machines with a lot of CPUs. Even on Bergamo, we're iterating over 88 CPUs for every 8 we want to test.
One thing that comes to mind is storing start and end CPUs in llc_ctx, but that requires the assignment to be sequential.
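A hypothetical sketch of that idea, with illustrative field names, valid only if each LLC owns a contiguous range of CPU ids:

```c
struct llc_ctx {
	u32 cpu_start;	/* first CPU id in this LLC */
	u32 cpu_end;	/* one past the last CPU id */
	/* ... existing fields ... */
};

bpf_for(i, llcx->cpu_start, llcx->cpu_end) {
	/* every i belongs to this LLC; no cpumask test needed */
}
```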
layered stores iteration indices in per-cpu/llc arrays. It's a bit more setup work but overall not that bad. But yeah, bit-wise iterators would be great.
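A hedged sketch of that per-LLC index-array pattern; the array names, sizes, and setup are illustrative, not layered's actual layout:

```c
/* Illustrative: populate these once at init from the LLC topology. */
u32 llc_cpu_order[MAX_LLCS][MAX_CPUS];	/* CPU ids grouped by LLC */
u32 llc_nr_cpus[MAX_LLCS];		/* how many CPUs each LLC has */

bpf_for(i, 0, llc_nr_cpus[llc_id]) {
	s32 cpu = llc_cpu_order[llc_id][i];
	/* only CPUs actually in this LLC are visited */
}
```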
Looks good overall. My main nit would be rolling the stat update logic into our own methods instead of a sched_switch-based probe; it's both cleaner (it doesn't grab tasks that belong to other schedulers) and more consistent.
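A rough sketch of what hooking the scheduler's own callbacks could look like; the signatures follow sched_ext struct_ops conventions, and the bodies are placeholders:

```c
void BPF_STRUCT_OPS(layered_running, struct task_struct *p)
{
	/* Task p starts executing on a CPU owned by this scheduler:
	 * record the start timestamp for sticky-mod accounting. */
}

void BPF_STRUCT_OPS(layered_stopping, struct task_struct *p, bool runnable)
{
	/* Task p stops executing: fold the elapsed time into the stats. */
}
```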
IIUC, the default is what's done currently, i.e. without this flag, right? If so, this is good. Could you update the documentation to suggest a reasonable default, or set the default to disabled and add an enable flag which, when passed, defaults to a reasonable value?