feat: agent resource sync API #2180
Closed this PR because we will implement a kernel heartbeat and revamp the agent heartbeat, which will resolve this issue.

resolves #2142 https://github.com/lablup/giftbox/issues/262
Agent's `sync-and-get-kernels()` API
This API synchronizes the agent's kernels with the kernel information given in the API parameters (`preparing_kernels`, `pulling_kernels`, `running_kernels`, `terminating_kernels`). It assumes that the kernel information given in the parameters is the "truth".
- If the kernel information in `kernel_registry` does not match `running_kernels` (or `terminating_kernels`), the agent injects a termination event to terminate the mismatched kernel.
- `sync-and-get-kernels()` returns the actual { running, terminating, terminated } kernels (the return value is not used for now).
- `actual_terminated_kernels` contains the kernels that the API parameter lists in `running_kernels` but that have actually terminated.

How to use
- Call the `POST /session/_/sync-agent-resource` manager API.
- Resource sync triggers: [`after-scheduling`, `before-kernel-creation` and `on-creation-failure`].
  - `on-creation-failure`: Set by default. Calls resource sync when kernel creation fails with an `InsufficientResource` exception.
  - `after-scheduling`: Calls resource sync right after scheduling on a scaling group.
  - `before-kernel-creation`: Calls resource sync before calling the create-kernels agent API.

Note
The `on-creation-failure` option cannot handle an ExceptionGroup that contains multiple `InsufficientResource` exceptions, which is raised when creation of a multi-kernel session fails. It covers only the creation failure of a single-kernel session.
This will be resolved after lablup/callosum#30 is merged.
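
The reconciliation described for `sync-and-get-kernels()` could look roughly like the sketch below. It is not the code in this PR: the function name `sync_kernels`, the `SyncResult` container, the `inject_termination` callback, and the modelling of `kernel_registry` as a plain set of kernel IDs are all assumptions made for illustration. It also assumes that kernels listed as preparing or pulling are left untouched.

```python
from dataclasses import dataclass, field


@dataclass
class SyncResult:
    # Hypothetical container for the three lists the API is described as returning.
    actual_running_kernels: list = field(default_factory=list)
    actual_terminating_kernels: list = field(default_factory=list)
    actual_terminated_kernels: list = field(default_factory=list)


def sync_kernels(kernel_registry, preparing_kernels, pulling_kernels,
                 running_kernels, terminating_kernels, inject_termination):
    """Reconcile the agent's registry with the manager's view (the "truth")."""
    running = set(running_kernels)
    expected_alive = set(preparing_kernels) | set(pulling_kernels) | running
    expected_terminating = set(terminating_kernels)
    result = SyncResult()

    for kernel_id in sorted(kernel_registry):
        if kernel_id in expected_terminating or kernel_id not in expected_alive:
            # Mismatch with the manager's view: ask the agent to terminate it.
            inject_termination(kernel_id)
            result.actual_terminating_kernels.append(kernel_id)
        elif kernel_id in running:
            result.actual_running_kernels.append(kernel_id)
        # Assumption: preparing/pulling kernels are neither touched nor reported.

    # Kernels the manager believes are running but the agent no longer holds.
    for kernel_id in sorted(running - set(kernel_registry)):
        result.actual_terminated_kernels.append(kernel_id)
    return result


injected = []
result = sync_kernels(
    kernel_registry={"k1", "k2", "k3"},
    preparing_kernels=[],
    pulling_kernels=[],
    running_kernels=["k1", "k4"],
    terminating_kernels=["k2"],
    inject_termination=injected.append,
)
print(result)
print(injected)
```

In this example `k1` stays running, `k2` and `k3` receive termination events, and `k4` is reported in `actual_terminated_kernels` because the manager lists it as running while the agent no longer holds it.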
Checklist: (if applicable)
- `docs` directory
📚 Documentation preview 📚: https://sorna--2180.org.readthedocs.build/en/2180/
📚 Documentation preview 📚: https://sorna-ko--2180.org.readthedocs.build/ko/2180/