implement Job.ReclaimablePods for AppWrappers #51
This is somewhat related to #74 in that it also relies on a fairly detailed understanding of the wrapped GVK's semantics.
In Kueue 0.7, only Job, JobSet, and Pod (for PodGroups) implement this optional interface. Until the interface is more widely adopted, there is limited value in recognizing this situation and flowing it through the AppWrapper to Kueue.
While support for Ray jobs seems problematic given #174, we should be able to implement this interface for wrapped PyTorchJobs in the following sense: if a PyTorchJob's status is Failed, we can assume the Training operator will create no more pods for that job.
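The PyTorchJob rule above could look roughly like the following sketch. It assumes that Kueue's `ReclaimablePod` entries pair a PodSet name with a count, and it mirrors that shape with a local type so the example stays self-contained. `PodSet` and `pytorchJobReclaimablePods` are hypothetical names, not part of Kueue or the Training operator. The assumed policy is simple: once the job is Failed, every pod each PodSet asked for is reported as reclaimable.

```go
package main

import "fmt"

// ReclaimablePod is a local stand-in assumed to mirror Kueue's ReclaimablePod
// (a PodSet name plus a count of pods that are no longer needed).
type ReclaimablePod struct {
	Name  string
	Count int32
}

// PodSet is a hypothetical, simplified view of one PyTorchJob replica group
// (e.g. "master" or "worker") and the number of pods it requested.
type PodSet struct {
	Name  string
	Count int32
}

// pytorchJobReclaimablePods (hypothetical) applies the rule from the issue:
// once the PyTorchJob is Failed, the Training operator creates no more pods,
// so every requested pod in every PodSet is reported as reclaimable.
func pytorchJobReclaimablePods(failed bool, podSets []PodSet) []ReclaimablePod {
	if !failed {
		// While the job has not failed, make no claim about reclaimable pods.
		return nil
	}
	out := make([]ReclaimablePod, 0, len(podSets))
	for _, ps := range podSets {
		out = append(out, ReclaimablePod{Name: ps.Name, Count: ps.Count})
	}
	return out
}

func main() {
	podSets := []PodSet{{Name: "master", Count: 1}, {Name: "worker", Count: 4}}
	fmt.Println(pytorchJobReclaimablePods(true, podSets)) // prints [{master 1} {worker 4}]
}
```

Reporting nothing until the Failed state is reached is a deliberately conservative choice. It avoids having to reason about partially completed replicas, which would need the more detailed knowledge of the wrapped GVK's semantics mentioned above.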
Closing this issue: AppWrapper integration was merged upstream into Kueue, so the work will be done there.
In cases where the AppWrapper contains resources that are managed by Kueue and whose component implements ReclaimablePods, we should monitor the workload instances and flow that information through to Kueue.
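Flowing per-component information through the AppWrapper could be sketched as a merge step. The AppWrapper asks each wrapped component whether it implements an optional reclaimable-pods interface and collects the answers under AppWrapper-level PodSet names. Everything here is an assumption for illustration: `ReclaimableComponent`, `fixedComponent`, `appWrapperReclaimablePods`, and the `"<componentIndex>-<podSetName>"` naming scheme are hypothetical, and `ReclaimablePod` is a local stand-in for Kueue's type.

```go
package main

import (
	"fmt"
	"sort"
)

// ReclaimablePod is a local stand-in assumed to mirror Kueue's ReclaimablePod.
type ReclaimablePod struct {
	Name  string
	Count int32
}

// ReclaimableComponent is a hypothetical optional interface for wrapped
// resources whose semantics are understood well enough to report reclaimable pods.
type ReclaimableComponent interface {
	ReclaimablePods() ([]ReclaimablePod, error)
}

// fixedComponent is a demo-only component that returns a fixed report.
type fixedComponent []ReclaimablePod

func (f fixedComponent) ReclaimablePods() ([]ReclaimablePod, error) { return f, nil }

// appWrapperReclaimablePods (hypothetical) merges the reports of all components
// that implement the optional interface, keyed by an assumed AppWrapper-level
// PodSet name of the form "<componentIndex>-<podSetName>".
func appWrapperReclaimablePods(components []any) ([]ReclaimablePod, error) {
	totals := map[string]int32{}
	for i, c := range components {
		rc, ok := c.(ReclaimableComponent)
		if !ok {
			continue // component does not implement the optional interface
		}
		pods, err := rc.ReclaimablePods()
		if err != nil {
			return nil, err
		}
		for _, p := range pods {
			totals[fmt.Sprintf("%d-%s", i, p.Name)] += p.Count
		}
	}
	out := make([]ReclaimablePod, 0, len(totals))
	for name, n := range totals {
		out = append(out, ReclaimablePod{Name: name, Count: n})
	}
	// Sort so the reported list is deterministic across reconciliations.
	sort.Slice(out, func(a, b int) bool { return out[a].Name < out[b].Name })
	return out, nil
}

func main() {
	components := []any{
		fixedComponent{{Name: "worker", Count: 4}},
		"opaque resource", // ignored: no ReclaimablePods method
		fixedComponent{{Name: "master", Count: 1}},
	}
	pods, err := appWrapperReclaimablePods(components)
	if err != nil {
		panic(err)
	}
	fmt.Println(pods) // prints [{0-worker 4} {2-master 1}]
}
```

Components that do not implement the interface are skipped rather than treated as errors. This matches the optional nature of the interface in Kueue, where only some job types implement it.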