[POC] RayJob YuniKorn Integration #3379

Draft: wants to merge 1 commit into base: master

@troychiu troychiu commented Apr 15, 2025

Usage

  1. Check out this PR, then build and run the ray-operator in the cluster. Remember to add --set batchScheduler.name=yunikorn to enable YuniKorn as the scheduler.
  2. Follow steps 2 and 4 in https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/yunikorn.html to set up YuniKorn in the cluster.
  3. Specify the YuniKorn-related labels on a RayJob:
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
  namespace: test
  labels:
    ray.io/gang-scheduling-enabled: "true" 
    yunikorn.apache.org/app-id: test-yunikorn-0
    yunikorn.apache.org/queue: root.test
spec:
    ...
  4. Submit the RayJob.

If everything is set up correctly, you should see:

  1. YuniKorn-related labels and annotations on the RayCluster, the submitter job, and the Ray pods.
  2. YuniKorn handling the scheduling, including gang scheduling if it is enabled.
  3. The YuniKorn job in the YuniKorn dashboard.
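
The label propagation described in the list above can be pictured with a minimal, dependency-free sketch. It is not the code in this PR: propagateLabels and yunikornLabelKeys are hypothetical names, plain maps stand in for Kubernetes objects, and the keys the operator actually writes onto child objects may differ from the ones on the RayJob.

```go
package main

import "fmt"

// yunikornLabelKeys lists the label keys from the RayJob manifest in the
// Usage section. Whether child objects receive the same keys is an assumption.
var yunikornLabelKeys = []string{
	"ray.io/gang-scheduling-enabled",
	"yunikorn.apache.org/app-id",
	"yunikorn.apache.org/queue",
}

// propagateLabels is a hypothetical helper. It returns a copy of child's
// labels with every listed key that is present on parent copied over.
func propagateLabels(parent, child map[string]string, keys []string) map[string]string {
	out := make(map[string]string, len(child)+len(keys))
	for k, v := range child {
		out[k] = v
	}
	for _, k := range keys {
		if v, ok := parent[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	rayJobLabels := map[string]string{
		"ray.io/gang-scheduling-enabled": "true",
		"yunikorn.apache.org/app-id":     "test-yunikorn-0",
		"yunikorn.apache.org/queue":      "root.test",
	}
	// "app": "demo" is a made-up label that already exists on the child.
	podLabels := propagateLabels(rayJobLabels, map[string]string{"app": "demo"}, yunikornLabelKeys)
	fmt.Println(podLabels["yunikorn.apache.org/queue"]) // prints "root.test"
}
```

Returning a new map rather than mutating the child in place keeps the sketch free of side effects; the real scheduler plugin may well mutate the object directly.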

Why are these changes needed?

Related issue number

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

Signed-off-by: Troy Chiu <[email protected]>
@troychiu troychiu changed the title [POC ]RayJob YuniKorn Integration [POC] RayJob YuniKorn Integration Apr 22, 2025
@@ -77,14 +79,140 @@ func (y *YuniKornScheduler) AddMetadataToPod(ctx context.Context, app *rayv1.Ray
}
}

func (y *YuniKornScheduler) isGangSchedulingEnabled(app *rayv1.RayCluster) bool {
_, exist := app.Labels[utils.RayClusterGangSchedulingEnabled]
func collectLabelsFromObject(obj client.Object, labels map[string]string, sourceKey string, targetKey string) {

Maybe this func should be renamed?
I don't think the name reflects what it does.

}

func (y *YuniKornScheduler) isGangSchedulingEnabled(obj client.Object) bool {
_, exist := obj.GetLabels()[utils.RayClusterGangSchedulingEnabled]
return exist
}
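
One consequence of the check above is that only the presence of the label key matters, not its value. The sketch below illustrates this with stand-in types: labeled and fakeObject are hypothetical substitutes for client.Object, and the constant's value is assumed to match the ray.io/gang-scheduling-enabled key shown in the Usage manifest.

```go
package main

import "fmt"

// labeled is a hypothetical stand-in for the GetLabels part of client.Object.
type labeled interface {
	GetLabels() map[string]string
}

// fakeObject is a made-up type used only for this illustration.
type fakeObject struct{ labels map[string]string }

func (f fakeObject) GetLabels() map[string]string { return f.labels }

// gangSchedulingLabel is assumed to equal utils.RayClusterGangSchedulingEnabled.
const gangSchedulingLabel = "ray.io/gang-scheduling-enabled"

// isGangSchedulingEnabled mirrors the check in the diff: key presence only.
func isGangSchedulingEnabled(obj labeled) bool {
	_, exist := obj.GetLabels()[gangSchedulingLabel]
	return exist
}

func main() {
	on := fakeObject{labels: map[string]string{gangSchedulingLabel: "true"}}
	off := fakeObject{labels: map[string]string{gangSchedulingLabel: "false"}}
	none := fakeObject{} // nil label map; lookups on a nil map are safe in Go

	fmt.Println(isGangSchedulingEnabled(on))   // true
	fmt.Println(isGangSchedulingEnabled(off))  // true: the value is ignored
	fmt.Println(isGangSchedulingEnabled(none)) // false
}
```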

func (y *YuniKornScheduler) populateTaskGroupsAnnotationToPod(ctx context.Context, app *rayv1.RayCluster, pod *corev1.Pod) {

I've always found it inconsistent that this is named app in some places and cluster in others. Maybe we should make the naming consistent?

func (v *VolcanoBatchScheduler) PropagateMetadata(_ context.Context, parent client.Object, groupName string, child client.Object) {
// Only support parent is RayCluster and child is Pod
pod, ok := child.(*corev1.Pod)
if !ok {

We might need to add an error message here?


2 participants