Conversation

@owenowenisme
Member

@owenowenisme owenowenisme commented Apr 22, 2025

Why are these changes needed?

This PR only demonstrates how I determined the appropriate amount of memory and CPU for the API server e2e tests, so it shouldn't be merged.

How I test them

I only ran the measurements on the tests that need compute resources.

I used the Kubernetes native metrics-server to collect the CPU and memory usage of each pod, querying it with kubectl top pod -n <namespace> at a 5-second interval (metrics-server itself scrapes metrics at a 15-second interval, so querying every 5 seconds is sufficient).
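
For illustration, here is a minimal sketch of what such a measurement loop might look like. This is not the actual utils.go helper; pollPeakUsage, its arguments, and the 60-second measurement window in main are hypothetical, and it assumes kubectl is on the PATH:

package main

import (
	"bufio"
	"bytes"
	"fmt"
	"math"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// pollPeakUsage shells out to `kubectl top pod` every five seconds and tracks
// the peak CPU (milli-cores) and memory (Mi) seen across all pods in the
// namespace, until the stop channel is closed.
func pollPeakUsage(namespace string, stop <-chan struct{}) (peakCPUm, peakMemMi float64) {
	ticker := time.NewTicker(5 * time.Second) // metrics-server itself scrapes every 15s
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return peakCPUm, peakMemMi
		case <-ticker.C:
			out, err := exec.Command("kubectl", "top", "pod", "-n", namespace, "--no-headers").Output()
			if err != nil {
				continue // metrics may not be available yet; try again next tick
			}
			scanner := bufio.NewScanner(bytes.NewReader(out))
			for scanner.Scan() {
				// Each line looks like: <pod-name> <cpu, e.g. 203m> <memory, e.g. 698Mi>
				fields := strings.Fields(scanner.Text())
				if len(fields) < 3 {
					continue
				}
				cpu, _ := strconv.ParseFloat(strings.TrimSuffix(fields[1], "m"), 64)
				mem, _ := strconv.ParseFloat(strings.TrimSuffix(fields[2], "Mi"), 64)
				peakCPUm = math.Max(peakCPUm, cpu)
				peakMemMi = math.Max(peakMemMi, mem)
			}
		}
	}
}

func main() {
	stop := make(chan struct{})
	go func() {
		time.Sleep(60 * time.Second) // measure for one minute
		close(stop)
	}()
	cpu, mem := pollPeakUsage("default", stop)
	fmt.Printf("Peak CPU usage: %.1fm\nPeak Memory usage: %.1fMi\n", cpu, mem)
}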

Result: TL;DR, CPU: 1 and memory: 1Gi would be enough. The highest peaks observed across all the tests below are 655m CPU and 796Mi memory, both comfortably under 1 CPU and 1Gi.

(The compute template is currently set to CPU: 2 and memory: 4Gi.)

  • cluster_server_e2e_test.go/TestCreateClusterEndpoint
=== RUN   TestCreateClusterEndpoint/Create_a_cluster_without_volumes
    utils.go:83: Found condition 'RayClusterProvisioned' for ray cluster 'bunny'
    utils.go:167: Metrics result:
        bunny-head-ndplw   203m         698Mi           
        bunny-head-ndplw   203m         698Mi           
        bunny-head-ndplw   203m         698Mi           
        
    utils.go:168: 
        Peak CPU usage: 203.0m
        Peak Memory usage: 698.0Mi
--- PASS: TestCreateClusterEndpoint/Create_a_cluster_without_volumes (32.59s)
=== RUN   TestCreateClusterEndpoint/Create_cluster_with_config_map_volume
    utils.go:83: Found condition 'RayClusterProvisioned' for ray cluster 'lioness'
    utils.go:167: Metrics result:
        bunny-head-ndplw   203m         698Mi           
        bunny-head-ndplw              79m          702Mi           
        bunny-small-wg-worker-6mn46   155m         244Mi           
        bunny-head-ndplw              79m          702Mi           
        bunny-small-wg-worker-6mn46   155m         244Mi           
        bunny-head-ndplw              79m          702Mi           
        bunny-small-wg-worker-6mn46   155m         244Mi           
        bunny-head-ndplw              66m          706Mi           
        bunny-small-wg-worker-6mn46   17m          149Mi           
        lioness-head-6ffhg            378m         697Mi           
        bunny-head-ndplw              66m          706Mi           
        bunny-small-wg-worker-6mn46   17m          149Mi           
        lioness-head-6ffhg            378m         697Mi           
        bunny-small-wg-worker-6mn46   17m          149Mi           
        lioness-head-6ffhg            378m         697Mi           
        lioness-head-6ffhg   65m          701Mi           
        lioness-head-6ffhg   65m          701Mi           
        lioness-head-6ffhg   65m          701Mi           
        lioness-head-6ffhg   52m          698Mi           
        lioness-head-6ffhg   52m          698Mi           
        lioness-head-6ffhg   52m          698Mi           
        
    utils.go:168: 
        Peak CPU usage: 378.0m
        Peak Memory usage: 706.0Mi
--- PASS: TestCreateClusterEndpoint/Create_cluster_with_config_map_volume (61.09s)
=== RUN   TestCreateClusterEndpoint/Create_cluster_with_no_workers
    utils.go:83: Found condition 'RayClusterProvisioned' for ray cluster 'macaw'
    utils.go:167: Metrics result:
        lioness-head-6ffhg   52m          698Mi           
        lioness-head-6ffhg              62m          466Mi           
        lioness-small-wg-worker-xmnhc   113m         234Mi           
        lioness-head-6ffhg              62m          466Mi           
        lioness-small-wg-worker-xmnhc   113m         234Mi           
        
    utils.go:168: 
        Peak CPU usage: 113.0m
        Peak Memory usage: 698.0Mi
  • cluster_server_autoscaler_e2e_test.go
=== RUN   TestCreateClusterAutoscaler
    utils.go:167: Metrics result:
        warthog-head-r6wmb   655m         746Mi           
        warthog-head-r6wmb   655m         746Mi           
        warthog-head-r6wmb   655m         746Mi           
        warthog-head-r6wmb   156m         780Mi           
        warthog-head-r6wmb   156m         780Mi           
        warthog-head-r6wmb   156m         780Mi           
        warthog-head-r6wmb   135m         788Mi           
        warthog-head-r6wmb   135m         788Mi           
        warthog-head-r6wmb   135m         788Mi           
        warthog-head-r6wmb              85m          794Mi           
        warthog-small-wg-worker-lkld4   38m          289Mi           
        warthog-head-r6wmb              85m          794Mi           
        warthog-small-wg-worker-lkld4   38m          289Mi           
        warthog-head-r6wmb              85m          794Mi           
        warthog-small-wg-worker-lkld4   38m          289Mi           
        warthog-head-r6wmb              84m          796Mi           
        warthog-small-wg-worker-lkld4   128m         133Mi           
        warthog-head-r6wmb              84m          796Mi           
        warthog-small-wg-worker-lkld4   128m         133Mi           
        warthog-head-r6wmb              84m          796Mi           
        warthog-small-wg-worker-lkld4   128m         133Mi           
        warthog-head-r6wmb              88m          566Mi           
        warthog-small-wg-worker-lkld4   5m           107Mi           
        warthog-head-r6wmb              88m          566Mi           
        warthog-small-wg-worker-lkld4   5m           107Mi           
        warthog-head-r6wmb              88m          566Mi           
        warthog-small-wg-worker-lkld4   5m           107Mi           
        
    utils.go:168: 
        Peak CPU usage: 655.0m
        Peak Memory usage: 796.0Mi
--- PASS: TestCreateClusterAutoscaler (105.75s)
  • job_server_e2e_test.go/TestCreateJobWithDisposableClusters
=== RUN   TestCreateJobWithDisposableClusters/Create_a_running_sample_job
    utils.go:167: Metrics result:
        frog-raycluster-6ktk2-head-ckh6d   373m         502Mi           
        frog-raycluster-6ktk2-head-ckh6d   373m         502Mi           
        frog-raycluster-6ktk2-head-ckh6d   373m         502Mi           
        frog-raycluster-6ktk2-head-ckh6d   55m          516Mi           
        frog-raycluster-6ktk2-head-ckh6d   55m          516Mi           
        frog-raycluster-6ktk2-head-ckh6d   55m          516Mi           
        frog-raycluster-6ktk2-head-ckh6d   63m          515Mi           
        frog-raycluster-6ktk2-head-ckh6d   63m          515Mi           
        frog-raycluster-6ktk2-head-ckh6d   63m          515Mi           
        frog-raycluster-6ktk2-head-ckh6d              432m         621Mi           
        frog-raycluster-6ktk2-small-wg-worker-pf7zn   148m         489Mi           
        frog-wct6w                                    301m         198Mi           
        frog-raycluster-6ktk2-head-ckh6d              432m         621Mi           
        frog-raycluster-6ktk2-small-wg-worker-pf7zn   148m         489Mi           
        frog-wct6w                                    301m         198Mi           
        
    utils.go:168: Peak CPU usage: 432.0m
        Peak Memory usage: 621.0Mi
--- PASS: TestCreateJobWithDisposableClusters/Create_a_running_sample_job (64.10s)

Next step

Lower the resource values in CreateComputeTemplate from the current CPU: 2 and memory: 4Gi to CPU: 1 and memory: 1Gi:

// CreateComputeTemplate creates the compute template that the e2e tests run against.
func (e2etc *End2EndTestingContext) CreateComputeTemplate(t *testing.T) {
	computeTemplateRequest := &api.CreateComputeTemplateRequest{
		ComputeTemplate: &api.ComputeTemplate{
			Name:      e2etc.computeTemplateName,
			Namespace: e2etc.namespaceName,
			Cpu:       2, // change to 1
			Memory:    4, // change to 1 (Gi)
		},
		Namespace: e2etc.namespaceName,
	}

	_, _, err := e2etc.kuberayAPIServerClient.CreateComputeTemplate(computeTemplateRequest)
	require.NoErrorf(t, err, "No error expected while creating a compute template (%s, %s)", e2etc.namespaceName, e2etc.computeTemplateName)
}
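
For reference, the template with the proposed values would read as follows (a sketch; it assumes, as the comments above imply, that the Memory field is expressed in Gi):

ComputeTemplate: &api.ComputeTemplate{
	Name:      e2etc.computeTemplateName,
	Namespace: e2etc.namespaceName,
	Cpu:       1, // covers the 655m peak observed above
	Memory:    1, // 1Gi covers the 796Mi peak observed above
},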

Related issue number

#3426

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@owenowenisme
Member Author

CI error is expected.

@owenowenisme
Member Author

@dentiny PTAL

Contributor

@dentiny dentiny left a comment


Thanks for the investigation! Looks good to me!
