[Feature] RayService Incremental Upgrade Project Tracker #3209

@ryanaoleary

Search before asking

  • I searched the issues and found no similar feature request.

Description

The following list defines the scope of the alpha for the RayService Incremental Upgrade feature.

  • Implement Ray Serve APIs for users to define custom timeouts for readiness
    • Per this discussion, this may not be necessary. After the initial implementation, we can revisit whether additional KubeRay-side timeouts for a stuck upgrade are needed.
  • RayService Incremental Upgrade Alpha
    • Before merging, we should ensure the following:
      • Test and document rollback behavior. Full rollback support for the cases defined in the REP can be implemented in a follow-up PR.
      • API review of the newly added RayService fields
      • Load/stress testing to ensure no requests are dropped
  • Add tests for upgrade rollback/roll-forward cases
    • A->B->A case. The RayService is being upgraded from current state A to goal state B, and the goal state is changed back to A in the middle of the upgrade. A controller error during the upgrade should likewise result in a rollback to state A. The expected behavior is that the RayService rolls back to the original state.
    • A->B->C case. The RayService is being upgraded from current state A to goal state B, and the goal state is changed to state C in the middle of the upgrade. The expected behavior is that the RayService rolls back to state A and then upgrades again from A to C.
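
To make the expected outcomes of the two cases above concrete, here is a minimal sketch of them as a toy state model. This is not KubeRay controller code: `UpgradeModel` and its methods are hypothetical names invented for illustration, and the model only records the order of transitions a test would want to assert on.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UpgradeModel:
    """Toy model of the expected behavior (hypothetical, not the controller)."""
    active: str                      # state currently serving traffic
    pending: Optional[str] = None    # goal state being rolled out, if any
    history: List[str] = field(default_factory=list)

    def _rollback(self) -> None:
        # Abandon the in-flight upgrade and stay on the active state.
        self.history.append(f"rollback:{self.pending}->{self.active}")
        self.pending = None

    def set_goal(self, goal: str) -> None:
        # A goal change in the middle of an upgrade rolls back first.
        if self.pending is not None and goal != self.pending:
            self._rollback()
        # Start a new upgrade only if the goal differs from the active state.
        if self.pending is None and goal != self.active:
            self.history.append(f"upgrade:{self.active}->{goal}")
            self.pending = goal

    def controller_error(self) -> None:
        # An error during an upgrade is treated like a rollback to active.
        if self.pending is not None:
            self._rollback()

    def complete(self) -> None:
        # The in-flight upgrade finished; the goal becomes the active state.
        if self.pending is not None:
            self.active, self.pending = self.pending, None


# A->B->A: the goal is reverted mid-upgrade, so we end up back on A.
aba = UpgradeModel(active="A")
aba.set_goal("B")
aba.set_goal("A")

# A->B->C: roll back to A first, then upgrade from A to C.
abc = UpgradeModel(active="A")
abc.set_goal("B")
abc.set_goal("C")
abc.complete()
```

In this sketch the A->B->A run leaves `aba.active` at `"A"` with one upgrade and one rollback recorded, and the A->B->C run records a rollback to A before the A-to-C upgrade. Real tests would assert on RayService status and the underlying RayClusters instead.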

Use case

It would be useful to support an API for incrementally upgrading a RayService: scale up a new RayCluster that initially handles only a percentage of the RayService's total traffic, so that the upgrade is not delayed by resource constraints. This issue tracks progress on implementing the RayService incremental upgrade feature.
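
As a rough illustration of shifting traffic in percentage steps, the sketch below computes a traffic-split schedule between the old and new RayCluster. The function name `traffic_schedule` and the `step_percent` parameter are assumptions made for this example. The actual API fields and stepping logic are defined by the REP and the implementation, not by this snippet.

```python
from typing import Iterator, Tuple


def traffic_schedule(step_percent: int) -> Iterator[Tuple[int, int]]:
    """Yield (old_cluster_pct, new_cluster_pct) after each step.

    Hypothetical helper: moves `step_percent` of traffic per step and
    clamps the final step so the new cluster ends at exactly 100%.
    """
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in the range 1..100")
    new_pct = 0
    while new_pct < 100:
        new_pct = min(100, new_pct + step_percent)
        yield 100 - new_pct, new_pct


# Example: 40% steps give three splits, the last one clamped to 100%.
schedule = list(traffic_schedule(40))
```

Because only a fraction of the traffic moves at each step, the new RayCluster only needs capacity for that fraction at any given time, which is the resource-constraint benefit described above.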

Related issues

REP: https://github.com/ray-project/enhancements/blob/main/reps/2024-12-4-ray-service-incr-upgrade.md

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
