Search before asking
- I have searched the issues and found no similar feature request.
Description
The following list defines the scope of the alpha for the RayService Incremental Upgrade feature.
- Implement Ray Serve APIs for users to define custom timeouts for readiness
- Per this discussion, this may not be necessary. After the initial implementation, we can revisit whether KubeRay needs additional timeouts for a stuck upgrade.
- RayService Incremental Upgrade Alpha
- Before merging, we should ensure the following:
- Test and document rollback behavior. Full rollback support for the cases defined in the REP can be implemented in a follow-up PR.
- Review the API for the newly added RayService fields.
- Run load/stress tests to ensure no requests are dropped.
- Before merging, we should ensure the following:
- Add tests for upgrade rollback/roll-forward cases
- A->B->A case. The RayService is being upgraded from current state A to goal state B. In the middle of the upgrade, the goal state is changed back to A. A controller error during the upgrade should likewise result in a rollback to state A. The expected behavior is that the RayService rolls back to the original state.
- A->B->C case. The RayService is being upgraded from current state A to goal state B. In the middle of the upgrade, the goal state is changed to state C. The expected behavior is that the RayService rolls back to state A and then upgrades again from A to C.
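The expected behavior in the two cases above can be written down as a small toy model, which may be a useful starting point for test assertions. This is only a sketch under stated assumptions: `UpgradeModel`, `set_goal`, `fail`, and `complete` are hypothetical names, not KubeRay or Ray Serve APIs, and rollback is modeled as instantaneous.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UpgradeModel:
    """Toy model of the expected rollback/roll-forward behavior (hypothetical)."""

    active: str                  # state currently serving traffic
    goal: Optional[str] = None   # goal state of an in-flight upgrade, if any
    history: List[str] = field(default_factory=list)

    def _start(self, new_goal: str) -> None:
        # Begin an upgrade only if the goal differs from the active state.
        if new_goal != self.active:
            self.goal = new_goal
            self.history.append(f"upgrade {self.active}->{new_goal}")

    def _rollback(self) -> None:
        # Abandon the in-flight upgrade and return to the active state.
        self.history.append(f"rollback {self.goal}->{self.active}")
        self.goal = None

    def set_goal(self, new_goal: str) -> None:
        if self.goal is None:
            self._start(new_goal)
        elif new_goal != self.goal:
            # Goal changed mid-upgrade: roll back first, then upgrade again
            # if the new goal is not the original state.
            self._rollback()
            self._start(new_goal)

    def fail(self) -> None:
        # A controller error during an upgrade also rolls back.
        if self.goal is not None:
            self._rollback()

    def complete(self) -> None:
        # The in-flight upgrade finished; the goal becomes the active state.
        if self.goal is not None:
            self.active, self.goal = self.goal, None


# A->B->A: the goal is changed back to A in the middle of the upgrade.
aba = UpgradeModel(active="A")
aba.set_goal("B")
aba.set_goal("A")

# A->B->C: the goal is changed to C in the middle of the upgrade.
abc = UpgradeModel(active="A")
abc.set_goal("B")
abc.set_goal("C")
abc.complete()
```

In this model, `aba.history` records an upgrade followed by a rollback, and `abc.history` records an upgrade, a rollback to A, and a second upgrade from A to C.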
Use case
It would be useful to support an API for incrementally upgrading a RayService, scaling a new RayCluster to handle only a percentage of the RayService's total traffic, in order to avoid delays in the upgrade process caused by resource constraints. This issue tracks progress on implementing the RayService incremental upgrade feature.
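To illustrate the idea of shifting traffic in percentage steps, here is a minimal sketch. The function `traffic_schedule` and its `step_percent` parameter are hypothetical and chosen for illustration only; they are not RayService fields, and the actual API is defined in the REP.

```python
from typing import List, Tuple


def traffic_schedule(step_percent: int) -> List[Tuple[int, int]]:
    """Return (old_cluster_pct, new_cluster_pct) traffic splits, one per step.

    Hypothetical helper: each step moves `step_percent` more of the total
    traffic to the new cluster until it serves 100%.
    """
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in (0, 100]")
    steps = []
    new_pct = 0
    while new_pct < 100:
        # Never exceed 100% on the final step.
        new_pct = min(100, new_pct + step_percent)
        steps.append((100 - new_pct, new_pct))
    return steps


schedule = traffic_schedule(40)
# schedule == [(60, 40), (20, 80), (0, 100)]
```

Because the new cluster only needs capacity for its current share of traffic at each step, the upgrade does not have to wait for a full-size second cluster to be schedulable.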
Related issues
REP: https://github.com/ray-project/enhancements/blob/main/reps/2024-12-4-ray-service-incr-upgrade.md
Are you willing to submit a PR?
- Yes I am willing to submit a PR!