feat: WIP: allow user to over-scale a buffer of instances in an ASG #100
Hello 👋 - A few weeks ago I opened #96 but promptly closed it, as I could have solved the problem with a different tool. After looking at this again, I think it'd be simpler if solved in `eks-rolling-update` directly.

This PR adds a new env variable, `ASG_BUFFER_INSTANCES`, which lets an arbitrary number be given to `eks_rolling_update.py` and causes each ASG to be over-scaled by that number. A minimal sketch of the idea is below.
The past few rolling upgrades I've done have hit issues like workloads with PV/PVCs getting stuck in Pending because other pods had already started, HPA scale-outs leaving pods stuck in Pending, and deployments rolling out mid-upgrade causing problems.
Since I've been pre-scaling each ASG by a few instances, this hasn't been an issue, and `cluster-autoscaler` takes care of scaling in the unused compute after the rollout. As always, open to any feedback or ideas 😸
Thanks for an awesome tool!