
Supporting In-place Pod Resizing #771

@sandy-0007


Is your feature request related to a problem? Please describe.

Yes, our feature request is directly related to the problem of disruptive resource adjustments for application pods.

Currently, when we use Goldilocks to get resource recommendations for our application pods (CPU and memory requests/limits), applying these recommendations requires restarting the pods, typically through a rollout of the owning Deployment or StatefulSet. This restart, while necessary for resource changes in older Kubernetes versions, can cause unwanted downtime or service disruption, particularly for:

Stateful applications: Where restarts can lead to temporary data unavailability or complex failover scenarios.

Long-running batch jobs: Where a restart means losing progress or requiring a full re-run.

High-traffic web services: Where any brief disruption can impact user experience and service level objectives (SLOs).

Applications with slow startup times: Where restarting can cause significant delays in regaining full operational capacity.

This forces our teams either to schedule maintenance windows for right-sizing (delaying optimization) or to accept service interruptions, neither of which is acceptable in a modern, highly available cloud-native environment. We have already experienced such interruptions with Java workloads, among others.

Describe the solution you'd like

We would like Goldilocks to support in-place pod resize (in-place vertical scaling) and surface recommendations that can be applied this way, leveraging the InPlacePodVerticalScaling feature, which graduated to Beta in Kubernetes v1.33.

Specifically, we envision the following:

Detection and Indication: Goldilocks should detect whether the underlying Kubernetes cluster supports in-place pod resize (i.e., Kubernetes v1.33+ with the InPlacePodVerticalScaling feature gate enabled, as it is by default).
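As a rough sketch of what detection could look like (in Go, since Goldilocks is a Go project), assume Goldilocks has the raw JSON of the core/v1 discovery document (GET /api/v1) and the server's gitVersion string. A version check alone is not enough, because the feature gate can be disabled, so the sketch also looks for the pods/resize subresource in discovery. That it is advertised there whenever the resize subresource is served is our assumption; the function names are hypothetical and not part of Goldilocks.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// apiResourceList mirrors the subset of the core/v1 discovery document
// (GET /api/v1) that this check needs.
type apiResourceList struct {
	GroupVersion string `json:"groupVersion"`
	Resources    []struct {
		Name string `json:"name"`
	} `json:"resources"`
}

// hasPodResizeSubresource reports whether the discovery document lists the
// pods/resize subresource (assumed to mean in-place resize is served).
func hasPodResizeSubresource(discovery []byte) (bool, error) {
	var list apiResourceList
	if err := json.Unmarshal(discovery, &list); err != nil {
		return false, err
	}
	for _, r := range list.Resources {
		if r.Name == "pods/resize" {
			return true, nil
		}
	}
	return false, nil
}

// atLeastMinor reports whether a gitVersion such as "v1.33.1" or
// "v1.33.1-eks-abc" is at least 1.<minMinor>.
func atLeastMinor(gitVersion string, minMinor int) bool {
	parts := strings.SplitN(strings.TrimPrefix(gitVersion, "v"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, err1 := strconv.Atoi(parts[0])
	minor, err2 := strconv.Atoi(parts[1])
	if err1 != nil || err2 != nil {
		return false
	}
	return major > 1 || (major == 1 && minor >= minMinor)
}

func main() {
	sample := []byte(`{"kind":"APIResourceList","groupVersion":"v1",
		"resources":[{"name":"pods"},{"name":"pods/resize"}]}`)
	ok, err := hasPodResizeSubresource(sample)
	// prints: true <nil> true false
	fmt.Println(ok, err, atLeastMinor("v1.33.1-eks-abc", 33), atLeastMinor("v1.32.4", 33))
}
```

Combining both checks (version for context in the dashboard, discovery for the actual decision) would let Goldilocks show the in-place indicator only where it can really be used.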

Recommendation Display: For pods eligible for in-place resize, the Goldilocks dashboard and any generated reports should clearly indicate that the recommended resource changes can be applied without a pod restart.

Perhaps a new icon, a "non-disruptive" flag, or a specific "Apply In-Place" button/option next to the traditional YAML recommendation.

Actionable Output (YAML/Patch): When providing the YAML for applying recommendations, Goldilocks should:

Generate a kubectl patch command or a YAML snippet that targets the Pod's resize subresource (e.g., kubectl patch pod --subresource resize ...) instead of modifying the Deployment/StatefulSet directly, so that the update is applied in place.

Take each container's resizePolicy field into account in the recommendations (e.g., restartPolicy: NotRequired for CPU, RestartContainer for memory where necessary), so that users understand when a restart may still be unavoidable for certain resource types or changes (such as memory decreases).
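A minimal sketch of such actionable output, assuming recommendations are available as per-container quantity strings: it builds the kubectl command for the resize subresource and flags whether the change is expected to restart the container under its resizePolicy. ResizeCommand, ChangedResources, RestartExpected, and the example names (prod, web-0, app) are hypothetical. The sketch only consults resizePolicy (an unset entry defaults to NotRequired) and does not model any other constraints the API server or kubelet may apply, for example to memory limit decreases.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Resources holds Kubernetes quantity strings keyed by resource name
// ("cpu", "memory"), split into requests and limits.
type Resources struct {
	Requests map[string]string `json:"requests,omitempty"`
	Limits   map[string]string `json:"limits,omitempty"`
}

// ResizePolicy mirrors one entry of a container's resizePolicy list.
type ResizePolicy struct {
	ResourceName  string
	RestartPolicy string // "NotRequired" or "RestartContainer"
}

type containerPatch struct {
	Name      string    `json:"name"`
	Resources Resources `json:"resources"`
}

type podPatch struct {
	Spec struct {
		Containers []containerPatch `json:"containers"`
	} `json:"spec"`
}

// ResizeCommand builds a kubectl command that patches one container of a
// running Pod through the resize subresource. kubectl's default strategic
// merge patch merges the containers list by name.
func ResizeCommand(namespace, pod, container string, rec Resources) (string, error) {
	var p podPatch
	p.Spec.Containers = []containerPatch{{Name: container, Resources: rec}}
	body, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("kubectl patch pod %s -n %s --subresource resize --patch '%s'",
		pod, namespace, body), nil
}

// ChangedResources lists resource names whose request or limit differs.
// Quantities are compared as strings, so "500m" and "0.5" count as different.
func ChangedResources(cur, rec Resources) []string {
	seen := map[string]bool{}
	mark := func(a, b map[string]string) {
		for k, v := range a {
			if b[k] != v {
				seen[k] = true
			}
		}
		for k, v := range b {
			if a[k] != v {
				seen[k] = true
			}
		}
	}
	mark(cur.Requests, rec.Requests)
	mark(cur.Limits, rec.Limits)
	out := make([]string, 0, len(seen))
	for k := range seen {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// RestartExpected reports whether any changed resource is governed by a
// RestartContainer policy; resources without an entry default to NotRequired.
func RestartExpected(changed []string, policies []ResizePolicy) bool {
	policy := map[string]string{}
	for _, p := range policies {
		policy[p.ResourceName] = p.RestartPolicy
	}
	for _, r := range changed {
		if policy[r] == "RestartContainer" {
			return true
		}
	}
	return false
}

func main() {
	cur := Resources{Requests: map[string]string{"cpu": "100m", "memory": "256Mi"}}
	rec := Resources{Requests: map[string]string{"cpu": "250m", "memory": "256Mi"}}
	cmd, err := ResizeCommand("prod", "web-0", "app", rec)
	if err != nil {
		panic(err)
	}
	changed := ChangedResources(cur, rec)
	policies := []ResizePolicy{{"cpu", "NotRequired"}, {"memory", "RestartContainer"}}
	fmt.Println(cmd)
	fmt.Println("changed:", changed, "restart expected:", RestartExpected(changed, policies))
}
```

Note that a resize applied through the subresource changes only the running Pod: the owning Deployment or StatefulSet template is untouched, so the next rollout or Pod recreation reverts to the template's values unless the template is updated as well.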

VPA Integration Awareness: Since Goldilocks builds on VPA, we hope it will integrate seamlessly with VPA's emerging ability to apply resizes in place (e.g., VPA's InPlaceOrRecreate update mode mentioned in Kubernetes blog posts). Goldilocks should surface when VPA can apply a recommendation in place.

Describe alternatives you've considered

Manual Patching: We could manually monitor resource usage, determine new values, and then apply kubectl patch --subresource resize commands directly. This is highly cumbersome, error-prone, and defeats the purpose of Goldilocks' automated recommendation generation.

Sticking to Traditional Restarts: We could continue to accept pod restarts for resource changes, but this goes against our goals of minimizing disruption and maximizing availability, especially for critical applications.

Using Other Solutions (e.g., CAST AI): Some commercial tools are already integrating with in-place pod resize. While they offer this functionality, we value Goldilocks' open-source nature, its clear reliance on VPA, and its straightforward dashboard for resource recommendations, which aligns well with our existing observability stack. We would prefer to extend our current Goldilocks usage rather than adopting a new, potentially more opinionated or costly platform.

Additional context

The Kubernetes community has been working towards in-place pod resize for a long time, and its graduation to Beta in Kubernetes v1.33 (released April 2025) is a significant milestone. The feature is enabled by default in v1.33+, making it widely available for new and upgraded clusters.

This enhancement to Goldilocks would provide immense value by allowing users to act on resource recommendations with minimal disruption, leading to:

Faster Optimization Cycles: Teams can apply recommendations more frequently without fear of downtime.

Improved Application Uptime: Critical applications can be scaled vertically without restarts.

More Granular Resource Management: Finer-grained resource adjustments based on real-time needs.

We believe this aligns perfectly with Goldilocks' mission of providing "just right" resource recommendations, now with the added benefit of "just in time" and "just in place" adjustments.

Metadata


Assignees: No one assigned

Labels: enhancement (Adding additional functionality or improvements), stale (Marked as stale by stalebot), triage (This bug needs triage)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
