Trigger graceful Machine disruption via Node deletion #13591

@aidan-canva

What would you like to be added (User Story)?

As an operator, I would like to gracefully remove a Node and its underlying Machine and infrastructure, respecting any disruption configuration, by deleting the Node object in the child cluster.

Detailed Description

CAPI supports a configurable and sophisticated deletion process for Machine objects that respects workload availability (PDBs, etc.). At present, this process is triggered when the deletion timestamp (metadata.deletionTimestamp) is set on the Machine object.

In some environments, the operator of the CAPI infrastructure differs from the operator of the child cluster and its resources, or jumping across clusters is cumbersome. In this model, it is challenging for a child-cluster operator to trigger this same safe deletion process directly from the child cluster.

A potential solution is to have CAPI (behind a feature flag) set metadata.finalizers on managed Nodes and react to a Node's metadata.deletionTimestamp by triggering deletion of the parent Machine resource, removing the finalizer once the disruption process has completed. This follows a similar pattern to what Karpenter has implemented and allows for safe removal of 'bad' nodes in a pinch.
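To make the proposed flow concrete, here is a minimal, self-contained sketch of the reconcile logic in Go. It is not CAPI code and uses no Kubernetes client: `Node`, `Machine`, and `ObjectMeta` are simplified stand-ins, and the finalizer name, the `DisruptionComplete` field, and the `reconcileNode` function are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// nodeFinalizer is a hypothetical finalizer name; the issue does not specify one.
const nodeFinalizer = "example.cluster.x-k8s.io/graceful-node-deletion"

// ObjectMeta models only the metadata fields this pattern relies on.
type ObjectMeta struct {
	Name              string
	DeletionTimestamp *time.Time
	Finalizers        []string
}

// Node and Machine are simplified stand-ins for the real API types.
type Node struct{ ObjectMeta }

type Machine struct {
	ObjectMeta
	// DisruptionComplete is an assumed signal that drain/deletion has finished.
	DisruptionComplete bool
}

// markDeleted simulates the API server setting metadata.deletionTimestamp.
func markDeleted(meta *ObjectMeta) {
	now := time.Now()
	meta.DeletionTimestamp = &now
}

func hasFinalizer(finalizers []string, name string) bool {
	for _, f := range finalizers {
		if f == name {
			return true
		}
	}
	return false
}

func removeFinalizer(finalizers []string, name string) []string {
	kept := make([]string, 0, len(finalizers))
	for _, f := range finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	return kept
}

// reconcileNode performs one reconcile step and returns the action taken.
// A nil machine means the Machine object no longer exists.
func reconcileNode(node *Node, machine *Machine) string {
	if node.DeletionTimestamp == nil {
		// Node is live: make sure deletion will block on our finalizer.
		if hasFinalizer(node.Finalizers, nodeFinalizer) {
			return "no-op"
		}
		node.Finalizers = append(node.Finalizers, nodeFinalizer)
		return "finalizer-added"
	}
	if !hasFinalizer(node.Finalizers, nodeFinalizer) {
		return "no-op" // Node is being deleted but is not ours to gate.
	}
	if machine == nil || machine.DisruptionComplete {
		// Disruption finished: release the Node so its deletion can proceed.
		node.Finalizers = removeFinalizer(node.Finalizers, nodeFinalizer)
		return "finalizer-removed"
	}
	if machine.DeletionTimestamp == nil {
		// Hand off to the existing Machine deletion process.
		markDeleted(&machine.ObjectMeta)
		return "machine-deletion-requested"
	}
	return "waiting-for-disruption"
}

func main() {
	node := &Node{ObjectMeta: ObjectMeta{Name: "worker-0"}}
	machine := &Machine{ObjectMeta: ObjectMeta{Name: "machine-0"}}

	fmt.Println(reconcileNode(node, machine)) // finalizer-added
	markDeleted(&node.ObjectMeta)             // operator deletes the Node
	fmt.Println(reconcileNode(node, machine)) // machine-deletion-requested
	machine.DisruptionComplete = true         // drain and teardown finished
	fmt.Println(reconcileNode(node, machine)) // finalizer-removed
}
```

The key design point in this sketch is that the Node controller never drains anything itself: it only translates a Node deletion into a Machine deletion and waits, so the existing Machine deletion process remains the single place where disruption rules are enforced.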

Anything else you would like to add?

If there is appetite to explore this, I'm happy to put together a formal proposal/PR.

Label(s) to be applied

/kind feature
/area machine

Labels

area/machine: Issues or PRs related to machine lifecycle management
kind/feature: Categorizes issue or PR as related to a new feature.
needs-priority: Indicates an issue lacks a `priority/foo` label and requires one.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
