
@fmoehler (Contributor) commented Nov 28, 2025

If the VM fails to start for some reason, the deletion of its network interface is not retried. This can lead to errors like "IP address is already in use" on subsequent attempts to create the VM (BOSH automatically retries VM creation in this case). This PR implements retry logic for the deletion and adds the relevant tests.
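A minimal Ruby sketch of the retry idea, assuming exponential backoff and a fixed attempt limit. The description does not say which strategy or limits the merged change actually uses, so treat this as illustrative only. `InterfaceInUseError`, `FakeEc2Client` and `delete_network_interface_with_retry` are hypothetical names; in the real CPI the failing call is `delete_network_interface` on `Aws::EC2::Client`, which raises `Aws::EC2::Errors::InvalidParameterValue` ("is currently in use") while the interface is still attached, as seen in the debug log in this description.

```ruby
# Hypothetical stand-in for Aws::EC2::Errors::InvalidParameterValue
# ("Network interface ... is currently in use."), so the sketch runs
# without the aws-sdk gem.
class InterfaceInUseError < StandardError; end

# Hypothetical: retries the deletion with exponential backoff.
# Returns true on success; re-raises the last error once
# max_attempts tries have all failed.
def delete_network_interface_with_retry(client, eni_id, max_attempts: 5, base_delay: 1.0,
                                        sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    client.delete_network_interface(network_interface_id: eni_id)
    true
  rescue InterfaceInUseError => e
    raise if attempt >= max_attempts # give up: propagate the last error

    delay = base_delay * (2**(attempt - 1)) # 1s, 2s, 4s, ...
    warn "Network interface '#{eni_id}' still in use (attempt #{attempt}/#{max_attempts}): " \
         "#{e.message}; retrying in #{delay}s"
    sleeper.call(delay)
    retry # re-runs the begin block; `attempt` keeps its value
  end
end

# Hypothetical fake client: reports "in use" for the first `busy_calls`
# calls, like an ENI still attached to an instance that is shutting down.
class FakeEc2Client
  attr_reader :calls

  def initialize(busy_calls)
    @busy_calls = busy_calls
    @calls = 0
  end

  def delete_network_interface(network_interface_id:)
    @calls += 1
    if @calls <= @busy_calls
      raise InterfaceInUseError, "Network interface '#{network_interface_id}' is currently in use."
    end

    {}
  end
end

# Usage: the interface is busy for two calls, then the deletion succeeds.
delays = []
client = FakeEc2Client.new(2)
delete_network_interface_with_retry(client, "eni-0eb14aebec9772cd0",
                                    sleeper: ->(seconds) { delays << seconds })
p client.calls # => 3
p delays       # => [1.0, 2.0]
```

Injecting `sleeper` keeps the backoff testable without real waits; re-raising after the last attempt keeps the original failure visible to the caller instead of silently abandoning the interface.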

Below is a debug log of an example in which the deletion is not retried. The network interface ends up "abandoned", so an operator has to delete it manually to unblock the director.

```
D, [2025-11-27T04:10:37.572683 #747317] DEBUG -- [req_id cpi-875037]: [Aws::EC2::Client 200 0.092467 0 retries] describe_instances(instance_ids:["i-00296ecd9c9f26631"])
W, [2025-11-27T04:10:37.573357 #747317]  WARN -- [req_id cpi-875037]: Timed out waiting for instance 'i-00296ecd9c9f26631' to be running
W, [2025-11-27T04:10:37.573397 #747317]  WARN -- [req_id cpi-875037]: Failed to configure instance 'i-00296ecd9c9f26631': #<Bosh::Clouds::VMCreationFailed: Timed out waiting for instance 'i-00296ecd9c9f26631' to be running>
D, [2025-11-27T04:10:38.077263 #747317] DEBUG -- [req_id cpi-875037]: [Aws::EC2::Client 200 0.503579 0 retries] terminate_instances(instance_ids:["i-00296ecd9c9f26631"])
I, [2025-11-27T04:10:38.077371 #747317]  INFO -- [req_id cpi-875037]: Deleting instance settings for 'i-00296ecd9c9f26631'
I, [2025-11-27T04:10:38.077385 #747317]  INFO -- [req_id cpi-875037]: Deleting instance 'i-00296ecd9c9f26631'
D, [2025-11-27T04:10:38.147487 #747317] DEBUG -- [req_id cpi-875037]: [Aws::EC2::Client 200 0.069778 0 retries] describe_instances(instance_ids:["i-00296ecd9c9f26631"])
E, [2025-11-27T04:10:38.147926 #747317] ERROR -- [req_id cpi-875037]: Failed to terminate mis-configured instance 'i-00296ecd9c9f26631': #<Aws::Waiters::Errors::FailureStateError: stopped waiting, encountered a failure state>
I, [2025-11-27T04:10:38.147987 #747317]  INFO -- [req_id cpi-875037]: Deleting network_interface: eni-0eb14aebec9772cd0
D, [2025-11-27T04:10:38.403512 #747317] DEBUG -- [req_id cpi-875037]: [Aws::EC2::Client 400 0.255269 0 retries] delete_network_interface(network_interface_id:"eni-0eb14aebec9772cd0") Aws::EC2::Errors::InvalidParameterValue Network interface 'eni-0eb14aebec9772cd0' is currently in use.
W, [2025-11-27T04:10:38.403603 #747317]  WARN -- [req_id cpi-875037]: Network interface 'eni-0eb14aebec9772cd0' could not be deleted: Network interface 'eni-0eb14aebec9772cd0' is currently in use.
E, [2025-11-27T04:10:38.403642 #747317] ERROR -- [req_id cpi-875037]: Failed to create instance: Timed out waiting for instance 'i-00296ecd9c9f26631' to be running
/var/vcap/data/packages/bosh_aws_cpi/6350e37fd32ef512a6a32a35f1473fb5f0435ecc/lib/cloud/aws/instance.rb:69:in `rescue in wait_until_running'
/var/vcap/data/packages/bosh_aws_cpi/6350e37fd32ef512a6a32a35f1473fb5f0435ecc/lib/cloud/aws/instance.rb:59:in `wait_until_running'
```

@fmoehler marked this pull request as ready for review November 28, 2025 10:20
@fmoehler requested review from a team, anshrupani and s4heid and removed request for a team November 28, 2025 10:24
@fmoehler force-pushed the retrydeletion-of-network-interface-in-case-vm-creation-errors branch from 6efad9f to 8ddf3cb December 3, 2025 09:06
github-project-automation bot moved this from Inbox to Pending Merge | Prioritized in Foundational Infrastructure Working Group Dec 3, 2025
@a-hassanin (Contributor) left a comment

Thanks!

@rkoster merged commit b8706ae into master Dec 4, 2025
3 checks passed
@rkoster deleted the retrydeletion-of-network-interface-in-case-vm-creation-errors branch December 4, 2025 15:53
github-project-automation bot moved this from Pending Merge | Prioritized to Done in Foundational Infrastructure Working Group Dec 4, 2025


4 participants