VMNetwork adapter 'vEthernet (nat)*' not found #4445

Open
RomaricKanyamibwa opened this issue Nov 28, 2024 · 4 comments
@RomaricKanyamibwa

RomaricKanyamibwa commented Nov 28, 2024

Summary

Much like issue 2416, there seems to be a problem with the Windows_Server-2022-English-Full-ECS_Optimized AMIs: the ECS agent sometimes fails to connect to the ECS cluster because of a virtual hardware problem (the VMNetwork adapter cannot be found). As in the other issue, the failure appears random and occurs sporadically on our Windows image.

Description

Using Packer, we create our own AMIs based on the Windows_Server-2022-English-Full-ECS_Optimized AMIs. On the AMI we install SSH, then pull our Windows Docker images, and finally terminate it by installing EC2Launchv2. Once the AMI is ready, we use it in our ECS cluster with the following user data:

# configure ecs cluster
[Environment]::SetEnvironmentVariable("ECS_CLUSTER", "cluster-x86_64-windows","Machine")
[Environment]::SetEnvironmentVariable("ECS_IMAGE_PULL_BEHAVIOR","prefer-cached","Machine")
[Environment]::SetEnvironmentVariable("ECS_AWSVPC_BLOCK_IMDS","true ","Machine")
[Environment]::SetEnvironmentVariable("ECS_ENABLE_AWSLOGS_EXECUTIONROLE_OVERRIDE","true","Machine")
# init ecs agent
Import-Module ECSTools
Initialize-ECSAgent -EnableTaskIAMRole -EnableTaskENI -LoggingDrivers "['json-file','awslogs']"

Periodically, one of the instances in the ASG fails to attach to the ECS cluster, with the following errors:

2024-11-25T10:07:52Z - [INFO]:ScheduledTask Initialize-ECSHostReboot created.
2024-11-25T10:07:52Z - [INFO]:Configuring ECS Host for Task IAM Roles...
2024-11-25T10:07:52Z - [INFO]:Server Edition: Microsoft Windows Server 2022 Datacenter
2024-11-25T10:07:55Z - [INFO]:Attempt#: 10, Adapters:

2024-11-25T10:07:55Z - [INFO]:VMNetwork adapter 'vEthernet (nat)*' not found
2024-11-25T10:07:55Z - [INFO]:Retrying after sleeping 1sec

This error makes the instance unusable by the cluster, so the ASG launches a new one while the old one is left running, unused.

Expected Behavior

The ECS-Agent reliably connects to the ECS cluster without errors.

Observed Behavior

The ECS-Agent will sometimes fail, and the instance will not be attached to the ECS cluster and will just continue running. Rebooting the instance fixes the issue, and the agent no longer produces the error.

Before the reboot we get:

PS C:\Windows\system32> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
Ethernet 4                Amazon Elastic Network Adapter #2             8 Up           06-D5-5D-A5-67-E1       5.0 Gbps

After the reboot, when it starts to work:

PS C:\Windows\system32> Get-NetAdapter

Name                      InterfaceDescription                    ifIndex Status       MacAddress             LinkSpeed
----                      --------------------                    ------- ------       ----------             ---------
Ethernet 4                Amazon Elastic Network Adapter #2             6 Up           06-4C-57-A5-89-89       5.0 Gbps
vEthernet (nat)           Hyper-V Virtual Ethernet Adapter             12 Up           00-15-5D-03-19-C0        10 Gbps
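
To capture the state shown above automatically, a check along the following lines could run in user data just before `Initialize-ECSAgent`. This is only a sketch under stated assumptions: `Wait-NatAdapter` is a hypothetical helper name, the attempt count and delay are arbitrary, the Docker service is assumed to be registered as `docker`, and the check uses `Get-NetAdapter`, which may not be the same lookup ECSTools performs internally.

```powershell
# Hypothetical helper (not part of ECSTools): poll for the NAT adapter
# that Docker normally creates and report whether it ever appears.
function Wait-NatAdapter {
    param(
        [string]$Name = 'vEthernet (nat)*',   # same pattern as in the log above
        [int]$MaxAttempts = 10,               # arbitrary value for illustration
        [int]$DelaySeconds = 1                # arbitrary value for illustration
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # -Name accepts wildcards; suppress the error raised when nothing matches
        $adapter = Get-NetAdapter -Name $Name -ErrorAction SilentlyContinue
        if ($adapter) { return $true }
        Start-Sleep -Seconds $DelaySeconds
    }
    return $false
}

if (-not (Wait-NatAdapter)) {
    # Record diagnostics before the agent initialization runs.
    # Assumption: the Docker service is named 'docker' on this AMI.
    Write-Output "NAT adapter not found; current adapters and Docker service state:"
    Get-NetAdapter | Format-Table Name, InterfaceDescription, Status
    Get-Service -Name docker -ErrorAction SilentlyContinue | Format-List Name, Status, StartType
}
```

The output of such a check would show whether the Docker service was running at the moment the adapter was missing, which might help narrow down the cause.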

Environment Details:

PS C:\Windows\system32> docker info
Client:
 Version:    25.0.6.m
 Context:    default
 Debug Mode: false

Server:
ERROR: error during connect: in the default daemon configuration on Windows, the docker client must be run with elevated privileges to connect: Get "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.44/info": open //./pipe/docker_engine: The system cannot find the file specified.
errors pretty printing info

PS C:\Windows\system32>  Invoke-WebRequest -Uri http://localhost:51678/v1/metadata -UseBasicParsing


StatusCode        : 200
StatusDescription : OK
Content           : {"Cluster":"x86_64-windows-2022","ContainerInstanceArn":"arn:aws:ecs:eu-west-1:123456789011:container-instance/cluster-x86_64-windows-2022/a4c4329a0392450
                    ba9e659b6b...
RawContent        : HTTP/1.1 200 OK
                    Content-Length: 259
                    Content-Type: application/json
                    Date: Mon, 02 Dec 2024 10:02:18 GMT

                    {"Cluster":"x86_64-windows-2022","ContainerInstanceArn":"arn:aws:ecs:...
Forms             :
Headers           : {[Content-Length, 259], [Content-Type, application/json], [Date, Mon, 02 Dec 2024 10:02:18 GMT]}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        :
RawContentLength  : 259

Supporting Log Snippets

UserScript.ps1.log
output.log
err.log

@EmmanuelTsouris

@RomaricKanyamibwa, question on one of your comments:

finally terminate it by installing EC2Launchv2
EC2 Launch v2 is already installed on the Windows_Server-2022-English-Full-ECS_Optimized AMI, can you please clarify?

@RomaricKanyamibwa
Author

@EmmanuelTsouris, yes, it is indeed installed; however, we reinstall it to ensure that the resulting AMI can be properly used by EC2. If this final step is skipped, the resulting AMI cannot be configured with a new user password, so unless we know the previous password used by the instance, accessing it becomes impossible. Concretely, here is what we do at the end of the AMI build:

# Finalize Image preparation by installing EC2Launchv2
mkdir $env:USERPROFILE\Desktop\EC2Launchv2
$Url = "https://s3.amazonaws.com/amazon-ec2launch-v2/windows/amd64/latest/AmazonEC2Launch.msi"
$DownloadFile = "$env:USERPROFILE\Desktop\EC2Launchv2\" + $(Split-Path -Path $Url -Leaf)

[Net.ServicePointManager]::SecurityProtocol += 'tls12'
Invoke-WebRequest -Uri $Url -OutFile $DownloadFile
Start-Process msiexec -Wait -NoNewWindow -ArgumentList @("/i", "$DownloadFile", "ADDLOCAL=Basic,Clean", "/qn", "/log", "Ec2Launch.log")

# Run The Microsoft System Preparation (Sysprep) tool
Start-Process powershell -Wait -NoNewWindow -ArgumentList @("C:\Progra~1\Amazon\EC2Launch\EC2Launch.exe", "sysprep", "--shutdown=false")

@mcregan23

Hey @RomaricKanyamibwa
When this occurs for you, would you be able to share the EC2Launchv2 logs as well? I don't want to speculate too much but it would be nice to see those as well when this issue occurs. Thanks!

@RomaricKanyamibwa
Author

Hello @mcregan23 ,

Here are the EC2 logs; you can also find them in the issue's description:
UserScript.ps1.log
output.log
err.log
