GlennChia/terraform-agent-observability

Terraform Agent Observability

This repository provides demo code for visualizing Terraform agent telemetry. It includes Terraform code that deploys the agent pool and sets the organization's default settings to use that pool, a docker-compose.yml that launches containers for the agent, OTel Collector, Prometheus, and Grafana, and Terraform code that creates test workspaces to generate telemetry for visualization.

dashboard cover

Read the accompanying Medium blog post or Substack blog post for more details about the integration and additional screenshots.

1. Architecture

architecture diagram

2. Deployment

2.1 Agent pool and organization default settings

Step 1: Configure HCP Terraform credentials. Refer to the tfe_provider authentication docs for the various token options and guidance. For example:

export TFE_TOKEN=example

Step 2: In the tf-agent directory, run an apply, review the plan output, and approve the plan if it looks correct. The apply outputs the commands needed to run the Terraform agent, including the agent token.

Caution

In a live environment, it is not good practice to output the Terraform agent token. This repo outputs the token purely for demo purposes, so that readers can easily pass it to the Terraform agent.

terraform init
terraform apply

2.2 Start the containers

Step 1: In the root directory, run the following commands to start the containers, replacing the agent token with the output from the previous step.

export TFC_AGENT_TOKEN=example
export TFC_AGENT_NAME=demo-agent-pool
docker compose up
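If the variables are not exported, the agent container will most likely start without valid credentials. The following is a minimal sketch of a guarded start. `start_stack` is a hypothetical helper name, not something shipped in this repository, and it starts the stack in detached mode (`-d`) rather than in the foreground as the commands above do.

```shell
# Hypothetical helper: refuse to start the stack unless both agent variables are set.
# The variables must also be exported so that docker compose can read them.
start_stack() {
  for name in TFC_AGENT_TOKEN TFC_AGENT_NAME; do
    # POSIX-compatible indirect lookup of the variable named by $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      return 1
    fi
  done
  # Start the containers in the background, then list their status.
  docker compose up -d
  docker compose ps
}

# Usage: start_stack
```

With a detached stack, `docker compose logs -f` follows the container logs in the same way the foreground `docker compose up` would.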

3. Verify deployment

3.1 Terraform agent pool

Terraform agent pool created with an idle agent

agent pool

Terraform org settings default execution mode shows Agent

org settings general default execution mode

3.2 Prometheus

Visit localhost:9090 and choose Explore metrics

explore metrics

View tfc_agent prefixed metrics

tfc agent metrics
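The same check can be done from the command line through the Prometheus HTTP API, which exposes all metric names at `/api/v1/label/__name__/values`. This is a rough sketch that assumes Prometheus is reachable on localhost:9090 as above; `filter_tfc_agent_metrics` is a hypothetical helper name, and it uses plain `grep` rather than a JSON parser, so treat it as a quick check only.

```shell
# Hypothetical helper: read the JSON response on stdin and print the
# metric names that start with tfc_agent, one per line.
filter_tfc_agent_metrics() {
  grep -o '"tfc_agent[A-Za-z0-9_:]*"' | tr -d '"' | sort -u
}

# Usage (assumes Prometheus on localhost:9090):
#   curl -s http://localhost:9090/api/v1/label/__name__/values | filter_tfc_agent_metrics
```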

3.3 Grafana

Visit localhost:3000 and log in with

  • Username: admin
  • Password: admin

login

Initial Terraform Agent Dashboard. This is configured from Metrics-Dashboard.json

dashboard initial
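Before opening the browser, Grafana's health endpoint (`/api/health`) gives a quick signal that the container is ready. The sketch below assumes Grafana is on localhost:3000 as above; `grafana_healthy` and the `GRAFANA_URL` variable are hypothetical names introduced for illustration.

```shell
# Assumed default; override GRAFANA_URL if Grafana is published elsewhere.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

# Hypothetical helper: succeeds when the health endpoint reports "database": "ok".
grafana_healthy() {
  curl -s "$GRAFANA_URL/api/health" |
    grep -q '"database"[[:space:]]*:[[:space:]]*"ok"'
}

# Usage: grafana_healthy && echo "Grafana is up"
```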

3.4 Jaeger

Visit localhost:16686. There are no traces yet since there are no workspace runs.

jaeger ui initial

4. Testing

4.1 Create workspaces for testing

Step 1: In the tf-test directory, copy terraform.tfvars.example to terraform.tfvars and change the variable values accordingly. The GitHub credentials can be a personal access token. This token needs sufficient permissions to create and delete repositories and to write files to them.

Caution

In a live environment, it is not good practice to pass the GitHub token directly. Instead, sensitive credentials should be stored and accessed securely using a solution like HashiCorp Vault, which provides encrypted storage and access control capabilities.
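The copy in Step 1 can be sketched as a small guard that never overwrites a file you have already edited. `init_tfvars` is a hypothetical helper name, and the default directory `tf-test` is taken from the example file path mentioned above.

```shell
# Hypothetical helper: create terraform.tfvars from the example file,
# leaving an existing terraform.tfvars untouched.
init_tfvars() {
  dir="${1:-tf-test}"
  if [ -e "$dir/terraform.tfvars" ]; then
    echo "$dir/terraform.tfvars already exists, leaving it untouched" >&2
    return 0
  fi
  cp "$dir/terraform.tfvars.example" "$dir/terraform.tfvars"
}

# Usage (from the repository root): init_tfvars tf-test
```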

Step 2: In the tf-test directory, run an apply, review the plan output, and approve the plan if it looks correct.

terraform init
terraform apply

4.2 GitHub repo created for testing

GitHub repository created with simple Terraform resources.

github repo

4.3 First workspace plan

The agent processes workspace runs one at a time. While a run is in progress, the agent status is Busy.

agent busy

Dashboard shows data about the first run

dashboard

4.4 Workspaces applied

4.4.1 HCP TF view

All workspaces are eventually applied

workspaces applied

Agent transitions to Idle

agent idle

4.4.2 Grafana dashboard

Dashboard with metrics across all the workspace runs

dashboard

Zoomed-in views of the individual dashboard sections, starting with job and workspace performance

dashboard1

Resource utilization (Pool-wide)

dashboard2

Runtime metrics (Pool-wide)

dashboard3

Individual agent details

dashboard4

4.4.3 Prometheus metrics

Some metrics are available during runs, for example:

  • tfc_agent_core_profiler_cpu_busy_percent
  • tfc_agent_core_profiler_memory_used_percent

agent metric during run

tfc_agent_core_profiler_memory_used_percent

agent memory

tfc_agent_core_profiler_cpu_busy_percent

agent cpu
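These metrics can also be read without the UI by sending an instant query to the Prometheus HTTP API (`/api/v1/query`). The sketch below assumes Prometheus is on localhost:9090; `prom_query` and the `PROM_URL` variable are hypothetical names introduced for illustration.

```shell
# Assumed default; override PROM_URL if Prometheus is published elsewhere.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: run an instant PromQL query and print the JSON response.
prom_query() {
  # -G sends the URL-encoded data as query-string parameters on a GET request.
  curl -sG --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}

# Usage:
#   prom_query 'tfc_agent_core_profiler_cpu_busy_percent'
#   prom_query 'tfc_agent_core_profiler_memory_used_percent'
```

Because these metrics are available during runs, a query made while the agent is idle may return an empty result.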

4.4.4 Jaeger

The Jaeger UI shows 10 traces. Each workspace produces 2 traces: 1 for the plan and 1 for the apply.

jaeger ui traces

Example of a plan trace

plan overview

This can be drilled down to the span information

plan span

Example of an apply trace

apply overview

This can be drilled down to the span information

apply span

5. Cleanup

Step 1: In the root directory, run docker compose down -v to stop the containers and remove their volumes.

Step 2: In the tf-test directory, run destroy. Review the destroy output before approving.

terraform destroy

Step 3: In the tf-agent directory, run destroy. Review the destroy output before approving.

terraform destroy
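The three cleanup steps can be chained so that a failure stops the sequence early. This is a sketch only: `cleanup_demo` is a hypothetical helper name, it assumes it is run from the repository root, and each `terraform destroy` still prompts for approval.

```shell
# Hypothetical helper: run the cleanup steps in the order listed above,
# stopping at the first command that fails.
cleanup_demo() {
  docker compose down -v &&
    (cd tf-test && terraform destroy) &&
    (cd tf-agent && terraform destroy)
}

# Usage (from the repository root): cleanup_demo
```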
