A comprehensive Infrastructure as Code (IaC) template for managing Airbyte data pipelines using Terraform. This template provides a scalable, multi-environment data ingestion solution with support for various data sources and destinations.
Contents:

- Overview
 - Architecture
 - Features
 - Prerequisites
 - Quick Start
 - Configuration
 - Data Sources
 - Data Destinations
 - Environment Management
 - Usage Examples
 - Troubleshooting
 - Contributing
 
This template automates the deployment and management of Airbyte data ingestion pipelines using Terraform. It supports multiple data sources, destinations, and environments, making it ideal for organizations looking to implement scalable data integration solutions.
- Infrastructure as Code: Version-controlled, reproducible infrastructure
 - Multi-Environment Support: Separate dev, staging, and production environments
 - Modular Design: Reusable modules for sources, destinations, and connections
 - Scalable: Easily add new data sources and destinations
 - Secure: Proper secret management and authentication
 
```mermaid
graph TB
    subgraph "Data Sources"
        S3C[S3: Comeet Data]
        S3D[S3: Tikal Datalake]
    end
    
    subgraph "Airbyte Platform"
        AB[Airbyte Server]
        WS1[Dev Workspace]
        WS2[Stage Workspace]
        WS3[Prod Workspace]
    end
    
    subgraph "Data Destinations"
        BQ[BigQuery]
    end
    
    subgraph "Infrastructure"
        TF[Terraform]
        K8S[Kubernetes Backend - Playground]
    end
    
    S3C --> AB
    S3D --> AB
    
    AB --> WS1
    AB --> WS2
    AB --> WS3
    
    WS1 --> BQ
    WS2 --> BQ
    WS3 --> BQ
    
    TF --> K8S
    TF --> AB
```

| Component | Purpose | Technology |
|---|---|---|
| Sources | Data ingestion from various systems | S3 (AWS) | 
| Destinations | Data storage and processing | BigQuery | 
| Orchestration | Pipeline management | Airbyte | 
| Infrastructure | Resource provisioning | Terraform + Kubernetes | 
| State Management | Infrastructure state | Kubernetes Secret Backend | 
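
Terraform state is stored in the Kubernetes secret backend via partial backend configuration: the root module presumably declares an empty `backend "kubernetes" {}` block, and the concrete settings are passed at `terraform init` time (see Quick Start). As a sketch only, `backend-config/config.k8s.tfbackend` might look like this; every value below is a placeholder, not the template's real configuration:

```hcl
# backend-config/config.k8s.tfbackend – illustrative values only
secret_suffix  = "cne-airbyte-template"   # state Secrets are named tfstate-<workspace>-<suffix>
namespace      = "terraform-state"        # placeholder namespace for the state Secrets
config_path    = "~/.kube/config"
config_context = "playground"             # the "playground" context from the prerequisites
```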
Data sources:

- AWS S3: Comeet recruiting data (CSV format) and tikal-datalake documents (unstructured format)
 
Data destinations:

- BigQuery: Data warehouse for analytics and reporting
 
Additional features:

- Multi-environment workspace management
 - Kubernetes-based state backend
 - Modular Terraform architecture
 - Automated resource naming and tagging
 - Secret management integration
 
Required tools:

- Terraform >= 1.0
 - kubectl configured with cluster access
 - Access to Airbyte server instance
 - Cloud provider credentials (GCP, AWS)
 
Required infrastructure:

- Kubernetes cluster with the "playground" context configured
 - Airbyte server with API access
 - Google Cloud Platform project with BigQuery API enabled
 - AWS account with S3 access for Comeet and tikal-datalake buckets
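
For orientation, the tool requirements above are typically pinned in the root module roughly as follows. This is a sketch; the template's actual version constraints and provider list may differ:

```hcl
terraform {
  required_version = ">= 1.0"

  # Partial backend configuration; concrete values are supplied at init time via
  # -backend-config=backend-config/config.k8s.tfbackend
  backend "kubernetes" {}

  required_providers {
    airbyte = {
      source = "airbytehq/airbyte"
    }
  }
}
```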
 
Clone the repository:

```bash
git clone <repository-url>
cd cne-airbyte-template
```

Copy the example variables file:

```bash
cp terraform.tfvars.example terraform.tfvars
```

Edit `terraform.tfvars` with your configuration:

```hcl
WORKSPACE_ID                        = "your-airbyte-workspace-id"
USERNAME                            = "your-airbyte-username"
PASSWORD                            = "your-airbyte-password"
SERVER_URL                          = "https://your-airbyte-server.com"
SERVICE_ACCOUNT_INFO                = "your-gcp-service-account-json"
BIGQUERY_PROJECT_ID                 = "your-bigquery-project-id"
# ... additional variables
```

Initialize Terraform with the Kubernetes backend configuration:

```bash
terraform init -backend-config=backend-config/config.k8s.tfbackend
```

Select the Terraform workspace for your environment:

```bash
# For development
terraform workspace select cne-airbyte-template-dev
# For staging
terraform workspace select cne-airbyte-template-stage
# For production
terraform workspace select cne-airbyte-template-prod
```

Plan and apply:

```bash
terraform plan -var-file=terraform.tfvars
terraform apply -var-file=terraform.tfvars
```

The following variables must be set in `terraform.tfvars`:

| Variable | Description | Required |
|---|---|---|
| `WORKSPACE_ID` | Airbyte workspace identifier | ✅ |
| `USERNAME` | Airbyte username | ✅ |
| `PASSWORD` | Airbyte password | ✅ |
| `SERVER_URL` | Airbyte server URL | ✅ |
| `SERVICE_ACCOUNT_INFO` | GCP service account JSON | ✅ |
| `BIGQUERY_PROJECT_ID` | BigQuery project ID | ✅ |
| `AWS_ACCESS_KEY_ID` | AWS access key | ✅ |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key | ✅ |
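
How these variables are consumed is not shown in this README; the sketch below assumes the `airbytehq/airbyte` provider's basic-auth arguments (`server_url`, `username`, `password`) and is illustrative rather than the template's actual code:

```hcl
# variables.tf (sketch) – credentials marked sensitive so they are redacted in plan output
variable "USERNAME" {
  type = string
}

variable "PASSWORD" {
  type      = string
  sensitive = true
}

variable "SERVER_URL" {
  type = string
}

# Provider wiring (sketch)
provider "airbyte" {
  server_url = var.SERVER_URL
  username   = var.USERNAME
  password   = var.PASSWORD
}
```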
The template uses Terraform workspaces to manage environments:

```bash
# Create new workspace
terraform workspace new cne-airbyte-template-dev
# List workspaces
terraform workspace list
# Switch workspace
terraform workspace select cne-airbyte-template-prod
```

Currently configured S3 sources:
- `comeet_all_candidate`: Candidate information from the Comeet recruiting platform, with a detailed CSV schema
- `comeet_all_candidate_steps`: Candidate workflow steps and process data
- `tikal-datalake-dev`: Unstructured documents, including:
  - Employee markdown files from GitLab sync
  - Engineering playbooks in HTML format
 
 
Primary data warehouse with:

- Custom dataset organization using namespace formatting from `source_table_names.json`
- Environment-specific dataset naming
- Standard insert loading method with a 15 MB buffer size
- Located in the `europe-central2` region
- Supports both structured CSV and unstructured document data
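
As an illustration of how the pieces above fit together, a call to the BigQuery destination module could look roughly like this. The input names are assumptions, not the module's actual interface:

```hcl
# Sketch only – the real module inputs will differ
module "bigquery_destination" {
  source = "./destinations/bigquery"

  project_id           = var.BIGQUERY_PROJECT_ID
  dataset_location     = "europe-central2"      # region noted above
  service_account_info = var.SERVICE_ACCOUNT_INFO
  workspace_id         = local.workspace_id
}
```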
 
Each environment has its own:

- Terraform workspace (`cne-airbyte-template-{env}`)
- Airbyte workspace
- BigQuery datasets
- Resource naming conventions (see the sketch after this list)
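
A minimal sketch of how environment-specific names can be derived from the active workspace; the local names and dataset prefix below are illustrative, not taken from the template:

```hcl
locals {
  # "cne-airbyte-template-dev" -> "dev"
  environment = trimprefix(terraform.workspace, "cne-airbyte-template-")

  # Illustrative naming convention; the template's actual locals
  # (e.g. BIGQUERY_NAME_DEV) may be built differently
  bigquery_dataset_name = "analytics_${local.environment}"
}
```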
 
Connections are defined per environment in `locals.{env}.tf`:

```hcl
# Example connection configuration
"comeet_all_candidate → ${local.BIGQUERY_NAME_DEV}" = {
  source_id                            = module.s3_source["comeet_all_candidate"].source_id
  destination_id                       = module.bigquery_destination.destination_id
  status                               = "active"
  non_breaking_schema_updates_behavior = "ignore"
  namespace_definition                 = "custom_format"
  namespace_format                     = local.namespace_formats["sources_comeet"]
  schedule = {
    schedule_type   = "manual"
    cron_expression = ""
  }
  streams = [
    {
      sync_mode = "full_refresh_overwrite"
      name      = "all_candidates"
      selected  = true
    }
  ]
}
```
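
The connection above is triggered manually. If a periodic sync is wanted instead, the same `schedule` block accepts a cron type; Airbyte uses Quartz-style cron expressions, and the expression below is only an example:

```hcl
# Inside the connection definition above, replacing the manual schedule
schedule = {
  schedule_type   = "cron"
  cron_expression = "0 0 2 * * ? *"   # every day at 02:00 (Quartz syntax)
}
```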
To add a new data source:

- Add the configuration to `locals.tf`:

```hcl
"new_s3_source" = {
  configuration = {
    aws_access_key_id     = var.AWS_ACCESS_KEY_ID
    aws_secret_access_key = var.AWS_SECRET_ACCESS_KEY
    bucket                = "your-s3-bucket"
    streams = [
      {
        name                            = "your_data"
        days_to_sync_if_history_is_full = 3
        schemaless                      = false
        globs                           = ["path/to/your/files/*.csv"]
        input_schema                    = "{\"field1\": \"string\", \"field2\": \"number\"}"
        validation_policy               = "Emit Record"
        format = {
          "csv_format" = {
            # CSV format configuration
          }
        }
      }
    ]
  }
  workspace_id = local.workspace_id
}
```

- Add the connection in `locals.dev.tf` (and the other environments):

```hcl
"new_s3_source → ${local.BIGQUERY_NAME_DEV}" = {
  source_id      = module.s3_source["new_s3_source"].source_id
  destination_id = module.bigquery_destination.destination_id
  # ... other configuration
}
```

- Add the namespace format in `source_table_names.json`:

```json
{
  "sources_new_s3": "sources_new_s3"
}
```
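
For context, `local.namespace_formats` (referenced in the connection configuration above) can be populated directly from this file. One possible wiring, shown as a sketch rather than the template's actual code:

```hcl
# Sketch: load namespace formats from source_table_names.json
locals {
  namespace_formats = jsondecode(file("${path.module}/source_table_names.json"))
}
```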
To deploy individual components:

```bash
# Deploy only BigQuery destination
terraform apply -target=module.bigquery_destination
# Deploy specific S3 source
terraform apply -target='module.s3_source["comeet_all_candidate"]'
# Deploy all connections
terraform apply -target=module.connections
```

To inspect or adjust the Terraform state:

```bash
# View state
terraform state list
# Import existing resource
terraform import module.bigquery_destination.airbyte_destination_bigquery.destination <resource-id>
# Remove resource from state
terraform state rm module.old_source.airbyte_source_s3.source
```

Common issues and their fixes:

```
Error: failed to create Airbyte client: authentication failed
```

Solution: Verify `USERNAME`, `PASSWORD`, and `SERVER_URL` in `terraform.tfvars`.

```
Error: workspace "cne-airbyte-template-dev" does not exist
```

Solution: Create the workspace first:

```bash
terraform workspace new cne-airbyte-template-dev
```

```
Error: googleapi: Error 403: Access Denied
```

Solution: Ensure the service account has the BigQuery Data Editor and BigQuery Job User roles.
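
If the GCP project is also managed with Terraform, the roles can be granted via the Google provider. This is a sketch under that assumption, deriving the service-account email from the `SERVICE_ACCOUNT_INFO` JSON; it requires a configured `google` provider:

```hcl
locals {
  # Service account JSON key files carry the account email in "client_email"
  airbyte_sa_email = jsondecode(var.SERVICE_ACCOUNT_INFO)["client_email"]
}

resource "google_project_iam_member" "bigquery_data_editor" {
  project = var.BIGQUERY_PROJECT_ID
  role    = "roles/bigquery.dataEditor"
  member  = "serviceAccount:${local.airbyte_sa_email}"
}

resource "google_project_iam_member" "bigquery_job_user" {
  project = var.BIGQUERY_PROJECT_ID
  role    = "roles/bigquery.jobUser"
  member  = "serviceAccount:${local.airbyte_sa_email}"
}
```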

```
Error: state lock acquired by another process
```

Solution: Force unlock (use with caution):

```bash
terraform force-unlock <lock-id>
```

Enable detailed logging:

```bash
export TF_LOG=DEBUG
terraform plan -var-file=terraform.tfvars
```

Validate the configuration before applying:

```bash
terraform validate
terraform fmt -check
```

To contribute:

- Fork the repository
- Create a feature branch: `git checkout -b feature/new-source`
- Make changes: add new modules or modify existing ones
- Test changes: validate with `terraform plan`
- Submit a pull request
 
Development guidelines:

- Use consistent naming conventions
 - Add appropriate comments and documentation
 - Follow Terraform best practices
 - Test in development environment first
 
When adding new source or destination modules:

- Create the module directory: `sources/new-source/` or `destinations/new-destination/`
- Add `main.tf`, `variables.tf`, and `outputs.tf`
- Update the root `main.tf` to include the new module (see the sketch after this list)
- Add environment-specific configurations in `locals.{env}.tf`
- Update documentation
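
A sketch of wiring a new module into the root `main.tf`; the module path and inputs are placeholders:

```hcl
# Sketch only – the real module interface will differ
module "new_destination" {
  source       = "./destinations/new-destination"
  workspace_id = local.workspace_id
  # ...destination-specific inputs
}
```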
 
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Create an issue in this repository
 - Check the CLAUDE.md file for technical details
 - Refer to Airbyte documentation
 - Consult Terraform documentation
 
Happy Data Engineering! 🎉