Data Mesh Terraform module "Confluent Kafka to GCP BigQuery"

This Terraform module provisions the necessary services to provide a data product on the Google Cloud Platform.

Overview

Services

  • Confluent Kafka
  • Google BigQuery
  • Google Cloud Functions

Prerequisites

Google Cloud APIs

You need to enable several APIs in your Google Cloud project, e.g. with the gcloud command line tool:

gcloud services enable <SERVICE_NAME>

For the Kafka to BigQuery Connector you need:

  • BigQuery API (bigquery.googleapis.com)
  • Identity and Access Management (IAM) API (iam.googleapis.com)

In addition, you need some APIs for the discovery endpoint, which runs on Cloud Functions:

  • Cloud Functions API (cloudfunctions.googleapis.com)
  • Cloud Run Admin API (run.googleapis.com)
  • Artifact Registry API (artifactregistry.googleapis.com)
  • Cloud Build API (cloudbuild.googleapis.com)
  • Cloud Storage API (storage.googleapis.com)
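
If you manage project services with Terraform anyway, the same APIs can also be enabled there. The following is only a sketch: the service list mirrors the APIs above and <gcp_project_id> is a placeholder for your own project.

# Sketch: enable the required APIs with Terraform instead of gcloud.
locals {
  required_apis = [
    "bigquery.googleapis.com",
    "iam.googleapis.com",
    "cloudfunctions.googleapis.com",
    "run.googleapis.com",
    "artifactregistry.googleapis.com",
    "cloudbuild.googleapis.com",
    "storage.googleapis.com",
  ]
}

resource "google_project_service" "required" {
  for_each = toset(local.required_apis)

  project = "<gcp_project_id>"
  service = each.value

  # keep the APIs enabled even if this resource is destroyed
  disable_on_destroy = false
}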

Google Cloud IAM permissions

You need to use a service account with an IAM role that grants the following permissions (a sketch for creating such a role with Terraform follows the list):

run.services.getIamPolicy
run.services.setIamPolicy
bigquery.datasets.create
bigquery.datasets.get
bigquery.models.delete
bigquery.routines.delete
bigquery.tables.getIamPolicy
cloudfunctions.functions.create
cloudfunctions.functions.delete
cloudfunctions.functions.get
cloudfunctions.functions.getIamPolicy
cloudfunctions.functions.update
cloudfunctions.operations.get
compute.disks.create
compute.disks.get
compute.globalOperations.get
compute.instances.create
compute.instances.delete
compute.instances.get
compute.instances.setTags
compute.networks.create
compute.networks.delete
compute.networks.get
compute.subnetworks.use
compute.subnetworks.useExternalIp
compute.zoneOperations.get
compute.zones.get
iam.serviceAccountKeys.create
iam.serviceAccountKeys.delete
iam.serviceAccountKeys.get
iam.serviceAccounts.actAs
iam.serviceAccounts.create
iam.serviceAccounts.delete
iam.serviceAccounts.get
storage.buckets.create
storage.buckets.delete
storage.objects.get
storage.objects.list
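
One way to provide this is a custom IAM role that carries exactly these permissions, granted to a dedicated service account. The sketch below uses hypothetical names (dataProductDeployer, terraform-data-product) and elides most of the permission list:

# Sketch: custom role with the permissions listed above,
# granted to a dedicated Terraform service account (names are examples).
resource "google_project_iam_custom_role" "data_product_deployer" {
  project = "<gcp_project_id>"
  role_id = "dataProductDeployer"
  title   = "Data Product Deployer"
  permissions = [
    "run.services.getIamPolicy",
    "run.services.setIamPolicy",
    "bigquery.datasets.create",
    # ... remaining permissions from the list above
  ]
}

resource "google_service_account" "terraform_deployer" {
  project    = "<gcp_project_id>"
  account_id = "terraform-data-product"
}

resource "google_project_iam_member" "terraform_deployer" {
  project = "<gcp_project_id>"
  role    = google_project_iam_custom_role.data_product_deployer.id
  member  = "serviceAccount:${google_service_account.terraform_deployer.email}"
}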

Usage

module "kafka_to_bigquery" {
  source = "[email protected]:datamesh-architecture/terraform-dataproduct-confluent-kafka-to-gcp-bigquery.git"
  
  domain = "<data_product_domain>"
  name   = "<data_product_name>"
  input = [
    {
      topic  = "<topic_name>"
      format = "<topic_format>"
    }
  ]

  output = {
    data_access      = ["<gcp_principal>"]
    discovery_access = ["<gcp_principal>"]
    tables = [
      {
        id                = "<table_name>" # must be equal to corresponding topic
        schema            = "<table_schema_path>"
        delete_on_destroy = false # set true for development or testing environments
      }
    ]
  }
  
  # optional settings for time based partitioning, if needed
  output_tables_time_partitioning = {
    "stock" = {
      type  = "<time_partitioning_type>" # DAY, HOUR, MONTH, YEAR
      field = "<time_partitioning_field>" # optional, uses consumption time if null
    }
  }
}

Note: You can put all kinds of principals (e.g. user:<email>, group:<email>, or serviceAccount:<email>) into data_access and discovery_access.

Endpoint data

The module creates a RESTful endpoint via Google Cloud Functions (e.g. https://info-xxxxxxxxxx-xx.a.run.app). This endpoint can be used as an input for another data product or to retrieve information about this data product. It returns a JSON document of the following form:

{
    "domain": "<data_product_domain>",
    "name": "<data_product_name>",
    "output": {
        "locations": ["<big_query_table_uri>"]
    }
}
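
Because the endpoint returns plain JSON, another data product can, for example, read it with Terraform via the hashicorp/http provider. This is only a sketch: the URL is a placeholder, and the caller must be covered by discovery_access, so depending on your setup you may also need to send an identity token with the request.

# Sketch: consume the discovery endpoint of an upstream data product.
data "http" "upstream_data_product" {
  url = "https://info-xxxxxxxxxx-xx.a.run.app"
}

locals {
  upstream_info = jsondecode(data.http.upstream_data_product.response_body)

  # e.g. the BigQuery table URIs published by the upstream data product
  upstream_locations = local.upstream_info.output.locations
}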

Examples

Examples of how to use this module can be found in a separate GitHub repository.

Authors

This Terraform module is maintained by Stefan Negele, Christine Koppelt, Jochen Christ, and Simon Harrer.

License

MIT License.

Requirements

| Name | Version |
|------|---------|
| confluent | >= 1.35 |
| google | >= 4.59.0 |
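
In the calling configuration this typically corresponds to a required_providers block along the following lines (the provider source addresses confluentinc/confluent and hashicorp/google are assumptions, check your registry):

terraform {
  required_providers {
    confluent = {
      source  = "confluentinc/confluent"
      version = ">= 1.35"
    }
    google = {
      source  = "hashicorp/google"
      version = ">= 4.59.0"
    }
  }
}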

Providers

| Name | Version |
|------|---------|
| archive | n/a |
| confluent | >= 1.35 |
| google | >= 4.59.0 |
| local | n/a |

Modules

No modules.

Resources

| Name | Type |
|------|------|
| confluent_api_key.app-consumer-kafka-api-key | resource |
| confluent_api_key.app-producer-kafka-api-key | resource |
| confluent_connector.sink | resource |
| confluent_kafka_acl.app-connector-create-on-dlq-lcc-topics | resource |
| confluent_kafka_acl.app-connector-create-on-error-lcc-topics | resource |
| confluent_kafka_acl.app-connector-create-on-success-lcc-topics | resource |
| confluent_kafka_acl.app-connector-describe-on-cluster | resource |
| confluent_kafka_acl.app-connector-read-on-connect-lcc-group | resource |
| confluent_kafka_acl.app-connector-read-on-target-topic | resource |
| confluent_kafka_acl.app-connector-write-on-dlq-lcc-topics | resource |
| confluent_kafka_acl.app-connector-write-on-error-lcc-topics | resource |
| confluent_kafka_acl.app-connector-write-on-success-lcc-topics | resource |
| confluent_kafka_acl.app-consumer-read-on-group | resource |
| confluent_kafka_acl.app-consumer-read-on-topic | resource |
| confluent_kafka_acl.app-producer-write-on-topic | resource |
| confluent_service_account.app-connector | resource |
| confluent_service_account.app-consumer | resource |
| confluent_service_account.app-producer | resource |
| google_bigquery_dataset.gcp_bigquery_dataset | resource |
| google_bigquery_dataset_iam_binding.kafka_sink_dataset_iam_binding | resource |
| google_bigquery_table.gcp_bigquery_tables | resource |
| google_bigquery_table_iam_binding.table_iam_binding | resource |
| google_cloud_run_service_iam_policy.policy | resource |
| google_cloudfunctions2_function.function | resource |
| google_service_account.kafka_sink_gcp_service_account | resource |
| google_service_account_key.kafka_sink_gcp_service_account_key | resource |
| google_storage_bucket.bucket | resource |
| google_storage_bucket_object.function_source | resource |
| local_file.info_lambda_index_js | resource |
| local_file.info_lambda_package_json | resource |
| archive_file.info_lambda_archive | data source |
| google_iam_policy.allow_invocations | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| domain | The domain of the data product | string | n/a | yes |
| gcp | project: The GCP project of your data product<br>region: The GCP region where your data product should be located | object({ project = string, region = string }) | n/a | yes |
| input | topic: Name of the Kafka topic which should be processed<br>format: Currently only 'JSON' is supported | list(object({ topic = string, format = string })) | n/a | yes |
| kafka | Information and credentials about/from the Kafka cluster | object({ environment = object({ id = string }), cluster = object({ id = string, api_version = string, kind = string, rest_endpoint = string }), credentials = object({ api_key_id = string, api_key_secret = string }) }) | n/a | yes |
| name | The name of the data product | string | n/a | yes |
| output | dataset_id: The id of the dataset in which your data product will exist<br>dataset_description: A description of the dataset<br>data_access: List of principals with access to the data product<br>discovery_access: List of principals with access to the discovery endpoint<br>region: The Google Cloud region in which your data product should be created<br>tables.id: The table_id of your data product, used to create a BigQuery table; must be equal to the corresponding Kafka topic name<br>tables.schema: The path to the product's BigQuery schema<br>tables.delete_on_destroy: true if the BigQuery table should be deleted when the Terraform resource is destroyed. Use with care! | object({ data_access = list(string), discovery_access = list(string), tables = list(object({ id = string, schema = string, delete_on_destroy = bool })) }) | n/a | yes |
| output_tables_time_partitioning | Time-based partitioning, configured per table by passing an object keyed by the table id.<br>type: Possible values are DAY, HOUR, MONTH, YEAR<br>field: The field which should be used for partitioning; falls back to consumption time if null is passed | map(object({ type = string, field = string })) | {} | no |

Outputs

| Name | Description |
|------|-------------|
| dataset_id | The id of the Google BigQuery dataset |
| discovery_endpoint | The URI of the generated discovery endpoint |
| project | The Google Cloud project |
| table_ids | The ids of all created Google BigQuery tables |
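
These outputs can be wired into the rest of your configuration as usual. A minimal sketch, assuming the module label from the usage example above:

# Sketch: expose the discovery endpoint and dataset id of the data product.
output "data_product_discovery_endpoint" {
  value = module.kafka_to_bigquery.discovery_endpoint
}

output "data_product_dataset_id" {
  value = module.kafka_to_bigquery.dataset_id
}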
