This Terraform module provisions the necessary services to provide a data product on the Google Cloud Platform:
- Confluent Kafka
- Google BigQuery
- Google Cloud Functions
You need to enable some APIs in your Google Cloud project, e.g. with the gcloud command line tool:

```shell
gcloud services enable <SERVICE_NAME>
```
For the Kafka to BigQuery Connector you need:

- BigQuery API (`bigquery.googleapis.com`)
- Identity and Access Management (IAM) API (`iam.googleapis.com`)
In addition, you need some APIs for the discovery endpoint, which runs on Cloud Functions (a Terraform sketch for enabling all required APIs follows the list):

- Cloud Functions API (`cloudfunctions.googleapis.com`)
- Cloud Run Admin API (`run.googleapis.com`)
- Artifact Registry API (`artifactregistry.googleapis.com`)
- Cloud Build API (`cloudbuild.googleapis.com`)
- Cloud Storage API (`storage.googleapis.com`)
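As an alternative to gcloud, you can enable the APIs with Terraform itself. This is only a sketch using the `google_project_service` resource; the project id and the local value name are placeholders, not part of this module:

```hcl
# Sketch: enable the required APIs with Terraform instead of gcloud.
# <gcp_project_id> is a placeholder for your own project.
locals {
  required_apis = [
    "bigquery.googleapis.com",
    "iam.googleapis.com",
    "cloudfunctions.googleapis.com",
    "run.googleapis.com",
    "artifactregistry.googleapis.com",
    "cloudbuild.googleapis.com",
    "storage.googleapis.com",
  ]
}

resource "google_project_service" "required" {
  for_each = toset(local.required_apis)

  project = "<gcp_project_id>"
  service = each.value

  # keep the APIs enabled when this resource is destroyed
  disable_on_destroy = false
}
```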
You need to use a service account with an IAM role that grants the following permissions (a Terraform sketch for creating such a role follows the list):

```
run.services.getIamPolicy
run.services.setIamPolicy
bigquery.datasets.create
bigquery.datasets.get
bigquery.models.delete
bigquery.routines.delete
bigquery.tables.getIamPolicy
cloudfunctions.functions.create
cloudfunctions.functions.delete
cloudfunctions.functions.get
cloudfunctions.functions.getIamPolicy
cloudfunctions.functions.update
cloudfunctions.operations.get
compute.disks.create
compute.disks.get
compute.globalOperations.get
compute.instances.create
compute.instances.delete
compute.instances.get
compute.instances.setTags
compute.networks.create
compute.networks.delete
compute.networks.get
compute.subnetworks.use
compute.subnetworks.useExternalIp
compute.zoneOperations.get
compute.zones.get
iam.serviceAccountKeys.create
iam.serviceAccountKeys.delete
iam.serviceAccountKeys.get
iam.serviceAccounts.actAs
iam.serviceAccounts.create
iam.serviceAccounts.delete
iam.serviceAccounts.get
storage.buckets.create
storage.buckets.delete
storage.objects.get
storage.objects.list
```
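As a sketch, such a role can be created and bound to a service account with Terraform; the role id, project id, and service account email below are placeholders, and only the first few permissions from the list above are shown:

```hcl
# Sketch: custom role with the permissions listed above, bound to the
# service account that runs this Terraform module.
resource "google_project_iam_custom_role" "data_product_deployer" {
  project = "<gcp_project_id>"
  role_id = "dataProductDeployer"
  title   = "Data Product Deployer"

  permissions = [
    "run.services.getIamPolicy",
    "run.services.setIamPolicy",
    "bigquery.datasets.create",
    # ... add the remaining permissions from the list above
  ]
}

resource "google_project_iam_member" "data_product_deployer" {
  project = "<gcp_project_id>"
  role    = google_project_iam_custom_role.data_product_deployer.id
  member  = "serviceAccount:<service_account_email>"
}
```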
module "kafka_to_bigquery" {
source = "[email protected]:datamesh-architecture/terraform-dataproduct-confluent-kafka-to-gcp-bigquery.git"
domain = "<data_product_domain>"
name = "<data_product_name>"
input = [
{
topic = "<topic_name>"
format = "<topic_format>"
}
]
output = {
data_access = ["<gcp_principal>"]
discovery_access = ["<gcp_principal>"]
tables = [
{
id = "<table_name>" # must be equal to corresponding topic
schema = "<table_schema_path>"
delete_on_destroy = false # set true for development or testing environments
}
]
}
# optional settings for time based partitioning, if needed
output_tables_time_partitioning = {
"stock" = {
type = "<time_partitioning_type>" # DAY, HOUR, MONTH, YEAR
field = "<time_partitioning_field>" # optional, uses consumption time if null
}
}
}
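The module uses the `google` and `confluent` providers, which you configure in the calling configuration. A minimal sketch, assuming the Confluent credentials are passed in as variables (project, region, and variable names are placeholders, not defined by this module):

```hcl
# Sketch: provider configuration in the calling root module.
# Project, region, and variable names are placeholders.
provider "google" {
  project = "<gcp_project_id>"
  region  = "<gcp_region>"
}

provider "confluent" {
  cloud_api_key    = var.confluent_cloud_api_key
  cloud_api_secret = var.confluent_cloud_api_secret
}
```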
Note: You can put all kinds of principals into `data_access` and `discovery_access`.
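For illustration, these lists accept the usual GCP IAM principal formats. The following fragment of the module call above mixes several kinds; all addresses are made-up examples:

```hcl
# Sketch: mixing different kinds of principals (all addresses are fictional).
output = {
  data_access = [
    "user:jane.doe@example.com",
    "group:data-consumers@example.com",
    "serviceAccount:consumer-sa@other-project.iam.gserviceaccount.com",
  ]
  discovery_access = ["domain:example.com"]
  tables           = [] # table definitions omitted for brevity
}
```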
The module creates a RESTful endpoint via Google Cloud Functions (e.g. https://info-xxxxxxxxxx-xx.a.run.app). This endpoint can be used as an input for another data product or to retrieve information about this data product. It returns a JSON document of the following form:
```json
{
  "domain": "<data_product_domain>",
  "name": "<data_product_name>",
  "output": {
    "locations": ["<big_query_table_uri>"]
  }
}
```
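A downstream Terraform configuration could read this document, for example with the `hashicorp/http` data source. This is only a sketch: the URL is a placeholder, and authentication towards the endpoint (access is controlled via `discovery_access`) is not shown:

```hcl
# Sketch: consume the discovery endpoint of this data product elsewhere.
# The URL is a placeholder; authentication is not handled here.
data "http" "discovery" {
  url = "https://info-xxxxxxxxxx-xx.a.run.app"
}

locals {
  discovery = jsondecode(data.http.discovery.response_body)

  # BigQuery table URIs exposed by the upstream data product
  upstream_locations = local.discovery.output.locations
}
```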
Examples of how to use this module can be found in a separate GitHub repository.
This Terraform module is maintained by Stefan Negele, Christine Koppelt, Jochen Christ, and Simon Harrer.
MIT License.
Requirements:

| Name | Version |
|------|---------|
| confluent | >= 1.35 |
| google | >= 4.59.0 |
Providers:

| Name | Version |
|------|---------|
| archive | n/a |
| confluent | >= 1.35 |
| google | >= 4.59.0 |
| local | n/a |
No child modules are used.
Inputs:

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| domain | The domain of the data product | `string` | n/a | yes |
| gcp | project: The GCP project of your data product<br>region: The GCP region where your data product should be located | `object({...})` | n/a | yes |
| input | topic: Name of the Kafka topic which should be processed<br>format: Currently only 'JSON' is supported | `list(object({...}))` | n/a | yes |
| kafka | Information and credentials about/from the Kafka cluster | `object({...})` | n/a | yes |
| name | The name of the data product | `string` | n/a | yes |
| output | dataset_id: The id of the dataset in which your data product will exist<br>dataset_description: A description of the dataset<br>grant_access: List of users with access to the data product<br>discovery_access: List of users with access to the discovery endpoint<br>region: The Google Cloud region in which your data product should be created<br>tables.id: The table_id of your data product, which will be used to create a BigQuery table. Must be equal to the corresponding Kafka topic name.<br>tables.schema: The path to the product's BigQuery schema<br>tables.delete_on_destroy: 'true' if the BigQuery table should be deleted when the Terraform resource gets destroyed. Use with care! | `object({...})` | n/a | yes |
| output_tables_time_partitioning | You can configure time-based partitioning by passing an object which has the table's id as its key.<br>type: Possible values are: DAY, HOUR, MONTH, YEAR<br>field: The field which should be used for partitioning. Falls back to consumption time if null is passed. | `map(object({...}))` | `{}` | no |
Outputs:

| Name | Description |
|------|-------------|
| dataset_id | The id of the Google BigQuery dataset |
| discovery_endpoint | The URI of the generated discovery endpoint |
| project | The Google Cloud project |
| table_ids | The ids of all created Google BigQuery tables |
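For example, the calling configuration can re-export some of these outputs (the output names in this sketch are arbitrary):

```hcl
# Sketch: re-export selected module outputs from the calling configuration.
output "data_product_discovery_endpoint" {
  value = module.kafka_to_bigquery.discovery_endpoint
}

output "data_product_table_ids" {
  value = module.kafka_to_bigquery.table_ids
}
```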