Skip to content

Latest commit

 

History

History
198 lines (165 loc) · 7.79 KB

File metadata and controls

198 lines (165 loc) · 7.79 KB
subcategory
Unity Catalog

databricks_sql_table (Resource)

Within a metastore, Unity Catalog provides a 3-level namespace for organizing data: Catalogs, databases (also called schemas), and tables/views.

A databricks_sql_table is contained within databricks_schema, and can represent either a managed table, an external table, or a view.

This resource creates and updates the Unity Catalog table/view by executing the necessary SQL queries on a special auto-terminating cluster it would create for this operation. You could also specify a SQL warehouse or cluster for the queries to be executed on.

~> This resource doesn't handle complex cases of schema evolution due to the limitations of Terraform itself. If you need to implement schema evolution it's recommended to use specialized tools, such as, Luquibase and Flyway.

Example Usage

resource "databricks_catalog" "sandbox" {
  name    = "sandbox"
  comment = "this catalog is managed by terraform"
  properties = {
    purpose = "testing"
  }
}

resource "databricks_schema" "things" {
  catalog_name = databricks_catalog.sandbox.id
  name         = "things"
  comment      = "this database is managed by terraform"
  properties = {
    kind = "various"
  }
}

resource "databricks_sql_table" "thing" {
  provider           = databricks.workspace
  name               = "quickstart_table"
  catalog_name       = databricks_catalog.sandbox.name
  schema_name        = databricks_schema.things.name
  table_type         = "MANAGED"
  data_source_format = "DELTA"
  storage_location   = ""

  column {
    name = "id"
    type = "int"
  }
  column {
    name    = "name"
    type    = "string"
    comment = "name of thing"
  }
  comment = "this table is managed by terraform"
}

resource "databricks_sql_table" "thing_view" {
  provider     = databricks.workspace
  name         = "quickstart_table_view"
  catalog_name = databricks_catalog.sandbox.name
  schema_name  = databricks_schema.things.name
  table_type   = "VIEW"
  cluster_id   = "0423-201305-xsrt82qn"

  view_definition = format("SELECT name FROM %s WHERE id == 1", databricks_sql_table.thing.id)

  comment = "this view is managed by terraform"
}

Use an existing warehouse to create a table

resource "databricks_sql_endpoint" "this" {
  name             = "endpoint"
  cluster_size     = "2X-Small"
  max_num_clusters = 1
}

resource "databricks_sql_table" "thing" {
  provider           = databricks.workspace
  name               = "quickstart_table"
  catalog_name       = databricks_catalog.sandbox.name
  schema_name        = databricks_schema.things.name
  table_type         = "MANAGED"
  data_source_format = "DELTA"
  storage_location   = ""
  warehouse_id       = databricks_sql_endpoint.this.id

  column {
    name = "id"
    type = "int"
  }
  column {
    name    = "name"
    type    = "string"
    comment = "name of thing"
  }
  comment = "this table is managed by terraform"
}

resource "databricks_sql_table" "thing_view" {
  provider     = databricks.workspace
  name         = "quickstart_table_view"
  catalog_name = databricks_catalog.sandbox.name
  schema_name  = databricks_schema.things.name
  table_type   = "VIEW"
  warehouse_id = databricks_sql_endpoint.this.id

  view_definition = format("SELECT name FROM %s WHERE id == 1", databricks_sql_table.thing.id)

  comment = "this view is managed by terraform"
}

Use an Identity Column

resource "databricks_catalog" "sandbox" {
  name    = "sandbox"
  comment = "this catalog is managed by terraform"
  properties = {
    purpose = "testing"
  }
}
resource "databricks_schema" "things" {
  catalog_name = databricks_catalog.sandbox.id
  name         = "things"
  comment      = "this database is managed by terraform"
  properties = {
    kind = "various"
  }
}
resource "databricks_sql_table" "thing" {
  provider           = databricks.workspace
  name               = "quickstart_table"
  catalog_name       = databricks_catalog.sandbox.name
  schema_name        = databricks_schema.things.name
  table_type         = "MANAGED"
  data_source_format = "DELTA"
  storage_location   = ""
  column {
    name     = "id"
    type     = "bigint"
    identity = "default"
  }
  column {
    name    = "name"
    type    = "string"
    comment = "name of thing"
  }
  comment = "this table is managed by terraform"
}

Argument Reference

The following arguments are supported:

  • name - Name of table relative to parent catalog and schema. Change forces the creation of a new resource.
  • catalog_name - Name of parent catalog. Change forces the creation of a new resource.
  • schema_name - Name of parent Schema relative to parent Catalog. Change forces the creation of a new resource.
  • table_type - Distinguishes a view vs. managed/external Table. MANAGED, EXTERNAL, or VIEW. Change forces the creation of a new resource.
  • storage_location - (Optional) URL of storage location for Table data (required for EXTERNAL Tables). Not supported for VIEW or MANAGED table_type.
  • data_source_format - (Optional) External tables are supported in multiple data source formats. The string constants identifying these formats are DELTA, CSV, JSON, AVRO, PARQUET, ORC, and TEXT. Change forces the creation of a new resource. Not supported for MANAGED tables or VIEW.
  • view_definition - (Optional) SQL text defining the view (for table_type == "VIEW"). Not supported for MANAGED or EXTERNAL table_type.
  • cluster_id - (Optional) All table CRUD operations must be executed on a running cluster or SQL warehouse. If a cluster_id is specified, it will be used to execute SQL commands to manage this table. If empty, a cluster will be created automatically with the name terraform-sql-table.
  • warehouse_id - (Optional) All table CRUD operations must be executed on a running cluster or SQL warehouse. If a warehouse_id is specified, that SQL warehouse will be used to execute SQL commands to manage this table. Conflicts with cluster_id.
  • cluster_keys - (Optional) a subset of columns to liquid cluster the table by. Conflicts with partitions.
  • storage_credential_name - (Optional) For EXTERNAL Tables only: the name of storage credential to use. Change forces the creation of a new resource.
  • owner - (Optional) User name/group name/sp application_id of the schema owner.
  • comment - (Optional) User-supplied free-form text. Changing the comment is not currently supported on the VIEW table type.
  • options - (Optional) Map of user defined table options. Change forces creation of a new resource.
  • properties - (Optional) A map of table properties.
  • partitions - (Optional) a subset of columns to partition the table by. Change forces the creation of a new resource. Conflicts with cluster_keys. Change forces creation of a new resource.

column configuration block

For table columns Currently, changing the column definitions for a table will require dropping and re-creating the table

  • name - User-visible name of column
  • type - Column type spec (with metadata) as SQL text. Not supported for VIEW table_type.
  • identity - (Optional) Whether the field is an identity column. Can be default, always, or unset. It is unset by default.
  • comment - (Optional) User-supplied free-form text.
  • nullable - (Optional) Whether field is nullable (Default: true)

Attribute Reference

In addition to all the arguments above, the following attributes are exported:

  • id - ID of this table in the form of <catalog_name>.<schema_name>.<name>.

Import

This resource can be imported by its full name:

terraform import databricks_sql_table.this <catalog_name>.<schema_name>.<name>