Iceberg output plugin for Embulk

embulk-output-iceberg is the Embulk output plugin for Apache Iceberg.

Overview

Required Embulk version: 0.11.5 or later.
Required Java version: 11 or later. The Iceberg API requires Java 11 or later, even though Embulk officially supports Java 8.

Plugin type: output
Resume supported: no
Cleanup supported: no
Guess supported: no

Configuration

Currently, only the REST catalog (with MinIO storage), the Glue catalog, and the JDBC catalog are supported.

Embulk Configuration

catalog_name: Catalog name. For a JDBC catalog, set the correct catalog name. (string, optional)
namespace: Catalog namespace name. For a Glue catalog, set the Glue database name. (string, required)
table: Catalog table name. (string, required)
catalog_type: Catalog type. "REST", "JDBC", and "GLUE" are available. (string, required)
uri: Catalog URI. For "REST", use an HTTP URI. For "JDBC", use a JDBC driver URI. (string, required)
warehouse_location: Warehouse location where data is saved. For S3, use an S3 URI such as "s3://iceberg/". (string, required)
file_io_impl: FileIO implementation class. (string, required)
endpoint: Object storage endpoint. If path_style_access is true, the actual path looks like "http://localhost:9000/warehouse/". (string, required)
path_style_access: Whether to use path-style URLs. (string, required)
jdbc_driver_path: Path to the JDBC driver JAR file. (string, optional)
jdbc_driver_class_name: JDBC driver class name. (string, optional)
jdbc_user: JDBC database user name. (string, optional)
jdbc_pass: JDBC database password. (string, optional)
file_format: Iceberg data file format. "PARQUET", "ORC", and "AVRO" are available. (string, optional, default "PARQUET")
file_size: Iceberg data file size in bytes, possibly measured before compression. (long, optional, default 134217728 (128 MiB))
mode: "APPEND" and "DELETE_APPEND" are available. "APPEND" only adds data. "DELETE_APPEND" deletes all existing data first and then adds data. (string, optional, default "APPEND")

Environment

Object storage is normally accessed through org.apache.iceberg.aws.s3.S3FileIO.
Set the following environment variables to access object storage:

AWS_REGION
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY

Example

The examples are written in Maven style. RubyGems style is also available.

Only append data.

out:
  type:
    source: maven
    group: io.github.shin1103
    name: iceberg
    version: 0.2.0
  catalog_name: "pg-iceberg"
  namespace: "taxi"
  table: "taxi_dataset_copy"
  catalog_type: "jdbc"
  jdbc_driver_path: "C:\\lib\\.my_local\\postgresql-42.7.5.jar"
  jdbc_driver_class_name: "org.postgresql.Driver"
  jdbc_user: "user"
  jdbc_pass: "password"
  uri: "jdbc:postgresql://localhost:5432/postgres"
  warehouse_location: "s3://iceberg/"
  file_io_impl: "org.apache.iceberg.aws.s3.S3FileIO"
  endpoint: "http://localhost:9000/"
  path_style_access: "true"
  mode: "APPEND"

Delete all data and append data.

out:
  type:
    source: maven
    group: io.github.shin1103
    name: iceberg
    version: 0.2.0
  catalog_name: "pg-iceberg"
  namespace: "taxi"
  table: "taxi_dataset_copy"
  catalog_type: "jdbc"
  jdbc_driver_path: "C:\\lib\\.my_local\\postgresql-42.7.5.jar"
  jdbc_driver_class_name: "org.postgresql.Driver"
  jdbc_user: "user"
  jdbc_pass: "password"
  uri: "jdbc:postgresql://localhost:5432/postgres"
  warehouse_location: "s3://iceberg/"
  file_io_impl: "org.apache.iceberg.aws.s3.S3FileIO"
  endpoint: "http://localhost:9000/"
  path_style_access: "true"
  mode: "DELETE_APPEND"

Change the file format and file size. Multiple filters are also available.

out:
  type:
    source: maven
    group: io.github.shin1103
    name: iceberg
    version: 0.2.0
  catalog_name: "pg-iceberg"
  namespace: "taxi"
  table: "taxi_dataset_copy"
  catalog_type: "jdbc"
  jdbc_driver_path: "C:\\lib\\.my_local\\postgresql-42.7.5.jar"
  jdbc_driver_class_name: "org.postgresql.Driver"
  jdbc_user: "user"
  jdbc_pass: "password"
  uri: "jdbc:postgresql://localhost:5432/postgres"
  warehouse_location: "s3://iceberg/"
  file_io_impl: "org.apache.iceberg.aws.s3.S3FileIO"
  endpoint: "http://localhost:9000/"
  path_style_access: "true"
  file_format: "ORC"
  file_size: 2097152
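
REST catalog with MinIO.

The REST catalog is listed as supported, but the examples above all use JDBC. The following is a minimal sketch of what a REST configuration might look like, built only from the options documented in the Configuration section. The REST catalog URI ("http://localhost:8181/") and the warehouse bucket ("s3://warehouse/") are assumed placeholder values, not values confirmed by the plugin, so adjust them to your environment.

```yaml
out:
  type:
    source: maven
    group: io.github.shin1103
    name: iceberg
    version: 0.2.0
  # Namespace and table reuse the names from the JDBC examples.
  namespace: "taxi"
  table: "taxi_dataset_copy"
  catalog_type: "REST"
  # Hypothetical REST catalog address (HTTP URI scheme); replace with your own.
  uri: "http://localhost:8181/"
  # Hypothetical bucket name; replace with your own.
  warehouse_location: "s3://warehouse/"
  file_io_impl: "org.apache.iceberg.aws.s3.S3FileIO"
  # MinIO endpoint, accessed with path-style URLs.
  endpoint: "http://localhost:9000/"
  path_style_access: "true"
  mode: "APPEND"
```

Because this sketch uses S3FileIO, AWS_REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY still need to be set before running Embulk. The JDBC-specific options are omitted because they are optional.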

Types

Iceberg and Embulk have different type systems, so some Iceberg types are not supported.

Unsupported Iceberg Types

FIXED
BINARY
STRUCT
LIST
MAP
VARIANT
UNKNOWN
TIMESTAMP_NS
TIMESTAMPTZ_NS
GEOMETRY
