Skip to content

CUMULUS-4707, CUMULUS-4708#4406

Merged
ygliuvt merged 17 commits intomasterfrom
cumulus-4707-1
Apr 21, 2026
Merged

CUMULUS-4707, CUMULUS-4708#4406
ygliuvt merged 17 commits intomasterfrom
cumulus-4707-1

Conversation

@ygliuvt
Copy link
Copy Markdown
Member

@ygliuvt ygliuvt commented Apr 16, 2026

Summary: Summary of changes

Addresses CUMULUS-4707: Implement list of granules route in iceberg search api and CUMULUS-4708: Implement list of executions route in iceberg search api

Changes

  • Added scaffolding from iceberg API to backend Glue database in AWS.
    This can be tested locally:
    cd packages/api

NODE_ENV=development
AWS_ACCOUNT_ID=<sandbox_account_id>
api_config_secret_id=<api_config_secret_arn>
dynamoTableNameString='{"AccessTokensTable":"<access_token_table>"}'
ICEBERG_GLUE_SCHEMA=<test_glue_schema>
AWS_REGION=us-east-1
PORT=5001
node app/iceberg-index.js

Then, test with:
curl -i -H "Authorization: Bearer $token" "http://localhost:5001/granules"
curl -i -H "Authorization: Bearer $token" "http://localhost:5001/executions"

Also can be tested using the docker image that is built locally (see instruction at packages/api/app/README.md).

Test the terraform change by deploying the cumulus stack with configuration:

deploy_iceberg_api = true
iceberg_s3_bucket = "yliu-sandbox-test-iceberg"
iceberg_glue_schema = "yliu_test"

Or simply inspect and test the cumulus stack that is deployed in Sandbox by CICD for this PR:

export AWS_PROFILE=cumulus-sandbox
export PREFIX=yliu10-ci-tf

### tunnel to EC2 instance, your tunnel command might be different depending on your local setup
export EC2HOSTID=$(aws ec2 describe-instances --filters Name=tag:Deployment,Values="$PREFIX" Name=instance-state-name,Values=running | jq -r ".Reservations[0].Instances[].InstanceId")
ssh -I /usr/lib/ssh-keychain.dylib yliu10@$EC2HOSTID

### acquire cumulus token from API Gateway
curl -Ln -bj --socks5-hostname localhost:8080 "https://<yliu10-ci-tf-archive_apigateway id>.execute-api.us-east-1.amazonaws.com/dev/token" --insecure

### search for granules using iceberg api
export token=<the output from the above curl token result>
curl -i -H "Authorization: Bearer $token" --socks5-hostname localhost:8080 "https://<yliu10-ci-tf_loadbalancer_dns>/granules" --insecure 

Because of the Sandbox Postgres db currently has no granules and some executions, you will see empty result for granules search and non-empty results for executions search.

PR Checklist

  • Update CHANGELOG
  • Unit tests
  • Ad-hoc testing - Deploy changes and test manually
  • Integration tests

📝 Note:
For most pull requests, please Squash and merge to maintain a clean and readable commit history.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds ECS-based “Iceberg API” scaffolding to query Iceberg tables via DuckDB + AWS Glue, including Terraform wiring (task role + env vars) and updating the Iceberg read-only endpoints to use the new DuckDB-backed S3 search implementations.

Changes:

  • Add Terraform inputs/role policy/env vars needed to run the Iceberg API in ECS against Glue/Iceberg.
  • Replace Iceberg API endpoints to use @cumulus/db/duckdb search classes and a shared DuckDB connection pool.
  • Introduce a new @cumulus/db/duckdb subpath export and update builds (webpack/Docker) to avoid bundling native DuckDB in Lambda.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
tf-modules/iceberg_api/variables.tf Adds new Iceberg/Glue inputs; adjusts ECS task sizing defaults
tf-modules/iceberg_api/main.tf Injects Glue env vars and switches task role to module-managed IAM role
tf-modules/iceberg_api/iam.tf Adds ECS task role + permissions for S3/Dynamo/Secrets/Glue
tf-modules/iceberg_api/README.md Updates module docs (but currently incomplete/outdated)
tf-modules/cumulus/variables.tf Adds top-level Cumulus variables for Iceberg bucket/schema
tf-modules/cumulus/iceberg_api.tf Wires new variables into the iceberg_api module
packages/db/tests/test-iceberg-connection.js Removes old Knex-based Iceberg connection tests
packages/db/src/s3search/DuckDBSearchExecutor.ts Adds debug logging of executed SQL
packages/db/src/index.ts Removes Iceberg/duckdb-related exports from default entrypoint
packages/db/src/iceberg-connection.ts Replaces Knex singleton with DuckDB instance + connection pool + Glue attach/views
packages/db/src/duckdb.ts Adds dedicated DuckDB/Iceberg entrypoint exports
packages/db/package.json Adds package exports map including ./duckdb subpath
packages/api/webpack.config.js Excludes Iceberg-specific files from Lambda bundling; externalizes DuckDB deps
packages/api/endpoints/iceberg-rules.js Switches rules list endpoint to DuckDB-backed S3 search
packages/api/endpoints/iceberg-reconciliation-reports.js Switches reconciliation-reports list endpoint to DuckDB-backed S3 search
packages/api/endpoints/iceberg-providers.js Switches providers list endpoint to DuckDB-backed S3 search
packages/api/endpoints/iceberg-pdrs.js Switches PDRs list endpoint to DuckDB-backed S3 search
packages/api/endpoints/iceberg-granules.js Switches granules list endpoint to DuckDB-backed S3 search (keeps ORCA status option)
packages/api/endpoints/iceberg-executions.js Switches executions list endpoint to DuckDB-backed S3 search
packages/api/endpoints/iceberg-collections.js Switches collections list endpoint to DuckDB-backed S3 search
packages/api/endpoints/iceberg-async-operations.js Switches async-operations list endpoint to DuckDB-backed S3 search
packages/api/app/iceberg-index.js Updates Iceberg API server startup/shutdown for DuckDB mode
packages/api/app/iceberg-db.js Updates DB init/shutdown helpers to call DuckDB lifecycle functions
packages/api/app/env.local.example Adds local env template for running Iceberg API
packages/api/app/README.md Updates docs to describe Lambda vs ECS Iceberg API modes and env vars
packages/api/app/Dockerfile Switches base image and installs DuckDB extensions into shared extension dir
packages/api/.gitignore Ignores .env.* files under packages/api
example/deployments/cumulus/yliu10-ci-tf.tfvars Adds example Iceberg bucket/schema vars
example/cumulus-tf/variables.tf Adds example variables for Iceberg bucket/schema
example/cumulus-tf/main.tf Passes new Iceberg vars into the cumulus module
Comments suppressed due to low confidence (1)

tf-modules/iceberg_api/README.md:73

  • tf-modules/iceberg_api/README.md Inputs table is now out of date: it still lists iceberg_api_cpu/memory defaults as 256/512 (but variables.tf defaults changed), and it does not document the newly required inputs (iceberg_s3_bucket, aws_account_id, iceberg_glue_schema) nor the new optional permissions_boundary_arn. Update the Usage snippet and Inputs table so module consumers know which variables are required and what the real defaults are.
| tags | Tags to apply to resources | `map(string)` | `{}` | no |
| oauth_provider | OAuth provider | `string` | n/a | yes |
| api_config_secret_arn | ARN of the API config secret | `string` | n/a | yes |
| iceberg_api_cpu | CPU allocation for Iceberg API ECS task | `number` | `256` | no |
| iceberg_api_memory | Memory allocation for Iceberg API ECS task | `number` | `512` | no |
| cumulus_iceberg_api_image_version | Version of the Cumulus Iceberg API image | `string` | n/a | yes |
| ecs_execution_role_arn | ARN of the ECS execution role | `string` | n/a | yes |
| ecs_cluster_arn | ARN of the ECS cluster | `string` | n/a | yes |
| ecs_cluster_name | Name of the ECS cluster | `string` | n/a | yes |
| ecs_cluster_instance_subnet_ids | Subnet IDs for ECS cluster instances | `list(string)` | n/a | yes |
| rds_security_group_id | ID of the RDS security group | `string` | n/a | yes |
| api_service_autoscaling_min_capacity | Minimum capacity for API service autoscaling | `number` | `1` | no |
| api_service_autoscaling_max_capacity | Maximum capacity for API service autoscaling | `number` | `10` | no |
| api_service_autoscaling_target_cpu | Target CPU utilization for API service autoscaling | `number` | `70` | no |

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/db/src/iceberg-connection.ts Outdated
Comment thread packages/db/src/iceberg-connection.ts
Comment thread tf-modules/iceberg_api/iam.tf Outdated
Comment thread tf-modules/cumulus/variables.tf Outdated
Comment thread packages/api/app/env.local.example Outdated
Comment thread packages/api/app/iceberg-index.js
Comment thread packages/api/endpoints/iceberg-executions.js Outdated
Comment thread packages/api/endpoints/iceberg-collections.js Outdated
Comment thread packages/api/endpoints/iceberg-providers.js Outdated
Comment thread packages/api/endpoints/iceberg-async-operations.js Outdated
Comment thread packages/api/app/env.local.example
Copy link
Copy Markdown

@chris-durbin chris-durbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some questions about the Terraform resources - I want to make sure that we're referring to the same set of resources and not duplicating things for the Iceberg API resources and the Iceberg replication resources.

Comment thread packages/api/app/README.md Outdated
Comment thread packages/api/endpoints/iceberg-async-operations.js Outdated
Comment thread packages/db/src/iceberg-search/DuckDBSearchExecutor.ts
Comment thread packages/db/src/iceberg-connection.ts Outdated
Comment thread packages/db/src/iceberg-connection.ts
default = 70
}

variable "iceberg_s3_bucket" {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming @indiejames added these variables already.

Comment thread tf-modules/iceberg_api/iam.tf Outdated
Comment thread tf-modules/iceberg_api/iam.tf Outdated
Comment thread tf-modules/iceberg_api/main.tf Outdated
Comment thread tf-modules/iceberg_api/variables.tf Outdated
Comment thread packages/api/app/README.md Outdated
@ygliuvt ygliuvt merged commit 4db5579 into master Apr 21, 2026
9 checks passed
@ygliuvt ygliuvt deleted the cumulus-4707-1 branch April 21, 2026 14:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants