Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds ECS-based “Iceberg API” scaffolding to query Iceberg tables via DuckDB + AWS Glue, including Terraform wiring (task role + env vars) and updating the Iceberg read-only endpoints to use the new DuckDB-backed S3 search implementations.
Changes:
- Add Terraform inputs/role policy/env vars needed to run the Iceberg API in ECS against Glue/Iceberg.
- Replace Iceberg API endpoints to use
@cumulus/db/duckdbsearch classes and a shared DuckDB connection pool. - Introduce a new
@cumulus/db/duckdbsubpath export and update builds (webpack/Docker) to avoid bundling native DuckDB in Lambda.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
| tf-modules/iceberg_api/variables.tf | Adds new Iceberg/Glue inputs; adjusts ECS task sizing defaults |
| tf-modules/iceberg_api/main.tf | Injects Glue env vars and switches task role to module-managed IAM role |
| tf-modules/iceberg_api/iam.tf | Adds ECS task role + permissions for S3/Dynamo/Secrets/Glue |
| tf-modules/iceberg_api/README.md | Updates module docs (but currently incomplete/outdated) |
| tf-modules/cumulus/variables.tf | Adds top-level Cumulus variables for Iceberg bucket/schema |
| tf-modules/cumulus/iceberg_api.tf | Wires new variables into the iceberg_api module |
| packages/db/tests/test-iceberg-connection.js | Removes old Knex-based Iceberg connection tests |
| packages/db/src/s3search/DuckDBSearchExecutor.ts | Adds debug logging of executed SQL |
| packages/db/src/index.ts | Removes Iceberg/duckdb-related exports from default entrypoint |
| packages/db/src/iceberg-connection.ts | Replaces Knex singleton with DuckDB instance + connection pool + Glue attach/views |
| packages/db/src/duckdb.ts | Adds dedicated DuckDB/Iceberg entrypoint exports |
| packages/db/package.json | Adds package exports map including ./duckdb subpath |
| packages/api/webpack.config.js | Excludes Iceberg-specific files from Lambda bundling; externalizes DuckDB deps |
| packages/api/endpoints/iceberg-rules.js | Switches rules list endpoint to DuckDB-backed S3 search |
| packages/api/endpoints/iceberg-reconciliation-reports.js | Switches reconciliation-reports list endpoint to DuckDB-backed S3 search |
| packages/api/endpoints/iceberg-providers.js | Switches providers list endpoint to DuckDB-backed S3 search |
| packages/api/endpoints/iceberg-pdrs.js | Switches PDRs list endpoint to DuckDB-backed S3 search |
| packages/api/endpoints/iceberg-granules.js | Switches granules list endpoint to DuckDB-backed S3 search (keeps ORCA status option) |
| packages/api/endpoints/iceberg-executions.js | Switches executions list endpoint to DuckDB-backed S3 search |
| packages/api/endpoints/iceberg-collections.js | Switches collections list endpoint to DuckDB-backed S3 search |
| packages/api/endpoints/iceberg-async-operations.js | Switches async-operations list endpoint to DuckDB-backed S3 search |
| packages/api/app/iceberg-index.js | Updates Iceberg API server startup/shutdown for DuckDB mode |
| packages/api/app/iceberg-db.js | Updates DB init/shutdown helpers to call DuckDB lifecycle functions |
| packages/api/app/env.local.example | Adds local env template for running Iceberg API |
| packages/api/app/README.md | Updates docs to describe Lambda vs ECS Iceberg API modes and env vars |
| packages/api/app/Dockerfile | Switches base image and installs DuckDB extensions into shared extension dir |
| packages/api/.gitignore | Ignores .env.* files under packages/api |
| example/deployments/cumulus/yliu10-ci-tf.tfvars | Adds example Iceberg bucket/schema vars |
| example/cumulus-tf/variables.tf | Adds example variables for Iceberg bucket/schema |
| example/cumulus-tf/main.tf | Passes new Iceberg vars into the cumulus module |
Comments suppressed due to low confidence (1)
tf-modules/iceberg_api/README.md:73
- tf-modules/iceberg_api/README.md Inputs table is now out of date: it still lists iceberg_api_cpu/memory defaults as 256/512 (but variables.tf defaults changed), and it does not document the newly required inputs (iceberg_s3_bucket, aws_account_id, iceberg_glue_schema) nor the new optional permissions_boundary_arn. Update the Usage snippet and Inputs table so module consumers know which variables are required and what the real defaults are.
| tags | Tags to apply to resources | `map(string)` | `{}` | no |
| oauth_provider | OAuth provider | `string` | n/a | yes |
| api_config_secret_arn | ARN of the API config secret | `string` | n/a | yes |
| iceberg_api_cpu | CPU allocation for Iceberg API ECS task | `number` | `256` | no |
| iceberg_api_memory | Memory allocation for Iceberg API ECS task | `number` | `512` | no |
| cumulus_iceberg_api_image_version | Version of the Cumulus Iceberg API image | `string` | n/a | yes |
| ecs_execution_role_arn | ARN of the ECS execution role | `string` | n/a | yes |
| ecs_cluster_arn | ARN of the ECS cluster | `string` | n/a | yes |
| ecs_cluster_name | Name of the ECS cluster | `string` | n/a | yes |
| ecs_cluster_instance_subnet_ids | Subnet IDs for ECS cluster instances | `list(string)` | n/a | yes |
| rds_security_group_id | ID of the RDS security group | `string` | n/a | yes |
| api_service_autoscaling_min_capacity | Minimum capacity for API service autoscaling | `number` | `1` | no |
| api_service_autoscaling_max_capacity | Maximum capacity for API service autoscaling | `number` | `10` | no |
| api_service_autoscaling_target_cpu | Target CPU utilization for API service autoscaling | `number` | `70` | no |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…lding time shorter.
chris-durbin
left a comment
There was a problem hiding this comment.
I had some questions about the Terraform resources - I want to make sure that we're referring to the same set of resources and not duplicating things for the Iceberg API resources and the Iceberg replication resources.
| default = 70 | ||
| } | ||
|
|
||
| variable "iceberg_s3_bucket" { |
There was a problem hiding this comment.
I'm assuming @indiejames added these variables already.
Summary: Summary of changes
Addresses CUMULUS-4707: Implement list of granules route in iceberg search api and CUMULUS-4708: Implement list of executions route in iceberg search api
Changes
This can be tested locally:
cd packages/api
NODE_ENV=development
AWS_ACCOUNT_ID=<sandbox_account_id>
api_config_secret_id=<api_config_secret_arn>
dynamoTableNameString='{"AccessTokensTable":"<access_token_table>"}'
ICEBERG_GLUE_SCHEMA=<test_glue_schema>
AWS_REGION=us-east-1
PORT=5001
node app/iceberg-index.js
Then, test with:
curl -i -H "Authorization: Bearer $token" "http://localhost:5001/granules"
curl -i -H "Authorization: Bearer $token" "http://localhost:5001/executions"
Also can be tested using the docker image that is built locally (see instruction at packages/api/app/README.md).
Test the terraform change by deploying the cumulus stack with configuration:
Or simply inspect and test the cumulus stack that is deployed in Sandbox by CICD for this PR:
PR Checklist
📝 Note:
For most pull requests, please Squash and merge to maintain a clean and readable commit history.