CLI for estimating embedding costs from PostgreSQL tables.
It scans selected tables, converts rows to text, counts tokens with js-tiktoken, and calculates estimated cost across multiple embedding models.
- Interactive CLI flow with @clack/prompts
- Parallel token counting with Node.js worker threads
- Large-table chunking for better throughput
- Real-time progress JSON output (cost_estimation_progress.json by default)
- Built-in web dashboard for live progress and cost tracking
- Pricing fetched from LiteLLM model pricing data, with local cache and fallback prices
- Node.js >=20.0.0
- PostgreSQL access
- Optional internet access for live pricing fetch (falls back to bundled pricing if unavailable)
yarn install

yarn embedding-cli

Notes:
- Reads .env first and asks only for missing values.
- Prompt messages are in Portuguese.
- Starts the progress dashboard automatically (default http://127.0.0.1:4173).

yarn estimate-cost

This mode reads configuration from environment variables only.

yarn progress-dashboard

Optional CLI flags:

yarn progress-dashboard --host 127.0.0.1 --port 4173 --file ./cost_estimation_progress.json

yarn typecheck

Create a .env file in the project root:
# Database (choose one approach)
SOURCE_DB_URL=postgres://user:password@host:5432/database
# or use individual fields:
# DB_HOST=localhost
# DB_PORT=5432
# DB_NAME=mydb
# DB_USERNAME=myuser
# DB_PASSWORD=mypassword
# Required for non-interactive mode (use "*" for all tables)
SOURCE_TABLE_ALLOWLIST=users,orders,products
# Optional
DB_SCHEMA=public
SOURCE_TABLE_BLOCKLIST=audit_logs,migrations
SOURCE_UPDATED_AT_CANDIDATES=updated_at,modified_at,updatedon
TEXT_COLUMNS_MODE=auto
EXCLUDED_COLUMNS=internal_id
SOURCE_BATCH_SIZE=1000
# Progress output / dashboard
COST_PROGRESS_FILE=./cost_estimation_progress.json
COST_DASHBOARD_HOST=127.0.0.1
COST_DASHBOARD_PORT=4173

Then run:

yarn embedding-cli

- SOURCE_DB_URL: Full PostgreSQL URL.
- DB_HOST, DB_PORT, DB_NAME, DB_USERNAME, DB_PASSWORD: Alternative to SOURCE_DB_URL.
- SOURCE_TABLE_ALLOWLIST: Comma-separated tables or *.
- SOURCE_TABLE_BLOCKLIST: Optional comma-separated skip list.
- SOURCE_SCHEMA or DB_SCHEMA: Source schema. Default: public.
- SOURCE_UPDATED_AT_CANDIDATES: Candidate timestamp columns for snapshot logic. Default: updated_at,modified_at,updatedon.
- TEXT_COLUMNS_MODE: auto (default) or all.
- EXCLUDED_COLUMNS: Comma-separated columns to exclude.
- SOURCE_BATCH_SIZE: Rows fetched per batch. Default: 1000.
- MAX_THREADS: Worker thread limit. Default: CPU cores - 1 (minimum 1).
- TABLES_PER_BATCH: Work items assigned per worker turn. Default: 3.
- LARGE_TABLE_THRESHOLD: Table row threshold to split into chunks. Default: 50000.
- CHUNK_SIZE: Rows per chunk for large tables. Default: 10000.
- COST_PROGRESS_FILE: Progress JSON file path. Default: ./cost_estimation_progress.json.
- COST_DASHBOARD_HOST: Dashboard host. Default: 127.0.0.1.
- COST_DASHBOARD_PORT: Dashboard port. Default: 4173.
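
As a rough illustration of how the variables and defaults listed above could be resolved, here is a minimal sketch. The names `resolveConfig`, `parseList`, `parsePositiveInt`, and `EstimatorConfig` are hypothetical and not taken from the project source. It also assumes that SOURCE_SCHEMA takes precedence over DB_SCHEMA and that invalid numbers fall back to the default; the real CLI may behave differently.

```typescript
// Hypothetical sketch: only the variable names and defaults come from this README.
type Env = Record<string, string | undefined>;

interface EstimatorConfig {
  allowlist: string[] | "*";
  blocklist: string[];
  schema: string;
  batchSize: number;
  maxThreads: number;
  largeTableThreshold: number;
  chunkSize: number;
}

// Split a comma-separated value into trimmed, non-empty entries.
function parseList(value: string | undefined): string[] {
  return (value ?? "")
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// Parse a positive integer, using the fallback for missing or invalid input.
function parsePositiveInt(value: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(value ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

function resolveConfig(env: Env, cpuCount: number): EstimatorConfig {
  const rawAllowlist = (env["SOURCE_TABLE_ALLOWLIST"] ?? "").trim();
  return {
    allowlist: rawAllowlist === "*" ? "*" : parseList(rawAllowlist),
    blocklist: parseList(env["SOURCE_TABLE_BLOCKLIST"]),
    // Assumption: SOURCE_SCHEMA wins when both schema variables are set.
    schema: env["SOURCE_SCHEMA"] ?? env["DB_SCHEMA"] ?? "public",
    batchSize: parsePositiveInt(env["SOURCE_BATCH_SIZE"], 1000),
    // Default from the README: CPU cores - 1, never below 1.
    maxThreads: parsePositiveInt(env["MAX_THREADS"], Math.max(1, cpuCount - 1)),
    largeTableThreshold: parsePositiveInt(env["LARGE_TABLE_THRESHOLD"], 50000),
    chunkSize: parsePositiveInt(env["CHUNK_SIZE"], 10000),
  };
}
```

In a Node.js process this could be called as `resolveConfig(process.env, os.cpus().length)`; passing the environment and core count as arguments keeps the function easy to test.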
- Discover tables from information_schema.
- Apply allowlist/blocklist filters.
- Fetch row counts and split large tables into chunks.
- Process tables/chunks in parallel workers:
  - Fetch rows in batches.
  - Convert rows to text payloads.
  - Count tokens with js-tiktoken (text-embedding-3-small tokenizer).
- Aggregate total tokens and calculate estimated cost per pricing entry.
- Write progress snapshots to JSON for terminal/dashboard visualization.
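
The filtering and chunking steps above can be sketched roughly as follows. This is an illustration only: `filterTables`, `planWorkItems`, `TableInfo`, and `WorkItem` are hypothetical names, and the sketch assumes that the blocklist wins over the allowlist, that a table is split only when its row count exceeds the threshold, and that chunks are expressed as offset/limit ranges. The actual implementation may split tables differently.

```typescript
// Hypothetical sketch of table filtering and large-table chunk planning.
interface TableInfo {
  name: string;
  rowCount: number;
}

interface WorkItem {
  table: string;
  offset: number; // first row of the chunk (assumed offset-based chunking)
  limit: number; // number of rows in the chunk
}

// Keep tables that are allowed and not blocked ("*" allows every table).
function filterTables(
  tables: TableInfo[],
  allowlist: string[] | "*",
  blocklist: string[],
): TableInfo[] {
  const blocked = new Set(blocklist);
  const allowed = allowlist === "*" ? null : new Set(allowlist);
  return tables.filter(
    (t) => !blocked.has(t.name) && (allowed === null || allowed.has(t.name)),
  );
}

// Small tables become one work item; large tables are split into chunks.
function planWorkItems(
  tables: TableInfo[],
  largeTableThreshold = 50000,
  chunkSize = 10000,
): WorkItem[] {
  const items: WorkItem[] = [];
  for (const table of tables) {
    if (table.rowCount <= largeTableThreshold) {
      items.push({ table: table.name, offset: 0, limit: table.rowCount });
      continue;
    }
    for (let offset = 0; offset < table.rowCount; offset += chunkSize) {
      const limit = Math.min(chunkSize, table.rowCount - offset);
      items.push({ table: table.name, offset, limit });
    }
  }
  return items;
}
```

With the default settings, a table of 55,000 rows would yield six work items under these assumptions: five chunks of 10,000 rows and a final chunk of 5,000.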
- Runtime pricing source: LiteLLM model_prices_and_context_window.json
- Local cache file: ~/.embedding-cli-pricing-cache.json
- Cache TTL: 24 hours
- Fallback: bundled static pricing entries (OpenAI, Cohere, Voyage, Ollama)
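
To make the pricing rules above concrete, the following sketch shows one way a 24-hour cache check and a per-model cost calculation could look. The names `isCacheFresh`, `estimateCosts`, and `PricingEntry` are hypothetical, the `inputCostPerToken` field is an assumed shape for a pricing entry, and the price in the usage note is a placeholder rather than real pricing data.

```typescript
// Hypothetical sketch: cache freshness check and cost estimation per pricing entry.
interface PricingEntry {
  model: string;
  inputCostPerToken: number; // assumed unit: USD per input token
}

interface CostEstimate {
  model: string;
  estimatedUsd: number;
}

const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours, as stated above

// A cached price list is usable while it is younger than the TTL.
function isCacheFresh(fetchedAtMs: number, nowMs: number, ttlMs = CACHE_TTL_MS): boolean {
  return nowMs - fetchedAtMs < ttlMs;
}

// Estimated cost is the total token count multiplied by each model's per-token price.
function estimateCosts(totalTokens: number, pricing: PricingEntry[]): CostEstimate[] {
  return pricing.map((entry) => ({
    model: entry.model,
    estimatedUsd: totalTokens * entry.inputCostPerToken,
  }));
}
```

For example, `estimateCosts(1_000_000, [{ model: "example-model", inputCostPerToken: 0.00000002 }])` gives an estimate of about 0.02 USD for that placeholder entry.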
- /: HTML dashboard
- /api/progress: Current progress snapshot JSON
- /api/stream: Server-Sent Events stream for live updates
- /health: Health check
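
For clients that want to read the live stream without a browser, a small parser for the standard Server-Sent Events wire format may be useful. The sketch below is not part of the project: `parseSseEvents` is a hypothetical helper, and the README does not specify the payload carried by /api/stream, so it returns the raw `data` strings and leaves decoding to the caller.

```typescript
// Hypothetical sketch: extract complete SSE events from a text buffer.
interface SseParseResult {
  events: string[]; // the data payload of each complete event
  rest: string; // trailing partial event to prepend to the next chunk
}

function parseSseEvents(buffer: string): SseParseResult {
  // Events are separated by a blank line; normalize CRLF first.
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  // The last block is incomplete until its terminating blank line arrives.
  const rest = blocks.pop() ?? "";
  const events: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      // Drop the field name and the single optional space after the colon.
      .map((line) => line.slice(5).replace(/^ /, ""));
    if (dataLines.length > 0) {
      // Multiple data lines in one event are joined with newlines.
      events.push(dataLines.join("\n"));
    }
  }
  return { events, rest };
}
```

A caller would append each received chunk to `rest`, run the parser again, and handle the returned events. In a browser, `new EventSource("http://127.0.0.1:4173/api/stream")` does this work automatically.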
.
├── bin/
│   └── embedding-cli.js
├── src/
│   ├── embedding-cli.ts
│   ├── gather-embedding-responses.ts
│   ├── embedding-types.ts
│   ├── constants.ts
│   ├── helpers.ts
│   ├── logger.ts
│   └── cost_estimator/
│       ├── estimate.ts
│       ├── estimate_worker.ts
│       ├── thread_pool.ts
│       ├── terminal_ui.ts
│       ├── pricing.ts
│       ├── progress_file.ts
│       ├── progress_dashboard.ts
│       ├── dashboard/progress_dashboard.html
│       └── db/
│           ├── postgres.ts
│           ├── transform.ts
│           ├── hashing.ts
│           └── types.ts
├── tsconfig.json
├── tsconfig.embedding.json
└── package.json
MIT
