"Upload your Chart of Accounts. Get a production-ready financial hierarchy and dbt models. Zero config."
DataBridge AI is a Financial Data Implementation Engine. It automates the "60-second wow" — the journey from a raw COA file to a validated hierarchy and running data mart.
- Upload — Drag and drop any COA, GL extract, or financial data file.
- Auto-Map — AI detects your ERP, recommends hierarchy templates, and maps accounts.
- Deploy — Export dbt models, Snowflake scripts, or Power BI-ready structures.
Try the guided demo in seconds using Docker:
docker run -p 5050:5050 datanexum/databridge-ai:demo
# Open http://localhost:5050

| Module | Tools | Description |
|---|---|---|
| Hierarchy Builder | 49 | Create and manage multi-level hierarchy projects (up to 15 levels) |
| Data Reconciliation | 38 | Compare and validate data from CSV, SQL, PDF, JSON sources |
| Wright Module | 31 | Hierarchy-driven data mart generation with 4-object pipeline |
| Cortex AI | 26 | Snowflake Cortex AI with natural language to SQL |
| Data Catalog | 19 | Centralized metadata registry with business glossary |
| Templates & Skills | 16 | Pre-built templates and AI expertise definitions |
| Data Observability | 15 | Real-time metrics, alerting, anomaly detection, health scoring |
| Data Versioning | 12 | Semantic versioning, snapshots, rollback, and diff |
| Git/CI-CD | 12 | Automated workflows and GitHub integration |
| Lineage Tracking | 11 | Column-level lineage and impact analysis |
| PlannerAgent | 11 | AI-powered workflow planning and agent suggestions |
| Forensic Auditor | 5 | Multi-industry risk analysis citing trained GraphRAG logic |
| Financial Ontology | 1,000+ | Active GL-to-FP&A mapping rules for SaaS, Mfg, O&G, Health |
| ScriptGenerator | 3 | Plan → executable Python script (E2E, BLCE, or Custom strategies) |
| Workflow Controller | 2 | Subprocess management, phase tracking, live monitoring |
| GraphRAG Engine | 10 | Anti-hallucination layer with graph + vector retrieval |
| Unified AI Agent | 10 | Cross-system operations (Book/Librarian/Researcher) |
| dbt Integration | 8 | Generate dbt projects from hierarchies |
| Data Quality | 7 | Expectation suites and data contracts |
| DataShield | 12 | Offline pre-AI masking with preserved table purpose and analytical patterns |
| Diff Utilities | 6 | Character-level text and data comparison |
| Hierarchy-Graph Bridge | 5 | Event-driven sync between hierarchies, GraphRAG, and lineage |
| Grounded Detection | 5 | KB-grounded audit loop: Detect > Feedback > Adjust > Memorize |
| AI Relationship Discovery | 8 | Schema analysis, naming pattern detection, relationship inference |
| Cortex Table Understanding | 5 | AI-powered table documentation and graph sync |
| BLCE | 84 | Business Logic Comprehension Engine — extract, normalize, govern, and deploy logic from SQL, Python, Excel, DAX, MDX, PDF |
| Mart Factory | 10 | Hierarchy-driven data mart generation with automated 4-object DDL pipeline |
| Slack Integration | 3 | Post messages, reconciliation reports, workflow alerts |
| Jira / Linear | 4 | Issue creation, workflow linking, status tracking |
| Google Sheets | 4 | Import/export data, sync hierarchies, write results |
| Power BI | 3 | DAX measures, dataset export, push datasets |
| Tableau | 3 | TDS datasource generation, hierarchy export, publishing |
| Airflow / Dagster | 3 | Export workflows as DAGs and job definitions |
| Terraform | 3 | Export Snowflake objects as HCL, generate grants |
| VS Code | 3 | Workspace, MCP settings, launch config generation |
| Batch Excel Triage | 1 | Scan, classify, and generate quality reports for Excel file batches |
| IKE — Institutional Wisdom | 38 | Self-learning knowledge engine (6 pillars): KB, Fix Gen, Debate, Scoring, Learning, Wisdom Extractor |
| And more... | 44 | Console dashboard, recommendations, orchestrator, etc. |
DataBridge AI introduces a hierarchy-centric architecture where the Hierarchy Builder serves as the central data structure. The new Hierarchy-Graph Bridge (hierarchy/graph_bridge.py) uses event-driven propagation to keep all downstream systems in sync:
- Hierarchy changes trigger automatic reindexing in the GraphRAG vector store
- Lineage graphs are rebuilt to reflect updated hierarchy relationships
- Impact analysis propagates through the catalog and observability subsystems
- Wright Pipeline can generate data marts directly from hierarchy projects via wright_from_hierarchy
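The propagation pattern in the list above can be illustrated with a minimal publish/subscribe sketch. This is not the code in hierarchy/graph_bridge.py: the `EventBus` class, the `hierarchy.updated` event name, and both handler functions are hypothetical names chosen for illustration only.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Tiny in-process publish/subscribe bus (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Call every handler registered for event_type; return how many ran."""
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


# Stand-ins for downstream systems; each one records what it was asked to do.
actions: List[str] = []


def reindex_vectors(event: dict) -> None:
    actions.append(f"reindex:{event['hierarchy_id']}")


def rebuild_lineage(event: dict) -> None:
    actions.append(f"lineage:{event['hierarchy_id']}")


bus = EventBus()
bus.subscribe("hierarchy.updated", reindex_vectors)
bus.subscribe("hierarchy.updated", rebuild_lineage)

# A single hierarchy change fans out to every subscribed system.
notified = bus.publish("hierarchy.updated", {"hierarchy_id": "revenue-pl"})
print(notified, actions)
```

The point of the design is that the hierarchy code only publishes an event; it does not need to know which downstream systems are listening.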
GraphRAG now includes startup bootstrap and runtime health logging so graph/vector/RAG synchronization can be verified during operation.
DataBridge AI now includes an Active Financial Ontology trained on 13,000+ multi-industry reports and 1,000+ Enterprise Chart of Accounts.
- Multi-Industry Wisdom: Ingested standard GL-to-FP&A rollups for SaaS, Manufacturing, Healthcare, and Oil & Gas.
- Adversarial Training (Red Team): The engine is trained on 180+ fraud scenarios (Cookie Jar, Round-Tripping) and 300+ structural COA defects.
- Real-World Grounding: Logic is anchored against real Wall Street data (SEC 10-Ks) and complex M&A models.
- Cost-Free Local Scaling: Uses local HuggingFace embeddings (all-MiniLM-L6-v2) for unlimited knowledge ingestion without API costs.
- Automated Forensic Reporting: Generate cited risk reports using the generate_forensic_report tool.
Use unified ops action to check runtime state:
rag_ops(action="runtime_status", log_limit=50)

Runtime health log file:
data/graphrag/runtime_health.jsonl

Codex GraphRAG index behavior:
- codex_graphrag auto-builds index if missing
- auto-refreshes index when docs/Codex/*.json is newer than data/codex_graphrag/index.json
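The "build if missing, refresh if older" rule above is a modification-time comparison. The sketch below shows one way such a check could look; `index_is_stale` is a hypothetical helper, not the codex_graphrag implementation, and the demonstration uses a temporary directory rather than the real docs/Codex paths.

```python
import os
import tempfile
from pathlib import Path


def index_is_stale(docs_dir: Path, index_path: Path) -> bool:
    """Return True when the index is missing or older than any *.json doc."""
    if not index_path.exists():
        return True
    index_mtime = index_path.stat().st_mtime
    return any(doc.stat().st_mtime > index_mtime for doc in docs_dir.glob("*.json"))


# Demonstration in a throwaway directory with explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp) / "docs"
    docs.mkdir()
    doc = docs / "a.json"
    doc.write_text("{}")
    index = Path(tmp) / "index.json"

    missing = index_is_stale(docs, index)   # no index file yet

    index.write_text("{}")
    os.utime(doc, (1_000, 1_000))           # doc older than the index
    os.utime(index, (2_000, 2_000))
    fresh = index_is_stale(docs, index)

    os.utime(doc, (3_000, 3_000))           # doc edited after the index was built
    stale = index_is_stale(docs, index)

print(missing, fresh, stale)
```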
DataShield is not positioned as a warehouse security boundary. It is an offline Python-based pre-AI masking layer for clients who do not want raw data sent to AI.
- Raw client data is masked before any AI processing in DataBridge.
- AI sees only post-processed scrambled/synthetic data.
- DataShield stores table summary and purpose metadata before masking so semantic context is preserved after scrambling.
- Post-processed Snowflake data is used for relationship discovery, profiling, and warehouse design, and is intentionally de-correlated from client source data.
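As a rough illustration of the flow above (capture table context first, then scramble values locally), here is a minimal sketch. It assumes keyed-hash pseudonymization, which is only one possible masking technique and is not confirmed as what DataShield uses; `summarize`, `mask_value`, `mask_rows`, and the sample table are all hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Set


def summarize(table_name: str, purpose: str, rows: List[Dict[str, str]]) -> dict:
    """Capture table-level context before any value is scrambled."""
    return {
        "table": table_name,
        "purpose": purpose,
        "columns": sorted(rows[0].keys()) if rows else [],
        "row_count": len(rows),
    }


def mask_value(value: str, secret: bytes) -> str:
    """Replace a value with a keyed-hash token; equal inputs give equal tokens."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"tok_{digest[:12]}"


def mask_rows(
    rows: List[Dict[str, str]], columns: Set[str], secret: bytes
) -> List[Dict[str, str]]:
    """Return copies of the rows with the named columns masked."""
    return [
        {k: mask_value(v, secret) if k in columns else v for k, v in row.items()}
        for row in rows
    ]


rows = [
    {"customer": "Acme Corp", "account_code": "4100"},
    {"customer": "Acme Corp", "account_code": "4200"},
]
secret = b"local-only-secret"  # hypothetical key that stays on the client machine

summary = summarize("GL_DETAIL", "General ledger detail lines", rows)
masked = mask_rows(rows, {"customer"}, secret)
print(summary)
print(masked)
```

Because equal inputs map to equal tokens, grouping and join patterns survive masking while the original values do not appear in what the AI sees.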
DataBridge AI is available in four editions:
| | Community (CE) | Pro | Pro Examples | Enterprise |
|---|---|---|---|---|
| Tools | 340 | ~384 | Tests & Tutorials | 394 |
| Distribution | Public PyPI | GitHub Packages | GitHub Packages | Private Deploy |
| License | MIT (Free) | License Key | Requires Pro Key | Dedicated Key |
| Data Reconciliation | ✅ | ✅ | | ✅ |
| Fuzzy Matching | ✅ | ✅ | | ✅ |
| PDF/OCR Extraction | ✅ | ✅ | | ✅ |
| dbt Basic | ✅ | ✅ | | ✅ |
| Data Quality | ✅ | ✅ | | ✅ |
| UI Dashboard | ✅ | ✅ | | ✅ |
| Cortex AI Agent | | ✅ | | ✅ |
| Wright Pipeline | | ✅ | | ✅ |
| GraphRAG Engine | | ✅ | | ✅ |
| Data Observability | | ✅ | | ✅ |
| Full Data Catalog | | ✅ | | ✅ |
| Column Lineage | | ✅ | | ✅ |
| Hierarchy Builder | | ✅ | | ✅ |
| AI Orchestrator | | ✅ | | ✅ |
| Data Versioning | | ✅ | | ✅ |
| Git/CI-CD Automation | | ✅ | | ✅ |
| Table Understanding | | ✅ | | ✅ |
| AI Relationship Discovery | | ✅ | | ✅ |
| 47 Tests + 29 Tutorials | | | ✅ | |
| Custom Agents | | | | ✅ |
| White-label | | | | ✅ |
| SLA Support | | | | ✅ |
See Commercialization Guide for full details.
Pro adds 19 modules with additional tools on top of Community Edition:
| Module | Tools | Description |
|---|---|---|
| Hierarchy Builder | 49 | Multi-level hierarchy projects (up to 15 levels) |
| Wright Pipeline | 31 | 4-object data mart factory (VW_1 → DT_2 → DT_3A → DT_3) with hierarchy integration |
| Cortex AI | 26 | Snowflake Cortex natural language to SQL and AI reasoning |
| Data Catalog | 19 | Centralized metadata registry with automatic lineage |
| Faux Objects | 18 | Domain persona-based hierarchy generation |
| Connections | 16 | Multi-database connectivity management |
| AI Orchestrator | 16 | Multi-agent task coordination and workflow management |
| Data Observability | 15 | Real-time metrics, alerting, anomaly detection |
| Data Versioning | 12 | Semantic versioning, snapshots, rollback, and diff |
| Git/CI-CD | 12 | Automated workflows and GitHub integration |
| Lineage Tracking | 11 | Column-level lineage and impact analysis |
| PlannerAgent | 11 | AI-powered workflow planning and agent suggestions |
| GraphRAG Engine | 10 | Anti-hallucination layer with graph + vector retrieval |
| Unified AI Agent | 10 | Cross-system operations (Book/Librarian/Researcher) |
| Hierarchy-Graph Bridge | 5 | Event-driven sync between hierarchies, GraphRAG, and lineage |
| Console Dashboard | 5 | Real-time broadcast messaging and monitoring |
| Schema Matcher | 5 | Cross-database schema comparison and mapping |
| AI Relationship Discovery | 8 | Schema analysis, naming patterns, relationship inference |
| Cortex Table Understanding | 5 | AI-powered table documentation and graph sync |
| Data Matcher | 4 | Row-level data comparison across connections |
| BLCE | 72 | Business Logic Comprehension Engine — extract, normalize, govern, deploy business logic |
| DataShield | 12 | Offline pre-AI masking; AI only sees post-processed scrambled data |
The Pro Examples add-on (databridge-ai-examples) provides a comprehensive tests and tutorials package:
- 29 use-case tutorials: Beginner (4), Financial (7), Faux Objects (8), ERP DW (10)
- 47 test files: CE test suite + Pro test suite + shared fixtures
- Requires a valid Pro license key
databridge-core is the standalone reconciliation, profiling, and ingestion toolkit; no license key is needed:
pip install databridge-core # v1.0.0 on PyPI

# Profile a file
databridge profile sales.csv
# Compare two sources
databridge compare source.csv target.csv --keys id
# Fuzzy match across systems
databridge fuzzy erp.csv gl.csv --column name --threshold 80

9 CLI commands, 16 Python API functions. See databridge-core for full docs.
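To make the idea of a fuzzy-match threshold concrete, here is a small standard-library sketch. It is not how databridge-core scores matches (the scoring method is not documented here); it uses Python's `difflib.SequenceMatcher` as a stand-in, and `similarity`, `fuzzy_match`, and the sample names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_match(
    left: List[str], right: List[str], threshold: float = 80.0
) -> List[Tuple[str, str, float]]:
    """Pair each left name with its best right match at or above the threshold."""
    matches = []
    for name in left:
        best = max(right, key=lambda candidate: similarity(name, candidate))
        score = similarity(name, best)
        if score >= threshold:
            matches.append((name, best, round(score, 1)))
    return matches


erp_names = ["Accounts Receivable", "Office Supplies", "Travel"]
gl_names = ["Accounts Receivable - Trade", "Office Supply", "Payroll"]

result = fuzzy_match(erp_names, gl_names)
for erp_name, gl_name, score in result:
    print(f"{erp_name!r} -> {gl_name!r} ({score})")
```

Raising the threshold trades recall for precision: fewer near-misses are paired, but fewer pairs need manual review.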
# Community Edition (free, from PyPI)
pip install databridge-ai
# With optional dependencies
pip install databridge-ai[pdf] # PDF support
pip install databridge-ai[snowflake] # Snowflake support
pip install databridge-ai[all] # All dependencies
# Pro Edition (requires license key)
export DATABRIDGE_LICENSE_KEY="DB-PRO-YOURKEY-20260101-signature"
pip install databridge-ai-pro --extra-index-url https://[email protected]/datanexum/DATABRIDGE_AI/main/
# Pro Examples (requires Pro license key)
pip install databridge-ai-examples # CE tests + beginner tutorials
pip install databridge-ai-examples[pro] # + Pro tests + advanced tutorials

DataBridge AI uses a key-based licensing system for Pro and Enterprise editions.
License Key Format:
DB-{TIER}-{CUSTOMER_ID}-{EXPIRY}-{SIGNATURE}
Example: DB-PRO-ACME001-20270209-a1b2c3d4e5f6
# Generate a license key (admin)
python scripts/generate_license.py PRO CUSTOMER01 365
# Check license status (MCP tool)
get_license_status()

See docs/COMMERCIALIZATION.md for key management, distribution, and API reference.
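The documented key layout, DB-{TIER}-{CUSTOMER_ID}-{EXPIRY}-{SIGNATURE}, can be split into its fields with a few lines of Python. The sketch below is illustrative only: `LicenseKey` and `parse_license_key` are hypothetical names, the YYYYMMDD reading of EXPIRY is inferred from the example key, customer IDs are assumed to contain no hyphens, and the signature is not verified because its scheme is not described here.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class LicenseKey:
    tier: str
    customer_id: str
    expiry: date
    signature: str

    def is_expired(self, today: date) -> bool:
        return today > self.expiry


def parse_license_key(key: str) -> LicenseKey:
    """Split a key of the form DB-{TIER}-{CUSTOMER_ID}-{EXPIRY}-{SIGNATURE}."""
    parts = key.split("-")
    if len(parts) != 5 or parts[0] != "DB":
        raise ValueError(f"unexpected license key shape: {key!r}")
    _, tier, customer_id, expiry_text, signature = parts
    # Assumption: EXPIRY is a YYYYMMDD date, as the documented example suggests.
    expiry = datetime.strptime(expiry_text, "%Y%m%d").date()
    return LicenseKey(tier, customer_id, expiry, signature)


key = parse_license_key("DB-PRO-ACME001-20270209-a1b2c3d4e5f6")
print(key.tier, key.customer_id, key.expiry.isoformat())
print(key.is_expired(date(2026, 1, 1)), key.is_expired(date(2028, 1, 1)))
```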
python run_server.py # Slim mode (default, ~15 tools)
python run_server.py --full # Full mode (all tools)

Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "DataBridge_AI": {
      "command": "python",
      "args": ["run_server.py"]
    }
  }
}

cd databridge-ce/ui
python server.py
# Open http://127.0.0.1:5050

The Web UI includes several guided workflows:
- AI Planner — Conversational workflow builder powered by Claude. Describe what you want in natural language and get an executable multi-step plan with agent assignments and confidence scores.
- Execution Wizard — 3-step guided modal (Connection → Options → Review) for configuring plan execution. Discovers Snowflake connections, lets you set row limits, ERP type, and skip phases before generating a script.
- Workflow Controller — Live monitoring of running workflows via vis.js flow graph and Mermaid flowchart with 4 layout modes (LR, TB, Swimlane, Freeform).
- Onboarding Banner — First-run welcome card with guided actions to help new users get started.
from src.server import mcp
# Run as MCP server
mcp.run()

┌─────────────────────────────────────────────────────────────┐
│ Workbench UI (:5050) │ Claude / LLM Client │
│ AI Planner · Workflow Editor │ MCP Protocol │
│ Execution Wizard · Dashboard │ │
└──────────────┬──────────────────┴──────────┬────────────────┘
│ HTTP/WS │ MCP
▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ Flask Server + MCP Server (340 Tools) │
│ │
│ AI Layer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Planner │ │ Script │ │ Workflow │ │
│ │ Agent │ │ Generator │ │ Controller │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Plugin Modules │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Hierarchy │ │ Cortex │ │ Wright │ │
│ │ Builder │ │ Agent │ │ Module │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Data │ │ Lineage │ │ Data │ │
│ │ Catalog │ │ Tracker │ │ Quality │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ GraphRAG │ │ Observabil- │ │ Data │ │
│ │ Engine │ │ ity │ │ Versioning │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ ┌─────────────┐ │
│ │ BLCE │ Business Logic Comprehension Engine │
│ │ (84 tools) │ SQL/Python/Excel/DAX/MDX/PDF parsing │
│ └─────────────┘ │
│ ┌─────────────┐ │
│ │ Integrations│ Slack, Jira, Sheets, Power BI, Tableau, │
│ │ (26 tools) │ Airflow, Terraform, VS Code │
│ └─────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ IKE — Institutional Wisdom (Standalone Microservice) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ P1: KB │ │ P2: Fix │ │ P3:Debate│ │P4:Scoring│ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ ┌──────────┐ ┌──────────────────────────────────────┐ │
│ │P5:Learn │ │ P6: Wisdom Extractor (Excel Intel) │ │
│ └──────────┘ └──────────────────────────────────────┘ │
│ 38 MCP tools · FastAPI :8000 · gRPC :50051 │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Snowflake / CSV / SQL / PDF │
└─────────────────────────────────────────────────────────────┘
Create and manage multi-level hierarchy projects for financial reporting, organizational structures, and data classification.
# Create a hierarchy project
create_hierarchy_project(name="Revenue P&L", description="Revenue hierarchy")
# Add hierarchies with source mappings
create_hierarchy(project_id="...", name="Product Revenue", parent_id="...")
add_source_mapping(hierarchy_id="...", source_column="ACCOUNT_CODE", source_uid="41%")
# Export and deploy
export_hierarchy_csv(project_id="...")
generate_hierarchy_scripts(project_id="...")

Natural language to SQL and AI-powered data operations using Snowflake Cortex.
# Configure Cortex
configure_cortex_agent(connection_id="snowflake-prod", cortex_model="mistral-large")
# Natural language query
analyst_ask(question="What was total revenue by region last quarter?",
semantic_model_file="@ANALYTICS.PUBLIC.MODELS/sales.yaml")
# AI reasoning loop
cortex_reason(goal="Analyze data quality in PRODUCTS table")

Generate data marts using the 4-object pipeline pattern: VW_1 → DT_2 → DT_3A → DT_3
# Create mart configuration
create_mart_config(project_name="upstream_gross", report_type="GROSS",
hierarchy_table="TBL_0_GROSS_LOS_REPORT_HIERARCHY")
# Generate pipeline
generate_mart_pipeline(config_name="upstream_gross")

Track changes to hierarchies, catalog assets, and semantic models with semantic versioning.
# Create version snapshot
version_create(object_type="hierarchy", object_id="revenue-pl",
description="Added new cost centers", bump="minor")
# Compare versions
version_diff(object_type="hierarchy", object_id="revenue-pl",
from_version="1.0.0", to_version="1.1.0")
# Rollback
version_rollback(object_type="hierarchy", object_id="revenue-pl", to_version="1.0.0")

Real-time metrics collection, alerting, anomaly detection, and health scoring.
# Record metrics
obs_record_metric(name="hierarchy.validation.success_rate", value=98.5,
type="gauge", tags='{"project_id": "revenue-pl"}')
# Create alert rules
obs_create_alert_rule(name="Low success rate",
metric_name="hierarchy.validation.success_rate",
threshold=95.0, comparison="<", severity="warning")
# Detect anomalies
obs_detect_anomaly(metric_name="hierarchy.validation.success_rate", value=72.0)
# Get asset health score
obs_get_asset_health(asset_id="revenue-pl", asset_type="hierarchy_project")
# System health dashboard
obs_get_system_health()

Anti-hallucination layer using graph + vector retrieval-augmented generation.
# Unified search action
rag_search(action="search", query="What hierarchies use ACCOUNT_CODE?", max_results=5)
# Validate AI output against knowledge graph
rag_ops(action="validate_output", content="SELECT * FROM REVENUE", content_type="sql")
# Runtime health (bootstrap + bridge wiring + vector readiness)
rag_ops(action="runtime_status", log_limit=25)

Create a .env file:
# Data directory
DATA_DIR=./data
# License key (for Pro/Enterprise)
DATABRIDGE_LICENSE_KEY=DB-PRO-YOURKEY-20260101-signature
# NestJS backend (optional)
NESTJS_BACKEND_URL=http://localhost:8001
NESTJS_API_KEY=your-api-key
# Snowflake (optional)
SNOWFLAKE_ACCOUNT=your-account
SNOWFLAKE_USER=your-user
SNOWFLAKE_PASSWORD=your-password
# Cortex AI
CORTEX_DEFAULT_MODEL=mistral-large

- CLAUDE.md - Complete tool reference and usage guide for Claude
- GEMINI.md - Gemini integration guide and quick reference
- docs/AI_AGENT_CONTEXT.md - Directives and system context for AI agents
- docs/deep_dives/ - Detailed technical guides for core modules, agents, and plugins
- docs/MANIFEST.md - Auto-generated tool manifest
- docs/COMMERCIALIZATION.md - Licensing, tiers, and distribution
- docs/GRAPHRAG_RUNTIME_OPERATIONS.md - Runtime health, auto-updates, and troubleshooting for graph/vector/RAG sync
- docs/excel_plugin_tutorial_user_guide.html - Excel add-in tutorial and user guide
- apps/excel-plugin/SSO_SETUP.md - Microsoft SSO (NAA/MSAL-first) setup for the Excel add-in
- IKE Microservice (ike/) - Self-learning knowledge engine with 6 pillars (KB, Fix Gen, Debate, Scoring, Learning, Wisdom Extractor), 38 MCP tools, FastAPI REST + gRPC APIs
- docs/compliance/OPEN_SOURCE_COMPLIANCE.md - OSS compliance process and governance
- docs/compliance/THIRD_PARTY_NOTICES.md - Third-party package notices and license review flags
- Wiki - Architecture, getting started, and tutorials
The databridge-ce/ folder contains the open-source Community Edition with:
- Plugin architecture for custom tools
- Web UI dashboard
- Starter templates
Contributions are welcome! Please read our contributing guidelines before submitting PRs.
MIT License - See LICENSE for details.
Copyright (c) 2024-2026 DataBridge AI Team