Skip to content

atomicmilkshake/eagledb

Repository files navigation

EagleDB

An embeddable, in-process T-SQL query engine. EagleDB lets you run T-SQL queries against local DataFrames using Apache DataFusion — no SQL Server connection required at query time.

Why EagleDB?

SQL Server is great for production, but iterating on analytical queries against a live database is slow and contends with other workloads. EagleDB copies the tables you need into memory once, then executes your T-SQL locally — hundreds of times faster for complex joins and CTEs.

Features

  • T-SQL compatibilityISNULL, LEN, CHARINDEX, STUFF...FOR XML PATH, GETDATE, DATEADD, DATEDIFF, CAST, CTEs, subqueries, CASE, bitwise operators, and more
  • Automatic transpilation — Parses T-SQL, rewrites SQL Server idioms, and transpiles to DataFusion SQL
  • In-process execution — No server process; runs inside your Python application via PyO3
  • ODBC table loading — Point at a SQL Server instance, name the tables you need, and EagleDB loads them as in-memory Arrow tables
  • Batch execution — Supports GO-separated batches, DECLARE/SET variables, temp tables, and multi-statement scripts
  • Pandas integration — Results returned as pandas DataFrames

Architecture

┌─────────────────────────────────────────────────┐
│                  Python App                      │
│                                                  │
│   import eagledb                                 │
│   session = eagledb.Session()                    │
│   session.load_table(conn, "schema.table")       │
│   df = session.execute("SELECT ...")             │
│                                                  │
├─────────────────────────────────────────────────┤
│              Python Wrapper Layer                 │
│         (session.py, loader.py)                   │
│    Table loading via pyodbc, DataFrame I/O        │
├─────────────────────────────────────────────────┤
│              Rust Core (PyO3)                     │
│                                                  │
│  ┌──────────┐  ┌───────────┐  ┌──────────────┐  │
│  │  Parser   │  │  Planner  │  │   Engine     │  │
│  │ (T-SQL    │→│ (rewrite  │→│ (DataFusion  │  │
│  │  lexer)   │  │  & map)   │  │  execution)  │  │
│  └──────────┘  └───────────┘  └──────────────┘  │
│                                                  │
│  ┌──────────┐  ┌───────────┐  ┌──────────────┐  │
│  │ Functions │  │  Dialect  │  │   Catalog    │  │
│  │ (T-SQL    │  │ (syntax   │  │ (table       │  │
│  │  → DF)    │  │  compat)  │  │  registry)   │  │
│  └──────────┘  └───────────┘  └──────────────┘  │
└─────────────────────────────────────────────────┘

Conversion Pipeline

  1. Parse — State-machine lexer splits SQL into statements, respecting quotes, comments, brackets, and semicolons
  2. Normalize — Strip comments, collapse whitespace, remove WITH (NOLOCK) hints
  3. Rewrite — Convert STUFF...FOR XML PATH to STRING_AGG, handle correlated subqueries with bitwise ops, strip three-part names
  4. Map functionsISNULLCOALESCE, LENLENGTH, CHARINDEX(a,b)STRPOS(b,a), GETDATE()CURRENT_TIMESTAMP, etc.
  5. Execute — Run on DataFusion with full Arrow columnar performance

Installation

From source (requires Rust toolchain and maturin)

git clone https://github.com/atomicmilkshake/eagledb.git
cd eagledb
pip install maturin
maturin develop --release

Dependencies

  • Python 3.8+
  • Rust 1.80+ (for building)
  • ODBC Driver 17 for SQL Server (for loading tables from SQL Server)

Quick Start

import eagledb

# Create an in-memory session
session = eagledb.Session()

# Load tables from SQL Server (one-time)
conn_str = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes"
session.load_table(conn_str, "dbo.Customers")
session.load_table(conn_str, "dbo.Orders")

# Execute T-SQL locally — no server round-trip
df = session.execute("""
    SELECT c.Name, COUNT(o.OrderID) AS OrderCount
    FROM dbo.Customers c
    JOIN dbo.Orders o ON c.CustomerID = o.CustomerID
    WHERE o.OrderDate >= DATEADD(YEAR, -1, GETDATE())
    GROUP BY c.Name
    ORDER BY OrderCount DESC
""")

print(df)

Batch execution with variables

result = session.execute_batch("""
    DECLARE @cutoff DATE = '2024-01-01';

    SELECT *
    INTO #recent
    FROM dbo.Orders
    WHERE OrderDate >= @cutoff;

    SELECT CustomerID, COUNT(*) AS cnt
    FROM #recent
    GROUP BY CustomerID
    ORDER BY cnt DESC;
""")

Benchmarks

Measured on a desktop workstation (Ryzen 7, 32 GB RAM) against a typical enterprise GIS database with tables ranging from 10K to 500K rows.

Query Type SQL Server EagleDB Speedup
Simple join + filter 1.2 s 4 ms ~300x
Multi-CTE analytical 3.6 s 10 ms ~360x
13-statement stress script >4 hours 25 s

First run includes table loading time (typically 2-5 seconds per table over the network). Subsequent runs hit the in-memory cache and execute at full speed.

Supported T-SQL Features

Category Examples
Functions ISNULL, LEN, CHARINDEX, STUFF, REVERSE, GETDATE, DATEADD, DATEDIFF, CAST, CONVERT
Aggregates STRING_AGG (via STUFF...FOR XML PATH), COUNT, SUM, AVG, VAR
Syntax CTEs, subqueries, CASE, BETWEEN, IN, LIKE, TOP, DISTINCT, UNION
Operators Bitwise &, `
Batches GO separators, DECLARE/SET variables, temp tables (#temp)
Hints WITH (NOLOCK) — silently stripped
SET options SET NOCOUNT ON, SET TRANSACTION ISOLATION LEVEL, etc. — silently ignored

Project Structure

eagledb/
├── src/
│   ├── lib.rs              # PyO3 module entry point
│   ├── engine.rs           # Core DataFusion session
│   ├── catalog.rs          # Table registry
│   ├── types.rs            # Type mappings (SQL Server → Arrow)
│   └── tsql/
│       ├── parser.rs       # T-SQL statement parser
│       ├── planner.rs      # Query rewriter
│       ├── functions.rs    # Function mappings
│       └── dialect.rs      # Syntax compatibility
├── python/eagledb/
│   ├── __init__.py         # Public API
│   ├── session.py          # High-level Session class
│   └── loader.py           # ODBC table loader
├── tests/
│   ├── integration/        # Integration tests
│   └── smoke_test.py       # Quick smoke test
├── examples/
│   └── arcfm_blaster.py    # Example application
├── Cargo.toml
└── pyproject.toml

License

This project is licensed under the GNU General Public License v3.0 — see LICENSE for details.

About

An embeddable, in-process T-SQL query engine powered by Apache DataFusion. Run SQL Server queries locally against in-memory Arrow tables — hundreds of times faster than T-SQL against the production server

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors