Doc.I_ — Mobile Document Intelligence Engine

A high-performance, on-device document intelligence system designed to extract structured data from unstructured mobile-captured documents such as receipts, invoices, and forms.

🎬 Demo Videos

See Doc.I_ in action across its core workflows: document ingestion, extraction, and model evaluation.

📄 Document Upload & Extraction

Upload documents and extract structured data using on-device OCR and parsing pipelines.

Highlights

Document upload (image/PDF)
OCR text extraction
Structured field parsing
Editable results preview

🧪 Dev Mode — Model Testing & Validation

Evaluate extraction accuracy using real datasets with built-in validation tooling.

Highlights

Real dataset evaluation
Accuracy measurement + comparison
Field-level validation insights
Iterative model tuning workflow

📊 Dev Mode — Test Data Benchmarking

Run controlled test datasets to benchmark extraction performance and consistency.

Highlights

Test dataset execution
Batch processing workflows
Result comparison + analysis
Debugging extraction edge cases


Executive Summary

Mobile document processing presents a unique set of challenges:

inconsistent lighting, angles, and blur
noisy OCR outputs
highly unstructured layouts
domain variability across merchants and formats

Doc.i is engineered to solve this by combining:

OCR extraction
feature-rich line-level analysis
lightweight ML classification
mobile-optimized inference

The system transforms raw OCR text into structured, machine-readable JSON, enabling downstream workflows such as:

expense tracking
financial reconciliation
analytics pipelines
automation systems

The core value lies in fast, reliable, on-device understanding of documents without relying on heavy cloud inference.

Architecture & System Design

End-to-End Pipeline

Pipeline Breakdown

1. Image Acquisition

Captured via mobile camera (React Native Vision Camera)
Real-time frame processing support

2. OCR Layer

Extracts:
- text blocks
- bounding boxes
- spatial metadata
Designed to tolerate imperfect OCR outputs
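The sketch below shows one way the OCR output could be represented and lightly cleaned before feature extraction. It is a minimal sketch under assumptions. The `OcrLine` class, its pixel-coordinate fields (origin at the top-left, y growing downward), and the `clean_lines` helper are hypothetical names, not the project's actual types or ML Kit's API. Collapsing whitespace and dropping empty lines is one assumed way to tolerate noisy OCR.

```python
import re
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class OcrLine:
    """One recognized text line (hypothetical structure, not the project's type)."""
    text: str
    x: float  # left edge in pixels
    y: float  # top edge in pixels; origin at the top-left of the image
    w: float  # width in pixels
    h: float  # height in pixels


def clean_lines(raw: List[OcrLine]) -> List[OcrLine]:
    """Collapse whitespace, drop empty lines, and sort into reading order."""
    cleaned = [replace(line, text=re.sub(r"\s+", " ", line.text).strip()) for line in raw]
    kept = [line for line in cleaned if line.text]
    # Top-to-bottom, then left-to-right. Real receipts may need a y tolerance
    # so that slightly misaligned lines on the same row sort left-to-right.
    return sorted(kept, key=lambda line: (line.y, line.x))


lines = clean_lines([
    OcrLine("Total:   $45.60", x=20, y=300, w=180, h=20),
    OcrLine("   ", x=0, y=150, w=10, h=10),
    OcrLine("STORE ABC", x=60, y=10, w=140, h=30),
])
print([line.text for line in lines])  # ['STORE ABC', 'Total: $45.60']
```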

3. Feature Engineering (Core Intelligence Layer)

Each OCR line is converted into a feature vector including:

Text-based Features

charLen, tokenCount
digitCount, digitRatio
alphaCount, upperRatio
keyword signals:
- hasTotalKeyword
- hasTaxKeyword
- hasMerchantKeyword

Semantic Heuristics

hasCurrencySymbol
hasDatePattern
containsPercent
endsWithAmount

Layout & Spatial Features

x, y, w, h
yFromBottom
isTopQuarter, isBottomQuarter
yRankNorm, lineIndexNorm

These features allow the model to reason about both content and layout, which is critical for document understanding.
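The feature list above can be sketched as a plain function over OCR lines. This is an illustration, not the project's feature code. The keyword lists, regular expressions, and ratio denominators (for example `upperRatio` as uppercase letters over all letters) are assumptions, because their exact definitions are not documented here. `hasMerchantKeyword` is omitted because its keyword list is unknown.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyword lists and patterns (assumptions, not the project's lists).
TOTAL_KEYWORDS = ("total", "amount due", "balance")
TAX_KEYWORDS = ("tax", "vat", "gst")
CURRENCY_SYMBOLS = "$€£¥"
DATE_RE = re.compile(r"\b\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}\b")
AMOUNT_AT_END_RE = re.compile(r"\d+[.,]\d{2}\s*$")

Box = Tuple[str, float, float, float, float]  # (text, x, y, w, h) in pixels


def line_features(lines: List[Box], page_height: float) -> List[Dict[str, float]]:
    """Compute text, heuristic, and layout features for each OCR line."""
    n = len(lines)
    # Rank of each line by vertical position (0 = topmost).
    order = sorted(range(n), key=lambda i: lines[i][2])
    y_rank = {idx: rank for rank, idx in enumerate(order)}
    denom = max(n - 1, 1)
    rows = []
    for i, (text, x, y, w, h) in enumerate(lines):
        lower = text.lower()
        chars = len(text)
        digits = sum(c.isdigit() for c in text)
        alphas = sum(c.isalpha() for c in text)
        uppers = sum(c.isupper() for c in text)
        rows.append({
            # Text-based features
            "charLen": chars,
            "tokenCount": len(text.split()),
            "digitCount": digits,
            "digitRatio": digits / chars if chars else 0.0,
            "alphaCount": alphas,
            "upperRatio": uppers / alphas if alphas else 0.0,
            # Plain substring matching: simple, but "subtotal" also matches "total".
            "hasTotalKeyword": float(any(k in lower for k in TOTAL_KEYWORDS)),
            "hasTaxKeyword": float(any(k in lower for k in TAX_KEYWORDS)),
            # Semantic heuristics
            "hasCurrencySymbol": float(any(c in CURRENCY_SYMBOLS for c in text)),
            "hasDatePattern": float(bool(DATE_RE.search(text))),
            "containsPercent": float("%" in text),
            "endsWithAmount": float(bool(AMOUNT_AT_END_RE.search(text))),
            # Layout and spatial features
            "x": x, "y": y, "w": w, "h": h,
            "yFromBottom": page_height - (y + h),
            "isTopQuarter": float(y < page_height / 4),
            "isBottomQuarter": float(y + h > 3 * page_height / 4),
            "yRankNorm": y_rank[i] / denom,
            "lineIndexNorm": i / denom,
        })
    return rows


feats = line_features([
    ("STORE ABC", 60, 10, 140, 30),
    ("Date: 12/03/2025", 20, 60, 200, 20),
    ("Total: $45.60", 20, 300, 180, 20),
    ("Tax: $2.10", 20, 330, 120, 20),
], page_height=400)
print(feats[2]["hasTotalKeyword"], feats[2]["endsWithAmount"], feats[2]["isBottomQuarter"])  # 1.0 1.0 1.0
```

Because the keyword match here is a plain substring test, a "Subtotal" line also sets `hasTotalKeyword`, which is one reason a separate subtotal signal is useful.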

4. Classification Engine

Model: Logistic Regression (Scikit-learn)
Exported to: ONNX for mobile inference
Pipeline:
- StandardScaler
- LogisticRegression

Labels:

[
  "merchant_name",
  "date",
  "total_amount",
  "tax_amount",
  "currency",
  "item_line",
  "other"
]

Why Logistic Regression?

Extremely fast inference (ideal for mobile)
Interpretable decision boundaries
Performs well with engineered features
Small model size → efficient ONNX deployment
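The two-step pipeline can be sketched as follows. The data is synthetic, and the five feature columns and `max_iter=1000` are assumptions chosen for illustration. The sketch also makes the "small model size" point concrete: the fitted classifier stores only one weight per label per feature plus one bias per label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

LABELS = ["merchant_name", "date", "total_amount", "tax_amount", "currency", "item_line", "other"]


def build_pipeline() -> Pipeline:
    """StandardScaler followed by LogisticRegression, as described above."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


rng = np.random.default_rng(0)
n_features = 5
# Synthetic data: each label's rows are centred on a different random point.
centres = rng.normal(scale=3.0, size=(len(LABELS), n_features))
X = np.vstack([c + rng.normal(size=(10, n_features)) for c in centres])
y = np.repeat(LABELS, 10)

pipe = build_pipeline().fit(X, y)
clf = pipe.named_steps["clf"]
print(clf.coef_.shape)  # (7, 5): one weight per label per feature
n_params = clf.coef_.size + clf.intercept_.size
print(n_params)  # 42
```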

5. Post-processing

Aggregates classified lines into structured fields
Applies normalization (e.g., amount parsing, date formatting)
Resolves conflicts (e.g., multiple totals)
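The post-processing step can be sketched with the standard library. This is a minimal sketch, not the project's implementation. The `(label, text, confidence)` input shape, the amount and date regexes, the `"$"` to `"USD"` mapping, and "keep the most confident candidate" as the conflict rule are all assumptions. The Example Model Output section reads `12/03/2025` as 12 March, so dates are parsed day-first by default; a US-format receipt would need `day_first=False`.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# (label, text, confidence) for one classified OCR line; this shape is hypothetical.
Classified = Tuple[str, str, float]

# Trailing amount such as "45.60" or "45,60"; thousands separators are NOT handled.
AMOUNT_RE = re.compile(r"(\d+(?:[.,]\d{2})?)\s*$")
DATE_RE = re.compile(r"(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})")
# Assumption: "$" maps to USD, although other currencies also use it.
SYMBOL_TO_CURRENCY = {"$": "USD", "€": "EUR", "£": "GBP"}


def parse_amount(text: str) -> Optional[float]:
    match = AMOUNT_RE.search(text)
    return float(match.group(1).replace(",", ".")) if match else None


def parse_date(text: str, day_first: bool = True) -> Optional[str]:
    """Normalize DD/MM/YYYY (or MM/DD/YYYY when day_first=False) to YYYY-MM-DD."""
    match = DATE_RE.search(text)
    if not match:
        return None
    a, b, year = (int(g) for g in match.groups())
    day, month = (a, b) if day_first else (b, a)
    try:
        return datetime(year, month, day).strftime("%Y-%m-%d")
    except ValueError:  # e.g. a month of 13 under the chosen field order
        return None


def best_text(lines: List[Classified], label: str) -> Optional[str]:
    """Resolve conflicts (e.g. several total lines) by keeping the most confident one."""
    candidates = [(conf, text) for lab, text, conf in lines if lab == label]
    return max(candidates)[1] if candidates else None


def to_record(lines: List[Classified], day_first: bool = True) -> Dict[str, object]:
    date, total, tax = (best_text(lines, k) for k in ("date", "total_amount", "tax_amount"))
    # First currency symbol found anywhere on the document.
    symbol = next((c for _, text, _ in lines for c in text if c in SYMBOL_TO_CURRENCY), None)
    return {
        "merchant_name": best_text(lines, "merchant_name"),
        "date": parse_date(date, day_first) if date else None,
        "total_amount": parse_amount(total) if total else None,
        "tax_amount": parse_amount(tax) if tax else None,
        "currency": SYMBOL_TO_CURRENCY.get(symbol) if symbol else None,
        "items": [text for lab, text, _ in lines if lab == "item_line"],
    }


record = to_record([
    ("merchant_name", "STORE ABC", 0.91),
    ("date", "Date: 12/03/2025", 0.88),
    ("total_amount", "Total: $45.60", 0.95),
    ("total_amount", "Subtotal 43.50", 0.40),
    ("tax_amount", "Tax: $2.10", 0.90),
])
print(record)
```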

Technical Deep Dive

Core ML Stack

import json
import os
from pathlib import Path

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score, f1_score

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

Supporting Technologies

Layer	Technology
Mobile OCR	React Native Vision Camera / ML Kit
Preprocessing	OpenCV (planned/optional)
ML Training	Scikit-learn
Model Format	ONNX
Data Handling	NumPy, Pandas
Evaluation	scikit-learn metrics
UI / Labeling	Custom mobile Dev Tools UI

Classification Logic

The model does not rely on raw text alone.

Instead, it estimates the probability of each label from the combined feature vector:

P(label | text_features + layout_features + heuristics)

Example:

A number near the bottom with "TOTAL" → high probability of total_amount
Uppercase text at top → likely merchant_name
Lines with repeating patterns → item_line

This hybrid approach combines:

statistical learning
rule-informed features
spatial reasoning
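For multiclass logistic regression, P(label | features) has a closed form: a softmax over one linear score per label, computed on the standardized features. The sketch below checks this against scikit-learn's `predict_proba` on synthetic data (the data and label subset are assumptions). It relies on the multinomial formulation, which scikit-learn's default `lbfgs` solver fits when there are more than two classes. Reading the per-label weights this way is also what makes the decision boundaries interpretable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic 3-class data with 4 features (values are made up for illustration).
X = np.vstack([rng.normal(loc=m, size=(20, 4)) for m in (-2.0, 0.0, 2.0)])
y = np.repeat(["merchant_name", "total_amount", "other"], 20)

pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))]).fit(X, y)
scaler, clf = pipe.named_steps["scaler"], pipe.named_steps["clf"]


def manual_proba(features: np.ndarray) -> np.ndarray:
    """P(label | features) = softmax(W z + b), where z is the standardized feature vector."""
    z = scaler.transform(features)
    scores = z @ clf.coef_.T + clf.intercept_
    scores -= scores.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)


print(clf.classes_)  # column order of the probabilities
print(np.allclose(manual_proba(X[:5]), pipe.predict_proba(X[:5])))  # True
```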

Improving Accuracy Strategy

1. Data-Centric Approach

Label correction via in-app labeling UI
Dataset iteration loops using "test data" flows
Focus on reducing dominance of "other" class

2. Feature Engineering

Added semantic signals:
- subtotal / discount detection
- service keywords
- receipt/invoice identifiers
Introduced normalized positional features

3. Model Optimization

Stratified splits to handle imbalance
Cross-validation (StratifiedKFold)
Feature scaling via StandardScaler
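These steps can be sketched with the scikit-learn utilities already listed in the Core ML Stack. The deliberately imbalanced synthetic data and the choice of macro-F1 as the cross-validation score are assumptions. Keeping `StandardScaler` inside the `Pipeline` means it is re-fitted on each training fold, so no scaling statistics leak from the validation fold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Imbalanced synthetic data in which "other" dominates (made-up values).
counts = {"other": 120, "total_amount": 20, "date": 20}
X = np.vstack([rng.normal(loc=i * 2.0, size=(n, 6)) for i, n in enumerate(counts.values())])
y = np.repeat(list(counts), list(counts.values()))

# Hold out a test set whose label proportions match the full dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])

# 5-fold CV where every fold keeps the label proportions; macro-F1 weights each label equally.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X_train, y_train, cv=cv, scoring="f1_macro")
print(f"macro-F1: {scores.mean():.3f} ± {scores.std():.3f}")
```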

4. Feedback Loop

User Labeling → Dataset Update → Model Retrain → Deploy ONNX → Evaluate → Repeat

Performance & Metrics

Validation Strategy

Train/Test split with stratification
Cross-validation scoring
Metrics:
- Accuracy
- F1-score (critical for imbalance)
- Confusion matrix
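A small illustration of why F1 matters under imbalance. The labels and predictions for six lines below are made up, not real results. Accuracy looks acceptable because "other" dominates, while macro-F1 drops because `merchant_name` is never recovered.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, f1_score

# Hypothetical held-out labels and predictions for six OCR lines.
y_true = ["other", "other", "other", "total_amount", "date", "merchant_name"]
y_pred = ["other", "other", "total_amount", "total_amount", "date", "other"]
labels = ["merchant_name", "date", "total_amount", "other"]

print(accuracy_score(y_true, y_pred))  # 4/6, about 0.667
# Per-label F1: merchant_name 0, date 1, total_amount 2/3, other 2/3 -> macro 7/12, about 0.583
print(f1_score(y_true, y_pred, labels=labels, average="macro", zero_division=0))
# Rows are true labels and columns are predicted labels, both in `labels` order.
print(confusion_matrix(y_true, y_pred, labels=labels))
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```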

Example Model Output

Input (OCR):

STORE ABC
Date: 12/03/2025
Total: $45.60
Tax: $2.10

Output (Structured JSON):

{
  "merchant_name": "STORE ABC",
  "date": "2025-03-12",
  "total_amount": 45.60,
  "tax_amount": 2.10,
  "currency": "USD",
  "items": []
}

Performance Characteristics

Metric	Value (Typical)
Inference Time	< 50 ms (ONNX, on-device)
Model Size	Small (a few MB at most)
Accuracy	Dataset-dependent (high with clean, well-labeled data)

Impact:

Real-time UX
No cloud dependency
Battery-efficient

Developer Experience & Setup

Installation

git clone <repo>
cd doc-i

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

Training the Model

python train.py

Outputs:

trained model
ONNX export (model_v2.onnx)

Running Evaluation

python evaluate.py

Project Structure

assets/
  ml/
    model_v2.onnx

data/
  datasets/

src/
  features/
  training/
  evaluation/

mobile/
  (React Native app)

Latest Release

📲 Download APK from the latest Release

Visuals & Media

UI Screenshots

Architecture

Model Evaluation Diagram


Roadmap

Deep learning hybrid model (layout-aware transformers)
Multi-language OCR support
Active learning loop from user corrections
Cloud sync for dataset expansion
Real-time document validation scoring


What Makes Doc.i Unique

1. Mobile-First Intelligence

Designed for on-device inference
Avoids heavy cloud ML pipelines

2. Feature-Driven ML (Not Black Box)

Combines:
- layout understanding
- semantic heuristics
- statistical learning
More interpretable and easier to debug than end-to-end deep learning models

3. Integrated Labeling & Dev Tools

Built-in dataset creation + testing UI
Rapid iteration cycle inside the app

4. Optimized for Small Data

Works effectively with fewer than 500 samples
Focus on feature quality over dataset size

5. Real-Time UX Focus

Sub-50ms inference
Immediate feedback loop for users

Why This Matters

Traditional OCR systems extract text but fail to understand structure.

Doc.i bridges that gap by:

converting raw OCR into usable structured data
enabling automation workflows
reducing manual data entry
working efficiently on mobile devices without cloud dependency

This positions it as a practical solution for real-world document intelligence use cases.

Design Decisions & Trade-offs

Decision	Reason
Logistic Regression	Speed, interpretability, mobile compatibility
Feature Engineering	Reduces need for massive datasets
ONNX Export	Cross-platform mobile inference
No Deep Learning	Avoid heavy compute + latency


Links

Quick start documentation

License

MIT License — feel free to use and adapt.

