n4zen-dev-studio/DocI

Doc.I_ — Mobile Document Intelligence Engine

A high-performance, on-device document intelligence system designed to extract structured data from unstructured mobile-captured documents such as receipts, invoices, and forms.



Table of Contents

  1. Executive Summary
  2. Architecture & System Design
  3. Technical Deep Dive
  4. Improving Accuracy Strategy
  5. Performance & Metrics
  6. Developer Experience & Setup
  7. Visuals & Media
  8. Roadmap
  9. What Makes Doc.i Unique
  10. Why This Matters
  11. Design Decisions & Trade-offs
  12. Links
  13. License

🎬 Demo Videos

See Doc.I_ in action across its core workflows: document ingestion, extraction, and model evaluation.


📄 Document Upload & Extraction

Document Upload Demo

Upload documents and extract structured data using on-device OCR and parsing pipelines.

Highlights

  • Document upload (image/PDF)
  • OCR text extraction
  • Structured field parsing
  • Editable results preview

🧪 Dev Mode — Model Testing & Validation

Model Validation Demo

Evaluate extraction accuracy using real datasets with built-in validation tooling.

Highlights

  • Real dataset evaluation
  • Accuracy measurement + comparison
  • Field-level validation insights
  • Iterative model tuning workflow

📊 Dev Mode — Test Data Benchmarking

Test Data Demo

Run controlled test datasets to benchmark extraction performance and consistency.

Highlights

  • Test dataset execution
  • Batch processing workflows
  • Result comparison + analysis
  • Debugging extraction edge cases

⬆️ Back to top


Executive Summary

Mobile document processing presents a unique set of challenges:

  • inconsistent lighting, angles, and blur
  • noisy OCR outputs
  • highly unstructured layouts
  • domain variability across merchants and formats

Doc.i addresses these challenges by combining:

  • OCR extraction
  • feature-rich line-level analysis
  • lightweight ML classification
  • mobile-optimized inference

The system transforms raw OCR text into structured, machine-readable JSON, enabling downstream workflows such as:

  • expense tracking
  • financial reconciliation
  • analytics pipelines
  • automation systems

The core value lies in fast, reliable, on-device understanding of documents without relying on heavy cloud inference.


Architecture & System Design

End-to-End Pipeline

System Architecture Diagram


Pipeline Breakdown

1. Image Acquisition

  • Captured via mobile camera (React Native Vision Camera)
  • Real-time frame processing support

2. OCR Layer

  • Extracts:

    • text blocks
    • bounding boxes
    • spatial metadata
  • Designed to tolerate imperfect OCR outputs

3. Feature Engineering (Core Intelligence Layer)

Each OCR line is converted into a feature vector including:

Text-based Features

  • charLen, tokenCount

  • digitCount, digitRatio

  • alphaCount, upperRatio

  • keyword signals:

    • hasTotalKeyword
    • hasTaxKeyword
    • hasMerchantKeyword

Semantic Heuristics

  • hasCurrencySymbol
  • hasDatePattern
  • containsPercent
  • endsWithAmount
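
The text-based and heuristic features listed above could be computed along the following lines. This is a minimal sketch, not the project's actual extractor: the keyword lists, the regular expressions, and the choice to compute `upperRatio` over alphabetic characters are all assumptions made for illustration.

```python
import re

# Hypothetical keyword lists for illustration; the project's real lists may differ.
TOTAL_KEYWORDS = ("total", "amount due", "balance")
TAX_KEYWORDS = ("tax", "vat", "gst")
MERCHANT_KEYWORDS = ("store", "shop", "market", "ltd", "inc")

CURRENCY_RE = re.compile(r"[$€£¥]")
# Dates such as 12/03/2025, 12-03-25 or 2025-03-12 (assumed formats).
DATE_RE = re.compile(r"\b(?:\d{1,2}[/.-]\d{1,2}[/.-]\d{2,4}|\d{4}-\d{2}-\d{2})\b")
# A trailing amount with two decimals, such as "45.60" or "1,234.50".
AMOUNT_END_RE = re.compile(r"\d[\d,]*\.\d{2}\s*$")


def has_keyword(lowered: str, keywords) -> int:
    """Naive substring match; returns 1 if any keyword occurs, else 0."""
    return int(any(k in lowered for k in keywords))


def text_features(line: str) -> dict:
    """Compute text-based and heuristic features for a single OCR line."""
    lowered = line.lower()
    char_len = len(line)
    digit_count = sum(ch.isdigit() for ch in line)
    alpha_count = sum(ch.isalpha() for ch in line)
    upper_count = sum(ch.isupper() for ch in line)
    return {
        "charLen": char_len,
        "tokenCount": len(line.split()),
        "digitCount": digit_count,
        "digitRatio": digit_count / char_len if char_len else 0.0,
        "alphaCount": alpha_count,
        # Assumption: ratio of uppercase letters to all letters.
        "upperRatio": upper_count / alpha_count if alpha_count else 0.0,
        "hasTotalKeyword": has_keyword(lowered, TOTAL_KEYWORDS),
        "hasTaxKeyword": has_keyword(lowered, TAX_KEYWORDS),
        "hasMerchantKeyword": has_keyword(lowered, MERCHANT_KEYWORDS),
        "hasCurrencySymbol": int(bool(CURRENCY_RE.search(line))),
        "hasDatePattern": int(bool(DATE_RE.search(line))),
        "containsPercent": int("%" in line),
        "endsWithAmount": int(bool(AMOUNT_END_RE.search(line))),
    }


if __name__ == "__main__":
    for sample in ("STORE ABC", "Date: 12/03/2025", "Total: $45.60"):
        print(sample, text_features(sample))
```

Substring matching is deliberately crude here (for example, "total" also fires on "subtotal"), which is one reason the classifier sees these signals as features instead of treating them as hard rules.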

Layout & Spatial Features

  • x, y, w, h
  • yFromBottom
  • isTopQuarter, isBottomQuarter
  • yRankNorm, lineIndexNorm

These features allow the model to reason about both content and layout, which is critical for document understanding.
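
As a rough illustration of the layout features, the sketch below derives them from per-line bounding boxes. It assumes boxes are `(x, y, w, h)` tuples in pixels with `y` measured from the top of the page, that lines arrive in OCR reading order, and that the quarter flags use the box centre. The function name and these conventions are hypothetical, not taken from the project.

```python
def layout_features(boxes, page_height):
    """Derive spatial features for each OCR line.

    boxes: list of (x, y, w, h) tuples, one per line, in OCR reading order.
    page_height: height of the captured page in the same units as the boxes.
    """
    n = len(boxes)
    # Rank lines by their top edge so yRankNorm reflects vertical order,
    # independent of the order in which the OCR engine returned them.
    order = sorted(range(n), key=lambda i: boxes[i][1])
    y_rank = {idx: rank for rank, idx in enumerate(order)}
    denom = max(n - 1, 1)  # avoid division by zero for single-line pages

    features = []
    for i, (x, y, w, h) in enumerate(boxes):
        y_center = y + h / 2
        features.append({
            "x": x,
            "y": y,
            "w": w,
            "h": h,
            "yFromBottom": page_height - (y + h),
            "isTopQuarter": int(y_center < 0.25 * page_height),
            "isBottomQuarter": int(y_center >= 0.75 * page_height),
            "yRankNorm": y_rank[i] / denom,
            "lineIndexNorm": i / denom,
        })
    return features


if __name__ == "__main__":
    sample_boxes = [(10, 20, 200, 30), (10, 400, 150, 30), (10, 900, 120, 40)]
    for row in layout_features(sample_boxes, page_height=1000):
        print(row)
```

Normalizing rank and index to the 0..1 range keeps the features comparable between a five-line receipt and a fifty-line invoice.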


4. Classification Engine

  • Model: Logistic Regression (Scikit-learn)

  • Exported to: ONNX for mobile inference

  • Pipeline:

    • StandardScaler
    • LogisticRegression

Labels:

[
  "merchant_name",
  "date",
  "total_amount",
  "tax_amount",
  "currency",
  "item_line",
  "other"
]

Why Logistic Regression?

  • Extremely fast inference (ideal for mobile)
  • Interpretable decision boundaries
  • Performs well with engineered features
  • Small model size → efficient ONNX deployment

5. Post-processing

  • Aggregates classified lines into structured fields
  • Applies normalization (e.g., amount parsing, date formatting)
  • Resolves conflicts (e.g., multiple totals)
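
The three post-processing steps can be sketched as follows. The helper names, the day-first date assumption, and the "highest probability wins" rule for conflicting lines are illustrative choices, not the project's confirmed behaviour. Currency inference is left out because a bare "$" is ambiguous.

```python
import re
from datetime import datetime

SINGLE_FIELDS = ("merchant_name", "date", "total_amount", "tax_amount")


def parse_amount(text):
    """Return the last two-decimal amount in the text as a float, or None."""
    matches = re.findall(r"\d[\d,]*\.\d{2}", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def normalize_date(text, day_first=True):
    """Return the first numeric date in the text as YYYY-MM-DD, or None.

    Assumption: dates are written day-first (12/03/2025 is 12 March 2025).
    """
    match = re.search(r"(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})", text)
    if not match:
        return None
    a, b, year = (int(g) for g in match.groups())
    day, month = (a, b) if day_first else (b, a)
    try:
        return datetime(year, month, day).strftime("%Y-%m-%d")
    except ValueError:  # e.g. month 13 or day 32
        return None


def build_record(classified_lines):
    """Aggregate (label, text, probability) triples into one record.

    When several lines share a single-valued label, keep the most
    confident one (an assumed conflict-resolution rule).
    """
    best = {}   # label -> (probability, text)
    items = []
    for label, text, prob in classified_lines:
        if label == "item_line":
            items.append(text)
        elif label in SINGLE_FIELDS:
            if label not in best or prob > best[label][0]:
                best[label] = (prob, text)

    def pick(label):
        return best[label][1] if label in best else None

    merchant, date_text = pick("merchant_name"), pick("date")
    total_text, tax_text = pick("total_amount"), pick("tax_amount")
    return {
        "merchant_name": merchant.strip() if merchant else None,
        "date": normalize_date(date_text) if date_text else None,
        "total_amount": parse_amount(total_text) if total_text else None,
        "tax_amount": parse_amount(tax_text) if tax_text else None,
        "items": items,
    }


if __name__ == "__main__":
    lines = [
        ("merchant_name", "STORE ABC", 0.90),
        ("date", "Date: 12/03/2025", 0.95),
        ("total_amount", "Subtotal: $43.50", 0.55),
        ("total_amount", "Total: $45.60", 0.92),
        ("tax_amount", "Tax: $2.10", 0.88),
    ]
    print(build_record(lines))
```

Returning `None` for fields that fail to parse, instead of guessing, lets an editable preview highlight what still needs user input.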

Technical Deep Dive

Core ML Stack

import json
import os
from pathlib import Path

import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score, f1_score

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

Supporting Technologies

| Layer | Technology |
| --- | --- |
| Mobile OCR | React Native Vision Camera / ML Kit |
| Preprocessing | OpenCV (planned/optional) |
| ML Training | Scikit-learn |
| Model Format | ONNX |
| Data Handling | NumPy, Pandas |
| Evaluation | Sklearn metrics |
| UI / Labeling | Custom mobile Dev Tools UI |

Classification Logic

The model does not rely on raw text alone.

Instead, it evaluates:

P(label | text_features + layout_features + heuristics)

Example:

  • A number near the bottom with "TOTAL" → high probability of total_amount
  • Uppercase text at top → likely merchant_name
  • Lines with repeating patterns → item_line

This hybrid approach combines:

  • statistical learning
  • rule-informed features
  • spatial reasoning
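
To make the conditional probability above concrete, here is a minimal sketch of how a multinomial logistic regression turns a feature vector into label probabilities: a linear score per label, followed by a softmax. The weights and biases are invented for illustration, only three labels are shown, and the scaling step that `StandardScaler` performs in the real pipeline is omitted.

```python
import math

# Invented weights for illustration only; a trained model learns these.
WEIGHTS = {
    "total_amount": {"hasTotalKeyword": 2.5, "endsWithAmount": 1.5, "isBottomQuarter": 1.0},
    "merchant_name": {"upperRatio": 2.0, "isTopQuarter": 2.0},
    "other": {},
}
BIASES = {"total_amount": -1.0, "merchant_name": -1.0, "other": 0.5}


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict_proba(features, weights=WEIGHTS, biases=BIASES):
    """Score each label as bias + sum(weight * feature), then apply softmax."""
    scores = {
        label: biases[label]
        + sum(w * features.get(name, 0.0) for name, w in weights[label].items())
        for label in weights
    }
    return softmax(scores)


if __name__ == "__main__":
    # Features for a line like "Total: $45.60" near the bottom of the page.
    line = {
        "hasTotalKeyword": 1,
        "endsWithAmount": 1,
        "isBottomQuarter": 1,
        "upperRatio": 0.2,
        "isTopQuarter": 0,
    }
    probs = predict_proba(line)
    print(max(probs, key=probs.get), probs)
```

Because each label's score is a weighted sum, the learned weights can be read directly to see which features push a line toward a label, which is the interpretability benefit cited for Logistic Regression.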

Improving Accuracy Strategy

1. Data-Centric Approach

  • Label correction via in-app labeling UI
  • Dataset iteration loops using "test data" flows
  • Focus on reducing dominance of "other" class

2. Feature Engineering

  • Added semantic signals:

    • subtotal / discount detection
    • service keywords
    • receipt/invoice identifiers
  • Introduced normalized positional features


3. Model Optimization

  • Stratified splits to handle imbalance
  • Cross-validation (StratifiedKFold)
  • Feature scaling via StandardScaler
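
Stratification matters because the rare labels (for example `total_amount`) could otherwise be missing from a test split entirely. The hand-rolled function below only illustrates the idea in plain Python. In training, scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold` would normally do this job, and the function name and parameters here are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices so each label keeps roughly the same share
    in the train and test sets.

    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        # Reserve at least one sample per label for testing. A label with a
        # single sample therefore ends up only in the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # An imbalanced label set dominated by "other".
    y = ["other"] * 80 + ["item_line"] * 15 + ["total_amount"] * 5
    train_idx, test_idx = stratified_split(y)
    print("train:", Counter(y[i] for i in train_idx))
    print("test: ", Counter(y[i] for i in test_idx))
```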

4. Feedback Loop

User Labeling → Dataset Update → Model Retrain → Deploy ONNX → Evaluate → Repeat

Performance & Metrics

Validation Strategy

  • Train/Test split with stratification

  • Cross-validation scoring

  • Metrics:

    • Accuracy
    • F1-score (critical for imbalance)
    • Confusion matrix
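
A small worked example shows why F1 is the more informative metric under class imbalance. The pure-Python functions below are an illustrative re-implementation of accuracy and macro-averaged F1, not the project's evaluation code. In practice, `accuracy_score` and `f1_score(average="macro")` from scikit-learn would be used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores.

    Per-label F1 = 2*TP / (2*TP + FP + FN); a label that is never
    predicted correctly scores 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # A degenerate model that labels every line "other".
    y_true = ["other"] * 90 + ["total_amount"] * 10
    y_pred = ["other"] * 100
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
    print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Here the model never finds a single total, yet accuracy is 0.90. Macro F1 drops to about 0.47 because the `total_amount` label scores 0, which exposes the failure.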

Example Model Output

Input (OCR):

STORE ABC
Date: 12/03/2025
Total: $45.60
Tax: $2.10

Output (Structured JSON):

{
  "merchant_name": "STORE ABC",
  "date": "2025-03-12",
  "total_amount": 45.60,
  "tax_amount": 2.10,
  "currency": "USD",
  "items": []
}

Performance Characteristics

| Metric | Typical value |
| --- | --- |
| Inference time | < 50 ms (ONNX, on mobile) |
| Model size | Small (a few MB at most) |
| Accuracy | Dataset dependent; higher with clean data |

Impact:

  • Real-time UX
  • No cloud dependency
  • Battery-efficient

Developer Experience & Setup

Installation

git clone <repo>
cd DocI

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

Training the Model

python train.py

Outputs:

  • trained model
  • ONNX export (model_v2.onnx)

Running Evaluation

python evaluate.py

Project Structure

assets/
  ml/
    model_v2.onnx

data/
  datasets/

src/
  features/
  training/
  evaluation/

mobile/
  (React Native app)

Latest Release

📲 Download APK from the latest Release


Visuals & Media

UI Screenshots


Architecture

System Architecture Diagram


Model Evaluation Diagram

Evaluation Results



Roadmap

  • Deep learning hybrid model (layout-aware transformers)
  • Multi-language OCR support
  • Active learning loop from user corrections
  • Cloud sync for dataset expansion
  • Real-time document validation scoring



What Makes Doc.i Unique

1. Mobile-First Intelligence

  • Designed for on-device inference
  • Avoids heavy cloud ML pipelines

2. Feature-Driven ML (Not Black Box)

  • Combines:

    • layout understanding
    • semantic heuristics
    • statistical learning
  • More interpretable and debuggable than deep OCR models

3. Integrated Labeling & Dev Tools

  • Built-in dataset creation + testing UI
  • Rapid iteration cycle inside the app

4. Optimized for Small Data

  • Works effectively with <500 samples
  • Focus on feature quality over dataset size

5. Real-Time UX Focus

  • Sub-50ms inference
  • Immediate feedback loop for users

Why This Matters

Traditional OCR systems extract text but fail to understand structure.

Doc.i bridges that gap by:

  • converting raw OCR into usable structured data
  • enabling automation workflows
  • reducing manual data entry
  • working efficiently on mobile devices without cloud dependency

This positions it as a practical solution for real-world document intelligence use cases.


Design Decisions & Trade-offs

| Decision | Reason |
| --- | --- |
| Logistic Regression | Speed, interpretability, mobile compatibility |
| Feature Engineering | Reduces need for massive datasets |
| ONNX Export | Cross-platform mobile inference |
| No Deep Learning | Avoid heavy compute + latency |



Links


License

MIT License — feel free to use and adapt.

