A high-performance, on-device document intelligence system designed to extract structured data from unstructured mobile-captured documents such as receipts, invoices, and forms.
See Doc.i in action across its core workflows: document ingestion, extraction, and model evaluation.
Mobile document processing presents a unique set of challenges:
- inconsistent lighting, angles, and blur
- noisy OCR outputs
- highly unstructured layouts
- domain variability across merchants and formats
Doc.i is engineered to address these challenges by combining:
- OCR extraction
- feature-rich line-level analysis
- lightweight ML classification
- mobile-optimized inference
The system transforms raw OCR text into structured, machine-readable JSON, enabling downstream workflows such as:
- expense tracking
- financial reconciliation
- analytics pipelines
- automation systems
The core value lies in fast, reliable, on-device understanding of documents without relying on heavy cloud inference.
- Captured via mobile camera (React Native Vision Camera)
- Real-time frame processing support
- Extracts:
  - text blocks
  - bounding boxes
  - spatial metadata
- Designed to tolerate imperfect OCR outputs
Each OCR line is converted into a feature vector including:
Text-based Features
- `charLen`, `tokenCount`
- `digitCount`, `digitRatio`
- `alphaCount`, `upperRatio`
- keyword signals: `hasTotalKeyword`, `hasTaxKeyword`, `hasMerchantKeyword`

Semantic Heuristics
- `hasCurrencySymbol`
- `hasDatePattern`
- `containsPercent`
- `endsWithAmount`

Layout & Spatial Features
- `x`, `y`, `w`, `h`
- `yFromBottom`
- `isTopQuarter`, `isBottomQuarter`
- `yRankNorm`, `lineIndexNorm`

These features allow the model to reason about both content and layout, which is critical for document understanding.
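As a rough illustration of how one OCR line could become such a feature vector, here is a minimal Python sketch. The `OcrLine` type, the `line_features` helper, the keyword lists, and the regular expressions are all assumptions made for this example, and bounding boxes are assumed to be normalized to the 0..1 range of the page. The on-device extractor may compute these differently; `hasMerchantKeyword` and `yRankNorm` are left out because their exact definitions are not given here.

```python
import re
from dataclasses import dataclass

# Keyword lists and patterns are illustrative assumptions, not the project's actual ones.
TOTAL_KEYWORDS = ("total", "amount due", "balance")
TAX_KEYWORDS = ("tax", "vat", "gst")
CURRENCY_SYMBOLS = "$€£¥₹"
DATE_RE = re.compile(r"\b\d{1,4}[/.-]\d{1,2}[/.-]\d{1,4}\b")
AMOUNT_END_RE = re.compile(r"\d+[.,]\d{2}\s*$")


@dataclass
class OcrLine:
    text: str
    x: float  # bounding box, assumed normalized to 0..1 of the page
    y: float
    w: float
    h: float


def line_features(line: OcrLine, index: int, total_lines: int) -> dict:
    """Build text, heuristic, and layout features for a single OCR line."""
    text = line.text
    lower = text.lower()
    char_len = len(text)
    digit_count = sum(c.isdigit() for c in text)
    alpha_count = sum(c.isalpha() for c in text)
    upper_count = sum(c.isupper() for c in text)
    return {
        # Text-based features
        "charLen": char_len,
        "tokenCount": len(text.split()),
        "digitCount": digit_count,
        "digitRatio": digit_count / char_len if char_len else 0.0,
        "alphaCount": alpha_count,
        "upperRatio": upper_count / alpha_count if alpha_count else 0.0,
        "hasTotalKeyword": int(any(k in lower for k in TOTAL_KEYWORDS)),
        "hasTaxKeyword": int(any(k in lower for k in TAX_KEYWORDS)),
        # Semantic heuristics
        "hasCurrencySymbol": int(any(c in CURRENCY_SYMBOLS for c in text)),
        "hasDatePattern": int(bool(DATE_RE.search(text))),
        "containsPercent": int("%" in text),
        "endsWithAmount": int(bool(AMOUNT_END_RE.search(text))),
        # Layout and spatial features
        "x": line.x, "y": line.y, "w": line.w, "h": line.h,
        "yFromBottom": 1.0 - (line.y + line.h),
        "isTopQuarter": int(line.y < 0.25),
        "isBottomQuarter": int(line.y > 0.75),
        "lineIndexNorm": index / (total_lines - 1) if total_lines > 1 else 0.0,
    }


features = line_features(
    OcrLine("Total: $45.60", x=0.1, y=0.8, w=0.5, h=0.03), index=2, total_lines=4
)
```

For this line, the keyword, currency, and trailing-amount flags are set and the line falls in the bottom quarter, which is the kind of combined content and layout signal the classifier relies on.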
- Model: Logistic Regression (Scikit-learn)
- Exported to: ONNX for mobile inference
- Pipeline: `StandardScaler` → `LogisticRegression`

Labels:
[
"merchant_name",
"date",
"total_amount",
"tax_amount",
"currency",
"item_line",
"other"
]

Why Logistic Regression?
- Extremely fast inference (ideal for mobile)
- Interpretable decision boundaries
- Performs well with engineered features
- Small model size → efficient ONNX deployment
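The model setup described above could be sketched roughly as follows. This is not the project's `train.py`: the random training data, the `build_pipeline` and `export_onnx` helper names, the feature count of 8, and the `"input"` tensor name are assumptions for illustration. The export step uses `convert_sklearn` and `FloatTensorType` from `skl2onnx` and is left commented out so the rest runs without that package.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

LABELS = ["merchant_name", "date", "total_amount", "tax_amount",
          "currency", "item_line", "other"]


def build_pipeline() -> Pipeline:
    # Scale features first so large-valued features do not dominate the linear model.
    return Pipeline([
        ("scaler", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])


def export_onnx(pipeline: Pipeline, n_features: int, path: str) -> None:
    # Imported lazily so training still works where skl2onnx is not installed.
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType

    # One float32 input of shape [batch, n_features]; "input" is an assumed name.
    initial_types = [("input", FloatTensorType([None, n_features]))]
    onnx_model = convert_sklearn(pipeline, initial_types=initial_types)
    with open(path, "wb") as f:
        f.write(onnx_model.SerializeToString())


# Synthetic stand-in data: 20 random feature vectors per label, 8 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(len(LABELS) * 20, 8)).astype(np.float32)
y = np.repeat(np.arange(len(LABELS)), 20)

pipeline = build_pipeline().fit(X, y)
probabilities = pipeline.predict_proba(X[:1])  # one row with one probability per label
# export_onnx(pipeline, n_features=X.shape[1], path="model_v2.onnx")
```

Keeping the scaler inside the pipeline means the exported ONNX graph includes the scaling step, so the mobile side only has to supply raw feature values.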
- Aggregates classified lines into structured fields
- Applies normalization (e.g., amount parsing, date formatting)
- Resolves conflicts (e.g., multiple totals)
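A minimal sketch of this post-processing step is shown below. The `aggregate`, `parse_amount`, and `parse_date` helpers are hypothetical, and two choices are assumptions rather than documented behavior: conflicts are resolved by keeping the highest-confidence line per field, and dates are read as day/month/year. Currency inference is omitted.

```python
import re
from datetime import datetime


def parse_amount(text: str):
    """Return the last money-like number in the text, or None."""
    matches = re.findall(r"\d[\d,]*\.\d{2}", text)
    return float(matches[-1].replace(",", "")) if matches else None


def parse_date(text: str, fmt: str = "%d/%m/%Y"):
    """Return an ISO 8601 date string, or None if no valid date is found."""
    match = re.search(r"\d{1,2}/\d{1,2}/\d{4}", text)
    if not match:
        return None
    try:
        return datetime.strptime(match.group(), fmt).date().isoformat()
    except ValueError:
        return None


def aggregate(predictions):
    """predictions: list of (text, label, confidence) tuples, one per OCR line."""
    result = {"merchant_name": None, "date": None, "total_amount": None,
              "tax_amount": None, "items": []}
    best = {}  # label -> highest confidence accepted so far
    for text, label, confidence in predictions:
        if label == "item_line":
            result["items"].append(text)
            continue
        if label not in result or confidence <= best.get(label, 0.0):
            continue  # skip unhandled labels and lower-confidence duplicates
        if label in ("total_amount", "tax_amount"):
            value = parse_amount(text)
        elif label == "date":
            value = parse_date(text)
        else:
            value = text.strip()
        if value is not None:
            best[label] = confidence
            result[label] = value
    return result


result = aggregate([
    ("STORE ABC", "merchant_name", 0.91),
    ("Date: 12/03/2025", "date", 0.88),
    ("Subtotal: $43.50", "total_amount", 0.55),  # competing "total" candidate
    ("Total: $45.60", "total_amount", 0.93),
    ("Tax: $2.10", "tax_amount", 0.90),
])
```

Here the subtotal line is also classified as `total_amount`, but the higher-confidence "Total" line replaces it.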
import json
import os
from pathlib import Path
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score, f1_score
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

| Layer | Technology |
|---|---|
| Mobile OCR | React Native Vision Camera / ML Kit |
| Preprocessing | OpenCV (planned/optional) |
| ML Training | Scikit-learn |
| Model Format | ONNX |
| Data Handling | NumPy, Pandas |
| Evaluation | Sklearn metrics |
| UI / Labeling | Custom mobile Dev Tools UI |
The model does not rely on raw text alone.
Instead, it evaluates:
P(label | text_features + layout_features + heuristics)
Example:
- A number near the bottom with "TOTAL" → high probability of `total_amount`
- Uppercase text at top → likely `merchant_name`
- Lines with repeating patterns → `item_line`

This hybrid approach combines:
- statistical learning
- rule-informed features
- spatial reasoning
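To make the probability notation concrete, the sketch below shows how a multinomial logistic regression turns a feature vector into label probabilities: one linear score per label, followed by a softmax. The three features, three labels, and all weight values are made up for illustration (a trained model learns its own weights, and the bias term is omitted for brevity).

```python
import math

# Hypothetical weights, one row per label, ordered as FEATURES.
FEATURES = ["hasTotalKeyword", "isTopQuarter", "upperRatio"]
WEIGHTS = {
    "total_amount": [2.0, -1.0, 0.0],
    "merchant_name": [-1.0, 1.5, 1.0],
    "other": [0.0, 0.0, 0.0],
}


def label_probabilities(x):
    # Linear score per label, then softmax so the scores become probabilities.
    scores = {label: sum(w * v for w, v in zip(ws, x)) for label, ws in WEIGHTS.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    norm = sum(exps.values())
    return {label: e / norm for label, e in exps.items()}


# A line containing "TOTAL" that is not in the top quarter of the page.
probs = label_probabilities([1.0, 0.0, 0.2])
best_label = max(probs, key=probs.get)
```

With these illustrative weights, the keyword feature pushes the `total_amount` score above the others, so it receives the largest probability.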
- Label correction via in-app labeling UI
- Dataset iteration loops using "test data" flows
- Focus on reducing dominance of "other" class
- Added semantic signals:
  - subtotal / discount detection
  - service keywords
  - receipt/invoice identifiers
- Introduced normalized positional features
- Stratified splits to handle imbalance
- Cross-validation (StratifiedKFold)
- Feature scaling via `StandardScaler`

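The training strategy above might look roughly like this in Scikit-learn. The data is synthetic and deliberately imbalanced (class 0 stands in for the dominant "other" label); the 80/20 split, five folds, and macro-F1 scoring are assumptions for the sketch, not settings taken from the project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in data: 120 / 40 / 40 samples across three classes.
rng = np.random.default_rng(42)
y = np.array([0] * 120 + [1] * 40 + [2] * 40)
X = rng.normal(size=(len(y), 6)) + y[:, None]  # shift each class so it is learnable

# stratify=y keeps the class ratio the same in the train and test partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# The scaler sits inside the pipeline, so it is refit on each training fold only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1_macro")
mean_f1 = scores.mean()
```

Scoring folds with macro F1 rather than accuracy gives the minority classes the same weight as the dominant one.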
User Labeling → Dataset Update → Model Retrain → Deploy ONNX → Evaluate → Repeat
- Train/Test split with stratification
- Cross-validation scoring
- Metrics:
  - Accuracy
  - F1-score (critical for imbalance)
  - Confusion matrix

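The example below shows why F1 matters under class imbalance, using Scikit-learn's metric functions on ten hand-written stand-in predictions (not real evaluation results). Only three of the labels are used to keep it short.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

LABELS = ["total_amount", "item_line", "other"]

# Stand-in labels for ten lines; "other" dominates, and two minority lines are missed.
y_true = ["other"] * 6 + ["item_line"] * 2 + ["total_amount"] * 2
y_pred = ["other"] * 6 + ["other", "item_line"] + ["total_amount", "other"]

accuracy = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, labels=LABELS, average="macro")
# Rows are true labels, columns are predicted labels, both in LABELS order.
matrix = confusion_matrix(y_true, y_pred, labels=LABELS)
```

Accuracy comes out at 0.80 while macro F1 is about 0.73, because half of the `item_line` and `total_amount` lines were absorbed into "other"; the last column of the confusion matrix makes that leakage visible.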
Input (OCR):
STORE ABC
Date: 12/03/2025
Total: $45.60
Tax: $2.10
Output (Structured JSON):
{
"merchant_name": "STORE ABC",
"date": "2025-03-12",
"total_amount": 45.60,
"tax_amount": 2.10,
"currency": "USD",
"items": []
}

| Metric | Value (Typical) |
|---|---|
| Inference Time | < 50 ms (ONNX mobile) |
| Model Size | Small (a few MB at most) |
| Accuracy | Dataset dependent (higher with clean data) |

Impact:
- Real-time UX
- No cloud dependency
- Battery-efficient
git clone <repo>
cd doc-i
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

python train.py

Outputs:
- trained model
- ONNX export (`model_v2.onnx`)

python evaluate.py

assets/
  ml/
    model_v2.onnx
data/
  datasets/
src/
  features/
  training/
  evaluation/
mobile/
  (React Native app)

📲 Download APK from the latest Release
- Deep learning hybrid model (layout-aware transformers)
- Multi-language OCR support
- Active learning loop from user corrections
- Cloud sync for dataset expansion
- Real-time document validation scoring
- Designed for on-device inference
- Avoids heavy cloud ML pipelines
- Combines:
  - layout understanding
  - semantic heuristics
  - statistical learning
- More interpretable and debuggable than deep OCR models
- Built-in dataset creation + testing UI
- Rapid iteration cycle inside the app
- Works effectively with <500 samples
- Focus on feature quality over dataset size
- Sub-50ms inference
- Immediate feedback loop for users
Traditional OCR systems extract text but fail to understand structure.
Doc.i bridges that gap by:
- converting raw OCR into usable structured data
- enabling automation workflows
- reducing manual data entry
- working efficiently on mobile devices without cloud dependency
This positions it as a practical solution for real-world document intelligence use cases.
| Decision | Reason |
|---|---|
| Logistic Regression | Speed, interpretability, mobile compatibility |
| Feature Engineering | Reduces need for massive datasets |
| ONNX Export | Cross-platform mobile inference |
| No Deep Learning | Avoid heavy compute + latency |
MIT License — feel free to use and adapt.






