ImageToTextLab

A FastAPI-based OCR and document analysis service that extracts text and structured data from images using PaddleOCR, BLIP image captioning, and TrOCR models. Optimized for processing insurance claim forms and other structured documents.

Prerequisites

Python 3.10+
A CUDA-capable GPU is optional but recommended for faster inference (used by both BLIP and PaddleOCR).

Setup

cd ImageToTextLab
python -m venv .venv
.venv\Scripts\activate  # On macOS/Linux use: source .venv/bin/activate
pip install -r requirements.txt

Running the API

uvicorn app:app --reload --port 8080

Usage

Send a multipart/form-data POST request to /extract with the fields listed below. PaddleOCR extracts the structured text, while BLIP provides a descriptive caption.

formType – string describing the type of form being processed
documentType – string describing the document category
attachment – the image file that contains Name, Age, and Location

Example with curl:

curl -X POST http://localhost:8080/extract \
  -F "formType=registration" \
  -F "documentType=id-card" \
  -F "attachment=@/path/to/image.png"
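
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server is running locally on port 8080 as shown above. The helper names build_multipart and extract are hypothetical and not part of this repository; only the /extract endpoint and the three field names come from this README.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Hypothetical helper for illustration; returns (body, content_type_header).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    # One part per plain text field (formType, documentType).
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # One part for the uploaded image (attachment).
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract(image_path, form_type, document_type,
            base_url="http://localhost:8080"):
    """POST an image to /extract and return the decoded JSON response."""
    path = Path(image_path)
    body, content_type = build_multipart(
        {"formType": form_type, "documentType": document_type},
        "attachment",
        path.name,
        path.read_bytes(),
    )
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, extract("/path/to/image.png", "registration", "id-card") would send the same request as the curl command above.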

Response

{
  "formType": "registration",
  "documentType": "id-card",
  "rawText": "NAME : JANE DOE\nAGE : 32\nLOCATION : AUSTIN TX",
  "blipCaption": "Name: Jane Doe; Age: 32; Location: Austin, TX",
  "data": {
    "name": "Jane Doe",
    "age": "32",
    "location": "Austin, TX"
  }
}

If BLIP cannot confidently read one of the fields, the corresponding value in data is returned as null. The rawText field always contains the plain OCR text for auditing purposes; the caption generated by BLIP is returned separately in blipCaption.
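
Because any field in data may be null, client code should check for missing values before using them. The following is a minimal sketch of such a check, assuming the response shape shown in the example above. The function name missing_fields is hypothetical, and the keys under data may differ for other form types.

```python
import json


def missing_fields(response_text):
    """Return the names of fields in "data" whose value is null.

    Hypothetical helper; assumes the response layout from the example above.
    """
    payload = json.loads(response_text)
    data = payload.get("data") or {}
    # JSON null is decoded to Python None.
    return sorted(key for key, value in data.items() if value is None)


# Example response in which the age could not be read.
sample = """
{
  "formType": "registration",
  "documentType": "id-card",
  "rawText": "NAME : JANE DOE\\nLOCATION : AUSTIN TX",
  "blipCaption": "Name: Jane Doe; Location: Austin, TX",
  "data": {"name": "Jane Doe", "age": null, "location": "Austin, TX"}
}
"""

unreadable = missing_fields(sample)
if unreadable:
    print("Fields needing manual review:", ", ".join(unreadable))
```

When a field is reported as missing, the rawText value from the same response can be inspected to see what was actually read from the image.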
