chenchu-krishna-akkarapalli/ImageToTextLab

ImageToTextLab

A FastAPI-based OCR and document analysis service that extracts text and structured data from images using PaddleOCR, BLIP image captioning, and TrOCR models. Optimized for processing insurance claim forms and other structured documents.

Prerequisites

  • Python 3.10+
  • A CUDA-capable GPU is optional but recommended for faster inference (used by both BLIP and PaddleOCR).

Setup

cd ImageToTextLab
python -m venv .venv
.venv\Scripts\activate  # On macOS/Linux use: source .venv/bin/activate
pip install -r requirements.txt

Running the API

uvicorn app:app --reload --port 8080

Usage

Send a multipart/form-data POST request to /extract with the following fields (PaddleOCR extracts the structured text, while BLIP provides a descriptive caption):

  • formType – string describing the type of form being processed
  • documentType – string describing the document category
  • attachment – the image file that contains Name, Age, and Location

Example with curl:

curl -X POST http://localhost:8080/extract \
  -F "formType=registration" \
  -F "documentType=id-card" \
  -F "attachment=@/path/to/image.png"
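
The same request can be sent from Python. The following is a minimal sketch using only the standard library, not code from this repository: the helper names `build_multipart` and `extract` are hypothetical, the `image/png` content type is an assumption about the uploaded file, and the URL assumes the server was started on port 8080 as shown above.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes, content_type="image/png"):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header). Hypothetical helper for illustration.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # One part per plain text field (formType, documentType).
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    # One part for the uploaded image (the "attachment" field).
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # Closing boundary marks the end of the body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract(image_path, form_type, document_type, url="http://localhost:8080/extract"):
    """POST an image to /extract and return the decoded JSON response."""
    path = Path(image_path)
    body, header = build_multipart(
        {"formType": form_type, "documentType": document_type},
        "attachment",
        path.name,
        path.read_bytes(),
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `extract("/path/to/image.png", "registration", "id-card")` would return the response as a Python dict.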

Response

{
  "formType": "registration",
  "documentType": "id-card",
  "rawText": "NAME : JANE DOE\nAGE : 32\nLOCATION : AUSTIN TX",
  "blipCaption": "Name: Jane Doe; Age: 32; Location: Austin, TX",
  "data": {
    "name": "Jane Doe",
    "age": "32",
    "location": "Austin, TX"
  }
}

If one of the fields cannot be read confidently, the corresponding value in data is returned as null. The rawText field always contains the plain text extracted by OCR, so the structured values can be audited against it.
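
To illustrate the relationship between rawText and data, here is a hedged sketch of how "LABEL : value" lines could be mapped to the three fields, with None (serialized as null) for anything missing. The function `parse_raw_text` and its regular expression are assumptions for illustration, not the service's actual parser, and the sketch does not reproduce the casing and punctuation normalization visible in the example response.

```python
import re

# The three fields named in the Usage section.
FIELDS = ("name", "age", "location")

# Matches lines such as "NAME : JANE DOE" (label, colon, value).
LINE_PATTERN = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(.*)$")


def parse_raw_text(raw_text):
    """Map 'LABEL : value' lines to a dict; unread fields stay None."""
    data = {field: None for field in FIELDS}
    for line in raw_text.splitlines():
        match = LINE_PATTERN.match(line)
        if not match:
            continue  # Skip lines that do not look like "LABEL : value".
        label = match.group(1).lower()
        value = match.group(2).strip()
        if label in data:
            # An empty value is treated the same as a missing field.
            data[label] = value or None
    return data


if __name__ == "__main__":
    print(parse_raw_text("NAME : JANE DOE\nAGE : 32\nLOCATION : AUSTIN TX"))
```

Keeping unread fields as None rather than empty strings lets a client distinguish "not found" from "found but blank" when the result is serialized to JSON.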

About

FastAPI-based OCR and document analysis service for extracting text and structured data from images using PaddleOCR, BLIP, and TrOCR models. Optimized for claim forms and document processing with Docker support.
