The backend API for the OCUL Government Documents AIML project.
The API will provide the ability to run and evaluate these OCR services:
- GOT_OCR2.0 (outputs HTML with a custom JS converter)
- olmOCR (Dolma-style JSONL, uses dolmaviewer)
- Marker OCR (Markdown)
- Tesseract OCR (plain text)
This project requires Python 3.10 or later. Clone the repository, install Poetry, install the dependencies, and then start the development server:
git clone https://github.com/scholarsportal/govdocs-api.git
cd govdocs-api
pip install poetry
poetry install
poetry run uvicorn src.govdocs_api.server:app --reload
Alternatively, start the API with Docker Compose:
docker compose up
The API will be available at http://localhost:8000.
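To confirm that the server actually came up, a small reachability check can help. The following is a minimal sketch using only the Python standard library; the helper name `api_is_up` is hypothetical and not part of this project, and it assumes nothing about the API's routes. It only reports whether anything answers HTTP at the base URL.

```python
import urllib.error
import urllib.request


def api_is_up(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> bool:
    """Return True if any HTTP server answers at base_url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            # A 2xx/3xx response: the server is running.
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 404 on "/"),
        # which still means it is running. Must be caught before URLError,
        # because HTTPError is a subclass of URLError.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: nothing is listening.
        return False
```

Calling `api_is_up()` after `poetry run uvicorn ...` or `docker compose up` should return `True` once the server has finished starting, and `False` if nothing is listening on port 8000.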