
Ollama OCR

A powerful OCR (Optical Character Recognition) package that uses state-of-the-art vision language models through Ollama to extract text from images and PDFs. Available both as a Python package and as a Streamlit web application.

🌟 Features

Supports PDFs and Images (New! 🆕)

  • Multiple Vision Models Support

    • LLaVA: Efficient vision-language model for real-time processing (note: LLaVA can occasionally produce incorrect output)
    • Llama 3.2 Vision: Advanced model with high accuracy for complex documents
    • Granite3.2-vision: A compact and efficient vision-language model, specifically designed for visual document understanding, enabling automated content extraction from tables, charts, infographics, plots, diagrams, and more.
    • Moondream: Small vision language model designed to run efficiently on edge devices.
    • Minicpm-v: MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344).
  • Multiple Output Formats

    • Markdown: Preserves text formatting with headers and lists
    • Plain Text: Clean, simple text extraction
    • JSON: Structured data format
    • Structured: Tables and organized data
    • Key-Value Pairs: Extracts labeled information
    • Table: Extracts all tabular data.
  • Batch Processing

    • Process multiple images in parallel
    • Progress tracking for each image
    • Image preprocessing (resize, normalize, etc.)
  • Custom Prompts

    • Override default prompts with custom instructions for text extraction (see the sketch after this list).
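
For example, here is a minimal sketch combining the Table output format with a custom prompt. The exact "table" value for format_type is an assumption based on the feature list above; check the package documentation if your version expects a different string.

from ollama_ocr import OCRProcessor

ocr = OCRProcessor(model_name='llama3.2-vision:11b')

# "table" as a format_type value is assumed from the Table output format listed above
result = ocr.process_image(
    image_path="path/to/your/image.png",
    format_type="table",
    custom_prompt="Extract every table, keeping column headers and row order."
)
print(result)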

📦 Package Installation

pip install ollama-ocr
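
As a quick sanity check that the package installed correctly, you can import the processor class used throughout this README:

python -c "from ollama_ocr import OCRProcessor; print('ollama-ocr is installed')"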

🚀 Quick Start

Prerequisites

  1. Install Ollama
  2. Pull the vision model(s) you want to use:
ollama pull llama3.2-vision:11b
ollama pull granite3.2-vision
ollama pull moondream
ollama pull minicpm-v
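
Before running the examples below, it helps to confirm that the Ollama server is reachable and the models are available. Assuming a default local install listening on port 11434:

ollama list                            # should list the vision model(s) you pulled
curl http://localhost:11434/api/tags   # the Ollama REST API should return the installed models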

Using the Package

Single File Processing

from ollama_ocr import OCRProcessor

# Initialize the OCR processor
# - model_name: any vision model available on Ollama
# - base_url (optional): point it at your own Ollama API endpoint (here, Ollama running on the Docker host)
ocr = OCRProcessor(model_name='llama3.2-vision:11b', base_url="http://host.docker.internal:11434/api/generate")

# Process an image
result = ocr.process_image(
    image_path="path/to/your/image.png",  # or a PDF, e.g. "path/to/your/file.pdf"
    format_type="markdown",  # Options: markdown, text, json, structured, key_value, table
    custom_prompt="Extract all text, focusing on dates and names.", # Optional custom prompt
    language="English" # Specify the language of the text (New! 🆕)
)
print(result)
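
Since the result is returned as a string (a markdown string when format_type="markdown"), saving it is a one-liner; the output filename below is just an example:

# Write the extracted text to disk (output filename is an arbitrary example)
with open("extracted.md", "w", encoding="utf-8") as f:
    f.write(result)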

Batch Processing

from ollama_ocr import OCRProcessor

# Initialize OCR processor
ocr = OCRProcessor(model_name='llama3.2-vision:11b', max_workers=4)  # max workers for parallel processing

# Process multiple images with progress tracking
batch_results = ocr.process_batch(
    input_path="path/to/images/folder",  # Directory or list of image paths
    format_type="markdown",
    recursive=True,  # Search subdirectories
    preprocess=True,  # Enable image preprocessing
    custom_prompt="Extract all text, focusing on dates and names.", # Optional custom prompt
    language="English" # Specify the language of the text (New! 🆕)
)
# Access results
for file_path, text in batch_results['results'].items():
    print(f"\nFile: {file_path}")
    print(f"Extracted Text: {text}")

# View statistics
print("\nProcessing Statistics:")
print(f"Total images: {batch_results['statistics']['total']}")
print(f"Successfully processed: {batch_results['statistics']['successful']}")
print(f"Failed: {batch_results['statistics']['failed']}")

📋 Output Format Details

  1. Markdown Format: A markdown string that preserves the document's formatting, such as headers and lists.
  2. Text Format: A plain text string with the extracted text and no formatting.
  3. JSON Format: A JSON object containing the extracted text as structured data.
  4. Structured Format: A structured representation of tables and other organized data found in the document.
  5. Key-Value Format: A dictionary of labeled fields extracted from the document.
  6. Table Format: All tabular data extracted from the document.
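
Because the JSON output ultimately comes from a language model, it may not always be strictly valid JSON. A small defensive sketch for parsing it (the fallback used here is just one option):

import json
from ollama_ocr import OCRProcessor

ocr = OCRProcessor(model_name='llama3.2-vision:11b')
raw = ocr.process_image(image_path="path/to/your/image.png", format_type="json")

try:
    data = json.loads(raw)            # use the structured result when the model emits valid JSON
except json.JSONDecodeError:
    data = {"raw_text": raw}          # otherwise keep the raw string for manual inspection
print(data)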

🌐 Streamlit Web Application (supports batch processing)

  • User-Friendly Interface
    • Drag-and-drop file upload
    • Real-time processing
    • Download extracted text
    • Image preview with details
    • Responsive design
    • Language Selection: Specify the language for better OCR accuracy. (New! 🆕)
  1. Clone the repository:
git clone https://github.com/imanoop7/Ollama-OCR.git
cd Ollama-OCR
  2. Install dependencies:
pip install -r requirements.txt
  3. Go to the directory where app.py is located:
cd src/ollama_ocr
  4. Run the Streamlit app:
streamlit run app.py

📒 Example Notebooks

Example Outputs

Input Image

Sample Output

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built with Ollama and powered by vision language models.
