
zjunlp/OmniThink


OmniThink

Expanding Knowledge Boundaries in Machine Writing through Thinking


🔔News

  • 2025-08-24, We have added offline local search support using RAGFlow technology! Now you can search local documents without an internet connection.
  • 2025-03-12, We have optimized the Docker usage for OmniThink.
  • 2025-02-20, We have added the evaluation methods from the paper to OmniThink, and in the future, we will integrate more evaluation methods.
  • 2025-01-28, We have provided support for the deepseek-reasoner model. You can try running ./examples/deepseekr1.py to test OmniThink's performance with deepseek-reasoner.
Previous News
  • 2025-01-18, We open-sourced OmniThink, a machine writing framework.

🌻Acknowledgement

📖 Quick Start

  • 🌏 The Online Demo is available at ModelScope now!

📌 Introduction

Welcome to OmniThink, an innovative machine writing framework designed to replicate the human cognitive process of iterative expansion and reflection in generating insightful long-form articles.

  • Iterative Expansion and Reflection: OmniThink uses a unique mechanism that simulates human cognitive behaviors to deepen the understanding of complex topics.
  • Enhanced Knowledge Density: OmniThink focuses on expanding knowledge boundaries, resulting in articles that are rich in information and insights.
  • Comprehensive Article Generation: OmniThink constructs outlines and generates articles, delivering high-quality content that is both coherent and contextually robust.

🛠 Dependencies

📦 Conda

conda create -n OmniThink python=3.11
# Activate the environment so pip installs into it
conda activate OmniThink
git clone https://github.com/zjunlp/OmniThink.git
cd OmniThink
# Install requirements
pip install -r requirements.txt

🔍 Local Search Support

OmniThink now supports offline local search using RAGFlow technology! This feature allows you to:

  • Search local documents without an internet connection
  • Use vector embeddings for semantic search
  • Index and retrieve your own document collections
  • Maintain data privacy with local-only processing

Local Search Features

  • OfflineRAGFlow: Core RAG engine with FAISS vector database
  • LocalSearch: DSPy-compatible search interface
  • Sentence Transformers: High-quality text embeddings
  • Smart Chunking: Intelligent document segmentation
  • Semantic Retrieval: Context-aware search results
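
To make the semantic retrieval step concrete, here is a minimal sketch of top-k search by cosine similarity. It is not OmniThink's implementation: the helper names `cosine` and `top_k` are hypothetical, the vectors are toy values rather than real sentence-transformers embeddings, and a plain Python loop stands in for the FAISS index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=3):
    """Return (index, score) pairs for the k most similar chunk vectors."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
chunks = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
hits = top_k([1.0, 0.0, 0.0], chunks, k=2)
print(hits)
```

A vector index such as FAISS performs the same ranking, but avoids scanning every chunk in Python, which matters once the collection grows.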

Quick Local Search Setup

from src.tools.rm import OfflineRAGFlow, LocalSearch

# Initialize the local RAG engine
rag_engine = OfflineRAGFlow(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    chunk_size=800,
    overlap=120,
    k=5
)

# Add documents to your local index
rag_engine.ingest(
    text="Your document content here...",
    meta={"title": "Document Title", "doc_id": "doc1"}
)

# Create DSPy-compatible search interface
local_search = LocalSearch(search=rag_engine, k=3)

# Use in your DSPy pipeline
results = local_search.forward("your search query")
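
The `chunk_size` and `overlap` arguments above suggest sliding-window chunking. The sketch below illustrates that idea under the assumption that both values are measured in characters; whether OfflineRAGFlow counts characters, tokens, or sentences is not stated here, and `chunk_text` is a hypothetical helper, not part of the library.

```python
def chunk_text(text, chunk_size=800, overlap=120):
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "x" * 2000
pieces = chunk_text(sample)
print([len(p) for p in pieces])  # [800, 800, 640]
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of indexing some text twice.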

🐳 Docker

git clone https://github.com/zjunlp/OmniThink.git
docker pull zjunlp/omnithink:latest
docker run -it zjunlp/omnithink:latest

🔑 Before running, please export the LM API key and SEARCH key as environment variables:

export LM_KEY=YOUR_API_KEY
export SEARCHKEY=YOUR_SEARCHKEY
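
If you call OmniThink from your own Python script, it can help to fail early when a key is missing. The following is a minimal sketch: `load_keys` and `REQUIRED_KEYS` are hypothetical names, and only the variable names `LM_KEY` and `SEARCHKEY` come from the commands above.

```python
import os

REQUIRED_KEYS = ("LM_KEY", "SEARCHKEY")

def load_keys(environ=None):
    """Return the required keys as a dict, or raise if any is missing or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}

# Demonstration with an explicit mapping; call load_keys() to read os.environ.
keys = load_keys({"LM_KEY": "demo-lm", "SEARCHKEY": "demo-search"})
print(sorted(keys))
```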

Local Search Dependencies

For local search functionality, additional packages are required:

# Install local search dependencies
pip install sentence-transformers faiss-cpu numpy

# Or use the updated requirements.txt
pip install -r requirements.txt

You can define your own LM API and SEARCH API.

Note that the output of the LM should be a LIST.
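
One way to meet the list requirement is to wrap your backend so that a single string is always returned as a one-element list. This is a sketch under assumptions: the exact interface OmniThink expects from a custom LM is not specified here, and `as_list_lm` and `echo_backend` are hypothetical names used only for illustration.

```python
from typing import Callable, List, Union

def as_list_lm(generate: Callable[[str], Union[str, List[str]]]) -> Callable[[str], List[str]]:
    """Wrap a text generator so it always returns a list of strings."""
    def wrapped(prompt: str) -> List[str]:
        output = generate(prompt)
        if isinstance(output, str):
            return [output]  # a single completion becomes a one-element list
        return [str(item) for item in output]
    return wrapped

def echo_backend(prompt: str) -> str:
    # Hypothetical stand-in for a real API call.
    return "echo: " + prompt

lm = as_list_lm(echo_backend)
print(lm("hello"))  # ['echo: hello']
```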

Results in OmniThink

The performance of OmniThink is shown below:

Generate Article in OmniThink

Just one command required

sh run.sh

You can find your Article, Outline and mindmap in ./results/

🔍 Evaluation

We provide convenient scripts for evaluating your method. The evaluation is divided into three categories: Rubric_Grading, Knowledge_Density, and Information_Diversity.

We use the FActScore library. Please run the following commands before starting the evaluation.

cd eval
git clone https://github.com/shmsw25/FActScore.git

For Rubric Grading

python Rubric_Grading.py \
 --articlepath articlepath \
 --modelpath modelpath

For Information Diversity

python Information_Diversity.py \
 --mappath mappath \
 --model_path model_path

For Knowledge Density

python Knowledge_Density.py \
 --articlepath articlepath \
 --api_path api_path \
 --threads threads
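
The internals of Knowledge_Density.py are not described here. As a rough illustration of what a density-style metric measures, the sketch below assumes the score is the number of distinct atomic facts divided by the article length in characters. The function name `knowledge_density` is hypothetical, and exact string matching is a stand-in for the model-based fact extraction and deduplication a real evaluation would need.

```python
def knowledge_density(atomic_facts, article_text):
    """Distinct atomic facts per character of article text (assumed definition)."""
    if not article_text:
        return 0.0
    # Normalise case and whitespace so trivially repeated facts count once.
    unique = {fact.strip().lower() for fact in atomic_facts if fact.strip()}
    return len(unique) / len(article_text)

facts = [
    "Paris is in France.",
    "paris is in france.",  # duplicate after normalisation
    "The Seine flows through Paris.",
]
article = "a" * 100  # placeholder text of length 100
kd = knowledge_density(facts, article)
print(kd)  # 0.02
```

Under this reading, an article that repeats the same facts at greater length scores lower than one that packs more distinct facts into the same space.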

Citation

If you find our repo useful in your research, please consider citing:

@misc{xi2025omnithinkexpandingknowledgeboundaries,
      title={OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking}, 
      author={Zekun Xi and Wenbiao Yin and Jizhan Fang and Jialong Wu and Runnan Fang and Ningyu Zhang and Jiang Yong and Pengjun Xie and Fei Huang and Huajun Chen},
      year={2025},
      eprint={2501.09751},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.09751}, 
}