LinearRAG: Linear Graph Retrieval-Augmented Generation on Large-scale Corpora

LinearRAG is a relation-free graph construction method for efficient GraphRAG. It eliminates LLM token costs during graph construction, which makes building the graph faster and cheaper.

🚀 Highlights

✅ Context-Preserving: Builds the graph without relation extraction, relying on lightweight entity recognition and semantic linking to preserve the original context.
✅ Complex Reasoning: Enables deep retrieval via semantic bridging, achieving multi-hop reasoning in a single retrieval pass without requiring explicit relational graphs.
✅ High Scalability: Zero LLM token consumption during graph construction, faster processing, and linear time and space complexity.
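
To make the idea of relation-free linking concrete, here is a toy sketch. It is not the LinearRAG implementation: the capitalized-token regex is a stand-in for real entity recognition, and the names `extract_entities`, `build_graph`, and `retrieve` are hypothetical. It only illustrates how passages that share an entity can be bridged without extracting any relations, and how the index is built in one pass over the corpus.

```python
import re
from collections import defaultdict

def extract_entities(text):
    """Toy stand-in for NER (an assumption): capitalized tokens count as entities."""
    return {m.lower() for m in re.findall(r"\b[A-Z][a-zA-Z]+\b", text)}

def build_graph(passages):
    """Map each entity to the passages that mention it; no relations are extracted."""
    entity_to_passages = defaultdict(set)
    for pid, text in enumerate(passages):
        for ent in extract_entities(text):
            entity_to_passages[ent].add(pid)
    return entity_to_passages

def retrieve(query, passages, graph, hops=2):
    """Follow shared entities outward from the query for a fixed number of hops."""
    frontier = extract_entities(query)
    seen_entities, hits = set(), []
    for _ in range(hops):
        next_frontier = set()
        for ent in sorted(frontier - seen_entities):
            seen_entities.add(ent)
            for pid in sorted(graph.get(ent, ())):
                if pid not in hits:
                    hits.append(pid)
                    # Entities in a newly reached passage bridge to further passages.
                    next_frontier |= extract_entities(passages[pid])
        frontier = next_frontier
    return hits

passages = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Paris hosts the Louvre.",
]
graph = build_graph(passages)
result = retrieve("Where was Curie born?", passages, graph)
print(result)
```

In this toy corpus the query reaches passage 0 through "Curie", then passage 1 through the shared entity "Warsaw", while the unrelated passage 2 is never visited.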

🎉 News

[2025-10-27] We release LinearRAG, a relation-free graph construction method for efficient GraphRAG.
[2025-06-06] We release GraphRAG-Bench, the benchmark for evaluating GraphRAG models.
[2025-01-21] We release the GraphRAG survey.

🛠️ Usage

1️⃣ Install Dependencies

Step 1: Install Python packages

pip install -r requirements.txt

Step 2: Download spaCy language model

python -m spacy download en_core_web_trf

Note: For the medical dataset, install the scientific/biomedical spaCy model:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_scibert-0.5.3.tar.gz
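
Since spaCy models are installed as regular Python packages, a quick check like the following can confirm they are importable before a long run. This is a minimal sketch; the helper name `missing_models` is hypothetical and not part of this repository.

```python
import importlib.util

def missing_models(names):
    """Return the model packages that cannot be found on the import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Add "en_core_sci_scibert" to this list when working with the medical dataset.
required = ["en_core_web_trf"]
missing = missing_models(required)
if missing:
    print("Missing spaCy models:", ", ".join(missing))
else:
    print("All spaCy models found.")
```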

Step 3: Set up your OpenAI API key and base URL

export OPENAI_API_KEY="your-api-key-here"
export OPENAI_BASE_URL="your-base-url-here"
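
A small pre-flight check can catch unset variables before any API call is made. The sketch below assumes only the two variable names shown above; the helper `missing_env` is hypothetical.

```python
import os

def missing_env(names, environ=None):
    """Return the variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [n for n in names if not environ.get(n)]

missing = missing_env(["OPENAI_API_KEY", "OPENAI_BASE_URL"])
if missing:
    print("Please export:", ", ".join(missing))
else:
    print("OpenAI environment variables are set.")
```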

Step 4: Download Datasets

Download the datasets from HuggingFace and place them in the dataset/ folder:

git clone https://huggingface.co/datasets/Zly0523/linear-rag
mkdir -p dataset
cp -r linear-rag/dataset/* dataset/

Step 5: Prepare Embedding Model

Make sure the embedding model is available locally at the path you pass to --embedding_model, for example:

model/all-mpnet-base-v2/

2️⃣ Quick Start Example

SPACY_MODEL="en_core_web_trf"
EMBEDDING_MODEL="model/bge-large-en-v1.5"
DATASET_NAME="2wikimultihop"
LLM_MODEL="gpt-4o-mini"
MAX_WORKERS=16

python run.py \
    --spacy_model "${SPACY_MODEL}" \
    --embedding_model "${EMBEDDING_MODEL}" \
    --dataset_name "${DATASET_NAME}" \
    --llm_model "${LLM_MODEL}" \
    --max_workers "${MAX_WORKERS}"
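
If you prefer to drive the run from Python (for example, to loop over datasets), the same command can be assembled programmatically. This is a sketch that uses only the flags shown above; `build_command` and `launch` are hypothetical helpers, not part of the repository.

```python
import shlex
import subprocess
import sys

def build_command(spacy_model, embedding_model, dataset_name, llm_model, max_workers):
    """Assemble the argv list for run.py using the flags from the quick start."""
    return [
        sys.executable, "run.py",
        "--spacy_model", spacy_model,
        "--embedding_model", embedding_model,
        "--dataset_name", dataset_name,
        "--llm_model", llm_model,
        "--max_workers", str(max_workers),
    ]

def launch(cmd):
    """Run the command from the repository root; raises if run.py exits non-zero."""
    return subprocess.run(cmd, check=True)

cmd = build_command(
    spacy_model="en_core_web_trf",
    embedding_model="model/bge-large-en-v1.5",
    dataset_name="2wikimultihop",
    llm_model="gpt-4o-mini",
    max_workers=16,
)
# Print the command for inspection; call launch(cmd) to actually start the run.
print(shlex.join(cmd))
```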

🎯 Performance

Main results of end-to-end performance

Efficiency and performance comparison.

📖 Citation

If you find this work helpful, please consider citing us:

@article{zhuang2025linearrag,
  title={LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora},
  author={Zhuang, Luyao and Chen, Shengyuan and Xiao, Yilin and Zhou, Huachi and Zhang, Yujing and Chen, Hao and Zhang, Qinggang and Huang, Xiao},
  journal={arXiv preprint arXiv:2510.10114},
  year={2025}
}

📬 Contact

✉️ Email: [email protected]
