A relation-free graph construction method for efficient GraphRAG.

DEEP-PolyU/LinearRAG

LinearRAG: Linear Graph Retrieval-Augmented Generation on Large-scale Corpora

A relation-free graph construction method for efficient GraphRAG. It eliminates LLM token costs during graph construction, which makes the graph faster and cheaper to build.

arXiv:2506.08938 HuggingFace GitHub


🚀 Highlights

  • Context-Preserving: Builds the graph without relation extraction, relying on lightweight entity recognition and semantic linking to keep the context of the source passages.
  • Complex Reasoning: Enables deep retrieval via semantic bridging, achieving multi-hop reasoning in a single retrieval pass without requiring explicit relational graphs.
  • High Scalability: Zero LLM token consumption during graph construction, faster processing, and linear time and space complexity.
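
To make the highlights above concrete, here is a minimal sketch of a relation-free entity-passage graph. It is not the LinearRAG implementation: the capitalized-word extractor is a toy stand-in for spaCy entity recognition, exact string matches stand in for semantic linking, and the names `extract_entities`, `build_graph`, and `retrieve` are hypothetical.

```python
import re
from collections import defaultdict

def extract_entities(text):
    """Toy stand-in for NER: treat runs of capitalized words as entities."""
    return set(re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", text))

def build_graph(passages):
    """Link each entity to the passages that mention it (no relations, no LLM calls).

    The corpus is scanned once, so the cost grows linearly with its size.
    """
    entity_to_passages = defaultdict(set)
    passage_to_entities = {}
    for pid, text in enumerate(passages):
        ents = extract_entities(text)
        passage_to_entities[pid] = ents
        for ent in ents:
            entity_to_passages[ent].add(pid)
    return entity_to_passages, passage_to_entities

def retrieve(question, entity_to_passages, passage_to_entities, hops=2):
    """Start from the question's entities and follow shared entities for `hops` steps."""
    frontier = extract_entities(question)
    seen_entities, hits = set(), set()
    for _ in range(hops):
        next_frontier = set()
        for ent in frontier - seen_entities:
            for pid in entity_to_passages.get(ent, ()):
                hits.add(pid)
                next_frontier |= passage_to_entities[pid]
        seen_entities |= frontier
        frontier = next_frontier
    return sorted(hits)

passages = [
    "Marie Curie was born in Warsaw.",
    "Warsaw is the capital of Poland.",
    "Paris is the capital of France.",
]
e2p, p2e = build_graph(passages)
print(retrieve("Where was Marie Curie born?", e2p, p2e))  # → [0, 1]
```

The second passage is reached only through the shared entity "Warsaw", which is the kind of bridging a relation-free graph relies on in place of explicit relation edges.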

Framework Overview


🎉 News

  • [2025-10-27] We release LinearRAG, a relation-free graph construction method for efficient GraphRAG.
  • [2025-06-06] We release GraphRAG-Bench, the benchmark for evaluating GraphRAG models.
  • [2025-01-21] We release the GraphRAG survey.

🛠️ Usage

1️⃣ Install Dependencies

Step 1: Install Python packages

pip install -r requirements.txt

Step 2: Download spaCy language model

python -m spacy download en_core_web_trf

Note: For the medical dataset, install the scientific/biomedical spaCy model:

pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.3/en_core_sci_scibert-0.5.3.tar.gz
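
Before moving on, it can help to confirm that the packages from Steps 1 and 2 are importable. The snippet below is a small sketch using only the standard library. The helper name `installed` is hypothetical, and the check relies on spaCy models being installed as regular Python packages.

```python
import importlib.util

def installed(package_names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None
            for name in package_names}

# en_core_sci_scibert is only needed for the medical dataset.
status = installed(["spacy", "en_core_web_trf", "en_core_sci_scibert"])
for name, ok in status.items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```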

Step 3: Set up your OpenAI API key and base URL

export OPENAI_API_KEY="your-api-key-here"
export OPENAI_BASE_URL="your-base-url-here"
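
A quick way to catch a missing variable before a long run is to check the environment from Python. This is a sketch, not part of the repository. `missing_vars` is a hypothetical helper, and it only checks the two variable names shown above.

```python
import os

REQUIRED = ("OPENAI_API_KEY", "OPENAI_BASE_URL")

def missing_vars(env=None, required=REQUIRED):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

missing = missing_vars()
if missing:
    print("Set these before running:", ", ".join(missing))
else:
    print("OpenAI environment variables are set.")
```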

Step 4: Download Datasets

Download the datasets from HuggingFace and place them in the dataset/ folder:

git clone https://huggingface.co/datasets/Zly0523/linear-rag
cp -r linear-rag/dataset/* dataset/

Step 5: Prepare Embedding Model

Make sure the embedding model is available locally and that its path matches the value you pass to --embedding_model, for example:

model/all-mpnet-base-v2/
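
The paths from Steps 4 and 5 can be sanity-checked with a few lines of Python, run from the repository root. This is only a sketch: `non_empty_dir` is a hypothetical helper, and it verifies that each directory exists and has content, not that the files inside are valid.

```python
from pathlib import Path

def non_empty_dir(path):
    """True if `path` is a directory containing at least one entry."""
    p = Path(path)
    return p.is_dir() and any(p.iterdir())

for path in ("dataset", "model/all-mpnet-base-v2"):
    print(f"{path}: {'ok' if non_empty_dir(path) else 'missing or empty'}")
```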

2️⃣ Quick Start Example

SPACY_MODEL="en_core_web_trf"
EMBEDDING_MODEL="model/bge-large-en-v1.5"
DATASET_NAME="2wikimultihop"
LLM_MODEL="gpt-4o-mini"
MAX_WORKERS=16

python run.py \
    --spacy_model ${SPACY_MODEL} \
    --embedding_model ${EMBEDDING_MODEL} \
    --dataset_name ${DATASET_NAME} \
    --llm_model ${LLM_MODEL} \
    --max_workers ${MAX_WORKERS}
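
If you prefer to launch runs from Python, for example to script several configurations, the same command can be assembled as an argument list. This is a sketch under the assumption that `run.py` accepts exactly the flags shown above. `build_command` is a hypothetical helper and not part of the repository.

```python
import shlex

def build_command(spacy_model, embedding_model, dataset_name, llm_model, max_workers):
    """Return the argv list for run.py using the flags from the quick start."""
    return [
        "python", "run.py",
        "--spacy_model", spacy_model,
        "--embedding_model", embedding_model,
        "--dataset_name", dataset_name,
        "--llm_model", llm_model,
        "--max_workers", str(max_workers),
    ]

cmd = build_command(
    spacy_model="en_core_web_trf",
    embedding_model="model/bge-large-en-v1.5",
    dataset_name="2wikimultihop",
    llm_model="gpt-4o-mini",
    max_workers=16,
)
print(shlex.join(cmd))
# To execute from the repository root: subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell quoting problems when a path contains spaces.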

🎯 Performance

Figure: Main results of end-to-end performance.

Figure: Efficiency and performance comparison.

📖 Citation

If you find this work helpful, please consider citing us:

@article{zhuang2025linearrag,
  title={LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora},
  author={Zhuang, Luyao and Chen, Shengyuan and Xiao, Yilin and Zhou, Huachi and Zhang, Yujing and Chen, Hao and Zhang, Qinggang and Huang, Xiao},
  journal={arXiv preprint arXiv:2510.10114},
  year={2025}
}

📬 Contact

✉️ Email: [email protected]
