feat(rag-test): add RAG Pipeline testing and evaluation module #2

Open
2561056571 wants to merge 1 commit into main from wegent/feat-rag-pipeline-test-module

Summary

Add a RAG Pipeline testing and evaluation module that compares document conversion methods and splitting strategies, scored automatically with RAGAS.

Key Features

Document Converters (5 types):

  • Docling: Main converter for PDF/PPT/Word/MD/TXT with complex structure extraction
  • Marker: Specialized PDF to Markdown converter with structure preservation
  • Pandoc: Universal document format converter
  • PyPDF/PyMuPDF: Lightweight PDF text extraction
  • Unstructured.io: General-purpose document parsing

Markdown Splitters (2 strategies):

  • LangChain: MarkdownHeaderTextSplitter + RecursiveCharacterTextSplitter (chunk_size=1024, overlap=50)
  • LlamaIndex: MarkdownNodeParser + SentenceSplitter (chunk_size=1024, overlap=50)
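
Both strategies follow the same two-stage idea: cut the Markdown at headers first, then break oversized sections into windows of bounded size that overlap slightly. The sketch below illustrates that idea with the standard library only. It is not the LangChain or LlamaIndex implementation, the function names are hypothetical, and it measures chunk size in characters, whereas the real splitters may count tokens or prefer sentence boundaries.

```python
import re


def split_by_headers(markdown: str) -> list[tuple[str, str]]:
    """Stage 1: group body lines under the most recent Markdown header."""
    sections = []
    header, body = "", []
    for line in markdown.splitlines():
        if re.match(r"#{1,6} ", line):
            # A new header closes the section collected so far.
            if header or body:
                sections.append((header, "\n".join(body).strip()))
            header, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if header or body:
        sections.append((header, "\n".join(body).strip()))
    return sections


def window(text: str, chunk_size: int = 1024, overlap: int = 50) -> list[str]:
    """Stage 2: character windows where neighbours share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if len(text) <= chunk_size:
        return [text] if text else []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text) - overlap, step)]


def split_markdown(markdown: str, chunk_size: int = 1024, overlap: int = 50):
    """Run both stages and keep the header as chunk metadata."""
    return [
        {"header": header, "text": chunk}
        for header, body in split_by_headers(markdown)
        for chunk in window(body, chunk_size, overlap)
    ]
```

Keeping the header alongside each chunk mirrors what header-aware splitters do with metadata, so a retrieved chunk can still be traced back to its section.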

Core Components:

  • Vector Store: FAISS for efficient similarity search
  • Embedding: Qwen3 Embedding (reuses existing RAGAS configuration)
  • LLM: GLM-4 for query rewriting and answer generation
  • Evaluation: RAGAS metrics (Faithfulness, Answer Relevancy, Context Precision)
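
To make the retrieval step concrete, here is a minimal brute-force sketch of what a similarity search returns: the indices of the stored vectors closest to a query vector. It uses cosine similarity in plain Python as a stand-in; the module itself uses FAISS, and which FAISS index type and distance metric it uses is not stated here, so treat the metric and the `top_k` helper name as assumptions.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def top_k(query, vectors, k=3):
    """Return (index, score) pairs for the k vectors most similar to `query`."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(v))), i)
        for i, v in enumerate(vectors)
    ]
    # Highest score first; ties are broken by the lower index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:k]]


if __name__ == "__main__":
    chunk_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(top_k([1.0, 0.0], chunk_vectors, k=2))
```

A vector index exists to avoid this linear scan over every stored vector, which is what makes it worthwhile once the number of chunks grows.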

CLI Commands

# Convert documents to Markdown
python -m app.services.rag_test.cli convert --source-dir /path/to/docs --converter docling --output-dir /path/to/md

# Split and index Markdown files
python -m app.services.rag_test.cli index --markdown-dir /path/to/md --splitter langchain --output-index /path/to/index

# Run evaluation
python -m app.services.rag_test.cli evaluate --index-path /path/to/index --queries /path/to/queries.txt --output /path/to/results.json

# Full pipeline test
python -m app.services.rag_test.cli run --source-dir /path/to/docs --queries /path/to/queries.txt --converter docling --splitter langchain --output /path/to/results

# Compare all combinations (5 converters × 2 splitters = 10 tests)
python -m app.services.rag_test.cli compare --source-dir /path/to/docs --queries /path/to/queries.txt --output-dir /path/to/results

# Generate comparison report
python -m app.services.rag_test.cli report --results-dir /path/to/results --output /path/to/report.json
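
The compare command covers every converter and splitter pairing. As a rough sketch of that 5 × 2 matrix, the snippet below enumerates the equivalent `run` invocations. The flag names come from the commands above; the option values other than `docling` and `langchain` are guesses based on the module file names, and the per-combination output directory naming is hypothetical.

```python
from itertools import product

# Assumed CLI values: only "docling" and "langchain" appear in the examples.
CONVERTERS = ["docling", "marker", "pandoc", "pypdf", "unstructured"]
SPLITTERS = ["langchain", "llamaindex"]


def build_run_commands(source_dir, queries, output_dir):
    """Return one `run` command (as an argv list) per converter/splitter pair."""
    commands = []
    for converter, splitter in product(CONVERTERS, SPLITTERS):
        commands.append([
            "python", "-m", "app.services.rag_test.cli", "run",
            "--source-dir", source_dir,
            "--queries", queries,
            "--converter", converter,
            "--splitter", splitter,
            # Hypothetical layout: one result directory per combination.
            "--output", f"{output_dir}/{converter}_{splitter}",
        ])
    return commands


if __name__ == "__main__":
    for argv in build_run_commands("/path/to/docs", "/path/to/queries.txt", "/path/to/results"):
        print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run` to execute the combinations one at a time, for example when only a subset of the matrix needs rerunning.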

Module Structure

backend/app/services/rag_test/
├── __init__.py
├── cli.py                      # CLI entry (typer + rich progress)
├── config.py                   # Pydantic config models
├── converters/                 # Document converters
│   ├── base.py                 # Abstract base class
│   ├── docling_converter.py
│   ├── marker_converter.py
│   ├── pandoc_converter.py
│   ├── pypdf_converter.py
│   └── unstructured_converter.py
├── splitters/                  # Markdown splitters
│   ├── base.py
│   ├── langchain_splitter.py
│   └── llamaindex_splitter.py
├── embeddings/qwen_embedding.py
├── vectorstore/faiss_store.py
├── retrieval/retriever.py
├── llm/glm_client.py
├── evaluation/ragas_evaluator.py
├── pipeline.py                 # Pipeline orchestration
├── output.py                   # Result handling
└── utils.py
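
The layout puts an abstract base class in `converters/base.py` with one file per concrete converter. A minimal sketch of how such an interface might look follows; the class names, attributes, and method signatures are assumptions for illustration and are not taken from the PR's code.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseConverter(ABC):
    """Hypothetical converter interface: source document in, Markdown out."""

    name: str = ""
    supported_suffixes: frozenset = frozenset()

    def supports(self, path: Path) -> bool:
        """Check the file extension against the converter's declared formats."""
        return path.suffix.lower() in self.supported_suffixes

    @abstractmethod
    def convert(self, path: Path) -> str:
        """Return the document content as a Markdown string."""


class PlainTextConverter(BaseConverter):
    """Toy subclass for illustration: passes text files through unchanged."""

    name = "plaintext"
    supported_suffixes = frozenset({".txt", ".md"})

    def convert(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")
```

With a shared interface like this, the pipeline can select a converter by name and treat all of them the same way, which is what makes the converter × splitter comparison straightforward to orchestrate.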

New Dependencies

  • Document conversion: docling, marker-pdf, pypandoc, pymupdf, unstructured, python-pptx, python-docx
  • Splitters: llama-index-core, langchain-text-splitters
  • Vector store: faiss-cpu
  • LLM: zhipuai
  • CLI: typer, rich

Test plan

  • Verify CLI help commands work: python -m app.services.rag_test.cli --help
  • Test document conversion with sample files
  • Test indexing with sample Markdown files
  • Test evaluation with sample queries
  • Test full pipeline with sample data
  • Test comparison across converter/splitter combinations

Add a comprehensive RAG Pipeline testing module for evaluating
different document conversion methods and splitting strategies.

Key features:
- Document converters: Docling, Marker, Pandoc, PyPDF, Unstructured
- Markdown splitters: LangChain (Header+Recursive), LlamaIndex (Node+Sentence)
- Vector store: FAISS for efficient similarity search
- Embedding: Qwen3 Embedding (reuses existing RAGAS config)
- LLM: GLM for query rewriting and answer generation
- Evaluation: RAGAS metrics (faithfulness, relevancy, precision)

CLI commands:
- convert: Document to Markdown conversion
- index: Split and build vector index
- evaluate: Run RAGAS evaluation
- run: Full pipeline test
- compare: Test all converter×splitter combinations
- report: Generate comparison report

The module operates independently as a CLI tool, reusing existing
RAGAS LLM and Embedding configurations from the main project.