feat(rag-test): add RAG Pipeline testing and evaluation module by 2561056571 · Pull Request #2 · 2561056571/wegent-evaluate

2561056571 · 2026-02-10T08:27:39Z

Summary

Add a RAG Pipeline testing and evaluation module that compares document conversion methods and splitting strategies, scoring each combination automatically with RAGAS.

Key Features

Document Converters (5 types):

Docling: Main converter for PDF/PPT/Word/MD/TXT with complex structure extraction
Marker: Specialized PDF to Markdown converter with structure preservation
Pandoc: Universal document format converter
PyPDF/PyMuPDF: Lightweight PDF text extraction
Unstructured.io: General-purpose document parsing

Markdown Splitters (2 strategies):

LangChain: MarkdownHeaderTextSplitter → RecursiveCharacterTextSplitter (chunk_size=1024, overlap=50)
LlamaIndex: MarkdownNodeParser → SentenceSplitter (chunk_size=1024, overlap=50)
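Both strategies follow the same two-stage idea: split on Markdown structure first, then cut any oversized section into fixed-size pieces that overlap. The sketch below illustrates that idea with the standard library only. It is a simplified stand-in, not the LangChain or LlamaIndex implementation: it counts characters, ignores code fences, and cuts at fixed offsets rather than at separators or sentence boundaries. The function names are hypothetical; only chunk_size=1024 and overlap=50 come from this PR.

```python
import re

CHUNK_SIZE = 1024   # value stated in the PR description
CHUNK_OVERLAP = 50  # value stated in the PR description


def split_by_headers(markdown: str) -> list[str]:
    """Stage 1: start a new section at every Markdown heading line."""
    sections, current = [], []
    for line in markdown.splitlines():
        if re.match(r"#{1,6}\s", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def split_by_size(text: str, size: int = CHUNK_SIZE,
                  overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Stage 2: cut an oversized section into overlapping fixed-size windows."""
    if len(text) <= size:
        return [text]
    step = size - overlap
    # Stop once the previous window has already reached the end of the text.
    return [text[i:i + size] for i in range(0, len(text) - overlap, step)]


def split_markdown(markdown: str) -> list[str]:
    """Run both stages and return the flat list of chunks."""
    return [chunk
            for section in split_by_headers(markdown)
            for chunk in split_by_size(section)]


if __name__ == "__main__":
    doc = "# Intro\n" + "x" * 2000 + "\n## Details\nshort section"
    for chunk in split_markdown(doc):
        print(len(chunk), repr(chunk[:20]))
```

The overlap means the last 50 characters of one chunk reappear at the start of the next, so a sentence that straddles a cut is still retrievable from at least one chunk.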

Core Components:

Vector Store: FAISS for efficient similarity search
Embedding: Qwen3 Embedding (reuses existing RAGAS configuration)
LLM: GLM-4 for query rewriting and answer generation
Evaluation: RAGAS metrics (Faithfulness, Answer Relevancy, Context Precision)
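The retrieval step that the vector store performs can be pictured as nearest-neighbour search over chunk embeddings. The sketch below is a brute-force stand-in written with the standard library. It does not use FAISS, and this PR does not say which index type or distance metric the module uses, so cosine similarity, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunk_vectors: list[list[float]],
          k: int = 3) -> list[tuple[int, float]]:
    """Score every chunk against the query and return the k best (index, score) pairs."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(chunk_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real embedding vectors are much longer.
    chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for index, score in top_k([1.0, 0.1], chunks, k=2):
        print(index, round(score, 3))
```

The retrieved chunks are what the LLM receives as context, and they are also the contexts that metrics such as Context Precision are computed over.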

CLI Commands

# Convert documents to Markdown
python -m app.services.rag_test.cli convert --source-dir /path/to/docs --converter docling --output-dir /path/to/md

# Split and index Markdown files
python -m app.services.rag_test.cli index --markdown-dir /path/to/md --splitter langchain --output-index /path/to/index

# Run evaluation
python -m app.services.rag_test.cli evaluate --index-path /path/to/index --queries /path/to/queries.txt --output /path/to/results.json

# Full pipeline test
python -m app.services.rag_test.cli run --source-dir /path/to/docs --queries /path/to/queries.txt --converter docling --splitter langchain --output /path/to/results

# Compare all combinations (5 converters × 2 splitters = 10 tests)
python -m app.services.rag_test.cli compare --source-dir /path/to/docs --queries /path/to/queries.txt --output-dir /path/to/results

# Generate comparison report
python -m app.services.rag_test.cli report --results-dir /path/to/results --output /path/to/report.json
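For orientation, the 10 combinations that the compare command covers can be enumerated as below. This is a sketch of the test matrix, not the module's code: it builds one equivalent `run` invocation per pair, using the flags shown above. Only `docling` and `langchain` are confirmed as CLI values by the commands in this PR; the other identifiers are guessed from the module file names, and the per-combination output path is a hypothetical layout.

```python
from itertools import product

# "docling" and "langchain" appear in the commands above; the remaining
# identifiers are assumptions based on the converter/splitter file names.
CONVERTERS = ["docling", "marker", "pandoc", "pypdf", "unstructured"]
SPLITTERS = ["langchain", "llamaindex"]


def build_run_commands(source_dir: str, queries: str,
                       output_dir: str) -> list[list[str]]:
    """Return one `run` invocation (as an argv list) per converter/splitter pair."""
    commands = []
    for converter, splitter in product(CONVERTERS, SPLITTERS):
        commands.append([
            "python", "-m", "app.services.rag_test.cli", "run",
            "--source-dir", source_dir,
            "--queries", queries,
            "--converter", converter,
            "--splitter", splitter,
            # Hypothetical layout: one results path per combination.
            "--output", f"{output_dir}/{converter}_{splitter}",
        ])
    return commands


if __name__ == "__main__":
    for argv in build_run_commands("/path/to/docs", "/path/to/queries.txt",
                                   "/path/to/results"):
        print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run` to execute the combinations one by one when a single pairing needs to be rerun in isolation.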

Module Structure

backend/app/services/rag_test/
├── __init__.py
├── cli.py                      # CLI entry (typer + rich progress)
├── config.py                   # Pydantic config models
├── converters/                 # Document converters
│   ├── base.py                 # Abstract base class
│   ├── docling_converter.py
│   ├── marker_converter.py
│   ├── pandoc_converter.py
│   ├── pypdf_converter.py
│   └── unstructured_converter.py
├── splitters/                  # Markdown splitters
│   ├── base.py
│   ├── langchain_splitter.py
│   └── llamaindex_splitter.py
├── embeddings/qwen_embedding.py
├── vectorstore/faiss_store.py
├── retrieval/retriever.py
├── llm/glm_client.py
├── evaluation/ragas_evaluator.py
├── pipeline.py                 # Pipeline orchestration
├── output.py                   # Result handling
└── utils.py
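The converters/base.py entry suggests that every converter implements a shared abstract interface. A minimal sketch of what such an interface could look like follows. The class names, the `convert` and `supports` methods, and the `convert_directory` helper are hypothetical, and `PlainTextConverter` is a toy used only for illustration; it is not one of the five converters in this PR.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class BaseConverter(ABC):
    """Hypothetical interface: turn one source document into Markdown text."""

    name: str = ""
    extensions: frozenset = frozenset()

    def supports(self, path: Path) -> bool:
        """Check the file extension, ignoring case."""
        return path.suffix.lower() in self.extensions

    @abstractmethod
    def convert(self, path: Path) -> str:
        """Return the document content as a Markdown string."""


class PlainTextConverter(BaseConverter):
    """Toy converter: passes text files through unchanged."""

    name = "plaintext"
    extensions = frozenset({".txt", ".md"})

    def convert(self, path: Path) -> str:
        return path.read_text(encoding="utf-8")


def convert_directory(converter: BaseConverter, source_dir: Path,
                      output_dir: Path) -> list:
    """Convert every supported file in source_dir and write .md files to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(source_dir.iterdir()):
        if path.is_file() and converter.supports(path):
            target = output_dir / f"{path.stem}.md"
            target.write_text(converter.convert(path), encoding="utf-8")
            written.append(target)
    return written
```

With a shared interface like this, the pipeline can treat converters interchangeably, which is what makes running every converter against every splitter straightforward.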

New Dependencies

Document conversion: docling, marker-pdf, pypandoc, pymupdf, unstructured, python-pptx, python-docx
Splitters: llama-index-core, langchain-text-splitters
Vector store: faiss-cpu
LLM: zhipuai
CLI: typer, rich

Test plan

Verify CLI help commands work: python -m app.services.rag_test.cli --help
Test document conversion with sample files
Test indexing with sample Markdown files
Test evaluation with sample queries
Test full pipeline with sample data
Test comparison across converter/splitter combinations

Add a comprehensive RAG Pipeline testing module for evaluating different document conversion methods and splitting strategies.

Key features:
- Document converters: Docling, Marker, Pandoc, PyPDF, Unstructured
- Markdown splitters: LangChain (Header+Recursive), LlamaIndex (Node+Sentence)
- Vector store: FAISS for efficient similarity search
- Embedding: Qwen3 Embedding (reuses existing RAGAS config)
- LLM: GLM for query rewriting and answer generation
- Evaluation: RAGAS metrics (faithfulness, relevancy, precision)

CLI commands:
- convert: Document to Markdown conversion
- index: Split and build vector index
- evaluate: Run RAGAS evaluation
- run: Full pipeline test
- compare: Test all converter×splitter combinations
- report: Generate comparison report

The module operates independently as a CLI tool, reusing existing RAGAS LLM and Embedding configurations from the main project.

2561056571 wants to merge 1 commit into main from wegent/feat-rag-pipeline-test-module
