GitHub · Report Bug · Discussions
PowerRAG Community Edition is an open-source project based on RAGFlow, licensed under Apache License 2.0. While preserving RAGFlow's core capabilities and interface compatibility, this project extends functionality in document processing, structured information extraction, effectiveness evaluation, and feedback mechanisms, aiming to provide a more comprehensive integrated data service engine for Large Language Model (LLM) applications.
PowerRAG Community Edition targets developers and research teams building RAG (Retrieval-Augmented Generation) applications. Through atomic API design, it can be flexibly embedded into various intelligent applications, supporting rapid construction, monitoring, and optimization of LLM-based Q&A, knowledge extraction, and generation systems.
PowerRAG extends RAGFlow's document processing capabilities with multi-engine and multi-mode support, suitable for more complex document scenarios:
- Multi-Engine OCR Support: Integrates MinerU and Dots.OCR for complex document recognition and text extraction
- Multiple Chunking Strategies: Supports title-based, regex-based, and intelligent chunking algorithms to improve content organization and retrieval efficiency
- Structured Information Extraction: Implements structured information recognition and extraction based on LangExtract, supporting extraction of tables, fields, entities, and other structured content from documents, and providing a data foundation for knowledge graphs and semantic retrieval
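To make the title-based and regex-based strategies listed above concrete, here is a minimal sketch in Python. The helper `chunk_by_pattern` and the heading pattern are hypothetical names chosen for illustration; they are not part of the PowerRAG API, and PowerRAG's own chunkers may behave differently.

```python
import re
from typing import List


# Hypothetical helper for illustration only; not part of PowerRAG's API.
def chunk_by_pattern(text: str, pattern: str) -> List[str]:
    """Split text into chunks, opening a new chunk at each line matching `pattern`."""
    boundary = re.compile(pattern)
    chunks: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        # A matching line starts a new chunk, so flush what was collected so far.
        if boundary.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that contain only whitespace.
    return [c for c in chunks if c]


# Title-based chunking is the special case where the pattern matches headings.
MARKDOWN_HEADING = r"^#{1,6}\s"

doc = "# Intro\nPowerRAG overview.\n## Setup\nInstall Docker.\n## Usage\nCall the API."
chunks = chunk_by_pattern(doc, MARKDOWN_HEADING)
for chunk in chunks:
    print(repr(chunk))
```

Treating title-based chunking as a regex over heading lines keeps both strategies behind one function in this sketch; an "intelligent" chunker would need layout or semantic signals that a single pattern cannot capture.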
The PowerRAG application platform is built on OceanBase's multi-modal integrated database architecture (SQL + NoSQL), leveraging OceanBase's high performance, scalability, and hybrid storage capabilities to support intelligent retrieval and knowledge services.
- Hybrid Index Retrieval: Uses OceanBase 4.4.1 capabilities to run joint queries over vector indexes and full-text indexes, combining semantic relevance with keyword matching to improve the comprehensiveness and accuracy of information recall
- Multi-Modal Data Retrieval: Introduces scalar conditions on top of vector retrieval, enabling further filtering based on numerical, temporal, or categorical attributes in semantic results, achieving precise control of result ranking and filtering
- Unified Data Access Layer: Through OceanBase's multi-modal integrated interface, uniformly manages text, vector, and structured data, enabling efficient cross-modal and cross-type queries
This capability lets PowerRAG offer more flexible knowledge access patterns across multiple types of knowledge sources and complex retrieval scenarios, providing efficient and scalable underlying data support for LLM applications.
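The idea of combining a vector ranking with a full-text ranking and then applying a scalar condition can be sketched in plain Python. This is an illustration only: in PowerRAG the hybrid query runs inside OceanBase, and the fusion rule used here, reciprocal rank fusion (RRF), is a stand-in chosen for simplicity rather than a description of OceanBase's scoring. The function names and sample data are hypothetical.

```python
from typing import Callable, Dict, List


# Illustrative only: RRF is a stand-in fusion rule, not OceanBase's scoring.
def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs into one list by RRF score."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


def hybrid_search(
    vector_hits: List[str],
    fulltext_hits: List[str],
    metadata: Dict[str, dict],
    keep: Callable[[dict], bool],
) -> List[str]:
    """Fuse both rankings, then keep only documents whose metadata passes `keep`."""
    fused = rrf_fuse([vector_hits, fulltext_hits])
    return [doc_id for doc_id in fused if keep(metadata[doc_id])]


# Hypothetical sample data.
vector_hits = ["d2", "d1", "d3"]      # ranked by semantic similarity
fulltext_hits = ["d1", "d4", "d2"]    # ranked by keyword match
metadata = {
    "d1": {"year": 2024},
    "d2": {"year": 2021},
    "d3": {"year": 2025},
    "d4": {"year": 2023},
}

results = hybrid_search(vector_hits, fulltext_hits, metadata, lambda m: m["year"] >= 2023)
print(results)  # ['d1', 'd4', 'd3']
```

The sketch filters after fusion for clarity. A database that evaluates the scalar condition together with the index lookups can avoid retrieving rows that would be discarded anyway.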
PowerRAG Community Edition introduces an evaluation and feedback module built on Langfuse to help developers systematically measure and optimize LLM application effectiveness, forming an observable, analyzable, and improvable closed-loop system. In integrating this component, PowerRAG Community Edition added localization adaptations, Qwen model integration, and a compatibility bridge adapter to ensure seamless integration into the PowerRAG ecosystem. This module includes the following core capabilities:
- Observability: Provides end-to-end call chain tracing and performance analysis. Developers can fully understand the entire model inference process, including input/output, tool calls, retry processes, latency, and call costs, supporting model performance optimization and cost control
- Prompt Management: Supports storage, version management, and retrieval of prompts, facilitating team prompt tuning, sharing, and reuse, achieving standardization and traceability of prompt design
- Evaluation Capabilities: Provides multiple evaluation methods, supporting effectiveness verification and quality comparison of model outputs at different stages, helping teams achieve continuous optimization and automated testing
Through this module, PowerRAG closes the feedback loop from data input and prompt design through to effectiveness evaluation during model development and application, helping teams improve model interpretability and application quality.
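As a rough picture of what a single trace record in the observability capability might contain (input, output, errors, latency), here is a standard-library-only sketch. It does not use the Langfuse SDK, and `traced`, `TRACES`, and `answer_question` are hypothetical names; a real deployment would send these records to the evaluation and feedback module rather than keep them in memory.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory trace store, used here only for illustration.
TRACES: List[Dict[str, Any]] = []


def traced(name: str) -> Callable:
    """Decorator that records input, output, error, and latency for each call."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record: Dict[str, Any] = {
                "name": name,
                "input": {"args": args, "kwargs": kwargs},
            }
            start = time.perf_counter()
            try:
                record["output"] = func(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                # Keep the failure in the trace, then let it propagate.
                record["error"] = repr(exc)
                raise
            finally:
                record["latency_s"] = time.perf_counter() - start
                TRACES.append(record)
        return wrapper
    return decorator


@traced("answer_question")
def answer_question(question: str) -> str:
    # Placeholder for a real LLM call.
    return f"echo: {question}"


answer_question("What is PowerRAG?")
print(TRACES[-1]["name"], TRACES[-1]["output"])
```

Recording the trace in a `finally` clause means failed calls are captured too, which matters when retries and error rates are part of what is being analyzed.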
- Docker and Docker Compose
- At least 8GB of available memory
- Clone the repository

      git clone https://github.com/oceanbase/powerrag.git
      cd powerrag

- Configure environment variables

  Navigate to the docker directory and copy/edit the environment file:

      cd docker
      cp .env.example .env  # if .env.example exists
      # Edit .env file as needed

- Start services

  Start all services using Docker Compose:

      docker-compose up -d

  This will start PowerRAG and all its dependent services (including database, storage, etc.).

- Check service status

      docker-compose ps

  After successful startup, you can access the service at http://localhost:80 (or the configured port).
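Containers can take a while to become reachable after `docker-compose up -d` returns. The following sketch polls the service URL until it responds; the function name `wait_until_ready` and the timeout values are assumptions made for illustration and are not part of PowerRAG.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(url: str, timeout_s: float = 120.0, interval_s: float = 3.0) -> bool:
    """Poll `url` until it returns any HTTP response or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status (e.g. 404), so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: the service is not ready yet.
            time.sleep(interval_s)
    return False
```

For example, `wait_until_ready("http://localhost:80")` returns `True` once the web interface answers, or `False` if nothing responds within two minutes. Adjust the URL if you changed the port in `.env`.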
For more detailed configuration and usage instructions, see the Docker Deployment Documentation.
PowerRAG Community Edition natively maintains compatibility with RAGFlow's access interfaces and can directly reuse its APIs, SDKs, and documentation system. In the overall architecture, RAGFlow remains the underlying foundational service framework, while PowerRAG Community Edition provides extended capabilities and enhanced components on top of it.
💡 Note
PowerRAG Community Edition documentation only covers the new independent capabilities added by PowerRAG Community Edition. For other features and usage methods shared with RAGFlow, please refer to the RAGFlow official documentation.
PowerRAG runs as an independent backend service that:
- Shares RAGFlow's database and data models
- Operates on port 6000 (configurable)
- Can run alongside RAGFlow service (port 9380)
- Uses RAGFlow's task executor for asynchronous processing
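Because the two backends listen on separate ports, a quick way to confirm that both are running side by side is a TCP connection check. The sketch below uses the default ports listed above; `port_open` and `service_status` are hypothetical helper names, and the ports should be changed to match your configuration.

```python
import socket
from typing import Dict

# Default ports from the list above; both are configurable in a real deployment.
SERVICES: Dict[str, int] = {"RAGFlow": 9380, "PowerRAG": 6000}


def port_open(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def service_status(host: str = "localhost") -> Dict[str, bool]:
    """Report, per service, whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICES.items()}


print(service_status())
```

An open port only shows that something is listening; it does not prove the service behind it is healthy, so treat this as a first diagnostic step.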
         ┌──────────────┐
         │   Frontend   │
         └──────┬───────┘
                │
       ┌────────┴────────┐
       │                 │
       ▼                 ▼
┌──────────────┐  ┌──────────────┐
│   RAGFlow    │  │   PowerRAG   │
│    Server    │  │    Server    │
│ (Port 9380)  │  │ (Port 6000)  │
└──────┬───────┘  └──────┬───────┘
       │                 │
       └────────┬────────┘
                │
                ▼
         ┌──────────────┐
         │  OceanBase   │
         │   Database   │
         └──────────────┘
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
TODO: Development guide content to be added
- PowerRAG Community Edition Documentation: PowerRAG Community Edition product documentation repository
- Issue Reporting: GitHub Issues
- Discussions: GitHub Discussions