RAG Builder is a well-architected, scalable, and secure RAG (Retrieval-Augmented Generation) application built on AWS. It allows users to create a knowledge base from PDFs and websites and then ask questions about it. The project is built with Python, AWS CDK, and LangChain, and it serves as a powerful demonstration of how to build production-ready GenAI applications on AWS.
The application is built using a serverless-first architecture on AWS, designed for scalability, security, and maintainability.
```mermaid
architecture-beta
    %% External
    service user(internet)[User]

    %% AWS Cloud
    group aws(cloud)[AWS]
    service cloudfront(internet)[Cloudfront] in aws
    service chainlit_app(server)[Chainlit App] in aws
    service cognito(cloud)[Cognito] in aws

    %% Storage
    group data_plane(database)[Conversation Memory] in aws
    service dynamodb(disk)[DynamoDB] in data_plane
    service s3_chainlit(database)[S3] in data_plane

    group knowledge_base(database)[Knowledge Base] in aws
    service dynamodb_metadata(disk)[DynamoDB Metadata] in knowledge_base
    service vector_store(database)[S3 LanceDB Vector Store] in knowledge_base

    %% Bedrock
    service bedrock(cloud)[Bedrock LLM and Embeddings] in aws

    %% Backend
    service backend(server)[Lambda FastAPI Backend] in aws

    %% Data processing
    service lambda(server)[Lambda Data Processing Layer] in aws

    %% Edges
    user:R --> L:cloudfront
    cloudfront:R --> L:chainlit_app
    chainlit_app:T -- B:cognito
    dynamodb:R -- L:s3_chainlit
    chainlit_app:B -- T:dynamodb{group}
    dynamodb_metadata:R -- L:vector_store
    s3_chainlit{group}:R -- L:dynamodb_metadata{group}
    chainlit_app:R -- L:bedrock
    chainlit_app:R --> L:backend
    backend:R -- L:lambda
    lambda:B -- T:dynamodb_metadata{group}
```
- **Frontend**: A Chainlit application running on an AWS Fargate container. It is fronted by an Application Load Balancer and a CloudFront distribution to provide HTTPS and low-latency content delivery.
- **Authentication**: Amazon Cognito is used for user authentication and authorization, securing the application and its data.
- **Backend API**: A FastAPI application running on a Lambda function and exposed via API Gateway. It provides a RESTful API for managing documents and the knowledge base.
- **Document Processing**:
  - Document loading and deletion are handled asynchronously using Amazon SQS queues, which makes the application more resilient and responsive.
  - A Lambda function is triggered by the queue to download, chunk, create embeddings for, and store documents in the vector store.
  - The vector store is built with LanceDB and stored on Amazon S3, providing a serverless and scalable solution for vector search.
- **AI Models**: The application uses Amazon Bedrock for both the embeddings model (`amazon.titan-embed-text-v2:0`) and the agent's language model (`amazon.nova-pro-v1:0`).
- **Database**: Amazon DynamoDB is used to store document metadata and Chainlit conversation history.
- **Scheduled Tasks**: A weekly scheduled Lambda function, triggered by Amazon EventBridge Scheduler, optimizes the LanceDB vector store to maintain performance. Optimization covers three operations:
  - **Compaction**: Merges small files into larger ones
  - **Prune**: Removes old versions of the dataset
  - **Index**: Optimizes the indices, adding new data to existing indices (incremental indexing)
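The three maintenance operations above can be sketched as a small function run by the scheduled Lambda. This is a hedged sketch, not the project's actual handler: the method names follow LanceDB's Python table API (`compact_files`, `cleanup_old_versions`, `optimize`) and should be verified against the installed version, and the table handle is passed in so the logic can be exercised without AWS access.

```python
from datetime import timedelta

def optimize_vector_store(table, prune_older_than=timedelta(days=7)):
    """Run the weekly LanceDB maintenance operations in order.

    ``table`` is a LanceDB table handle (or any object exposing the same
    methods). Method names follow LanceDB's Python API but may differ
    across versions -- check the installed release.
    """
    # 1. Compaction: merge small data files into larger ones.
    table.compact_files()
    # 2. Prune: drop dataset versions older than the retention window.
    table.cleanup_old_versions(older_than=prune_older_than)
    # 3. Index: fold newly added rows into the existing indices.
    table.optimize()
```

Passing the table in (rather than opening it inside the function) keeps the maintenance logic trivially unit-testable with a stub.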
The RAG agent employs a hybrid search strategy to retrieve relevant context from the LanceDB vector store. This approach combines:

- **Vector Search**: Retrieves documents based on semantic similarity using embeddings generated by Amazon Bedrock.
- **Keyword Search**: Matches specific terms using full-text search.
- **Reranking**: Uses Reciprocal Rank Fusion (RRF) to combine and order the results from both search methods.

This ensures a robust retrieval process that captures both conceptually similar content and exact keyword matches.
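The Reciprocal Rank Fusion step can be sketched in a few lines of plain Python. This is a minimal sketch over lists of document ids, not the project's implementation; the constant `k = 60` is the value commonly used in the RRF literature, not necessarily what this project configures.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ordering.

    Each document's score is the sum of 1 / (k + rank) over every list
    it appears in (ranks start at 1), so documents ranked highly by both
    vector search and keyword search float to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)
```

In a hybrid retriever, the two result lists would be fused with something like `reciprocal_rank_fusion([vector_ids, keyword_ids])`.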
The evaluation module provides a comprehensive framework, packaged as a CLI tool built with Typer, for testing and analyzing RAG system performance. It uses RAGAS to generate synthetic test datasets, runs experiments with different configurations, and visualizes results through an interactive dashboard.
The evaluation workflow consists of four main steps:

1. **Create Knowledge Base** - Build an evaluation dataset from research papers
2. **Generate Test Set** - Create synthetic question-answer pairs using RAGAS
3. **Run Experiments** - Test different model configurations and measure performance
4. **Visualize Results** - Analyze experiment outcomes through an interactive dashboard
Creates a knowledge base for evaluation by downloading and processing a curated set of research papers from arXiv. The documents are stored in a LanceDB table named `evaluation_{embedding_model}` for use in subsequent evaluation steps.
Generates a synthetic test dataset using RAGAS based on the evaluation knowledge base. Creates realistic question-answer pairs with personas and different query types for comprehensive testing.
Runs evaluation experiments using the synthetic testset with specified model configurations. Measures faithfulness and answer accuracy metrics to assess RAG performance.
Generates an interactive Plotly dashboard to visualize experiment results over time. Shows trends in faithfulness and accuracy metrics across different experimental configurations.
Example output:

> [!TIP]
> Hover over data points to see the detailed experiment configuration.
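As a rough illustration of what the dashboard aggregates, per-configuration metric averages can be computed with the standard library alone. This is only a sketch: field names such as `config`, `faithfulness`, and `answer_accuracy` are illustrative rather than the project's actual schema, and the real dashboard is built with Plotly.

```python
from collections import defaultdict
from statistics import mean

def summarize_experiments(records: list[dict]) -> dict[str, dict[str, float]]:
    """Average each metric per experiment configuration.

    ``records`` is a list of experiment results, e.g.
    {"config": "nova-pro", "faithfulness": 0.91, "answer_accuracy": 0.86}
    (illustrative field names).
    """
    grouped: dict[str, dict[str, list]] = defaultdict(lambda: defaultdict(list))
    for record in records:
        for metric in ("faithfulness", "answer_accuracy"):
            grouped[record["config"]][metric].append(record[metric])
    return {
        config: {metric: mean(values) for metric, values in metrics.items()}
        for config, metrics in grouped.items()
    }
```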
This project implements a production-grade CI/CD pipeline using GitHub Actions, focusing on speed, security, and developer feedback.
**Triggered on:** Pull Requests to `main`
- **Efficient Dependency Management**: Uses `uv` to install Python dependencies at lightning speed, significantly reducing CI build times compared to `pip` or `poetry`.
- **Smart Monorepo Testing**: Implements `dorny/paths-filter` to only run tests for components that have changed (e.g., if only the Backend API is modified, only those tests run), saving compute resources.
- **Automated Feedback**: Posts detailed test coverage reports directly to Pull Requests as comments, ensuring code quality visibility before merging.
- **Isolated Environments**: Runs unit tests for each Lambda function in an isolated environment to prevent dependency conflicts.
**Triggered on:** Push to `main` (after CI passes and the PR is merged)
- **Secure Authentication**: Uses OpenID Connect (OIDC) to authenticate with AWS, eliminating the need for long-lived Access Keys in GitHub Secrets.
- **Infrastructure as Code**: Automatically deploys infrastructure changes via AWS CDK.
- **Concurrency Control**: Prevents race conditions by ensuring only one deployment pipeline runs at a time for the production environment.
- An AWS account
- AWS CLI configured with your credentials and appropriate permissions
- Python 3.12+
- `uv` installed
1. Clone the repository:

   ```shell
   git clone https://github.com/gontzalm/rag-builder.git
   cd rag-builder
   ```

2. Install dependencies:

   ```shell
   uv sync
   ```

3. Bootstrap the CDK environment (if you haven't already):

   ```shell
   cdk bootstrap
   ```

4. Deploy the stack:

   ```shell
   cdk deploy
   ```

The deployment will take several minutes. Once it's complete, the CDK will output the URL of the Chainlit application and a `.env` file for local testing.
> [!TIP]
> To save costs, speed up deployments, or if you're developing the Chainlit UI
> locally, you can disable its deployment (Fargate service, Load Balancer, and
> CloudFront distribution) by using the `deploy_chainlit` context value:
>
> ```shell
> cdk deploy -c deploy_chainlit=false
> ```

For a faster development cycle, you can run the Chainlit application locally while connecting to the deployed AWS resources.
1. After deploying the stack, copy the `.env` file content from the CDK output.

2. Create a file named `.env` in the `rag_builder/fargate/chainlit-app/` directory and paste the content into it.

3. Navigate to the Chainlit app directory:

   ```shell
   cd rag_builder/fargate/chainlit-app
   ```

4. Install the local dependencies:

   ```shell
   uv sync
   ```

5. Run the Chainlit application:

   ```shell
   uv run chainlit run main.py -w
   ```

This will start a local server, and you can access the application at `http://localhost:8000`.
| Category | Technology |
|---|---|
| Infrastructure as Code | AWS CDK |
| Frontend | Chainlit |
| Backend | FastAPI |
| GenAI | Amazon Bedrock, LangChain, LanceDB, RAGAS |
| CLI | Typer |
| Package Management | uv |