A production-ready Serverless Retrieval-Augmented Generation (RAG) system built on AWS Lambda, API Gateway, Pinecone, and OpenAI. The system allows users to upload documents (PDF) and ask natural language questions via a ChatGPT-like web UI.
- 🔍 Semantic Search (RAG) using vector embeddings
- 📚 PDF ingestion & chunking
- 🧠 LLM-based answer generation (OpenAI)
- ⚡ Serverless architecture (AWS Lambda + API Gateway)
- 🐳 Container-based Lambda (ECR)
- 🌐 ChatGPT-like Web UI (pure HTML/CSS/JS)
- 🔐 Secrets managed via GitHub Secrets
- 📦 Infrastructure as Code with Terraform
- 🔄 CI/CD with GitHub Actions
```
User (Browser)
    ↓
index.html (Chat UI)
    ↓ POST /ask
AWS API Gateway
    ↓
AWS Lambda (Query Handler)
 ├─ Retrieval  → Pinecone Vector DB
 └─ Generation → OpenAI Chat Model
    ↓
Answer returned to UI
```
A separate Ingestion Lambda is used to process and embed documents.
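To make the API Gateway → Lambda hop concrete, here is a minimal sketch of what the query Lambda's entry point could look like. The answer logic is injected as a stub (`answer_fn`) so only the proxy-integration plumbing is shown; the function and field names are illustrative, not the project's actual code.

```python
import json

def handler(event, context, answer_fn=lambda q: f"(stub answer for: {q})"):
    """Sketch of a query Lambda handler behind an API Gateway proxy
    integration. The real handler would call Pinecone and OpenAI where
    answer_fn is invoked here."""
    body = json.loads(event.get("body") or "{}")
    query = body.get("query", "").strip()
    if not query:
        return _response(400, {"error": "Missing 'query' field"})
    return _response(200, {"answer": answer_fn(query)})

def _response(status, payload):
    # API Gateway proxy integrations expect exactly this shape; the CORS
    # header is what lets the static index.html call the API from a browser.
    return {
        "statusCode": status,
        "headers": {
            "Content-Type": "application/json",
            "Access-Control-Allow-Origin": "*",
        },
        "body": json.dumps(payload),
    }
```

The dependency-injected `answer_fn` also keeps the HTTP layer testable without network access.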
```
SERVERLESS_RAG_PROJECT
├── .github/
│   └── workflows/
│       └── deploy.yml          # CI/CD pipeline
├── infra/                      # Terraform IaC
│   ├── main.tf
│   ├── variables.tf
│   └── outputs.tf
├── src/
│   ├── common/
│   │   └── logger.py
│   ├── ingestion/
│   │   ├── handler.py          # Document ingestion Lambda
│   │   └── service.py
│   └── retrieval/
│       ├── handler.py          # Query Lambda
│       ├── search.py
│       └── generator.py
├── index.html                  # ChatGPT-like frontend
├── Dockerfile
├── requirements.txt
└── README.md
```
- Upload PDF documents
- Extract text per page
- Split text into chunks
- Generate embeddings using OpenAI Embeddings
- Store vectors in Pinecone
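The chunking step above can be sketched as a simple sliding window. The 800-character window and 100-character overlap are illustrative defaults, not the project's actual settings; each resulting chunk would then be embedded and upserted to Pinecone.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split extracted page text into overlapping chunks.

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from either side. Sizes here are assumptions.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if piece.strip():
            chunks.append(piece)
        if start + chunk_size >= len(text):
            break
    return chunks
```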
- User submits a question
- Convert question to embedding
- Perform similarity search in Pinecone
- Retrieve top-K relevant chunks
- Inject context into prompt
- Generate final answer via OpenAI Chat Model
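The query flow above can be sketched end to end with the external clients injected as functions (`embed_fn` → OpenAI embeddings, `search_fn` → Pinecone query, `generate_fn` → OpenAI chat). The prompt wording and `top_k=4` are illustrative assumptions.

```python
def answer_question(query: str, embed_fn, search_fn, generate_fn, top_k: int = 4) -> str:
    """Sketch of the retrieval flow; not the project's actual code."""
    query_vector = embed_fn(query)            # 1. convert question to embedding
    chunks = search_fn(query_vector, top_k)   # 2. similarity search, top-K chunks
    context = "\n\n".join(chunks)             # 3. inject context into the prompt
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
    return generate_fn(prompt)                # 4. generate final answer
```

Keeping the clients injectable makes the flow unit-testable with stubs, with the real OpenAI/Pinecone calls wired in only in the handler.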
| Layer | Technology |
|---|---|
| Frontend | HTML, CSS, Vanilla JS |
| API | AWS API Gateway |
| Compute | AWS Lambda (Container Image) |
| Container | Docker + Amazon ECR |
| Vector DB | Pinecone |
| Embeddings | OpenAI text-embedding-3-small |
| LLM | OpenAI Chat Models |
| IaC | Terraform |
| CI/CD | GitHub Actions |
Configured via GitHub Secrets and injected by Terraform:
```
OPENAI_API_KEY=
PINECONE_API_KEY=
PINECONE_INDEX_NAME=
PINECONE_NAMESPACE=default
```
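Inside the Lambdas, these variables can be read with a small loader like the sketch below (the function name and dict keys are assumptions). Failing fast on missing keys surfaces a misconfigured deploy at cold start rather than mid-request.

```python
import os

def load_config() -> dict:
    """Read the environment variables listed above.

    PINECONE_NAMESPACE falls back to "default"; the other keys
    are required and raise early if absent.
    """
    required = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX_NAME"]
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "openai_api_key": os.environ["OPENAI_API_KEY"],
        "pinecone_api_key": os.environ["PINECONE_API_KEY"],
        "pinecone_index_name": os.environ["PINECONE_INDEX_NAME"],
        "pinecone_namespace": os.environ.get("PINECONE_NAMESPACE", "default"),
    }
```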
⚠️ Changing env vars does NOT require rebuilding Docker images — only a redeploy.
Triggered on push to master:
- Checkout code
- Build Docker image
- Push image to Amazon ECR
- Run Terraform
- Update Lambda functions
Image tagging strategy:
- `latest` (moving tag)
- `${GITHUB_SHA}` (immutable)
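The pipeline steps above might look roughly like the following `deploy.yml` sketch. Job names, action versions, the `serverless-rag` repository name, the region, and the `image_tag` Terraform variable are all assumptions, not the project's actual workflow.

```yaml
name: deploy
on:
  push:
    branches: [master]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build and push image
        run: |
          IMAGE=${{ steps.ecr.outputs.registry }}/serverless-rag
          docker build -t "$IMAGE:latest" -t "$IMAGE:${GITHUB_SHA}" .
          docker push "$IMAGE:latest"
          docker push "$IMAGE:${GITHUB_SHA}"
      - name: Terraform apply
        working-directory: infra
        run: |
          terraform init
          terraform apply -auto-approve -var "image_tag=${GITHUB_SHA}"
```

Passing the immutable `${GITHUB_SHA}` tag into Terraform is what updates the Lambda functions to the new image.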
- Base image: `public.ecr.aws/lambda/python:3.12`
- Optimized for fast cold starts
- No heavy ML libraries (no `torch`, no `sentence-transformers`)
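A Dockerfile following these constraints could look like the sketch below. The details are assumptions; in particular, the `CMD` shown targets the query Lambda, and Terraform can override the image command per function so one image serves both handlers.

```dockerfile
FROM public.ecr.aws/lambda/python:3.12

# Install only the lightweight dependencies listed in requirements.txt
COPY requirements.txt .
RUN pip install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

# Copy application code into the Lambda task root
COPY src/ ${LAMBDA_TASK_ROOT}/src/

# Default handler; overridden per function at deploy time
CMD ["src.retrieval.handler.handler"]
```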
Lambda handlers:
- `src.ingestion.handler.handler`
- `src.retrieval.handler.handler`

Open `index.html` in a browser and ask questions:
POST /ask

```json
{
  "query": "What is the main content of ML.pdf?"
}
```

Features:
- ChatGPT-style UI
- User / Bot message alignment
- Loading indicator
- CORS enabled
- 💵 Pay-as-you-go (OpenAI)
- Pinecone: free / starter tier suitable for small docs
- AWS Lambda: extremely low cost for light usage
Example:
- 1 PDF (~6 pages): < $0.01 embedding cost
- Typical query: fractions of a cent
- No secrets in Docker images
- Immutable image tags
- Environment-based configuration
- Stateless Lambdas
- Warm-start optimization
- 🔐 Authentication (Cognito / JWT)
- 📎 Source citation in answers
- 🔄 Streaming responses
- 📊 Observability & metrics
- 🌍 Multi-language support
Built by Thanh – Backend Engineer
Focused on AWS, Serverless, and AI-powered systems.
This project demonstrates a clean, scalable, and production-ready RAG architecture using modern cloud-native and AI technologies.
If you are learning Serverless + GenAI, this is a solid real-world reference.