Aether is an end-to-end MLOps application that solves a core business problem: the automated creation of personalized ad copy. The system uses a classic machine learning model (K-Means Clustering) on real sales data to discover customer segments and then leverages a powerful large language model (Google's Gemini 2.5 Pro) to generate tailored ad copy for each of those segments on demand.
The entire application is built on a scalable, asynchronous, and containerized architecture using FastAPI, Celery, Docker, and PostgreSQL. It is further enhanced with a professional CI/CD pipeline for automated testing and builds, and is fully deployable to a production-like Kubernetes environment.
- Data-Driven Customer Segmentation: Uses K-Means clustering on the Olist e-commerce dataset to identify distinct customer profiles based on purchasing behavior (e.g., "High-Value Champions", "New Customers").
- On-Demand AI Ad Generation: Employs Google's Gemini 2.5 Pro API with advanced prompt engineering to create unique, high-quality ad copy for each customer segment.
- Asynchronous & Scalable Backend: Built with a modern Python stack (FastAPI, Celery, Redis) to handle multiple long-running AI tasks efficiently without blocking the user.
- Containerized for Portability: Fully containerized with Docker, allowing the entire multi-service application (`api`, `worker`, `db`, `redis`) to run consistently in any environment.
- CI/CD Automation: Includes a GitHub Actions workflow that automatically runs the test suite, builds the production Docker image, and pushes it to a container registry on every commit to the `main` branch.
- Production-Ready Deployment: Comes with complete Kubernetes manifests for deploying the application to a production-grade orchestration platform.
- Optimization-Ready: Features a `/feedback` endpoint to track ad performance (clicks/impressions), enabling a closed-loop system for future optimization and model re-training.
- Fully Tested: Includes a comprehensive suite of automated integration tests using `pytest` to ensure code quality and reliability.
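The feedback loop above can be illustrated with a small sketch: given click/impression counts per creative, compute a click-through rate (CTR) and rank creatives for future re-training. The field names here are illustrative, not the project's actual schema:

```python
# Sketch of the feedback loop: aggregate clicks/impressions per creative
# into a click-through rate (CTR). Field names are illustrative only.

def rank_creatives_by_ctr(feedback_events):
    """feedback_events: iterable of dicts like
    {"creative_id": 1, "clicks": 3, "impressions": 100}."""
    totals = {}
    for event in feedback_events:
        clicks, impressions = totals.get(event["creative_id"], (0, 0))
        totals[event["creative_id"]] = (
            clicks + event["clicks"],
            impressions + event["impressions"],
        )
    # CTR = clicks / impressions; guard against zero impressions.
    ctrs = {cid: (c / i if i else 0.0) for cid, (c, i) in totals.items()}
    return sorted(ctrs.items(), key=lambda kv: kv[1], reverse=True)
```

A ranking like this is one plausible signal for deciding which segment prompts to refine or which model to re-train.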
The system is designed as a set of communicating microservices, separating the API, background processing, and data storage into distinct, scalable containers.
+------------------------------------------------------------------------------------------------------------------------+
| THE AETHER PRODUCTION SYSTEM (Run Time) |
| |
| [Olist Kaggle Data] -> [K-Means Model] -> [Customer Segments] |
| (Runs on CPU) | |
| v |
| [User] -> [FastAPI] -> [Redis] -> [Celery Worker] --(Uses Segments to build prompt)-->[Calls Gemini 2.5 Pro API] |
| | | ^ |
| | | | (Stores Results) |
| | v +------------------+ |
| +-----> [POST /feedback] -> [Postgres DB] <------------------+ |
| (Enables Optimization) |
+------------------------------------------------------------------------------------------------------------------------+
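The `[K-Means Model] -> [Customer Segments]` step in the diagram uses scikit-learn in the real pipeline; the pure-Python sketch below only illustrates the clustering idea on toy 2-D feature vectors (e.g., spend and order count), without the Olist data or scikit-learn:

```python
# Minimal Lloyd's k-means on 2-D points, illustrating the segmentation
# step. The real project uses scikit-learn's KMeans on Olist features.
import math

def kmeans(points, k, iters=50):
    # Deterministic init for the sketch: first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k), key=lambda c: math.dist(p, centroids[c])
            )
        # Update step: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [
                    sum(dim) / len(members) for dim in zip(*members)
                ]
    return assignments, centroids
```

Each resulting cluster corresponds to one customer segment, which downstream is given a human-readable label such as "High-Value Champions".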
The application workflow is fully asynchronous to handle the time-intensive AI generation process:
- API Request: A user sends a `POST` request to the `/campaigns` endpoint with product information. The FastAPI server immediately creates a `PENDING` campaign in the PostgreSQL database.
- Task Queuing: The API places a job on a Redis queue with the campaign ID and instantly returns a `202 Accepted` response to the user.
- Background Processing: A Celery worker, running in a separate container, picks up the job from the Redis queue.
- Customer Segmentation: The worker loads a pre-trained K-Means model (built from the Olist dataset) to get a list of customer segments.
- AI Generation: For each segment, the worker constructs a detailed prompt and calls the Gemini 2.5 Pro API to generate tailored ad copy.
- Persistence: The worker saves each piece of generated ad copy to the `creatives` table in the PostgreSQL database and updates the campaign's status to `COMPLETED`.
- Result Retrieval: The user can then send a `GET` request to the `/campaigns/{id}` endpoint to retrieve the final, AI-generated results.
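The worker's generation and persistence steps can be sketched as follows. `generate_ad_copy` is a stand-in for the real Gemini API call, and the prompt template is illustrative, not the project's actual prompt:

```python
# Sketch of the worker's per-segment loop: build a prompt for each
# customer segment, "call the model", and collect rows for the
# creatives table. generate_ad_copy stands in for the Gemini call.

PROMPT_TEMPLATE = (
    "Write a short ad for the following product, targeted at the "
    "'{segment}' customer segment.\nProduct: {product_info}"
)

def generate_ad_copy(prompt):
    # Placeholder for the Gemini 2.5 Pro API call.
    return f"[ad copy for: {prompt[:40]}...]"

def run_campaign(campaign_id, product_info, segments):
    creatives = []
    for segment in segments:
        prompt = PROMPT_TEMPLATE.format(
            segment=segment, product_info=product_info
        )
        creatives.append({
            "campaign_id": campaign_id,
            "segment": segment,
            "ad_copy": generate_ad_copy(prompt),
        })
    # In the real worker these rows are saved to PostgreSQL and the
    # campaign's status is updated to COMPLETED.
    return {"status": "COMPLETED", "creatives": creatives}
```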
- Backend: FastAPI, Celery
- Database: PostgreSQL
- Cache & Message Broker: Redis
- Machine Learning: Scikit-learn, Pandas, Joblib
- Generative AI: Google Gemini 2.5 Pro API
- Infrastructure: Docker, Docker Compose, Kubernetes
- Automation (CI/CD): GitHub Actions
- Testing: Pytest
- Dependency Management: Poetry
The project is organized with a clean separation of concerns for the application source (`src`), tests, and infrastructure configurations.
aether-marketing-system/
├── .github/
│ └── workflows/
│ └── ci-pipeline.yml # CI/CD automation workflow
├── data/ # (Not in Git) Contains raw and processed data
│ ├── raw/
│ │ ├── olist_customers_dataset.csv
│ │ └── ... (all other Olist CSVs)
│ └── processed/
│ ├── customer_segmentation_model.joblib
│ └── customer_segmentation_scaler.joblib
├── kubernetes/
│ ├── api-deployment.yaml
│ ├── api-service.yaml
│ ├── configmap.yaml
│ ├── db-deployment.yaml
│ ├── redis-deployment.yaml
│ ├── secrets.yaml
│ └── worker-deployment.yaml
├── src/
│ ├── api/
│ │ ├── endpoints/
│ │ │ ├── campaigns.py
│ │ │ └── feedback.py
│ │ ├── main.py
│ │ └── schemas.py
│ ├── core/
│ │ ├── config.py
│ │ └── db.py
│ ├── models/
│ │ └── campaign_models.py
│ └── worker/
│ ├── celery_app.py
│ ├── customer_segmentation.py
│ └── tasks.py
├── tests/
│ └── test_api.py # Automated API tests
├── .env # (Not in Git) Your local secret keys
├── .env.example # Template for environment variables
├── .gitignore
├── docker-compose.yml # Local development orchestration
├── Dockerfile # Builds the application container
├── kaggle.json # (Not in Git) Your Kaggle API credentials
├── poetry.lock
├── pyproject.toml # Python dependency management
├── pytest.ini
└── README.md # This file
To run this project, you will need the following accounts and tools:
- Docker Desktop: Installed and running on your local machine.
- Kubernetes: Enabled within your Docker Desktop settings.
- Google API Key: A valid API key with access to the Gemini 2.5 Pro model, obtainable from Google AI Studio.
- Kaggle API Key: Your Kaggle username and API key, obtainable from your Kaggle Account Settings.
- Docker Hub Account: A free account at hub.docker.com for the CI/CD pipeline to push images to.
This project is designed to run in a containerized environment. The following instructions provide a clear, step-by-step flow from initial setup to running the application both locally with Docker Compose and in a production-like environment with Kubernetes.
This initial setup only needs to be performed once to prepare your machine and accounts.
Ensure the following tools are installed and running on your machine:
- Docker Desktop: For building and running the containers.
- Git: For cloning the repository.
Open Docker Desktop, navigate to Settings > Kubernetes, and check the "Enable Kubernetes" box. Click "Apply & Restart" and wait for the indicator in the bottom-left to turn green.
You will need to create accounts and generate credentials from three external services:
- Google API Key: For accessing the Gemini 2.5 Pro model. Obtain this from Google AI Studio.
- Kaggle API Key & Dataset: The project uses the public Brazilian E-Commerce Public Dataset by Olist. To download it automatically, you need an API key. Go to your Kaggle Account Settings and click "Create New Token" to download a `kaggle.json` file containing your `username` and `key`.
- Docker Hub Credentials: A username and a personal access token are required for the CI/CD pipeline. Create a free account at hub.docker.com and generate an access token under Account Settings > Security.
git clone https://github.com/Aditya-ADII/Aether-Marketing-System.git
cd Aether-Marketing-System
The application requires two secret files in the project root. These are listed in `.gitignore` to protect your credentials and must be created manually.
- Create the `.env` file: Copy the template with `cp .env.example .env`. Open the new `.env` file in a text editor and add your Google API Key and other required values.
- Create the `kaggle.json` file: Create a new file named `kaggle.json` and add your Kaggle credentials from the file you downloaded in step 3:
{ "username": "your-kaggle-username", "key": "your-kaggle-api-key" }
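Assuming both secret files live in the project root, a quick stdlib check (a convenience sketch, not part of the project) can confirm `kaggle.json` is well-formed before the first build:

```python
# Sanity-check kaggle.json before building: it must be valid JSON
# containing non-empty "username" and "key" fields.
import json

def check_kaggle_json(path="kaggle.json"):
    with open(path) as f:
        creds = json.load(f)
    missing = [k for k in ("username", "key") if not creds.get(k)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {missing}")
    return True
```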
For the automated CI/CD pipeline to work, you must add your credentials to your GitHub repository's secrets. Go to your repository's Settings > Secrets and variables > Actions and create the following four repository secrets:
- `DOCKERHUB_USERNAME`: Your Docker Hub username.
- `DOCKERHUB_TOKEN`: The access token you generated on Docker Hub.
- `KAGGLE_USERNAME`: Your Kaggle username.
- `KAGGLE_KEY`: Your Kaggle API key.
This project has three distinct operational stages: local development and testing, CI/CD automation, and deployment to Kubernetes. Following these steps will allow you to run, test, and deploy the entire application.
This workflow is for running the application and its test suite on your local machine.
Step 1: Install Dependencies
This project uses Poetry for dependency management. Install all required packages, including development dependencies like `pytest`, by running:
poetry install
Step 2: Run Automated Tests Before running the full application, you can verify the core logic by running the automated test suite. The tests use a temporary, in-memory database and will not affect your Docker environment.
pytest
A successful run will show `4 passed`, confirming the application's logic is sound.
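The in-memory database isolation the test suite relies on can be sketched with stdlib `sqlite3` (the real suite drives the FastAPI app through `pytest`; the table and column names here are illustrative):

```python
# Sketch of testing against a throwaway in-memory database — the same
# isolation idea the project's pytest suite uses. Schema is illustrative.
import sqlite3

def make_test_db():
    conn = sqlite3.connect(":memory:")  # vanishes when closed
    conn.execute(
        "CREATE TABLE campaigns (id INTEGER PRIMARY KEY, status TEXT)"
    )
    return conn

def create_campaign(conn):
    cur = conn.execute(
        "INSERT INTO campaigns (status) VALUES ('PENDING')"
    )
    conn.commit()
    return cur.lastrowid

def get_status(conn, campaign_id):
    row = conn.execute(
        "SELECT status FROM campaigns WHERE id = ?", (campaign_id,)
    ).fetchone()
    return row[0] if row else None
```

Because the database lives only in memory, every test run starts from a clean slate and never touches the Dockerized PostgreSQL instance.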
Step 3: Build and Run with Docker Compose
This single command builds the Docker image (which includes downloading the Kaggle dataset and training the segmentation model) and starts all four services (`api`, `worker`, `db`, `redis`):
docker-compose up --build
The application is now running.
- The API is available at `http://localhost:8000`.
- Interactive documentation is at `http://localhost:8000/docs`.
Step 4: Manually Test the Live Application You can now interact with the running system.
- To see the logs from all services in real-time, run:
docker-compose logs -f
- To create a campaign, open a new terminal and send a `POST` request:
Invoke-WebRequest -Uri http://localhost:8000/campaigns/ -Method POST -ContentType "application/json" -Body '{"product_info": "A new line of premium, organic coffee beans."}'
- To get the results after waiting ~1 minute, send a `GET` request:
Invoke-WebRequest http://localhost:8000/campaigns/1
A successful run will return a JSON object with `"status": "COMPLETED"` and a list of AI-generated creatives.
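The `Invoke-WebRequest` commands above are PowerShell; the same calls work from any HTTP client. As a sketch, here is the `POST` built with stdlib `urllib` — this only constructs the request; actually sending it with `urllib.request.urlopen(req)` requires the stack from Step 3 to be running:

```python
# Build the POST /campaigns/ request with stdlib urllib. Sending it
# requires the docker-compose stack to be up on localhost:8000.
import json
import urllib.request

def build_campaign_request(product_info, base="http://localhost:8000"):
    body = json.dumps({"product_info": product_info}).encode()
    return urllib.request.Request(
        f"{base}/campaigns/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```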
This workflow automates the testing and build process whenever you push code to your repository.
Step 1: The Trigger
Commit and push your code to the `main` branch on GitHub.
git push origin main
Step 2: The Automated Workflow
This push automatically triggers the GitHub Actions pipeline defined in `.github/workflows/ci-pipeline.yml`. The pipeline will execute the following jobs in the cloud:
- Install all Python dependencies using Poetry.
- Run the entire `pytest` suite to validate the code.
- Build the final, production-ready Docker image.
- Push the tagged image to your Docker Hub repository using the secrets you configured.
Step 3: Verification You can watch the pipeline run in real-time by going to the "Actions" tab on your GitHub repository. A green checkmark indicates a successful run, meaning your code has been tested and a deployable artifact has been created.
This workflow demonstrates how to deploy the final application to a production-grade environment.
Step 1: Prerequisite Ensure your CI/CD pipeline has run successfully and pushed the latest image to your Docker Hub repository.
Step 2: Update Kubernetes Manifests
In the `kubernetes/api-deployment.yaml` and `kubernetes/worker-deployment.yaml` files, ensure the `image:` field points to your correct Docker Hub repository (e.g., `aditya12121/aether-marketing-system:latest`).
Step 3: Deploy the Application From the project root, apply all Kubernetes configurations to your running local cluster:
kubectl apply -f kubernetes/
Step 4: Verify the Deployment Check that all your application "pods" (containers) are running successfully.
kubectl get pods
Wait until all pods show `Running` under the `STATUS` column with `0` restarts.
Step 5: Access and Test the Service To access the API running inside Kubernetes, open a new terminal and forward the port:
kubectl port-forward svc/aether-api-service 8080:80
The API will now be accessible at `http://localhost:8080`.
The project has been fully validated at every stage, from local testing to a live Kubernetes deployment.
A `POST` request was sent to create a campaign, and a subsequent `GET` request confirmed the successful generation of AI-powered ad copy with a `COMPLETED` status. The final results are persisted correctly in the PostgreSQL database.
The project includes a full suite of automated tests using `pytest` to ensure code quality and reliability. The successful test run confirms that all API endpoints and logic work as expected.
The GitHub Actions pipeline successfully automated the testing, building, and publishing of the application's Docker image.
The application was successfully deployed to a local Kubernetes cluster, with all pods (`api`, `worker`, `db`, `redis`) in a healthy, `Running` state.
Screenshots: the project's database tables, the Redis keys, the Docker container build, and the running containers in Docker Desktop.