An Enterprise-Grade Data Governance Tool Powered by Generative AI
The Metadata Enhancer addresses the "blank cover" problem in enterprise data catalogs, where assets are registered without usable documentation. Using Google Gemini 2.0, it analyzes technical schemas, raw data samples, and usage logs to generate business-ready descriptions.
This project also demonstrates an end-to-end DevOps lifecycle: containerization, automated CI/CD pipelines, and Kubernetes orchestration.
graph TD
User[User] -->|Uploads Files| UI["Frontend (HTML/JS)"]
UI -->|POST /api/generate| API["FastAPI Backend"]
API -->|Parse| Parser["Parser Service"]
API -->|Context| AI["AI Service"]
AI -->|Prompt| Gemini["Google Gemini API"]
Gemini -->|Description| AI
AI -->|JSON Result| API
API -->|Response| UI
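
The request flow in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: the function names (`parse_schema`, `build_prompt`, `generate_metadata`), the schema layout with `table` and `columns` keys, and the `ask_model` callable that stands in for the Gemini call are all hypothetical.

```python
import json
from typing import Callable


def parse_schema(raw_schema: str) -> dict:
    """Parser step: read an uploaded JSON schema (assumed layout) into a plain dict."""
    schema = json.loads(raw_schema)
    columns = [
        {"name": col["name"], "type": col.get("type", "unknown")}
        for col in schema.get("columns", [])
    ]
    return {"table": schema.get("table", "unknown"), "columns": columns}


def build_prompt(parsed: dict) -> str:
    """AI step: assemble the context that would be sent to the model."""
    column_lines = [f"- {col['name']} ({col['type']})" for col in parsed["columns"]]
    return (
        f"Describe the table '{parsed['table']}' for a business audience.\n"
        "Columns:\n" + "\n".join(column_lines)
    )


def generate_metadata(raw_schema: str, ask_model: Callable[[str], str]) -> dict:
    """Backend step: parse, prompt the model, and return a JSON-serializable result."""
    parsed = parse_schema(raw_schema)
    description = ask_model(build_prompt(parsed))
    return {
        "table": parsed["table"],
        "columns": parsed["columns"],
        "description": description.strip(),
    }


if __name__ == "__main__":
    sample = '{"table": "orders", "columns": [{"name": "id", "type": "int"}]}'
    # A stub replaces the real model call so the sketch runs offline.
    print(json.dumps(generate_metadata(sample, lambda prompt: "Stub description"), indent=2))
```

Passing the model call in as a function keeps the parsing and prompt-building steps testable without network access or an API key.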
| Feature | Description |
|---|---|
| 🤖 AI-Powered Analysis | Instantly generates business context and descriptions from raw schemas. |
| 📊 Multi-Source Input | Combines insights from JSON/DDL schemas, CSV samples, and logs. |
| 🛡️ Data Quality | Automatically detects missing values, outliers, and format inconsistencies. |
| 💡 Smart Recommendations | Suggests SQL queries and analytical use cases for the data. |
| 📦 Standardized Exports | Download metadata in JSON or XML for integration with catalogs like Collibra. |
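
To make the Data Quality row concrete, here is one way missing values and numeric outliers could be detected in a CSV sample using only the standard library. It is a sketch, not the tool's implementation: the helper names `profile_column` and `profile_csv` are hypothetical, and the 1.5 × IQR rule is an assumed choice of outlier test.

```python
import csv
import io
import statistics


def profile_column(values: list) -> dict:
    """Count missing entries and flag numeric outliers with the 1.5 * IQR rule."""
    missing = sum(1 for v in values if v is None or v.strip() == "")
    numbers = []
    for v in values:
        try:
            numbers.append(float(v))
        except (TypeError, ValueError):
            continue  # Non-numeric or missing cell: skip it for outlier detection.
    outliers = []
    if len(numbers) >= 4:  # Quartiles are not meaningful on tiny samples.
        q1, _, q3 = statistics.quantiles(numbers, n=4)
        spread = q3 - q1
        low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
        outliers = [n for n in numbers if n < low or n > high]
    return {"missing": missing, "outliers": outliers}


def profile_csv(text: str) -> dict:
    """Profile every column of a CSV document given as a string."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return {
        name: profile_column([row.get(name) for row in rows])
        for name in (reader.fieldnames or [])
    }


if __name__ == "__main__":
    sample = (
        "amount,region\n10,north\n12,south\n11,\n13,east\n"
        "12,west\n11,north\n10,south\n500,east\n"
    )
    print(profile_csv(sample))
```

Format inconsistencies (for example mixed date formats) would need per-type checks that this sketch does not attempt.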
- Backend: Python 3.9, FastAPI
- Frontend: Vanilla JS, Tailwind CSS
- AI Engine: Google Gemini 2.0 Flash
- Infrastructure: Docker, Kubernetes
- CI/CD: GitHub Actions
Run the application in a containerized environment.

    # Build and run
    docker-compose up --build

Access the app at http://localhost:8000
- Install Dependencies: `pip install -r requirements.txt`
- Configure Environment: Create a `.env` file with your API key: `GEMINI_API_KEY=your_key_here`
- Run Server: `uvicorn main:app --reload`
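
A startup check along the following lines can catch a missing or placeholder key before the first request fails. This is a hedged sketch: `load_gemini_api_key` is a hypothetical helper, and it assumes the `.env` values have already been loaded into the process environment (the source does not say which loader the project uses).

```python
import os
from typing import Mapping, Optional


def load_gemini_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GEMINI_API_KEY, or raise if it is unset or still the placeholder."""
    source = os.environ if env is None else env
    key = source.get("GEMINI_API_KEY", "").strip()
    if not key or key == "your_key_here":
        raise RuntimeError(
            "GEMINI_API_KEY is not set; add it to .env or export it "
            "before starting the server."
        )
    return key


if __name__ == "__main__":
    try:
        load_gemini_api_key()
        print("GEMINI_API_KEY found")
    except RuntimeError as err:
        print(f"Configuration error: {err}")
```

The optional `env` argument exists only so the check can be exercised in tests without touching the real environment.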
This project is built with modern DevOps best practices.
- Automated Testing: Every push triggers `flake8` linting and unit tests.
- Continuous Delivery: Successful builds are automatically pushed to Docker Hub.
- Workflow: Defined in `.github/workflows/ci-cd.yml`.
Deployable to any Kubernetes (K8s) cluster, with multiple replicas for availability.

    # Deploy to cluster
    ./scripts/deploy.sh

- Scalability: Configured for 2 replicas by default.
- Security: API keys are managed via Kubernetes Secrets.
- Secret Management: No hardcoded keys. Uses `.env` for local development and K8s Secrets for production.
- Least Privilege: Docker container runs as a non-root user (configurable).
- Image Safety: Built on official `python:slim` images to reduce vulnerabilities.
This project is licensed under the MIT License - see the LICENSE file for details.