Transform your AWS infrastructure into AI-ready insights with CloudQuery, PostgreSQL, and pgvector. This demo showcases how to build AI/ML pipelines on clean cloud infrastructure data.
Prerequisites:
- Docker and Docker Compose installed
- AWS CLI configured (optional; the demo works with sample data)
Quick start:

```bash
# 1. Make scripts executable and run setup
chmod +x *.sh
./setup.sh

# 2. Run the interactive demo
./demo.sh
```
That's it! The setup script will automatically:
- Install the CloudQuery CLI
- Start PostgreSQL with the pgvector extension
- Load sample AWS infrastructure data
- Verify everything is working
This demo creates a complete AI pipeline that:
- Extracts AWS infrastructure data using CloudQuery
- Stores the data in PostgreSQL, using pgvector for vector storage and similarity search
- Generates real vector embeddings using local AI models (sentence-transformers)
- Performs AI-powered analysis including similarity search and clustering
- Provides actionable insights for cost optimization and standardization
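The README does not specify which clustering algorithm the analysis step uses. As a hypothetical illustration of how embeddings can surface groups of similar resources (useful for standardization), the sketch below uses greedy "leader" clustering: each embedding joins the first group whose leader has cosine similarity at or above a threshold, and otherwise starts a new group. The function names, the toy 2-D vectors, and the 0.9 threshold are assumptions; real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / length for x in vec]


def group_similar(embeddings: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[List[str]]:
    """Greedy leader clustering over embeddings keyed by (hypothetical) resource name.

    Each vector joins the first existing group whose leader (first member)
    has cosine similarity >= threshold; otherwise it starts a new group.
    """
    groups: List[Tuple[List[float], List[str]]] = []
    for name, vec in embeddings.items():
        unit = normalize(vec)
        for leader, members in groups:
            if sum(a * b for a, b in zip(leader, unit)) >= threshold:
                members.append(name)
                break
        else:  # no existing group was similar enough
            groups.append((unit, [name]))
    return [members for _, members in groups]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for 384-dimensional model embeddings.
    toy = {"web-1": [1.0, 0.05], "web-2": [0.98, 0.1], "db-1": [0.0, 1.0]}
    print(group_similar(toy))  # [['web-1', 'web-2'], ['db-1']]
```

Leader clustering is order-dependent but needs no preset cluster count, which keeps a demo simple; for larger inventories a standard algorithm such as scikit-learn's KMeans would give more stable groups.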
This demo uses local AI models to generate meaningful vector embeddings from your infrastructure configurations:
- Model: all-MiniLM-L6-v2 (384 dimensions, production-ready)
- Processing: Converts resource metadata to descriptive text
- Generation: Creates semantic embeddings locally, without external APIs
- Storage: Stores vectors in pgvector columns for fast similarity search
- Benefits: No API costs, works offline once the model files are available locally, and embeddings reflect semantic meaning rather than keyword overlap
The embeddings capture the semantic meaning of your infrastructure, enabling intelligent similarity analysis between resources, teams, and environments.
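To make the processing and similarity steps concrete, here is a minimal stdlib-only sketch. The hypothetical resource_to_text flattens metadata into one sentence (its field names are assumptions, not the demo's actual format), and most_similar does brute-force nearest-neighbour search by cosine similarity, the same ranking pgvector's `<=>` operator produces (it returns cosine distance, i.e. 1 minus the similarity).

```python
import math
from typing import Dict, List, Sequence, Tuple


def resource_to_text(resource: dict) -> str:
    """Flatten resource metadata into one descriptive string.

    Hypothetical field names (type, region, instance_type, tags); adapt them
    to the columns of the CloudQuery tables you actually sync.
    """
    parts = [f"{resource.get('type', 'resource')} in {resource.get('region', 'unknown region')}"]
    if "instance_type" in resource:
        parts.append(f"instance type {resource['instance_type']}")
    # Sort tags so the same resource always produces the same text.
    for key, value in sorted(resource.get("tags", {}).items()):
        parts.append(f"tag {key}={value}")
    return ", ".join(parts)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for non-zero vectors pgvector's <=> returns 1 minus this."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated instead of dividing by zero
    return dot / (norm_a * norm_b)


def most_similar(query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force k-nearest-neighbour search, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In the real pipeline the vectors would come from the model rather than being written by hand, for example `SentenceTransformer("all-MiniLM-L6-v2").encode(resource_to_text(row))` from the sentence-transformers package; inside PostgreSQL the same ranking is expressed as `ORDER BY embedding <=> query_vector`.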
Setup and demo scripts:
- setup.sh - Full automated setup (start here!)
- quickstart.sh - Quick infrastructure start for existing installations
- cleanup.sh - Reset the environment for a fresh start
- healthcheck.sh - Diagnose any issues
- demo.sh - Interactive demo with explanations

AI embedding components:
- generate_embeddings.py - Local AI embedding generation using sentence-transformers
- run_embeddings.sh - Automated embedding generation script
- Dockerfile.embeddings - Containerized embedding service
- requirements.txt - Python dependencies for AI models
Resources:
- CloudQuery Hub - Explore 100+ data source plugins
- CloudQuery Documentation - Complete setup and usage guides
- pgvector Documentation - Vector similarity search
- PostgreSQL Documentation - Database reference
This demo shows one way to use CloudQuery with AI pipelines. CloudQuery connects to 100+ data sources including:
- Cloud Providers: AWS, GCP, Azure, DigitalOcean
- SaaS Platforms: GitHub, GitLab, Slack, Jira
- Infrastructure: Kubernetes, Terraform, Docker
- Security: CrowdStrike, Okta, Auth0
- And many more...
Each plugin provides normalized, SQL-ready data that you can integrate with any AI/ML workflow, vector database, or analytics platform.
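Because everything lands in PostgreSQL, storing and querying embeddings is plain SQL. A minimal sketch, assuming a hypothetical resource_embeddings table (the demo's real schema may differ) and a DB-API driver with %s / %(name)s placeholders such as psycopg2: vectors are passed in pgvector's text input format ('[1.0,0.5,...]') and cast with ::vector, and `<=>` orders rows by cosine distance.

```python
import math
from typing import Sequence

EMBEDDING_DIM = 384  # output size of all-MiniLM-L6-v2

# Hypothetical schema; adjust names to match your own tables.
CREATE_TABLE_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS resource_embeddings (
    resource_id text PRIMARY KEY,
    description text NOT NULL,
    embedding   vector({EMBEDDING_DIM}) NOT NULL
);
"""

# Upsert so re-running the embedding job refreshes existing rows.
INSERT_SQL = """
INSERT INTO resource_embeddings (resource_id, description, embedding)
VALUES (%s, %s, %s::vector)
ON CONFLICT (resource_id) DO UPDATE
SET description = EXCLUDED.description, embedding = EXCLUDED.embedding;
"""

# <=> is pgvector's cosine-distance operator: smaller means more similar.
NEAREST_SQL = """
SELECT resource_id, description, embedding <=> %(query)s::vector AS cosine_distance
FROM resource_embeddings
ORDER BY embedding <=> %(query)s::vector
LIMIT %(k)s;
"""


def to_pgvector_literal(values: Sequence[float]) -> str:
    """Render floats in pgvector's text input format, e.g. '[1.0,0.5]'."""
    floats = [float(v) for v in values]
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("pgvector does not accept NaN or infinite values")
    return "[" + ",".join(repr(v) for v in floats) + "]"
```

With psycopg2, for example, `cur.execute(INSERT_SQL, (rid, text, to_pgvector_literal(vec)))` stores a row and `cur.execute(NEAREST_SQL, {"query": to_pgvector_literal(q), "k": 5})` returns the five closest resources. The pgvector-python package also provides driver adapters that avoid formatting vectors by hand.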
CloudQuery: The data foundation for infrastructure AI/ML pipelines