diff --git a/aws/.gitignore b/aws/.gitignore
new file mode 100644
index 0000000..6f740ae
--- /dev/null
+++ b/aws/.gitignore
@@ -0,0 +1,25 @@

# Python
__pycache__/
*.py[cod]
*.pyo
*.egg-info/
*.egg
.venv/
.env

# CDK / AWS
cdk.out*/
cdk.context.json

# Logs / test artifacts
npm-debug.log*
pip-log.txt
pytest_cache/
coverage/
htmlcov/
deployment-outputs*.json

# Mac / Linux
.DS_Store
*.swp
\ No newline at end of file
diff --git a/aws/POST-DEPLOYMENT-STEPS.md b/aws/POST-DEPLOYMENT-STEPS.md
new file mode 100644
index 0000000..27d1a4a
--- /dev/null
+++ b/aws/POST-DEPLOYMENT-STEPS.md
@@ -0,0 +1,255 @@

# 🚀 StackAI Post-Deployment Steps

After your CDK infrastructure deployment completes, follow these steps to get your StackAI application running.

## ✅ Prerequisites

1. CDK deployment completed successfully (`CREATE_COMPLETE` status)
2. Docker installed and running on your machine
3. kubectl installed
4. AWS CLI configured with appropriate permissions

## 📋 Step-by-Step Deployment

### 1. Check Infrastructure Status

```bash
# Check if CDK deployment completed
export AWS_DEFAULT_REGION=us-east-2
python3 -c "import boto3; cf = boto3.client('cloudformation', region_name='us-east-2'); print(cf.describe_stacks(StackName='StackaiEksCdkStack')['Stacks'][0]['StackStatus'])"
```

Wait until the status shows `CREATE_COMPLETE` before proceeding.

### 2. Get Connection Information

```bash
# Run the helper script to get all connection details
./get-connection-info.sh
```

This will output:

- EKS cluster name
- Aurora database endpoint
- DocumentDB endpoint
- Redis endpoint
- S3 bucket name
- Connection strings for your applications

### 3. Configure kubectl

```bash
# Configure kubectl to connect to your EKS cluster
aws eks update-kubeconfig --region us-east-2 --name [CLUSTER_NAME_FROM_STEP_2]

# Verify connection
kubectl get nodes
kubectl get pods -A
```

### 4. Build and Push Docker Images

```bash
# Build Docker images for your application services
./build-images.sh
```

This script will:

- Create ECR repositories
- Build Docker images for StackWeb, StackEnd, and StackRepl
- Push images to ECR

### 5. Update Configuration

Edit `deploy-applications.sh` and update the following:

1. **CLUSTER_NAME**: Use the name from step 2
2. **Database connection strings**: Replace placeholders with actual endpoints
3. **Docker image URIs**: Use the ECR URIs from step 4
4. **API keys and secrets**: Replace placeholder values

Example updates needed:

```bash
# Update in deploy-applications.sh
CLUSTER_NAME="your-actual-cluster-name"

# Replace placeholder connection strings
- value: "postgresql://postgres:PASSWORD@AURORA_ENDPOINT:5432/postgres"
+ value: "postgresql://postgres:YOUR_ACTUAL_PASSWORD@your-aurora-endpoint.amazonaws.com:5432/postgres"

# Replace placeholder image URIs
- image: stackai/stackweb:latest
+ image: 881490119564.dkr.ecr.us-east-2.amazonaws.com/stackai/stackweb:latest
```

### 6. Get Database Passwords

These secret IDs match the `secret_name` values defined in the CDK stack:

```bash
# Get Aurora password from Secrets Manager
aws secretsmanager get-secret-value --secret-id stackai-aurora-postgres --region us-east-2 --query SecretString --output text | jq -r .password

# Get DocumentDB password from Secrets Manager
aws secretsmanager get-secret-value --secret-id stackai-docdb-admin --region us-east-2 --query SecretString --output text | jq -r .password
```
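The same lookup can be scripted; a minimal boto3 sketch, assuming the secret names above and the `us-east-2` region (adjust to your deployment):

```python
# Sketch: read the database passwords from Secrets Manager with boto3.
# Secret names are assumed to match the secret_name values in the CDK stack.
import json

import boto3

sm = boto3.client("secretsmanager", region_name="us-east-2")

for secret_id in ("stackai-aurora-postgres", "stackai-docdb-admin"):
    raw = sm.get_secret_value(SecretId=secret_id)["SecretString"]
    creds = json.loads(raw)
    print(f"{secret_id}: username={creds.get('username')} password={creds.get('password')}")
```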
### 7. Deploy Applications

```bash
# Deploy all application services to EKS
./deploy-applications.sh
```

This will deploy:

- StackWeb (frontend)
- StackEnd (backend + Celery workers)
- StackRepl (code execution)
- Weaviate (vector database)
- Unstructured API (document processing)
- Ingress for external access

### 8. Verify Deployment

```bash
# Check if all pods are running
kubectl get pods -A

# Check ingress and load balancer status
kubectl get ingress -A
kubectl get svc -A | grep LoadBalancer

# Get load balancer URL
kubectl get ingress stackai-main-ingress -n stackweb -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
```

### 9. Database Initialization

**Supabase Database Setup:**

```bash
# Connect to Aurora and run Supabase migrations
kubectl exec -it deployment/gotrue -n supabase -- /bin/sh

# Inside the container, run database migrations
# (Supabase will automatically create required tables on first startup)
```

**StackEnd Database Setup:**

```bash
# Connect to the StackEnd backend pod
kubectl exec -it deployment/stackend-backend -n stackend -- /bin/sh

# Run database migrations
python manage.py migrate  # If using Django
# or
alembic upgrade head      # If using SQLAlchemy/Alembic
```

### 10. Configure DNS and SSL

1. **Get Load Balancer URL:**

   ```bash
   kubectl get ingress stackai-main-ingress -n stackweb -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
   ```

2. **Set up DNS records:**

   - Point `app.stackai.com` to the load balancer
   - Point `api.stackai.com` to the load balancer
   - Point `backend.stackai.com` to the load balancer

3. **Configure SSL certificates:**

   ```bash
   # Install cert-manager for automatic SSL
   kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.yaml

   # Create certificate issuer (Let's Encrypt)
   kubectl apply -f - <<EOF
   apiVersion: cert-manager.io/v1
   kind: ClusterIssuer
   metadata:
     name: letsencrypt-prod
   spec:
     acme:
       server: https://acme-v02.api.letsencrypt.org/directory
       email: admin@yourdomain.com
       privateKeySecretRef:
         name: letsencrypt-prod
       solvers:
         - http01:
             ingress:
               class: alb
   EOF
   ```

## Prerequisites

2. AWS CDK v2
   - `cdk --version` should report `>=2.XX`.

3. Python 3.9+
   - Create and activate a virtual environment for Python:

   ```bash
   python3 -m venv .venv
   source .venv/bin/activate
   ```

4. kubectl (optional, to inspect the EKS cluster after deployment)
   - Install via your package manager or `curl -LO` from the official Kubernetes releases.

---

## AWS User Setup

For security and organization, we'll create a dedicated IAM user for deploying the StackAI infrastructure.
This approach allows for: + +- **Isolated permissions** specific to this deployment +- **Easy resource tracking** through consistent tagging +- **Simple cleanup** by removing all tagged resources +- **Audit trail** for deployment activities + +### Option A: Automated Setup (Recommended) + +We provide a script that automates the entire user creation process: + +```bash +# First, configure AWS CLI with your admin credentials +aws configure + +# Run the automated setup script +./setup-aws-user.sh +``` + +The script will: + +- Create the deployment user with proper tags +- Generate and save access keys securely +- Create and attach the necessary IAM policy +- Set up the resource group for tracking +- Configure the AWS CLI profile automatically + +### Option B: Manual Setup + +If you prefer to create the user manually, follow these steps: + +### Step 1: Create the Deployment User + +First, configure your AWS CLI with administrative credentials to create the deployment user: + +```bash +# Configure AWS CLI with your admin credentials (one-time setup) +aws configure +``` + +Create the StackAI deployment user: + +```bash +# Create IAM user for StackAI deployment +aws iam create-user \ + --user-name stackai-deployment-user \ + --tags Key=Project,Value=StackAI Key=Purpose,Value=Deployment \ + --path /stackai/ + +# Create access key for the user +aws iam create-access-key \ + --user-name stackai-deployment-user > stackai-deployment-user-keys.json + +# Display the access key (save these securely!) +cat stackai-deployment-user-keys.json +``` + +### Step 2: Create and Attach IAM Policy + +Create a comprehensive policy for CDK deployment: + +```bash +# Create IAM policy for StackAI deployment +cat > stackai-deployment-policy.json << 'EOF' +{ + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "cloudformation:*", + "iam:*", + "ec2:*", + "eks:*", + "rds:*", + "docdb:*", + "elasticache:*", + "s3:*", + "secretsmanager:*", + "apigateway:*", + "lambda:*", + "logs:*", + "ses:*", + "acm:*", + "route53:*", + "elasticloadbalancing:*", + "autoscaling:*", + "ssm:*", + "kms:*", + "sts:*", + "tag:*" + ], + "Resource": "*" + }, + { + "Effect": "Allow", + "Action": [ + "iam:PassRole" + ], + "Resource": "*" + } + ] +} +EOF + +# Create the policy +aws iam create-policy \ + --policy-name StackAIDeploymentPolicy \ + --policy-document file://stackai-deployment-policy.json \ + --description "Policy for StackAI CDK deployment" \ + --tags Key=Project,Value=StackAI Key=Purpose,Value=Deployment + +# Attach policy to user +aws iam attach-user-policy \ + --user-name stackai-deployment-user \ + --policy-arn "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/StackAIDeploymentPolicy" +``` + +### Step 3: Create Resource Group for Easy Management + +Create a resource group to track all StackAI resources: + +```bash +# Create resource group for StackAI resources +aws resource-groups create-group \ + --name "StackAI-Infrastructure" \ + --description "All AWS resources for StackAI deployment" \ + --resource-query '{ + "Type": "TAG_FILTERS_1_0", + "Query": "{\"ResourceTypeFilters\":[\"AWS::AllSupported\"],\"TagFilters\":[{\"Key\":\"Project\",\"Values\":[\"StackAI\"]}]}" + }' \ + --tags Project=StackAI,Environment=Production,ManagedBy=CDK +``` + +### Step 4: Configure AWS CLI with Deployment User + +Configure your AWS CLI to use the new deployment user: + +```bash +# Configure AWS CLI with deployment user credentials +aws configure --profile stackai-deployment +# Enter the 
AccessKeyId and SecretAccessKey from stackai-deployment-user-keys.json +# Set your preferred region (e.g., us-east-1) +# Set output format to json + +# Set the profile as default for this session +export AWS_PROFILE=stackai-deployment + +# Verify the configuration +aws sts get-caller-identity +``` + +### Step 5: Resource Cleanup (When Needed) + +#### Option A: Automated Cleanup (Recommended) + +Use the provided cleanup script to remove everything: + +```bash +# Run the automated cleanup script +./cleanup-aws-resources.sh +``` + +The script will automatically: + +- Destroy the CDK stack +- Find and list any remaining tagged resources +- Remove the resource group +- Delete the deployment user and access keys +- Remove the IAM policy +- Clean up local credential files + +#### Option B: Manual Cleanup + +When you want to remove all StackAI infrastructure manually: + +```bash +# First, destroy the CDK stack +cdk destroy StackaiEksCdkStack --force + +# Find and delete any remaining tagged resources +aws resourcegroupstaggingapi get-resources \ + --tag-filters Key=Project,Values=StackAI \ + --query 'ResourceTagMappingList[].ResourceARN' \ + --output table + +# Clean up the deployment user and policies (optional) +aws iam detach-user-policy \ + --user-name stackai-deployment-user \ + --policy-arn "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/StackAIDeploymentPolicy" + +aws iam delete-access-key \ + --user-name stackai-deployment-user \ + --access-key-id $(cat stackai-deployment-user-keys.json | jq -r '.AccessKey.AccessKeyId') + +aws iam delete-user --user-name stackai-deployment-user + +aws iam delete-policy \ + --policy-arn "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):policy/StackAIDeploymentPolicy" + +# Remove local credential files +rm -f stackai-deployment-user-keys.json stackai-deployment-policy.json +``` + +--- + +## Repository Structure + +```txt +stackai_eks_cdk/ +β”œβ”€β”€ bin/ +β”‚ └── stackai_eks_cdk.py # CDK entry point +β”œβ”€β”€ lib/ +β”‚ └── stackai_eks_cdk_stack.py # Main CDK stack definition +β”œβ”€β”€ k8s/ +β”‚ β”œβ”€β”€ supabase/ +β”‚ β”‚ β”œβ”€β”€ goTrue_deployment.yaml +β”‚ β”‚ β”œβ”€β”€ postgrest_deployment.yaml +β”‚ β”‚ β”œβ”€β”€ realtime_deployment.yaml +β”‚ β”‚ β”œβ”€β”€ storage_deployment.yaml +β”‚ β”‚ β”œβ”€β”€ pgmeta_deployment.yaml +β”‚ β”‚ β”œβ”€β”€ edgefunctions_deployment.yaml +β”‚ β”‚ β”œβ”€β”€ configmaps_and_secrets.yaml +β”‚ β”‚ └── ingress_rules.yaml +β”‚ β”œβ”€β”€ weaviate_deployment.yaml +β”‚ β”œβ”€β”€ unstructured_deployment.yaml +β”‚ β”œβ”€β”€ stackweb_deployment.yaml +β”‚ β”œβ”€β”€ stackend_backend_deployment.yaml +β”‚ β”œβ”€β”€ stackend_celery_deployment.yaml +β”‚ β”œβ”€β”€ stackrepl_deployment.yaml +β”‚ └── common/ +β”‚ β”œβ”€β”€ namespace.yaml +β”‚ β”œβ”€β”€ serviceaccount_irsa.yaml # IRSA YAML for pods needing AWS permissions +β”‚ └── rbac.yaml # RBAC for AWS Load Balancer Controller, etc. +β”œβ”€β”€ requirements.txt # Python dependencies +β”œβ”€β”€ cdk.json # CDK config (points to bin/stackai_eks_cdk.py) +└── README.md # This file +``` + + β€’ bin/stackai_eks_cdk.py β€” Entry point that instantiates the single CDK stack. + β€’ lib/stackai_eks_cdk_stack.py β€” Defines all AWS resources, EKS cluster, managed services, IRSA roles, and applies Kubernetes manifests. + β€’ k8s/ β€” Contains raw Kubernetes YAML files for each service, grouped by folder. These can be used to generate or update manifests, but CDK applies them directly via cluster.add_manifest(...). 
• requirements.txt — Lists aws-cdk-lib and constructs versions (for CDK v2 Python).

---

## Installation

### 1. Clone the repository

```bash
git clone https://github.com/stackai/stackai-onprem
cd stackai-onprem/aws
```

### 2. Create and activate a Python virtual environment

```bash
python3 -m venv .venv
source .venv/bin/activate
```

### 3. Install Python dependencies

```bash
pip install -r requirements.txt
```

### 4. (Optional) Install CDK CLI

If you haven't installed CDK v2 globally, you can install it via npm:

```bash
npm install -g aws-cdk@2.x
```

### 5. Verify CDK version

```bash
cdk --version  # should be 2.x.y
```

---

## Bootstrapping & Deployment

⚠️ **Important**: Complete the [AWS User Setup](#aws-user-setup) section first to create the deployment user and configure your AWS CLI.

Before deploying any CDK v2 stacks, you must bootstrap the AWS environment once. The bootstrap step provisions an S3 bucket and IAM roles that CDK uses to store synthesized templates and assets.

### 1. Bootstrap the environment

```bash
# Make sure you're using the stackai-deployment profile
export AWS_PROFILE=stackai-deployment

# Verify your identity
aws sts get-caller-identity

# Bootstrap CDK for your account and region
cdk bootstrap
```

If you need to specify a different region:

```bash
cdk bootstrap aws://$(aws sts get-caller-identity --query Account --output text)/us-west-2
```

### 2. Synthesize CloudFormation templates (optional)

```bash
cdk synth
```

Generates the CloudFormation YAML/JSON in the `cdk.out/` folder.

### 3. Deploy the stack

```bash
cdk deploy StackaiEksCdkStack --require-approval never
```

This will provision all AWS resources described in `lib/stackai_eks_cdk_stack.py`. CDK will display progress and output the following values when complete:

- `EksClusterName`
- `DocumentDbEndpoint`
- `AuroraPostgresEndpoint`
- `RedisEndpoint`
- `SupabaseStorageBucketName`
- `EdgeFunctionsApiUrl`

### 4. (Optional) Kubeconfig setup

After the EKS cluster is created, CDK outputs the command to update your local kubeconfig so you can interact with the cluster via kubectl:

```bash
aws eks update-kubeconfig --name <CLUSTER_NAME> --region <REGION>
kubectl get nodes  # verify that EKS nodes are ready
```
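If you would rather consume these outputs from a script than copy them by hand, they can be read back with boto3. A minimal sketch — the output keys are assumed to match the `CfnOutput` names above, and CDK may prefix them with the construct path:

```python
# Sketch: read the stack outputs listed above after `cdk deploy`.
import boto3

cf = boto3.client("cloudformation", region_name="us-east-2")
stack = cf.describe_stacks(StackName="StackaiEksCdkStack")["Stacks"][0]
outputs = {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}

for key in ("EksClusterName", "AuroraPostgresEndpoint", "RedisEndpoint"):
    print(key, "=", outputs.get(key))
```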
---

## Project Components

### 1. VPC & Subnets

- A new VPC with two AZs.
- One public and one private subnet in each AZ.
- A NAT Gateway in a public subnet to allow outbound traffic from private subnets.

### 2. Amazon DocumentDB (MongoDB)

- A Secrets Manager secret (`stackai-docdb-admin`) is created with a random password for `docdb_admin`.
- A DocumentDB cluster in private subnets.
- Security Group allowing the EKS node security group to connect on port 27017.

### 3. Amazon Aurora Serverless v2 (PostgreSQL)

- A Serverless v2 Aurora cluster with an autogenerated Secrets Manager secret (`stackai-aurora-postgres`).
- Runs in private subnets.
- Security Group allowing EKS nodes to connect on port 5432.
- Scales from 0.5 ACU up to 16 ACU.

### 4. Amazon ElastiCache Redis

- A single-node Redis cluster in private subnets.
- Security Group allowing EKS nodes to connect on port 6379.

### 5. Amazon S3 Bucket (Supabase Storage)

- Encrypted, private S3 bucket named `SupabaseStorageBucket`.
- Auto-delete objects and removal policy = `DESTROY` (for dev/test; change in production).

### 6. Amazon SES Identity (GoTrue email)

- Assumes you have already verified an SES identity (`noreply@mydomain.com`).
- IAM Role / Service Account (IRSA) for GoTrue pods to call `ses:SendEmail` and `ses:SendRawEmail`.

### 7. IAM Roles & IRSA (Service Accounts)

- `EksAdminRole`: master role for the EKS cluster (optional, attaches to the cluster control plane).
- `GoTrueSA`: Kubernetes ServiceAccount in the `supabase` namespace allowing SES access.
- `StorageSA`: ServiceAccount in the `supabase` namespace with `s3:PutObject`, `s3:GetObject`, etc., on the Supabase bucket.
- `SupabaseDbSecret`: Secret in Secrets Manager with username/password for Supabase containers (pg-pooler, PostgREST, Realtime, Storage, pg-meta).
- `CelerySA`: ServiceAccount in the `stackend` namespace (no AWS API calls needed if Redis/RDS are accessed by hostname).
- Additional IRSA manifests may be added in `k8s/common/serviceaccount_irsa.yaml` for other pod permissions.

### 8. Amazon EKS Cluster

- Created across two AZs.
- Kubernetes version 1.28.
- Primary managed node group of t3.large/t3.xlarge on-demand instances (2 to 10 nodes), plus a spot node group for cost optimization.
- Endpoint access set to public & private.
- Add-on: AWS Load Balancer Controller and an IAM role for it (RBAC and permissions defined in `k8s/common/rbac.yaml`).

### 9. Kubernetes Namespaces & RBAC

- Namespaces: `supabase`, `weaviate`, `unstructured`, `stackweb`, `stackend`, `stackrepl`.
- A ConfigMap in `kube-system` for ALB Ingress (cluster name and ingress class).
- Additional RBAC manifests for the AWS Load Balancer Controller in `k8s/common/rbac.yaml`.

### 10. Kubernetes Manifests

Each service has its own Deployment, Service, and Ingress (where applicable). They live under `k8s/` and are applied via CDK:

- `supabase/`
  - `configmaps_and_secrets.yaml` (defines `supabase-env-secret` containing all .env-style values for GoTrue, PostgREST, Realtime, Storage, pg-meta, pg-pooler).
  - `goTrue_deployment.yaml`, `postgrest_deployment.yaml`, etc. (each Deployment + Service).
  - `ingress_rules.yaml` (Ingress resource with path-based routing in the `supabase` namespace).
- `weaviate_deployment.yaml` (Deployment, Service, Ingress).
- `unstructured_deployment.yaml` (Deployment, Service, Ingress).
- `stackweb_deployment.yaml` (Deployment, Service, Ingress).
- `stackend_backend_deployment.yaml` (Deployment, Service, Ingress).
- `stackend_celery_deployment.yaml` (Deployment only; Celery workers do not expose services).
- `stackrepl_deployment.yaml` (Deployment, Service).
- `common/`
  - `namespace.yaml` (defines all namespaces).
  - `serviceaccount_irsa.yaml` (IRSA ServiceAccount bindings, if additional pods need AWS permissions).
  - `rbac.yaml` (ClusterRole, ClusterRoleBinding for the AWS Load Balancer Controller, etc.).

### 11. API Gateway & Lambda (Supabase Edge Functions)

- Creates a single Lambda function (`EdgeFunctionLambda`) with inline "Hello world" code.
- Creates an API Gateway REST API named `StackAiEdgeFunctionsApi`.
- Defines a proxy resource `/functions/{proxy+}` that forwards all requests to the Lambda.

---

## Kubernetes Manifests

All raw YAML manifests are checked in under the `k8s/` directory. CDK uses `cluster.add_manifest(...)` (inline JSON/YAML) to apply them directly. If you wish to modify or regenerate manifests, edit the YAML files and update the CDK stack accordingly; the sketch below shows the pattern.
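A minimal sketch of that pattern, assuming PyYAML is available in the CDK app's environment (the helper name and example path are illustrative):

```python
# Sketch: parse a (possibly multi-document) YAML file and apply it
# through cluster.add_manifest, which takes plain dicts.
import yaml  # PyYAML; assumed to be listed in requirements.txt


def apply_yaml(cluster, construct_id: str, path: str) -> None:
    """Apply every document in the given manifest file to the cluster."""
    with open(path) as f:
        docs = [doc for doc in yaml.safe_load_all(f) if doc]
    cluster.add_manifest(construct_id, *docs)


# Example: apply_yaml(self.cluster, "Namespaces", "k8s/common/namespace.yaml")
```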
Key files:

- `k8s/common/namespace.yaml`
- `k8s/common/serviceaccount_irsa.yaml`
- `k8s/common/rbac.yaml`
- `k8s/supabase/configmaps_and_secrets.yaml`
- `k8s/supabase/goTrue_deployment.yaml`
- `k8s/supabase/postgrest_deployment.yaml`
- `k8s/supabase/realtime_deployment.yaml`
- `k8s/supabase/storage_deployment.yaml`
- `k8s/supabase/pgmeta_deployment.yaml`
- `k8s/supabase/edgefunctions_deployment.yaml`
- `k8s/supabase/ingress_rules.yaml`
- `k8s/weaviate_deployment.yaml`
- `k8s/unstructured_deployment.yaml`
- `k8s/stackweb_deployment.yaml`
- `k8s/stackend_backend_deployment.yaml`
- `k8s/stackend_celery_deployment.yaml`
- `k8s/stackrepl_deployment.yaml`

---

## Environment Variables & Secrets

All environment variables required by Supabase components (GoTrue, PostgREST, Realtime, Storage, pg-meta, pg-pooler) are consolidated into a single Kubernetes Secret called `supabase-env-secret` in the `supabase` namespace. Example keys include:

- `PGMETA_DB_PASSWORD` → pulled from the Aurora Serverless v2 secret
- `GOTRUE_JWT_SECRET`, `POSTGREST_JWT_SECRET` → random JWT secrets you must replace before deployment
- `GOTRUE_SMTP_PASSWORD` → SES SMTP password (store in Secrets Manager or directly in `supabase-env-secret` if using a test account)
- `SUPABASE_ANON_KEY`, `SUPABASE_SERVICE_KEY` → your own generated Supabase keys

Rather than hardcoding secrets in manifests, CDK references `rds_secret.secret_value_from_json("password")` for the Aurora password. DocumentDB admin credentials are stored in `stackai-docdb-admin` (Secrets Manager) and injected into pods that need MongoDB access via IRSA or environment variables where applicable.

If you need to add more variables, edit `stackai_eks_cdk_stack.py` under the `SupabaseConfigAndSecrets` manifest, and update the corresponding YAML in `k8s/supabase/configmaps_and_secrets.yaml`.
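As a sketch of that wiring — names and keys are illustrative, not the exact stack code — the consolidated Secret can be built as a plain dict, with the Aurora password resolved from Secrets Manager at deploy time:

```python
# Sketch: assemble supabase-env-secret in the CDK stack. rds_secret is the
# generated Aurora secret; unsafe_unwrap() embeds a deploy-time reference,
# not a literal password, into the synthesized template.
supabase_env_secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "supabase-env-secret", "namespace": "supabase"},
    "type": "Opaque",
    "stringData": {
        "PGMETA_DB_PASSWORD": rds_secret.secret_value_from_json("password").unsafe_unwrap(),
        "GOTRUE_JWT_SECRET": "REPLACE_ME_BEFORE_DEPLOY",   # random JWT secret
        "POSTGREST_JWT_SECRET": "REPLACE_ME_BEFORE_DEPLOY",
    },
}
cluster.add_manifest("SupabaseConfigAndSecrets", supabase_env_secret)
```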
Replace `<ALB_DNS>` in the commands below:

```bash
# Supabase GoTrue health
curl http://<ALB_DNS>/auth/health

# Supabase PostgREST
curl http://<ALB_DNS>/rest/v1/

# Supabase GraphQL (if enabled)
curl http://<ALB_DNS>/graphql/v1/

# Supabase Realtime
curl http://<ALB_DNS>/realtime/health

# Supabase Storage
curl http://<ALB_DNS>/storage/v1/buckets

# Supabase pg-meta
curl http://<ALB_DNS>/pg/health

# StackWeb UI
curl http://<ALB_DNS>/

# StackEnd (backend) health
curl http://<ALB_DNS>/stackend/health

# Unstructured API
curl http://<ALB_DNS>/unstructured/

# Supabase Edge Functions
curl <EdgeFunctionsApiUrl>/functions/hello

# Weaviate (if domain set up)
curl http://weaviate.yourdomain.com/

# Unstructured (if domain set up)
curl http://unstructured.yourdomain.com/unstructured/

# StackWeb (if domain set up)
curl http://app.yourdomain.com/

# StackEnd (if domain set up)
curl http://backend.yourdomain.com/stackend/health
```

---

## Further Improvements

- **HTTPS / TLS**
  - Request or import an ACM certificate, and update Ingress annotations to reference the certificate's ARN. Example:

    ```yaml
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789012:certificate/abcd-efgh-ijkl-...
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    ```

- **Kubernetes Secrets Store CSI Driver**
  - Instead of storing sensitive values in Kubernetes `secret.stringData`, use the Secrets Store CSI driver to fetch AWS Secrets Manager or Parameter Store values at runtime.
- **Autoscaling**
  - Add Horizontal Pod Autoscalers (HPA) for high-traffic services (e.g., Supabase Realtime, StackEnd backend).
  - Configure the EKS Cluster Autoscaler to scale EC2 instances based on pod demands.
  - Fine-tune the Aurora auto-scaling configuration (min/max ACU).
- **Monitoring & Logging**
  - Enable CloudWatch Container Insights for EKS.
  - Deploy Prometheus & Grafana in the cluster for detailed metrics.
  - Configure CloudWatch Alarms for critical metrics (e.g., high CPU/memory on pods, RDS CPU utilization, Redis memory).
- **CI/CD Pipelines**
  - Integrate with GitHub Actions, CodePipeline, or other CI/CD tools to automate `cdk synth` and `cdk deploy`.
  - Implement canary or blue/green deployments for Kubernetes workloads (e.g., via Argo Rollouts, Flagger).
- **Production Hardening**
  - Adjust removal policies (e.g., set to `RETAIN` instead of `DESTROY`).
  - Enable encryption at rest for Aurora, DocumentDB, ElastiCache, and S3.
  - Configure multi-AZ replication for production-level resilience.

---

## License

This project is licensed under the MIT License. See the LICENSE file for details.

diff --git a/aws/aws-architecture.png b/aws/aws-architecture.png
new file mode 100644
index 0000000..ae31578
Binary files /dev/null and b/aws/aws-architecture.png differ
diff --git a/aws/bin/stackai_eks_cdk.py b/aws/bin/stackai_eks_cdk.py
new file mode 100644
index 0000000..b02859d
--- /dev/null
+++ b/aws/bin/stackai_eks_cdk.py
@@ -0,0 +1,42 @@

#!/usr/bin/env python3
"""
StackAI EKS CDK Application Entry Point

This application creates a complete AWS infrastructure stack to replace
the Docker Compose "stackai-onprem" setup with a fully managed AWS EKS-based architecture.
+""" +import os +import sys +from pathlib import Path + +# Add the parent directory to the Python path so we can import from lib +sys.path.append(str(Path(__file__).parent.parent)) + +import aws_cdk as cdk +from lib.stackai_eks_cdk_stack import StackaiEksCdkStack + +# Get environment variables for stack configuration +account = os.environ.get('CDK_DEFAULT_ACCOUNT', cdk.Aws.ACCOUNT_ID) +region = os.environ.get('CDK_DEFAULT_REGION', 'us-east-1') + +app = cdk.App() + +# Create the main stack +StackaiEksCdkStack( + app, + "StackaiEksCdkStack", + description="StackAI on AWS EKS - Complete infrastructure for running StackAI on managed AWS services", + env=cdk.Environment( + account=account, + region=region + ), + # Add tags to all resources for better management + tags={ + "Project": "StackAI", + "Environment": "Production", + "ManagedBy": "CDK", + "Owner": "StackAI-Team" + } +) + +app.synth() \ No newline at end of file diff --git a/aws/build-images.sh b/aws/build-images.sh new file mode 100755 index 0000000..ad3af05 --- /dev/null +++ b/aws/build-images.sh @@ -0,0 +1,148 @@ +#!/bin/bash + +# Build and Push StackAI Docker Images +# This script builds Docker images from the source code and pushes them to a registry + +set -e + +# Configuration - UPDATE THESE VALUES +ECR_REGISTRY="881490119564.dkr.ecr.us-east-2.amazonaws.com" +REGION="us-east-2" +PROJECT_ROOT="/Users/alfonso.hernandez/Documents/StackAI/stackai-onprem" + +echo "🐳 Building and pushing StackAI Docker images..." + +# 1. Login to ECR +echo "πŸ” Logging into ECR..." +aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ECR_REGISTRY + +# 2. Create ECR repositories if they don't exist +echo "πŸ“¦ Creating ECR repositories..." +for repo in stackai/stackweb stackai/stackend stackai/stackrepl; do + aws ecr describe-repositories --repository-names $repo --region $REGION >/dev/null 2>&1 || \ + aws ecr create-repository --repository-name $repo --region $REGION +done + +# 3. Build StackWeb image +echo "🌐 Building StackWeb image..." +cd $PROJECT_ROOT/stackweb + +# Create Dockerfile for StackWeb if it doesn't exist +if [ ! -f Dockerfile ]; then +cat > Dockerfile < Dockerfile < Dockerfile </dev/null) +if [ $? -eq 0 ]; then + echo "πŸ“‹ Stack Status: $STACK_STATUS" +else + echo "❓ Stack not found or not accessible" +fi +echo + +# Check recent CloudFormation events +echo "3. Recent CloudFormation events (last 5)..." +aws cloudformation describe-stack-events \ + --stack-name StackaiEksCdkStack \ + --region us-east-1 \ + --query 'StackEvents[0:5].[Timestamp,LogicalResourceId,ResourceStatus,ResourceStatusReason]' \ + --output table 2>/dev/null || echo "❓ Could not retrieve stack events" +echo + +# Check EKS cluster if it exists +echo "4. Checking EKS cluster..." +CLUSTER_STATUS=$(aws eks describe-cluster --name StackAiEksCluster --region us-east-1 --query 'cluster.status' --output text 2>/dev/null) +if [ $? 
-eq 0 ]; then + echo "🎯 EKS Cluster Status: $CLUSTER_STATUS" +else + echo "❓ EKS cluster not found or not accessible yet" +fi +echo + +echo "=== Check complete ===" +echo "πŸ’‘ Tip: Run this script again in a few minutes to see progress" \ No newline at end of file diff --git a/aws/cleanup-aws-resources.sh b/aws/cleanup-aws-resources.sh new file mode 100755 index 0000000..3337435 --- /dev/null +++ b/aws/cleanup-aws-resources.sh @@ -0,0 +1,258 @@ +#!/bin/bash + +# StackAI AWS Resources Cleanup Script +# This script removes all StackAI-related AWS resources and the deployment user + +set -e + +echo "🧹 StackAI AWS Resources Cleanup" +echo "================================" +echo "" + +# Colors for output +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[1;33m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Warning +echo -e "${RED}⚠️ WARNING: This will DELETE ALL StackAI AWS resources!${NC}" +echo -e "${RED}⚠️ This action is IRREVERSIBLE!${NC}" +echo "" +read -p "Are you sure you want to continue? (type 'yes' to confirm): " CONFIRM + +if [ "$CONFIRM" != "yes" ]; then + echo -e "${YELLOW}❌ Cleanup cancelled.${NC}" + exit 0 +fi + +# Check if AWS CLI is installed +if ! command -v aws &> /dev/null; then + echo -e "${RED}❌ AWS CLI not found. Please install AWS CLI first.${NC}" + exit 1 +fi + +# Check if jq is installed +if ! command -v jq &> /dev/null; then + echo -e "${RED}❌ jq not found. Please install jq first.${NC}" + exit 1 +fi + +# Step 1: Destroy CDK Stack +echo -e "${BLUE}πŸ—οΈ Step 1: Destroying CDK stack...${NC}" + +# Try with stackai-deployment profile first +if aws sts get-caller-identity --profile stackai-deployment &> /dev/null; then + export AWS_PROFILE=stackai-deployment + echo -e "${BLUE}Using stackai-deployment profile${NC}" +fi + +# Check if stack exists +if aws cloudformation describe-stacks --stack-name StackaiEksCdkStack &> /dev/null; then + echo -e "${YELLOW}πŸ”„ Destroying StackaiEksCdkStack...${NC}" + + # First try to delete stuck Redis resources manually + echo -e "${BLUE}πŸ”§ Attempting to clean up stuck Redis resources...${NC}" + + # Find and delete Redis replication groups + REDIS_GROUPS=$(aws elasticache describe-replication-groups --query 'ReplicationGroups[?contains(ReplicationGroupId, `stmwua22zqbpf87`)].ReplicationGroupId' --output text 2>/dev/null || echo "") + if [ ! 
-z "$REDIS_GROUPS" ]; then + for REDIS_GROUP in $REDIS_GROUPS; do + echo -e "${YELLOW}πŸ—‘οΈ Deleting Redis cluster: $REDIS_GROUP${NC}" + aws elasticache delete-replication-group --replication-group-id "$REDIS_GROUP" --no-retain-primary-cluster 2>/dev/null || echo -e "${YELLOW}⚠️ Failed to delete Redis cluster${NC}" + done + + # Wait for Redis deletion + echo -e "${BLUE}⏳ Waiting 60 seconds for Redis cleanup...${NC}" + sleep 60 + fi + + # Try CDK destroy if available and virtual env is set up + if command -v cdk &> /dev/null && [ -d ".venv" ]; then + echo -e "${BLUE}πŸ”„ Activating virtual environment and using CDK...${NC}" + source .venv/bin/activate + cdk destroy StackaiEksCdkStack --force 2>/dev/null || { + echo -e "${YELLOW}⚠️ CDK destroy failed, trying direct CloudFormation deletion...${NC}" + aws cloudformation delete-stack --stack-name StackaiEksCdkStack + } + else + echo -e "${BLUE}πŸ”„ Using direct CloudFormation deletion...${NC}" + aws cloudformation delete-stack --stack-name StackaiEksCdkStack + fi + + # Wait for stack deletion + echo -e "${BLUE}⏳ Waiting for stack deletion to complete...${NC}" + aws cloudformation wait stack-delete-complete --stack-name StackaiEksCdkStack 2>/dev/null || { + echo -e "${YELLOW}⚠️ Stack deletion may have failed or timed out${NC}" + echo -e "${BLUE}πŸ’‘ Check AWS Console for manual cleanup if needed${NC}" + } + + echo -e "${GREEN}βœ… CDK stack destruction attempted${NC}" +else + echo -e "${YELLOW}⚠️ StackaiEksCdkStack not found or already destroyed${NC}" +fi + +# Step 2: Find and clean up remaining tagged resources +echo -e "\n${BLUE}πŸ” Step 2: Finding and cleaning up remaining tagged resources...${NC}" + +# Switch to admin credentials for cleanup +unset AWS_PROFILE +if ! aws sts get-caller-identity &> /dev/null; then + echo -e "${RED}❌ No admin AWS credentials configured. Please run 'aws configure' first.${NC}" + exit 1 +fi + +TAGGED_RESOURCES=$(aws resourcegroupstaggingapi get-resources \ + --tag-filters Key=Project,Values=StackAI \ + --query 'ResourceTagMappingList[].ResourceARN' \ + --output text 2>/dev/null || echo "") + +if [ ! -z "$TAGGED_RESOURCES" ]; then + echo -e "${YELLOW}⚠️ Found remaining tagged resources:${NC}" + echo "$TAGGED_RESOURCES" | tr '\t' '\n' + echo "" + + # Clean up specific resource types + echo -e "${BLUE}🧹 Attempting to clean up remaining resources...${NC}" + + # Clean up subnets + SUBNETS=$(echo "$TAGGED_RESOURCES" | grep "subnet/" | sed 's/.*subnet\///') + for SUBNET in $SUBNETS; do + if [ ! -z "$SUBNET" ]; then + echo -e "${YELLOW}πŸ—‘οΈ Deleting subnet: $SUBNET${NC}" + aws ec2 delete-subnet --subnet-id "$SUBNET" 2>/dev/null || echo -e "${YELLOW}⚠️ Failed to delete subnet $SUBNET${NC}" + fi + done + + # Clean up log groups + LOG_GROUPS=$(echo "$TAGGED_RESOURCES" | grep "log-group:" | sed 's/.*log-group://') + for LOG_GROUP in $LOG_GROUPS; do + if [ ! 
-z "$LOG_GROUP" ]; then + echo -e "${YELLOW}πŸ—‘οΈ Deleting log group: $LOG_GROUP${NC}" + aws logs delete-log-group --log-group-name "$LOG_GROUP" 2>/dev/null || echo -e "${YELLOW}⚠️ Failed to delete log group $LOG_GROUP${NC}" + fi + done + + echo -e "${GREEN}βœ… Cleanup of remaining resources attempted${NC}" +else + echo -e "${GREEN}βœ… No remaining tagged resources found${NC}" +fi + +# Step 3: Remove resource group +echo -e "\n${BLUE}πŸ“¦ Step 3: Removing resource group...${NC}" + +if aws resource-groups get-group --group-name "StackAI-Infrastructure" &> /dev/null; then + aws resource-groups delete-group --group-name "StackAI-Infrastructure" + echo -e "${GREEN}βœ… Resource group removed${NC}" +else + echo -e "${YELLOW}⚠️ Resource group 'StackAI-Infrastructure' not found${NC}" +fi + +# Step 4: Clean up deployment user +echo -e "\n${BLUE}πŸ‘€ Step 4: Cleaning up deployment user...${NC}" + +ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text) +POLICY_ARN="arn:aws:iam::${ACCOUNT_ID}:policy/StackAIDeploymentPolicy" + +# Clean up deployment user +if aws iam get-user --user-name stackai-deployment-user &> /dev/null; then + echo -e "${BLUE}πŸ”— Detaching all policies from user...${NC}" + + # List and detach all attached policies + ATTACHED_POLICIES=$(aws iam list-attached-user-policies --user-name stackai-deployment-user --query 'AttachedPolicies[].PolicyArn' --output text 2>/dev/null || echo "") + for POLICY in $ATTACHED_POLICIES; do + if [ ! -z "$POLICY" ]; then + aws iam detach-user-policy --user-name stackai-deployment-user --policy-arn "$POLICY" 2>/dev/null || echo -e "${YELLOW}⚠️ Failed to detach policy $POLICY${NC}" + echo -e "${GREEN}βœ… Detached policy: $POLICY${NC}" + fi + done + + # List and delete all inline policies + INLINE_POLICIES=$(aws iam list-user-policies --user-name stackai-deployment-user --query 'PolicyNames[]' --output text 2>/dev/null || echo "") + for POLICY_NAME in $INLINE_POLICIES; do + if [ ! -z "$POLICY_NAME" ]; then + aws iam delete-user-policy --user-name stackai-deployment-user --policy-name "$POLICY_NAME" 2>/dev/null || echo -e "${YELLOW}⚠️ Failed to delete inline policy $POLICY_NAME${NC}" + echo -e "${GREEN}βœ… Deleted inline policy: $POLICY_NAME${NC}" + fi + done + + # Delete access keys + echo -e "${BLUE}πŸ”‘ Deleting access keys...${NC}" + ACCESS_KEYS=$(aws iam list-access-keys --user-name stackai-deployment-user --query 'AccessKeyMetadata[].AccessKeyId' --output text 2>/dev/null || echo "") + for ACCESS_KEY in $ACCESS_KEYS; do + if [ ! 
-z "$ACCESS_KEY" ]; then + aws iam delete-access-key \ + --user-name stackai-deployment-user \ + --access-key-id "$ACCESS_KEY" 2>/dev/null || echo -e "${YELLOW}⚠️ Failed to delete access key $ACCESS_KEY${NC}" + echo -e "${GREEN}βœ… Access key $ACCESS_KEY deleted${NC}" + fi + done + + # Delete user + echo -e "${BLUE}πŸ‘€ Deleting user...${NC}" + aws iam delete-user --user-name stackai-deployment-user 2>/dev/null && echo -e "${GREEN}βœ… User deleted${NC}" || echo -e "${YELLOW}⚠️ Failed to delete user${NC}" +else + echo -e "${YELLOW}⚠️ User 'stackai-deployment-user' not found${NC}" +fi + +# Step 5: Delete policy +echo -e "\n${BLUE}πŸ“‹ Step 5: Deleting IAM policy...${NC}" + +if aws iam get-policy --policy-arn "$POLICY_ARN" &> /dev/null; then + aws iam delete-policy --policy-arn "$POLICY_ARN" + echo -e "${GREEN}βœ… Policy deleted${NC}" +else + echo -e "${YELLOW}⚠️ Policy 'StackAIDeploymentPolicy' not found${NC}" +fi + +# Step 6: Clean up local files +echo -e "\n${BLUE}πŸ—‚οΈ Step 6: Cleaning up local files...${NC}" + +FILES_TO_REMOVE=( + "stackai-deployment-user-keys.json" + "stackai-deployment-policy.json" + "cdk.out" +) + +for FILE in "${FILES_TO_REMOVE[@]}"; do + if [ -f "$FILE" ] || [ -d "$FILE" ]; then + rm -rf "$FILE" + echo -e "${GREEN}βœ… Removed $FILE${NC}" + fi +done + +# Step 7: Remove AWS CLI profile +echo -e "\n${BLUE}βš™οΈ Step 7: Removing AWS CLI profile...${NC}" + +if aws configure list-profiles 2>/dev/null | grep -q "stackai-deployment"; then + # Remove profile sections from AWS config files + if [ -f ~/.aws/credentials ]; then + sed -i.bak '/\[stackai-deployment\]/,/^$/d' ~/.aws/credentials 2>/dev/null || true + fi + if [ -f ~/.aws/config ]; then + sed -i.bak '/\[profile stackai-deployment\]/,/^$/d' ~/.aws/config 2>/dev/null || true + fi + echo -e "${GREEN}βœ… AWS CLI profile removed${NC}" +else + echo -e "${YELLOW}⚠️ AWS CLI profile 'stackai-deployment' not found${NC}" +fi + +echo -e "\n${GREEN}πŸŽ‰ Cleanup Complete!${NC}" +echo -e "${GREEN}=====================${NC}" +echo "" +echo -e "${BLUE}Summary:${NC}" +echo -e "βœ… CDK stack destroyed (if it existed)" +echo -e "βœ… Remaining tagged resources listed" +echo -e "βœ… Resource group removed" +echo -e "βœ… Deployment user and access keys deleted" +echo -e "βœ… IAM policy deleted" +echo -e "βœ… Local files cleaned up" +echo -e "βœ… AWS CLI profile removed" +echo "" +echo -e "${BLUE}Notes:${NC}" +echo -e "- If any resources remain, check the AWS Console" +echo -e "- Some resources may have deletion protection enabled" +echo -e "- Backup files (.bak) were created for AWS config files" +echo "" +echo -e "${GREEN}All StackAI AWS resources have been cleaned up!${NC}" \ No newline at end of file diff --git a/aws/deploy-applications.sh b/aws/deploy-applications.sh new file mode 100755 index 0000000..ddf21c9 --- /dev/null +++ b/aws/deploy-applications.sh @@ -0,0 +1,355 @@ +#!/bin/bash + +# Deploy StackAI Applications to EKS +# Run this script after the CDK infrastructure deployment completes + +set -e + +REGION="us-east-2" +CLUSTER_NAME="" # Will be populated from CDK outputs + +echo "πŸš€ Deploying StackAI Applications to EKS..." + +# 1. Configure kubectl +echo "πŸ“‹ Configuring kubectl access..." +aws eks update-kubeconfig --region $REGION --name $CLUSTER_NAME + +# 2. Verify cluster access +echo "βœ… Verifying cluster access..." +kubectl cluster-info +kubectl get nodes + +# 3. Create application namespaces +echo "πŸ“ Creating application namespaces..." 
+kubectl apply -f - < None: + super().__init__(scope, construct_id, **kwargs) + + # Create VPC with public and private subnets across 2 AZs + self.vpc = ec2.Vpc( + self, "StackAiVpc", + max_azs=2, + nat_gateways=1, # Cost optimization: single NAT gateway + subnet_configuration=[ + # Public subnets for ALB and bastion hosts + ec2.SubnetConfiguration( + name="PublicSubnet", + subnet_type=ec2.SubnetType.PUBLIC, + cidr_mask=24 + ), + # Private subnets for EKS, RDS, ElastiCache, DocumentDB + ec2.SubnetConfiguration( + name="PrivateSubnet", + subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, + cidr_mask=24 + ) + ], + enable_dns_hostnames=True, + enable_dns_support=True + ) + + # Create security groups + self._create_security_groups() + + # Output VPC information + CfnOutput( + self, "VpcId", + value=self.vpc.vpc_id, + description="VPC ID for StackAI infrastructure" + ) + + CfnOutput( + self, "VpcCidr", + value=self.vpc.vpc_cidr_block, + description="VPC CIDR block" + ) + + def _create_security_groups(self) -> None: + """Create security groups for different application tiers""" + + # Security group for EKS cluster + self.eks_cluster_sg = ec2.SecurityGroup( + self, "EksClusterSG", + vpc=self.vpc, + description="Security group for EKS cluster control plane", + allow_all_outbound=True + ) + + # Security group for EKS nodes + self.eks_nodes_sg = ec2.SecurityGroup( + self, "EksNodesSG", + vpc=self.vpc, + description="Security group for EKS worker nodes", + allow_all_outbound=True + ) + + # Allow communication between cluster and nodes + self.eks_cluster_sg.add_ingress_rule( + peer=self.eks_nodes_sg, + connection=ec2.Port.tcp(443), + description="Allow nodes to communicate with cluster API server" + ) + + self.eks_nodes_sg.add_ingress_rule( + peer=self.eks_cluster_sg, + connection=ec2.Port.tcp_range(1025, 65535), + description="Allow cluster to communicate with nodes" + ) + + # Allow node-to-node communication + self.eks_nodes_sg.add_ingress_rule( + peer=self.eks_nodes_sg, + connection=ec2.Port.all_traffic(), + description="Allow nodes to communicate with each other" + ) + + # Security group for RDS (PostgreSQL) + self.rds_sg = ec2.SecurityGroup( + self, "RdsSG", + vpc=self.vpc, + description="Security group for Aurora PostgreSQL", + allow_all_outbound=False + ) + + self.rds_sg.add_ingress_rule( + peer=self.eks_nodes_sg, + connection=ec2.Port.tcp(5432), + description="Allow EKS nodes to connect to PostgreSQL" + ) + + # Security group for DocumentDB (MongoDB) + self.docdb_sg = ec2.SecurityGroup( + self, "DocDbSG", + vpc=self.vpc, + description="Security group for DocumentDB", + allow_all_outbound=False + ) + + self.docdb_sg.add_ingress_rule( + peer=self.eks_nodes_sg, + connection=ec2.Port.tcp(27017), + description="Allow EKS nodes to connect to DocumentDB" + ) + + # Security group for ElastiCache (Redis) + self.redis_sg = ec2.SecurityGroup( + self, "RedisSG", + vpc=self.vpc, + description="Security group for ElastiCache Redis", + allow_all_outbound=False + ) + + self.redis_sg.add_ingress_rule( + peer=self.eks_nodes_sg, + connection=ec2.Port.tcp(6379), + description="Allow EKS nodes to connect to Redis" + ) + + # Security group for ALB + self.alb_sg = ec2.SecurityGroup( + self, "AlbSG", + vpc=self.vpc, + description="Security group for Application Load Balancer", + allow_all_outbound=True + ) + + # Allow HTTP and HTTPS traffic to ALB + self.alb_sg.add_ingress_rule( + peer=ec2.Peer.any_ipv4(), + connection=ec2.Port.tcp(80), + description="Allow HTTP traffic from internet" + ) + + 
self.alb_sg.add_ingress_rule( + peer=ec2.Peer.any_ipv4(), + connection=ec2.Port.tcp(443), + description="Allow HTTPS traffic from internet" + ) + + # Allow ALB to communicate with EKS nodes + self.eks_nodes_sg.add_ingress_rule( + peer=self.alb_sg, + connection=ec2.Port.tcp_range(30000, 32767), + description="Allow ALB to reach NodePort services" + ) + + @property + def private_subnets(self) -> List[ec2.ISubnet]: + """Return private subnets for database and cache deployments""" + return self.vpc.private_subnets + + @property + def public_subnets(self) -> List[ec2.ISubnet]: + """Return public subnets for ALB deployment""" + return self.vpc.public_subnets \ No newline at end of file diff --git a/aws/lib/constructs/eks_cluster.py b/aws/lib/constructs/eks_cluster.py new file mode 100644 index 0000000..f7ec76c --- /dev/null +++ b/aws/lib/constructs/eks_cluster.py @@ -0,0 +1,493 @@ +""" +EKS Cluster Construct + +This construct creates and configures the Amazon EKS cluster: +- EKS cluster with proper IAM roles +- Managed node groups with auto-scaling +- AWS Load Balancer Controller add-on +- Cluster autoscaler +- CoreDNS and kube-proxy add-ons +- IAM roles for service accounts (IRSA) +""" +from typing import Dict, Any +from constructs import Construct +from aws_cdk import ( + aws_eks as eks, + aws_ec2 as ec2, + aws_iam as iam, + aws_lambda as _lambda, + CfnOutput +) +from .base_infrastructure import BaseInfrastructure +from .managed_services import ManagedServices + + +class EksCluster(Construct): + """Construct for Amazon EKS cluster and related resources""" + + def __init__( + self, + scope: Construct, + construct_id: str, + infrastructure: BaseInfrastructure, + managed_services: ManagedServices, + **kwargs + ) -> None: + super().__init__(scope, construct_id, **kwargs) + + self.infrastructure = infrastructure + self.managed_services = managed_services + + # Create IAM roles + self._create_iam_roles() + + # Create EKS cluster + self._create_cluster() + + # Add managed node groups + self._add_node_groups() + + # Install essential add-ons + self._install_addons() + + # Create service accounts with IRSA + self._create_service_accounts() + + # Create outputs + self._create_outputs() + + def _create_iam_roles(self) -> None: + """Create IAM roles for EKS cluster and workers""" + + # EKS cluster service role + self.cluster_role = iam.Role( + self, "EksClusterRole", + assumed_by=iam.ServicePrincipal("eks.amazonaws.com"), + managed_policies=[ + iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEKSClusterPolicy") + ] + ) + + # Add additional permissions for EKS cluster operations + self.cluster_role.add_to_policy( + iam.PolicyStatement( + effect=iam.Effect.ALLOW, + actions=[ + "ec2:DescribeAccountAttributes", + "ec2:DescribeAddresses", + "ec2:DescribeInternetGateways", + "logs:CreateLogGroup", + "logs:CreateLogStream", + "logs:PutLogEvents", + "logs:DescribeLogGroups", + "logs:DescribeLogStreams" + ], + resources=["*"] + ) + ) + + # EKS node group role + self.nodegroup_role = iam.Role( + self, "EksNodeGroupRole", + assumed_by=iam.ServicePrincipal("ec2.amazonaws.com"), + managed_policies=[ + iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEKSWorkerNodePolicy"), + iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEKS_CNI_Policy"), + iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEC2ContainerRegistryReadOnly"), + iam.ManagedPolicy.from_aws_managed_policy_name("AmazonSSMManagedInstanceCore") + ] + ) + + # Add CloudWatch permissions for container insights + 
        self.nodegroup_role.add_to_policy(
            iam.PolicyStatement(
                effect=iam.Effect.ALLOW,
                actions=[
                    "cloudwatch:PutMetricData",
                    "ec2:DescribeVolumes",
                    "ec2:DescribeTags",
                    "logs:PutLogEvents",
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:DescribeLogStreams",
                    "logs:DescribeLogGroups"
                ],
                resources=["*"]
            )
        )

        # Add additional permissions for EKS node operations
        self.nodegroup_role.add_to_policy(
            iam.PolicyStatement(
                effect=iam.Effect.ALLOW,
                actions=[
                    "ecr:GetAuthorizationToken",
                    "ecr:BatchCheckLayerAvailability",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchGetImage"
                ],
                resources=["*"]
            )
        )

    def _create_cluster(self) -> None:
        """Create the EKS cluster"""

        # Add Lambda permissions to the cluster role to work around
        # lambda:GetFunction errors from CDK's kubectl provider functions
        self.cluster_role.add_to_policy(
            iam.PolicyStatement(
                actions=[
                    "lambda:GetFunction",
                    "lambda:InvokeFunction",
                    "lambda:GetFunctionConfiguration",
                    "lambda:UpdateFunctionConfiguration",
                    "lambda:ListFunctions"
                ],
                resources=["*"]
            )
        )

        self.cluster = eks.Cluster(
            self, "StackAiEksCluster",
            version=eks.KubernetesVersion.V1_28,
            vpc=self.infrastructure.vpc,
            vpc_subnets=[ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS)],
            default_capacity=0,  # We'll add managed node groups explicitly
            endpoint_access=eks.EndpointAccess.PUBLIC_AND_PRIVATE,
            role=self.cluster_role,
            security_group=self.infrastructure.eks_cluster_sg,
            output_cluster_name=True,
            output_config_command=True,
            cluster_logging=[
                eks.ClusterLoggingTypes.API,
                eks.ClusterLoggingTypes.AUDIT,
                eks.ClusterLoggingTypes.AUTHENTICATOR,
                eks.ClusterLoggingTypes.CONTROLLER_MANAGER,
                eks.ClusterLoggingTypes.SCHEDULER
            ]
        )

        # NOTE: Envelope encryption for Kubernetes Secrets cannot be enabled by
        # applying an EncryptionConfiguration manifest -- that is an API-server
        # configuration file, not a cluster resource. On EKS, pass a KMS key to
        # the cluster's `secrets_encryption_key` property instead.

    def _add_node_groups(self) -> None:
        """Add managed node groups to the cluster"""

        # Primary node group for general workloads
        self.primary_nodegroup = self.cluster.add_nodegroup_capacity(
            "PrimaryNodeGroup",
            instance_types=[
                ec2.InstanceType("t3.large"),
                ec2.InstanceType("t3.xlarge")
            ],
            min_size=2,
            max_size=10,
            desired_size=3,
            disk_size=100,
            ami_type=eks.NodegroupAmiType.AL2_X86_64,
            capacity_type=eks.CapacityType.ON_DEMAND,
            node_role=self.nodegroup_role,
            subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            tags={
                "Name": "StackAI-Primary-Node",
                # Use the generated cluster name so these tags match what the
                # cluster autoscaler's auto-discovery filter looks for
                f"kubernetes.io/cluster/{self.cluster.cluster_name}": "owned",
                "k8s.io/cluster-autoscaler/enabled": "true",
                f"k8s.io/cluster-autoscaler/{self.cluster.cluster_name}": "owned"
            }
        )

        # Spot instances node group for cost optimization
        self.spot_nodegroup = self.cluster.add_nodegroup_capacity(
            "SpotNodeGroup",
            instance_types=[
                ec2.InstanceType("t3.medium"),
                ec2.InstanceType("t3.large"),
                ec2.InstanceType("m5.large")
            ],
            min_size=0,
            max_size=5,
            desired_size=1,
            disk_size=50,
            ami_type=eks.NodegroupAmiType.AL2_X86_64,
            capacity_type=eks.CapacityType.SPOT,
            node_role=self.nodegroup_role,
            subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
            taints=[
                eks.TaintSpec(
                    key="spot-instance",
                    value="true",
                    effect=eks.TaintEffect.NO_SCHEDULE
                )
            ],
            tags={
                "Name": "StackAI-Spot-Node",
                f"kubernetes.io/cluster/{self.cluster.cluster_name}": "owned",
                "k8s.io/cluster-autoscaler/enabled": "true",
                f"k8s.io/cluster-autoscaler/{self.cluster.cluster_name}": "owned",
                "k8s.io/cluster-autoscaler/node-template/taint/spot-instance": "true:NoSchedule"
            }
        )

    def _install_addons(self) -> None:
        """Install essential EKS add-ons"""

        # AWS Load Balancer Controller
        self.alb_controller = eks.AlbController(
            self, "AlbController",
            cluster=self.cluster,
            version=eks.AlbControllerVersion.V2_6_2
        )

        # Default encrypted gp3 StorageClass for persistent volumes
        # (assumes the EBS CSI driver add-on is installed on the cluster)
        self.cluster.add_manifest("EbsCsiDriver", {
            "apiVersion": "storage.k8s.io/v1",
            "kind": "StorageClass",
            "metadata": {
                "name": "gp3",
                "annotations": {
                    "storageclass.kubernetes.io/is-default-class": "true"
                }
            },
            "provisioner": "ebs.csi.aws.com",
            "volumeBindingMode": "WaitForFirstConsumer",
            "parameters": {
                "type": "gp3",
                "encrypted": "true"
            }
        })

        # Metrics Server for HPA
        self.cluster.add_helm_chart(
            "MetricsServer",
            chart="metrics-server",
            repository="https://kubernetes-sigs.github.io/metrics-server/",
            namespace="kube-system",
            values={
                "args": [
                    "--cert-dir=/tmp",
                    "--secure-port=4443",
                    "--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname",
                    "--kubelet-use-node-status-port"
                ]
            }
        )

        # Cluster Autoscaler
        self._install_cluster_autoscaler()

    def _install_cluster_autoscaler(self) -> None:
        """Install cluster autoscaler for automatic node scaling"""

        # Create service account for cluster autoscaler
        cluster_autoscaler_sa = self.cluster.add_service_account(
            "ClusterAutoscalerServiceAccount",
            name="cluster-autoscaler",
            namespace="kube-system"
        )

        # Add permissions for cluster autoscaler
        cluster_autoscaler_sa.add_to_principal_policy(
            iam.PolicyStatement(
                effect=iam.Effect.ALLOW,
                actions=[
                    "autoscaling:DescribeAutoScalingGroups",
                    "autoscaling:DescribeAutoScalingInstances",
                    "autoscaling:DescribeLaunchConfigurations",
                    "autoscaling:DescribeTags",
                    "autoscaling:SetDesiredCapacity",
                    "autoscaling:TerminateInstanceInAutoScalingGroup",
                    "ec2:DescribeLaunchTemplateVersions"
                ],
                resources=["*"]
            )
        )

        # Deploy cluster autoscaler
        self.cluster.add_manifest("ClusterAutoscaler", {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {
                "name": "cluster-autoscaler",
                "namespace": "kube-system",
                "labels": {
                    "app": "cluster-autoscaler"
                }
            },
            "spec": {
                "selector": {
                    "matchLabels": {
                        "app": "cluster-autoscaler"
                    }
                },
                "template": {
                    "metadata": {
                        "labels": {
                            "app": "cluster-autoscaler"
                        }
                    },
                    "spec": {
                        "serviceAccountName": "cluster-autoscaler",
                        "containers": [
                            {
                                # Keep the autoscaler image in step with the
                                # cluster's Kubernetes minor version (1.28)
                                "image": "registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2",
                                "name": "cluster-autoscaler",
                                "resources": {
                                    "limits": {
                                        "cpu": "100m",
                                        "memory": "300Mi"
                                    },
                                    "requests": {
                                        "cpu": "100m",
                                        "memory": "300Mi"
                                    }
                                },
                                "command": [
                                    "./cluster-autoscaler",
                                    "--v=4",
                                    "--stderrthreshold=info",
                                    "--cloud-provider=aws",
                                    "--skip-nodes-with-local-storage=false",
                                    "--expander=least-waste",
                                    f"--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/{self.cluster.cluster_name}"
                                ],
                                "volumeMounts": [
                                    {
                                        "name": "ssl-certs",
                                        "mountPath": "/etc/ssl/certs/ca-certificates.crt",
                                        "readOnly": True
                                    }
                                ],
                                "imagePullPolicy": "Always"
                            }
                        ],
                        "volumes": [
                            {
                                "name": "ssl-certs",
                                "hostPath": {
                                    "path":
"/etc/ssl/certs/ca-bundle.crt" + } + } + ] + } + } + } + }) + + def _create_service_accounts(self) -> None: + """Create service accounts with IRSA for AWS service access""" + + # Service account for Supabase GoTrue (needs SES access) + self.gotrue_sa = self.cluster.add_service_account( + "GoTrueServiceAccount", + name="gotrue-sa", + namespace="supabase" + ) + + # Add SES permissions for GoTrue + self.gotrue_sa.add_to_principal_policy( + iam.PolicyStatement( + effect=iam.Effect.ALLOW, + actions=[ + "ses:SendEmail", + "ses:SendRawEmail", + "ses:GetSendQuota", + "ses:GetSendStatistics" + ], + resources=["*"] + ) + ) + + # Service account for Supabase Storage (needs S3 access) + self.storage_sa = self.cluster.add_service_account( + "StorageServiceAccount", + name="storage-sa", + namespace="supabase" + ) + + # Grant S3 permissions to storage service account + self.managed_services.storage_bucket.grant_read_write(self.storage_sa) + + # Service account for accessing secrets + self.secrets_sa = self.cluster.add_service_account( + "SecretsServiceAccount", + name="secrets-sa", + namespace="supabase" + ) + + # Add Secrets Manager permissions + self.secrets_sa.add_to_principal_policy( + iam.PolicyStatement( + effect=iam.Effect.ALLOW, + actions=[ + "secretsmanager:GetSecretValue", + "secretsmanager:DescribeSecret" + ], + resources=[ + self.managed_services.aurora_secret.secret_arn, + self.managed_services.docdb_secret.secret_arn, + self.managed_services.supabase_secret.secret_arn + ] + ) + ) + + def _create_outputs(self) -> None: + """Create CloudFormation outputs for cluster information""" + + CfnOutput( + self, "EksClusterName", + value=self.cluster.cluster_name, + description="EKS cluster name" + ) + + CfnOutput( + self, "EksClusterEndpoint", + value=self.cluster.cluster_endpoint, + description="EKS cluster API endpoint" + ) + + CfnOutput( + self, "EksClusterArn", + value=self.cluster.cluster_arn, + description="EKS cluster ARN" + ) + + CfnOutput( + self, "KubectlConfig", + value=f"aws eks update-kubeconfig --region {self.cluster.stack.region} --name {self.cluster.cluster_name}", + description="kubectl configuration command" + ) + + def get_cluster_info(self) -> Dict[str, Any]: + """Return cluster information for use by other constructs""" + return { + "cluster": self.cluster, + "cluster_name": self.cluster.cluster_name, + "cluster_endpoint": self.cluster.cluster_endpoint, + "service_accounts": { + "gotrue": self.gotrue_sa, + "storage": self.storage_sa, + "secrets": self.secrets_sa + } + } \ No newline at end of file diff --git a/aws/lib/constructs/managed_services.py b/aws/lib/constructs/managed_services.py new file mode 100644 index 0000000..a1fac8e --- /dev/null +++ b/aws/lib/constructs/managed_services.py @@ -0,0 +1,354 @@ +""" +Managed Services Construct + +This construct creates and configures AWS managed services: +- Aurora Serverless v2 (PostgreSQL) for Supabase +- DocumentDB for MongoDB workloads +- ElastiCache Redis for Celery and caching +- S3 bucket for Supabase Storage +- Secrets Manager for credentials +""" +from typing import Dict, Any +from constructs import Construct +from aws_cdk import ( + aws_rds as rds, + aws_docdb as docdb, + aws_elasticache as elasticache, + aws_s3 as s3, + aws_secretsmanager as secretsmanager, + aws_ec2 as ec2, + RemovalPolicy, + Duration, + CfnOutput +) +from .base_infrastructure import BaseInfrastructure + + +class ManagedServices(Construct): + """Construct for AWS managed services used by StackAI""" + + def __init__( + self, + scope: Construct, + 
+        construct_id: str,
+        infrastructure: BaseInfrastructure,
+        **kwargs
+    ) -> None:
+        super().__init__(scope, construct_id, **kwargs)
+
+        self.infrastructure = infrastructure
+
+        # Create secrets first
+        self._create_secrets()
+
+        # Create Aurora PostgreSQL cluster
+        self._create_aurora_cluster()
+
+        # Create DocumentDB cluster
+        self._create_documentdb_cluster()
+
+        # Create ElastiCache Redis
+        self._create_redis_cluster()
+
+        # Create S3 bucket for storage
+        self._create_s3_bucket()
+
+        # Output service endpoints
+        self._create_outputs()
+
+    def _create_secrets(self) -> None:
+        """Create secrets for database credentials"""
+
+        # DocumentDB admin secret
+        self.docdb_secret = secretsmanager.Secret(
+            self, "DocDbAdminSecret",
+            secret_name="stackai-docdb-admin",
+            description="DocumentDB administrator credentials",
+            generate_secret_string=secretsmanager.SecretStringGenerator(
+                secret_string_template='{"username":"docdb_admin"}',
+                generate_string_key="password",
+                exclude_punctuation=True,
+                password_length=32,
+                exclude_characters='"@/\\'
+            )
+        )
+
+        # Supabase application secrets; jwt_secret, anon_key and
+        # service_role_key start empty and are filled in post-deployment
+        self.supabase_secret = secretsmanager.Secret(
+            self, "SupabaseSecret",
+            secret_name="stackai-supabase",
+            description="Supabase application secrets",
+            generate_secret_string=secretsmanager.SecretStringGenerator(
+                secret_string_template='{"jwt_secret":"","anon_key":"","service_role_key":""}',
+                generate_string_key="password",
+                exclude_punctuation=True,
+                password_length=64
+            )
+        )
+
+    def _create_aurora_cluster(self) -> None:
+        """Create Aurora Serverless v2 PostgreSQL cluster"""
+
+        # Create DB subnet group
+        db_subnet_group = rds.SubnetGroup(
+            self, "AuroraSubnetGroup",
+            description="Subnet group for Aurora PostgreSQL",
+            vpc=self.infrastructure.vpc,
+            vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS)
+        )
+
+        # Create parameter group for PostgreSQL optimization
+        parameter_group = rds.ParameterGroup(
+            self, "AuroraParameterGroup",
+            engine=rds.DatabaseClusterEngine.aurora_postgres(
+                version=rds.AuroraPostgresEngineVersion.VER_15_4
+            ),
+            description="Parameter group for StackAI Aurora PostgreSQL",
+            parameters={
+                "shared_preload_libraries": "pg_stat_statements,pg_hint_plan",
+                "log_statement": "all",
+                "log_min_duration_statement": "1000",
+                "track_activity_query_size": "2048"
+            }
+        )
+
+        # Create Aurora Serverless v2 cluster
+        self.aurora_cluster = rds.DatabaseCluster(
+            self, "AuroraCluster",
+            engine=rds.DatabaseClusterEngine.aurora_postgres(
+                version=rds.AuroraPostgresEngineVersion.VER_15_4
+            ),
+            credentials=rds.Credentials.from_generated_secret(
+                username="postgres",
+                secret_name="stackai-aurora-postgres"
+            ),
+            vpc=self.infrastructure.vpc,
+            subnet_group=db_subnet_group,
+            security_groups=[self.infrastructure.rds_sg],
+            default_database_name="postgres",
+            parameter_group=parameter_group,
+            backup=rds.BackupProps(
+                retention=Duration.days(7),
+                preferred_window="03:00-04:00"
+            ),
+            preferred_maintenance_window="sun:04:00-sun:05:00",
+            cloudwatch_logs_exports=["postgresql"],
+            removal_policy=RemovalPolicy.DESTROY,  # Change to RETAIN for production
+            writer=rds.ClusterInstance.serverless_v2(
+                "writer",
+                auto_minor_version_upgrade=True
+            ),
+            readers=[
+                rds.ClusterInstance.serverless_v2(
+                    "reader",
+                    scale_with_writer=True,
+                    auto_minor_version_upgrade=True
+                )
+            ],
+            serverless_v2_min_capacity=0.5,
+            serverless_v2_max_capacity=16
+        )
+
+    def _create_documentdb_cluster(self) -> None:
+        """Create DocumentDB cluster for MongoDB workloads"""
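+
+        # A hedged aside: DocumentDB enforces TLS by default, so clients must
+        # trust the Amazon RDS CA bundle. A typical connection string looks
+        # roughly like this (sketch; CA file path and options are illustrative):
+        #   mongodb://<user>:<password>@<cluster-endpoint>:27017/?tls=true
+        #     &tlsCAFile=global-bundle.pem&replicaSet=rs0&retryWrites=false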
+
+        # Create DocumentDB subnet group
+        docdb_subnet_group = docdb.CfnDBSubnetGroup(
+            self, "DocDbSubnetGroup",
+            db_subnet_group_description="Subnet group for DocumentDB",
+            subnet_ids=[subnet.subnet_id for subnet in self.infrastructure.private_subnets],
+            db_subnet_group_name="stackai-docdb-subnet-group"
+        )
+
+        # Create DocumentDB cluster
+        self.docdb_cluster = docdb.CfnDBCluster(
+            self, "DocDbCluster",
+            master_username=self.docdb_secret.secret_value_from_json("username").unsafe_unwrap(),
+            master_user_password=self.docdb_secret.secret_value_from_json("password").unsafe_unwrap(),
+            db_subnet_group_name=docdb_subnet_group.ref,  # .ref resolves to the subnet group's name at deploy time
+            vpc_security_group_ids=[self.infrastructure.docdb_sg.security_group_id],
+            backup_retention_period=7,
+            preferred_backup_window="03:00-04:00",
+            preferred_maintenance_window="sun:04:00-sun:05:00",
+            storage_encrypted=True,
+            deletion_protection=False  # Set to True for production
+        )
+
+        # Add dependency - cluster depends on subnet group
+        self.docdb_cluster.add_dependency(docdb_subnet_group)
+
+        # Create DocumentDB instances
+        for i in range(2):  # Primary + 1 replica for HA
+            docdb_instance = docdb.CfnDBInstance(
+                self, f"DocDbInstance{i+1}",
+                db_cluster_identifier=self.docdb_cluster.ref,
+                db_instance_class="db.t3.medium",
+                auto_minor_version_upgrade=True
+            )
+            # Instance depends on cluster
+            docdb_instance.add_dependency(self.docdb_cluster)
+
+    def _create_redis_cluster(self) -> None:
+        """Create ElastiCache Redis cluster"""
+
+        # Create Redis subnet group
+        redis_subnet_group = elasticache.CfnSubnetGroup(
+            self, "RedisSubnetGroup",
+            description="Subnet group for Redis",
+            subnet_ids=[subnet.subnet_id for subnet in self.infrastructure.private_subnets],
+            cache_subnet_group_name="stackai-redis-subnet-group"
+        )
+
+        # Create Redis parameter group
+        redis_parameter_group = elasticache.CfnParameterGroup(
+            self, "RedisParameterGroup",
+            cache_parameter_group_family="redis7",
+            description="Parameter group for StackAI Redis",
+            properties={
+                "maxmemory-policy": "allkeys-lru",
+                "timeout": "300",
+                "tcp-keepalive": "300"
+            }
+        )
+
+        # Create Redis replication group for HA (cluster mode disabled:
+        # one primary plus one replica behind the primary endpoint)
+        self.redis_cluster = elasticache.CfnReplicationGroup(
+            self, "RedisCluster",
+            replication_group_description="Redis cluster for StackAI caching and Celery",
+            cache_node_type="cache.t3.micro",
+            engine="redis",
+            engine_version="7.0",
+            num_cache_clusters=2,  # Primary + 1 replica
+            automatic_failover_enabled=True,
+            multi_az_enabled=True,
+            cache_subnet_group_name=redis_subnet_group.ref,  # .ref resolves to the subnet group's name at deploy time
+            cache_parameter_group_name=redis_parameter_group.ref,
+            security_group_ids=[self.infrastructure.redis_sg.security_group_id],
+            at_rest_encryption_enabled=True,
+            transit_encryption_enabled=True,
+            preferred_maintenance_window="sun:04:00-sun:05:00",
+            snapshot_retention_limit=5,
+            snapshot_window="03:00-04:00"
+        )
+
+        # Add dependencies
+        self.redis_cluster.add_dependency(redis_subnet_group)
+        self.redis_cluster.add_dependency(redis_parameter_group)
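+
+        # Because transit encryption is enabled above, clients (Celery,
+        # redis-py) must connect over TLS using a rediss:// URL, e.g. a sketch
+        # with the host taken from the RedisEndpoint output:
+        #   rediss://<primary-endpoint>:6379/0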
id="DeleteIncompleteMultipartUploads", + abort_incomplete_multipart_upload_after=Duration.days(1) + ), + s3.LifecycleRule( + id="TransitionToIA", + transitions=[ + s3.Transition( + storage_class=s3.StorageClass.INFREQUENT_ACCESS, + transition_after=Duration.days(30) + ), + s3.Transition( + storage_class=s3.StorageClass.GLACIER, + transition_after=Duration.days(90) + ) + ] + ) + ], + cors=[ + s3.CorsRule( + allowed_headers=["*"], + allowed_methods=[ + s3.HttpMethods.GET, + s3.HttpMethods.POST, + s3.HttpMethods.PUT, + s3.HttpMethods.DELETE, + s3.HttpMethods.HEAD + ], + allowed_origins=["*"], # Restrict this in production + max_age=3000 + ) + ] + ) + + def _create_outputs(self) -> None: + """Create CloudFormation outputs for service endpoints""" + + CfnOutput( + self, "AuroraClusterEndpoint", + value=self.aurora_cluster.cluster_endpoint.hostname, + description="Aurora PostgreSQL cluster endpoint" + ) + + CfnOutput( + self, "AuroraClusterReaderEndpoint", + value=self.aurora_cluster.cluster_read_endpoint.hostname, + description="Aurora PostgreSQL cluster reader endpoint" + ) + + CfnOutput( + self, "DocumentDbEndpoint", + value=self.docdb_cluster.attr_endpoint, + description="DocumentDB cluster endpoint" + ) + + CfnOutput( + self, "RedisEndpoint", + value=self.redis_cluster.attr_configuration_end_point_address, + description="Redis cluster configuration endpoint" + ) + + CfnOutput( + self, "StorageBucketName", + value=self.storage_bucket.bucket_name, + description="S3 bucket name for Supabase Storage" + ) + + CfnOutput( + self, "DocDbSecretArn", + value=self.docdb_secret.secret_arn, + description="DocumentDB credentials secret ARN" + ) + + CfnOutput( + self, "SupabaseSecretArn", + value=self.supabase_secret.secret_arn, + description="Supabase application secrets ARN" + ) + + @property + def aurora_secret(self) -> secretsmanager.ISecret: + """Return Aurora cluster secret""" + return self.aurora_cluster.secret + + def get_connection_info(self) -> Dict[str, Any]: + """Return connection information for all services""" + return { + "aurora": { + "endpoint": self.aurora_cluster.cluster_endpoint.hostname, + "port": self.aurora_cluster.cluster_endpoint.port, + "secret_arn": self.aurora_cluster.secret.secret_arn + }, + "documentdb": { + "endpoint": self.docdb_cluster.attr_endpoint, + "port": 27017, + "secret_arn": self.docdb_secret.secret_arn + }, + "redis": { + "endpoint": self.redis_cluster.attr_configuration_end_point_address, + "port": 6379 + }, + "storage": { + "bucket_name": self.storage_bucket.bucket_name + } + } \ No newline at end of file diff --git a/aws/lib/constructs/supabase.py b/aws/lib/constructs/supabase.py new file mode 100644 index 0000000..484f444 --- /dev/null +++ b/aws/lib/constructs/supabase.py @@ -0,0 +1,639 @@ +""" +Supabase Construct + +This construct deploys all Supabase services to the EKS cluster: +- GoTrue (Authentication) +- PostgREST (REST API) +- Realtime (WebSocket connections) +- Storage (File storage) +- pg-meta (Database management) +- Edge Functions (Serverless functions) +- Kong (API Gateway) +- Studio (Admin dashboard) +- Analytics (Logflare) +""" +from typing import Dict, Any +from constructs import Construct +from aws_cdk import ( + aws_apigateway as apigw, + aws_lambda as _lambda, + aws_logs as logs, + Duration, + CfnOutput +) +from .base_infrastructure import BaseInfrastructure +from .managed_services import ManagedServices +from .eks_cluster import EksCluster + + +class SupabaseServices(Construct): + """Construct for deploying Supabase services on EKS""" + + 
+
+    def __init__(
+        self,
+        scope: Construct,
+        construct_id: str,
+        infrastructure: BaseInfrastructure,
+        managed_services: ManagedServices,
+        eks_cluster: EksCluster,
+        **kwargs
+    ) -> None:
+        super().__init__(scope, construct_id, **kwargs)
+
+        self.infrastructure = infrastructure
+        self.managed_services = managed_services
+        self.eks_cluster = eks_cluster
+        self.cluster = eks_cluster.cluster
+
+        # Deploy namespace and base resources
+        self._create_namespace()
+
+        # Create configuration and secrets
+        self._create_config_and_secrets()
+
+        # Deploy core Supabase services
+        self._deploy_auth_service()
+        self._deploy_rest_service()
+        self._deploy_storage_service()
+        self._deploy_meta_service()
+
+        # Create Edge Functions with API Gateway + Lambda
+        self._create_edge_functions()
+
+        # Create ingress for routing
+        self._create_ingress()
+
+        # Create outputs
+        self._create_outputs()
+
+    def _create_namespace(self) -> None:
+        """Create Kubernetes namespace for Supabase"""
+
+        self.cluster.add_manifest("SupabaseNamespace", {
+            "apiVersion": "v1",
+            "kind": "Namespace",
+            "metadata": {
+                "name": "supabase",
+                "labels": {
+                    "name": "supabase"
+                }
+            }
+        })
+
+    def _create_config_and_secrets(self) -> None:
+        """Create ConfigMaps and Secrets for Supabase services"""
+
+        # Get connection info from managed services
+        conn_info = self.managed_services.get_connection_info()
+
+        # Create Supabase configuration Secret
+        self.cluster.add_manifest("SupabaseSecret", {
+            "apiVersion": "v1",
+            "kind": "Secret",
+            "metadata": {
+                "name": "supabase-config",
+                "namespace": "supabase"
+            },
+            "type": "Opaque",
+            "stringData": {
+                # Database configuration
+                "POSTGRES_HOST": conn_info["aurora"]["endpoint"],
+                "POSTGRES_PORT": str(conn_info["aurora"]["port"]),
+                "POSTGRES_DB": "postgres",
+                "POSTGRES_PASSWORD": "PLACEHOLDER_FOR_SECRET_VALUE",
+
+                # JWT Configuration - will need to be updated with real values
+                "JWT_SECRET": "your-super-secret-jwt-token-with-at-least-32-characters-long",
+                "JWT_EXPIRY": "3600",
+
+                # API Keys - will need to be updated with real values
+                "ANON_KEY": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InN0YWNrYWkiLCJyb2xlIjoiYW5vbiIsImlhdCI6MTY0Mjc3NjAwMCwiZXhwIjoxOTU4MzUyMDAwfQ.placeholder",
+                "SERVICE_ROLE_KEY": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZSIsInJlZiI6InN0YWNrYWkiLCJyb2xlIjoic2VydmljZV9yb2xlIiwiaWF0IjoxNjQyNzc2MDAwLCJleHAiOjE5NTgzNTIwMDB9.placeholder",
+
+                # Site URLs
+                "SITE_URL": "https://app.stackai.com",
+                "API_EXTERNAL_URL": "https://api.stackai.com",
+                "SUPABASE_PUBLIC_URL": "https://api.stackai.com",
+
+                # Storage configuration
+                "STORAGE_BACKEND": "s3",
+                "GLOBAL_S3_BUCKET": conn_info["storage"]["bucket_name"],
+                "AWS_DEFAULT_REGION": "us-east-2",  # match the deployment region used throughout this repo
+
+                # Email configuration (SES)
+                "SMTP_HOST": "email-smtp.us-east-2.amazonaws.com",  # SES SMTP endpoint in the deployment region
+                "SMTP_PORT": "587",
+                "SMTP_USER": "PLACEHOLDER_SES_SMTP_USER",
+                "SMTP_PASS": "PLACEHOLDER_SES_SMTP_PASS",
+                "SMTP_ADMIN_EMAIL": "admin@stackai.com",
+                "SMTP_SENDER_NAME": "StackAI",
+
+                # Auth configuration
+                "DISABLE_SIGNUP": "false",
+                "ENABLE_EMAIL_SIGNUP": "true",
+                "ENABLE_EMAIL_AUTOCONFIRM": "false",
+                "ENABLE_PHONE_SIGNUP": "false",
+                "ENABLE_ANONYMOUS_USERS": "false",
+
+                # Dashboard configuration
+                "DASHBOARD_USERNAME": "admin",
+                "DASHBOARD_PASSWORD": "PLACEHOLDER_DASHBOARD_PASSWORD",
+            }
+        })
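+
+        # Note: the Secret above and the ConfigMap below deliberately share the
+        # name "supabase-config"; they are distinct resource kinds, and the
+        # pods reference them separately via secretRef / configMapRef.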
"supabase" + }, + "data": { + "PGRST_DB_SCHEMAS": "public,storage,graphql_public", + "STUDIO_DEFAULT_ORGANIZATION": "StackAI", + "STUDIO_DEFAULT_PROJECT": "StackAI Platform" + } + }) + + def _deploy_auth_service(self) -> None: + """Deploy GoTrue authentication service""" + + # GoTrue Deployment + self.cluster.add_manifest("GoTrueDeployment", { + "apiVersion": "apps/v1", + "kind": "Deployment", + "metadata": { + "name": "gotrue", + "namespace": "supabase", + "labels": {"app": "gotrue"} + }, + "spec": { + "replicas": 2, + "selector": {"matchLabels": {"app": "gotrue"}}, + "template": { + "metadata": { + "labels": {"app": "gotrue"} + }, + "spec": { + "serviceAccountName": "gotrue-sa", + "containers": [{ + "name": "gotrue", + "image": "supabase/gotrue:v2.158.1", + "ports": [{"containerPort": 9999, "name": "http"}], + "env": [ + {"name": "GOTRUE_API_HOST", "value": "0.0.0.0"}, + {"name": "GOTRUE_API_PORT", "value": "9999"}, + { + "name": "GOTRUE_DB_DATABASE_URL", + "value": "postgres://supabase_auth_admin:$(POSTGRES_PASSWORD)@$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DB)" + }, + {"name": "GOTRUE_JWT_ADMIN_ROLES", "value": "service_role"}, + {"name": "GOTRUE_JWT_AUD", "value": "authenticated"}, + {"name": "GOTRUE_JWT_DEFAULT_GROUP_NAME", "value": "authenticated"}, + {"name": "GOTRUE_EXTERNAL_EMAIL_ENABLED", "value": "true"}, + {"name": "GOTRUE_EXTERNAL_ANONYMOUS_USERS_ENABLED", "value": "false"}, + {"name": "GOTRUE_MAILER_AUTOCONFIRM", "value": "false"} + ], + "envFrom": [ + {"secretRef": {"name": "supabase-config"}}, + {"configMapRef": {"name": "supabase-config"}} + ], + "resources": { + "requests": {"memory": "256Mi", "cpu": "100m"}, + "limits": {"memory": "512Mi", "cpu": "500m"} + }, + "livenessProbe": { + "httpGet": {"path": "/health", "port": 9999}, + "initialDelaySeconds": 30, + "periodSeconds": 15 + }, + "readinessProbe": { + "httpGet": {"path": "/health", "port": 9999}, + "initialDelaySeconds": 10, + "periodSeconds": 5 + } + }] + } + } + } + }) + + # GoTrue Service + self.cluster.add_manifest("GoTrueService", { + "apiVersion": "v1", + "kind": "Service", + "metadata": { + "name": "gotrue-svc", + "namespace": "supabase", + "labels": {"app": "gotrue"} + }, + "spec": { + "selector": {"app": "gotrue"}, + "ports": [{ + "port": 9999, + "targetPort": 9999, + "name": "http" + }], + "type": "ClusterIP" + } + }) + + def _deploy_rest_service(self) -> None: + """Deploy PostgREST API service""" + + # PostgREST Deployment + self.cluster.add_manifest("PostgRESTDeployment", { + "apiVersion": "apps/v1", + "kind": "Deployment", + "metadata": { + "name": "postgrest", + "namespace": "supabase", + "labels": {"app": "postgrest"} + }, + "spec": { + "replicas": 3, + "selector": {"matchLabels": {"app": "postgrest"}}, + "template": { + "metadata": { + "labels": {"app": "postgrest"}, + "annotations": { + "prometheus.io/scrape": "true", + "prometheus.io/port": "3000" + } + }, + "spec": { + "containers": [{ + "name": "postgrest", + "image": "postgrest/postgrest:v12.2.0", + "ports": [{"containerPort": 3000, "name": "http"}], + "env": [ + { + "name": "PGRST_DB_URI", + "value": "postgres://authenticator:$(POSTGRES_PASSWORD)@$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DB)" + }, + {"name": "PGRST_DB_ANON_ROLE", "value": "anon"}, + {"name": "PGRST_DB_USE_LEGACY_GUCS", "value": "false"} + ], + "envFrom": [ + {"secretRef": {"name": "supabase-config"}}, + {"configMapRef": {"name": "supabase-config"}} + ], + "resources": { + "requests": {"memory": "128Mi", "cpu": "50m"}, + "limits": {"memory": "256Mi", "cpu": "200m"} + }, + 
"livenessProbe": { + "httpGet": {"path": "/", "port": 3000}, + "initialDelaySeconds": 30, + "periodSeconds": 15 + }, + "readinessProbe": { + "httpGet": {"path": "/", "port": 3000}, + "initialDelaySeconds": 10, + "periodSeconds": 5 + } + }] + } + } + } + }) + + # PostgREST Service + self.cluster.add_manifest("PostgRESTService", { + "apiVersion": "v1", + "kind": "Service", + "metadata": { + "name": "postgrest-svc", + "namespace": "supabase", + "labels": {"app": "postgrest"} + }, + "spec": { + "selector": {"app": "postgrest"}, + "ports": [{ + "port": 3000, + "targetPort": 3000, + "name": "http" + }], + "type": "ClusterIP" + } + }) + + def _deploy_storage_service(self) -> None: + """Deploy Storage service for file management""" + + # Storage Deployment + self.cluster.add_manifest("StorageDeployment", { + "apiVersion": "apps/v1", + "kind": "Deployment", + "metadata": { + "name": "storage", + "namespace": "supabase", + "labels": {"app": "storage"} + }, + "spec": { + "replicas": 2, + "selector": {"matchLabels": {"app": "storage"}}, + "template": { + "metadata": { + "labels": {"app": "storage"}, + "annotations": { + "prometheus.io/scrape": "true", + "prometheus.io/port": "5000" + } + }, + "spec": { + "serviceAccountName": "storage-sa", + "containers": [{ + "name": "storage", + "image": "supabase/storage-api:v1.11.13", + "ports": [{"containerPort": 5000, "name": "http"}], + "env": [ + {"name": "POSTGREST_URL", "value": "http://postgrest-svc:3000"}, + { + "name": "DATABASE_URL", + "value": "postgres://supabase_storage_admin:$(POSTGRES_PASSWORD)@$(POSTGRES_HOST):$(POSTGRES_PORT)/$(POSTGRES_DB)" + }, + {"name": "FILE_SIZE_LIMIT", "value": "52428800"}, + {"name": "TENANT_ID", "value": "stub"}, + {"name": "REGION", "value": "us-east-1"}, + {"name": "ENABLE_IMAGE_TRANSFORMATION", "value": "false"} + ], + "envFrom": [ + {"secretRef": {"name": "supabase-config"}}, + {"configMapRef": {"name": "supabase-config"}} + ], + "resources": { + "requests": {"memory": "256Mi", "cpu": "100m"}, + "limits": {"memory": "512Mi", "cpu": "300m"} + }, + "livenessProbe": { + "httpGet": {"path": "/status", "port": 5000}, + "initialDelaySeconds": 30, + "periodSeconds": 15 + }, + "readinessProbe": { + "httpGet": {"path": "/status", "port": 5000}, + "initialDelaySeconds": 10, + "periodSeconds": 5 + } + }] + } + } + } + }) + + # Storage Service + self.cluster.add_manifest("StorageService", { + "apiVersion": "v1", + "kind": "Service", + "metadata": { + "name": "storage-svc", + "namespace": "supabase", + "labels": {"app": "storage"} + }, + "spec": { + "selector": {"app": "storage"}, + "ports": [{ + "port": 5000, + "targetPort": 5000, + "name": "http" + }], + "type": "ClusterIP" + } + }) + + def _deploy_meta_service(self) -> None: + """Deploy pg-meta service for database management""" + + # pg-meta Deployment + self.cluster.add_manifest("PgMetaDeployment", { + "apiVersion": "apps/v1", + "kind": "Deployment", + "metadata": { + "name": "pgmeta", + "namespace": "supabase", + "labels": {"app": "pgmeta"} + }, + "spec": { + "replicas": 1, + "selector": {"matchLabels": {"app": "pgmeta"}}, + "template": { + "metadata": { + "labels": {"app": "pgmeta"} + }, + "spec": { + "containers": [{ + "name": "pgmeta", + "image": "supabase/postgres-meta:v0.83.2", + "ports": [{"containerPort": 8080, "name": "http"}], + "env": [ + {"name": "PG_META_PORT", "value": "8080"}, + {"name": "PG_META_DB_HOST", "valueFrom": {"secretKeyRef": {"name": "supabase-config", "key": "POSTGRES_HOST"}}}, + {"name": "PG_META_DB_PORT", "valueFrom": {"secretKeyRef": {"name": 
"supabase-config", "key": "POSTGRES_PORT"}}}, + {"name": "PG_META_DB_NAME", "valueFrom": {"secretKeyRef": {"name": "supabase-config", "key": "POSTGRES_DB"}}}, + {"name": "PG_META_DB_USER", "value": "supabase_admin"}, + {"name": "PG_META_DB_PASSWORD", "valueFrom": {"secretKeyRef": {"name": "supabase-config", "key": "POSTGRES_PASSWORD"}}} + ], + "resources": { + "requests": {"memory": "128Mi", "cpu": "50m"}, + "limits": {"memory": "256Mi", "cpu": "200m"} + }, + "livenessProbe": { + "httpGet": {"path": "/health", "port": 8080}, + "initialDelaySeconds": 30, + "periodSeconds": 15 + }, + "readinessProbe": { + "httpGet": {"path": "/health", "port": 8080}, + "initialDelaySeconds": 10, + "periodSeconds": 5 + } + }] + } + } + } + }) + + # pg-meta Service + self.cluster.add_manifest("PgMetaService", { + "apiVersion": "v1", + "kind": "Service", + "metadata": { + "name": "pgmeta-svc", + "namespace": "supabase", + "labels": {"app": "pgmeta"} + }, + "spec": { + "selector": {"app": "pgmeta"}, + "ports": [{ + "port": 8080, + "targetPort": 8080, + "name": "http" + }], + "type": "ClusterIP" + } + }) + + def _create_edge_functions(self) -> None: + """Create Edge Functions using API Gateway + Lambda""" + + # Create Lambda function for Edge Functions + self.edge_function = _lambda.Function( + self, "EdgeFunctionLambda", + runtime=_lambda.Runtime.NODEJS_18_X, + handler="index.handler", + code=_lambda.Code.from_inline(""" +const crypto = require('crypto'); + +exports.handler = async (event) => { + console.log('Request:', JSON.stringify(event, null, 2)); + + const response = { + statusCode: 200, + headers: { + 'Content-Type': 'application/json', + 'Access-Control-Allow-Origin': '*', + 'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, OPTIONS', + 'Access-Control-Allow-Headers': 'Content-Type, Authorization' + }, + body: JSON.stringify({ + message: 'Hello from Supabase Edge Functions!', + timestamp: new Date().toISOString(), + request_id: crypto.randomUUID(), + path: event.path, + method: event.httpMethod + }) + }; + + return response; +}; + """), + log_retention=logs.RetentionDays.ONE_WEEK, + timeout=Duration.seconds(30), + memory_size=256, + environment={ + "NODE_ENV": "production" + } + ) + + # Create API Gateway for Edge Functions + self.edge_api = apigw.RestApi( + self, "EdgeFunctionsApi", + rest_api_name="StackAI-EdgeFunctions", + description="Supabase Edge Functions API", + default_cors_preflight_options=apigw.CorsOptions( + allow_origins=apigw.Cors.ALL_ORIGINS, + allow_methods=apigw.Cors.ALL_METHODS, + allow_headers=["Content-Type", "Authorization", "X-Requested-With"] + ), + endpoint_configuration=apigw.EndpointConfiguration( + types=[apigw.EndpointType.REGIONAL] + ) + ) + + # Create functions resource with proxy integration + functions_resource = self.edge_api.root.add_resource("functions") + functions_resource.add_proxy( + default_integration=apigw.LambdaIntegration( + self.edge_function, + proxy=True, + allow_test_invoke=True + ), + any_method=True + ) + + def _create_ingress(self) -> None: + """Create ALB Ingress for external access""" + + self.cluster.add_manifest("SupabaseIngress", { + "apiVersion": "networking.k8s.io/v1", + "kind": "Ingress", + "metadata": { + "name": "supabase-ingress", + "namespace": "supabase", + "annotations": { + "kubernetes.io/ingress.class": "alb", + "alb.ingress.kubernetes.io/scheme": "internet-facing", + "alb.ingress.kubernetes.io/target-type": "ip", + "alb.ingress.kubernetes.io/listen-ports": '[{"HTTP":80}, {"HTTPS":443}]', + "alb.ingress.kubernetes.io/ssl-redirect": 
"443", + "alb.ingress.kubernetes.io/healthcheck-path": "/health", + "alb.ingress.kubernetes.io/healthcheck-interval-seconds": "30", + "alb.ingress.kubernetes.io/healthcheck-timeout-seconds": "5", + "alb.ingress.kubernetes.io/healthy-threshold-count": "2", + "alb.ingress.kubernetes.io/unhealthy-threshold-count": "2" + } + }, + "spec": { + "rules": [{ + "http": { + "paths": [ + { + "path": "/auth", + "pathType": "Prefix", + "backend": { + "service": { + "name": "gotrue-svc", + "port": {"number": 9999} + } + } + }, + { + "path": "/rest", + "pathType": "Prefix", + "backend": { + "service": { + "name": "postgrest-svc", + "port": {"number": 3000} + } + } + }, + { + "path": "/storage", + "pathType": "Prefix", + "backend": { + "service": { + "name": "storage-svc", + "port": {"number": 5000} + } + } + }, + { + "path": "/pg", + "pathType": "Prefix", + "backend": { + "service": { + "name": "pgmeta-svc", + "port": {"number": 8080} + } + } + } + ] + } + }] + } + }) + + def _create_outputs(self) -> None: + """Create CloudFormation outputs""" + + CfnOutput( + self, "EdgeFunctionsApiUrl", + value=self.edge_api.url, + description="Supabase Edge Functions API URL" + ) + + CfnOutput( + self, "EdgeFunctionsApiId", + value=self.edge_api.rest_api_id, + description="Edge Functions API Gateway ID" + ) + + def get_service_info(self) -> Dict[str, Any]: + """Return service information""" + return { + "edge_functions_url": self.edge_api.url, + "edge_functions_api_id": self.edge_api.rest_api_id, + "namespace": "supabase" + } \ No newline at end of file diff --git a/aws/lib/stackai_eks_cdk_stack.py b/aws/lib/stackai_eks_cdk_stack.py new file mode 100644 index 0000000..015c3ad --- /dev/null +++ b/aws/lib/stackai_eks_cdk_stack.py @@ -0,0 +1,130 @@ +""" +StackAI EKS CDK Stack + +This is the main CDK stack that orchestrates all constructs to create +a complete AWS infrastructure for running StackAI with Supabase on EKS. + +Architecture: +- VPC with public/private subnets +- Aurora Serverless v2 (PostgreSQL) for Supabase +- DocumentDB for MongoDB workloads +- ElastiCache Redis for caching +- S3 for storage +- EKS cluster with managed node groups +- Supabase services deployed on Kubernetes +- ALB for ingress traffic +- API Gateway + Lambda for Edge Functions +""" +from constructs import Construct +from aws_cdk import ( + Stack, + CfnOutput, + Tags +) + +from .constructs.base_infrastructure import BaseInfrastructure +from .constructs.managed_services import ManagedServices +from .constructs.eks_cluster import EksCluster +from .constructs.supabase import SupabaseServices + + +class StackaiEksCdkStack(Stack): + """Main CDK stack for StackAI on AWS EKS""" + + def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None: + super().__init__(scope, construct_id, **kwargs) + + # 1. Create base infrastructure (VPC, subnets, security groups) + self.infrastructure = BaseInfrastructure( + self, "BaseInfrastructure" + ) + + # 2. Create managed AWS services (RDS, DocumentDB, ElastiCache, S3) + self.managed_services = ManagedServices( + self, "ManagedServices", + infrastructure=self.infrastructure + ) + + # 3. Create EKS cluster with node groups and add-ons + self.eks_cluster = EksCluster( + self, "EksCluster", + infrastructure=self.infrastructure, + managed_services=self.managed_services + ) + + # 4. 
+
+        # 4. Deploy Supabase services to EKS
+        self.supabase_services = SupabaseServices(
+            self, "SupabaseServices",
+            infrastructure=self.infrastructure,
+            managed_services=self.managed_services,
+            eks_cluster=self.eks_cluster
+        )
+
+        # Add tags to all resources
+        self._add_common_tags()
+
+        # Create stack-level outputs
+        self._create_stack_outputs()
+
+    def _add_common_tags(self) -> None:
+        """Add common tags to all resources in the stack"""
+
+        Tags.of(self).add("Project", "StackAI")
+        Tags.of(self).add("Environment", "Production")
+        Tags.of(self).add("Owner", "StackAI-Team")
+        Tags.of(self).add("ManagedBy", "CDK")
+        Tags.of(self).add("CostCenter", "Engineering")
+        Tags.of(self).add("Version", "1.0.0")
+
+    def _create_stack_outputs(self) -> None:
+        """Create high-level outputs for the entire stack"""
+
+        # Connection information for external access
+        CfnOutput(
+            self, "StackAI-QuickStart",
+            value=(
+                f"1. Configure kubectl: aws eks update-kubeconfig --region {self.region} --name {self.eks_cluster.cluster.cluster_name}\n"
+                f"2. Check pods: kubectl get pods -n supabase\n"
+                f"3. Get ALB URL: kubectl get ingress -n supabase\n"
+                f"4. Edge Functions URL: {self.supabase_services.edge_api.url}"
+            ),
+            description="Quick start commands to access your StackAI deployment"
+        )
+
+        CfnOutput(
+            self, "ImportantEndpoints",
+            value=(
+                f"Aurora: {self.managed_services.aurora_cluster.cluster_endpoint.hostname}\n"
+                f"DocumentDB: {self.managed_services.docdb_cluster.attr_endpoint}\n"
+                f"Redis: {self.managed_services.redis_cluster.attr_primary_end_point_address}\n"
+                f"S3 Bucket: {self.managed_services.storage_bucket.bucket_name}\n"
+                f"Edge Functions: {self.supabase_services.edge_api.url}"
+            ),
+            description="Important service endpoints"
+        )
+
+        # Debugging information
+        CfnOutput(
+            self, "DebuggingInfo",
+            value=(
+                f"VPC ID: {self.infrastructure.vpc.vpc_id}\n"
+                f"Cluster Name: {self.eks_cluster.cluster.cluster_name}\n"
+                f"Region: {self.region}\n"
+                f"Account: {self.account}"
+            ),
+            description="Information for debugging and troubleshooting"
+        )
+
+    @property
+    def cluster_info(self):
+        """Return cluster information for external access"""
+        return self.eks_cluster.get_cluster_info()
+
+    @property
+    def service_endpoints(self):
+        """Return all service endpoints"""
+        return {
+            **self.managed_services.get_connection_info(),
+            **self.supabase_services.get_service_info(),
+            "cluster": self.cluster_info
+        }
\ No newline at end of file
diff --git a/aws/requirements.txt b/aws/requirements.txt
new file mode 100644
index 0000000..ad1aa94
--- /dev/null
+++ b/aws/requirements.txt
@@ -0,0 +1,3 @@
+aws-cdk-lib==2.117.0
+constructs>=10.0.0,<11.0.0
+boto3>=1.28.0
\ No newline at end of file
diff --git a/aws/setup-aws-user.sh b/aws/setup-aws-user.sh
new file mode 100755
index 0000000..cc65637
--- /dev/null
+++ b/aws/setup-aws-user.sh
@@ -0,0 +1,212 @@
+#!/bin/bash
+
+# StackAI AWS User Setup Script
+# This script creates a dedicated IAM user for StackAI deployment with proper permissions and resource grouping
+
+set -e
+
+echo "πŸš€ StackAI AWS User Setup"
+echo "=========================="
+echo ""
+
+# Colors for output
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+# Check if AWS CLI is installed
+if ! command -v aws &> /dev/null; then
+    echo -e "${RED}❌ AWS CLI not found. Please install AWS CLI first.${NC}"
+    exit 1
+fi
+
+# Check if jq is installed
+if ! command -v jq &> /dev/null; then
+    echo -e "${YELLOW}⚠️ jq not found. Installing via package manager...${NC}"
+    # Try to install jq based on OS
+    if [[ "$OSTYPE" == "darwin"* ]]; then
+        if command -v brew &> /dev/null; then
+            brew install jq
+        else
+            echo -e "${RED}❌ Please install jq manually: brew install jq${NC}"
+            exit 1
+        fi
+    elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
+        sudo apt-get update && sudo apt-get install -y jq
+    else
+        echo -e "${RED}❌ Please install jq manually${NC}"
+        exit 1
+    fi
+fi
+
+# Verify AWS credentials are configured
+echo -e "${BLUE}πŸ” Checking AWS credentials...${NC}"
+if ! aws sts get-caller-identity &> /dev/null; then
+    echo -e "${RED}❌ No AWS credentials configured. Please run 'aws configure' first with admin credentials.${NC}"
+    exit 1
+fi
+
+ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
+echo -e "${GREEN}βœ… AWS Account ID: ${ACCOUNT_ID}${NC}"
+
+# Step 1: Create the deployment user
+echo -e "\n${BLUE}πŸ“ Step 1: Creating StackAI deployment user...${NC}"
+
+if aws iam get-user --user-name stackai-deployment-user &> /dev/null; then
+    echo -e "${YELLOW}⚠️ User 'stackai-deployment-user' already exists. Skipping creation.${NC}"
+else
+    aws iam create-user \
+        --user-name stackai-deployment-user \
+        --tags Key=Project,Value=StackAI Key=Purpose,Value=Deployment \
+        --path /stackai/
+    echo -e "${GREEN}βœ… User created successfully${NC}"
+fi
+
+# Step 2: Create access key
+echo -e "\n${BLUE}πŸ”‘ Step 2: Creating access key...${NC}"
+
+# Check if access key already exists
+ACCESS_KEYS=$(aws iam list-access-keys --user-name stackai-deployment-user --query 'AccessKeyMetadata[].AccessKeyId' --output text)
+if [ ! -z "$ACCESS_KEYS" ]; then
+    echo -e "${YELLOW}⚠️ Access key already exists for user. Using existing key.${NC}"
+    echo -e "${YELLOW}πŸ’‘ If you need a new key, delete the existing one first:${NC}"
+    echo -e "${YELLOW}   aws iam delete-access-key --user-name stackai-deployment-user --access-key-id $ACCESS_KEYS${NC}"
+else
+    aws iam create-access-key \
+        --user-name stackai-deployment-user > stackai-deployment-user-keys.json
+    echo -e "${GREEN}βœ… Access key created and saved to stackai-deployment-user-keys.json${NC}"
+    echo -e "${YELLOW}πŸ” Please save these credentials securely:${NC}"
+    cat stackai-deployment-user-keys.json | jq .
+fi
+
+# Step 3: Create IAM policy
+echo -e "\n${BLUE}πŸ“‹ Step 3: Creating IAM policy...${NC}"
+
+cat > stackai-deployment-policy.json << 'EOF'
+{
+    "Version": "2012-10-17",
+    "Statement": [
+        {
+            "Effect": "Allow",
+            "Action": [
+                "cloudformation:*",
+                "iam:*",
+                "ec2:*",
+                "eks:*",
+                "rds:*",
+                "docdb:*",
+                "elasticache:*",
+                "s3:*",
+                "secretsmanager:*",
+                "apigateway:*",
+                "lambda:*",
+                "logs:*",
+                "ses:*",
+                "acm:*",
+                "route53:*",
+                "elasticloadbalancing:*",
+                "autoscaling:*",
+                "ssm:*",
+                "kms:*",
+                "sts:*",
+                "tag:*"
+            ],
+            "Resource": "*"
+        },
+        {
+            "Effect": "Allow",
+            "Action": [
+                "iam:PassRole"
+            ],
+            "Resource": "*"
+        }
+    ]
+}
+EOF
+
+POLICY_ARN="arn:aws:iam::${ACCOUNT_ID}:policy/StackAIDeploymentPolicy"
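+
+# An editorial aside: the policy above is deliberately broad so the demo can
+# deploy without permission errors. For a longer-lived deployment user,
+# consider narrowing it; iam:PassRole in particular can usually be limited to
+# the CDK bootstrap roles (illustrative pattern, adjust to your account):
+#
+#   { "Effect": "Allow",
+#     "Action": "iam:PassRole",
+#     "Resource": "arn:aws:iam::<ACCOUNT_ID>:role/cdk-*" }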
+
+if aws iam get-policy --policy-arn "$POLICY_ARN" &> /dev/null; then
+    echo -e "${YELLOW}⚠️ Policy 'StackAIDeploymentPolicy' already exists. Skipping creation.${NC}"
+else
+    aws iam create-policy \
+        --policy-name StackAIDeploymentPolicy \
+        --policy-document file://stackai-deployment-policy.json \
+        --description "Policy for StackAI CDK deployment" \
+        --tags Key=Project,Value=StackAI Key=Purpose,Value=Deployment
+    echo -e "${GREEN}βœ… Policy created successfully${NC}"
+fi
+
+# Step 4: Attach policy to user
+echo -e "\n${BLUE}πŸ”— Step 4: Attaching policy to user...${NC}"
+
+aws iam attach-user-policy \
+    --user-name stackai-deployment-user \
+    --policy-arn "$POLICY_ARN"
+echo -e "${GREEN}βœ… Policy attached successfully${NC}"
+
+# Step 5: Create resource group
+echo -e "\n${BLUE}πŸ“¦ Step 5: Creating resource group...${NC}"
+
+if aws resource-groups get-group --group-name "StackAI-Infrastructure" &> /dev/null; then
+    echo -e "${YELLOW}⚠️ Resource group 'StackAI-Infrastructure' already exists. Skipping creation.${NC}"
+else
+    aws resource-groups create-group \
+        --name "StackAI-Infrastructure" \
+        --description "All AWS resources for StackAI deployment" \
+        --resource-query '{
+            "Type": "TAG_FILTERS_1_0",
+            "Query": "{\"ResourceTypeFilters\":[\"AWS::AllSupported\"],\"TagFilters\":[{\"Key\":\"Project\",\"Values\":[\"StackAI\"]}]}"
+        }' \
+        --tags Project=StackAI,Environment=Production,ManagedBy=CDK
+    echo -e "${GREEN}βœ… Resource group created successfully${NC}"
+fi
+
+# Step 6: Configure AWS CLI profile
+echo -e "\n${BLUE}βš™οΈ Step 6: Configuring AWS CLI profile...${NC}"
+
+if [ -f "stackai-deployment-user-keys.json" ]; then
+    ACCESS_KEY_ID=$(cat stackai-deployment-user-keys.json | jq -r '.AccessKey.AccessKeyId')
+    SECRET_ACCESS_KEY=$(cat stackai-deployment-user-keys.json | jq -r '.AccessKey.SecretAccessKey')
+
+    # Get current region (default to us-east-2, the region used throughout this repo)
+    CURRENT_REGION=$(aws configure get region 2>/dev/null || echo "us-east-2")
+
+    # Configure the profile
+    aws configure set aws_access_key_id "$ACCESS_KEY_ID" --profile stackai-deployment
+    aws configure set aws_secret_access_key "$SECRET_ACCESS_KEY" --profile stackai-deployment
+    aws configure set region "$CURRENT_REGION" --profile stackai-deployment
+    aws configure set output json --profile stackai-deployment
+
+    echo -e "${GREEN}βœ… AWS CLI profile 'stackai-deployment' configured${NC}"
+
+    # Test the configuration
+    echo -e "\n${BLUE}πŸ§ͺ Testing the new configuration...${NC}"
+    AWS_PROFILE=stackai-deployment aws sts get-caller-identity
+    echo -e "${GREEN}βœ… Configuration test successful${NC}"
+
+else
+    echo -e "${YELLOW}⚠️ Access key file not found. Please configure manually:${NC}"
+    echo -e "${YELLOW}   aws configure --profile stackai-deployment${NC}"
+fi
+
+echo -e "\n${GREEN}πŸŽ‰ Setup Complete!${NC}"
+echo -e "${GREEN}==================${NC}"
+echo ""
+echo -e "${BLUE}Next steps:${NC}"
+echo -e "1. Export the profile: ${YELLOW}export AWS_PROFILE=stackai-deployment${NC}"
+echo -e "2. Bootstrap CDK: ${YELLOW}cdk bootstrap${NC}"
+echo -e "3. Deploy the stack: ${YELLOW}cdk deploy StackaiEksCdkStack${NC}"
+echo ""
+echo -e "${BLUE}Files created:${NC}"
+echo -e "- stackai-deployment-user-keys.json (${RED}keep secure!${NC})"
+echo -e "- stackai-deployment-policy.json"
+echo ""
+echo -e "${BLUE}To use this setup:${NC}"
+echo -e "${YELLOW}export AWS_PROFILE=stackai-deployment${NC}"
+echo -e "${YELLOW}aws sts get-caller-identity${NC}"
+echo ""
+echo -e "${BLUE}To clean up everything later:${NC}"
+echo -e "${YELLOW}./cleanup-aws-resources.sh${NC}"
\ No newline at end of file