12 changes: 6 additions & 6 deletions README.md
@@ -2,16 +2,16 @@

[![Go Version](https://img.shields.io/badge/go-%3E%3D1.25.5-blue.svg)](https://golang.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

This application is primarily intended for exploring technical concepts. My goal is to experiment with different technologies, software architecture designs, and all the essential components involved in building distributed systems in Golang.
This application is primarily intended for exploring technical concepts. My goal is to experiment with different technologies, software architecture designs, and all the essential components involved in building distributed systems in Golang, simulating a real-world e-commerce platform.

## Features :sparkles:

- `Event-driven architecture` using `Kafka` for event streaming, `Redis PubSub` for real-time message broadcasting, and `Asynq` for distributed task scheduling (order and cart services)
- Hybrid communication model utilizing `gRPC` for high-performance, synchronous inter-service calls alongside an `Event-Driven Architecture (EDA)` with `Apache Kafka` for persistent event streaming, `Redis Pub/Sub` for real-time broadcasting, and `Asynq` for distributed task scheduling
- `Clean Architecture` (entity, repository, service, handler) with `Domain-Driven Design (DDD)` principles across all services
- Each microservice has its own dedicated `PostgreSQL` database instance
- 3-node `Kafka Cluster` running on `KRaft mode` (ZooKeeper-free)
- 6-node `Redis Cluster` (3 masters + 3 replicas)
- Central instrumentation using `OpenTelemetry` combined with LGTM stack (`Loki, Grafana, Tempo, Prometheus`) and `Alloy` as telemetry collector
- Unified observability pipeline using `OpenTelemetry` combined with LGTM stack (`Loki, Grafana, Tempo, Prometheus`) and `Alloy` as telemetry collector
- Two local development options:
- `Docker Compose` setup for rapid development (infrastructure + 7 core services)
- `Kubernetes Cluster` with `Tilt + (Kind or MicroK8s)` for hot reload in a production-like environment with all 11 services
@@ -21,8 +21,8 @@ This application is primarily intended for exploring technical concepts. My goal
- Infrastructure as Code with `Terraform` for GKE cluster provisioning on GCP
- `Kubernetes` for robust, scalable container orchestration in production environments
- Secure authentication implemented via `JWT` with `RS256` asymmetric algorithm and refresh token rotation
- Unified REST `API Gateway` and `GraphQL Federation` for type-safe client-server communication
- Internal communication via synchronous `gRPC calls` for microservices to interact with each other.
- Implemented `GraphQL Federation` and `REST Gateways` to provide a type-safe, unified interface for complex microservices
- Implemented API-first development standards using `OpenAPI 3` to automate documentation and client generation
- Database Management with schema migrations handled by `golang-migrate`
- Validation using `go-playground/validator` for input sanitization
- Order creation is implemented using two saga orchestration options:
@@ -663,7 +663,7 @@ terraform/

**Key Features**:

- 5-tier node pool architecture (stateful, stateless, monitoring, control-plane, gateway)
- 5-tier node pool architecture (stateful, stateless, monitoring, infra, gateway)
- Spot VMs for stateless workloads (~60% cost savings)
- Automated TLS with cert-manager and Let's Encrypt
- External Secrets Operator for GCP Secret Manager integration
18 changes: 9 additions & 9 deletions deployments/helm/operators/README.md
@@ -4,7 +4,7 @@ Operators extend Kubernetes functionality by managing custom resources.

## Operators vs Custom Resources

### Operator (Control Plane) - The Manager
### Operator (Infra Pool) - The Manager

The operator is **controller software** that watches and manages resources.

@@ -40,20 +40,20 @@ operators/

```
┌─────────────────────────────────────┐
│ Operator                            │
│ Installed once via Helm             │
│ Location: operators/*/values.yaml   │
└─────────────────────────────────────┘
          ↓ Watches & Manages ↓
┌─────────────────────────────────────┐
│ Custom Resources (Data Plane)       │
│ Deployed via Kustomize              │
│ Location: {postgres,kafka,redis}/   │
└─────────────────────────────────────┘
          ↓ Creates & Manages ↓
┌─────────────────────────────────────┐
│ Actual Workloads                    │
│ PostgreSQL pods, Kafka brokers, etc │
└─────────────────────────────────────┘
```
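As a concrete illustration of the data-plane layer above, a minimal custom resource that the CloudNativePG operator would reconcile into actual PostgreSQL pods might look like this (a sketch only; the resource name and sizes are hypothetical, not taken from this repository):

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: orders-db        # hypothetical database cluster name
spec:
  instances: 3           # the operator creates and manages 3 PostgreSQL pods
  storage:
    size: 5Gi
```

Applying this manifest does nothing by itself; the operator watching the cluster is what turns the declared state into running pods.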

4 changes: 2 additions & 2 deletions deployments/k8s/README.md
@@ -1,6 +1,6 @@
# Kubernetes Deployments - GitOps

This directory contains Kubernetes manifests for deploying the go-micro-commerce platform using **ArgoCD GitOps** with an industry-standard hybrid pattern.
This directory contains Kubernetes manifests for deploying the go-micro-commerce platform using **ArgoCD GitOps**.

## Architecture Overview

@@ -121,7 +121,7 @@ deployments/k8s/
Contains **ApplicationSet** manifests that define auto-discovery patterns:

- **Purpose**: Meta-layer that generates ArgoCD Applications
- **Pattern**: Industry-standard GitOps control plane
- **Pattern**: Industry-standard GitOps structure
- **Managed by**: Terraform bootstrap ApplicationSet

### `/infrastructure/` - Platform Services
2 changes: 1 addition & 1 deletion scripts/k8s-kind-setup.sh
@@ -65,7 +65,7 @@ cat <<EOF | kind create cluster --name "${CLUSTER_NAME}" --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: control-plane # kind accepts only control-plane/worker roles; "infra" is the GKE node pool name, not a valid kind role
kubeadmConfigPatches:
- |
kind: InitConfiguration
32 changes: 14 additions & 18 deletions terraform/README.md
@@ -1,33 +1,31 @@
# Terraform Infrastructure

Enterprise-grade infrastructure as code for the Go Micro-Commerce platform on Google Cloud Platform (GCP).

## Overview

This Terraform configuration provisions a cost-optimized, production-ready GKE cluster with:

- **5-Tier Node Pool Architecture**: Dedicated pools for stateful (databases), stateless (microservices), monitoring (observability), control-plane (operators), and gateway (ingress) workloads
- **5-Tier Node Pool Architecture**: Dedicated pools for stateful (databases), stateless (microservices), monitoring (observability), infra (operators), and gateway (ingress) workloads
- **Cost Optimization**: Right-sized node pools for learning/testing environments
- **Complete Operator Stack**: CloudNative PostgreSQL, Strimzi Kafka, Redis Operator
- **Complete OSS Operator Stack**: CloudNative PostgreSQL, Strimzi Kafka, Redis Operator
- **Full Observability**: Prometheus, Grafana, Loki, Tempo, Alloy monitoring stack with dedicated pool
- **GitOps Ready**: ArgoCD for application deployments on dedicated control plane pool
- **GitOps Ready**: ArgoCD for application deployments on dedicated infra pool
- **Production Ingress**: Traefik ingress controller with dedicated gateway pool

### Cost Breakdown

**Note**: Optimized for learning/testing with 190GB total disk allocation in asia-southeast2 region.
**Note**: This configuration is optimized for learning and testing environments with limited resources, keeping the total disk allocation to 190GB in the asia-southeast2 region.

| Component | Configuration |
| ------------------------ | ---------------------------------------------- |
| **Stateful Pool** | 3 × e2-medium (regular VMs, 30GB balanced) |
| **Stateless Pool** | 2 × e2-standard-2 (regular VMs, 20GB balanced) |
| **Monitoring Pool** | 1 × e2-standard-2 (regular VMs, 25GB balanced) |
| **Control Plane Pool** | 1 × e2-standard-2 (regular VMs, 20GB balanced) |
| **Infra Pool** | 1 × e2-standard-2 (regular VMs, 20GB balanced) |
| **Gateway Pool** | 1 × e2-medium (regular VMs, 15GB balanced) |
| **Frontend Hosting** | Cloudflare Pages (React + Vite) |
| **Total Infrastructure** | - |

**Total Disk Allocation**: 190GB (90GB stateful + 40GB stateless + 25GB monitoring + 20GB control plane + 15GB gateway)
**Total Disk Allocation**: 190GB (90GB stateful + 40GB stateless + 25GB monitoring + 20GB infra + 15GB gateway)

## Frontend Deployment

@@ -170,13 +168,13 @@ Provisions GKE cluster with 5-tier node pool architecture:
- Autoscaling capable (min/max nodes configurable)
- Taint: `workload-type=monitoring:NoSchedule`

**Control Plane Pool** (Operators, ArgoCD, ESO)
**Infra Pool** (Operators, ArgoCD, ESO)

- 1 × e2-standard-2 nodes (2 vCPU, 8GB RAM)
- 20GB balanced persistent disk per node (20GB total)
- Regular VMs for control plane reliability
- Regular VMs for infra reliability
- Autoscaling capable (min/max nodes configurable)
- Taint: `workload-type=control-plane:NoSchedule`
- Taint: `workload-type=infra:NoSchedule`

**Gateway Pool** (Traefik, Apollo Router, API Gateway)

@@ -190,7 +188,7 @@ Provisions GKE cluster with 5-tier node pool architecture:

- **Private nodes enabled** (nodes have no external IPs)
- **Cloud NAT** for outbound internet access
- **Public control plane endpoint** (for kubectl access)
- **Public cluster API endpoint** (for kubectl access)
- Workload Identity enabled
- Shielded nodes enabled
- Private Google access
@@ -541,19 +539,19 @@ spec:
workload-type: "monitoring"
```

### Control Plane Workloads (Operators, ArgoCD, ESO)
### Infra Workloads (Operators, ArgoCD, ESO)

To schedule on the control plane pool:
To schedule on the infra pool:

```yaml
spec:
tolerations:
- key: "workload-type"
operator: "Equal"
value: "control-plane"
value: "infra"
effect: "NoSchedule"
nodeSelector:
workload-type: "control-plane"
workload-type: "infra"
```
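For reference, the taint that these tolerations match would be declared on the Terraform side roughly like this (a sketch of a node pool resource; the resource names and attribute values are assumptions, not copied from the module):

```hcl
resource "google_container_node_pool" "infra" {
  name    = "infra-pool"                           # hypothetical pool name
  cluster = google_container_cluster.primary.name  # assumes a cluster resource named "primary"

  node_config {
    machine_type = "e2-standard-2"
    labels       = { "workload-type" = "infra" }

    # Only pods carrying the matching toleration can schedule here.
    taint {
      key    = "workload-type"
      value  = "infra"
      effect = "NO_SCHEDULE"
    }
  }
}
```

The label and taint use the same key/value pair so that the `nodeSelector` attracts the pod to the pool and the taint keeps everything else out.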

### Gateway Workloads (Traefik, Apollo Router, API Gateway)
@@ -600,8 +598,6 @@ cat ~/.ssh/argocd-repo.pub
export TF_VAR_argocd_git_ssh_private_key=$(cat ~/.ssh/argocd-repo)
```

**📖 See [ARGOCD_AUTHENTICATION.md](./ARGOCD_AUTHENTICATION.md) for detailed setup instructions.**

### 2. Update ArgoCD Configuration

Edit `terraform/environments/prod/terraform.tfvars`:
14 changes: 7 additions & 7 deletions terraform/environments/prod/main.tf
@@ -55,13 +55,13 @@ module "gke_cluster" {
monitoring_pool_disk_size_gb = var.monitoring_pool_disk_size_gb
monitoring_pool_disk_type = var.monitoring_pool_disk_type

# Control plane pool (operators, ArgoCD, ESO)
control_plane_pool_enabled = var.control_plane_pool_enabled
control_plane_pool_min_nodes = var.control_plane_pool_min_nodes
control_plane_pool_max_nodes = var.control_plane_pool_max_nodes
control_plane_pool_machine_type = var.control_plane_pool_machine_type
control_plane_pool_disk_size_gb = var.control_plane_pool_disk_size_gb
control_plane_pool_disk_type = var.control_plane_pool_disk_type
# Infra pool (operators, ArgoCD, ESO)
infra_pool_enabled = var.infra_pool_enabled
infra_pool_min_nodes = var.infra_pool_min_nodes
infra_pool_max_nodes = var.infra_pool_max_nodes
infra_pool_machine_type = var.infra_pool_machine_type
infra_pool_disk_size_gb = var.infra_pool_disk_size_gb
infra_pool_disk_type = var.infra_pool_disk_type

# Gateway pool (Traefik, Apollo Router, API Gateway)
gateway_pool_enabled = var.gateway_pool_enabled
6 changes: 3 additions & 3 deletions terraform/environments/prod/outputs.tf
@@ -75,9 +75,9 @@ output "monitoring_pool_name" {
value = module.gke_cluster.monitoring_pool_name
}

output "control_plane_pool_name" {
description = "Control plane node pool name (operators, ArgoCD, ESO)"
value = module.gke_cluster.control_plane_pool_name
output "infra_pool_name" {
description = "Infra node pool name (operators, ArgoCD, ESO)"
value = module.gke_cluster.infra_pool_name
}

output "gateway_pool_name" {
18 changes: 9 additions & 9 deletions terraform/environments/prod/terraform.tfvars.example
@@ -61,15 +61,15 @@ monitoring_pool_disk_size_gb = 25 # 25GB balanced disk per node (2
monitoring_pool_disk_type = "pd-balanced" # Balanced disk for cost efficiency

# ============================================================================
# Control Plane Node Pool (Operators: ArgoCD, ESO, CNPG, Strimzi, Redis)
# Regular VMs with autoscaling for control plane
# ============================================================================
control_plane_pool_enabled = true # Enable control plane node pool
control_plane_pool_min_nodes = 1 # Minimum nodes (autoscaling)
control_plane_pool_max_nodes = 1 # Maximum nodes (autoscaling)
control_plane_pool_machine_type = "e2-standard-2" # 2 vCPU, 8GB RAM
control_plane_pool_disk_size_gb = 20 # 20GB balanced disk per node (1 node × 20GB = 20GB)
control_plane_pool_disk_type = "pd-balanced" # Balanced disk for cost efficiency
# Infra Node Pool (Operators: ArgoCD, ESO, CNPG, Strimzi, Redis)
# Regular VMs with autoscaling for infra
# ============================================================================
infra_pool_enabled = true # Enable infra node pool
infra_pool_min_nodes = 1 # Minimum nodes (autoscaling)
infra_pool_max_nodes = 1 # Maximum nodes (autoscaling)
infra_pool_machine_type = "e2-standard-2" # 2 vCPU, 8GB RAM
infra_pool_disk_size_gb = 20 # 20GB balanced disk per node (1 node × 20GB = 20GB)
infra_pool_disk_type = "pd-balanced" # Balanced disk for cost efficiency

# ============================================================================
# Gateway Node Pool (Ingress: Traefik, Apollo Router, API Gateway)
26 changes: 13 additions & 13 deletions terraform/environments/prod/variables.tf
@@ -217,39 +217,39 @@ variable "monitoring_pool_disk_type" {
default = "pd-balanced"
}

# Control Plane Pool (Operators, ArgoCD, ESO)
variable "control_plane_pool_enabled" {
description = "Enable control plane node pool"
# Infra Pool (Operators, ArgoCD, ESO)
variable "infra_pool_enabled" {
description = "Enable infra node pool"
type = bool
default = true
}

variable "control_plane_pool_min_nodes" {
description = "Minimum nodes in control plane pool"
variable "infra_pool_min_nodes" {
description = "Minimum nodes in infra pool"
type = number
default = 1
}

variable "control_plane_pool_max_nodes" {
description = "Maximum nodes in control plane pool"
variable "infra_pool_max_nodes" {
description = "Maximum nodes in infra pool"
type = number
default = 2
}

variable "control_plane_pool_machine_type" {
description = "Machine type for control plane pool"
variable "infra_pool_machine_type" {
description = "Machine type for infra pool"
type = string
default = "e2-small"
}

variable "control_plane_pool_disk_size_gb" {
description = "Disk size for control plane pool"
variable "infra_pool_disk_size_gb" {
description = "Disk size for infra pool"
type = number
default = 30
}

variable "control_plane_pool_disk_type" {
description = "Disk type for control plane pool"
variable "infra_pool_disk_type" {
description = "Disk type for infra pool"
type = string
default = "pd-balanced"
}