Production-grade Kubernetes homelab with three-tier PKI, automated certificate management, complete observability stack, GitOps continuous delivery, and Mac workstation kubectl workflow.
```bash
# On WSL2 (hill-arch)
./scripts/01-setup-cluster.sh    # Create cluster
./scripts/02-deploy-all.sh       # Deploy all services
./scripts/03-export-root-ca.sh   # Export root CA certificate

# Export kubeconfig to Mac
scp ~/.kube/config jon@<mac-ip>:~/.kube/homelab-config
```

Network Flow:

```
Mac Workstation
├─> kubectl → Windows → WSL2 → kind API (0.0.0.0:6443)
└─> Browser → Windows → WSL2 → nginx-ingress → Services
```
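Once copied, the kubeconfig's `server` field must point at the Windows host rather than a WSL2-local address so kubectl traffic follows the flow above. A minimal sketch of the edited entry (the IP matches the `/etc/hosts` entries used elsewhere in this README; the cluster name is illustrative):

```yaml
# ~/.kube/homelab-config (excerpt) - illustrative cluster name
clusters:
- name: kind-homelab
  cluster:
    # Windows host IP, port-forwarded through to the kind API server in WSL2;
    # certificate-authority-data stays as exported from WSL2
    server: https://192.168.68.100:6443
```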
PKI Chain:

```
Root CA (10 years)
└─> Intermediate CA (5 years)
    └─> Service Certificates (90 days, auto-renewed)
```
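The 90-day leaf tier is expressed as cert-manager Certificate resources. A sketch of what one looks like (names and namespace are illustrative, not copied from the manifests):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: whoami-tls           # illustrative
  namespace: whoami
spec:
  secretName: whoami-tls
  duration: 2160h            # 90 days
  renewBefore: 360h          # auto-renew 15 days before expiry
  dnsNames:
    - whoami.homelab.local
  issuerRef:
    name: homelab-issuer     # cluster issuer backed by the intermediate CA
    kind: ClusterIssuer
```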
Infrastructure:
- Cluster: 3-node kind cluster (1 control-plane, 2 workers)
- Certificates: cert-manager v1.13.2 with three-tier PKI
- Ingress: nginx-ingress controller
- DNS: AdGuard Home (cluster-wide DNS with ad blocking)
- GitOps: ArgoCD (declarative continuous delivery with auto-sync)
- Database: PostgreSQL 16 (shared database for applications)
- Monitoring: metrics-server, Prometheus, Grafana, kube-state-metrics, node-exporter
- Observability: OpenTelemetry Collector, Loki (logs), Tempo (traces), Promtail (log collection)
- Applications: Kubernetes Dashboard, Portainer (Docker + K8s management), AdGuard Home, whoami test app
| Service | URL | Certificate | Purpose |
|---|---|---|---|
| Kubernetes Dashboard | https://dashboard.homelab.local | Trusted (green lock) | K8s cluster management |
| Portainer | https://portainer.homelab.local | Trusted (green lock) | Docker + K8s management |
| ArgoCD | https://argocd.homelab.local | Trusted (green lock) | GitOps continuous delivery |
| AdGuard Home | https://adguard.homelab.local | Trusted (green lock) | DNS + ad blocking (cluster-wide) |
| Prometheus | https://prometheus.homelab.local | Trusted (green lock) | Metrics collection & monitoring |
| Grafana | https://grafana.homelab.local | Trusted (green lock) | Metrics visualization |
| whoami Test App | https://whoami.homelab.local | Trusted (green lock) | Ingress test |
Note: Credentials for Grafana and ArgoCD are stored in Obsidian project notes, not in this repository.
Resource Count:
- Nodes: 3 (1 control-plane, 2 workers)
- Pods: 46 (across 16 namespaces)
- Certificates: 9 (all READY=True)
- Ingress Routes: 7
- PersistentVolumes: 7 (85Gi total)
- Helm Releases: 1 (ArgoCD)
- ArgoCD Applications: 2 (whoami, postgres)
Resource Usage (Typical Idle):
- CPU: ~9.6% cluster-wide
- Memory: ~7.0% cluster-wide
- Disk: ~1.24% cluster-wide
Storage Allocation:
- AdGuard Home: 10Gi (DNS query logs & config)
- Prometheus: 10Gi (metrics, 15-day retention)
- Grafana: 5Gi (dashboards & settings)
- Loki: 20Gi (application logs, 15-day retention)
- Tempo: 20Gi (distributed traces, 2 PVCs)
- PostgreSQL: 20Gi (shared database for litellm, n8n)
Mac:
- kubectl, k9s installed
- Root CA trusted in Keychain
- `/etc/hosts` entries:

```
192.168.68.100 dashboard.homelab.local
192.168.68.100 whoami.homelab.local
192.168.68.100 portainer.homelab.local
192.168.68.100 argocd.homelab.local
192.168.68.100 adguard.homelab.local
192.168.68.100 prometheus.homelab.local
192.168.68.100 grafana.homelab.local
```
Windows:
- Port forwarding script running (`C:\Scripts\wsl-port-forward.ps1`)
- Root CA trusted in Certificate Store
WSL2 (hill-arch):
- Docker, kind, kubectl installed
- Git repository cloned
- Architecture - Detailed network flow and component design
- Setup Guide - Step-by-step installation instructions
- Troubleshooting - Common issues and solutions
- Port Forwarding - Windows port forwarding configuration
- Monitoring README - Complete monitoring guide
| Script | Purpose |
|---|---|
| `01-setup-cluster.sh` | Create kind cluster with correct API binding |
| `02-deploy-all.sh` | Deploy all manifests in correct order |
| `03-export-root-ca.sh` | Export root CA certificate to trust |
| `backup-cluster.sh` | Backup all cluster resources |
| `destroy-cluster.sh` | Delete cluster and cleanup |
```
k8s-homelab/
├── kind-config.yaml # Cluster definition (0.0.0.0:6443)
├── manifests/
│ ├── 00-namespaces/ # Namespace definitions
│ ├── 01-cert-manager/ # Three-tier PKI configuration
│ ├── 02-adguard-home/ # DNS + ad blocking
│ ├── 02-ingress-nginx/ # Ingress controller
│ ├── 03-kubernetes-dashboard/ # Dashboard with ingress
│ ├── 04-whoami/ # Test application
│ ├── 05-dns/ # DNS configuration
│ ├── 06-portainer/ # Portainer ingress + agent
│ │ ├── agent/ # Portainer agent for K8s management
│ │ ├── ingress.yaml # HTTPS ingress for Portainer UI
│ │ ├── namespace.yaml # Portainer namespace
│ │ └── service.yaml # Service pointing to Docker container
│ ├── 07-metrics-server/ # Kubernetes metrics for kubectl top
│ ├── 08-monitoring/ # Prometheus + Grafana + exporters
│ ├── 09-opentelemetry/ # OpenTelemetry collector
│ ├── 10-observability/ # Loki + Tempo + Promtail
│ ├── 11-argocd/ # ArgoCD GitOps applications
│ │ └── applications/ # ArgoCD Application manifests
│ └── 12-database/ # PostgreSQL shared database
├── scripts/ # Automation scripts
└── docs/                       # Documentation
```
API Server Binding:
- Address: `0.0.0.0:6443` (accessible from Mac via Windows port forwarding)
- NOT `127.0.0.1:6443` (localhost only)
Certificate Issuers:
- `selfsigned-issuer` - Bootstrap self-signed issuer
- `homelab-root-ca-issuer` - Root CA issuer (10y)
- `homelab-issuer` - Final cluster issuer (uses intermediate CA)
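A sketch of how the final issuer in the chain is wired up (the intermediate CA secret name is an assumption; the actual resources live in `manifests/01-cert-manager/`):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: homelab-issuer
spec:
  ca:
    # Secret holding the intermediate CA cert + key,
    # itself issued by homelab-root-ca-issuer
    secretName: homelab-intermediate-ca   # assumed name
```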
Ingress Configuration:
- Runs on control-plane node (label: `ingress-ready=true`)
- Ports 80 and 443 mapped to host
- TLS certificates auto-issued by cert-manager
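A minimal sketch of an ingress that picks up an auto-issued certificate (host, names, and backend are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: whoami
  namespace: whoami
  annotations:
    cert-manager.io/cluster-issuer: homelab-issuer
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - whoami.homelab.local
      secretName: whoami-tls    # created and renewed by cert-manager
  rules:
    - host: whoami.homelab.local
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: whoami
                port:
                  number: 80
```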
Daily Development:
```bash
# From Mac - no SSH required
kubectl get pods -A
kubectl top nodes
k9s
```

Quick Health Check (30 seconds):

```bash
kubectl get nodes                       # Expected: 3 Ready nodes
kubectl get pods -A | grep -v Running   # Should show only the header line (all pods Running)
kubectl get certificate -A              # Expected: 9 certificates, all READY=True
kubectl get ingress -A                  # Expected: 7 ingress resources
kubectl top nodes                       # Should show CPU/memory usage
kubectl top pods -A                     # Should show pod resource usage
```

Deploy New Service (GitOps Workflow):
```bash
# 1. Add manifest with the cert-manager annotation:
#    cert-manager.io/cluster-issuer: homelab-issuer

# 2. Create ArgoCD Application manifest in manifests/11-argocd/applications/

# 3. Commit and push to Git
git add manifests/
git commit -m "Add new service"
git push

# 4. ArgoCD auto-syncs within 3 minutes (or sync manually)
argocd app sync <app-name>

# 5. Verify deployment
kubectl get pods -n <namespace>
kubectl get certificate -n <namespace>
```

ArgoCD provides declarative continuous delivery with automatic synchronization from Git.
Workflow:
Git Push → ArgoCD Detects Change → Auto-Sync (within 3 min) → Deploy to Cluster
Current Applications:
- `whoami` - Test application (auto-sync enabled, self-heal enabled)
- `postgres` - Shared PostgreSQL database (auto-sync enabled, self-heal enabled)
Key Features:
- ✅ Auto-sync: Git changes automatically deployed within 3 minutes
- ✅ Self-heal: Manual kubectl changes auto-reverted to Git state
- ✅ Drift detection: ArgoCD shows OutOfSync when cluster ≠ Git
- ✅ Prune: Deleted manifests in Git = deleted resources in cluster
Access ArgoCD:
```bash
# Web UI
https://argocd.homelab.local

# CLI
argocd login argocd.homelab.local
argocd app list
argocd app sync <app-name>
```

Adding New Applications (see the manifest sketch after this list):
- Create an Application manifest in `manifests/11-argocd/applications/`
- Reference the target manifest directory (e.g., `manifests/04-whoami/`)
- Enable auto-sync and prune in the sync policy
- Apply: `kubectl apply -f manifests/11-argocd/applications/`
- ArgoCD will deploy and continuously sync from Git
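A sketch of such an Application manifest with auto-sync, self-heal, and prune enabled (the repo URL is a placeholder):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: whoami
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/<user>/k8s-homelab.git   # placeholder
    targetRevision: main
    path: manifests/04-whoami
  destination:
    server: https://kubernetes.default.svc
    namespace: whoami
  syncPolicy:
    automated:
      prune: true      # deleted in Git = deleted in cluster
      selfHeal: true   # manual kubectl drift is reverted
```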
Shared PostgreSQL 16 database for applications requiring persistent data.
Configuration:
- Image: `postgres:16-alpine`
- Storage: 20Gi PVC (StatefulSet with persistent data)
- Port: 5432
- Service: `postgres.database.svc.cluster.local`
Application Databases:
- `litellm` - LiteLLM proxy database
- `n8n` - n8n workflow automation database
Connection String Pattern:
```
postgresql://postgres:<password>@postgres.database.svc.cluster.local:5432/<database>
```
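An application pod would typically consume this via environment variables. A sketch, assuming a `postgres-credentials` secret (the secret and key names are hypothetical):

```yaml
# Deployment excerpt - hypothetical secret/key names
env:
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: postgres-credentials
        key: password
  - name: DATABASE_URL
    # $(VAR) references expand to env vars defined earlier in this list
    value: postgresql://postgres:$(POSTGRES_PASSWORD)@postgres.database.svc.cluster.local:5432/litellm
```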
Monitoring:
- PostgreSQL exporter (sidecar container on port 9187)
- Prometheus scraping database metrics
- Grafana dashboard (ID 9628) showing connections, queries, cache hits, locks
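A sketch of how that exporter rides alongside the database in the StatefulSet pod spec (the image and env are assumptions based on the standard postgres_exporter; the real spec is in `manifests/12-database/`):

```yaml
# StatefulSet pod spec excerpt - assumed exporter image and env
containers:
  - name: postgres
    image: postgres:16-alpine
    ports:
      - containerPort: 5432
  - name: postgres-exporter
    image: quay.io/prometheuscommunity/postgres-exporter   # assumed
    ports:
      - containerPort: 9187   # scraped by Prometheus
    env:
      - name: DATA_SOURCE_NAME
        value: postgresql://postgres:$(POSTGRES_PASSWORD)@localhost:5432/postgres?sslmode=disable
```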
Managed by ArgoCD:
- Source: `manifests/12-database/`
- Auto-sync: Enabled
- Git changes auto-deployed within 3 minutes
Portainer runs as a Docker container (not in Kubernetes) with ingress access via nginx-ingress.
Deploy Portainer:
```bash
# On WSL2 (hill-arch) - one-time setup
docker run -d \
  --name portainer \
  --restart=always \
  -p 9000:9000 \
  -p 9443:9443 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  --network kind \
  portainer/portainer-ce:latest

# Deploy Kubernetes ingress (from Mac)
kubectl apply -f manifests/06-portainer/
```

Why Docker instead of Kubernetes?
- Direct access to Docker socket (manage kind nodes and other containers)
- Can deploy Docker Compose stacks alongside Kubernetes
- Not affected by cluster restarts
- Still accessible via HTTPS with trusted certificate through ingress
Deploy Portainer Agent (for Kubernetes management):
```bash
# Deploy agent to cluster (from Mac)
kubectl apply -f manifests/06-portainer/agent/

# Verify agent is running
kubectl get pods -n portainer
```

Add Kubernetes to Portainer:
- Go to https://portainer.homelab.local
- Environments → Add environment → Kubernetes (via agent)
- Name: `homelab-cluster`
- Environment URL: `172.18.0.4:30778` (control-plane IP + NodePort)
- Click Add environment
Access: https://portainer.homelab.local (manage both Docker and Kubernetes)
AdGuard Home provides DNS resolution and ad blocking for all pods in the cluster.
CoreDNS Configuration:
```
# CoreDNS forwards all DNS queries to AdGuard Home
# AdGuard Service IP: 10.96.126.140:53
```

Initial Setup:
- Go to https://adguard.homelab.local
- Complete setup wizard:
- Set admin username/password
- Configure upstream DNS servers: `1.1.1.1`, `1.0.0.1`, `8.8.8.8`, `8.8.4.4`
- Enable DNS blocklists (Filters → DNS blocklists)
DNS Flow:
Pod → CoreDNS (10.96.0.10) → AdGuard Home (10.96.126.140) → Upstream DNS
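The forwarding hop lives in the CoreDNS ConfigMap. A sketch of the relevant stanza (the actual configuration is under `manifests/05-dns/`; exact plugin options may differ):

```yaml
# coredns ConfigMap excerpt (kube-system) - a sketch
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        # Forward everything non-cluster to AdGuard Home
        # instead of the node's /etc/resolv.conf
        forward . 10.96.126.140
        cache 30
    }
```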
Benefits:
- Ad blocking for all cluster applications
- DNS query logging and analytics
- Custom filtering rules
- Safe browsing (malware/phishing protection)
Note: AdGuard is configured for cluster-internal use only. External devices (Mac, Windows) use their default DNS unless manually configured.
The cluster includes a complete observability stack for metrics, logs, and traces.
Complete Infrastructure Monitoring (7,000+ metrics):
```
Prometheus scrapes:
├── Windows Host (192.168.68.100:9182) → windows_exporter
│ ├── CPU, memory, disk, network metrics
│ └── 157 Windows services monitored
├── WSL2 Host (172.27.157.7:9100) → node-exporter
│ └── Linux host metrics (filesystem, network, kernel)
├── Docker Daemon (172.27.157.7:9323) → Docker metrics
│ └── Engine info, container lifecycle, build stats
├── kind Nodes (3) → node-exporter DaemonSet
│ └── Container host metrics for all cluster nodes
├── Kubernetes Objects → kube-state-metrics
│ └── Pod/Deployment/Node/PVC state metrics
├── Containers (96) → cAdvisor via kubelet
│ └── Container CPU, memory, network, disk I/O
├── Applications → Custom exporters
│ ├── ArgoCD → GitOps metrics (sync status, health, git operations)
│ └── PostgreSQL → Database metrics (connections, queries, locks, cache hits)
├── kubelet → Node health metrics
├── API server → Control plane metrics
└── metrics-server → Resource metrics for kubectl top
```
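A sketch of how the host-level targets above appear in the Prometheus scrape configuration (job names are assumptions; the real config lives in the Prometheus ConfigMap under `manifests/08-monitoring/`):

```yaml
# prometheus.yml excerpt - assumed job names, IPs from the diagram above
scrape_configs:
  - job_name: windows-host
    static_configs:
      - targets: ["192.168.68.100:9182"]   # windows_exporter
  - job_name: wsl2-host
    static_configs:
      - targets: ["172.27.157.7:9100"]     # node-exporter on WSL2
  - job_name: docker-daemon
    static_configs:
      - targets: ["172.27.157.7:9323"]     # Docker engine metrics
```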
Access:
- Prometheus: https://prometheus.homelab.local
- Query metrics, view scrape targets (Status → Targets)
- All targets should show "UP" status
- Grafana: https://grafana.homelab.local
- Prometheus datasource pre-configured
- Operational Dashboards (9 total):
- Kubernetes Cluster (Prometheus) - Cluster-wide metrics and health
- Kubernetes cluster monitoring - Pod/deployment/node states
- Node Exporter Full (kind nodes) - Container host metrics
- Node Exporter Full (WSL2) - WSL2 host metrics (select instance: wsl2-host)
- Docker Dashboard (ID 1229) - Docker engine metrics and container lifecycle
- Windows Exporter (ID 14694) - Windows host CPU/memory/disk/services
- CoreDNS - DNS performance and query metrics
- ArgoCD - GitOps health and sync status
- PostgreSQL (ID 9628) - Database performance and health
Key Metrics Available:
- Windows Host:
  - `windows_cpu_time_total` - Windows CPU usage
  - `windows_os_physical_memory_free_bytes` - Windows memory
  - `windows_logical_disk_free_bytes` - Windows disk space
  - `windows_service_state` - Windows service status (157 services)
- WSL2/Linux Hosts:
  - `node_cpu_seconds_total` - CPU usage
  - `node_memory_MemTotal_bytes` - Memory capacity
  - `node_filesystem_size_bytes` - Disk usage
- Docker Daemon:
  - `engine_daemon_engine_info` - Docker version and info
  - `engine_daemon_container_actions_seconds` - Container lifecycle
- Kubernetes:
  - `kube_pod_status_phase` - Pod states (Running/Pending/Failed)
  - `kube_deployment_replicas` - Deployment health
  - `container_cpu_usage_seconds_total` - Container CPU
  - `container_memory_usage_bytes` - Container memory
- Applications:
  - `argocd_app_sync_status` - ArgoCD application sync state
  - `argocd_app_health_status` - ArgoCD application health
  - `pg_stat_database_*` - PostgreSQL database statistics
  - `pg_locks_count` - PostgreSQL lock contention
Example Queries:
```
# Total running pods
sum(kube_pod_status_phase{phase="Running"})

# Node CPU usage (%)
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Pod memory usage
sum(container_memory_usage_bytes{pod=~".*"}) by (pod)

# ArgoCD applications out of sync
count(argocd_app_info{sync_status!="Synced"})

# PostgreSQL active connections
sum(pg_stat_database_numbackends) by (datname)
```
Components:
- Loki: Centralized log storage (20Gi storage, 15-day retention)
- Promtail: Log collection agent (DaemonSet on all nodes)
Access:
- Loki integrated with Grafana (Explore → Loki datasource)
- Query logs using LogQL syntax
Example LogQL Queries:
```
# All logs from a namespace
{namespace="monitoring"}

# Error logs across cluster
{job="kubernetes-pods"} |= "error"

# Logs from a specific pod
{pod="prometheus-965fd69bb-km2nn"}
```
Components:
- Tempo: Trace storage backend (20Gi storage)
- OpenTelemetry Collector: Trace ingestion (OTLP endpoints)
Endpoints:
- OTLP gRPC: `otel-collector.opentelemetry:4317`
- OTLP HTTP: `otel-collector.opentelemetry:4318`
Integration: Applications can send traces using OpenTelemetry SDKs to the collector endpoints.
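For example, an instrumented pod can be pointed at the collector with the standard OpenTelemetry SDK environment variables (a sketch; the service name is illustrative):

```yaml
# Deployment excerpt - standard OTel SDK env vars
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://otel-collector.opentelemetry:4318   # OTLP HTTP
  - name: OTEL_SERVICE_NAME
    value: my-app   # illustrative
```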
Verification:

```bash
# Verify metrics-server
kubectl top nodes
kubectl top pods -A

# Check Prometheus targets
curl -k https://prometheus.homelab.local/api/v1/targets

# Check all monitoring pods
kubectl get pods -n monitoring
kubectl get pods -n observability
kubectl get pods -n opentelemetry

# Verify exporters
kubectl get pods -l app=kube-state-metrics -n monitoring
kubectl get pods -l app=node-exporter -n monitoring

# Check persistent storage
kubectl get pvc -n monitoring
kubectl get pvc -n observability
kubectl get pvc -n database
```

```bash
# Check cluster health
kubectl get nodes # Expected: 3 nodes Ready
kubectl get pods -A # Expected: 46 pods Running
kubectl top nodes # Should show resource usage
kubectl top pods -A # Should show pod resource usage
# Check certificates
kubectl get certificate -A # Expected: 9 certificates, all READY=True
kubectl get secret -A | grep tls # TLS secrets created
# Check ingress
kubectl get ingress -A # Expected: 7 ingress resources
curl -k https://dashboard.homelab.local
curl -k https://whoami.homelab.local
curl -k https://portainer.homelab.local
curl -k https://adguard.homelab.local
curl -k https://prometheus.homelab.local
curl -k https://grafana.homelab.local
curl -k https://argocd.homelab.local
# Check observability stack
kubectl get pods -n monitoring # Expected: 7/7 Running
kubectl get pods -n observability # Expected: 5/5 Running
kubectl get pods -n opentelemetry # Expected: 1/1 Running
kubectl get pods -n argocd # Expected: 7/7 Running
kubectl get pods -n database # Expected: 1/1 Running (postgres-0 with 2/2 containers)
# Check ArgoCD applications
argocd app list # Expected: 2 applications (whoami, postgres)
argocd app get whoami # Should show Synced + Healthy
argocd app get postgres # Should show Synced + Healthy
# Check recent pod restarts (troubleshooting)
kubectl get pods -A -o wide | awk 'NR>1 && $5 > 5'   # Pods with >5 restarts (RESTARTS is column 5)
```
- Lab v1: Successfully deployed, archived to `.archive/`
  - ❌ API server bound to `127.0.0.1` (incorrect)
  - ❌ No Git source control
  - ✅ Three-tier PKI working
  - ✅ Trusted certificates operational
- Lab v2: Production-grade rebuild (current)
  - ✅ API server on `0.0.0.0:6443`
  - ✅ Full Git source control
  - ✅ Mac-first kubectl workflow
  - ✅ Reproducible infrastructure
  - ✅ Complete observability stack
  - ✅ GitOps continuous delivery with ArgoCD
  - ✅ Shared PostgreSQL database infrastructure
Timeline:
- 2025-10-28: Project initiated, repository created
- 2025-10-29: MVP complete (Dashboard + whoami + TLS)
- 2025-10-30: AdGuard Home deployed (port configuration fix)
- 2025-11-01: Complete observability stack deployed
  - Portainer hybrid architecture (Docker + K8s management)
  - AdGuard CoreDNS integration (cluster-wide DNS)
  - Monitoring stack (metrics-server, Prometheus, Grafana, kube-state-metrics, node-exporter)
  - Observability stack (Loki, Tempo, Promtail, OpenTelemetry Collector)
  - ArgoCD GitOps (automated continuous delivery)
- 2025-11-02: PostgreSQL database infrastructure
  - Shared PostgreSQL 16 instance (20Gi storage)
  - Application databases created (litellm, n8n)
  - First ArgoCD-managed StatefulSet deployment
  - GitOps workflow established for database layer
  - ArgoCD and PostgreSQL monitoring integration
This is a personal homelab project. See project documentation in basic-memory for architecture decisions and rationale.
MIT