
Operations Guide

This guide covers production deployment, monitoring, and maintenance of the Distributed Message Broker.

Table of Contents

  1. Deployment Options
  2. Docker Deployment
  3. Kubernetes Deployment
  4. Systemd Deployment
  5. Configuration
  6. Monitoring
  7. Backup & Recovery
  8. Troubleshooting

Deployment Options

| Method | Use Case | Complexity |
|---|---|---|
| Docker Compose | Development, testing, small deployments | Low |
| Kubernetes | Production, cloud-native environments | Medium |
| Systemd | Bare metal, VMs | Medium |

Docker Deployment

Quick Start (3-Node Cluster)

# Start the cluster
docker-compose up -d

# Check status
docker-compose ps

# View logs
docker-compose logs -f broker-1

# Stop the cluster
docker-compose down

With Monitoring

# Start with Prometheus and Grafana
docker-compose --profile monitoring up -d

# Access Grafana at http://localhost:3000 (admin/admin)

Scaling

# Note: the provided docker-compose.yml defines a fixed 3-node cluster.
# To scale beyond 3 nodes, add a new service with a unique node ID and a
# join address pointing at an existing cluster member.
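As a sketch, a fourth node could be added to docker-compose.yml along these lines. The service name, image name, and volume are assumptions; the environment variable names follow the systemd examples later in this guide:

```yaml
  broker-4:
    image: distributed-broker:latest   # assumed image name
    environment:
      BROKER_NODE_ID: broker-4
      BROKER_BOOTSTRAP: "false"
      BROKER_JOIN_ADDR: broker-1:8080
      BROKER_ADVERTISE_ADDR: broker-4
    volumes:
      - broker-4-data:/var/lib/broker/data
```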

Kubernetes Deployment

Prerequisites

  • Kubernetes 1.21+
  • kubectl configured
  • StorageClass with ReadWriteOnce support

Deploy

# Apply all manifests
kubectl apply -f deploy/kubernetes/broker.yaml

# Check pod status
kubectl get pods -n broker

# Wait for all pods to be ready
kubectl wait --for=condition=ready pod -l app.kubernetes.io/name=distributed-broker -n broker --timeout=300s

Verify Cluster Formation

# Check logs of first pod
kubectl logs broker-0 -n broker

# Port forward to access the API
kubectl port-forward svc/broker 8080:8080 -n broker

Scaling

# Scale to 5 replicas (prefer odd replica counts so Raft keeps a clear majority)
kubectl scale statefulset broker --replicas=5 -n broker

Systemd Deployment

Installation

# 1. Build the binary and install it
go build -o broker .
sudo install -m 0755 broker /usr/local/bin/broker

# 2. Create user and directories
sudo useradd -r -s /bin/false broker
sudo mkdir -p /var/lib/broker/data /etc/broker
sudo chown -R broker:broker /var/lib/broker

# 3. Copy configuration
sudo cp deploy/systemd/broker.env.example /etc/broker/broker.env
sudo vim /etc/broker/broker.env  # Edit as needed

# 4. Install service
sudo cp deploy/systemd/broker.service /etc/systemd/system/
sudo systemctl daemon-reload

# 5. Start and enable
sudo systemctl enable broker
sudo systemctl start broker

Multi-Node Setup

On each node, edit /etc/broker/broker.env:

Node 1 (Bootstrap):

BROKER_NODE_ID=broker-1
BROKER_BOOTSTRAP=true
BROKER_ADVERTISE_ADDR=192.168.1.10

Node 2:

BROKER_NODE_ID=broker-2
BROKER_BOOTSTRAP=false
BROKER_JOIN_ADDR=192.168.1.10:8080
BROKER_ADVERTISE_ADDR=192.168.1.11

Node 3:

BROKER_NODE_ID=broker-3
BROKER_BOOTSTRAP=false
BROKER_JOIN_ADDR=192.168.1.10:8080
BROKER_ADVERTISE_ADDR=192.168.1.12
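Before starting the service on each node, it can help to sanity-check the env file. The following is a hypothetical pre-flight helper (`validate_env` is not part of the broker; the key names match the examples above):

```shell
# Hypothetical pre-flight check: verify a broker.env has the keys a node
# needs before `systemctl start broker`.
validate_env() {
  f="$1"
  # Every node needs an identity and an advertised address
  for key in BROKER_NODE_ID BROKER_ADVERTISE_ADDR; do
    grep -q "^${key}=" "$f" || { echo "missing ${key}"; return 1; }
  done
  # A non-bootstrap node must know which member to join
  if grep -q '^BROKER_BOOTSTRAP=false' "$f"; then
    grep -q '^BROKER_JOIN_ADDR=' "$f" || { echo "missing BROKER_JOIN_ADDR"; return 1; }
  fi
  echo "ok"
}
```

Run it against each node's /etc/broker/broker.env before enabling the service.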

Configuration

Configuration File

Copy and modify config.yaml.example:

node:
  id: "broker-1"
  data_dir: "/var/lib/broker/data"

network:
  grpc_port: 8080
  raft_port: 9080
  data_port: 9180
  metrics_port: 9090

cluster:
  bootstrap: true
  controller: true

Environment Variables

All settings can be overridden via environment variables:

| Variable | Description | Default |
|---|---|---|
| BROKER_NODE_ID | Unique node identifier | Required |
| BROKER_GRPC_PORT | Client API port | 8080 |
| BROKER_RAFT_PORT | Controller communication | 9080 |
| BROKER_DATA_PORT | Inter-broker replication | 9180 |
| BROKER_METRICS_PORT | Prometheus metrics | 9090 |
| BROKER_DATA_DIR | Data directory | ./data |
| BROKER_BOOTSTRAP | Bootstrap new cluster | false |
| BROKER_JOIN_ADDR | Node to join | "" |
| BROKER_ADVERTISE_ADDR | External address | localhost |
| BROKER_LOG_LEVEL | Logging level | info |
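For example, a broker.env that overrides a few of these defaults might look like this (values are illustrative):

```shell
# Example /etc/broker/broker.env overriding selected defaults
BROKER_NODE_ID=broker-1
BROKER_DATA_DIR=/var/lib/broker/data
BROKER_ADVERTISE_ADDR=192.168.1.10
BROKER_LOG_LEVEL=debug
```

Unset variables fall back to the defaults in the table above.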

Monitoring

Prometheus Metrics

The broker exposes metrics at http://<host>:9090/metrics:

| Metric | Type | Description |
|---|---|---|
| broker_produce_total | Counter | Total messages produced |
| broker_consume_total | Counter | Total messages consumed |
| broker_produce_latency_seconds | Histogram | Produce latency |
| broker_consume_latency_seconds | Histogram | Consume latency |
| broker_topic_count | Gauge | Number of topics |
| broker_partition_count | Gauge | Number of partitions |
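For quick spot checks without Prometheus, a metric can be pulled straight out of the text-format dump. `metric_value` below is a hypothetical helper, not part of the broker:

```shell
# Hypothetical helper: extract a single value from a Prometheus
# text-format dump, e.g. from `curl -s http://localhost:9090/metrics`.
metric_value() {
  # $1: metric name; metrics text on stdin
  grep "^$1" | awk '{print $2}' | head -n 1
}

# Usage against a captured dump rather than a live broker:
echo 'broker_produce_total 1024' | metric_value broker_produce_total  # → 1024
```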

Grafana Dashboard

Import deploy/grafana/dashboards/broker-dashboard.json or use the provided docker-compose setup.

Health Check

# HTTP health endpoint
curl http://localhost:9090/health

# Expected response: {"status": "ok"}
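In scripts, the response body can be checked rather than eyeballed. `check_health` is a hypothetical wrapper around the expected `{"status": "ok"}` body shown above:

```shell
# Hypothetical helper: interpret the /health response body.
check_health() {
  # $1: response body, e.g. "$(curl -s http://localhost:9090/health)"
  case "$1" in
    *'"status": "ok"'*) echo "healthy" ;;
    *) echo "unhealthy"; return 1 ;;
  esac
}

check_health '{"status": "ok"}'   # → healthy
```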

Backup & Recovery

Data Directory Structure

/var/lib/broker/data/
├── raft/           # Raft consensus logs and snapshots
├── topics/         # Topic data and segments
└── partitions/     # Partition data

Backup Procedure

# 1. Stop the broker (optional, but recommended for a consistent snapshot)
sudo systemctl stop broker

# 2. Create backup
sudo tar -czvf broker-backup-$(date +%Y%m%d).tar.gz /var/lib/broker/data

# 3. Restart broker
sudo systemctl start broker
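Before trusting an archive for recovery, confirm it is readable. A small sketch (`verify_backup` is a hypothetical helper):

```shell
# Hypothetical sanity check: list the archive without extracting it.
verify_backup() {
  if tar -tzf "$1" > /dev/null 2>&1; then
    echo "backup ok"
  else
    echo "backup corrupt"
    return 1
  fi
}
```

Usage: `verify_backup broker-backup-$(date +%Y%m%d).tar.gz`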

Restore Procedure

# 1. Stop the broker
sudo systemctl stop broker

# 2. Clear existing data
sudo rm -rf /var/lib/broker/data/*

# 3. Restore from backup
sudo tar -xzvf broker-backup-YYYYMMDD.tar.gz -C /

# 4. Start broker
sudo systemctl start broker

Troubleshooting

Common Issues

Cluster won't form:

# Check connectivity between nodes
nc -zv <peer-ip> 9080

# Check Raft logs
journalctl -u broker | grep -i raft

Leader election stuck:

# Ensure quorum (majority of nodes running)
# For 3-node cluster, need at least 2 nodes

# Check leader status
curl http://localhost:9090/metrics | grep leader

High memory usage:

# Check segment count
ls -la /var/lib/broker/data/topics/*/

# Consider adjusting retention policy

Slow produce/consume:

# Check disk I/O
iostat -x 1

# Check network latency between nodes
ping <peer-ip>

Log Levels

Set BROKER_LOG_LEVEL=debug for detailed logging:

# Systemd
sudo sed -i 's/BROKER_LOG_LEVEL=info/BROKER_LOG_LEVEL=debug/' /etc/broker/broker.env
sudo systemctl restart broker

Getting Support

  1. Check logs: journalctl -u broker -f
  2. Review metrics at /metrics
  3. Open an issue with logs and metrics attached