|
1 | 1 | # Temporal Worker Controller
|
2 | 2 |
|
3 |
| -> ⚠️ This project is 100% experimental. Please do not attempt to install the controller in any production and/or shared environment. |
| 3 | +[License](LICENSE) |
| 4 | +[Go Report Card](https://goreportcard.com/report/github.com/temporalio/temporal-worker-controller) |
4 | 5 |
|
5 |
| -The goal of the Temporal Worker Controller is to make it easy to run workers on Kubernetes while leveraging |
6 |
| -[Worker Deployments](https://docs.temporal.io/production-deployment/worker-deployments). |
| 6 | +> 🚀 **Public Preview**: This project is in [Public Preview](https://docs.temporal.io/evaluate/development-production-features/release-stages) and ready for production use cases*. Core functionality is complete with stable APIs. |
| 7 | +> |
| 8 | +> *Dynamic auto-scaling based on workflow load is not yet implemented. Use cases must work with fixed worker replica counts. |
7 | 9 |
|
8 |
| -## Why |
| 10 | +**The Temporal Worker Controller makes it simple and safe to deploy Temporal workers on Kubernetes.** |
9 | 11 |
|
10 |
| -Temporal's [deterministic constraints](https://docs.temporal.io/workflows#deterministic-constraints) can cause headaches |
11 |
| -when rolling out or rolling back workflow code changes. |
| 12 | +Temporal workflows require deterministic execution, which means updating worker code can break running workflows if the changes aren't backward compatible. Traditional deployment strategies force you to either risk breaking existing workflows or use Temporal's [Patching API](https://docs.temporal.io/patching) to maintain compatibility across versions. |
12 | 13 |
|
13 |
| -The traditional approach to workflow determinism is to gate new behavior behind |
14 |
| -[versioning checks](https://docs.temporal.io/workflows#workflow-versioning), otherwise known as the Patching API. Over time these checks can become a |
15 |
| -source of technical debt, as safely removing them from a codebase is a careful process that often involves querying all |
16 |
| -running workflows. |
| 14 | +Temporal's [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning) feature solves this dilemma by providing programmatic control over worker versions and traffic routing. The Temporal Worker Controller automates a deployment system that uses Worker Versioning on Kubernetes. When you deploy new code, the controller automatically creates a new worker version while keeping the old version running. Existing workflows continue on the old version while new workflows use the new version. This approach eliminates the need for patches in many cases and ensures running workflows are never disrupted. |
17 | 15 |
|
18 |
| -**Worker Versioning** is a Temporal feature that allows you to pin Workflows to individual versions of your workers, which |
19 |
| -are called **Worker Deployment Versions**. Using pinning, you’ll no longer need to patch most Workflows as part of routine |
20 |
| -deploys! With this guarantee, you can freely make changes that would have previously caused non-determinism errors had |
21 |
| -you done them without patching. And provided your Activities and Workflows are running in the same worker deployment version, |
22 |
| -you also do not need to ensure interface compatibility across versions. |
| 16 | +## What does it do? |
23 | 17 |
|
24 |
| -This greatly simplifies Workflow upgrades, but the cost is that your deployment system must support multiple versions |
25 |
| -running simultaneously and allow you to control when they are sunsetted. This is typically known as a [rainbow deploy](https://release.com/blog/rainbow-deployment-why-and-how-to-do-it) |
26 |
| -(of which a **blue-green deploy** is a special case) and contrasts to a **rolling deploy** in which your Workers are upgraded in |
27 |
| -place without the ability to keep old versions around. |
| 18 | +🔒 **Protected [Pinned](https://docs.temporal.io/worker-versioning#pinned) workflows** - Workflows pinned to a version stay on that version and won't break |
| 19 | +🎚️ **Controlled rollout for [AutoUpgrade](https://docs.temporal.io/worker-versioning#auto-upgrade) workflows** - AutoUpgrade workflows are shifted to new versions with configurable safety controls |
| 20 | +📦 **Automatic version management** - Registers versions with Temporal, manages routing rules, and tracks version lifecycle |
| 21 | +🎯 **Smart traffic routing** - New workflows automatically get routed to your target worker version |
| 22 | +🛡️ **Progressive rollouts** - Catch incompatible changes early with small traffic percentages before they spread |
| 23 | +⚡ **Easy rollbacks** - Instantly route traffic back to a previous version if issues are detected |
28 | 24 |
|
29 |
| -This project aims to provide automation to enable rainbow deployments of your workers by simplifying the bookkeeping around |
30 |
| -tracking which versions still have active workflows, managing the lifecycle of versioned worker deployments, and calling |
31 |
| -Temporal APIs to update the routing config of Temporal Worker Deployments to route workflow traffic to new versions. |
| 25 | +## Quick Example |
32 | 26 |
|
33 |
| -## Terminology |
34 |
| -Note that in Temporal, **Worker Deployment** is sometimes referred to as **Deployment**, but since the controller makes |
35 |
| -significant references to Kubernetes Deployment resource, within this repository we will stick to these terms: |
36 |
| -- **Worker Deployment Version**: A version of a deployment or service that runs [Temporal Workers](https://docs.temporal.io/workers). It can have multiple Workers, but they all run the same build. Sometimes shortened to "version" or "deployment version." |
37 |
| -- **Worker Deployment**: A deployment or service across multiple deployment versions. In a rainbow deploy, a Worker Deployment can have multiple active Deployment Versions running at once. |
38 |
| -- **Deployment**: A [Kubernetes Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) resource. A Deployment is "versioned" if it is running versioned Temporal workers/pollers. |
| 27 | +Instead of this traditional approach where deployments can break running workflows: |
39 | 28 |
|
40 |
| -## Features |
41 |
| - |
42 |
| -- [x] Registration of new Temporal Worker Deployment Versions |
43 |
| -- [x] Creation of versioned Deployment resources (that manage the Pods that run your Temporal pollers) |
44 |
| -- [x] Deletion of resources associated with drained Worker Deployment Versions |
45 |
| -- [x] `Manual`, `AllAtOnce`, and `Progressive` rollouts of new versions |
46 |
| -- [x] Ability to specify a "gate" workflow that must succeed on the new version before routing real traffic to that version |
47 |
| -- [ ] Autoscaling of versioned Deployments |
48 |
| - |
49 |
| -## Usage |
50 |
| - |
51 |
| -In order to be compatible with this controller, workers need to be configured using these standard environment |
52 |
| -variables: |
53 |
| - |
54 |
| -- `TEMPORAL_ADDRESS`: The host and port of the Temporal server, e.g. `default.foo.tmprl.cloud:7233` |
55 |
| -- `TEMPORAL_NAMESPACE`: The Temporal namespace to connect to, e.g. `default` |
56 |
| -- `TEMPORAL_DEPLOYMENT_NAME`: The name of the worker deployment. This must be unique to the worker deployment and should not |
57 |
| - change between versions. |
58 |
| -- `TEMPORAL_WORKER_BUILD_ID`: The build ID of the worker. This should change with each new worker rollout. |
59 |
| - |
60 |
| -Each of these will be automatically set by the controller, and must not be manually specified in the worker's pod template. |
61 |
| - |
62 |
| -## How It Works |
63 |
| - |
64 |
| -Every `TemporalWorkerDeployment` resource manages one or more standard `Deployment` resources. Each Deployment manages pods |
65 |
| -which in turn poll Temporal for tasks routed to their respective worker versions. |
66 |
| - |
67 |
| -```mermaid |
68 |
| -flowchart TD |
69 |
| - subgraph "K8s Namespace 'ns'" |
70 |
| - twd[TemporalWorkerDeployment 'foo'] |
71 |
| - |
72 |
| - subgraph "Current/default version" |
73 |
| - d5["Deployment foo-v5, Version{DeploymentName: foo/ns, BuildId: v5}"] |
74 |
| - rs5["ReplicaSet foo-v5"] |
75 |
| - p5a["Pod foo-v5-a"] |
76 |
| - p5b["Pod foo-v5-b"] |
77 |
| - p5c["Pod foo-v5-c"] |
78 |
| - d5 --> rs5 |
79 |
| - rs5 --> p5a |
80 |
| - rs5 --> p5b |
81 |
| - rs5 --> p5c |
82 |
| - end |
83 |
| -
|
84 |
| - subgraph "Deprecated versions" |
85 |
| - d1["Deployment foo-v1 Version{DeploymentName: foo/ns, BuildId: v1}"] |
86 |
| - rs1["ReplicaSet foo-v1"] |
87 |
| - p1a["Pod foo-v1-a"] |
88 |
| - p1b["Pod foo-v1-b"] |
89 |
| - d1 --> rs1 |
90 |
| - rs1 --> p1a |
91 |
| - rs1 --> p1b |
92 |
| -
|
93 |
| - dN["Deployment ..."] |
94 |
| - end |
95 |
| - end |
96 |
| -
|
97 |
| - twd --> d1 |
98 |
| - twd --> dN |
99 |
| - twd --> d5 |
100 |
| -
|
101 |
| - p1a -. "poll version {foo/ns, v1}" .-> server |
102 |
| - p1b -. "poll version {foo/ns, v1}" .-> server |
103 |
| -
|
104 |
| - p5a -. "poll version {foo/ns, v5}" .-> server |
105 |
| - p5b -. "poll version {foo/ns, v5}" .-> server |
106 |
| - p5c -. "poll version {foo/ns, v5}" .-> server |
107 |
| -
|
108 |
| - server["Temporal Server"] |
| 29 | +```yaml |
| 30 | +# ❌ Traditional deployment - risky for running workflows |
| 31 | +apiVersion: apps/v1 |
| 32 | +kind: Deployment |
| 33 | +metadata: |
| 34 | + name: my-worker |
| 35 | +spec: |
| 36 | + template: |
| 37 | + spec: |
| 38 | + containers: |
| 39 | + - name: worker |
| 40 | + image: my-worker:v2.0.0 # This change might break existing workflows! |
109 | 41 | ```
|
110 | 42 |
|
111 |
| -### Worker Lifecycle |
112 |
| - |
113 |
| -When a new worker deployment version is deployed, the worker controller detects it and automatically begins the process |
114 |
| -of making that version the new **Current Version** of the worker deployment it is a part of. This could happen |
115 |
| -immediately if `rollout.strategy = AllAtOnce`, or gradually if `rollout.strategy = Progressive`. |
116 |
| - |
117 |
| -As older pinned workflows finish executing and deprecated deployment versions become **Drained**, the worker controller |
118 |
| -frees up resources by sunsetting the `Deployment` resources running workers that poll those versions. |
| 43 | +You define your worker like this: |
119 | 44 |
|
120 |
| -Here is an example of a progressive cut-over strategy gated on the success of the `HelloWorld` workflow: |
121 | 45 | ```yaml
|
| 46 | +# ✅ Temporal Worker Controller - safe deployments |
| 47 | +apiVersion: temporal.io/v1alpha1 |
| 48 | +kind: TemporalWorkerDeployment |
| 49 | +metadata: |
| 50 | + name: my-worker |
| 51 | +spec: |
| 52 | + replicas: 3 |
122 | 53 | rollout:
|
123 |
| - strategy: Progressive |
| 54 | + strategy: Progressive # Gradual, safe rollout |
124 | 55 | steps:
|
125 |
| - - rampPercentage: 1 |
126 |
| - pauseDuration: 30s |
127 | 56 | - rampPercentage: 10
|
128 |
| - pauseDuration: 1m |
129 |
| - gate: |
130 |
| - workflowType: "HelloWorld" |
| 57 | + pauseDuration: 5m |
| 58 | + - rampPercentage: 50 |
| 59 | + pauseDuration: 10m |
| 60 | + template: |
| 61 | + spec: |
| 62 | + containers: |
| 63 | + - name: worker |
| 64 | + image: my-worker:v2.0.0 # Safe to deploy! |
131 | 65 | ```
|
132 | 66 |
|
133 |
| -```mermaid |
134 |
| -sequenceDiagram |
135 |
| - autonumber |
136 |
| - participant Dev as Developer |
137 |
| - participant K8s as Kubernetes |
138 |
| - participant Ctl as WorkerController |
139 |
| - participant T as Temporal |
140 |
| - |
141 |
| - Dev->>K8s: Create TemporalWorkerDeployment "foo" (v1) |
142 |
| - K8s-->>Ctl: Notify TemporalWorkerDeployment "foo" created |
143 |
| - Ctl->>K8s: Create Deployment "foo-v1" |
144 |
| - Ctl->>T: Register build "v1" as new current version of "foo/ns" |
145 |
| - Dev->>K8s: Update TemporalWorker "foo" (v2) |
146 |
| - K8s-->>Ctl: Notify TemporalWorker "foo" updated |
147 |
| - Ctl->>K8s: Create Deployment "foo-v2" |
148 |
| - Ctl->>T: Register build "v2" as new current version of "foo/ns" |
149 |
| - |
150 |
| - loop Poll Temporal API |
151 |
| - Ctl-->>T: Wait for version {foo/ns, v1} to be drained (no open pinned wfs) |
152 |
| - end |
153 |
| - |
154 |
| - Ctl->>K8s: Delete Deployment "foo-v1" |
| 67 | +When you update the image, the controller automatically: |
| 68 | +1. 🆕 Creates a new deployment with your updated worker |
| 69 | +2. 📊 Gradually routes new workflows and AutoUpgrade workflows to the new version |
| 70 | +3. 🔒 Keeps Pinned workflows running on their original version (guaranteed safety) |
| 71 | +4. 🧹 Automatically scales down and cleans up old versions once they are [drained](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#sunsetting-an-old-deployment-version) |
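
To make the `Progressive` schedule in the example manifest concrete, here is a small sketch of how the two `steps` could translate into a traffic ramp over time. It is an illustration only, not the controller's implementation: `RolloutStep` and `ramp_at` are hypothetical names, and it assumes that each step holds its `rampPercentage` for its `pauseDuration` and that the new version receives 100% of new traffic once the last pause has elapsed.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RolloutStep:
    """Hypothetical mirror of one entry under `rollout.steps`."""
    ramp_percentage: int       # corresponds to `rampPercentage`
    pause_duration: timedelta  # corresponds to `pauseDuration`


def ramp_at(steps: list[RolloutStep], elapsed: timedelta) -> int:
    """Return the share (in percent) of new traffic sent to the new version.

    Assumption: a step's ramp applies until its pause has elapsed; after the
    final pause the new version takes all new traffic.
    """
    boundary = timedelta(0)
    for step in steps:
        boundary += step.pause_duration
        if elapsed < boundary:
            return step.ramp_percentage
    return 100


# The two steps from the example manifest: 10% for 5m, then 50% for 10m.
STEPS = [
    RolloutStep(ramp_percentage=10, pause_duration=timedelta(minutes=5)),
    RolloutStep(ramp_percentage=50, pause_duration=timedelta(minutes=10)),
]

if __name__ == "__main__":
    for minutes in (0, 4, 5, 14, 15):
        share = ramp_at(STEPS, timedelta(minutes=minutes))
        print(f"t+{minutes}m: {share}% of new traffic on the new version")
```

Under these assumptions the rollout in the example would finish 15 minutes after it starts. Pinned workflows are unaffected by the ramp and stay on their original version throughout.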
| 72 | +
|
| 73 | +## 🏃‍♂️ Getting Started |
| 74 | +
|
| 75 | +### Prerequisites |
| 76 | +
|
| 77 | +- Kubernetes cluster (1.19+) |
| 78 | +- [Temporal Server](https://docs.temporal.io/) (Cloud or self-hosted [v1.28.1](https://github.com/temporalio/temporal/releases/tag/v1.28.1)) |
| 79 | +- Basic familiarity with Temporal [Workers](https://docs.temporal.io/workers), [Workflows](https://docs.temporal.io/workflows), and [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning) |
| 80 | +
|
| 81 | +### 🔧 Installation |
| 82 | +
|
| 83 | +```bash |
| 84 | +# Install using Helm in your preferred namespace |
| 85 | +helm install temporal-worker-controller \ |
| 86 | + oci://docker.io/temporalio/temporal-worker-controller \ |
| 87 | + --namespace <your-namespace> |
155 | 88 | ```
|
156 | 89 |
|
157 |
| -## Contributing |
| 90 | +### Next Steps |
| 91 | + |
| 92 | +**New to deploying workers with this controller?** → Start with our [Migration Guide](docs/migration-guide.md) to learn how to safely transition from traditional deployments. |
| 93 | + |
| 94 | +**Ready to dive deeper?** → Check out the [Architecture Guide](docs/architecture.md) to understand how the controller works, or the [Temporal Worker Versioning docs](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning) to learn about the underlying Temporal feature. |
| 95 | + |
| 96 | +**Need configuration help?** → See the [Configuration Reference](docs/configuration.md) for all available options. |
| 97 | + |
| 98 | +## Features |
| 99 | + |
| 100 | +- ✅ **Registration of new Temporal Worker Deployment Versions** |
| 101 | +- ✅ **Creation of versioned Deployment resources** (managing Pods that run your Temporal workers) |
| 102 | +- ✅ **Automatic lifecycle scaling** - Scales down worker versions when no longer needed |
| 103 | +- ✅ **Deletion of resources** associated with drained Worker Deployment Versions |
| 104 | +- ✅ **Multiple rollout strategies**: `Manual`, `AllAtOnce`, and `Progressive` rollouts |
| 105 | +- ✅ **Gate workflows** - Test new versions with a [pre-deployment test](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#adding-a-pre-deployment-test) before routing real traffic to them |
| 106 | +- ⏳ **Load-based auto-scaling** - Not yet implemented (use fixed replica counts) |
| 107 | + |
| 108 | + |
| 109 | +## 💡 Why Use This? |
| 110 | + |
| 111 | +### Manual Worker Versioning is Complex |
| 112 | + |
| 113 | +While Temporal's [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning) feature solves deployment safety problems, using it manually requires: |
| 114 | + |
| 115 | +- **Manual API calls** - Register versions, manage routing rules, track version states |
| 116 | +- **Infrastructure coordination** - Deploy multiple Kubernetes resources for each version |
| 117 | +- **Lifecycle monitoring** - Watch for drained versions and clean up resources |
| 118 | +- **Rollout orchestration** - Manually control progressive traffic shifting |
| 119 | + |
| 120 | +### The Controller Automates Everything |
| 121 | + |
| 122 | +The Temporal Worker Controller eliminates this operational overhead by automating the entire Worker Versioning lifecycle on Kubernetes: |
| 123 | + |
| 124 | +- **Automatic Temporal integration** - Registers versions and manages routing without manual API calls |
| 125 | +- **Kubernetes-native workflow** - Update a single custom resource, get full [rainbow deployments](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#deployment-systems) |
| 126 | +- **Intelligent cleanup** - Monitors version [drainage](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#sunsetting-an-old-deployment-version) and automatically removes unused resources |
| 127 | +- **Built-in rollout strategies** - Progressive, AllAtOnce, and Manual with configurable safety controls |
| 128 | + |
| 129 | +## 📖 Documentation |
| 130 | + |
| 131 | +| Document | Description | |
| 132 | +|----------|-------------| |
| 133 | +| [Migration Guide](docs/migration-guide.md) | Step-by-step guide for migrating from traditional deployments | |
| 134 | +| [Architecture](docs/architecture.md) | Technical deep-dive into how the controller works | |
| 135 | +| [Configuration](docs/configuration.md) | Complete configuration reference | |
| 136 | +| [Concepts](docs/concepts.md) | Key concepts and terminology | |
| 137 | +| [Limits](docs/limits.md) | Technical constraints and limitations | |
| 138 | + |
| 139 | +## 🔧 Worker Configuration |
| 140 | + |
| 141 | +Your workers need these environment variables (automatically set by the controller): |
| 142 | + |
| 143 | +```bash |
| 144 | +TEMPORAL_ADDRESS=your-temporal-server:7233 |
| 145 | +TEMPORAL_NAMESPACE=your-namespace |
| 146 | +TEMPORAL_DEPLOYMENT_NAME=my-worker # Unique worker deployment name |
| 147 | +TEMPORAL_WORKER_BUILD_ID=v1.2.3 # Version identifier |
| 148 | +``` |
| 149 | + |
| 150 | +**Important**: Don't set the above environment variables manually - the controller manages these automatically. |
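
On the worker side, the process only needs to read these four variables at startup. The following is a minimal sketch of that step using the Python standard library; `WorkerVersionConfig` and `load_worker_config` are hypothetical helpers, not part of any Temporal SDK, and how the values are passed on to your SDK's worker options depends on the SDK and version you use.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkerVersionConfig:
    """Hypothetical container for the values injected by the controller."""
    address: str
    namespace: str
    deployment_name: str
    build_id: str


# Maps each config field to the environment variable the controller sets.
_ENV_VARS = {
    "address": "TEMPORAL_ADDRESS",
    "namespace": "TEMPORAL_NAMESPACE",
    "deployment_name": "TEMPORAL_DEPLOYMENT_NAME",
    "build_id": "TEMPORAL_WORKER_BUILD_ID",
}


def load_worker_config(env: Mapping[str, str] = os.environ) -> WorkerVersionConfig:
    """Read the controller-provided variables, failing fast if any is missing."""
    missing = [var for var in _ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return WorkerVersionConfig(**{field: env[var] for field, var in _ENV_VARS.items()})
```

Failing fast on a missing variable makes a pod that was started outside the controller (and therefore without these values) crash visibly instead of polling with an unintended version.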
| 151 | + |
| 152 | +## 🤝 Contributing |
| 153 | + |
| 154 | +We welcome all contributions! This includes: |
| 155 | + |
| 156 | +- 🔧 **Code contributions** - Please start by [opening an issue](https://github.com/temporalio/temporal-worker-controller/issues/new) to discuss your idea |
| 157 | +- 🐛 **Bug reports** - [File an issue](https://github.com/temporalio/temporal-worker-controller/issues/new) |
| 158 | +- 💡 **Feature requests** - Tell us what you'd like to see |
| 159 | +- 💬 **Feedback** - Join [#safe-deploys](https://temporalio.slack.com/archives/C07MDJ6S3HP) on [Temporal Slack](https://t.mp/slack) |
| 160 | + |
| 161 | +## 🛠️ Development |
158 | 162 |
|
159 |
| -This project is in very early stages; as such external code contributions are not yet being solicited. |
| 163 | +Want to try the controller locally? Check out the [local demo guide](internal/demo/README.md) for development setup. |
160 | 164 |
|
161 |
| -Bug reports and feature requests are welcome! Please [file an issue](https://github.com/jlegrone/worker-controller/issues/new). |
| 165 | +## 📄 License |
162 | 166 |
|
163 |
| -You may also reach out to [#safe-deploys](https://temporalio.slack.com/archives/C07MDJ6S3HP) or @jlegrone on the |
164 |
| -[Temporal Slack](https://t.mp/slack) if you have questions, suggestions, or are interested in making other contributions. |
| 167 | +This project is licensed under the [MIT License](LICENSE). |
165 | 168 |
|
166 |
| -## Development |
| 169 | +--- |
167 | 170 |
|
168 |
| -For local development setup and running the controller locally, see the [local demo guide](internal/demo/README.md). |
| 171 | +**Questions?** Reach out to [@jlegrone](https://github.com/jlegrone) or the [#safe-deploys](https://temporalio.slack.com/archives/C07MDJ6S3HP) channel on Temporal Slack! |