Commit 66ca45b

Refactor README for Public Preview (#137)
## What was changed
Refactor README

## Why?
Update information and make it more user friendly

## Checklist
1. Closes #87
2. How was this tested:
3. Any docs updates needed?
1 parent b4068c1 commit 66ca45b

2 files changed, +265 -141 lines changed

README.md

Lines changed: 144 additions & 141 deletions
@@ -1,168 +1,171 @@
 # Temporal Worker Controller

-> ⚠️ This project is 100% experimental. Please do not attempt to install the controller in any production and/or shared environment.
+[![License](https://img.shields.io/github/license/temporalio/temporal-worker-controller)](LICENSE)
+[![Go Report Card](https://goreportcard.com/badge/github.com/temporalio/temporal-worker-controller)](https://goreportcard.com/report/github.com/temporalio/temporal-worker-controller)

-The goal of the Temporal Worker Controller is to make it easy to run workers on Kubernetes while leveraging
-[Worker Deployments](https://docs.temporal.io/production-deployment/worker-deployments).
+> 🚀 **Public Preview**: This project is in [Public Preview](https://docs.temporal.io/evaluate/development-production-features/release-stages) and ready for production use cases*. Core functionality is complete with stable APIs.
+>
+> *Dynamic auto-scaling based on workflow load is not yet implemented. Use cases must work with fixed worker replica counts.

-## Why
+**The Temporal Worker Controller makes it simple and safe to deploy Temporal workers on Kubernetes.**

-Temporal's [deterministic constraints](https://docs.temporal.io/workflows#deterministic-constraints) can cause headaches
-when rolling out or rolling back workflow code changes.
+Temporal workflows require deterministic execution, which means updating worker code can break running workflows if the changes aren't backward compatible. Traditional deployment strategies force you to either risk breaking existing workflows or use Temporal's [Patching API](https://docs.temporal.io/patching) to maintain compatibility across versions.

-The traditional approach to workflow determinism is to gate new behavior behind
-[versioning checks](https://docs.temporal.io/workflows#workflow-versioning), otherwise known as the Patching API. Over time these checks can become a
-source of technical debt, as safely removing them from a codebase is a careful process that often involves querying all
-running workflows.
+Temporal's [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning) feature solves this dilemma by providing programmatic control over worker versions and traffic routing. The Temporal Worker Controller automates a deployment system that uses Worker Versioning on Kubernetes. When you deploy new code, the controller automatically creates a new worker version while keeping the old version running. Existing workflows continue on the old version while new workflows use the new version. This approach eliminates the need for patches in many cases and ensures running workflows are never disrupted.

-**Worker Versioning** is a Temporal feature that allows you to pin Workflows to individual versions of your workers, which
-are called **Worker Deployment Versions**. Using pinning, you’ll no longer need to patch most Workflows as part of routine
-deploys! With this guarantee, you can freely make changes that would have previously caused non-determinism errors had
-you done them without patching. And provided your Activities and Workflows are running in the same worker deployment version,
-you also do not need to ensure interface compatibility across versions.
+## What does it do?

-This greatly simplifies Workflow upgrades, but the cost is that your deployment system must support multiple versions
-running simultaneously and allow you to control when they are sunsetted. This is typically known as a [rainbow deploy](https://release.com/blog/rainbow-deployment-why-and-how-to-do-it)
-(of which a **blue-green deploy** is a special case) and contrasts to a **rolling deploy** in which your Workers are upgraded in
-place without the ability to keep old versions around.
+🔒 **Protected [Pinned](https://docs.temporal.io/worker-versioning#pinned) workflows** - Workflows pinned to a version stay on that version and won't break
+🎚️ **Controlled rollout for [AutoUpgrade](https://docs.temporal.io/worker-versioning#auto-upgrade) workflows** - AutoUpgrade workflows shifted to new versions with configurable safety controls
+📦 **Automatic version management** - Registers versions with Temporal, manages routing rules, and tracks version lifecycle
+🎯 **Smart traffic routing** - New workflows automatically get routed to your target worker version
+🛡️ **Progressive rollouts** - Catch incompatible changes early with small traffic percentages before they spread
+**Easy rollbacks** - Instantly route traffic back to a previous version if issues are detected

-This project aims to provide automation to enable rainbow deployments of your workers by simplifying the bookkeeping around
-tracking which versions still have active workflows, managing the lifecycle of versioned worker deployments, and calling
-Temporal APIs to update the routing config of Temporal Worker Deployments to route workflow traffic to new versions.
+## Quick Example

-## Terminology
-Note that in Temporal, **Worker Deployment** is sometimes referred to as **Deployment**, but since the controller makes
-significant references to Kubernetes Deployment resource, within this repository we will stick to these terms:
-- **Worker Deployment Version**: A version of a deployment or service that runs [Temporal Workers](https://docs.temporal.io/workers). It can have multiple Workers, but they all run the same build. Sometimes shortened to "version" or "deployment version."
-- **Worker Deployment**: A deployment or service across multiple deployment versions. In a rainbow deploy, a Worker Deployment can have multiple active Deployment Versions running at once.
-- **Deployment**: A [Kubernetes Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/) resource. A Deployment is "versioned" if it is running versioned Temporal workers/pollers.
+Instead of this traditional approach where deployments can break running workflows:

-## Features
-
-- [x] Registration of new Temporal Worker Deployment Versions
-- [x] Creation of versioned Deployment resources (that manage the Pods that run your Temporal pollers)
-- [x] Deletion of resources associated with drained Worker Deployment Versions
-- [x] `Manual`, `AllAtOnce`, and `Progressive` rollouts of new versions
-- [x] Ability to specify a "gate" workflow that must succeed on the new version before routing real traffic to that version
-- [ ] Autoscaling of versioned Deployments
-
-## Usage
-
-In order to be compatible with this controller, workers need to be configured using these standard environment
-variables:
-
-- `TEMPORAL_ADDRESS`: The host and port of the Temporal server, e.g. `default.foo.tmprl.cloud:7233`
-- `TEMPORAL_NAMESPACE`: The Temporal namespace to connect to, e.g. `default`
-- `TEMPORAL_DEPLOYMENT_NAME`: The name of the worker deployment. This must be unique to the worker deployment and should not
-  change between versions.
-- `TEMPORAL_WORKER_BUILD_ID`: The build ID of the worker. This should change with each new worker rollout.
-
-Each of these will be automatically set by the controller, and must not be manually specified in the worker's pod template.
-
-## How It Works
-
-Every `TemporalWorkerDeployment` resource manages one or more standard `Deployment` resources. Each Deployment manages pods
-which in turn poll Temporal for tasks routed to their respective worker versions.
-
-```mermaid
-flowchart TD
-subgraph "K8s Namespace 'ns'"
-twd[TemporalWorkerDeployment 'foo']
-
-subgraph "Current/default version"
-d5["Deployment foo-v5, Version{DeploymentName: foo/ns, BuildId: v5}"]
-rs5["ReplicaSet foo-v5"]
-p5a["Pod foo-v5-a"]
-p5b["Pod foo-v5-b"]
-p5c["Pod foo-v5-c"]
-d5 --> rs5
-rs5 --> p5a
-rs5 --> p5b
-rs5 --> p5c
-end
-
-subgraph "Deprecated versions"
-d1["Deployment foo-v1 Version{DeploymentName: foo/ns, BuildId: v1}"]
-rs1["ReplicaSet foo-v1"]
-p1a["Pod foo-v1-a"]
-p1b["Pod foo-v1-b"]
-d1 --> rs1
-rs1 --> p1a
-rs1 --> p1b
-
-dN["Deployment ..."]
-end
-end
-
-twd --> d1
-twd --> dN
-twd --> d5
-
-p1a -. "poll version {foo/ns, v1}" .-> server
-p1b -. "poll version {foo/ns, v1}" .-> server
-
-p5a -. "poll version {foo/ns, v5}" .-> server
-p5b -. "poll version {foo/ns, v5}" .-> server
-p5c -. "poll version {foo/ns, v5}" .-> server
-
-server["Temporal Server"]
+```yaml
+# ❌ Traditional deployment - risky for running workflows
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: my-worker
+spec:
+  template:
+    spec:
+      containers:
+        - name: worker
+          image: my-worker:v2.0.0  # This change might break existing workflows!
 ```

-### Worker Lifecycle
-
-When a new worker deployment version is deployed, the worker controller detects it and automatically begins the process
-of making that version the new **Current Version** of the worker deployment it is a part of. This could happen
-immediately if `rollout.strategy = AllAtOnce`, or gradually if `rollout.strategy = Progressive`.
-
-As older pinned workflows finish executing and deprecated deployment versions become **Drained**, the worker controller
-frees up resources by sunsetting the `Deployment` resources running workers that poll those versions.
+You define your worker like this:

-Here is an example of a progressive cut-over strategy gated on the success of the `HelloWorld` workflow:
 ```yaml
+# ✅ Temporal Worker Controller - safe deployments
+apiVersion: temporal.io/v1alpha1
+kind: TemporalWorkerDeployment
+metadata:
+  name: my-worker
+spec:
+  replicas: 3
   rollout:
-    strategy: Progressive
+    strategy: Progressive  # Gradual, safe rollout
     steps:
-      - rampPercentage: 1
-        pauseDuration: 30s
       - rampPercentage: 10
-        pauseDuration: 1m
-    gate:
-      workflowType: "HelloWorld"
+        pauseDuration: 5m
+      - rampPercentage: 50
+        pauseDuration: 10m
+  template:
+    spec:
+      containers:
+        - name: worker
+          image: my-worker:v2.0.0  # Safe to deploy!
 ```

-```mermaid
-sequenceDiagram
-autonumber
-participant Dev as Developer
-participant K8s as Kubernetes
-participant Ctl as WorkerController
-participant T as Temporal
-
-Dev->>K8s: Create TemporalWorkerDeployment "foo" (v1)
-K8s-->>Ctl: Notify TemporalWorkerDeployment "foo" created
-Ctl->>K8s: Create Deployment "foo-v1"
-Ctl->>T: Register build "v1" as new current version of "foo/ns"
-Dev->>K8s: Update TemporalWorker "foo" (v2)
-K8s-->>Ctl: Notify TemporalWorker "foo" updated
-Ctl->>K8s: Create Deployment "foo-v2"
-Ctl->>T: Register build "v2" as new current version of "foo/ns"
-
-loop Poll Temporal API
-Ctl-->>T: Wait for version {foo/ns, v1} to be drained (no open pinned wfs)
-end
-
-Ctl->>K8s: Delete Deployment "foo-v1"
+When you update the image, the controller automatically:
+1. 🆕 Creates a new deployment with your updated worker
+2. 📊 Gradually routes new workflows and AutoUpgrade workflows to the new version
+3. 🔒 Keeps Pinned workflows running on their original version (guaranteed safety)
+4. 🧹 Automatically scales down and cleans up old versions once they are [drained](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#sunsetting-an-old-deployment-version)
+
+## 🏃‍♂️ Getting Started
+
+### Prerequisites
+
+- Kubernetes cluster (1.19+)
+- [Temporal Server](https://docs.temporal.io/) (Cloud or self-hosted [v1.28.1](https://github.com/temporalio/temporal/releases/tag/v1.28.1))
+- Basic familiarity with Temporal [Workers](https://docs.temporal.io/workers), [Workflows](https://docs.temporal.io/workflows), and [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning)
+
+### 🔧 Installation
+
+```bash
+# Install using Helm in your preferred namespace
+helm install temporal-worker-controller \
+  oci://docker.io/temporalio/temporal-worker-controller \
+  --namespace <your-namespace>
 ```

-## Contributing
+### Next Steps
+
+**New to deploying workers with this controller?** → Start with our [Migration Guide](docs/migration-guide.md) to learn how to safely transition from traditional deployments.
+
+**Ready to dive deeper?** → Check out the [Architecture Guide](docs/architecture.md) to understand how the controller works, or the [Temporal Worker Versioning docs](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning) to learn about the underlying Temporal feature.
+
+**Need configuration help?** → See the [Configuration Reference](docs/configuration.md) for all available options.
+
+## Features
+
+- **Registration of new Temporal Worker Deployment Versions**
+- **Creation of versioned Deployment resources** (managing Pods that run your Temporal workers)
+- **Automatic lifecycle scaling** - Scales down worker versions when no longer needed
+- **Deletion of resources** associated with drained Worker Deployment Versions
+- **Multiple rollout strategies**: `Manual`, `AllAtOnce`, and `Progressive` rollouts
+- **Gate workflows** - Test new versions with a [pre-deployment test](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#adding-a-pre-deployment-test) before routing real traffic to them
+- **Load-based auto-scaling** - Not yet implemented (use fixed replica counts)
+
+
+## 💡 Why Use This?
+
+### Manual Worker Versioning is Complex
+
+While Temporal's [Worker Versioning](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning) feature solves deployment safety problems, using it manually requires:
+
+- **Manual API calls** - Register versions, manage routing rules, track version states
+- **Infrastructure coordination** - Deploy multiple Kubernetes resources for each version
+- **Lifecycle monitoring** - Watch for drained versions and clean up resources
+- **Rollout orchestration** - Manually control progressive traffic shifting
+
+### The Controller Automates Everything
+
+The Temporal Worker Controller eliminates this operational overhead by automating the entire Worker Versioning lifecycle on Kubernetes:
+
+- **Automatic Temporal integration** - Registers versions and manages routing without manual API calls
+- **Kubernetes-native workflow** - Update a single custom resource, get full [rainbow deployments](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#deployment-systems)
+- **Intelligent cleanup** - Monitors version [drainage](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning#sunsetting-an-old-deployment-version) and automatically removes unused resources
+- **Built-in rollout strategies** - Progressive, AllAtOnce, and Manual with configurable safety controls
+
+## 📖 Documentation
+
+| Document | Description |
+|----------|-------------|
+| [Migration Guide](docs/migration-guide.md) | Step-by-step guide for migrating from traditional deployments |
+| [Architecture](docs/architecture.md) | Technical deep-dive into how the controller works |
+| [Configuration](docs/configuration.md) | Complete configuration reference |
+| [Concepts](docs/concepts.md) | Key concepts and terminology |
+| [Limits](docs/limits.md) | Technical constraints and limitations |
+
+## 🔧 Worker Configuration
+
+Your workers need these environment variables (automatically set by the controller):
+
+```bash
+TEMPORAL_ADDRESS=your-temporal-server:7233
+TEMPORAL_NAMESPACE=your-namespace
+TEMPORAL_DEPLOYMENT_NAME=my-worker  # Unique worker deployment name
+TEMPORAL_WORKER_BUILD_ID=v1.2.3     # Version identifier
+```
+
+**Important**: Don't set the above environment variables manually - the controller manages these automatically.
+
+## 🤝 Contributing
+
+We welcome all contributions! This includes:
+
+- 🔧 **Code contributions** - Please start by [opening an issue](https://github.com/temporalio/temporal-worker-controller/issues/new) to discuss your idea
+- 🐛 **Bug reports** - [File an issue](https://github.com/temporalio/temporal-worker-controller/issues/new)
+- 💡 **Feature requests** - Tell us what you'd like to see
+- 💬 **Feedback** - Join [#safe-deploys](https://temporalio.slack.com/archives/C07MDJ6S3HP) on [Temporal Slack](https://t.mp/slack)
+
+## 🛠️ Development

-This project is in very early stages; as such external code contributions are not yet being solicited.
+Want to try the controller locally? Check out the [local demo guide](internal/demo/README.md) for development setup.

-Bug reports and feature requests are welcome! Please [file an issue](https://github.com/jlegrone/worker-controller/issues/new).
+## 📄 License

-You may also reach out to [#safe-deploys](https://temporalio.slack.com/archives/C07MDJ6S3HP) or @jlegrone on the
-[Temporal Slack](https://t.mp/slack) if you have questions, suggestions, or are interested in making other contributions.
+This project is licensed under the [MIT License](LICENSE).

-## Development
+---

-For local development setup and running the controller locally, see the [local demo guide](internal/demo/README.md).
+**Questions?** Reach out to [@jlegrone](https://github.com/jlegrone) or the [#safe-deploys](https://temporalio.slack.com/archives/C07MDJ6S3HP) channel on Temporal Slack!

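The new **Worker Configuration** section says the controller injects `TEMPORAL_ADDRESS`, `TEMPORAL_NAMESPACE`, `TEMPORAL_DEPLOYMENT_NAME`, and `TEMPORAL_WORKER_BUILD_ID` into each worker pod. As a minimal sketch, assuming the worker image uses a bash entrypoint (the README does not prescribe one), a worker could fail fast when any of these is missing instead of starting with a partial configuration. The function name `check_worker_env` and the `./my-worker` binary are hypothetical and not part of the controller.

```bash
#!/usr/bin/env bash
# Hypothetical entrypoint helper (not part of the controller): checks that the
# variables the controller injects into worker pods are present and non-empty.
check_worker_env() {
  local var
  local missing=()
  for var in TEMPORAL_ADDRESS TEMPORAL_NAMESPACE TEMPORAL_DEPLOYMENT_NAME TEMPORAL_WORKER_BUILD_ID; do
    # ${!var:-} is bash indirect expansion: the value of the variable whose
    # name is stored in $var, or the empty string if that variable is unset.
    if [[ -z "${!var:-}" ]]; then
      missing+=("$var")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    # Report every missing name at once, on stderr, and signal failure.
    echo "missing required environment variables: ${missing[*]}" >&2
    return 1
  fi
  return 0
}

# Example entrypoint usage (./my-worker is a hypothetical worker binary):
#   check_worker_env || exit 1
#   exec ./my-worker
```

The guard only reads the variables and never sets them, so it stays consistent with the README's rule that these values come from the controller. A failure usually suggests the pod was not launched through a `TemporalWorkerDeployment`.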