docs: rollout concept doc (#819)

Azure · May 24, 2024 · af2840b
1 parent 7a49cc9
commit af2840b

Showing 5 changed files with 254 additions and 17 deletions.
diff --git a/apis/placement/v1/clusterresourceplacement_types.go b/apis/placement/v1/clusterresourceplacement_types.go
@@ -522,9 +522,14 @@ type RollingUpdateConfig struct {
 	// +optional
 	MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`
 
-	// UnavailablePeriodSeconds is used to config the time to wait between rolling out phases.
-	// A resource placement is considered available after `UnavailablePeriodSeconds` seconds
-	// has passed after the resources are applied to the target cluster successfully.
+	// UnavailablePeriodSeconds is used to configure the waiting time between rollout phases when we
+	// cannot determine if the resources have rolled out successfully or not.
+	// We have a built-in resource state detector to determine the availability status of the following well-known Kubernetes
+	// native resources: Deployment, StatefulSet, DaemonSet, Job, Service, Namespace, ConfigMap, Secret,
+	// ClusterRole, ClusterRoleBinding, Role, RoleBinding.
+	// Please see [SafeRollout](https://github.com/Azure/fleet/tree/main/docs/concepts/SafeRollout/README.md) for more details.
+	// For other types of resources, we consider them as available after `UnavailablePeriodSeconds` seconds
+	// have passed since they were successfully applied to the target cluster.
 	// Default is 60.
 	// +kubebuilder:default=60
 	// +optional

diff --git a/apis/placement/v1beta1/clusterresourceplacement_types.go b/apis/placement/v1beta1/clusterresourceplacement_types.go
@@ -522,9 +522,14 @@ type RollingUpdateConfig struct {
 	// +optional
 	MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`
 
-	// UnavailablePeriodSeconds is used to config the time to wait between rolling out phases.
-	// A resource placement is considered available after `UnavailablePeriodSeconds` seconds
-	// has passed after the resources are applied to the target cluster successfully.
+	// UnavailablePeriodSeconds is used to configure the waiting time between rollout phases when we
+	// cannot determine if the resources have rolled out successfully or not.
+	// We have a built-in resource state detector to determine the availability status of the following well-known Kubernetes
+	// native resources: Deployment, StatefulSet, DaemonSet, Job, Service, Namespace, ConfigMap, Secret,
+	// ClusterRole, ClusterRoleBinding, Role, RoleBinding.
+	// Please see [SafeRollout](https://github.com/Azure/fleet/tree/main/docs/concepts/SafeRollout/README.md) for more details.
+	// For other types of resources, we consider them as available after `UnavailablePeriodSeconds` seconds
+	// have passed since they were successfully applied to the target cluster.
 	// Default is 60.
 	// +kubebuilder:default=60
 	// +optional

diff --git a/config/crd/bases/placement.kubernetes-fleet.io_clusterresourceplacements.yaml b/config/crd/bases/placement.kubernetes-fleet.io_clusterresourceplacements.yaml
@@ -716,9 +716,14 @@ spec:
                       unavailablePeriodSeconds:
                         default: 60
                         description: |-
-                          UnavailablePeriodSeconds is used to config the time to wait between rolling out phases.
-                          A resource placement is considered available after `UnavailablePeriodSeconds` seconds
-                          has passed after the resources are applied to the target cluster successfully.
+                          UnavailablePeriodSeconds is used to configure the waiting time between rollout phases when we
+                          cannot determine if the resources have rolled out successfully or not.
+                          We have a built-in resource state detector to determine the availability status of the following well-known Kubernetes
+                          native resources: Deployment, StatefulSet, DaemonSet, Job, Service, Namespace, ConfigMap, Secret,
+                          ClusterRole, ClusterRoleBinding, Role, RoleBinding.
+                          Please see [SafeRollout](https://github.com/Azure/fleet/tree/main/docs/concepts/SafeRollout/README.md) for more details.
+                          For other types of resources, we consider them as available after `UnavailablePeriodSeconds` seconds
+                          have passed since they were successfully applied to the target cluster.
                           Default is 60.
                         type: integer
                     type: object
@@ -1815,9 +1820,14 @@ spec:
                       unavailablePeriodSeconds:
                         default: 60
                         description: |-
-                          UnavailablePeriodSeconds is used to config the time to wait between rolling out phases.
-                          A resource placement is considered available after `UnavailablePeriodSeconds` seconds
-                          has passed after the resources are applied to the target cluster successfully.
+                          UnavailablePeriodSeconds is used to configure the waiting time between rollout phases when we
+                          cannot determine if the resources have rolled out successfully or not.
+                          We have a built-in resource state detector to determine the availability status of the following well-known Kubernetes
+                          native resources: Deployment, StatefulSet, DaemonSet, Job, Service, Namespace, ConfigMap, Secret,
+                          ClusterRole, ClusterRoleBinding, Role, RoleBinding.
+                          Please see [SafeRollout](https://github.com/Azure/fleet/tree/main/docs/concepts/SafeRollout/README.md) for more details.
+                          For other types of resources, we consider them as available after `UnavailablePeriodSeconds` seconds
+                          have passed since they were successfully applied to the target cluster.
                           Default is 60.
                         type: integer
                     type: object

diff --git a/docs/concepts/README.md b/docs/concepts/README.md
@@ -4,27 +4,26 @@ The Concepts section helps you learn about the parts of the Fleet system and the
 and helps you obtain a deeper understanding how Fleet works.
 
 ## [Components](Components/README.md)
-
 The high level components concepts behind the fleet.
 
 ## [MemberCluster](MemberCluster/README.md)
 Understand `MemberCluster` concept to join or leave the fleet.
 
 ## [ClusterResourcePlacement](ClusterResourcePlacement/README.md)
-
 Concepts and resources behind the `ClusterResourcePlacement`.
 
 ## [Scheduler](Scheduler/README.md)
-
 Understand how multi-cluster scheduling works.
 
 ## [Scheduling Framework](Scheduling-Framework/README.md)
-
 Lower-level multi-cluster scheduling system design.
 
+## [Safe Rollout](SafeRollout/README.md)
+
+Understand how we support rolling out the changes in a safe way.
+
 ## [Override](Override/README.md)
 Allow slightly different manifests depends on the cluster it lands.
 
 ## [PropertyProvider](PropertyProviderAndClusterProperties/README.md)
-
 More ways to select the clusters based on its property.
diff --git a/docs/concepts/SafeRollout/README.md b/docs/concepts/SafeRollout/README.md
@@ -0,0 +1,218 @@
+# Safe Rollout
+
+One of the most important features of Fleet is the ability to safely roll out changes across multiple clusters. We do
+this by rolling out the changes in a controlled manner, ensuring that we only continue to propagate the changes to the
+next target clusters if the resources are successfully applied to the previous target clusters.
+
+## Overview
+
+We automatically propagate any changes to resources selected by a `ClusterResourcePlacement` from the hub cluster
+to the target clusters, based on the placement policy defined in the `ClusterResourcePlacement`. To reduce the
+blast radius of such an operation, we give users a way to roll out new changes safely, so that a bad release
+won't affect all the running instances at once.
+
+## Rollout Strategy
+
+We currently only support the `RollingUpdate` rollout strategy. It updates the resources in the selected target clusters
+gradually based on the `maxUnavailable` and `maxSurge` settings.
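+
+As a rough sketch of where these settings live, the fragment below shows the `strategy` section of a
+`ClusterResourcePlacement` spec. The `rollingUpdate` field names match the examples later in this document; the
+`type: RollingUpdate` line and the specific values are assumptions for illustration, not recommendations.
+
+```yaml
+spec:
+  strategy:
+    type: RollingUpdate             # assumed field; RollingUpdate is the only supported strategy
+    rollingUpdate:
+      maxUnavailable: 25%           # illustrative value
+      maxSurge: 25%                 # illustrative value
+      unavailablePeriodSeconds: 60  # documented default
+```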
+
+## In place update policy
+
+We always try to update resources in place, respecting the rollout strategy, if there is no change in the placement. This avoids unnecessary
+interruptions to the running workloads when only the resources change. For example, if you only change the image tag of the
+deployment in the namespace you want to place, we will do an in-place update on the deployments already placed on the
+targeted clusters instead of moving the existing deployments to other clusters, even if the labels or properties of the
+current clusters are no longer the best match for the current placement policy.
+
+## How To Use RollingUpdateConfig
+
+`RollingUpdateConfig` is used to control the behavior of the rolling update strategy.
+
+### MaxUnavailable and MaxSurge
+
+`MaxUnavailable` specifies the maximum number of clusters, relative to the `target number of clusters` specified in the
+`ClusterResourcePlacement` policy, in which resources propagated by the `ClusterResourcePlacement` can be
+unavailable. The minimum value for `MaxUnavailable` is 1, to avoid a stuck rollout during in-place resource updates.
+
+`MaxSurge` specifies the maximum number of clusters that can be scheduled with resources above the `target number of clusters` 
+specified in `ClusterResourcePlacement` policy.
+
+> **Note:** `MaxSurge` only applies to rollouts to newly scheduled clusters, and doesn't apply to rollouts of workloads triggered by
+> updates to already propagated resources. For updates to already propagated resources, we always try to do the updates in
+> place with no surge.
+
+The `target number of clusters` depends on the `ClusterResourcePlacement` policy type:
+
+- For `PickAll`, it's the number of clusters picked by the scheduler.
+- For `PickN`, it's the number of clusters specified in the `ClusterResourcePlacement` policy.
+- For `PickFixed`, it's the length of the list of cluster names specified in the `ClusterResourcePlacement` policy.
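+
+The three cases above could look like the following `policy` fragments. This is only a sketch: the cluster names are
+hypothetical, and the `clusterNames` field used for `PickFixed` is an assumption that does not appear elsewhere in this
+document.
+
+```yaml
+# PickAll: the target number is however many clusters the scheduler picks.
+policy:
+  placementType: PickAll
+---
+# PickN: the target number is numberOfClusters (3 here).
+policy:
+  placementType: PickN
+  numberOfClusters: 3
+---
+# PickFixed: the target number is the length of the list (2 here).
+policy:
+  placementType: PickFixed
+  clusterNames:   # assumed field name
+    - cluster-1   # hypothetical cluster names
+    - cluster-2
+```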
+
+#### Example 1:
+
+Consider a fleet with 4 connected member clusters (cluster-1, cluster-2, cluster-3 & cluster-4) where every member
+cluster has the label `env: prod`. The hub cluster has a namespace called `test-ns` with a deployment in it.
+
+The `ClusterResourcePlacement` spec is defined as follows:
+
+```yaml
+spec:
+  resourceSelectors:
+    - group: ""
+      kind: Namespace
+      version: v1
+      name: test-ns
+  policy:
+    placementType: PickN
+    numberOfClusters: 3
+    affinity:
+      clusterAffinity:
+        requiredDuringSchedulingIgnoredDuringExecution:
+          clusterSelectorTerms:
+            - labelSelector:
+                matchLabels:
+                  env: prod
+  strategy:
+    rollingUpdate:
+      maxUnavailable: 1
+      maxSurge: 1
+```
+
+The rollout will be as follows:
+
+- We try to pick 3 clusters out of 4. For this scenario, let's say we pick cluster-1, cluster-2 & cluster-3.
+- Since we can't track the initial availability for the deployment, we roll out the namespace with the deployment to
+  cluster-1, cluster-2 & cluster-3.
+- Then we update the deployment with a bad image name, which triggers an in-place update of the resource on cluster-1,
+  cluster-2 & cluster-3.
+- Since `maxUnavailable` is set to 1, we roll out the bad image name update for the deployment to only one of the
+  clusters (which cluster the resource is rolled out to first is non-deterministic).
+- Once the deployment is updated on the first cluster, we wait for the deployment's availability to be true before
+  rolling out to the other clusters.
+- Since we rolled out a bad image name update for the deployment, its availability will always be false, and hence the
+  rollout to the other two clusters will be stuck.
+- Users might expect the `maxSurge` of 1 to be utilized here, but since we are updating the resource in place,
+  `maxSurge` will not be utilized to surge and pick cluster-4.
+
+> **Note:** `maxSurge` will be utilized to pick cluster-4 if we change the policy to pick 4 clusters or change the placement
+> type to `PickAll`.
+
+#### Example 2:
+
+Consider a fleet with 4 connected member clusters (cluster-1, cluster-2, cluster-3 & cluster-4) where:
+
+- cluster-1 and cluster-2 have the label `loc: west`
+- cluster-3 and cluster-4 have the label `loc: east`
+
+The hub cluster has a namespace called `test-ns` with a deployment in it.
+
+Initially, the `ClusterResourcePlacement` spec is defined as follows:
+
+```yaml
+spec:
+  resourceSelectors:
+    - group: ""
+      kind: Namespace
+      version: v1          
+      name: test-ns
+  policy:
+    placementType: PickN
+    numberOfClusters: 2
+    affinity:
+      clusterAffinity:
+        requiredDuringSchedulingIgnoredDuringExecution:
+          clusterSelectorTerms:
+            - labelSelector:
+                matchLabels:
+                  loc: west
+  strategy:
+    rollingUpdate:
+      maxSurge: 2
+```
+
+The rollout will be as follows:
+- We try to pick clusters (cluster-1 and cluster-2) by specifying the label selector `loc: west`.
+- Since we can't track the initial availability for the deployment, we roll out the namespace with the deployment to
+  cluster-1 and cluster-2 and wait until they become available.
+
+Then we update the `ClusterResourcePlacement` spec to the following:
+
+```yaml
+spec:
+  resourceSelectors:
+    - group: ""
+      kind: Namespace
+      version: v1          
+      name: test-ns
+  policy:
+    placementType: PickN
+    numberOfClusters: 2
+    affinity:
+      clusterAffinity:
+        requiredDuringSchedulingIgnoredDuringExecution:
+          clusterSelectorTerms:
+            - labelSelector:
+                matchLabels:
+                  loc: east
+  strategy:
+    rollingUpdate:
+      maxSurge: 2
+```
+
+The rollout will be as follows:
+
+- We try to pick clusters (cluster-3 and cluster-4) by specifying the label selector `loc: east`.
+- This time, since `maxSurge` is set to 2 and the target number of clusters is 2, resources can be propagated to a
+  maximum of 4 clusters. We therefore roll out the namespace with the deployment to both cluster-3 and cluster-4
+  before removing the deployment from cluster-1 and cluster-2.
+- Since `maxUnavailable` defaults to 25%, which rounds to 1 for a target of 2 clusters, we will remove the
+  resource from one of the existing clusters (cluster-1 or cluster-2), because with `maxUnavailable` set to 1 the
+  policy requires at least one cluster to remain available.
+
+### UnavailablePeriodSeconds
+
+`UnavailablePeriodSeconds` is used to configure the waiting time between rollout phases when we cannot determine if the
+resources have rolled out successfully or not. This field is used only if the availability of the resources we propagate
+is not trackable. Refer to the [Data only objects](#data-only-objects) section for more details.
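+
+For instance, a placement that mostly propagates resources whose availability cannot be tracked might lengthen the
+wait between phases. The fragment below is a sketch; the value of 120 is an arbitrary illustration (the default is 60).
+
+```yaml
+strategy:
+  rollingUpdate:
+    maxUnavailable: 1
+    unavailablePeriodSeconds: 120  # wait 2 minutes before treating untrackable resources as available
+```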
+
+## Availability based Rollout
+We have built-in mechanisms to determine the availability of some common Kubernetes native resources. We only mark them 
+as available in the target clusters when they meet the criteria we defined.
+
+### How It Works
+We have an agent running in the target cluster to check the status of the resources. We have specific criteria for each
+of the following resources to determine if they are available or not. Here is the list of resources we support:
+
+#### Deployment
+We only mark a `Deployment` as available when all its pods are running, ready and updated according to the latest spec. 
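+
+As an illustration, a `Deployment` in the following state would plausibly satisfy this criterion. The mapping from the
+sentence above to these standard Kubernetes status fields is an assumption, not a statement of the exact check the
+agent performs. The name is hypothetical, and the selector and pod template are omitted for brevity.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: web              # hypothetical name
+  namespace: test-ns
+  generation: 4
+spec:
+  replicas: 3            # selector and template omitted
+status:
+  observedGeneration: 4  # the controller has observed the latest spec
+  replicas: 3
+  updatedReplicas: 3     # every pod runs the latest pod template
+  readyReplicas: 3
+  availableReplicas: 3   # every pod is ready and available
+```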
+
+#### DaemonSet 
+We only mark a `DaemonSet` as available when all its pods are available and updated according to the latest spec on all 
+desired scheduled nodes.
+
+#### StatefulSet
+We only mark a `StatefulSet` as available when all its pods are running, ready and updated according to the latest revision.
+
+#### Job
+We only mark a `Job` as available when it has at least one succeeded pod or one ready pod.
+
+#### Service
+For a `Service`, availability is determined based on the service type as follows:
+
+- For a `ClusterIP` or `NodePort` service, we mark it as available when a cluster IP is assigned.
+- For a `LoadBalancer` service, we mark it as available when a `LoadBalancerIngress` has been assigned along with an IP or hostname.
+- For an `ExternalName` service, checking availability is not supported, so it is marked as available with a not-trackable reason.
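+
+For example, a `LoadBalancer` service whose status looks like the fragment below would meet the condition above. The
+service name and IP address are hypothetical; the fields themselves are the standard Kubernetes `Service` status fields.
+
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: web-lb            # hypothetical name
+  namespace: test-ns
+spec:
+  type: LoadBalancer
+status:
+  loadBalancer:
+    ingress:
+      - ip: 203.0.113.10  # an assigned IP (or hostname) is what the check looks for
+```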
+
+
+#### Data only objects
+
+Since the objects listed below are data resources, we mark them as available immediately after creation:
+
+- Namespace
+- Secret
+- ConfigMap
+- Role
+- ClusterRole
+- RoleBinding
+- ClusterRoleBinding