docs: rollout concept doc (#819)
ryanzhang-oss authored May 24, 2024
1 parent 7a49cc9 commit af2840b
Showing 5 changed files with 254 additions and 17 deletions.
11 changes: 8 additions & 3 deletions apis/placement/v1/clusterresourceplacement_types.go
@@ -522,9 +522,14 @@ type RollingUpdateConfig struct {
 // +optional
 MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`

-// UnavailablePeriodSeconds is used to config the time to wait between rolling out phases.
-// A resource placement is considered available after `UnavailablePeriodSeconds` seconds
-// has passed after the resources are applied to the target cluster successfully.
+// UnavailablePeriodSeconds is used to configure the waiting time between rollout phases when we
+// cannot determine if the resources have rolled out successfully or not.
+// We have a built-in resource state detector to determine the availability status of following well-known Kubernetes
+// native resources: Deployment, StatefulSet, DaemonSet, Job, Service, Namespace, ConfigMap, Secret,
+// ClusterRole, ClusterRoleBinding, Role, RoleBinding.
+// Please see [SafeRollout](https://github.com/Azure/fleet/tree/main/docs/concepts/SafeRollout/README.md) for more details.
+// For other types of resources, we consider them as available after `UnavailablePeriodSeconds` seconds
+// have passed since they were successfully applied to the target cluster.
 // Default is 60.
 // +kubebuilder:default=60
 // +optional
11 changes: 8 additions & 3 deletions apis/placement/v1beta1/clusterresourceplacement_types.go
@@ -522,9 +522,14 @@ type RollingUpdateConfig struct {
 // +optional
 MaxSurge *intstr.IntOrString `json:"maxSurge,omitempty"`

-// UnavailablePeriodSeconds is used to config the time to wait between rolling out phases.
-// A resource placement is considered available after `UnavailablePeriodSeconds` seconds
-// has passed after the resources are applied to the target cluster successfully.
+// UnavailablePeriodSeconds is used to configure the waiting time between rollout phases when we
+// cannot determine if the resources have rolled out successfully or not.
+// We have a built-in resource state detector to determine the availability status of following well-known Kubernetes
+// native resources: Deployment, StatefulSet, DaemonSet, Job, Service, Namespace, ConfigMap, Secret,
+// ClusterRole, ClusterRoleBinding, Role, RoleBinding.
+// Please see [SafeRollout](https://github.com/Azure/fleet/tree/main/docs/concepts/SafeRollout/README.md) for more details.
+// For other types of resources, we consider them as available after `UnavailablePeriodSeconds` seconds
+// have passed since they were successfully applied to the target cluster.
 // Default is 60.
 // +kubebuilder:default=60
 // +optional
@@ -716,9 +716,14 @@ spec:
 unavailablePeriodSeconds:
 default: 60
 description: |-
-UnavailablePeriodSeconds is used to config the time to wait between rolling out phases.
-A resource placement is considered available after `UnavailablePeriodSeconds` seconds
-has passed after the resources are applied to the target cluster successfully.
+UnavailablePeriodSeconds is used to configure the waiting time between rollout phases when we
+cannot determine if the resources have rolled out successfully or not.
+We have a built-in resource state detector to determine the availability status of following well-known Kubernetes
+native resources: Deployment, StatefulSet, DaemonSet, Job, Service, Namespace, ConfigMap, Secret,
+ClusterRole, ClusterRoleBinding, Role, RoleBinding.
+Please see [SafeRollout](https://github.com/Azure/fleet/tree/main/docs/concepts/SafeRollout/README.md) for more details.
+For other types of resources, we consider them as available after `UnavailablePeriodSeconds` seconds
+have passed since they were successfully applied to the target cluster.
 Default is 60.
 type: integer
 type: object
@@ -1815,9 +1820,14 @@ spec:
 unavailablePeriodSeconds:
 default: 60
 description: |-
-UnavailablePeriodSeconds is used to config the time to wait between rolling out phases.
-A resource placement is considered available after `UnavailablePeriodSeconds` seconds
-has passed after the resources are applied to the target cluster successfully.
+UnavailablePeriodSeconds is used to configure the waiting time between rollout phases when we
+cannot determine if the resources have rolled out successfully or not.
+We have a built-in resource state detector to determine the availability status of following well-known Kubernetes
+native resources: Deployment, StatefulSet, DaemonSet, Job, Service, Namespace, ConfigMap, Secret,
+ClusterRole, ClusterRoleBinding, Role, RoleBinding.
+Please see [SafeRollout](https://github.com/Azure/fleet/tree/main/docs/concepts/SafeRollout/README.md) for more details.
+For other types of resources, we consider them as available after `UnavailablePeriodSeconds` seconds
+have passed since they were successfully applied to the target cluster.
 Default is 60.
 type: integer
 type: object
9 changes: 4 additions & 5 deletions docs/concepts/README.md
@@ -4,27 +4,26 @@ The Concepts section helps you learn about the parts of the Fleet system and the
and helps you obtain a deeper understanding of how Fleet works.

## [Components](Components/README.md)

The high-level component concepts behind the fleet.

## [MemberCluster](MemberCluster/README.md)
Understand `MemberCluster` concept to join or leave the fleet.

## [ClusterResourcePlacement](ClusterResourcePlacement/README.md)

Concepts and resources behind the `ClusterResourcePlacement`.

## [Scheduler](Scheduler/README.md)

Understand how multi-cluster scheduling works.

## [Scheduling Framework](Scheduling-Framework/README.md)

Lower-level multi-cluster scheduling system design.

## [Safe Rollout](SafeRollout/README.md)

Understand how we support rolling out the changes in a safe way.

## [Override](Override/README.md)
Allow slightly different manifests depending on the cluster they land on.

## [PropertyProvider](PropertyProviderAndClusterProperties/README.md)

More ways to select clusters based on their properties.
218 changes: 218 additions & 0 deletions docs/concepts/SafeRollout/README.md
@@ -0,0 +1,218 @@
# Safe Rollout

One of the most important features of Fleet is the ability to safely roll out changes across multiple clusters. We do
this by rolling out the changes in a controlled manner, ensuring that we only continue to propagate the changes to the
next target clusters if the resources are successfully applied to the previous target clusters.

## Overview

We automatically propagate any resource changes that are selected by a `ClusterResourcePlacement` from the hub cluster
to the target clusters based on the placement policy defined in the `ClusterResourcePlacement`. To reduce the
blast radius of such an operation, we give users a way to safely roll out new changes so that a bad release
won't affect all the running instances at once.

## Rollout Strategy

We currently only support the `RollingUpdate` rollout strategy. It updates the resources in the selected target clusters
gradually based on the `maxUnavailable` and `maxSurge` settings.

## In-place update policy

We always try to do an in-place update, respecting the rollout strategy, if there is no change in the placement. This avoids unnecessary
interruptions to the running workloads when only the resources change. For example, if you only change the tag of the
deployment in the namespace you want to place, we will do an in-place update on the deployments already placed on the
targeted clusters instead of moving the existing deployments to other clusters, even if the labels or properties of the
current clusters are no longer the best match for the current placement policy.

## How To Use RollingUpdateConfig

`RollingUpdateConfig` controls the behavior of the rolling update strategy.

### MaxUnavailable and MaxSurge

`MaxUnavailable` specifies the maximum number of clusters connected to the fleet, relative to the `target number of clusters`
specified in the `ClusterResourcePlacement` policy, in which resources propagated by the `ClusterResourcePlacement` can be
unavailable. The minimum value for `MaxUnavailable` is 1, to avoid a stuck rollout during an in-place resource update.

`MaxSurge` specifies the maximum number of clusters that can be scheduled with resources above the `target number of clusters`
specified in the `ClusterResourcePlacement` policy.

> **Note:** `MaxSurge` only applies to rollouts to newly scheduled clusters. It doesn't apply to rollouts triggered by
updates to already propagated resources. For updates to already propagated resources, we always try to do the updates in
place with no surge.

The `target number of clusters` depends on the `ClusterResourcePlacement` policy:

- For `PickAll`, it's the number of clusters picked by the scheduler.
- For `PickN`, it's the number of clusters specified in the `ClusterResourcePlacement` policy.
- For `PickFixed`, it's the length of the list of cluster names specified in the `ClusterResourcePlacement` policy.
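
The three cases above can be sketched in Go as follows. This is a minimal illustration, not Fleet's actual implementation: the `policy` struct, its field names, and `targetNumberOfClusters` are hypothetical stand-ins for the real API types.

```go
package main

import "fmt"

// PlacementType names the three placement types described in this document.
type PlacementType string

const (
	PickAll   PlacementType = "PickAll"
	PickN     PlacementType = "PickN"
	PickFixed PlacementType = "PickFixed"
)

// policy is a hypothetical, trimmed-down stand-in for the placement policy.
type policy struct {
	placementType    PlacementType
	numberOfClusters int      // used by PickN
	clusterNames     []string // used by PickFixed
}

// targetNumberOfClusters returns the target count for the given policy.
// scheduled is the number of clusters picked by the scheduler (used by PickAll).
func targetNumberOfClusters(p policy, scheduled int) int {
	switch p.placementType {
	case PickN:
		return p.numberOfClusters
	case PickFixed:
		return len(p.clusterNames)
	default: // PickAll
		return scheduled
	}
}

func main() {
	fmt.Println(targetNumberOfClusters(policy{placementType: PickAll}, 4))
	fmt.Println(targetNumberOfClusters(policy{placementType: PickN, numberOfClusters: 3}, 4))
	fmt.Println(targetNumberOfClusters(policy{placementType: PickFixed, clusterNames: []string{"cluster-1", "cluster-2"}}, 4))
}
```

`MaxUnavailable` and `MaxSurge` are both interpreted relative to the number this function returns.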

#### Example 1:

Consider a fleet with 4 connected member clusters (cluster-1, cluster-2, cluster-3 & cluster-4) where every member
cluster has label `env: prod`. The hub cluster has a namespace called `test-ns` with a deployment in it.

The `ClusterResourcePlacement` spec is defined as follows:

```yaml
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickN
    numberOfClusters: 3
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  env: prod
  strategy:
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
```
The rollout will be as follows:

- We try to pick 3 clusters out of 4. For this scenario, let's say we pick cluster-1, cluster-2 & cluster-3.
- Since we can't track the initial availability for the deployment, we roll out the namespace with the deployment to
cluster-1, cluster-2 & cluster-3.
- Then we update the deployment with a bad image name, which triggers an in-place update of the resource on cluster-1, cluster-2 & cluster-3.
- Since `maxUnavailable` is set to 1, we roll out the bad image name update for the deployment to only one of the clusters
(which cluster the resource is rolled out to first is non-deterministic).
- Once the deployment is updated on the first cluster, we wait for the deployment's availability to be true before
rolling out to the other clusters.
- Since we rolled out a bad image name update for the deployment, its availability will always be false, and hence the
rollout to the other two clusters will be stuck.
- Users might expect the `maxSurge` of 1 to be utilized here, but since we are updating the resource in place,
`maxSurge` will not be utilized to surge and pick cluster-4.

> **Note:** `maxSurge` will be utilized to pick cluster-4 if we change the policy to pick 4 clusters or change the placement
type to `PickAll`.
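
The stuck rollout in Example 1 can be modeled with a small budget calculation. This is a simplified sketch under the assumption that each round may update as many clusters as the remaining `maxUnavailable` budget allows; `updatableNow` is a hypothetical helper, not a Fleet function.

```go
package main

import "fmt"

// updatableNow returns how many more clusters may be updated in place right
// now: the maxUnavailable budget minus the clusters that are already unavailable.
func updatableNow(maxUnavailable, unavailable int) int {
	if budget := maxUnavailable - unavailable; budget > 0 {
		return budget
	}
	return 0
}

func main() {
	const maxUnavailable = 1
	unavailable := 0 // all three clusters start out available

	// Round 1: one cluster may receive the bad image name.
	first := updatableNow(maxUnavailable, unavailable)
	unavailable += first // the bad deployment never becomes available

	// Round 2: the budget is used up, so the other two clusters are not updated.
	second := updatableNow(maxUnavailable, unavailable)

	fmt.Println(first, second) // prints "1 0"
}
```

Because the first cluster never reports the deployment as available, the budget is never released and the remaining clusters keep the previous, working version.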

#### Example 2:

Consider a fleet with 4 connected member clusters (cluster-1, cluster-2, cluster-3 & cluster-4) where:

- cluster-1 and cluster-2 have the label `loc: west`
- cluster-3 and cluster-4 have the label `loc: east`

The hub cluster has a namespace called `test-ns` with a deployment in it.

Initially, the `ClusterResourcePlacement` spec is defined as follows:

```yaml
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickN
    numberOfClusters: 2
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  loc: west
  strategy:
    rollingUpdate:
      maxSurge: 2
```

The rollout will be as follows:

- We try to pick clusters (cluster-1 and cluster-2) by specifying the label selector `loc: west`.
- Since we can't track the initial availability for the deployment, we roll out the namespace with the deployment to cluster-1
and cluster-2 and wait until they become available.

Then we update the `ClusterResourcePlacement` spec to the following:

```yaml
spec:
  resourceSelectors:
    - group: ""
      kind: Namespace
      version: v1
      name: test-ns
  policy:
    placementType: PickN
    numberOfClusters: 2
    affinity:
      clusterAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          clusterSelectorTerms:
            - labelSelector:
                matchLabels:
                  loc: east
  strategy:
    rollingUpdate:
      maxSurge: 2
```

The rollout will be as follows:

- We try to pick clusters (cluster-3 and cluster-4) by specifying the label selector `loc: east`.
- This time, since `maxSurge` is set to 2, we can propagate resources to a maximum of 4 clusters even though the
target number of clusters is 2. We therefore roll out the namespace with the deployment to both
cluster-3 and cluster-4 before removing the deployment from cluster-1 and cluster-2.
- Since `maxUnavailable` defaults to 25%, which is rounded to 1 here, we will remove the
resource from one of the existing clusters (cluster-1 or cluster-2), because with a target of 2 clusters and
`maxUnavailable` of 1 the policy mandates that at least one cluster remains available.
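
The arithmetic behind "25% rounded to 1" can be sketched as below. This is an illustration only: `resolveMaxUnavailable` is a hypothetical helper, and rounding the percentage up is an assumption of this sketch (the text above only says the value is rounded to 1). The floor of 1 reflects the minimum value for `MaxUnavailable` stated earlier.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveMaxUnavailable converts an absolute value ("1") or a percentage
// ("25%") into a cluster count relative to the target number of clusters.
// Assumption: percentages are rounded up; the result is never below 1.
func resolveMaxUnavailable(value string, target int) (int, error) {
	var n int
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return 0, err
		}
		n = (pct*target + 99) / 100 // integer ceiling of pct% of target
	} else {
		abs, err := strconv.Atoi(value)
		if err != nil {
			return 0, err
		}
		n = abs
	}
	if n < 1 {
		n = 1 // minimum value, to avoid a stuck in-place update
	}
	return n, nil
}

func main() {
	// Example 2: the default of 25% with a target of 2 clusters.
	n, err := resolveMaxUnavailable("25%", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // prints "1"
}
```

With 2 target clusters, 25% is 0.5 clusters, which becomes 1 after rounding, so one of cluster-1 and cluster-2 can be unavailable at any point during the move.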

### UnavailablePeriodSeconds

`UnavailablePeriodSeconds` is used to configure the waiting time between rollout phases when we cannot determine if the
resources have rolled out successfully or not. This field is used only if the availability of the resources we propagate
is not trackable. Refer to the [Data only objects](#data-only-objects) section for more details.

## Availability based Rollout
We have built-in mechanisms to determine the availability of some common Kubernetes native resources. We only mark them
as available in the target clusters when they meet the criteria we defined.

### How It Works
We have an agent running in the target cluster to check the status of the resources. We have specific criteria for each
of the following resources to determine if they are available or not. Here is the list of resources we support:

#### Deployment
We only mark a `Deployment` as available when all its pods are running, ready and updated according to the latest spec.
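
One way to express this criterion in code is sketched below. The exact checks Fleet performs are not spelled out here, so this is an assumption: the `deploymentState` struct is hypothetical, with fields named after the standard Kubernetes Deployment fields they mirror.

```go
package main

import "fmt"

// deploymentState is a hypothetical, trimmed-down view of a Deployment.
type deploymentState struct {
	generation         int64 // metadata.generation
	observedGeneration int64 // status.observedGeneration
	desiredReplicas    int32 // spec.replicas
	updatedReplicas    int32 // status.updatedReplicas
	availableReplicas  int32 // status.availableReplicas
}

// deploymentAvailable reports whether the latest spec has been observed and
// every desired replica is both updated to it and available.
func deploymentAvailable(d deploymentState) bool {
	return d.observedGeneration >= d.generation &&
		d.updatedReplicas == d.desiredReplicas &&
		d.availableReplicas == d.desiredReplicas
}

func main() {
	healthy := deploymentState{generation: 2, observedGeneration: 2, desiredReplicas: 3, updatedReplicas: 3, availableReplicas: 3}
	badImage := deploymentState{generation: 3, observedGeneration: 3, desiredReplicas: 3, updatedReplicas: 1, availableReplicas: 2}

	fmt.Println(deploymentAvailable(healthy))  // prints "true"
	fmt.Println(deploymentAvailable(badImage)) // prints "false"
}
```

The `badImage` case corresponds to Example 1: the new pods never become ready, so the deployment is never marked available.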

#### DaemonSet
We only mark a `DaemonSet` as available when all its pods are available and updated according to the latest spec on all
desired scheduled nodes.

#### StatefulSet
We only mark a `StatefulSet` as available when all its pods are running, ready and updated according to the latest revision.

#### Job
We only mark a `Job` as available when it has at least one succeeded pod or one ready pod.

#### Service
For a `Service`, availability is determined by the service type as follows:

- For `ClusterIP` & `NodePort` services, we mark the service as available when a cluster IP is assigned.
- For a `LoadBalancer` service, we mark it as available when a `LoadBalancerIngress` has been assigned along with an IP or hostname.
- For an `ExternalName` service, checking availability is not supported, so it is marked as available with a "not trackable" reason.
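
The per-type rules above can be sketched as a switch. This is an illustration with hypothetical types (`service`, `ingressPoint`) rather than the Kubernetes API structs, and the handling of unrecognized service types in the `default` branch is an assumption of this sketch.

```go
package main

import "fmt"

// ingressPoint is a hypothetical stand-in for one LoadBalancerIngress entry.
type ingressPoint struct {
	ip       string
	hostname string
}

// service is a hypothetical, trimmed-down view of a Service.
type service struct {
	serviceType string // "ClusterIP", "NodePort", "LoadBalancer" or "ExternalName"
	clusterIP   string
	ingress     []ingressPoint
}

// serviceAvailability returns whether the service counts as available and
// whether that availability was actually trackable.
func serviceAvailability(s service) (available, trackable bool) {
	switch s.serviceType {
	case "ClusterIP", "NodePort":
		return s.clusterIP != "", true
	case "LoadBalancer":
		for _, in := range s.ingress {
			if in.ip != "" || in.hostname != "" {
				return true, true
			}
		}
		return false, true
	case "ExternalName":
		return true, false // available, but with a "not trackable" reason
	default:
		return false, false // assumption: unknown types are not handled here
	}
}

func main() {
	fmt.Println(serviceAvailability(service{serviceType: "ClusterIP", clusterIP: "10.0.0.1"}))
	fmt.Println(serviceAvailability(service{serviceType: "LoadBalancer"}))
	fmt.Println(serviceAvailability(service{serviceType: "ExternalName"}))
}
```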


#### Data only objects

Since the objects listed below are data resources, we mark them as available immediately after creation:

- Namespace
- Secret
- ConfigMap
- Role
- ClusterRole
- RoleBinding
- ClusterRoleBinding
