At {{page.version}}, Kubernetes supports clusters with up to 1000 nodes. More specifically, we support configurations that meet *all* of the following criteria:

* No more than 1000 nodes
* No more than 30000 total pods
* No more than 60000 total containers
* No more than 100 pods per node

* TOC
{:toc}
## Setup

When setting up a large Kubernetes cluster, the following issues must be considered.

### Quota Issues

To avoid running into cloud provider quota issues, when creating a cluster with many nodes, consider:

* Increase the quota for things like CPU, IPs, etc.
  * In [GCE, for example,](https://cloud.google.com/compute/docs/resource-quotas) you'll want to increase the quota for:
    * CPUs
    * Target pools
* Gating the setup script so that it brings up new node VMs in smaller batches with waits in between, because some cloud providers rate limit the creation of VMs (see the sketch after this list).
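
For example, on GCE you can inspect your current quota usage before starting a large `kube-up`, and grow the node instance group in steps rather than all at once. This is only a sketch: the region, zone, group name, batch sizes, and wait time below are illustrative placeholders, not values the setup scripts guarantee.

```shell
# Inspect regional quotas (CPUs, in-use IP addresses, etc.) before creating
# a cluster with a large NUM_NODES.
gcloud compute regions describe us-central1

# Illustrative batching: grow the node instance group in steps of 100 with a
# pause between batches, since some cloud providers rate limit VM creation.
for target in 100 200 300 400 500; do
  gcloud compute instance-groups managed resize kubernetes-minion-group \
      --size "${target}" --zone us-central1-b
  sleep 60
done
```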
### Etcd storage

To improve performance of large clusters, we store events in a separate dedicated etcd instance.

When creating a cluster, the existing salt scripts:

* start and configure an additional etcd instance
* configure the api-server to use it for storing events (sketched below)
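
Conceptually, the wiring those scripts do amounts to something like the following sketch. The ports and data directory here are illustrative, and the routing relies on the kube-apiserver `--etcd-servers-overrides` flag, which sends a single resource type (events) to a different etcd than the rest of the cluster state.

```shell
# Run a second etcd instance dedicated to events (ports and paths illustrative).
etcd --name etcd-events \
  --listen-client-urls http://127.0.0.1:4002 \
  --advertise-client-urls http://127.0.0.1:4002 \
  --data-dir /var/etcd/data-events &

# Keep all other state in the main etcd, but store events in the dedicated one.
kube-apiserver \
  --etcd-servers=http://127.0.0.1:2379 \
  --etcd-servers-overrides=/events#http://127.0.0.1:4002
  # (remaining api-server flags omitted)
```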
### Size of master and master components

On GCE/GKE and AWS, `kube-up` automatically configures the proper VM size for your master depending on the number of nodes
in your cluster. On other providers, you will need to configure it manually. For reference, the sizes we use on GCE are:

* 1-5 nodes: n1-standard-1
* 6-10 nodes: n1-standard-2
* 11-100 nodes: n1-standard-4
* 101-250 nodes: n1-standard-8
* 251-500 nodes: n1-standard-16
* more than 500 nodes: n1-standard-32

And the sizes we use on AWS are:

* 1-5 nodes: m3.medium
* 6-10 nodes: m3.large
* 11-100 nodes: m3.xlarge
* 101-250 nodes: m3.2xlarge
* 251-500 nodes: c4.4xlarge
* more than 500 nodes: c4.8xlarge

Note that these master node sizes are currently only set at cluster startup time, and are not adjusted if you later scale your cluster up or down (e.g. manually removing or adding nodes, or using a cluster autoscaler).
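
If your provider's `kube-up` scripts don't size the master for you, or you want to override the default, the GCE and AWS `config-default.sh` scripts read a `MASTER_SIZE` environment variable. Treat the variable name and the values below as an assumption to verify against your provider's scripts, not a guarantee.

```shell
# Illustrative override: pick a larger master for a ~300 node cluster.
export NUM_NODES=300
export MASTER_SIZE=n1-standard-8     # e.g. m3.2xlarge on AWS
cluster/kube-up.sh
```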
### Addon Resources

To prevent memory leaks or other resource issues in [cluster addons](https://releases.k8s.io/{{page.githubbranch}}/cluster/addons) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).

For [example](https://github.com/kubernetes/kubernetes/tree/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml):

```yaml
containers:
  - name: fluentd-cloud-logging
    image: gcr.io/google_containers/fluentd-gcp:1.16
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
```
Except for Heapster, these limits are static and are based on data we collected from addons running on 4-node clusters (see [#10335](http://issue.k8s.io/10335#issuecomment-117861225)). The addons consume a lot more resources when running on large deployment clusters (see [#5880](http://issue.k8s.io/5880#issuecomment-113984085)). So, if a large cluster is deployed without adjusting these values, the addons may continuously get killed because they keep hitting the limits.

To avoid running into cluster addon resource issues, when creating a cluster with many nodes, consider the following:

* Scale memory and CPU limits for each of the following addons, if used, as you scale up the size of cluster (there is one replica of each handling the entire cluster, so memory and CPU usage tends to grow proportionally with the size of and load on the cluster):
  * [InfluxDB and Grafana](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/cluster-monitoring/influxdb/influxdb-grafana-controller.yaml)
  * [skydns, kube2sky, and dns etcd](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/dns/skydns-rc.yaml.in)
  * [FluentD with ElasticSearch Plugin](http://releases.k8s.io/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-es/fluentd-es.yaml)
  * [FluentD with GCP Plugin](http://releases.k8s.io/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml)
Heapster's resource limits are set dynamically based on the initial size of your cluster (see [#16185](http://issue.k8s.io/16185) and [#21258](http://issue.k8s.io/21258)). If you find that Heapster is running out of resources, you should adjust the formulas that compute the Heapster memory request (see those PRs for details).

For directions on how to detect if addon containers are hitting resource limits, see the [Troubleshooting section of Compute Resources](/docs/user-guide/compute-resources/#troubleshooting).
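
As a quick first check, you can also look at the addon pods in the `kube-system` namespace directly; an addon that keeps hitting its memory limit will show repeated restarts and a last state of `OOMKilled` (the pod name below is a placeholder):

```shell
# List addon pods and look for ones with a high RESTARTS count.
kubectl get pods --namespace=kube-system

# Inspect a suspect pod; check the container's "Last State" for OOMKilled.
kubectl describe pod <addon-pod-name> --namespace=kube-system
```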
In the [future](http://issue.k8s.io/13048), we anticipate setting all cluster addon resource limits based on cluster size, and dynamically adjusting them if you grow or shrink your cluster.
We welcome PRs that implement those features.

### Allowing minor node failure at startup

For various reasons (see [#18969](https://github.com/kubernetes/kubernetes/issues/18969) for more details), running
`kube-up.sh` with a very large `NUM_NODES` may fail due to a very small number of nodes not coming up properly.
Currently you have two choices: restart the cluster (`kube-down.sh` and then `kube-up.sh` again), or, before
running `kube-up.sh`, set the environment variable `ALLOWED_NOTREADY_NODES` to whatever value you feel comfortable
with. This will allow `kube-up.sh` to succeed with fewer than `NUM_NODES` nodes coming up. Depending on the
reason for the failure, those additional nodes may join later or the cluster may remain at a size of
`NUM_NODES - ALLOWED_NOTREADY_NODES`.
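
For example, to tolerate a handful of stragglers when bringing up a large cluster (the numbers are illustrative):

```shell
# Let kube-up.sh succeed even if up to 3 of the 500 requested nodes
# never become ready during startup.
export NUM_NODES=500
export ALLOWED_NOTREADY_NODES=3
cluster/kube-up.sh
```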