Commit 3a20266

Author: David Oppenheimer

Merge pull request kubernetes#77 from kubernetes/davidopp-patch-1

Add to cluster-large.md CPU and memory consumption for master components at various cluster sizes

2 parents: e3b57c3 + 7c81b53

1 file changed: docs/admin/cluster-large.md (65 additions, 42 deletions)
@@ -1,18 +1,18 @@
---
---

## Support

At {{page.version}}, Kubernetes supports clusters with up to 1000 nodes. More specifically, we support configurations that meet *all* of the following criteria:

* No more than 1000 nodes
* No more than 30000 total pods
* No more than 60000 total containers
* No more than 100 pods per node

* TOC
{:toc}

## Setup
@@ -26,8 +26,8 @@ When setting up a large Kubernetes cluster, the following issues must be conside

### Quota Issues

To avoid running into cloud provider quota issues, when creating a cluster with many nodes, consider:

* Increase the quota for things like CPU, IPs, etc.
  * In [GCE, for example,](https://cloud.google.com/compute/docs/resource-quotas) you'll want to increase the quota for:
    * CPUs
@@ -40,35 +40,58 @@ To avoid running into cloud provider quota issues, when creating a cluster with
    * Target pools
* Gating the setup script so that it brings up new node VMs in smaller batches with waits in between, because some cloud providers rate limit the creation of VMs (see the sketch below).
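
The batch-and-wait approach in the last bullet can be scripted outside of `kube-up`. Below is a minimal, hypothetical sketch that grows a GCE managed instance group of nodes in fixed-size steps with a pause between them; the group name, zone, target size, batch size, and wait time are placeholders to adapt to your provider and its rate limits.

```shell
#!/bin/bash
# Hypothetical sketch: grow the node instance group in small batches so the
# cloud provider's VM-creation rate limits are not hit. The group name, zone,
# target size, batch size, and sleep interval are illustrative placeholders.
set -e

GROUP="kubernetes-minion-group"   # assumed name of the node instance group
ZONE="us-central1-b"
TARGET_NODES=500
BATCH=50
WAIT_SECONDS=60

current=0
while [ "$current" -lt "$TARGET_NODES" ]; do
  current=$(( current + BATCH ))
  if [ "$current" -gt "$TARGET_NODES" ]; then
    current=$TARGET_NODES
  fi
  # Ask GCE for the next step up in size.
  gcloud compute instance-groups managed resize "$GROUP" \
    --size "$current" --zone "$ZONE"
  # Wait before requesting the next batch.
  sleep "$WAIT_SECONDS"
done
```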

### Etcd storage

To improve performance of large clusters, we store events in a separate dedicated etcd instance.

When creating a cluster, existing salt scripts:

* start and configure an additional etcd instance
* configure the api-server to use it for storing events (see the sketch below)
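
The api-server half of that configuration essentially comes down to a flag pointing the events resource at the dedicated etcd instance. A minimal sketch is below; the addresses and ports are assumptions for illustration, not values taken from this page.

```shell
# Sketch of the relevant kube-apiserver flags, assuming the main etcd listens
# on 127.0.0.1:4001 and the dedicated events etcd on 127.0.0.1:4002 (both
# endpoints are assumed for illustration; all other flags are omitted).
kube-apiserver \
  --etcd-servers=http://127.0.0.1:4001 \
  --etcd-servers-overrides=/events#http://127.0.0.1:4002
```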
### Size of master and master components

On GCE/GKE and AWS, `kube-up` automatically configures the proper VM size for your master depending on the number of nodes in your cluster. On other providers, you will need to configure it manually. For reference, the sizes we use on GCE are:

* 1-5 nodes: n1-standard-1
* 6-10 nodes: n1-standard-2
* 11-100 nodes: n1-standard-4
* 101-250 nodes: n1-standard-8
* 251-500 nodes: n1-standard-16
* more than 500 nodes: n1-standard-32

And the sizes we use on AWS are:

* 1-5 nodes: m3.medium
* 6-10 nodes: m3.large
* 11-100 nodes: m3.xlarge
* 101-250 nodes: m3.2xlarge
* 251-500 nodes: c4.4xlarge
* more than 500 nodes: c4.8xlarge

Note that these master node sizes are currently only set at cluster startup time, and are not adjusted if you later scale your cluster up or down (e.g. by manually adding or removing nodes, or by using a cluster autoscaler).
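
If you want to pin the master size yourself instead of relying on the automatic choice (for example, because you plan to grow the cluster later), the GCE and AWS `kube-up` scripts read a `MASTER_SIZE` environment variable. A sketch, with example values only:

```shell
# Sketch: explicitly choose the master VM size before bringing up a large
# cluster. MASTER_SIZE is read by the GCE and AWS kube-up scripts; the node
# count and machine type below are examples, not recommendations.
export NUM_NODES=600
export MASTER_SIZE=n1-standard-32   # e.g. c4.8xlarge on AWS
cluster/kube-up.sh
```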
### Addon Resources

To prevent memory leaks or other resource issues in [cluster addons](https://releases.k8s.io/{{page.githubbranch}}/cluster/addons) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (see PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).

For [example](https://github.com/kubernetes/kubernetes/tree/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml):

```yaml
containers:
  - name: fluentd-cloud-logging
    image: gcr.io/google_containers/fluentd-gcp:1.16
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
```

Except for Heapster, these limits are static and are based on data we collected from addons running on 4-node clusters (see [#10335](http://issue.k8s.io/10335#issuecomment-117861225)). The addons consume a lot more resources when running on large deployment clusters (see [#5880](http://issue.k8s.io/5880#issuecomment-113984085)). So, if a large cluster is deployed without adjusting these values, the addons may continuously get killed because they keep hitting the limits.

To avoid running into cluster addon resource issues, when creating a cluster with many nodes, consider the following:

* Scale memory and CPU limits for each of the following addons, if used, as you scale up the size of your cluster (there is one replica of each handling the entire cluster, so memory and CPU usage tends to grow proportionally with the size of and load on the cluster; see the sketch after this list):
  * [InfluxDB and Grafana](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/cluster-monitoring/influxdb/influxdb-grafana-controller.yaml)
  * [skydns, kube2sky, and dns etcd](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/dns/skydns-rc.yaml.in)
@@ -79,20 +102,20 @@ To avoid running into cluster addon resource issues, when creating a cluster wit
  * [FluentD with ElasticSearch Plugin](http://releases.k8s.io/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-es/fluentd-es.yaml)
  * [FluentD with GCP Plugin](http://releases.k8s.io/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml)
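
As a sketch of adjusting one of these addons on a running cluster, newer versions of `kubectl` include `kubectl set resources`; the addon name and limit values below are hypothetical. Note that addons which the addon manager reconciles from manifests on the master may be reverted, in which case you would change those manifests (like the yaml shown above) instead.

```shell
# Hypothetical sketch: raise the limits of a single addon in the kube-system
# namespace. Replace <addon-rc-name> with the addon's actual controller name;
# the cpu/memory values are placeholders to scale with your cluster size.
kubectl --namespace=kube-system set resources rc <addon-rc-name> \
  --limits=cpu=200m,memory=400Mi
```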

Heapster's resource limits are set dynamically based on the initial size of your cluster (see [#16185](http://issue.k8s.io/16185) and [#21258](http://issue.k8s.io/21258)). If you find that Heapster is running out of resources, you should adjust the formulas that compute the Heapster memory request (see those PRs for details).

For directions on how to detect if addon containers are hitting resource limits, see the [Troubleshooting section of Compute Resources](/docs/user-guide/compute-resources/#troubleshooting).

In the [future](http://issue.k8s.io/13048), we anticipate setting all cluster addon resource limits based on cluster size, and dynamically adjusting them if you grow or shrink your cluster. We welcome PRs that implement those features.

### Allowing minor node failure at startup

For various reasons (see [#18969](https://github.com/kubernetes/kubernetes/issues/18969) for more details), running `kube-up.sh` with a very large `NUM_NODES` may fail due to a very small number of nodes not coming up properly. Currently you have two choices: restart the cluster (`kube-down.sh` and then `kube-up.sh` again), or, before running `kube-up.sh`, set the environment variable `ALLOWED_NOTREADY_NODES` to whatever value you feel comfortable with. This will allow `kube-up.sh` to succeed with fewer than `NUM_NODES` nodes coming up. Depending on the reason for the failure, those additional nodes may join later or the cluster may remain at a size of `NUM_NODES - ALLOWED_NOTREADY_NODES`.
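
A minimal sketch of the second option, with placeholder numbers:

```shell
# Sketch: tolerate a small number of nodes failing to come up during kube-up.
# The numbers are placeholders; choose values you are comfortable with.
export NUM_NODES=1000
export ALLOWED_NOTREADY_NODES=5   # allow kube-up.sh to succeed with up to 5 nodes not ready
cluster/kube-up.sh
```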
