At {{page.version}}, Kubernetes supports clusters with up to 1000 nodes. More specifically, we support configurations that meet *all* of the following criteria:

* No more than 1000 nodes
* No more than 30000 total pods
* No more than 60000 total containers
* No more than 100 pods per node

* TOC
{:toc}
## Setup

When setting up a large Kubernetes cluster, the following issues must be considered.

### Quota Issues

To avoid running into cloud provider quota issues, when creating a cluster with many nodes, consider:

* Increase the quota for things like CPU, IPs, etc.
  * In [GCE, for example,](https://cloud.google.com/compute/docs/resource-quotas) you'll want to increase the quota for:
    * CPUs
    * Target pools
* Gating the setup script so that it brings up new node VMs in smaller batches with waits in between, because some cloud providers rate limit the creation of VMs (see the sketch after this list).
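
For example, on GCE you can inspect your current quota usage before starting a large `kube-up`, and grow the node instance group in steps rather than all at once. This is only a sketch: the region, zone, group name, batch sizes, and wait time below are illustrative placeholders, not values the setup scripts guarantee.

```shell
# Inspect regional quotas (CPUs, in-use IP addresses, etc.) before creating
# a cluster with a large NUM_NODES.
gcloud compute regions describe us-central1

# Illustrative batching: grow the node instance group in steps of 100 with a
# pause between batches, since some cloud providers rate limit VM creation.
for target in 100 200 300 400 500; do
  gcloud compute instance-groups managed resize kubernetes-minion-group \
      --size "${target}" --zone us-central1-b
  sleep 60
done
```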
### Etcd storage

To improve performance of large clusters, we store events in a separate dedicated etcd instance.

When creating a cluster, the existing salt scripts:

* start and configure an additional etcd instance
* configure the api-server to use it for storing events (sketched below)
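
Conceptually, the wiring those scripts do amounts to something like the following sketch. The ports and data directory here are illustrative, and the routing relies on the kube-apiserver `--etcd-servers-overrides` flag, which sends a single resource type (events) to a different etcd than the rest of the cluster state.

```shell
# Run a second etcd instance dedicated to events (ports and paths illustrative).
etcd --name etcd-events \
  --listen-client-urls http://127.0.0.1:4002 \
  --advertise-client-urls http://127.0.0.1:4002 \
  --data-dir /var/etcd/data-events &

# Keep all other state in the main etcd, but store events in the dedicated one.
kube-apiserver \
  --etcd-servers=http://127.0.0.1:2379 \
  --etcd-servers-overrides=/events#http://127.0.0.1:4002
  # (remaining api-server flags omitted)
```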
### Size of master and master components

On GCE/GKE and AWS, `kube-up` automatically configures the proper VM size for your master depending on the number of nodes
in your cluster. On other providers, you will need to configure it manually. For reference, the sizes we use on GCE are:

* 1-5 nodes: n1-standard-1
* 6-10 nodes: n1-standard-2
* 11-100 nodes: n1-standard-4
* 101-250 nodes: n1-standard-8
* 251-500 nodes: n1-standard-16
* more than 500 nodes: n1-standard-32

And the sizes we use on AWS are:

* 1-5 nodes: m3.medium
* 6-10 nodes: m3.large
* 11-100 nodes: m3.xlarge
* 101-250 nodes: m3.2xlarge
* 251-500 nodes: c4.4xlarge
* more than 500 nodes: c4.8xlarge

Note that these master node sizes are currently only set at cluster startup time, and are not adjusted if you later scale your cluster up or down (e.g. manually removing or adding nodes, or using a cluster autoscaler).
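
If your provider's `kube-up` scripts don't size the master for you, or you want to override the default, the GCE and AWS `config-default.sh` scripts read a `MASTER_SIZE` environment variable. Treat the variable name and the values below as an assumption to verify against your provider's scripts, not a guarantee.

```shell
# Illustrative override: pick a larger master for a ~300 node cluster.
export NUM_NODES=300
export MASTER_SIZE=n1-standard-8     # e.g. m3.2xlarge on AWS
cluster/kube-up.sh
```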
### Addon Resources

To prevent memory leaks or other resource issues in [cluster addons](https://releases.k8s.io/{{page.githubbranch}}/cluster/addons) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](http://pr.k8s.io/10653/files) and [#10778](http://pr.k8s.io/10778/files)).

For [example](https://github.com/kubernetes/kubernetes/tree/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml):

```yaml
containers:
  - name: fluentd-cloud-logging
    image: gcr.io/google_containers/fluentd-gcp:1.16
    resources:
      limits:
        cpu: 100m
        memory: 200Mi
```
Except for Heapster, these limits are static and are based on data we collected from addons running on 4-node clusters (see [#10335](http://issue.k8s.io/10335#issuecomment-117861225)). The addons consume a lot more resources when running on large deployment clusters (see [#5880](http://issue.k8s.io/5880#issuecomment-113984085)). So, if a large cluster is deployed without adjusting these values, the addons may continuously get killed because they keep hitting the limits.

To avoid running into cluster addon resource issues, when creating a cluster with many nodes, consider the following:

* Scale memory and CPU limits for each of the following addons, if used, as you scale up the size of cluster (there is one replica of each handling the entire cluster, so memory and CPU usage tends to grow proportionally with the size of and load on the cluster):
  * [InfluxDB and Grafana](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/cluster-monitoring/influxdb/influxdb-grafana-controller.yaml)
  * [skydns, kube2sky, and dns etcd](http://releases.k8s.io/{{page.githubbranch}}/cluster/addons/dns/skydns-rc.yaml.in)
  * [FluentD with ElasticSearch Plugin](http://releases.k8s.io/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-es/fluentd-es.yaml)
  * [FluentD with GCP Plugin](http://releases.k8s.io/{{page.githubbranch}}/cluster/saltbase/salt/fluentd-gcp/fluentd-gcp.yaml)
Heapster's resource limits are set dynamically based on the initial size of your cluster (see [#16185](http://issue.k8s.io/16185) and [#21258](http://issue.k8s.io/21258)). If you find that Heapster is running out of resources, you should adjust the formulas that compute the Heapster memory request (see those PRs for details).

For directions on how to detect if addon containers are hitting resource limits, see the [Troubleshooting section of Compute Resources](/docs/user-guide/compute-resources/#troubleshooting).
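
As a quick first check, you can also look at the addon pods in the `kube-system` namespace directly; an addon that keeps hitting its memory limit will show repeated restarts and a last state of `OOMKilled` (the pod name below is a placeholder):

```shell
# List addon pods and look for ones with a high RESTARTS count.
kubectl get pods --namespace=kube-system

# Inspect a suspect pod; check the container's "Last State" for OOMKilled.
kubectl describe pod <addon-pod-name> --namespace=kube-system
```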
In the [future](http://issue.k8s.io/13048), we anticipate setting all cluster addon resource limits based on cluster size, and dynamically adjusting them if you grow or shrink your cluster.
We welcome PRs that implement those features.

### Allowing minor node failure at startup

For various reasons (see [#18969](https://github.com/kubernetes/kubernetes/issues/18969) for more details), running
`kube-up.sh` with a very large `NUM_NODES` may fail due to a very small number of nodes not coming up properly.
Currently you have two choices: restart the cluster (`kube-down.sh` and then `kube-up.sh` again), or, before
running `kube-up.sh`, set the environment variable `ALLOWED_NOTREADY_NODES` to whatever value you feel comfortable
with. This will allow `kube-up.sh` to succeed with fewer than `NUM_NODES` nodes coming up. Depending on the
reason for the failure, those additional nodes may join later or the cluster may remain at a size of
`NUM_NODES - ALLOWED_NOTREADY_NODES`.
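
For example, to tolerate a handful of stragglers when bringing up a large cluster (the numbers are illustrative):

```shell
# Let kube-up.sh succeed even if up to 3 of the 500 requested nodes
# never become ready during startup.
export NUM_NODES=500
export ALLOWED_NOTREADY_NODES=3
cluster/kube-up.sh
```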