Scheduler extension
Ravi Gadde committed Nov 25, 2015
1 parent ef84c57 commit cadc24e
Showing 20 changed files with 1,278 additions and 138 deletions.
117 changes: 117 additions & 0 deletions docs/design/scheduler_extender.md
@@ -0,0 +1,117 @@
<!-- BEGIN MUNGE: UNVERSIONED_WARNING -->

<!-- BEGIN STRIP_FOR_RELEASE -->

<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">
<img src="http://kubernetes.io/img/warning.png" alt="WARNING"
width="25" height="25">

<h2>PLEASE NOTE: This document applies to the HEAD of the source tree</h2>

If you are using a released version of Kubernetes, you should
refer to the docs that go with that version.

<strong>
The latest release of this document can be found
[here](http://releases.k8s.io/release-1.1/docs/design/scheduler_extender.md).

Documentation for other releases can be found at
[releases.k8s.io](http://releases.k8s.io).
</strong>
--

<!-- END STRIP_FOR_RELEASE -->

<!-- END MUNGE: UNVERSIONED_WARNING -->

# Scheduler extender

There are three ways to add new scheduling rules (predicates and priority functions) to Kubernetes:

1. adding the new rules to the scheduler and recompiling, as described in https://github.com/kubernetes/kubernetes/blob/master/docs/devel/scheduler.md;
2. implementing your own scheduler process that runs instead of, or alongside, the standard Kubernetes scheduler;
3. implementing a "scheduler extender" process that the standard Kubernetes scheduler calls out to as a final pass when making scheduling decisions.

This document describes the third approach. It is needed for use cases where scheduling decisions have to take into account resources that are not directly managed by the standard Kubernetes scheduler; the extender helps make scheduling decisions based on such resources. (Note that the three approaches are not mutually exclusive.)

When scheduling a pod, the extender allows an external process to filter and prioritize nodes. Two separate HTTP/HTTPS calls are issued to the extender, one for the "filter" action and one for the "prioritize" action. To use the extender, you must create a scheduler policy configuration file. The configuration specifies how to reach the extender, whether to use HTTP or HTTPS, and the timeout for each call.

```go
// Holds the parameters used to communicate with the extender. If a verb is unspecified/empty,
// it is assumed that the extender chose not to provide that extension.
type ExtenderConfig struct {
	// URLPrefix at which the extender is available
	URLPrefix string `json:"urlPrefix"`
	// Verb for the filter call, empty if not supported. This verb is appended to the URLPrefix
	// when issuing the filter call to the extender.
	FilterVerb string `json:"filterVerb,omitempty"`
	// Verb for the prioritize call, empty if not supported. This verb is appended to the URLPrefix
	// when issuing the prioritize call to the extender.
	PrioritizeVerb string `json:"prioritizeVerb,omitempty"`
	// The numeric multiplier for the node scores that the prioritize call generates.
	// The weight should be a positive integer.
	Weight int `json:"weight,omitempty"`
	// EnableHttps specifies whether https should be used to communicate with the extender
	EnableHttps bool `json:"enableHttps,omitempty"`
	// TLSConfig specifies the transport layer security config
	TLSConfig *client.TLSClientConfig `json:"tlsConfig,omitempty"`
	// HTTPTimeout specifies the timeout duration for a call to the extender. A filter call timeout
	// fails the scheduling of the pod; a prioritize call timeout is ignored, and the k8s/other
	// extenders' priorities are used to select the node.
	HTTPTimeout time.Duration `json:"httpTimeout,omitempty"`
}
```

A sample scheduler policy file with extender configuration:

```json
{
  "predicates": [
    {
      "name": "HostName"
    },
    {
      "name": "MatchNodeSelector"
    },
    {
      "name": "PodFitsResources"
    }
  ],
  "priorities": [
    {
      "name": "LeastRequestedPriority",
      "weight": 1
    }
  ],
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:12345/api/scheduler",
      "filterVerb": "filter",
      "enableHttps": false
    }
  ]
}
```

The arguments passed to the extender's FilterVerb endpoint are the pod being scheduled and the list of nodes that passed the scheduler's built-in (k8s) predicates. The arguments passed to the PrioritizeVerb endpoint are the pod and the list of nodes that passed both the built-in predicates and the extender's filter.

```go
// ExtenderArgs represents the arguments needed by the extender to filter/prioritize
// nodes for a pod.
type ExtenderArgs struct {
	// Pod being scheduled
	Pod api.Pod `json:"pod"`
	// List of candidate nodes where the pod can be scheduled
	Nodes api.NodeList `json:"nodes"`
}
```

The "filter" call returns a list of nodes (api.NodeList). The "prioritize" call returns priorities for each node (schedulerapi.HostPriorityList).

The "filter" call may prune the set of nodes based on its predicates. Scores returned by the "prioritize" call are added to the k8s scores (computed through its priority functions) and used for final host selection.

Multiple extenders can be configured in the scheduler policy.
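
For instance, a policy that combines a filter-only extender with a prioritize-only extender might look like the following sketch (illustrative; the addresses, verbs and weights are placeholders, not part of this change):

```json
{
  "predicates": [
    {"name": "PodFitsResources"}
  ],
  "priorities": [
    {"name": "LeastRequestedPriority", "weight": 1}
  ],
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:12345/api/scheduler",
      "filterVerb": "filter",
      "enableHttps": false
    },
    {
      "urlPrefix": "http://127.0.0.1:12346/scheduler",
      "prioritizeVerb": "prioritize",
      "weight": 2,
      "enableHttps": false
    }
  ]
}
```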

<!-- BEGIN MUNGE: GENERATED_ANALYTICS -->
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/docs/design/scheduler_extender.md?pixel)]()
<!-- END MUNGE: GENERATED_ANALYTICS -->
5 changes: 3 additions & 2 deletions examples/examples_test.go
@@ -235,7 +235,8 @@ func TestExampleObjectSchemas(t *testing.T) {
 			"daemon": &extensions.DaemonSet{},
 		},
 		"../examples": {
-			"scheduler-policy-config": &schedulerapi.Policy{},
+			"scheduler-policy-config":               &schedulerapi.Policy{},
+			"scheduler-policy-config-with-extender": &schedulerapi.Policy{},
 		},
 		"../examples/rbd/secret": {
 			"ceph-secret": &api.Secret{},
@@ -409,7 +410,7 @@ func TestExampleObjectSchemas(t *testing.T) {
 			t.Logf("skipping : %s/%s\n", path, name)
 			return
 		}
-		if name == "scheduler-policy-config" {
+		if strings.Contains(name, "scheduler-policy-config") {
 			if err := schedulerapilatest.Codec.DecodeInto(data, expectedType); err != nil {
 				t.Errorf("%s did not decode correctly: %v\n%s", path, err, string(data))
 				return
25 changes: 25 additions & 0 deletions examples/scheduler-policy-config-with-extender.json
@@ -0,0 +1,25 @@
{
  "kind" : "Policy",
  "apiVersion" : "v1",
  "predicates" : [
    {"name" : "PodFitsPorts"},
    {"name" : "PodFitsResources"},
    {"name" : "NoDiskConflict"},
    {"name" : "MatchNodeSelector"},
    {"name" : "HostName"}
  ],
  "priorities" : [
    {"name" : "LeastRequestedPriority", "weight" : 1},
    {"name" : "BalancedResourceAllocation", "weight" : 1},
    {"name" : "ServiceSpreadingPriority", "weight" : 1},
    {"name" : "EqualPriority", "weight" : 1}
  ],
  "extenders" : [
    {
      "urlPrefix": "http://127.0.0.1:12346/scheduler",
      "filterVerb": "filter",
      "prioritizeVerb": "prioritize",
      "weight": 5,
      "enableHttps": false
    }
  ]
}
27 changes: 14 additions & 13 deletions plugin/pkg/scheduler/algorithm/priorities/priorities.go
@@ -25,6 +25,7 @@ import (
 	"k8s.io/kubernetes/pkg/labels"
 	"k8s.io/kubernetes/plugin/pkg/scheduler/algorithm"
 	"k8s.io/kubernetes/plugin/pkg/scheduler/algorithm/predicates"
+	schedulerapi "k8s.io/kubernetes/plugin/pkg/scheduler/api"
 )
 
 // the unused capacity is calculated on a scale of 0-10
@@ -73,7 +74,7 @@ func getNonzeroRequests(requests *api.ResourceList) (int64, int64) {
 
 // Calculate the resource occupancy on a node. 'node' has information about the resources on the node.
 // 'pods' is a list of pods currently scheduled on the node.
-func calculateResourceOccupancy(pod *api.Pod, node api.Node, pods []*api.Pod) algorithm.HostPriority {
+func calculateResourceOccupancy(pod *api.Pod, node api.Node, pods []*api.Pod) schedulerapi.HostPriority {
 	totalMilliCPU := int64(0)
 	totalMemory := int64(0)
 	capacityMilliCPU := node.Status.Capacity.Cpu().MilliValue()
@@ -104,7 +105,7 @@ func calculateResourceOccupancy(pod *api.Pod, node api.Node, pods []*api.Pod) al
 		cpuScore, memoryScore,
 	)
 
-	return algorithm.HostPriority{
+	return schedulerapi.HostPriority{
 		Host: node.Name,
 		Score: int((cpuScore + memoryScore) / 2),
 	}
@@ -114,14 +115,14 @@ func calculateResourceOccupancy(pod *api.Pod, node api.Node, pods []*api.Pod) al
 // It calculates the percentage of memory and CPU requested by pods scheduled on the node, and prioritizes
 // based on the minimum of the average of the fraction of requested to capacity.
 // Details: cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity) / 2
-func LeastRequestedPriority(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (algorithm.HostPriorityList, error) {
+func LeastRequestedPriority(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (schedulerapi.HostPriorityList, error) {
 	nodes, err := nodeLister.List()
 	if err != nil {
-		return algorithm.HostPriorityList{}, err
+		return schedulerapi.HostPriorityList{}, err
 	}
 	podsToMachines, err := predicates.MapPodsToMachines(podLister)
 
-	list := algorithm.HostPriorityList{}
+	list := schedulerapi.HostPriorityList{}
 	for _, node := range nodes.Items {
 		list = append(list, calculateResourceOccupancy(pod, node, podsToMachines[node.Name]))
 	}
@@ -144,7 +145,7 @@ func NewNodeLabelPriority(label string, presence bool) algorithm.PriorityFuncti
 // CalculateNodeLabelPriority checks whether a particular label exists on a node or not, regardless of its value.
 // If presence is true, prioritizes nodes that have the specified label, regardless of value.
 // If presence is false, prioritizes nodes that do not have the specified label.
-func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (algorithm.HostPriorityList, error) {
+func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (schedulerapi.HostPriorityList, error) {
 	var score int
 	nodes, err := nodeLister.List()
 	if err != nil {
@@ -157,7 +158,7 @@ func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podListe
 		labeledNodes[node.Name] = (exists && n.presence) || (!exists && !n.presence)
 	}
 
-	result := []algorithm.HostPriority{}
+	result := []schedulerapi.HostPriority{}
 	//score int - scale of 0-10
 	// 0 being the lowest priority and 10 being the highest
 	for nodeName, success := range labeledNodes {
@@ -166,7 +167,7 @@ func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podListe
 		} else {
 			score = 0
 		}
-		result = append(result, algorithm.HostPriority{Host: nodeName, Score: score})
+		result = append(result, schedulerapi.HostPriority{Host: nodeName, Score: score})
 	}
 	return result, nil
 }
@@ -177,21 +178,21 @@ func (n *NodeLabelPrioritizer) CalculateNodeLabelPriority(pod *api.Pod, podListe
 // close the two metrics are to each other.
 // Detail: score = 10 - abs(cpuFraction-memoryFraction)*10. The algorithm is partly inspired by:
 // "Wei Huang et al. An Energy Efficient Virtual Machine Placement Algorithm with Balanced Resource Utilization"
-func BalancedResourceAllocation(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (algorithm.HostPriorityList, error) {
+func BalancedResourceAllocation(pod *api.Pod, podLister algorithm.PodLister, nodeLister algorithm.NodeLister) (schedulerapi.HostPriorityList, error) {
 	nodes, err := nodeLister.List()
 	if err != nil {
-		return algorithm.HostPriorityList{}, err
+		return schedulerapi.HostPriorityList{}, err
 	}
 	podsToMachines, err := predicates.MapPodsToMachines(podLister)
 
-	list := algorithm.HostPriorityList{}
+	list := schedulerapi.HostPriorityList{}
 	for _, node := range nodes.Items {
 		list = append(list, calculateBalancedResourceAllocation(pod, node, podsToMachines[node.Name]))
 	}
 	return list, nil
 }
 
-func calculateBalancedResourceAllocation(pod *api.Pod, node api.Node, pods []*api.Pod) algorithm.HostPriority {
+func calculateBalancedResourceAllocation(pod *api.Pod, node api.Node, pods []*api.Pod) schedulerapi.HostPriority {
 	totalMilliCPU := int64(0)
 	totalMemory := int64(0)
 	score := int(0)
@@ -234,7 +235,7 @@ func calculateBalancedResourceAllocation(pod *api.Pod, node api.Node, pods []*ap
 		score,
 	)
 
-	return algorithm.HostPriority{
+	return schedulerapi.HostPriority{
 		Host: node.Name,
 		Score: score,
 	}