This document discusses the motivation and potential implementation for supporting "Configuration-level resources" in Akri. Currently, Akri only supports creating a Kubernetes resource (i.e. device plugin) for each individual device. Since each device in Akri is represented as an Instance custom resource, these are called Instance-level resources. A Configuration-level resource is a resource that represents all of the devices discovered via a Configuration, i.e. all IP cameras.
Currently, with only Instance-level resources, specific devices must be requested as resources. For example, a Pod could be deployed that uses both ip-camera-UNIQUE_ID1
and ip-camera-UNIQUE_ID2
by adding the following resource request to the PodSpec
:
resources:
requests:
akri.sh/onvif-camera-UNIQUE_ID1: "1"
akri.sh/onvif-camera-UNIQUE_ID2: "1"
With Configuration-level resources, instead of needing to know the specific Instances to request, two generic cameras could be requested and the Agent will behind the scenes do the work of selecting which Instances to reserve:
resources:
requests:
akri.sh/onvif-camera: "2"
Furthermore, with Configuration-level resources, users could develop their own deployment strategies rather than relying on the Akri Controller to deploy Pods to discovered devices. Higher level Kubernetes objects (Deployments, ReplicaSets, DaemonSets, etc.) can be deployed requesting resources, and Pods will only be run on Nodes that have those resources available.
Term | Definition |
---|---|
capacity |
Maximum amount of containers that can use a device. Defines the number of usage slots a device's device plugin advertises to the kubelet |
Configuration-level (CL) resource | An extended resource that is created by the Agent for each Configuration. This equates to the creation of a Kubernetes device plugin for each Configuration. It enables requesting any of a set of (Configuration) resources, such as any IP camera, thermometer, etc. Resources are given the Configuration name. A request to use a CL resource is mapped to an IL resource by the Agent. |
Instance-level (IL) resource | An extended resource that is created by the Agent for each discovered device (i.e. Instance), such as a specific IP camera, thermometer, etc. This equates to the creation of a Kubernetes device plugin for each Instance. Resources are named in the format -, where the input to the one-way hash is a descriptor of the Instance (and node name for local non-shared devices). |
To support Configuration-level resources, the Agent will create an additional device plugin for each Configuration (that initiates discovery). No changes are needed to the Controller or Discovery Handlers.
The device plugin created for each Configuration will contain capacity
* number of instances
slots. Each slot will map to a "real" slot of an Instance device plugin. For example,
after deploying a onvif
Configuration with a capacity
of 2
, the NodeSpec
of a node that
could see both cameras would be:
Capacity:
akri.sh/akri-onvif: 4
akri.sh/akri-onvif-8120fe: 2
akri.sh/akri-onvif-a19705: 2
Now, a Pod could be deployed that requests 2 cameras by requesting 2 akri.sh/akri-onvif
resources.
apiVersion: "apps/v1"
kind: Pod
metadata:
name: onvif-broker
spec:
containers:
- name: nginx
image: "nginx:latest"
resources:
requests:
"akri.sh/akri-onvif": "2"
limits:
"akri.sh/akri-onvif": "2"
The Kubernetes scheduler will schedule the Pod to the Node since the resources is available.
The kubelet on that Node will make an allocate
call to the akri.sh/akri-onvif
device plugin in the Akri Agent.
The Agent will then reserve slots in the associated Instance device plugins (akri.sh/akri-onvif-8120fe
and akri.sh/akri-onvif-a19705
).
When selecting the slots from Instances, the Agent will preference unique Instances; for example, in the above Pod, the Agent will make sure 1 slot of each akri-onvif-8120fe
and akri-onvif-a19705
is reserved for each Pod.
There are two implementation options in the case where there are not enough unique Instances to meet the requested number of Configuration-level resources. One scenario where this could happen is if a Pod requests 3 cameras when only 2 exist with a capacity of 2 each. In this case, the Configuration-level camera resource shows as having a quantity of 4, despite there being two cameras. In this case, the kubelet will try to schedule the Pod, thinking there are enough resources. The Agent could either allocate 2 spots on one camera and one on the other or deny the allocation request. The latter is the preferred approach as it is more consistent and ensures the workload is getting the number of unique devices it expects. After failing an allocate
request from the kubelet, the Pod will be in an UnexpectedAdmissionError
state until another camera comes online and it can be successfully scheduled.
The Agent implements the latter approach that denies allocate
requests that request more than available cameras.
It manages the number of available slots reported to kubelet to prevent kubelet from scheduling Pod that requests invalid number of resources.
The Agent exposes Configuration-level and Instance-level resources to kubelt as device plugins. The CL and IL device plugins have
different behaviors but use almost the same set of information. Originally, the IL device plugin is implemented in
the DevicePluginService
class. To support Configuration-level resources, the behavior part of device plugin is extracted out
from DevicePluginService
and introduce the ConfiguratinDevicePlugin
and InstanceDevicePlugin
classes to represent the behavior of
CL and IL device plugin. The DevicePluginService
holds the common data for device plugins to operate and
a device_plugin_behavior
field for the desired behavior (CL or IL).
CL/IL specific data and behavior is defined in ConfiguratinDevicePlugin
and InstanceDevicePlugin
class.
The Rust code in Agent for DevicePluginService
, ConfiguratinDevicePlugin
and InstanceDevicePlugin
look like below
pub enum DevicePluginBehavior {
Configuration(ConfigurationDevicePlugin),
Instance(InstanceDevicePlugin),
}
pub struct DevicePluginService {
/// Instance CRD name
pub instance_name: String,
/// Instance's Configuration
pub config: ConfigurationSpec,
/// ... data used by both CL and IL device plugin
/// Enum object that defines the behavior of the device plugin
pub device_plugin_behavior: DevicePluginBehavior,
}
The ConfiguratinDevicePlugin
defines the behavior of the configuration-level resources, combined with DevicePluginService
, it forms a device plugin that advertises the configuration-level resources to the kubelet.
Similar to the Agent creates a DevicePluginService
with InstanceDevicePlugin
behavior for each discovered Instance. The Agent creates a
DevicePluginService
with ConfigurationDevicePlugin
behavior for each Configuration when the Configuration is applied to the cluster. All
DevicePluginService
share the same structure that has list_and_watch
and allocate
for kubelet to call. The actual behavior of
list_and_watch
and allocate
is defined in ConfigurationDevicePlugin
and InstanceDevicePlugin
for Configuration and Instance respectively.
The CL and IL device plugin need to coordinate to sync up available resources so when a resource is claimed by CL (or IL) device plugin, the
other device plugin is notified and re-calculate the available resources for itself. The list_and_watch_message_sender
in
DevicePluginService
is used for notifying available resource change within a device plugin. For notifying resource change across device
plugins, a copy of list_and_watch_message_sender
for each DevicePluginService
are saved in InstanceConfig
, where
usage_update_message_sender
holds the message sender of CL device plugin and IL device plugin's list_and_watch_message_sender
is saved in
each instance's InstanceInfo
.
For IL device plugin, the list_and_watch
is notified when available IL resource changed, which include
- the instance is deleted
- an IL allocate failed
- a CL allocate succeeded.
For CL device plugin, the list_and_watch
is notified when
- the configuration is deleted
- an instance slot is added or deleted
- a CL allocate failed
- an IL allocate succeeded
The following diagram shows the notification flows for the device plugins.
The Instance device plugins report available resources using fixed virtual device id. Instance device plugin construct the virtual device id by
appending an index to the Instance name. For example, for an Instance device plugin, if the instance name is akri.sh/akri-onvif-8120fe
and
the capacity
is 2, Instance device plugin reports 2 virtual devices akri-onvif-8120fe-0
and akri-onvif-8120fe-1
. This works fine for
Instance device plugin but is not flexible for Configuration device plugin as we want to minimize the possibility that kubelet issues allocate
requests that request more than available cameras. Instead of using fixed virtual device id, Configuration device plugins expose available
resources using dynamic virtual device ids to provide maximum device usage flexibility.
Using the same example that two cameras are discovered for a Configuration (akri.sh/akri-onvif-8120fe
and akri.sh/akri-onvif-a19705
) and the
capacity
is 2. The Configuration device plugin calculates available resources and, instead of reporting fixed id like akri-onvif-8120fe-0
,
the Configuration device plugin reports virtual device id "0", "1", ... as 'place holder' in list_and_watch
. The actual device slot to be used
is determined when allocate
is called.
To avoid kubelet issues allocate
requests that requests 3 cameras when only 2 cameras exist with a capacity of 2 each. The Configuration
device plugin only report resources already being claimed by the Configuration device plugin and +1 for each camera that still has at least one
free slot. For our example, if all slots from 2 cameras are free, the Configuration device plugin reports 2 virtual devices "0" and "1" even
there are actually 4 slots available to use. In a different case, if akri-onvif-8120fe-1
had been claimed by the Configuration device plugin
as virtual device id "4" and all other 3 slots are free, the Configuration device plugin reports virtual device id "0", "1", and "4" as
available virtual device ids. The virtual device id "4" maps to akri-onvif-8120fe-1
and id "0" and "1" can be mapped to
akri-onvif-8120fe-0
, akri-onvif-a19705-0
, or akri-onvif-a19705-1
later when allocate
is called. By managing the number of available
virtual devices, we can reduce the chances that kubelet issues allocate
requests more than available cameras exist.
Note that there is still chances that all slots have been claimed by Configuration device plugin. For example, "0": akri-onvif-8120fe-0
, "1":
akri-onvif-8120fe-1
, "2":akri-onvif-a19705-0
, "3":akri-onvif-a19705-1
. In this case, Configuration device plugin reports all 4 virtual
devices are available, and it's possible that kubelet requests "0" and "1" which map to the same camera. In that case, the Configuration device
plugin denies the allocation request and the Pod will be in an UnexpectedAdmissionError
state. The Configuration device plugin calculates
virtual device availability periodically and reduces the number of available virtual devices when the reconciler detects slots not in-use and
set the slot usage to free. In our example, assume "3":akri-onvif-a19705-1
is the only slot being used. The reconciler set slots
akri-onvif-8120fe-0
,
akri-onvif-8120fe-1
, and akri-onvif-a19705-0
to free. The Configuration device plugin then reduces the device availability to "0", "1" and "3".
If kubelet retries to claim "0" and "1", the Configuration device plugin will allow it by mapping "0" to akri-onvif-8120fe-0
or akri-onvif-8120fe-1
, "1" to akri-onvif-a19705-0
.
The Configuration device plugin reports "0", "1", ... as virtual device ids in list_and_watch
and determines the actual device slot to be used
when allocate
is called. The algorithm to map virtual device ids to actual device slot works on the allocation requests on a per-container
basis that:
- For a given container request, ensure allocated devices are unique instances
- If a virtual device id was claimed before, use the previous allocation information.
- If a virtual device id has being claimed, pick a device slot from the instance that has the most free slots.
For example, assume "3":akri-onvif-a19705-1
is the only slot being used and other slots are all free. Configuration device plugin
list_and_watch
reports virtual device id "0", "1" and "3" are available (1 free from akri-onvif-8120fe
, 1 free from akri-onvif-a19705
and
1 previously claimed from akri-onvif-a19705
).
- If kubelet requests "0" and "3" in a container request of an
allocate
call, the mapping is "0":akri-onvif-8120fe-0
orakri-onvif-8120fe-1
(sinceakri-onvif-8120fe
has the most free slots between instanceakri-onvif-8120fe
(2) andakri-onvif-a19705
(1)) and "3":akri-onvif-a19705-1
(previously claimed). - If kubelet requests "0", "1" and "3" in a container request of an
allocate
call, Configuration device plugin denies the request as there is only 2 cameras available to allocate. - If kubelet request "0" and "1" in a container request and "3" in a different container request of an
allocate
call, the Configuratin device plugin maps "0":akri-onvif-8120fe-0
orakri-onvif-8120fe-1
(the most free slots) and "1":akri-onvif-a19705-0
(unique instance) to container request 1 and "3":akri-onvif-a19705-1
(previously claimed) to container request 2.
The Instance.deviceUsage
in Akri Instances and Akri slot annotation are extended to support Configuration device plugin.
The Instance.deviceUsage
may look like this:
deviceUsage:
my-resource-00095f-0: ""
my-resource-00095f-1: ""
my-resource-00095f-2: ""
my-resource-00095f-3: "node-a"
my-resource-00095f-4: ""
where empty string means the slot is free and non-empty string indicates the slot is used (by the node). To support Configuration device plugin,
we need a way to distinguish if a slot is used by a Configuration device plugin or Instance device plugin. The Instance.deviceUsage
format is
extended hold the additional information, the deviceUsage can be a "<node_name>" (for Instance) or a "C:<virtual_device_id>:<node_name>" (for
Configuration). For example, the Instance.deviceUsage
shows the slot my-resource-00095f-2
is used by virtual device id "0" of the
Configuration device plugin on node-b
. The slot my-resource-00095f-3
is used by Instance device plugin on node-a
. The other 3 slots are
free.
deviceUsage:
my-resource-00095f-0: ""
my-resource-00095f-1: ""
my-resource-00095f-2: "C:0:node-b"
my-resource-00095f-3: "node-a"
my-resource-00095f-4: ""
The format of Akri slot annotation is extended in a similar manner, too. The Akri slot annotation looked like this:
"akri.agent.slot-my-resource-00095f-3": "my-resource-00095f-3"
To support Configuration device plugin, the Akri slot annotation can be a "<slot_name>" (for Instance) or a "C:<virtual_device_id>:<slot_name>" (for Configuration). The example below shows the pod uses the slot my-resource-00095f-2
claimed by virtual device id "0" of the Configuration device plugin. The pod also uses the slot my-resource-00095f-3
which is claimed by Instance device plugin.
"akri.agent.slot-my-resource-00095f-2": "C:0:my-resource-00095f-2"
"akri.agent.slot-my-resource-00095f-3": "my-resource-00095f-3"
The Akri Agent and Discovery Handlers enable device discovery and Kubernetes resource creation: they discover devices, create Kubernetes resources to represent the devices, and ensure only capacity
containers are using a device at once via the device plugin framework. The Akri Controller eases device use. If a broker is specified in a Configuration, the Controller will automatically deploy Kubernetes Pods or Jobs to discovered devices. Currently the Controller only supports two deployment strategies: either deploying a non-terminating Pod (that Akri calls a "broker") to each Node that can see a device or deploying a single Job to the cluster for each discovered device. There are plenty of scenarios that do not fit these two strategies such as a ReplicaSet like deployment of n number of Pods to the cluster. With Configuration-level resources, users could easily achieve their own scenarios without the Akri Controller, as selecting resources is more declarative. A user specifies in a resource request how many OPC UA servers are needed rather than needing to delineate the exact ones already discovered by Akri, as explained in Akri's current documentation on requesting Akri resources.
For example, with Configuration-level resources, the following Deployment could be applied to a cluster:
apiVersion: "apps/v1"
kind: Deployment
metadata:
name: onvif-broker-deployment
spec:
replicas: 2
selector:
matchLabels:
name: onvif-broker
template:
metadata:
labels:
name: onvif-broker
spec:
containers:
- name: nginx
image: "nginx:latest"
resources:
requests:
"akri.sh/akri-onvif": "2"
limits:
"akri.sh/akri-onvif": "2"
Pods will only be successfully scheduled to a Node and run if the resources exist and are available. In the case of the
above scenario, if there were two cameras on the network like in the implementation scenario, two
Pods would be deployed to the cluster. If there are not enough resources, say there is only one camera on the network,
the two Pods will be left in a Pending
state until another is discovered. This is the case with any deployment on
Kubernetes where there are not enough resources. However, Pending
Pods do not use up cluster resources.
Currently, the Akri Controller handles deployment of Pods to avoid Pods being scheduled to Nodes without resources. This is another advantage of using the Akri Controller rather than applying Deployments, DaemonSets, etc. In conjunction with this work on supporting Configuration level resources, the Akri's Controller could support a new deployment strategy for using these CL resources. The Controller also takes care of bringing down Pods when resources are no longer available, while Pods from manually created Deployments would continue to run even if the resources are no longer there.