Proposal: Ingestion Spec controller #313
Comments
What do you think are the advantages of having this in the operator/CRD, instead of having it in Druid? If I understand this correctly, instead of directly submitting a job to Druid, users have to deploy a CRD. Correct me if I am wrong here. My concern is that deploying a CRD every time may not be feasible for everyone. Not sure what everyone else thinks.
Do you see kubernetes as an orchestration platform for running druid, or do you see kubernetes as a control plane for running druid? If you consider the latter, you can leverage CRDs for handling supervisor specs etc. Just like the kafka operator, where you can deploy kafka and manage topics and acls via CRDs. Of course, you can create kafka topics using cli clients, the same way we can create druid supervisors from the console, but if you want full control from k8s, want to enhance gitops, and want to build the operator as a control plane, this can be a way.
Current State of Druid Operator
Druid operator supports installing, upgrading and maintaining a druid cluster. Internally, druid operator has a druid controller which talks to the k8s api for operations. Most of the intelligence built in is from a k8s installation perspective, and the CRD spec is very flexible. The current reconcile loop is stable and battle tested.
The current CRD belongs to the group druid.apache.org and is at version v1alpha1, with Druid as the only kind supported. The manager hooks in a single controller, i.e. druid_controller.
Goals
IMHO druid operator (the operator framework) is powerful enough to leverage kubernetes as a control plane for running druid. All operations and specs can be handled as CRD definitions. The goal is to automate the handling of supervisor configs for ingestion on a druid cluster by adding a new CRD to the group druid.apache.org.
Design
Separation of concerns: a new CRD + controller
The relation between a druidingestion CR and a druid CR is one to one. Having one to many will add complexity and confusion.
This takes motivation from other operators such as the strimzi kafka operator, which has separate CRDs for kafka, kafka topics and kafka acls.
Authentication with Druid API
Design of the CRD, CR Spec and Reconciliation
The CRD is namespace-scoped for this kind. Validation uses the OpenAPI v3 schema; for more complex validation of the JSON, a validation webhook can be added.
A sample CR Spec
The controller shall reconcile this spec and send POST requests to the overlord API, i.e. http://localhost:8090/druid/indexer/v1/supervisor.
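As a rough illustration of the POST described above, here is a minimal Python sketch that only builds the request (the operator itself is not written in Python). The function name build_supervisor_request and the default base URL are assumptions for illustration; a real controller would resolve the overlord address from the druid CR rather than use localhost.

```python
import json
import urllib.request

# Assumption: the overlord address from the example in this proposal.
OVERLORD_URL = "http://localhost:8090"


def build_supervisor_request(supervisor_spec: dict,
                             base_url: str = OVERLORD_URL) -> urllib.request.Request:
    """Build (but do not send) the POST that submits a supervisor spec."""
    body = json.dumps(supervisor_spec).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/druid/indexer/v1/supervisor",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a call to urllib.request.urlopen(request); keeping construction separate from sending makes the reconcile step easy to test without a running overlord.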
Reconciliation and State Changes
Controllers are a combination of level driven and event driven. An update to the druidingestion CR can be reconciled as an update event. Still, there is a possibility that an event is missed, for example during an outage. The reconcile loop triggers every N seconds to guard against that.
In the current controller, all configs in the CR are converged to first class kubernetes objects. The supervisor spec can be created as a configmap. This configmap helps in case an event is missed: we can trigger an update if the current state is not the same as the desired state. Druid operator adds an objectHash to the cm (same flow as the current controller).
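The hash comparison described above could look roughly like the following Python sketch. The function names and the use of SHA-256 over canonical JSON are assumptions for illustration; the operator's actual objectHash algorithm is not specified in this proposal.

```python
import hashlib
import json
from typing import Optional


def object_hash(spec: dict) -> str:
    """Hash a spec in a key-order-independent way (SHA-256 is an assumption)."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_update(desired_spec: dict, stored_hash: Optional[str]) -> bool:
    """True when the hash stored on the configmap differs from the desired spec."""
    return stored_hash != object_hash(desired_spec)
```

On each periodic reconcile the controller would compare the stored hash with the hash of the desired spec, so a missed update event still results in convergence on the next pass.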
A CR shall have a status. The status shall be patched with the fields from the http response from the druid API, and shall hold the supervisor id. To suspend the supervisor spec, the controller shall get the id from the status and send a POST request to the overlord API to suspend the supervisor: /druid/indexer/v1/supervisor/{supervisorId}/suspend
Updating suspend to false in the ingestion CR shall cause a reset of the supervisor spec. The operator shall emit events using the events API for each operation handled, and update the status of the CR.
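To make the suspend flow concrete, here is a small Python sketch of deriving the suspend endpoint from the CR status. The status field name supervisorId and the function name suspend_url are hypothetical; the path assumes the supervisor id sits between supervisor/ and /suspend, as in the overlord API.

```python
from urllib.parse import quote


def suspend_url(status: dict, base_url: str = "http://localhost:8090") -> str:
    """Build the overlord suspend endpoint from the id recorded in the CR status."""
    supervisor_id = status.get("supervisorId")  # hypothetical status field name
    if not supervisor_id:
        # Without an id the spec was never accepted by druid, so nothing to suspend.
        raise ValueError("status has no supervisor id")
    # Percent-encode the id so it is safe as a single path segment.
    return f"{base_url}/druid/indexer/v1/supervisor/{quote(supervisor_id, safe='')}/suspend"
```

Reading the id from status rather than from the spec means the controller only acts on supervisors that druid has actually acknowledged.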
Deletion of the druid ingestion config shall be controlled by finalizers. Before the deletion, the controller makes an HTTP call to delete the supervisor spec; at this point the CR will be marked as terminating. Once the requests are completed, the CR will be removed.
This proposal might have missed some druid specific details of the API. The original issue: #251
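The finalizer flow can be sketched as a small simulation over a CR represented as a plain dict. This is not client code for the kubernetes API; the finalizer name and the delete_supervisor callback are hypothetical stand-ins for the HTTP call to druid.

```python
# Hypothetical finalizer name, for illustration only.
FINALIZER = "druidingestion.druid.apache.org/finalizer"


def reconcile_deletion(cr: dict, delete_supervisor) -> str:
    """Return the CR's state after one reconcile pass over its metadata."""
    meta = cr["metadata"]
    if meta.get("deletionTimestamp") is None:
        # Not being deleted: make sure our finalizer is registered.
        finalizers = meta.setdefault("finalizers", [])
        if FINALIZER not in finalizers:
            finalizers.append(FINALIZER)
        return "active"
    if FINALIZER in meta.get("finalizers", []):
        # Being deleted: clean up in druid before releasing the CR.
        if not delete_supervisor(cr):
            return "terminating"  # keep the finalizer and retry on the next pass
        meta["finalizers"].remove(FINALIZER)
    return "released"
```

While the finalizer is present, kubernetes keeps the CR in the terminating state; once the list is empty the API server removes the object.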