feat: Add Grove and Kai scheduler as part of dynamo cloud helm chart #2755
Signed-off-by: Julien Mancuso <[email protected]>
Walkthrough
Adds Grove and Kai-scheduler as optional Helm chart dependencies, a conditional Helm template to render Queue CRs when scheduler is enabled, extensive new/expanded Helm values and README templates, CRD API doc generation targets, and multiple documentation updates across guides.
Sequence Diagram(s)
sequenceDiagram
actor User
participant Helm as Helm
participant Chart as Platform Chart
participant KS as kai-scheduler (subchart)
participant Grove as grove (subchart)
participant K8s as Kubernetes API
User->>Helm: helm install/upgrade platform
Helm->>Chart: load chart + values
Chart->>Helm: declare dependencies (KS, Grove) with conditions
alt kai-scheduler.enabled == true
Chart->>Chart: render grove.yaml -> Queue CRs
Chart->>K8s: apply Queue CRs (scheduling.run.ai/v2)
Helm->>KS: resolve & render kai-scheduler subchart
Helm->>Grove: resolve & render grove subchart
else
Note right of Chart: Queue template and subcharts not rendered
end
Helm->>K8s: apply all rendered manifests
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Warning
There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.2.2)
Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
Actionable comments posted: 2
🧹 Nitpick comments (3)
deploy/cloud/helm/platform/values.yaml (1)
58-63: Enable flags added correctly; consider schema/docs to prevent drift.
Looks good. Recommend adding these to values.schema.json (if present) and documenting them in the README to surface grove.enabled and kai-scheduler.enabled for users.
deploy/cloud/helm/platform/templates/grove.yaml (2)
20-21: Potential name collision for “default” Queue.
Hard-coding a cluster-global “default” Queue may clash with an existing one. Make names configurable (e.g., via values) or suffix with release name. Example:
-metadata:
-  name: default
+metadata:
+  name: {{ .Values.grove.queues.default.name | default "default" | quote }}
And expose grove.queues.default.name in values.yaml.
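A minimal sketch of the values.yaml side of this suggestion. The key grove.queues.default.name comes from the example above; the default for enabled and the rest of the layout are assumptions, not the chart's actual values file.

```yaml
# Hypothetical values.yaml fragment; key names follow the reviewer's example.
grove:
  # Assumed default: the optional dependency stays off unless requested.
  enabled: false
  queues:
    default:
      # Name of the cluster-global root Queue rendered by templates/grove.yaml.
      # Override it when a Queue named "default" already exists in the cluster.
      name: default
```

If grove is also the name or alias of the subchart, Helm forwards everything under this key to that subchart as well, so the extra keys should be checked against any values schema the subchart enforces.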
22-34: DRY the repeated resource blocks.
Two nearly identical Queue specs can be generated from a list to reduce duplication/errors.
-# (two explicit Queue manifests)
+{{- $queues := list (dict "name" "default") (dict "name" "dynamo" "parent" "default") -}}
+{{- range $q := $queues }}
+apiVersion: scheduling.run.ai/v2
+kind: Queue
+metadata:
+  name: {{ $q.name }}
+spec:
+  {{- if $q.parent }}
+  parentQueue: {{ $q.parent }}
+  {{- end }}
+  resources:
+    {{- range $r := list "cpu" "gpu" "memory" }}
+    {{ $r }}:
+      quota: -1
+      limit: -1
+      overQuotaWeight: 1
+    {{- end }}
+---
+{{- end }}
Also applies to: 43-54
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
deploy/cloud/helm/platform/Chart.yaml (1 hunks)
deploy/cloud/helm/platform/templates/grove.yaml (1 hunks)
deploy/cloud/helm/platform/values.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
🪛 YAMLlint (1.37.1)
deploy/cloud/helm/platform/templates/grove.yaml
[error] 16-16: syntax error: expected the node content, but found '-'
(syntax)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
deploy/cloud/helm/platform/Chart.yaml (1)
37-45: Verify OCI chart dependency versions (manual check required)
The version fields in deploy/cloud/helm/platform/Chart.yaml (lines 37–45) include leading “v” prefixes (v0.8.1) and a commit-based tag (v0.0.0-6e30275). Helm requires an exact match against upstream chart versions, and many OCI-hosted charts are published without a leading “v” in their semver.
I attempted to run helm show chart in this environment but the CLI was unavailable. Please manually verify that these versions align with what’s published in the NVIDIA OCI registry. For example:
• helm show chart oci://ghcr.io/nvidia/kai-scheduler --version v0.8.1
• helm show chart oci://ghcr.io/nvidia/kai-scheduler --version 0.8.1
• helm show chart oci://ghcr.io/nvidia/grove --version v0.0.0-6e30275
If the published versions omit the “v” (e.g. 0.8.1) or differ in any way, update the version: fields here to match exactly.
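For orientation, a dependency entry of the kind discussed here might look like the sketch below. The version and registry URL are the ones quoted in this comment; the chart name, the alias, and the condition key are assumptions inferred from the rest of this page and may not match the actual Chart.yaml.

```yaml
# Hypothetical excerpt of deploy/cloud/helm/platform/Chart.yaml (not the actual file).
dependencies:
  - name: grove-charts          # assumed chart name
    alias: grove                # assumed alias, so values live under "grove:"
    repository: oci://ghcr.io/nvidia/grove
    # Must equal the published chart version character for character,
    # including or excluding the leading "v".
    version: v0.0.0-6e30275
    # The subchart is rendered only when this values path is true.
    condition: grove.enabled
```

If the published tag turns out to differ, only the version line would need to change.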
Signed-off-by: Julien Mancuso <[email protected]>
@coderabbitai full review
✅ Actions performed
Full review triggered.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
deploy/cloud/helm/platform/values.yaml (1)
17-600: Strip trailing whitespace in deploy/cloud/helm/platform/values.yaml
Multiple lines (e.g. 23, 26, 29, 36, 41 … 589) contain trailing spaces; run your formatter or pre-commit (or sed -i 's/[ \t]*$//' deploy/cloud/helm/platform/values.yaml) to normalize.
deploy/cloud/operator/Makefile (1)
59-66: manifests target is doing too much and installs system-wide yq.
- Very long recipe (checkmake warning) and side-effects (writes docs, edits headers, installs yq to /usr/local/bin) make CI/dev flows brittle.
- Install yq into LOCALBIN instead of requiring root; split transforms into subtargets to keep manifests lean.
-.PHONY: ensure-yq
-ensure-yq:
-	@if ! command -v yq &> /dev/null; then \
-		echo "Installing yq..."; \
-		ARCH=$$(uname -m | sed 's/x86_64/amd64/' | sed 's/aarch64/arm64/'); \
-		OS=$$(uname -s | tr '[:upper:]' '[:lower:]'); \
-		wget https://github.com/mikefarah/yq/releases/latest/download/yq_$${OS}_$${ARCH} -O /usr/local/bin/yq && \
-		chmod +x /usr/local/bin/yq; \
-	else \
-		echo "yq is already installed: $$(yq --version)"; \
-	fi
+.PHONY: yq
+yq: $(LOCALBIN)/yq ## Download yq locally if necessary.
+$(LOCALBIN)/yq: $(LOCALBIN)
+	@echo "Installing yq..."
+	@ARCH=$$(uname -m | sed 's/x86_64/amd64/' | sed 's/aarch64/arm64/'); \
+	OS=$$(uname -s | tr '[:upper:]' '[:lower:]'); \
+	curl -sSL -o $(LOCALBIN)/yq https://github.com/mikefarah/yq/releases/latest/download/yq_$${OS}_$${ARCH}
+	@chmod +x $(LOCALBIN)/yq
+YQ ?= $(LOCALBIN)/yq
@@
-manifests: controller-gen ensure-yq generate-api-docs ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
+manifests: controller-gen yq crd-openapi crd-headers crd-keep-annotation crd-sync ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
 	# Use a large maxDescLen to ensure all field comments are included as OpenAPI descriptions
 	$(CONTROLLER_GEN) rbac:roleName=manager-role crd:maxDescLen=100000 webhook paths="./..." output:crd:artifacts:config=config/crd/bases
-	echo "Removing name from mainContainer required fields"
-	for file in config/crd/bases/*.yaml; do \
-		yq eval '(.. | select(has("mainContainer")) | .mainContainer.required) |= (. - ["name"])' -i --indent 2 $$file || exit 1; \
-	done
-	echo "Removing containers from extraPodSpec required fields"
-	for file in config/crd/bases/*.yaml; do \
-		yq eval '(.. | select(has("extraPodSpec")) | .extraPodSpec.required) |= (. - ["containers"])' -i --indent 2 $$file || exit 1; \
-	done
+	$(MAKE) crd-postprocess
+
+.PHONY: crd-postprocess crd-openapi crd-headers crd-keep-annotation crd-sync
+crd-postprocess: crd-openapi crd-headers crd-keep-annotation crd-sync
+crd-openapi:
+	@echo "Removing name from mainContainer required fields"
+	@for file in config/crd/bases/*.yaml; do \
+		$(YQ) eval '(.. | select(has("mainContainer")) | .mainContainer.required) |= (. - ["name"])' -i --indent 2 $$file || exit 1; \
+	done
+	@echo "Removing containers from extraPodSpec required fields"
+	@for file in config/crd/bases/*.yaml; do \
+		$(YQ) eval '(.. | select(has("extraPodSpec")) | .extraPodSpec.required) |= (. - ["containers"])' -i --indent 2 $$file || exit 1; \
+	done
@@
-	if [ -d "../helm/crds/templates/" ]; then \
-		cp config/crd/bases/*.yaml ../helm/crds/templates/; \
-	fi
+	@if [ -d "../helm/crds/templates/" ]; then \
+		cp config/crd/bases/*.yaml ../helm/crds/templates/; \
+	fi
Also applies to: 68-76, 94-103
♻️ Duplicate comments (1)
deploy/cloud/helm/platform/templates/grove.yaml (1)
15-16: Fix yamllint error and gate rendering on both feature flags + CRD presence.
Move the YAML document start inside the conditional and require grove.enabled, kai-scheduler.enabled, and CRD availability to avoid CRD-not-found and lint failures.
----
-{{- if index .Values "kai-scheduler" "enabled" -}}
+{{- if and (index .Values "grove" "enabled") (index .Values "kai-scheduler" "enabled") (.Capabilities.APIVersions.Has "scheduling.run.ai/v2") -}}
+---
🧹 Nitpick comments (16)
deploy/cloud/helm/platform/templates/grove.yaml (1)
17-55: Make queues configurable and add chart ownership labels to reduce conflicts.
Hardcoding a cluster-wide “default” Queue may clash with existing clusters. Consider optional creation and templated names; also add labels for traceability.
 metadata:
-  name: default
+  name: {{ .Values.grove.queues.default.name | default "default" }}
+  labels:
+    app.kubernetes.io/managed-by: "helm"
+    app.kubernetes.io/part-of: "dynamo-platform"
@@
 metadata:
-  name: dynamo
+  name: {{ .Values.grove.queues.dynamo.name | default "dynamo" }}
@@
-  parentQueue: default
+  parentQueue: {{ .Values.grove.queues.dynamo.parent | default "default" }}
You could also add a boolean like .Values.grove.createDefaultQueue to allow disabling the first resource in clusters that already manage it.
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1)
102-104: Tighten wording and reference concrete Kubernetes types in the doc comment.
Minor grammar/style improvements; explicitly name corev1 types.
-// ExtraPodSpec allows to override the main pod spec configuration.
-// It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field
-// that allows overriding the main container configuration.
+// ExtraPodSpec allows overriding the main PodSpec configuration.
+// It is a standard Kubernetes corev1.PodSpec and also contains a MainContainer (corev1.Container)
+// field to override the main container configuration.
docs/guides/dynamo_deploy/api-reference.md (1)
9-9: Wrap bare license URL to satisfy markdownlint (MD034).
Angle-bracket the Apache URL to avoid “bare URL” warnings.
-http://www.apache.org/licenses/LICENSE-2.0
+<http://www.apache.org/licenses/LICENSE-2.0>
docs/guides/dynamo_deploy/dynamo_operator.md (2)
26-29: Add spacing for readability and linting.
Add a blank line before and after the link block to prevent header run-on and potential grammar/lint flags.
-For the complete technical API reference for Dynamo Custom Resource Definitions, see:
-
-**📖 [Dynamo CRD API Reference](../../../deploy/cloud/operator/docs/api-reference.md)**
+For the complete technical API reference for Dynamo Custom Resource Definitions, see:
+
+**📖 [Dynamo CRD API Reference](../../../deploy/cloud/operator/docs/api-reference.md)**
+
106-111: Use plural, lowercase resource in kubectl example.
Kubectl typically uses the plural, lowercase resource name. This also matches other docs in the repo.
-kubectl get dynamographdeployment llm-agg -n $NAMESPACE
+kubectl get dynamographdeployments llm-agg -n $NAMESPACE
docs/guides/dynamo_deploy/dynamo_cloud.md (3)
56-56: Align comment with version floor.
Comment says “0.3.2+” while examples use 0.5.0. Align for clarity.
-export RELEASE_VERSION=0.5.0 # any version of Dynamo 0.3.2+
+export RELEASE_VERSION=0.5.0 # any version of Dynamo 0.5.0+
68-75: Show a complete helm command for enabling Grove and Kai Scheduler.
Improve usability; keep quoted keys due to hyphenated value name.
-> By default, Grove and Kai Scheduler are NOT installed. You can enable them by setting the following values in the `dynamo-platform` Helm chart:
-
-```bash
---set "grove.enabled=true"
---set "kai-scheduler.enabled=true"
-```
+> By default, Grove and Kai Scheduler are NOT installed. Enable them during install:
+```bash
+helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
+  --namespace ${NAMESPACE} \
+  --set "grove.enabled=true" \
+  --set "kai-scheduler.enabled=true"
+```
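The same two toggles could also be kept in a values file rather than passed as --set flags, which avoids any quoting concerns around the hyphenated key. A minimal sketch, assuming a hypothetical file named values-override.yaml:

```yaml
# values-override.yaml (hypothetical file name)
# Enables both optional dependencies of the dynamo-platform chart.
grove:
  enabled: true
kai-scheduler:
  enabled: true
```

It would be supplied to the install command above with -f values-override.yaml in place of the two --set lines.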
140-143: Update expected pods to include NATS when enabled by default.
NATS is enabled by default per platform chart.
-# Expected: dynamo-operator-* and etcd-* pods Running
+# Expected: dynamo-operator-*, etcd-*, and nats-* pods Running
docs/guides/dynamo_deploy/README.md (2)
45-45: Fix kubectl resource casing/pluralization.
Use lowercase and plural for reliability.
-kubectl get dynamoGraphDeployment -n ${NAMESPACE}
+kubectl get dynamographdeployments -n ${NAMESPACE}
62-69: Avoid duplicate “API Reference & Documentation” sections.
You introduce this section here and later with patterns. Consider keeping one section here and renaming the later block to “Deployment Patterns” only.
deploy/cloud/helm/platform/README.md (3)
9-9: Wrap bare license URL (MD034).
-http://www.apache.org/licenses/LICENSE-2.0
+<http://www.apache.org/licenses/LICENSE-2.0>
85-86: Clarify cluster-scope/RBAC impact for optional operators.
Grove and Kai Scheduler deploy cluster-scoped controllers. Make this explicit in “Prerequisites.”
 ## 📋 Prerequisites
 - Kubernetes cluster (v1.20+)
 - Helm 3.8+
 - Sufficient cluster resources for your deployment scale
 - Container registry access (if using private images)
+ - Cluster-admin privileges required if enabling Grove or Kai Scheduler (cluster-scoped operators)
53-89: Provide an enablement example command.
Add a minimal helm install snippet showing both toggles together for quick copy/paste.
 ## Values
@@
 | nats.enabled | bool | `true` | Whether to enable NATS deployment, disable if you want to use an external NATS instance |
+### Quick enablement example
+```bash
+helm install dynamo-platform ./platform \
+  --set "grove.enabled=true" \
+  --set "kai-scheduler.enabled=true"
+```
deploy/cloud/helm/platform/README.md.gotmpl (1)
58-64: Add links for Grove and Kai scheduler docs for completeness.
Small UX win; keeps all optional deps discoverable from this README.
-## 📚 Additional Resources
+## 📚 Additional Resources
 ...
+- [Grove Documentation](https://github.com/NVIDIA/galaxy/tree/main/grove) <!-- adjust if you host elsewhere -->
+- [Kai Scheduler Documentation](https://github.com/volcano-sh/kai)
deploy/cloud/operator/docs/api-reference.md (1)
352-366: markdownlint MD034 false-positives stem from AsciiDoc link syntax.
Will disappear if you generate Markdown or rename to .adoc and exclude from markdownlint.
docs/Makefile (1)
65-71: Avoid realpath for portability; call the binary directly.
realpath isn’t guaranteed on macOS default; not needed here.
-	@cd ../deploy/cloud/helm/platform && $(realpath $(HELM_DOCS)) \
+	@cd ../deploy/cloud/helm/platform && $(HELM_DOCS) \
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (14)
deploy/cloud/helm/platform/Chart.yaml (1 hunks)
deploy/cloud/helm/platform/README.md (1 hunks)
deploy/cloud/helm/platform/README.md.gotmpl (1 hunks)
deploy/cloud/helm/platform/templates/grove.yaml (1 hunks)
deploy/cloud/helm/platform/values.yaml (3 hunks)
deploy/cloud/operator/Makefile (2 hunks)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1 hunks)
deploy/cloud/operator/docs/api-reference.md (1 hunks)
deploy/cloud/operator/docs/crd-ref-docs-config.yaml (1 hunks)
docs/Makefile (1 hunks)
docs/guides/dynamo_deploy/README.md (1 hunks)
docs/guides/dynamo_deploy/api-reference.md (1 hunks)
docs/guides/dynamo_deploy/dynamo_cloud.md (5 hunks)
docs/guides/dynamo_deploy/dynamo_operator.md (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
📚 Learning: 2025-07-18T16:05:05.534Z
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Applied to files:
docs/guides/dynamo_deploy/dynamo_operator.md
📚 Learning: 2025-07-18T16:04:31.771Z
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
Applied to files:
deploy/cloud/operator/docs/crd-ref-docs-config.yaml
🪛 YAMLlint (1.37.1)
deploy/cloud/helm/platform/templates/grove.yaml
[error] 16-16: syntax error: expected the node content, but found '-'
(syntax)
deploy/cloud/operator/docs/crd-ref-docs-config.yaml
[error] 37-37: trailing spaces
(trailing-spaces)
[error] 38-38: trailing spaces
(trailing-spaces)
[error] 40-40: trailing spaces
(trailing-spaces)
[error] 43-43: trailing spaces
(trailing-spaces)
deploy/cloud/helm/platform/values.yaml
[error] 23-23: trailing spaces
(trailing-spaces)
[error] 26-26: trailing spaces
(trailing-spaces)
[error] 29-29: trailing spaces
(trailing-spaces)
[error] 36-36: trailing spaces
(trailing-spaces)
[error] 41-41: trailing spaces
(trailing-spaces)
[error] 51-51: trailing spaces
(trailing-spaces)
[error] 58-58: trailing spaces
(trailing-spaces)
[error] 61-61: trailing spaces
(trailing-spaces)
[error] 66-66: trailing spaces
(trailing-spaces)
[error] 71-71: trailing spaces
(trailing-spaces)
[error] 74-74: trailing spaces
(trailing-spaces)
[error] 89-89: trailing spaces
(trailing-spaces)
[error] 98-98: trailing spaces
(trailing-spaces)
[error] 105-105: trailing spaces
(trailing-spaces)
[error] 108-108: trailing spaces
(trailing-spaces)
[error] 128-128: trailing spaces
(trailing-spaces)
[error] 137-137: trailing spaces
(trailing-spaces)
[error] 142-142: trailing spaces
(trailing-spaces)
[error] 145-145: trailing spaces
(trailing-spaces)
[error] 170-170: trailing spaces
(trailing-spaces)
[error] 232-232: trailing spaces
(trailing-spaces)
[error] 268-268: trailing spaces
(trailing-spaces)
[error] 371-371: trailing spaces
(trailing-spaces)
[warning] 389-389: wrong indentation: expected 6 but found 4
(indentation)
[error] 536-536: trailing spaces
(trailing-spaces)
[error] 547-547: trailing spaces
(trailing-spaces)
[error] 589-589: trailing spaces
(trailing-spaces)
🪛 LanguageTool
docs/guides/dynamo_deploy/README.md
[grammar] ~62-~62: There might be a mistake here.
Context: ...ph. ## 📖 API Reference & Documentation For detailed technical specifications of...
(QB_NEW_EN)
[grammar] ~67-~67: There might be a mistake here.
Context: ...mo operator configuration and management - **Create Deployment...
(QB_NEW_EN)
docs/guides/dynamo_deploy/dynamo_operator.md
[grammar] ~28-~28: There might be a mistake here.
Context: ...ns, see: 📖 Dynamo CRD API Reference ## Installation [See installation steps](d...
(QB_NEW_EN)
deploy/cloud/operator/docs/api-reference.md
[grammar] ~4-~4: There might be a mistake here.
Context: ...refix: k8s-api [id="{p}-api-reference"] == API Reference .Packages - xref:{anch...
(QB_NEW_EN)
[grammar] ~7-~7: There might be a mistake here.
Context: ...-reference"] == API Reference .Packages - xref:{anchor_prefix}-nvidia-com-v1alpha1...
(QB_NEW_EN)
[grammar] ~11-~11: There might be a mistake here.
Context: ...d="{anchor_prefix}-nvidia-com-v1alpha1"] === nvidia.com/v1alpha1 Package v1alpha...
(QB_NEW_EN)
[grammar] ~16-~16: There might be a mistake here.
Context: ....com v1alpha1 API group .Resource Types - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~17-~17: There might be a mistake here.
Context: ...eployment[$$DynamoComponentDeployment$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~22-~22: There might be a mistake here.
Context: ...loud-operator-api-v1alpha1-autoscaling"] ==== Autoscaling .Appears In: ***...
(QB_NEW_EN)
[grammar] ~33-~33: There might be a mistake here.
Context: ...ynamoComponentDeploymentOverridesSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~34-~34: There might be a mistake here.
Context: ...$$DynamoComponentDeploymentSharedSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~38-~38: There might be a mistake here.
Context: ...ols="20a,50a,15a,15a", options="header"] |=== | Field | Description | Default | V...
(QB_NEW_EN)
[grammar] ~39-~39: There might be a mistake here.
Context: ...20a,50a,15a,15a", options="header"] |=== | Field | Description | Default | Valida...
(QB_NEW_EN)
[grammar] ~40-~40: There might be a mistake here.
Context: ...eld | Description | Default | Validation | enabled
boolean | | | | *`m...
(QB_NEW_EN)
[grammar] ~41-~41: There might be a mistake here.
Context: ...ation | enabled
boolean | | | | minReplicas
integer | | | |...
(QB_NEW_EN)
[grammar] ~42-~42: There might be a mistake here.
Context: ... | minReplicas
integer | | | | maxReplicas
integer | | | |...
(QB_NEW_EN)
[grammar] ~43-~43: There might be a mistake here.
Context: ... | maxReplicas
integer | | | | behavior
__link:https://kubernetes...
(QB_NEW_EN)
[grammar] ~44-~44: There might be a mistake here.
Context: ...ontalPodAutoscalerBehavior$$]__ | | | | metrics
__link:https://kubernetes....
(QB_NEW_EN)
[grammar] ~51-~51: There might be a mistake here.
Context: ...cloud-operator-api-v1alpha1-basestatus"] ==== BaseStatus .Appears In: ****...
(QB_NEW_EN)
[grammar] ~65-~65: There might be a mistake here.
Context: ...ols="20a,50a,15a,15a", options="header"] |=== | Field | Description | Default | V...
(QB_NEW_EN)
[grammar] ~66-~66: There might be a mistake here.
Context: ...20a,50a,15a,15a", options="header"] |=== | Field | Description | Default | Valida...
(QB_NEW_EN)
[grammar] ~67-~67: There might be a mistake here.
Context: ...eld | Description | Default | Validation | version
string | | | | *`st...
(QB_NEW_EN)
[grammar] ~68-~68: There might be a mistake here.
Context: ...dation | version
string | | | | state
string | | | | *`cond...
(QB_NEW_EN)
[grammar] ~69-~69: There might be a mistake here.
Context: ...| | | | state
string | | | | conditions
__link:https://kubernet...
(QB_NEW_EN)
[grammar] ~74-~74: There might be a mistake here.
Context: ...api-v1alpha1-dynamocomponentdeployment"] ==== DynamoComponentDeployment Dynamo...
(QB_NEW_EN)
[grammar] ~97-~97: There might be a mistake here.
Context: ...dynamocomponentdeploymentoverridesspec"] ==== DynamoComponentDeploymentOverridesS...
(QB_NEW_EN)
[grammar] ~115-~115: There might be a mistake here.
Context: ...e, and Ingress when applicable). + | | | labels
__object (keys:string, valu...
(QB_NEW_EN)
[grammar] ~129-~129: There might be a mistake here.
Context: ...annotations to the created Pods. + | | | extraPodSpec
__xref:{anchor_prefix...
(QB_NEW_EN)
[grammar] ~140-~140: There might be a mistake here.
Context: ...a1-dynamocomponentdeploymentsharedspec"] ==== DynamoComponentDeploymentSharedSpec...
(QB_NEW_EN)
[grammar] ~151-~151: There might be a mistake here.
Context: ...ynamoComponentDeploymentOverridesSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~159-~159: There might be a mistake here.
Context: ...e, and Ingress when applicable). + | | | labels
__object (keys:string, valu...
(QB_NEW_EN)
[grammar] ~173-~173: There might be a mistake here.
Context: ...annotations to the created Pods. + | | | extraPodSpec
__xref:{anchor_prefix...
(QB_NEW_EN)
[grammar] ~184-~184: There might be a mistake here.
Context: ...v1alpha1-dynamocomponentdeploymentspec"] ==== DynamoComponentDeploymentSpec Dy...
(QB_NEW_EN)
[grammar] ~202-~202: There might be a mistake here.
Context: ...n the packaged Dynamo artifacts. + | | | dynamoTag
string | contains th...
(QB_NEW_EN)
[grammar] ~203-~203: There might be a mistake here.
Context: ... example, "my_package:MyService" + | | | backendFramework
string | Back...
(QB_NEW_EN)
[grammar] ~207-~207: There might be a mistake here.
Context: ...e, and Ingress when applicable). + | | | labels
__object (keys:string, valu...
(QB_NEW_EN)
[grammar] ~221-~221: There might be a mistake here.
Context: ...annotations to the created Pods. + | | | extraPodSpec
__xref:{anchor_prefix...
(QB_NEW_EN)
[grammar] ~232-~232: There might be a mistake here.
Context: ...alpha1-dynamocomponentdeploymentstatus"] ==== DynamoComponentDeploymentStatus ...
(QB_NEW_EN)
[grammar] ~258-~258: There might be a mistake here.
Context: ...tor-api-v1alpha1-dynamographdeployment"] ==== DynamoGraphDeployment DynamoGrap...
(QB_NEW_EN)
[grammar] ~281-~281: There might be a mistake here.
Context: ...api-v1alpha1-dynamographdeploymentspec"] ==== DynamoGraphDeploymentSpec Dynamo...
(QB_NEW_EN)
[grammar] ~313-~313: There might be a mistake here.
Context: ...i-v1alpha1-dynamographdeploymentstatus"] ==== DynamoGraphDeploymentStatus Dyna...
(QB_NEW_EN)
[grammar] ~336-~336: There might be a mistake here.
Context: ...loud-operator-api-v1alpha1-ingressspec"] ==== IngressSpec .Appears In: ***...
(QB_NEW_EN)
[grammar] ~347-~347: There might be a mistake here.
Context: ...ynamoComponentDeploymentOverridesSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~348-~348: There might be a mistake here.
Context: ...$$DynamoComponentDeploymentSharedSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~360-~360: There might be a mistake here.
Context: ...ngress/VirtualService resources. + | | | labels
__object (keys:string, valu...
(QB_NEW_EN)
[grammar] ~368-~368: There might be a mistake here.
Context: ...d-operator-api-v1alpha1-ingresstlsspec"] ==== IngressTLSSpec .Appears In: ...
(QB_NEW_EN)
[grammar] ~382-~382: There might be a mistake here.
Context: ...ols="20a,50a,15a,15a", options="header"] |=== | Field | Description | Default | V...
(QB_NEW_EN)
[grammar] ~383-~383: There might be a mistake here.
Context: ...20a,50a,15a,15a", options="header"] |=== | Field | Description | Default | Valida...
(QB_NEW_EN)
[grammar] ~384-~384: There might be a mistake here.
Context: ...eld | Description | Default | Validation | secretName
string | SecretName...
(QB_NEW_EN)
[grammar] ~389-~389: There might be a mistake here.
Context: ...ud-operator-api-v1alpha1-multinodespec"] ==== MultinodeSpec .Appears In: *...
(QB_NEW_EN)
[grammar] ~400-~400: There might be a mistake here.
Context: ...ynamoComponentDeploymentOverridesSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~401-~401: There might be a mistake here.
Context: ...$$DynamoComponentDeploymentSharedSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~415-~415: There might be a mistake here.
Context: ...deploy-cloud-operator-api-v1alpha1-pvc"] ==== PVC .Appears In: **** - xref...
(QB_NEW_EN)
[grammar] ~426-~426: There might be a mistake here.
Context: ...ynamoComponentDeploymentOverridesSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~427-~427: There might be a mistake here.
Context: ...$$DynamoComponentDeploymentSharedSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~436-~436: There might be a mistake here.
Context: ...y if the PVC is already created. + | | | size
__xref:{anchor_prefix}-k8s-io...
(QB_NEW_EN)
[grammar] ~437-~437: There might be a mistake here.
Context: ... in Gi, used during PVC creation + | | | volumeAccessMode
__link:https://ku...
(QB_NEW_EN)
[grammar] ~438-~438: There might be a mistake here.
Context: ...he volume access mode of the PVC + | | | mountPoint
string | | | |==...
(QB_NEW_EN)
[grammar] ~443-~443: There might be a mistake here.
Context: ...operator-api-v1alpha1-sharedmemoryspec"] ==== SharedMemorySpec .Appears In...
(QB_NEW_EN)
[grammar] ~454-~454: There might be a mistake here.
Context: ...ynamoComponentDeploymentOverridesSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~455-~455: There might be a mistake here.
Context: ...$$DynamoComponentDeploymentSharedSpec$$] - xref:{anchor_prefix}-github-com-ai-dynam...
(QB_NEW_EN)
[grammar] ~459-~459: There might be a mistake here.
Context: ...ols="20a,50a,15a,15a", options="header"] |=== | Field | Description | Default | V...
(QB_NEW_EN)
[grammar] ~460-~460: There might be a mistake here.
Context: ...20a,50a,15a,15a", options="header"] |=== | Field | Description | Default | Valida...
(QB_NEW_EN)
[grammar] ~461-~461: There might be a mistake here.
Context: ...eld | Description | Default | Validation | disabled
boolean | | | | *`...
(QB_NEW_EN)
[grammar] ~462-~462: There might be a mistake here.
Context: ...tion | disabled
boolean | | | | size
__xref:{anchor_prefix}-k8s-io...
(QB_NEW_EN)
deploy/cloud/helm/platform/README.md
[grammar] ~24-~24: There might be a mistake here.
Context: ...ional?style=flat-square) ## 🚀 Overview The Dynamo Platform Helm chart deploys t...
(QB_NEW_EN)
[grammar] ~28-~28: There might be a mistake here.
Context: ...operator for managing Dynamo deployments - NATS: High-performance messaging syste...
(QB_NEW_EN)
[grammar] ~29-~29: There might be a mistake here.
Context: ...aging system for component communication - etcd: Distributed key-value store for ...
(QB_NEW_EN)
[grammar] ~30-~30: There might be a mistake here.
Context: ...alue store for operator state management - Grove: Multi-node inference orchestrat...
(QB_NEW_EN)
[grammar] ~31-~31: There might be a mistake here.
Context: ...-node inference orchestration (optional) - Kai Scheduler: Advanced workload sched...
(QB_NEW_EN)
[grammar] ~34-~34: There might be a mistake here.
Context: ...heduling (optional) ## 📋 Prerequisites - Kubernetes cluster (v1.20+) - Helm 3.8+ ...
(QB_NEW_EN)
[grammar] ~36-~36: There might be a mistake here.
Context: ...equisites - Kubernetes cluster (v1.20+) - Helm 3.8+ - Sufficient cluster resources...
(QB_NEW_EN)
[grammar] ~37-~37: There might be a mistake here.
Context: ... Kubernetes cluster (v1.20+) - Helm 3.8+ - Sufficient cluster resources for your de...
(QB_NEW_EN)
[grammar] ~38-~38: There might be a mistake here.
Context: ...ster resources for your deployment scale - Container registry access (if using priv...
(QB_NEW_EN)
[grammar] ~41-~41: There might be a mistake here.
Context: ...ing private images) ## 🔧 Configuration ## Requirements | Repository | Name | Vers...
(QB_NEW_EN)
[grammar] ~45-~45: There might be a mistake here.
Context: ...rements | Repository | Name | Version | |------------|------|---------| | file:/...
(QB_NEW_EN)
[grammar] ~46-~46: There might be a mistake here.
Context: ...ersion | |------------|------|---------| | file://components/operator | dynamo-op...
(QB_NEW_EN)
[grammar] ~47-~47: There might be a mistake here.
Context: ...nts/operator | dynamo-operator | 0.5.0 | | https://charts.bitnami.com/bitnami | e...
(QB_NEW_EN)
[grammar] ~48-~48: There might be a mistake here.
Context: ...ts.bitnami.com/bitnami | etcd | 11.1.0 | | https://nats-io.github.io/k8s/helm/cha...
(QB_NEW_EN)
[grammar] ~49-~49: There might be a mistake here.
Context: ...hub.io/k8s/helm/charts/ | nats | 1.3.2 | | oci://ghcr.io/nvidia/grove | grove(gro...
(QB_NEW_EN)
[grammar] ~50-~50: There might be a mistake here.
Context: ...| grove(grove-charts) | v0.0.0-6e30275 | | oci://ghcr.io/nvidia/kai-scheduler | k...
(QB_NEW_EN)
[style] ~88-~88: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... enable NATS deployment, disable if you want to use an external NATS instance | ### NA...
(REP_WANT_TO_VB)
[grammar] ~92-~92: There might be a mistake here.
Context: ... official NATS Helm chart documentation: **[NATS Helm Chart Documentation](https:...
(QB_NEW_EN)
[grammar] ~97-~97: There might be a mistake here.
Context: ...l Bitnami etcd Helm chart documentation: **[etcd Helm Chart Documentation](https:...
(QB_NEW_EN)
[grammar] ~100-~100: There might be a mistake here.
Context: ...nami/etcd)** ## 📚 Additional Resources - [Dynamo Cloud Deployment Guide](../../../...
(QB_NEW_EN)
[grammar] ~102-~102: There might be a mistake here.
Context: ...ources - Dynamo Cloud Deployment Guide - [NATS Documentation](https://docs.nats.io...
(QB_NEW_EN)
[grammar] ~103-~103: There might be a mistake here.
Context: ...y/dynamo_cloud.md) - NATS Documentation - [etcd Documentation](https://etcd.io/docs...
(QB_NEW_EN)
[grammar] ~104-~104: There might be a mistake here.
Context: ...s://docs.nats.io/) - etcd Documentation - [Kubernetes Operator Pattern](https://kub...
(QB_NEW_EN)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2755/merge) by julienmancuso.
docs/guides/dynamo_deploy/README.md
[error] 1-1: Trailing whitespace detected by pre-commit and fixed. Command: 'pre-commit run --show-diff-on-failure --color=always --all-files'.
deploy/cloud/operator/docs/api-reference.md
[error] 1-1: Trailing whitespace detected by pre-commit and fixed. Command: 'pre-commit run --show-diff-on-failure --color=always --all-files'.
deploy/cloud/operator/docs/crd-ref-docs-config.yaml
[error] 1-1: Trailing whitespace detected by pre-commit and fixed. Command: 'pre-commit run --show-diff-on-failure --color=always --all-files'.
deploy/cloud/helm/platform/README.md
[error] 1-1: Trailing whitespace detected by pre-commit and fixed. Command: 'pre-commit run --show-diff-on-failure --color=always --all-files'.
deploy/cloud/helm/platform/README.md.gotmpl
[error] 1-1: Trailing whitespace detected by pre-commit and fixed. Command: 'pre-commit run --show-diff-on-failure --color=always --all-files'.
deploy/cloud/helm/platform/values.yaml
[error] 1-1: Trailing whitespace detected by pre-commit and fixed. Command: 'pre-commit run --show-diff-on-failure --color=always --all-files'.
🪛 markdownlint-cli2 (0.17.2)
deploy/cloud/operator/docs/api-reference.md
44-44: Bare URL used
(MD034, no-bare-urls)
45-45: Bare URL used
(MD034, no-bare-urls)
70-70: Bare URL used
(MD034, no-bare-urls)
90-90: Bare URL used
(MD034, no-bare-urls)
123-123: Bare URL used
(MD034, no-bare-urls)
133-133: Bare URL used
(MD034, no-bare-urls)
134-134: Bare URL used
(MD034, no-bare-urls)
167-167: Bare URL used
(MD034, no-bare-urls)
177-177: Bare URL used
(MD034, no-bare-urls)
178-178: Bare URL used
(MD034, no-bare-urls)
215-215: Bare URL used
(MD034, no-bare-urls)
225-225: Bare URL used
(MD034, no-bare-urls)
226-226: Bare URL used
(MD034, no-bare-urls)
249-249: Bare URL used
(MD034, no-bare-urls)
274-274: Bare URL used
(MD034, no-bare-urls)
305-305: Bare URL used
(MD034, no-bare-urls)
331-331: Bare URL used
(MD034, no-bare-urls)
438-438: Bare URL used
(MD034, no-bare-urls)
deploy/cloud/helm/platform/README.md
48-48: Bare URL used
(MD034, no-bare-urls)
49-49: Bare URL used
(MD034, no-bare-urls)
59-59: Bare URL used
(MD034, no-bare-urls)
59-59: Bare URL used
(MD034, no-bare-urls)
🪛 checkmake (0.2.2)
deploy/cloud/operator/Makefile
[warning] 60-60: Target body for "manifests" exceeds allowed length of 5 (42).
(maxbodylength)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (8)
deploy/cloud/helm/platform/Chart.yaml (1)
37-45: Dependencies wired correctly with conditions. Kai-scheduler and Grove dependencies look good and align with values keys; versions are pinned. No action needed here.
deploy/cloud/helm/platform/values.yaml (1)
113-121: Sane defaults for Grove and Kai toggles. Both are disabled by default, matching dependency conditions. Good.
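For readers skimming this review, the wiring described in the two comments above can be sketched roughly as follows. This is an illustrative fragment, not the chart's actual contents: the repository URLs and the Grove version come from the README requirements table quoted earlier, while the `grove.enabled` key, the `alias`/`name` split for Grove, and the kai-scheduler version placeholder are assumptions.

```yaml
# Chart.yaml (sketch): optional subcharts gated by a values flag.
dependencies:
  - name: kai-scheduler
    repository: oci://ghcr.io/nvidia/kai-scheduler
    version: "<pinned-version>"      # placeholder; the real pin lives in Chart.yaml
    condition: kai-scheduler.enabled
  - name: grove-charts               # assumed from "grove(grove-charts)" in the README table
    alias: grove
    repository: oci://ghcr.io/nvidia/grove
    version: v0.0.0-6e30275
    condition: grove.enabled         # assumed key, mirroring kai-scheduler.enabled
---
# values.yaml (sketch): both toggles default to off, so neither subchart renders.
grove:
  enabled: false
kai-scheduler:
  enabled: false
```

With a layout like this, Helm skips a subchart entirely unless its `condition` path resolves to `true` in the parent chart's values, which is why the defaults and the dependency conditions need to agree.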
docs/guides/dynamo_deploy/api-reference.md (1)
20-22: LGTM — clear cross-link to operator API reference. Path looks correct relative to this doc location.
docs/guides/dynamo_deploy/dynamo_cloud.md (2)
42-44: Version bump looks good. Default image tag updated to 0.5.0 consistently with the rest of the guide.
188-189: Link path looks correct. Advanced Options link correctly targets platform README.
deploy/cloud/helm/platform/README.md.gotmpl (1)
36-38: Verify minimum supported Kubernetes version (v1.20+) is still accurate. Operator/tooling elsewhere targets 1.28–1.29; align prereqs or document tested ranges.
deploy/cloud/operator/Makefile (1)
135-139: build depends on manifests ⇒ always regenerates docs; confirm this is desired for CI. If not, decouple docs generation from manifests/build and move it under docs-only workflows.
docs/Makefile (1)
79-86: Coupling to operator Makefile target; ensure extension matches after renderer change. If you switch CRD docs to Markdown or .adoc, confirm guides link to the correct path.
Signed-off-by: Julien Mancuso <[email protected]>
Please rename deploy/cloud/operator/docs/api-reference.md to deploy/cloud/operator/docs/api_reference.md to follow our naming convention.
Signed-off-by: Julien Mancuso <[email protected]>
@julienmancuso please try not to merge when fixable checks like lychee are newly failing: https://github.com/ai-dynamo/dynamo/actions/runs/17313166829/job/49150966630. Thanks for fixing it in the follow-up, #2779! 🚀
…2755) Signed-off-by: Julien Mancuso <[email protected]> Signed-off-by: Jason Zhou <[email protected]>
…2755) Signed-off-by: Julien Mancuso <[email protected]> Signed-off-by: Michael Shin <[email protected]>
…2755) Signed-off-by: Julien Mancuso <[email protected]> Signed-off-by: Krishnan Prashanth <[email protected]>
…2755) Signed-off-by: Julien Mancuso <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Overview:
Add Grove and Kai scheduler as part of dynamo cloud helm chart
Summary by CodeRabbit
New Features
Documentation
Chores