| 1 | +<!-- |
| 2 | +SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 3 | +SPDX-License-Identifier: Apache-2.0 |
| 4 | + |
| 5 | +Licensed under the Apache License, Version 2.0 (the "License"); |
| 6 | +you may not use this file except in compliance with the License. |
| 7 | +You may obtain a copy of the License at |
| 8 | + |
| 9 | +http://www.apache.org/licenses/LICENSE-2.0 |
| 10 | + |
| 11 | +Unless required by applicable law or agreed to in writing, software |
| 12 | +distributed under the License is distributed on an "AS IS" BASIS, |
| 13 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 14 | +See the License for the specific language governing permissions and |
| 15 | +limitations under the License. |
| 16 | +--> |
| 17 | + |
| 18 | +# dynamo-platform |
| 19 | + |
| 20 | +A Helm chart for NVIDIA Dynamo Platform. |
| 21 | + |
| 22 | +  |
| 23 | + |
| 24 | +## 🚀 Overview |
| 25 | + |
| 26 | +The Dynamo Platform Helm chart deploys the complete Dynamo Cloud infrastructure on Kubernetes, including: |
| 27 | + |
| 28 | +- **Dynamo Operator**: Kubernetes operator for managing Dynamo deployments |
| 29 | +- **NATS**: High-performance messaging system for component communication |
| 30 | +- **etcd**: Distributed key-value store for operator state management |
| 31 | +- **Grove**: Multi-node inference orchestration (optional) |
| 32 | +- **Kai Scheduler**: Advanced workload scheduling (optional) |
| 33 | + |
| 34 | +## 📋 Prerequisites |
| 35 | + |
| 36 | +- Kubernetes cluster (v1.20+) |
| 37 | +- Helm 3.8+ |
| 38 | +- Sufficient cluster resources for your deployment scale |
| 39 | +- Container registry access (if using private images) |
| 40 | + |
| 41 | +## 🔧 Configuration |
| 42 | + |
| 43 | +## Requirements |
| 44 | + |
| 45 | +| Repository | Name | Version | |
| 46 | +|------------|------|---------| |
| 47 | +| file://components/operator | dynamo-operator | 0.5.0 | |
| 48 | +| https://charts.bitnami.com/bitnami | etcd | 11.1.0 | |
| 49 | +| https://nats-io.github.io/k8s/helm/charts/ | nats | 1.3.2 | |
| 50 | +| oci://ghcr.io/nvidia/grove | grove(grove-charts) | v0.0.0-6e30275 | |
| 51 | +| oci://ghcr.io/nvidia/kai-scheduler | kai-scheduler | v0.8.1 | |
| 52 | + |
| 53 | +## Values |
| 54 | + |
| 55 | +| Key | Type | Default | Description | |
| 56 | +|-----|------|---------|-------------| |
| 57 | +| dynamo-operator.enabled | bool | `true` | Whether to enable the Dynamo Kubernetes operator deployment | |
| 58 | +| dynamo-operator.natsAddr | string | `""` | NATS server address for operator communication (leave empty to use the bundled NATS chart). Format: "nats://hostname:port" | |
| 59 | +| dynamo-operator.etcdAddr | string | `""` | etcd server address for operator state storage (leave empty to use the bundled etcd chart). Format: "http://hostname:port" or "https://hostname:port" | |
| 60 | +| dynamo-operator.namespaceRestriction.enabled | bool | `true` | Whether to restrict operator to specific namespaces | |
| 61 | +| dynamo-operator.namespaceRestriction.targetNamespace | string | `nil` | Target namespace for operator deployment (leave empty for current namespace) | |
| 62 | +| dynamo-operator.controllerManager.tolerations | list | `[]` | Node tolerations for controller manager pods | |
| 63 | +| dynamo-operator.controllerManager.manager.image.repository | string | `"nvcr.io/nvidia/ai-dynamo/kubernetes-operator"` | Official NVIDIA Dynamo operator image repository | |
| 64 | +| dynamo-operator.controllerManager.manager.image.tag | string | `""` | Image tag (leave empty to use chart default) | |
| 65 | +| dynamo-operator.controllerManager.manager.image.pullPolicy | string | `"IfNotPresent"` | Image pull policy - when to pull the image | |
| 66 | +| dynamo-operator.controllerManager.manager.args[0] | string | `"--health-probe-bind-address=:8081"` | Health probe endpoint for Kubernetes health checks | |
| 67 | +| dynamo-operator.controllerManager.manager.args[1] | string | `"--metrics-bind-address=127.0.0.1:8080"` | Metrics endpoint for Prometheus scraping (localhost only for security) | |
| 68 | +| dynamo-operator.imagePullSecrets | list | `[]` | Secrets for pulling private container images | |
| 69 | +| dynamo-operator.dynamo.groveTerminationDelay | string | `"15m"` | How long to wait before forcefully terminating Grove instances | |
| 70 | +| dynamo-operator.dynamo.internalImages.debugger | string | `"python:3.12-slim"` | Debugger image for troubleshooting deployments | |
| 71 | +| dynamo-operator.dynamo.enableRestrictedSecurityContext | bool | `false` | Whether to enable restricted security contexts for enhanced security | |
| 72 | +| dynamo-operator.dynamo.dockerRegistry.useKubernetesSecret | bool | `false` | Whether to use Kubernetes secrets for registry authentication | |
| 73 | +| dynamo-operator.dynamo.dockerRegistry.server | string | `nil` | Docker registry server URL | |
| 74 | +| dynamo-operator.dynamo.dockerRegistry.username | string | `nil` | Registry username | |
| 75 | +| dynamo-operator.dynamo.dockerRegistry.password | string | `nil` | Registry password (consider using existingSecretName instead) | |
| 76 | +| dynamo-operator.dynamo.dockerRegistry.existingSecretName | string | `nil` | Name of existing Kubernetes secret containing registry credentials | |
| 77 | +| dynamo-operator.dynamo.dockerRegistry.secure | bool | `true` | Whether the registry uses HTTPS | |
| 78 | +| dynamo-operator.dynamo.ingress.enabled | bool | `false` | Whether to create ingress resources | |
| 79 | +| dynamo-operator.dynamo.ingress.className | string | `nil` | Ingress class name (e.g., "nginx", "traefik") | |
| 80 | +| dynamo-operator.dynamo.ingress.tlsSecretName | string | `"my-tls-secret"` | Secret name containing TLS certificates | |
| 81 | +| dynamo-operator.dynamo.istio.enabled | bool | `false` | Whether to enable Istio integration | |
| 82 | +| dynamo-operator.dynamo.istio.gateway | string | `nil` | Istio gateway name for routing | |
| 83 | +| dynamo-operator.dynamo.ingressHostSuffix | string | `""` | Host suffix for generated ingress hostnames | |
| 84 | +| dynamo-operator.dynamo.virtualServiceSupportsHTTPS | bool | `false` | Whether VirtualServices should support HTTPS routing | |
| 85 | +| grove.enabled | bool | `false` | Whether to enable Grove for multi-node inference coordination. If enabled, the Grove operator is deployed cluster-wide | |
| 86 | +| kai-scheduler.enabled | bool | `false` | Whether to enable Kai Scheduler for intelligent resource allocation. If enabled, the Kai Scheduler operator is deployed cluster-wide | |
| 87 | +| etcd.enabled | bool | `true` | Whether to deploy the bundled etcd chart. Disable this to use an external etcd instance | |
| 88 | +| nats.enabled | bool | `true` | Whether to deploy the bundled NATS chart. Disable this to use an external NATS instance | |
| 89 | + |
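The values table above implies a common override pattern: disable the bundled NATS and etcd charts and point the operator at existing services. The following is a minimal sketch of such an override file, using only keys listed in the table. The file name and both hostnames are placeholders invented for illustration, and the ports are the conventional NATS and etcd client ports, not values taken from this chart.

```yaml
# values-external.yaml (hypothetical file name)

# Skip the bundled dependencies; external instances are used instead.
nats:
  enabled: false
etcd:
  enabled: false

dynamo-operator:
  # Placeholder endpoints: replace with the addresses of your own services.
  # Formats follow the table: "nats://hostname:port" and "http(s)://hostname:port".
  natsAddr: "nats://nats.example.internal:4222"
  etcdAddr: "https://etcd.example.internal:2379"
```

A file like this would typically be passed to Helm with the `-f` flag at install or upgrade time. When both addresses are left empty (the defaults), the table indicates the bundled charts are used instead.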
| 90 | +### NATS Configuration |
| 91 | + |
| 92 | +For detailed NATS configuration options beyond `nats.enabled`, please refer to the official NATS Helm chart documentation: |
| 93 | +**[NATS Helm Chart Documentation](https://github.com/nats-io/k8s/tree/main/helm/charts/nats)** |
| 94 | + |
| 95 | +### etcd Configuration |
| 96 | + |
| 97 | +For detailed etcd configuration options beyond `etcd.enabled`, please refer to the official Bitnami etcd Helm chart documentation: |
| 98 | +**[etcd Helm Chart Documentation](https://github.com/bitnami/charts/tree/main/bitnami/etcd)** |
| 99 | + |
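Because NATS and etcd are chart dependencies, Helm passes anything nested under the `nats:` and `etcd:` keys through to those subcharts. The sketch below assumes a few option names from the upstream charts (`config.jetstream.enabled` for NATS, `replicaCount` and `persistence.size` for Bitnami etcd). These names and the values shown are illustrative assumptions, so verify them against the chart versions pinned in the Requirements table before relying on them.

```yaml
# Hypothetical overrides forwarded to the bundled subcharts.
nats:
  enabled: true
  config:
    jetstream:
      enabled: true   # assumed upstream NATS chart option

etcd:
  enabled: true
  replicaCount: 3     # assumed upstream Bitnami etcd option
  persistence:
    size: 8Gi         # assumed upstream Bitnami etcd option; size is illustrative
```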
| 100 | +## 📚 Additional Resources |
| 101 | + |
| 102 | +- [Dynamo Cloud Deployment Guide](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md) |
| 103 | +- [NATS Documentation](https://docs.nats.io/) |
| 104 | +- [etcd Documentation](https://etcd.io/docs/) |
| 105 | +- [Kubernetes Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) |
| 106 | + |
| 107 | +---------------------------------------------- |
| 108 | +Autogenerated from chart metadata using [helm-docs v1.14.2](https://github.com/norwoodj/helm-docs/releases/v1.14.2) |