diff --git a/config/_default/menus/main.en.yaml b/config/_default/menus/main.en.yaml
index ee649fc3cb43b..a018f5d3792d1 100644
--- a/config/_default/menus/main.en.yaml
+++ b/config/_default/menus/main.en.yaml
@@ -840,31 +840,36 @@ menu:
       identifier: otel_kafka_metrics
       parent: otel_integrations
       weight: 809
+    - name: Kubernetes Metrics
+      url: opentelemetry/integrations/kubernetes_metrics/
+      identifier: otel_kubernetes_metrics
+      parent: otel_integrations
+      weight: 810
     - name: MySQL Metrics
       url: opentelemetry/integrations/mysql_metrics/
       identifier: otel_mysql_metrics
       parent: otel_integrations
-      weight: 810
+      weight: 811
     - name: NGINX Metrics
       url: opentelemetry/integrations/nginx_metrics/
       identifier: otel_nginx_metrics
       parent: otel_integrations
-      weight: 811
+      weight: 812
     - name: Podman Metrics
       url: opentelemetry/integrations/podman_metrics/
       identifier: otel_podman_metrics
       parent: otel_integrations
-      weight: 812
+      weight: 813
     - name: Runtime Metrics
       url: opentelemetry/integrations/runtime_metrics/
       identifier: otel_runtime_metrics
       parent: otel_integrations
-      weight: 813
+      weight: 814
     - name: Trace Metrics
       url: opentelemetry/integrations/trace_metrics/
       identifier: otel_trace_metrics
       parent: otel_integrations
-      weight: 814
+      weight: 815
     - name: Troubleshooting
       url: opentelemetry/troubleshooting/
       identifier: otel_troubleshooting
diff --git a/content/en/opentelemetry/integrations/_index.md b/content/en/opentelemetry/integrations/_index.md
index 05cd83c8fc8fb..2472cb6570ec9 100644
--- a/content/en/opentelemetry/integrations/_index.md
+++ b/content/en/opentelemetry/integrations/_index.md
@@ -48,6 +48,7 @@ Gain insights into your containerized environments and host systems:
 
 - [Docker Metrics][5] - Monitor Docker container performance
 - [Host Metrics][6] - Track system metrics such as CPU, disk, and memory usage
+- [Kubernetes Metrics][18] - Monitor Kubernetes cluster health and performance
 - [Podman Metrics][16] - Monitor Podman container performance
 
 ### Web servers and proxies
@@ -93,3 +94,5 @@ Monitor big data processing frameworks:
 [15]: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/dockerstatsreceiver/metadata.yaml
 [16]: /opentelemetry/integrations/podman_metrics/
 [17]: /opentelemetry/integrations/datadog_extension/
+[18]: /opentelemetry/integrations/kubernetes_metrics/
+
diff --git a/content/en/opentelemetry/integrations/kubernetes_metrics.md b/content/en/opentelemetry/integrations/kubernetes_metrics.md
new file mode 100644
index 0000000000000..e19fb665b7581
--- /dev/null
+++ b/content/en/opentelemetry/integrations/kubernetes_metrics.md
@@ -0,0 +1,207 @@
+---
+title: Kubernetes Metrics
+further_reading:
+- link: "/opentelemetry/setup/"
+  tag: "Documentation"
+  text: "Send OpenTelemetry Data to Datadog"
+- link: "https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging/"
+  tag: "Documentation"
+  text: "Unified Service Tagging"
+- link: "https://github.com/DataDog/opentelemetry-examples/tree/main/guides/kubernetes"
+  tag: "GitHub"
+  text: "Example Collector Configurations"
+---
+
+The OpenTelemetry Kubernetes integration is in Preview. To request access, contact your Datadog account team.
+
+## Overview
+
+Collect Kubernetes metrics using the OpenTelemetry Collector to gain comprehensive insights into your cluster's health and performance. This integration uses a combination of OpenTelemetry receivers to gather data, which populates the [Kubernetes - Overview][1] dashboard.
+
+{{< img src="/opentelemetry/collector_exporter/kubernetes_metrics.png" alt="The 'Kubernetes - Overview' dashboard, showing metrics for containers, including status and resource usage of your cluster and its containers." style="width:100%;" >}}
+
+This integration requires the [`kube-state-metrics`][8] service and uses a two-collector architecture to gather data.
+
+The `kube-state-metrics` service is a required component that generates detailed metrics about the state of Kubernetes objects like deployments, nodes, and pods. This architecture uses two separate OpenTelemetry Collectors:
+- A Cluster Collector, deployed as a Kubernetes Deployment, gathers cluster-wide metrics (for example, the total number of deployments).
+- A Node Collector, deployed as a Kubernetes DaemonSet, runs on each node to collect node-specific metrics (for example, CPU and memory usage per node).
+
+This approach ensures that cluster-level metrics are collected only once, preventing data duplication, while node-level metrics are gathered from every node in the cluster.
+
+## Setup
+
+To collect Kubernetes metrics with OpenTelemetry, you need to deploy `kube-state-metrics` and configure both of the above OpenTelemetry Collectors in your cluster.
+
+### Prerequisites
+
+* **Helm**: The setup uses Helm to deploy resources. To install Helm, see the [official Helm documentation][2].
+* **Collector Image**: This guide uses the `otel/opentelemetry-collector-contrib:0.130.0` image or newer.
+
+### Installation
+
+#### 1. Install kube-state-metrics
+
+Run the following commands to add the `prometheus-community` Helm repository and install `kube-state-metrics`:
+```sh
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm repo update
+helm install kube-state-metrics prometheus-community/kube-state-metrics
+```
+
+#### 2. Create a Datadog API Key Secret
+
+Create a Kubernetes secret to store your Datadog API key securely.
+```sh
+export DD_API_KEY=""
+kubectl create secret generic datadog-secret --from-literal api-key=$DD_API_KEY
+```
+
+#### 3. Install the OpenTelemetry Collectors
+
+1. Add the OpenTelemetry Helm chart repository:
+   ```sh
+   helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
+   helm repo update
+   ```
+
+1. Download the configuration files for the two Collectors:
+   - [cluster-collector.yaml][3]
+   - [daemonset-collector.yaml][4]
+
+1. Set your cluster name as an environment variable and use Helm to deploy both the Cluster and Node Collectors. Make sure the paths to the YAML files are correct.
+
+   ```bash
+   # Set your cluster name
+   export K8S_CLUSTER_NAME=""
+
+   # Install the Node Collector (DaemonSet)
+   helm install otel-daemon-collector open-telemetry/opentelemetry-collector \
+     -f daemonset-collector.yaml \
+     --set image.repository=otel/opentelemetry-collector-contrib \
+     --set image.tag=0.130.0 \
+     --set-string "config.processors.resource.attributes[0].key=k8s.cluster.name" \
+     --set-string "config.processors.resource.attributes[0].value=${K8S_CLUSTER_NAME}"
+
+   # Install the Cluster Collector (Deployment)
+   helm install otel-cluster-collector open-telemetry/opentelemetry-collector \
+     -f cluster-collector.yaml \
+     --set image.repository=otel/opentelemetry-collector-contrib \
+     --set image.tag=0.130.0 \
+     --set-string "config.processors.resource.attributes[0].key=k8s.cluster.name" \
+     --set-string "config.processors.resource.attributes[0].value=${K8S_CLUSTER_NAME}"
+   ```
+
+## Metric metadata configuration
+
+Some metrics require manual metadata updates in Datadog to ensure they are interpreted and displayed correctly.
+
+To edit a metric's metadata:
+1. Go to **[Metrics > Summary][6]**.
+1. Select the metric you want to edit.
+1. Click **Edit** in the side panel.
+1. Edit the metadata as needed.
+1. Click **Save**.
+
+Repeat this process for each of the metrics listed in the following table:
+
+| Metric Name              | Metric Type | Unit                                     |
+|--------------------------|-------------|------------------------------------------|
+| `k8s.pod.cpu.usage`      | `Gauge`     | `core`                                   |
+| `k8s.pod.network.io`     | `Gauge`     | `byte_in_binary_bytes_family per second` |
+| `k8s.pod.network.errors` | `Gauge`     | `byte_in_binary_bytes_family per second` |
+
+## Correlating traces with infrastructure metrics
+
+To correlate your APM traces with Kubernetes infrastructure metrics, Datadog uses [unified service tagging][7]. This requires setting three standard resource attributes on telemetry from both your application and your infrastructure. Datadog automatically maps these OpenTelemetry attributes to the standard Datadog tags (`env`, `service`, and `version`) used for correlation.
+
+The required OpenTelemetry attributes are:
+
+- `service.name`
+- `service.version`
+- `deployment.environment.name` (formerly `deployment.environment`)
+
+This ensures that telemetry from your application is consistently tagged, allowing Datadog to link traces, metrics, and logs to the same service.
+
+### Application configuration
+
+Set the following environment variables in your application's container specification to tag outgoing telemetry:
+
+```yaml
+spec:
+  containers:
+    - name: my-container
+      env:
+        - name: OTEL_SERVICE_NAME
+          value: ""
+        - name: OTEL_SERVICE_VERSION
+          value: ""
+        - name: OTEL_ENVIRONMENT
+          value: ""
+        - name: OTEL_RESOURCE_ATTRIBUTES
+          value: "service.name=$(OTEL_SERVICE_NAME),service.version=$(OTEL_SERVICE_VERSION),deployment.environment.name=$(OTEL_ENVIRONMENT)"
+```
+
+### Infrastructure configuration
+
+Add the corresponding annotations to your Kubernetes `Deployment` metadata. The `k8sattributes` processor in the Collector uses these annotations to enrich infrastructure metrics with service context.
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: my-app
+  annotations:
+    # Use resource.opentelemetry.io/ for the k8sattributes processor
+    resource.opentelemetry.io/service.name: ""
+    resource.opentelemetry.io/service.version: ""
+    resource.opentelemetry.io/deployment.environment.name: ""
+spec:
+  template:
+    metadata:
+      annotations:
+        resource.opentelemetry.io/service.name: ""
+        resource.opentelemetry.io/service.version: ""
+        resource.opentelemetry.io/deployment.environment.name: ""
+# ... rest of the manifest
+```
+
+## Data collected
+
+This integration collects metrics using several OpenTelemetry receivers.
+
+### kube-state-metrics (using Prometheus receiver)
+
+Metrics scraped from the `kube-state-metrics` endpoint provide information about the state of Kubernetes API objects.
+
+### Kubelet stats receiver
+
+The `kubeletstatsreceiver` collects metrics from the Kubelet on each node, focusing on pod, container, and volume resource usage.
+
+{{< mapping-table resource="kubeletstats.csv">}}
+
+### Kubernetes cluster receiver
+
+The `k8sclusterreceiver` collects cluster-level metrics, such as the status and count of nodes, pods, and other objects.
+
+{{< mapping-table resource="k8scluster.csv">}}
+
+### Host metrics receiver
+
+The `hostmetricsreceiver` gathers system-level metrics from each node in the cluster.
+
+{{< mapping-table resource="host.csv">}}
+
+See [OpenTelemetry Metrics Mapping][5] for more information.
+
+## Further reading
+
+{{< partial name="whats-next/whats-next.html" >}}
+
+[1]: https://app.datadoghq.com/dash/integration/86/kubernetes---overview
+[2]: https://helm.sh/docs/intro/install/
+[3]: https://github.com/DataDog/opentelemetry-examples/blob/main/guides/kubernetes/configuration/cluster-collector.yaml
+[4]: https://github.com/DataDog/opentelemetry-examples/blob/main/guides/kubernetes/configuration/daemonset-collector.yaml
+[5]: /opentelemetry/schema_semantics/metrics_mapping/
+[6]: https://app.datadoghq.com/metric/summary
+[7]: /getting_started/tagging/unified_service_tagging/?tab=kubernetes#opentelemetry
+[8]: https://github.com/kubernetes/kube-state-metrics
diff --git a/static/images/opentelemetry/collector_exporter/kubernetes_metrics.png b/static/images/opentelemetry/collector_exporter/kubernetes_metrics.png
new file mode 100644
index 0000000000000..a727e9bffbd55
Binary files /dev/null and b/static/images/opentelemetry/collector_exporter/kubernetes_metrics.png differ
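
Note on the "Data collected" section of `kubernetes_metrics.md`: the page names four receivers but never shows how they divide between the two Collectors. The fragment below is a rough sketch of one plausible split, not a copy of the linked `cluster-collector.yaml` or `daemonset-collector.yaml`. The intervals, the scrape target, the chosen scrapers, the `K8S_NODE_NAME` variable, and the assignment of the `kube-state-metrics` scrape to the Cluster Collector are all assumptions made for illustration.

```yaml
# Sketch only: values here are assumptions, not the contents of the linked files.

# Cluster Collector (Deployment): cluster-wide sources, collected once.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kube-state-metrics
          scrape_interval: 30s
          static_configs:
            # Assumes the release name from step 1, the chart's default
            # service port, and that both run in the same namespace.
            - targets: ["kube-state-metrics:8080"]
  k8s_cluster:
    collection_interval: 30s
---
# Node Collector (DaemonSet): per-node sources, collected on every node.
receivers:
  kubeletstats:
    collection_interval: 30s
    auth_type: serviceAccount
    # K8S_NODE_NAME is assumed to be injected through the Downward API
    # (fieldRef: spec.nodeName) in the DaemonSet pod spec.
    endpoint: "https://${env:K8S_NODE_NAME}:10250"
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      filesystem:
```

Keeping the `prometheus` and `k8s_cluster` receivers out of the DaemonSet is what would prevent each node from reporting the same cluster-level series, which matches the deduplication rationale given in the Overview.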