Kubernetes OTel Integration #30815

Open: wants to merge 9 commits into base: master

brett0000FF (Contributor) commented Jul 30, 2025

What does this PR do? What is the motivation?

Adds a new guide for monitoring Kubernetes clusters using the OpenTelemetry Collector, based on the opentelemetry-examples page.

  • Adds a new page at /opentelemetry/integrations/kubernetes_metrics.md.
  • Updates the main OTel integrations page (/opentelemetry/integrations.md) to include a link to the new guide under the "Containers and hosts" section.

Merge instructions

Merge readiness:

  • Ready for merge

For Datadog employees:

Your branch name MUST follow the <name>/<description> convention and include the forward slash (/). Without this format, your pull request will not pass CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links.

If your branch doesn't follow this format, rename it or create a new branch and PR.

[6/5/2025] Merge queue has been disabled on the documentation repo. If you have write access to the repo, the PR has been reviewed by a Documentation team member, and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #documentation channel in Slack.

Additional notes

github-actions bot (Contributor) commented Jul 30, 2025

📝 Documentation Team Review Required

This pull request requires approval from the @DataDog/documentation team before it can be merged.

Please ensure your changes follow our documentation guidelines and wait for a team member to review and approve your changes.

github-actions bot added the "Images" label (Images are added/removed with this PR) Jul 31, 2025
github-actions bot added the "Architecture" label (Everything related to the Doc backend) Jul 31, 2025
brett0000FF requested a review from justin-lesko July 31, 2025 15:45
brett0000FF marked this pull request as ready for review July 31, 2025 15:45
brett0000FF requested a review from a team as a code owner July 31, 2025 15:45
brett0000FF added the "editorial review" label (Waiting on a more in-depth review) Jul 31, 2025
brett0000FF requested a review from shanelhuang August 4, 2025 15:27

janine-c (Contributor) left a comment

Looks great, Brett! I had some minor questions about a few things, but nothing that would be a showstopper. Always open to chat further about things, or approve again if this one gets stale 🙂

### Prerequisites

* **Helm**: The setup uses Helm to deploy resources. To install Helm, see the [official Helm documentation][2].
* **Collector Image**: This guide uses the `otel/opentelemetry-collector-contrib:0.130.0` image or newer.

Contributor comment:

In context, should a user usually know which version of the collector image they're using? If not, maybe we could add a link where they could check/update it if necessary?
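
For reference, a minimal sketch of how a reader might check a Collector image against the `0.130.0` prerequisite. The helper name `image_tag_at_least` and the `<namespace>` placeholder are hypothetical and not part of the guide. The sketch assumes a plain `name:tag` image reference (not a digest) and a `sort` that supports `-V`, as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the tag of image reference $1
# is at least version $2. Assumes a plain "name:tag" reference.
image_tag_at_least() {
  case "$1" in
    *:*) ;;            # a tag is present
    *) return 1 ;;     # no tag, so nothing to compare
  esac
  local tag="${1##*:}" # text after the last colon, for example 0.130.0
  local minimum="$2"
  # With version sort, the minimum sorts first only when tag >= minimum.
  [ "$(printf '%s\n%s\n' "$minimum" "$tag" | sort -V | head -n 1)" = "$minimum" ]
}

# On a live cluster, the image could be read with kubectl, for example:
#   kubectl get pods -n <namespace> -o jsonpath='{.items[*].spec.containers[*].image}'
image="otel/opentelemetry-collector-contrib:0.130.0"

if image_tag_at_least "$image" "0.130.0"; then
  echo "Collector image meets the prerequisite"
else
  echo "Collector image is older than 0.130.0"
fi
```

The comparison is kept separate from the `kubectl` lookup so it can be run against any image string without cluster access.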


3. **Install the OpenTelemetry Collectors**

First, add the OpenTelemetry Helm chart repository:

Contributor comment:

I would recommend using a nested ordered list to indicate these substeps, just to make them easier to see 🙂 Words like "first," "next," and "finally" work okay, but if you need to add more steps in the future, for example, they can get unnecessarily unwieldy or difficult to keep track of.

1. Select the metric you want to edit.
1. Click **Edit** in the side panel.
1. Apply the following updates:
- `k8s.pod.cpu.usage`

Contributor comment:

I'm not sure there's anything to be done about this, but it was a little jarring to see "Select the metric you want to edit" before seeing which metrics required edits. Then, at the end, the steps read as if you only have to click Save once, but it looks like you actually have to click it for each metric you're modifying.

I'm kind of toying with the idea of moving this list somewhere else - like, if it were me doing it from scratch, I might put the metrics into a table below the procedure, and then have the procedure refer down to it. But it also kind of feels like a bit of a bug workaround, and that might contribute to the kind of awkward feeling around it?

Author reply:

Great point. I am going to reorganize it a bit.

## Correlating traces with infrastructure metrics
To correlate your APM traces with Kubernetes infrastructure metrics, Datadog uses [unified service tagging][7]. This requires setting three standard resource attributes on telemetry from both your application and your infrastructure. Datadog automatically maps these OpenTelemetry attributes to the standard Datadog tags (`env`, `service`, and `version`) used for correlation.

Contributor comment:

I clicked the unified service tagging link and it went to an OTel specific section, so I was wondering if the first instance of Datadog here should say OpenTelemetry?

Author reply:

Understandable confusion here. Unified Service Tagging is a Datadog feature. Datadog uses this system to correlate data by mapping standard OpenTelemetry resource attributes to standard Datadog tags (env, service, version). I have a separate PR which should make the OTel nuance more clear (which I'll swap the link to when it's ready). But it is correct to say Datadog here.

Contributor comment:

Ah, makes sense, thank you for sating my curiosity!

Author reply:

Thanks for your review! Great feedback!

- `service.name`
- `service.version`
- `deployment.environment.name` (Supported in Agent v7.58.0+ and Collector Exporter v0.110.0+; otherwise, use `deployment.environment`)

Contributor comment:

I noticed that the Prerequisites section says that this guide assumes users have a newer version of the Collector Exporter. Could be worth throwing the Agent version up there too?


## Overview

Collect Kubernetes metrics using the OpenTelemetry Collector to gain comprehensive insights into your cluster's health and performance. This integration uses a combination of OpenTelemetry receivers to gather data, which populates the [Containers - Overview][1] dashboard.

Contributor comment:

Can we change this (and the screenshot) to be the Kubernetes - Overview dashboard?

```shell
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics --set "metricLabelsAllowlist[0]=pods=[*]"
```
**Note**: The `--set "metricLabelsAllowlist[0]=pods=[*]"` flag configures `kube-state-metrics` to include all available labels for pod-related metrics. This provides maximum detail but may increase cardinality in large clusters. For production environments, you may want to customize this to a specific list of required labels.

Contributor comment:

This is actually not required given the current configuration files. Pod uid is included by default in the metrics and that's all we're using in the pipeline right now.

- `service.name`
- `service.version`
- `deployment.environment.name` (Supported in Agent v7.58.0+ and Collector Exporter v0.110.0+; otherwise, use `deployment.environment`)

Contributor comment:

Given that these instructions are for customers who won't be using the DD agent and we've listed OTel version 0.130.0 as a prerequisite, I think we can delete the version notes. Perhaps we could just say "formerly deployment.environment"?
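
For illustration, one way an application could set the three resource attributes discussed in this thread is through the standard `OTEL_RESOURCE_ATTRIBUTES` environment variable read by OpenTelemetry SDKs. This is a sketch, not text from the guide. The service name, version, and environment values are hypothetical.

```shell
# Hypothetical values for illustration; substitute your own.
SERVICE_NAME="checkout"
SERVICE_VERSION="1.4.2"
DEPLOY_ENV="staging"

# OTEL_RESOURCE_ATTRIBUTES takes comma-separated key=value pairs.
# Datadog maps these keys to the service, version, and env tags.
export OTEL_RESOURCE_ATTRIBUTES="service.name=${SERVICE_NAME},service.version=${SERVICE_VERSION},deployment.environment.name=${DEPLOY_ENV}"

# Print the value so it can be checked before starting the application.
echo "$OTEL_RESOURCE_ATTRIBUTES"
```

Per the version note quoted above, setups older than Agent v7.58.0 or Collector Exporter v0.110.0 would use the `deployment.environment` key instead of `deployment.environment.name`.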
