Kubernetes OTel Integration #30815
📝 Documentation Team Review Required: This pull request requires approval from the @DataDog/documentation team before it can be merged. Please ensure your changes follow our documentation guidelines and wait for a team member to review and approve your changes.
Looks great, Brett! I had some minor questions, but nothing that would be a showstopper. Always open to chat further about things, or approve again if this one gets stale 🙂
### Prerequisites

* **Helm**: The setup uses Helm to deploy resources. To install Helm, see the [official Helm documentation][2].
* **Collector Image**: This guide uses the `otel/opentelemetry-collector-contrib:0.130.0` image or newer.
In context, should a user usually know which version of the collector image they're using? If not, maybe we could add a link where they could check/update it if necessary?
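One possible way to answer the question above, as a minimal sketch rather than anything taken from the guide: the helper below pulls the tag out of an image reference and compares it with the `0.130.0` minimum from the prerequisites. The function name `image_tag_at_least` is hypothetical, the comparison assumes GNU `sort -V` is available, and non-numeric tags such as `latest` (or references without a tag) are not handled.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the image tag is at least the minimum version.
# Assumes GNU sort (for -V) and a plain numeric tag such as 0.130.0.
image_tag_at_least() {
  image="$1"
  minimum="$2"
  tag="${image##*:}"   # text after the last colon, for example 0.130.0
  # Version-sort the two values; the minimum must come out first (or be equal).
  lowest="$(printf '%s\n%s\n' "$minimum" "$tag" | sort -V | head -n 1)"
  [ "$lowest" = "$minimum" ]
}

if image_tag_at_least "otel/opentelemetry-collector-contrib:0.130.0" "0.130.0"; then
  echo "collector image meets the minimum version"
else
  echo "collector image is older than the minimum version"
fi
```

On a running cluster, the image reference could come from something like `kubectl get pods -o jsonpath='{.items[*].spec.containers[*].image}'`, with the namespace and selectors adjusted to the deployment.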
3. **Install the OpenTelemetry Collectors**

First, add the OpenTelemetry Helm chart repository:
I would recommend using a nested ordered list to indicate these substeps, just to make them easier to see 🙂 Words like "first," "next," and "finally" work okay, but if you need to add more steps in the future, for example, they can get unnecessarily unwieldy or difficult to keep track of.
1. Select the metric you want to edit.
1. Click **Edit** in the side panel.
1. Apply the following updates:
- `k8s.pod.cpu.usage`
I'm not sure there's anything to be done about this, but it was a little jarring to see "Select the metric you want to edit" before seeing which metrics required edits. Then, at the end, the procedure reads as though you only have to click Save once, but it looks like you actually have to click it for each metric you're modifying.
I'm kind of toying with the idea of moving this list somewhere else - like, if it were me doing it from scratch, I might put the metrics into a table below the procedure, and then have the procedure refer down to it. But it also kind of feels like a bit of a bug workaround, and that might contribute to the kind of awkward feeling around it?
Great point. I am going to reorganize it a bit.
## Correlating traces with infrastructure metrics
To correlate your APM traces with Kubernetes infrastructure metrics, Datadog uses [unified service tagging][7]. This requires setting three standard resource attributes on telemetry from both your application and your infrastructure. Datadog automatically maps these OpenTelemetry attributes to the standard Datadog tags (`env`, `service`, and `version`) used for correlation.
I clicked the `unified service tagging` link and it went to an OTel-specific section, so I was wondering if the first instance of `Datadog` here should say `OpenTelemetry`?
Understandable confusion here. Unified Service Tagging is a Datadog feature. Datadog uses this system to correlate data by mapping standard OpenTelemetry resource attributes to standard Datadog tags (`env`, `service`, `version`). I have a separate PR that should make the OTel nuance clearer, and I'll point the link to it when it's ready. But it is correct to say Datadog here.
Ah, makes sense, thank you for sating my curiosity!
Thanks for your review! Great feedback!
- `service.name`
- `service.version`
- `deployment.environment.name` (Supported in Agent v7.58.0+ and Collector Exporter v0.110.0+; otherwise, use `deployment.environment`)
I noticed that the Prerequisites section says that this guide assumes users have a newer version of the Collector Exporter. Could be worth putting the Agent version up there too?
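To make the three attributes in the list above concrete, here is a minimal sketch of setting them on an application through the standard `OTEL_RESOURCE_ATTRIBUTES` environment variable that OpenTelemetry SDKs read. The service name, version, and environment values are hypothetical placeholders, and the guide itself may set these attributes another way (for example, in the Collector configuration).

```shell
#!/bin/sh
# Hypothetical values for illustration only; replace with your own.
SERVICE_NAME="checkout"
SERVICE_VERSION="1.4.2"
DEPLOY_ENV="staging"

# Comma-separated key=value pairs, as expected by OpenTelemetry SDKs.
OTEL_RESOURCE_ATTRIBUTES="service.name=${SERVICE_NAME},service.version=${SERVICE_VERSION},deployment.environment.name=${DEPLOY_ENV}"
export OTEL_RESOURCE_ATTRIBUTES

echo "$OTEL_RESOURCE_ATTRIBUTES"
```

Per the version note in the list, setups older than Agent v7.58.0 or Collector Exporter v0.110.0 would use the `deployment.environment` key instead of `deployment.environment.name`.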
## Overview

Collect Kubernetes metrics using the OpenTelemetry Collector to gain comprehensive insights into your cluster's health and performance. This integration uses a combination of OpenTelemetry receivers to gather data, which populates the [Containers - Overview][1] dashboard.
Can we change this (and the screenshot) to be the `Kubernetes - Overview` dashboard?
```
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics --set "metricLabelsAllowlist[0]=pods=[*]"
```
**Note**: The `--set "metricLabelsAllowlist[0]=pods=[*]"` flag configures `kube-state-metrics` to include all available labels for pod-related metrics. This provides maximum detail but may increase cardinality in large clusters. For production environments, you may want to customize this to a specific list of required labels.
This is actually not required given the current configuration files. Pod `uid` is included by default in the metrics, and that's all we're using in the pipeline right now.
- `service.name`
- `service.version`
- `deployment.environment.name` (Supported in Agent v7.58.0+ and Collector Exporter v0.110.0+; otherwise, use `deployment.environment`)
Given that these instructions are for customers who won't be using the DD Agent and we've listed OTel version `0.130.0` as a prerequisite, I think we can delete the version notes. Perhaps we could just say "formerly `deployment.environment`"?
Co-authored-by: Janine Chan <[email protected]>
What does this PR do? What is the motivation?
Adds a new guide for monitoring Kubernetes clusters using the OpenTelemetry Collector. Based on the opentelemetry-examples page.
Merge instructions
Merge readiness:
For Datadog employees:
Your branch name MUST follow the `<name>/<description>` convention and include the forward slash (`/`). Without this format, your pull request will not pass CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links. If your branch doesn't follow this format, rename it or create a new branch and PR.
[6/5/2025] Merge queue has been disabled on the documentation repo. If you have write access to the repo, the PR has been reviewed by a Documentation team member, and all of the required checks have passed, you can use the Squash and Merge button to merge the PR. If you don't have write access, or you need help, reach out in the #documentation channel in Slack.
Additional notes