-
Notifications
You must be signed in to change notification settings - Fork 13
refactor: cleanup the telemetry configuration api #1002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The API was very much not clear. The spawn_with_config function took a config that would act as default, with a lower priority than fields set in the ConfigBuilder. Using Builder::run would also try to pull configuration directly from env variables which is not good since there are other sources that could override it. This was so that integrations in the sidecar, and ddtrace dotnet would work without having to specify all configurations... but it's pretty horrible. In practice after checking in libdatadog and other repositories I'm pretty sure this weird pattern can be replaced by putting the config in the builder and then overriding some fields as needed.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1002 +/- ##
==========================================
- Coverage 71.85% 71.60% -0.25%
==========================================
Files 335 334 -1
Lines 50353 50354 +1
==========================================
- Hits 36180 36058 -122
- Misses 14173 14296 +123
🚀 New features to boost your workflow:
|
BenchmarksComparisonBenchmark execution time: 2025-04-07 14:05:31 Comparing candidate commit 0c54fed in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 52 metrics, 2 unstable metrics. CandidateCandidate benchmark detailsGroup 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Group 11
Group 12
Group 13
BaselineOmitted due to size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me - nothing block. A few questions/comment.
// It should receive 3 payloads: app-started, metrics and app-closing. | ||
mock_metrics.assert_hits(3); | ||
// It should receive 1 payloads: metrics | ||
mock_metrics.assert_hits(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/question
do all the libraries have telemetry which send started
and closing
?
Just wondering if this would ever make sense to do all the telemetry via native.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes all the libraries have a telemetry cleint sending started
and closing
. This is tested in system-tests
The issue is the additional data sent with app-started and app-closing. The existing telemetry clients in libraries also send their dependencies and configuration.
There is an API for this on the telemetry client, and we could eventually replace the in-language telemetry client with the libdatadog one, but IMO this should not be done in the same initiative as the trace exporter integration to prevent scope creep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect 👌
|
||
if let Some(id) = self.runtime_id { | ||
builder.runtime_id = Some(id); | ||
} | ||
|
||
let (worker, handle) = builder | ||
.spawn_with_config(self.config) | ||
.spawn() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these two APIs with_config
and spawn_with_config
/spawn
look very confusing, could you add some docs which one to use and when?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed spawn_with_config
in favor of spawn
The issue was that the config passed in spawn_with_config
would actually be used as the default values.
So if you did this, you would end up with debug_logging_enabled = true
in the final config.
let mut builder = TelemetryWorkerBuilder::new(...):
builder.config.debug_logging_enabled = Some(true);
builder
.spawn_with_config(Config {
debug_logging_enabled: false,
..default(),
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
frankly we should just remove one, two configs all coming from manual code configuration is a bad API.
Although, I'm not going to block the PR for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I know, but I'm fixing the priority first, and then I think I'll add a builder pattern in a followup PR. The issue is that doing both at the same time makes the changes really hard to distinguish
Artifact Size Benchmark Reportaarch64-alpine-linux-musl
aarch64-unknown-linux-gnu
libdatadog-x64-windows
libdatadog-x86-windows
x86_64-alpine-linux-musl
x86_64-unknown-linux-gnu
|
This PR does 2 things
Libraries already have a telemetry client sending lifecycle events and sending duplicates from the trace exporter is a bug.
There is already an API on the TelemetryWorker to drop the lifecycle events and it's juste a matter of using it.
The API was very much not clear.
The spawn_with_config function took a config that would act as default, with a lower priority than fields set in the ConfigBuilder.
Using Builder::run would also try to pull configuration directly from env variables which is not good since there are other sources that could override it. This was so that integrations in the sidecar, and ddtrace dotnet would work without having to specify all configurations... but it's pretty horrible.
In practice after checking in libdatadog and other repositories I'm pretty sure this weird pattern can be replaced by putting the config in the builder and then overriding some fields as needed.
This PR also hides the
endpoint
behind a setter an a getter that properly set the telemetry path based on if direct submission is enabled or not.What does this PR do?
A brief description of the change being made with this pull request.
Motivation
What inspired you to submit this pull request?
Additional Notes
Anything else we should know when reviewing?
How to test the change?
Describe here in detail how the change can be validated.