fix(datadog_logs sink): Apply agent-json header on events from agent #22701
Conversation
Force-pushed from a659211 to bdd8f95.
Makes sense to me.
I think this change is also missing one component: we must also verify the message originated from the datadog_agent source.
Force-pushed from bdd8f95 to 6d33b4c.
Will take another look when this #22701 (comment) is resolved.
That actually has been implemented here.
Force-pushed from 6d33b4c to ae8d173.
Force-pushed changes that modify the previous solution. This second pass allows Vector to forcefully normalize the payload to the agent format (conditionally, upon detection of settings that enable this behavior). The logic will move any non-reserved fields to be nested under the `message` field.
Agree with the existing comments that @pront has made.
- That way a future test can use those parts to prime its test env.
- When log namespacing is enabled on the datadog_logs sink, data loss will be observed. This is due to a bug introduced in a method called `normalize_event`, which makes events conform to a standard that the Datadog logs backend defines.
- The cause is unexpected behavior in some of the methods in `LogEvent` when the underlying `Value` type is not an `Object`. When it is not an object, the value is coerced into one and the data that was in the value is lost. This is why some of the existing unit tests passed: in those tests the input was hardcoded to be an object, whereas coming from the `datadog_agent` source, the event was of type `Bytes`.
- The fix is to coerce the type into an object ahead of time (if necessary) and nest it under the `message` key, where the Datadog logs backend expects the content of the log to exist.
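For illustration, here is a minimal sketch of the shape of that fix, using `serde_json` types as a stand-in for Vector's internal event value (the function name and types here are hypothetical, not the PR's actual code):

```rust
use serde_json::{json, Map, Value};

// Hypothetical sketch: if the event's value is not already an object,
// nest it under the `message` key instead of letting a lossy coercion
// discard the original content.
fn coerce_to_object(value: Value) -> Map<String, Value> {
    match value {
        // Already an object: use it as-is.
        Value::Object(map) => map,
        // Any scalar (e.g. the raw bytes/string arriving from the
        // datadog_agent source) is preserved under `message`.
        other => {
            let mut map = Map::new();
            map.insert("message".to_string(), other);
            map
        }
    }
}

fn main() {
    let event = json!("hello from the agent");
    let normalized = coerce_to_object(event);
    assert_eq!(normalized["message"], json!("hello from the agent"));
}
```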
Force-pushed from 07e1f78 to aa0a7db.
```
@@ -66,6 +66,13 @@ pub struct DatadogLogsConfig {
    #[configurable(derived)]
    #[serde(default)]
    pub request: RequestConfig,

    /// When enabled this sink will normalize events to conform to the Datadog Agent standard. This
    /// also sends requests to the logs backend with the `DD-PROTOCOL: agent-json` header. This bool
```
Are there any Datadog docs we can link to here?
@brent-hronik are there any docs that explain specifically the agent message format?
I have a question about the efficiency of the approach below. Otherwise LGTM
src/sinks/datadog/logs/sink.rs (Outdated)
```rust
for key in keys_to_move {
    if let Some((entry_k, entry_v)) = object_map.remove_entry(key.as_str()) {
        if let Some(returned_entry_v) = message.insert(entry_k, entry_v) {
            collisions.insert(key, returned_entry_v);
        }
    }
}
```
Could this process be done the opposite way?

1. `take` the root object, replacing it with an empty object.
2. For each key in the reserved attributes, remove it from the former root into the new root.
3. Insert the remainder into `message` and re-insert `message` into the root.

This eliminates the repeated scans over the reserved attributes and the creation of any temporaries.
I just prototyped this; it ends up being more code (two loops, one for step 2 and one for step 3), and the temporaries (`keys_to_move`) I believe must stay. That is because you cannot call `.remove` on a map that you're iterating over, and for step 3 (above) we would have to iterate over all remaining keys, calling `.remove` on the same map. The main drawback being the repeated scans over the reserved attrs list.
What I was thinking was something like this, which seems different than what you described:

```rust
// Swap the root object out for an empty one in a single pass.
let mut old_root = std::mem::take(&mut object_map);
for key in DD_RESERVED_SEMANTIC_ATTRS {
    if let Some((key, value)) = old_root.remove_entry(key) {
        object_map.insert(key, value); // will never be `Some`
    }
}
for (key, value) in old_root {
    if let Some(returned_entry_v) = message.insert(key.clone(), value) {
        collisions.insert(key, returned_entry_v);
    }
}
object_map.insert(MESSAGE, message);
```
src/sinks/datadog/logs/sink.rs (Outdated)
```rust
{
    warn!(
        message = "Some duplicate field names collided with ones already existing within the 'message' field. They have been stored under a new object at 'message._collisions'.",
        internal_log_rate_limit = true,
    );
} else {
    error!(
        message = "Could not create field named _collisions at .message, a field with that name already exists.",
        internal_log_rate_limit = true,
    );
}
```
Should these be internal events that increment a metric as well? The formatting is also kinda funky.
I felt that the workflow would be for the user to observe these, then modify their processors to avoid collisions.
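For context, a hedged sketch of what an internal event with a companion counter might look like, loosely following Vector's internal-event pattern (the struct name and metric name below are hypothetical, not existing Vector identifiers):

```rust
use metrics::counter;
use tracing::warn;

// Hypothetical internal event: log the collision (real Vector would
// rate-limit this via `internal_log_rate_limit`) and bump a counter so
// collisions show up on dashboards, not only in logs.
struct FieldCollisionEvent {
    count: u64,
}

impl FieldCollisionEvent {
    fn emit(self) {
        warn!(
            message = "Duplicate field names collided with fields already present under 'message'; stored under 'message._collisions'.",
            collisions = self.count,
        );
        // Metric name is illustrative only.
        counter!("datadog_logs_message_field_collisions_total").increment(self.count);
    }
}

fn main() {
    FieldCollisionEvent { count: 1 }.emit();
}
```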
- Remove a level of indentation within normalize_as_agent() by exiting early if the event's internal value is not of a map type.
…ectordotdev#22701)

* fix(datadog_logs sink): Normalize payload to agent format
* Set DD-PROTOCOL header and conditionally apply normalization logic
* Refactor test to reuse large EventMetadata definition
* Unit tests for normalize_as_agent_event() routine
* Add changelog file
* Fix clippy error
* Update comment block
* Test agent_conforming against Vector namespaced data
* Fix broken unit test
* Break out parts of normalize_vector_namepace test, so that a future test can use those parts to prime its test env
* Fix for data loss in datadog logs sink w/ logs namespacing enabled (see the detailed description above)
* Add changelog file for datadog sink logs namespace bug
* Expand on changelog with more detail
* Add a period to the end of log messages
* cargo markdown language annotation
* Update docs
* Update comment
* Create is_reserved_attribute method, removing a level of indentation within normalize_as_agent() by exiting early if the event's internal value is not of a map type
* Slightly rewording error message

---------

Co-authored-by: Pavlos Rontidis <[email protected]>
Sorry for the late feedback; I realize the PR is already closed, but I wanted to quickly share an idea for future improvements. After this PR, the behavior looks like:

Input:

```json
{
  "a": 1,
  "status": "info",
  "message": {
    "a": 2
  }
}
```

Output:

```json
{
  "status": "info",
  "message": {
    "a": 2,
    "_collisions": {
      "a": 1
    }
  }
}
```
} Drawbacks:
Proposed solution: Instead of creating `_collisions`, move all custom attributes and the original `message` under a new `message` object:

```json
{
  "status": "info",
  "message": {
    "a": 1,
    "message": {
      "a": 2
    }
  }
}
```
} Why:
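For illustration, a minimal sketch of that proposed normalization, using `serde_json` as a stand-in for Vector's value types (the reserved-attribute list below is an assumption, not the sink's actual constant):

```rust
use serde_json::{json, Map, Value};

// Assumed reserved attributes; the real sink defines its own list.
const RESERVED: [&str; 6] = ["message", "status", "timestamp", "hostname", "service", "ddtags"];

// Move every custom attribute, plus the original `message`, under a
// fresh `message` object; reserved attributes stay at the root.
fn nest_under_message(root: Map<String, Value>) -> Map<String, Value> {
    let mut new_root = Map::new();
    let mut message = Map::new();
    for (key, value) in root {
        if key != "message" && RESERVED.contains(&key.as_str()) {
            new_root.insert(key, value);
        } else {
            message.insert(key, value);
        }
    }
    new_root.insert("message".to_string(), Value::Object(message));
    new_root
}

fn main() {
    let input = json!({ "a": 1, "status": "info", "message": { "a": 2 } });
    let Value::Object(root) = input else { unreachable!() };
    // Matches the example above:
    // { "status": "info", "message": { "a": 1, "message": { "a": 2 } } }
    println!("{}", Value::Object(nest_under_message(root)));
}
```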
Thank you, feedback implemented here.
Summary
When routing datadog_agent logs through Vector before sending to the datadog logs intake, users have complained that they lose contextual log level information.
The cause was determined to be a header, `DD-PROTOCOL: agent-json`: the agent prepares HTTP requests to the logs backend with this header, while Vector does not. This header lets the logs intake give precedence to the attributes nested within the `message` attribute, so things like the log level will depend on values arriving from the user's application. Without the header, the logs backend falls back on the value applied at the root of the message, which is usually `info` or `error`, set by the Datadog agent depending on whether the log was emitted via stdout or stderr.
The remediation is to have Vector apply the HTTP header, but conditionally: only if the event originated from the datadog_agent source and the user hasn't applied any transforms that remove or modify reserved attributes in a non-standard way.
Vector will partition events on the two aforementioned conditions; events which do not meet them will still be sent to Datadog's logs backend, but without the `DD-PROTOCOL: agent-json` header applied.
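As a rough illustration of the conditional header, assuming the `http` crate (this is a sketch, not the sink's actual request-building code; the helper name is hypothetical):

```rust
use http::request::Builder;

// Hypothetical helper: only requests built from events that satisfied
// both conditions (agent-originated, reserved attributes intact) get
// the DD-PROTOCOL header; everything else is sent unchanged.
fn apply_agent_protocol_header(builder: Builder, agent_conforming: bool) -> Builder {
    if agent_conforming {
        builder.header("DD-PROTOCOL", "agent-json")
    } else {
        builder
    }
}

fn main() {
    let request = apply_agent_protocol_header(http::Request::builder(), true)
        .uri("https://example.invalid/api/v2/logs") // placeholder URI
        .body(())
        .unwrap();
    assert_eq!(request.headers()["DD-PROTOCOL"], "agent-json");
}
```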
Change Type
Is this a breaking change?
How did you test this PR?
Tested by sending data to datadog from a service I had created that wrote messages to a local file that looked like this:
Before the change it can be seen that all of the logs in Datadog have the status of `info`; after the change the UI shows logs of type `info`, `error` and `warn`, respecting the `level` field.

Does this PR include user facing changes?
Checklist
- `make check-all` is a good command to run locally. This check is defined here. Some of these checks might not be relevant to your PR. For Rust changes, at the very least you should run:
  - `cargo fmt --all`
  - `cargo clippy --workspace --all-targets -- -D warnings`
  - `cargo nextest run --workspace` (alternatively, you can run `cargo test --all`)
- If this PR changes dependencies (`Cargo.lock`), please run `dd-rust-license-tool write` to regenerate the license inventory and commit the changes (if any). More details here.

References
#13291