fix(datadog_logs sink): Apply agent-json header on events from agent #22701


Merged

Conversation

Contributor

@graphcareful graphcareful commented Mar 20, 2025

Summary

When routing datadog_agent logs through Vector before sending them to the Datadog logs intake, users have reported losing contextual log-level information.

The cause was determined to be a missing header, DD-PROTOCOL: agent-json: the agent prepares HTTP requests to the logs backend with this header, while Vector does not.

This header tells the logs intake to give precedence to the attributes nested within the 'message' attribute. Attributes like log_level therefore reflect the values arriving from the user's application. Without the header, the logs backend falls back to the value at the root of the event, which is usually info or error, set by the Datadog Agent depending on whether the log was emitted via stdout or stderr.

The remediation is to have Vector apply the HTTP header, but only conditionally: when the event originated from the datadog_agent source and the user has not applied any transforms that remove or modify reserved attributes in a non-standard way.
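
As an illustrative sketch only (not the actual sink code): the endpoint URL and the agent_conforming flag below are assumptions for the example, using the http crate directly.

use http::Request;

// Hypothetical sketch of conditionally attaching the header; Vector's real
// request building differs.
fn build_logs_request(body: Vec<u8>, agent_conforming: bool) -> http::Result<Request<Vec<u8>>> {
    let mut builder = Request::post("https://http-intake.logs.datadoghq.com/api/v2/logs")
        .header("Content-Type", "application/json");
    if agent_conforming {
        // The intake gives precedence to attributes nested under `message`
        // only when this header is present.
        builder = builder.header("DD-PROTOCOL", "agent-json");
    }
    builder.body(body)
}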

Vector will partition events on the two aforementioned conditions; events that do not meet them will still be sent to Datadog's logs backend, but without the DD-PROTOCOL: agent-json header applied.
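
A rough, self-contained sketch of that partitioning; SimpleEvent and its fields are illustrative stand-ins for Vector's event types, not the PR's actual code.

// Illustrative stand-in for a Vector log event.
struct SimpleEvent {
    from_datadog_agent: bool,
    reserved_attrs_intact: bool,
}

// Both conditions must hold for an event to receive the agent-json header.
fn agent_conforming(event: &SimpleEvent) -> bool {
    event.from_datadog_agent && event.reserved_attrs_intact
}

// Split a batch into events sent with the `DD-PROTOCOL: agent-json` header
// and events sent without it.
fn partition(events: Vec<SimpleEvent>) -> (Vec<SimpleEvent>, Vec<SimpleEvent>) {
    events.into_iter().partition(agent_conforming)
}

fn main() {
    let batch = vec![
        SimpleEvent { from_datadog_agent: true, reserved_attrs_intact: true },
        SimpleEvent { from_datadog_agent: false, reserved_attrs_intact: true },
    ];
    let (with_header, without_header) = partition(batch);
    println!("{} with header, {} without", with_header.len(), without_header.len());
}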

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

How did you test this PR?

Tested by sending data to Datadog from a service I created that wrote messages like the following to a local file:

{"level": "info", "message": "This is sample message: 50", "time": 1742513705.3499498}
{"level": "error", "message": "This is sample message: 72", "time": 1742513706.355413}
{"level": "warn", "message": "This is sample message: 10", "time": 1742513707.3605719}

Before the change, all of the logs in Datadog had a status of info; after the change, the UI shows logs of type info, error, and warn, respecting the level field.

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • Please read our Vector contributor resources.
    • make check-all is a good command to run locally. This check is defined here.
      Some of these checks might not be relevant to your PR. For Rust changes, at the
      very least you should run:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run dd-rust-license-tool write to regenerate the license inventory and commit the
    changes (if any). More details here.

References

#13291

@graphcareful graphcareful requested a review from a team as a code owner March 20, 2025 23:42
@github-actions github-actions bot added the domain: sinks Anything related to the Vector's sinks label Mar 20, 2025
@graphcareful graphcareful force-pushed the datadog-logs-http-header branch from a659211 to bdd8f95 Compare March 20, 2025 23:44
@graphcareful graphcareful changed the title fix(datadog): Apply agent-json header on events from agent fix(datadog_logs sink): Apply agent-json header on events from agent Mar 20, 2025
Member

@bruceg bruceg left a comment


Makes sense to me.

@graphcareful
Contributor Author

I think this change is also missing one component. We must also verify that the message originated from the datadog_agent source.

@graphcareful graphcareful force-pushed the datadog-logs-http-header branch from bdd8f95 to 6d33b4c Compare March 21, 2025 15:30
pront previously requested changes Mar 21, 2025
Member

@pront pront left a comment


Will take another look when this #22701 (comment) is resolved.

@graphcareful
Contributor Author

Will take another look when this #22701 (comment) is resolved.

That has actually been implemented here.

@pront pront self-requested a review March 25, 2025 14:28
@graphcareful
Contributor Author

Force-pushed changes that modify the previous solution. This second pass allows Vector to forcefully normalize the payload to the agent format (conditional upon detection of settings that enable this behavior). The logic moves any non-reserved fields to be nested under the message key. If there are any collisions during this process, they are stored under message._collisions, and warn! or error! logs may be emitted to alert the user that this has happened. This should address the concerns about data loss that @pront and @brent-hronik mentioned.
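
A self-contained sketch of that normalization, written against serde_json rather than Vector's event types; the reserved-attribute list, function names, and collision precedence here are illustrative assumptions, not the merged code.

use serde_json::{json, Map, Value};

// Illustrative subset of reserved semantic attributes.
const RESERVED: &[&str] = &["status", "timestamp", "hostname", "service", "ddsource", "ddtags"];

fn normalize(mut root: Map<String, Value>) -> Map<String, Value> {
    // Start from the existing `message` object; a non-object message is kept
    // as a nested `message` field instead of being discarded.
    let mut message = match root.remove("message") {
        Some(Value::Object(map)) => map,
        Some(other) => {
            let mut map = Map::new();
            map.insert("message".into(), other);
            map
        }
        None => Map::new(),
    };
    let mut collisions = Map::new();
    let mut new_root = Map::new();
    for (key, value) in root {
        if RESERVED.contains(&key.as_str()) {
            // Reserved semantic attributes stay at the event root.
            new_root.insert(key, value);
        } else if message.contains_key(key.as_str()) {
            // A root field collided with one already nested under `message`;
            // keep the nested value and preserve the root value separately.
            collisions.insert(key, value);
        } else {
            message.insert(key, value);
        }
    }
    if !collisions.is_empty() {
        message.insert("_collisions".into(), Value::Object(collisions));
    }
    new_root.insert("message".into(), Value::Object(message));
    new_root
}

fn main() {
    let Value::Object(root) = json!({"a": 1, "status": "info", "message": {"a": 2}}) else {
        unreachable!()
    };
    // With serde_json's default (sorted) maps this prints:
    // {"message":{"_collisions":{"a":1},"a":2},"status":"info"}
    println!("{}", Value::Object(normalize(root)));
}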

@pront pront dismissed their stale review April 14, 2025 15:13

myself

@brent-hronik

Agree with the existing comments that @pront has.

@graphcareful graphcareful requested a review from a team as a code owner April 16, 2025 03:23
@github-actions github-actions bot added the domain: external docs Anything related to Vector's external, public documentation label Apr 16, 2025
@graphcareful graphcareful force-pushed the datadog-logs-http-header branch from 07e1f78 to aa0a7db Compare April 16, 2025 04:20
@@ -66,6 +66,13 @@ pub struct DatadogLogsConfig {
#[configurable(derived)]
#[serde(default)]
pub request: RequestConfig,

/// When enabled this sink will normalize events to conform to the Datadog Agent standard. This
/// also sends requests to the logs backend with the `DD-PROTOCOL: agent-json` header. This bool
Member

Are there any Datadog docs we can link to here?

Contributor Author

@brent-hronik are there any docs that specifically explain the agent message format?

Member

@bruceg bruceg left a comment

I have a question about the efficiency of the approach below. Otherwise LGTM

Comment on lines 198 to 204
for key in keys_to_move {
if let Some((entry_k, entry_v)) = object_map.remove_entry(key.as_str()) {
if let Some(returned_entry_v) = message.insert(entry_k, entry_v) {
collisions.insert(key, returned_entry_v);
}
}
}
Member

Could this process be done the opposite way?

  1. Take the root object, replacing it with an empty object.
  2. For each key in the reserved attributes, remove it from the former root into the new root.
  3. Insert the remainder into message and re-insert message into the root.

This eliminates the repeated scans over the reserved attributes and the creation of any temporaries.

Contributor Author

I just prototyped this; it ends up being more code (two loops, one for step 2 and one for step 3), and I believe the temporaries (keys_to_move) must stay. That is because you cannot call .remove on a map that you're iterating over, and for step 3 (above) we would have to iterate over all remaining keys, calling .remove on the same map. The main drawback remains the repeated scans over the reserved attrs list.

Member

What I was thinking was something like this, which seems different than what you described:

let mut old_root = std::mem::take(&mut object_map);
for key in DD_RESERVED_SEMANTIC_ATTRS {
    if let Some((key, value)) = old_root.remove_entry(key) {
        object_map.insert(key, value); // will never be `Some`
    }
}
for (key, value) in old_root {
    // `key` is cloned so it can still index `collisions` after the insert.
    if let Some(returned_entry_v) = message.insert(key.clone(), value) {
        collisions.insert(key, returned_entry_v);
    }
}
object_map.insert(MESSAGE, message);

Comment on lines 209 to 219
{
warn!(
message = "Some duplicate field names collided with ones already existing within the 'message' field. They have been stored under a new object at 'message._collisions'.",
internal_log_rate_limit = true,
);
} else {
error!(
message = "Could not create field named _collisions at .message, a field with that name already exists.",
internal_log_rate_limit = true,
);
}
Member

Should these be internal events that increment a metric as well? The formatting is also kinda funky.

Contributor Author

I felt that the workflow would be for the user to observe these, then modify their processors to avoid collisions.

graphcareful and others added 4 commits April 16, 2025 13:08
- Remove level of indentation within normalize_as_agent() by exiting
early if the event's internal value is not of a map type.
@graphcareful graphcareful added this pull request to the merge queue Apr 16, 2025
Merged via the queue into vectordotdev:master with commit ce170d5 Apr 16, 2025
56 checks passed
@graphcareful graphcareful deleted the datadog-logs-http-header branch April 16, 2025 22:27
pront added a commit to gllb/vector that referenced this pull request Apr 22, 2025
…ectordotdev#22701)

* fix(datadog_logs sink): Normalize payload to agent format

* Set DD-PROTOCOL header and conditionally apply normalization logic

* Refactor test to reuse large EventMetadata definition

* Unit tests for normalize_as_agent_event() routine

* Add changelog file

* Fix clippy error

* Update comment block

* Test agent_conforming against Vector namespaced data

* Fix broken unit test

* Break out parts of normalize_vector_namepace test

- That way a future test can use those parts to prime its test env

* Fix for data loss in datadog logs sink w/ logs namespacing enabled

- When logs namespacing is enabled on the datadog logs sink, data loss will be
observed. This is due to a bug introduced in a method called normalize_event,
which makes events conform to a standard that the Datadog logs backend defines.

- The cause is unexpected behavior in how some of the methods in LogEvent.rs behave
when the underlying Value type is not an Object. When it is not an object, the value
will be coerced into one and the existing data in the value will be lost. This is
why some of the existing unit tests passed: in those tests the input was hardcoded
to an object type, whereas coming from the datadog_agent source, the Event was of
type Bytes.

- The fix is to coerce the type into an object ahead of time (if necessary) and nest
it under the 'message' key, where the Datadog logs backend expects the content of
the log to exist.

* Add changelog file for datadog sink logs namespace bug

* Expand on changelog with more detail

* Add a period to the end of log messages

* cargo markdown language annotation

* Update docs

* Update comment

* Create is_reserved_attribute method

- Remove level of indentation within normalize_as_agent() by exiting
early if the event's internal value is not of a map type.

* Slightly rewording error message

---------

Co-authored-by: Pavlos Rontidis <[email protected]>
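
The coercion described in the data-loss fix above can be sketched as follows; this is a hypothetical illustration using serde_json's Value in place of Vector's, not the actual fix.

use serde_json::{Map, Value};

// If the event root is not already an object (e.g. a raw Bytes/string payload
// from the datadog_agent source), nest it under `message` instead of letting
// it be coerced into an empty object and lost.
fn coerce_to_object(value: Value) -> Map<String, Value> {
    match value {
        Value::Object(map) => map,
        other => {
            let mut map = Map::new();
            map.insert("message".into(), other);
            map
        }
    }
}

fn main() {
    let raw = Value::String("hello from the agent".into());
    println!("{}", Value::Object(coerce_to_object(raw)));
}
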
@vladimir-dd
Contributor

Sorry for the late feedback — I realize the PR is already closed, but I wanted to quickly share an idea for future improvements.

After this PR, the behavior looks like:

Input:

{
  "a": 1,
  "status": "info",
  "message": {
    "a": 2
  }
}

Output:

{
  "status": "info",
  "message": {
    "a": 2,
    "_collisions": {
      "a": 1
    }
  }
}

Drawbacks:

  • Introduces a non-standard _collisions field that users are not expecting.

Proposed solution:

Instead of creating _collisions, move all custom attributes and the original message into a new top-level message field, leaving reserved fields at the root.
Result:

{
  "status": "info",
  "message": {
    "a": 1,
    "message": {
      "a": 2
    }
  }
}

Why:

  • No new fields are invented.
  • The Logs Intake already parses the message recursively and naturally handles any conflicts.
  • This preserves the behavior users were familiar with before the change.

@graphcareful
Contributor Author

graphcareful commented Apr 28, 2025

Sorry for the late feedback — I realize the PR is already closed, but I wanted to quickly share an idea for future improvements.

Thank you; feedback implemented here.
