-
Notifications
You must be signed in to change notification settings - Fork 2k
enhancement(observability): Add metrics to measure total event processing time #24481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
We want to record the time at which events are ingested in order to track latency as each travels through the topology. This is currently recorded for log events using the Vector namespace in the `vector.ingest_timestamp` metadata field, but we want it to be usable for all event types. As such, we need a new field in `struct EventMetadata`. This change adds the new field as an `Option` so as to retain sane semantics for `Default` implementations and to avoid extra calls to `Utc::now`. The `SourceSender::send` method sets this if the source doesn't so as to ensure complete coverage. For backward compatibility, this metadata is still inserted into the Vector log namespace metadata, taken from this new field. Since this metadata is set up before passing the events to the `SourceSender`, the ingest timestamp is set manually in sources that can create Vector namespace logs.
|
One question regarding the scale of the metrics here for use in a histogram: The processing time for each event is typically very small. This results in a histogram where all (or almost all) of the counts land in the smallest bucket which is rather undesirable. Should we scale the numbers to milliseconds (relatively standard) or microseconds (more precise), extend the smallest buckets down smaller yet, or something else? |
…sing time This adds an optional `trait BufferInstrumentation` hook to `struct BufferSender` which is called at the very start of the buffer send path. We use that hook to take the previously added universal event `ingest_timestamp` metadata and from it calculate the total time spent processing the event, including buffering delays. This time is emitted in internal metrics as a `event_processing_time_seconds` histogram and `event_processing_time_mean_seconds` gauge, the latter using an EWMA to smooth the mean over time.
edcd7d8 to
d5e5b7e
Compare
drichards-87
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a small suggestion on the PR from the Docs Team and approved the PR.
Summary
This is a combination of two major changes:
latency as each travels through the topology. This is currently recorded for log
events using the Vector namespace in the
vector.ingest_timestampmetadatafield, but we want it to be usable for all event types. As such, we need a new
field in
struct EventMetadata.This change adds the new field as an
Optionso as to retain sane semantics forDefaultimplementations and to avoid extra calls toUtc::now. TheSourceSender::sendmethod sets this if the source doesn't so as to ensurecomplete coverage.
For backward compatibility, this metadata is still inserted into the Vector log
namespace metadata, taken from this new field. Since this metadata is set up
before passing the events to the
SourceSender, the ingest timestamp is setmanually in sources that can create Vector namespace logs.
trait BufferInstrumentationhook tostruct BufferSenderwhich is called at the very start of the buffer send path. We usethat hook to take the previously added universal event
ingest_timestampmetadata and from it calculate the total time spent processing the event,
including buffering delays. This time is emitted in internal metrics as a
event_processing_time_secondshistogram andevent_processing_time_mean_secondsgauge, thelatter using an EWMA to smooth the mean over time.
This PR is broken down into 4 incremental changes, so if it is too large to review in one go, I can trivially break it apart.
Vector configuration
Same configuration as #24453 with simply
internal_metrics, a transform, andconsole, and there are the new metrics.How did you test this PR?
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Notes
@vectordotdev/vectorto reach out to us regarding this PR.pre-pushhook, please see this template.make fmtmake check-clippy(if there are failures it's possible some of them can be fixed withmake clippy-fix)make testgit merge origin masterandgit push.Cargo.lock), pleaserun
make build-licensesto regenerate the license inventory and commit the changes (if any). More details here.