Publish queue latency metrics from tracked thread pools #120488

Open

nicktindall wants to merge 25 commits into main

Conversation

@nicktindall (Contributor) commented Jan 21, 2025

We only publish queue latency for thread pools for which EsExecutors.TaskTrackingConfig#trackExecutionTime is true.

We only use existing timestamps (the queue time is measured as the time between TimedRunnable#creationTimeNanos and TimedRunnable#startTimeNanos).
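
That is, roughly (a hypothetical helper for illustration only, not code from the PR):

import java.util.concurrent.TimeUnit;

// Hypothetical helper: queue time is the gap between task creation and the start
// of execution, both of which TimedRunnable already records.
static long queueLatencyMillis(long creationTimeNanos, long startTimeNanos) {
    return TimeUnit.NANOSECONDS.toMillis(startTimeNanos - creationTimeNanos);
}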

We maintain an in-memory ExponentialBucketHistogram (formerly HandlingTimeTracker) for each monitored thread pool to track the queue latencies of the tasks executed. Each time we poll for metrics we publish a hard-coded set of percentiles (I put in the 50th and 90th to begin with) as gauge values. This makes querying possible with ES|QL and will allow ordering/filtering on those values.

After we've published the values we clear the histogram to start collecting observations for the next interval.
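
For illustration, a minimal sketch of that record-then-clear behaviour (class name, bucket layout and methods here are assumptions for the example, not the actual ExponentialBucketHistogram):

import java.util.concurrent.atomic.AtomicLongArray;

// Illustrative sketch: an exponential-bucket histogram that is cleared after each
// metrics publication, so the published percentiles cover one polling interval.
class QueueLatencyHistogramSketch {
    private static final int BUCKET_COUNT = 18;
    // Bucket i counts observations in [2^i, 2^(i+1)) ms; bucket 0 also catches values < 1.
    private final AtomicLongArray buckets = new AtomicLongArray(BUCKET_COUNT);

    void addObservation(long queueTimeMillis) {
        int bucket = queueTimeMillis < 1
            ? 0
            : Math.min(BUCKET_COUNT - 1, 63 - Long.numberOfLeadingZeros(queueTimeMillis));
        buckets.incrementAndGet(bucket);
    }

    long[] snapshot() {
        long[] counts = new long[BUCKET_COUNT];
        for (int i = 0; i < BUCKET_COUNT; i++) {
            counts[i] = buckets.get(i);
        }
        return counts;
    }

    void clear() {
        // Not atomic with respect to concurrent addObservation calls; observations
        // racing with clear() may be attributed to the next interval.
        for (int i = 0; i < BUCKET_COUNT; i++) {
            buckets.set(i, 0);
        }
    }
}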

Happy to break this up (e.g. rename the histogram/implement percentiles, then add the metric) if we think this PR is doing too much.

Closes: ES-10531

handlingTimeTracker.clear();
return metricValues;
}
);
@nicktindall (Contributor, Author):

I wonder if, rather than publishing a time series per percentile (using a percentile attribute), we should publish a metric per percentile.
The metric makes no sense if you don't filter by a percentile label.

Member:

Is it easier to plot different percentiles on the same graph with labels (and group by) compared to two different time series?

@nicktindall (Contributor, Author):

I don't think that makes a difference, but I'm not sure.

@nicktindall (Contributor, Author):

Yes, having a look at Kibana just now, it would be much easier to plot as a single metric grouped by the percentiles. As separate metrics we'd need to add them as distinct time series.
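
For example, the single-metric shape would look roughly like this (a sketch, not the PR's code: registerLongsGauge is the method named in the suggestion further down the thread, but the exact signature, metric name, unit and attribute key here are assumptions):

import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

import org.elasticsearch.telemetry.metric.LongWithAttributes;
import org.elasticsearch.telemetry.metric.MeterRegistry;

// Illustrative sketch only: one gauge whose observations carry a "percentile"
// attribute, so Kibana/ES|QL can group by percentile instead of juggling
// separate metrics. The supplier is assumed to return {p50, p90} in millis and
// reset the underlying histogram, per the PR description.
class QueueLatencyGaugeSketch {
    static void register(MeterRegistry meterRegistry, Supplier<long[]> percentilesSupplier) {
        meterRegistry.registerLongsGauge(
            "es.thread_pool.write.threads.queue.latency.histogram", // naming discussed later in the thread
            "Queue latency percentiles for the write thread pool",
            "ms",
            () -> {
                long[] percentiles = percentilesSupplier.get();
                return List.of(
                    new LongWithAttributes(percentiles[0], Map.of("percentile", "50")),
                    new LongWithAttributes(percentiles[1], Map.of("percentile", "90"))
                );
            }
        );
    }
}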

private final Function<Runnable, WrappedRunnable> runnableWrapper;
private final ExponentiallyWeightedMovingAverage executionEWMA;
private final LongAdder totalExecutionTime = new LongAdder();
private final boolean trackOngoingTasks;
// The set of currently running tasks and the timestamp of when they started execution in the Executor.
private final Map<Runnable, Long> ongoingTasks = new ConcurrentHashMap<>();
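// Records the queue latency of each executed task; cleared each time the queue latency percentiles are published.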
private final HandlingTimeTracker handlingTimeTracker = new HandlingTimeTracker();
@nicktindall (Contributor, Author) commented Mar 11, 2025:

By using a HandlingTimeTracker we can publish specific percentile streams. This is as opposed to using an APM histogram metric, which comes with a lot of limitations (it doesn't work in ES|QL and can't be sorted or filtered on) and is costly to aggregate. The histogram is cleared each time we publish the percentiles, so the percentiles are for samples received since the last publish.

If we agree with this approach, I think it might be worth moving the HandlingTimeTracker somewhere common and giving it a more generic name (e.g. ExponentialBucketHistogram).

HandlingTimeTracker is the simplest possible solution for publishing percentiles. We could get more accurate metrics over a window decoupled from the metric polling interval if we used something like exponential histograms or decaying histograms, though they would likely incur a larger synchronisation overhead.

}
assert false : "We shouldn't ever get here";
return Long.MAX_VALUE;
}
@nicktindall (Contributor, Author) commented Mar 11, 2025:

I opted to implement an "exclusive" percentile because it was easier with the way the counts are stored in HandlingTimeTracker (the arrival of a value increments the count of the first bucket with lower bound <= the value). If we wanted to get fancy we could look at interpolation, but I don't think it's necessary.
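
Concretely, the calculation is a walk over the bucket counts that returns the containing bucket's upper bound once the cumulative count reaches the requested rank (a sketch only; the bucket bounds, rank rounding and inclusive/exclusive details here are illustrative and may differ from the PR):

// Illustrative bucket-walk percentile: returns the upper bound of the first bucket
// at which the cumulative count reaches ceil(percentile * total). No interpolation
// is attempted within a bucket.
static long percentileUpperBound(long[] bucketCounts, long[] bucketUpperBounds, float percentile) {
    long total = 0;
    for (long count : bucketCounts) {
        total += count;
    }
    if (total == 0) {
        return 0;
    }
    final long targetRank = (long) Math.ceil(total * percentile);
    long cumulative = 0;
    for (int i = 0; i < bucketCounts.length; i++) {
        cumulative += bucketCounts[i];
        if (cumulative >= targetRank) {
            return bucketUpperBounds[i];
        }
    }
    assert false : "We shouldn't ever get here";
    return Long.MAX_VALUE;
}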

public String toCompositeString() {
return nodeName == null ? threadPoolName : nodeName + "/" + threadPoolName;
}
}
@nicktindall (Contributor, Author):

I added this so we could separate the node name and thread pool name for the purpose of generating valid metric names.

e.g. es.thread_pool.instance-000003/write.threads.queue.latency.histogram is invalid; we want es.thread_pool.write.threads.queue.latency.histogram instead, so it's better to pass in something structured and unambiguous than to make assumptions about the format of the string.
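
For illustration, the structured form could be as simple as this (record and method names are hypothetical; only the toCompositeString shape above comes from the PR):

// Illustrative: keep the parts separate so metric names only ever contain the
// thread pool name, never the "nodeName/threadPoolName" composite.
record ThreadPoolName(String nodeName, String threadPoolName) {

    String metricName(String suffix) {
        // e.g. metricName("threads.queue.latency.histogram")
        //   -> "es.thread_pool.write.threads.queue.latency.histogram"
        return "es.thread_pool." + threadPoolName + "." + suffix;
    }

    String toCompositeString() {
        // Human-readable form for logs, e.g. "instance-000003/write"
        return nodeName == null ? threadPoolName : nodeName + "/" + threadPoolName;
    }
}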

@ywangd (Member):

I wonder whether an alternative could be to add a new method to each of ExecutorHolder and TaskExecutionTimeTrackingEsThreadPoolExecutor, something like:

class ExecutorHolder {
    void registerMetrics(MeterRegistry meterRegistry) {
        if (executor instanceof TaskExecutionTimeTrackingEsThreadPoolExecutor te) {
            te.registerMetrics(info.name, meterRegistry);
        }
    }
}

class TaskExecutionTimeTrackingEsThreadPoolExecutor {
    void registerMetrics(String name, MeterRegistry meterRegistry) {
        meterRegistry.registerLongsGauge(...);
    }
}

and call ExecutorHolder#registerMetrics after building it in ThreadPool. I think that avoids the need for this class and the associated cascading changes?

@nicktindall (Contributor, Author):

Yes I think you're on to something there. I will review that tomorrow.

@nicktindall (Contributor, Author):

OK I've tidied that up now, thanks for the suggestion @ywangd

@nicktindall nicktindall marked this pull request as ready for review March 26, 2025 00:29
@nicktindall nicktindall requested a review from a team as a code owner March 26, 2025 00:29
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Mar 26, 2025
@nicktindall nicktindall added >enhancement :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) :Core/Infra/Metrics Metrics and metering infrastructure and removed needs:triage Requires assignment of a team area label labels Mar 26, 2025
@elasticsearchmachine (Collaborator):

Hi @nicktindall, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team Team:Distributed Coordination Meta label for Distributed Coordination team labels Mar 26, 2025
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-core-infra (Team:Core/Infra)

@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Labels
:Core/Infra/Metrics Metrics and metering infrastructure :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement Team:Core/Infra Meta label for core/infra team Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0

3 participants