From a81e9d10a74ed39ae4ee96d24ff026db4aaf946c Mon Sep 17 00:00:00 2001 From: Jason Stirnaman Date: Wed, 13 Aug 2025 13:12:37 -0500 Subject: [PATCH 1/4] feat(clustered): add dashboard screenshot guidance for query performance issues Adds specific IOx Querier dashboard metrics to capture when reporting query performance issues to InfluxData engineers, including: - CPU utilization and resource limits - Object Store Traffic/Latency metrics - Cache Requests and miss rates - Query concurrency and rate metrics - Parquet files per query counts - Request Duration timing Also includes sample EXPLAIN ANALYZE output to help users focus on the most critical performance bottleneck indicators. Addresses secondary issue in influxdata/DAR#514 --- .../report-query-performance-issues.md | 35 +++++++++++++++++-- 1 file changed, 32 insertions(+), 3 deletions(-) diff --git a/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md b/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md index 0fc9d97b42..f4a7dd61f5 100644 --- a/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md +++ b/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md @@ -145,14 +145,24 @@ Your test findings and associated debug information from your Kubernetes environment can help recommend configuration changes to improve query performance as your usage scales. - +On the IOx Querier dashboard, capture screenshots showing: + +- **CPU utilization**: Is it running high (close to the limits you set)? 
+- **Object Store Traffic/Latency**: Often a major contributor to performance issues +- **Cache Requests bytes**: Shows cache misses as separate series +- **Query concurrency and rate metrics**: + - grpc Requests + - Query Rate + - Query Concurrency (note the 10-minute maximum limitation) +- **Parquet files per query**: Number of files accessed per query +- **Request Duration...DoGet**: Query execution timing ### Gather debug information @@ -317,6 +327,25 @@ curl --get "https://{{< influxdb/host >}}/query" \ {{% /code-placeholders %}} +The `EXPLAIN ANALYZE` output can be dense and hard to read. +Focus on the sections with the highest `elapsed_compute` times, as these indicate performance bottlenecks. +For example, here is extracted timing data from an ANALYZE output showing the most time-consuming operations: + +```text +DeduplicateExec + └→ elapsed_compute=3.514663491s 3514.66ms +SortPreservingMergeExec + └→ elapsed_compute=12.440516244s 12440.52ms +SortExec + └→ elapsed_compute=993.952663ms 993.95ms +AggregateExec + └→ elapsed_compute=406.163116ms 406.16ms +ParquetExec + └→ time_elapsed_scanning_total=1044.149737489s 1044149.74ms + └→ time_elapsed_opening=3.001925899s 3001.93ms + └→ time_elapsed_processing=2.255025048s 2255.03ms +``` + ### Gather system information > [!Warning] From 96fdb3ea0bff0f8d9e4fa4751b2b253be29ffc76 Mon Sep 17 00:00:00 2001 From: Jason Stirnaman Date: Wed, 13 Aug 2025 14:26:30 -0500 Subject: [PATCH 2/4] Update content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md --- .../report-query-performance-issues.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md b/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md index f4a7dd61f5..68beabf39f 100644 --- 
a/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md +++ b/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md @@ -327,8 +327,10 @@ curl --get "https://{{< influxdb/host >}}/query" \ {{% /code-placeholders %}} -The `EXPLAIN ANALYZE` output can be dense and hard to read. -Focus on the sections with the highest `elapsed_compute` times, as these indicate performance bottlenecks. + ```suggestion +Include `EXPLAIN ANALYZE` output. + +When using the output for troubleshooting performance, focus on the sections with the highest `elapsed_compute` times, as these indicate performance bottlenecks. For example, here is extracted timing data from an ANALYZE output showing the most time-consuming operations: ```text From 429e413c2ed66438be91567b9daa0848bc842589 Mon Sep 17 00:00:00 2001 From: Jason Stirnaman Date: Wed, 13 Aug 2025 14:27:30 -0500 Subject: [PATCH 3/4] Update content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md --- .../report-query-performance-issues.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md b/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md index 68beabf39f..d58f124e32 100644 --- a/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md +++ b/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md @@ -147,7 +147,7 @@ improve query performance as your usage scales. ### Capture dashboard screenshots -For query performance issues, always capture screenshots of the query dashboard as a first step. +For query performance issues, always capture screenshots of the Querier Dashboard as a first step. 
If you have set up alerts and dashboards for monitoring your cluster, capture screenshots of dashboard events for Queriers, Compactors, and Ingesters. From 11af1444eff72dd64d2ed60d8411050290c8da86 Mon Sep 17 00:00:00 2001 From: Jason Stirnaman Date: Wed, 13 Aug 2025 14:28:17 -0500 Subject: [PATCH 4/4] Update content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md --- .../report-query-performance-issues.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md b/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md index d58f124e32..91abd1593c 100644 --- a/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md +++ b/content/influxdb3/clustered/query-data/troubleshoot-and-optimize/report-query-performance-issues.md @@ -152,7 +152,7 @@ For query performance issues, always capture screenshots of the Querier Dashboar If you have set up alerts and dashboards for monitoring your cluster, capture screenshots of dashboard events for Queriers, Compactors, and Ingesters. -On the IOx Querier dashboard, capture screenshots showing: +On the Querier dashboard, capture screenshots showing: - **CPU utilization**: Is it running high (close to the limits you set)? - **Object Store Traffic/Latency**: Often a major contributor to performance issues
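The guidance in PATCH 1/4 to focus on the operators with the highest `elapsed_compute` times can be sketched as a small script. This is a minimal sketch and not part of the patch series: it assumes the timing data is already in the `name=value` text form shown in the patch's sample, and `rank_operators`, `SAMPLE`, and the regular expressions are hypothetical names introduced here for illustration.

```python
import re

# Conversion factors to milliseconds for the duration suffixes handled here.
UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "µs": 1e-3, "ms": 1.0, "s": 1000.0}

# Assumption: operator names end in "Exec", as in the sample from the patch.
OPERATOR = re.compile(r"\b(\w+Exec)\b")
# Assumption: only the timing metrics named in the patch's sample are of interest.
METRIC = re.compile(
    r"\b(elapsed_compute|time_elapsed_\w+)=([\d.]+)(ns|us|µs|ms|s)\b"
)

# Timing data copied from the sample in PATCH 1/4.
SAMPLE = """\
DeduplicateExec
  └→ elapsed_compute=3.514663491s 3514.66ms
SortPreservingMergeExec
  └→ elapsed_compute=12.440516244s 12440.52ms
SortExec
  └→ elapsed_compute=993.952663ms 993.95ms
AggregateExec
  └→ elapsed_compute=406.163116ms 406.16ms
ParquetExec
  └→ time_elapsed_scanning_total=1044.149737489s 1044149.74ms
  └→ time_elapsed_opening=3.001925899s 3001.93ms
  └→ time_elapsed_processing=2.255025048s 2255.03ms
"""


def rank_operators(text):
    """Return (operator, metric, milliseconds) tuples, slowest first."""
    rows = []
    current = None  # most recently seen operator name
    for line in text.splitlines():
        op = OPERATOR.search(line)
        if op:
            current = op.group(1)
        for name, value, unit in METRIC.findall(line):
            rows.append((current, name, float(value) * UNIT_TO_MS[unit]))
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = rank_operators(SAMPLE)
for operator, metric, ms in ranked:
    print(f"{ms:>14.2f} ms  {operator}  {metric}")
```

With the sample data, the `ParquetExec` scan time ranks first, followed by `SortPreservingMergeExec`. Scan timings may be accumulated across partitions, so they are best read as relative indicators rather than wall-clock durations.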