Skip to content

Conversation

pracucci
Copy link
Collaborator

@pracucci pracucci commented Sep 25, 2025

What this PR does

In this PR I propose to add an experimental config option to exclude the le when compute the series hash, used for sharding to ingesters / partitions.

This is to confirm whether the flakyness we see when querying the most recent data from classic histograms is due to inconsistent bucket values when the ingest storage is used. We see the similar issue in classic architecture, and apparently the issue disappear if this same exact change is done in Alloy remote storage sharding, but it still persist in the ingest storage architecture. My theory is that it's because Mimir suffers the same issue too, so I would like to experiment removing the le from classic histograms sharding.

No measurable perf impact:

goos: darwin
goarch: arm64
pkg: github.com/grafana/mimir/pkg/mimirpb
cpu: Apple M3 Pro
                                                                             │ before.txt  │             after.txt             │
                                                                             │   sec/op    │   sec/op     vs base              │
ShardBySeriesLabels/ExcludeClassicHistogramBucketLabel_is_disabled-11          148.8n ± 2%   153.3n ± 3%  +3.06% (p=0.004 n=6)
ShardBySeriesLabelAdapters/ExcludeClassicHistogramBucketLabel_is_disabled-11   133.0n ± 0%   133.2n ± 1%  +0.15% (p=0.043 n=6)
ShardBySeriesLabels/ExcludeClassicHistogramBucketLabel_is_enabled-11                         153.1n ± 3%
ShardBySeriesLabelAdapters/ExcludeClassicHistogramBucketLabel_is_enabled-11                  134.2n ± 6%
geomean                                                                        140.7n        143.1n       +1.59%

                                                                             │  before.txt  │             after.txt              │
                                                                             │     B/op     │    B/op     vs base                │
ShardBySeriesLabels/ExcludeClassicHistogramBucketLabel_is_disabled-11          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
ShardBySeriesLabelAdapters/ExcludeClassicHistogramBucketLabel_is_disabled-11   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
ShardBySeriesLabels/ExcludeClassicHistogramBucketLabel_is_enabled-11                          0.000 ± 0%
ShardBySeriesLabelAdapters/ExcludeClassicHistogramBucketLabel_is_enabled-11                   0.000 ± 0%
geomean                                                                                   ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                                                             │  before.txt  │             after.txt              │
                                                                             │  allocs/op   │ allocs/op   vs base                │
ShardBySeriesLabels/ExcludeClassicHistogramBucketLabel_is_disabled-11          0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
ShardBySeriesLabelAdapters/ExcludeClassicHistogramBucketLabel_is_disabled-11   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=6) ¹
ShardBySeriesLabels/ExcludeClassicHistogramBucketLabel_is_enabled-11                          0.000 ± 0%
ShardBySeriesLabelAdapters/ExcludeClassicHistogramBucketLabel_is_enabled-11                   0.000 ± 0%
geomean                                                                                   ²               +0.00%               ²
¹ all samples are equal
² summaries must be >0 to compute geomean

Which issue(s) this PR fixes or relates to

Ref: https://github.com/grafana/mimir-squad/issues/3263

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Copy link
Contributor

github-actions bot commented Sep 25, 2025

@pracucci pracucci marked this pull request as ready for review September 25, 2025 13:22
@pracucci pracucci requested review from tacole02 and a team as code owners September 25, 2025 13:22
Copy link
Contributor

@seizethedave seizethedave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM%C

"validation.max-cost-attribution-cardinality": 2000,
"validation.cost-attribution-cooldown": 0,
"ruler.evaluation-delay-duration": 60000000000,
"ruler.evaluation-consistency-max-delay": 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Know where this diff came from?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new file introduced in #12766. There was a race between merging that PR and #12751, reason why the new config option (-ruler.evaluation-consistency-max-delay) wasn't included in Dimitar's PR.

return
}

h = HashAdd32(h, l.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it wouldn't improve anything to include the label name in the hash as all series in foo_seconds_histogram have that label, correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think adding just the label name makes any practical difference.

@pracucci pracucci force-pushed the shard-without-le-label branch from 4bd5b2d to 32cf5f2 Compare September 26, 2025 06:41
@pracucci
Copy link
Collaborator Author

Tested in a dev environment and doesn't solve the issue we're experiencing, so switch it back to draft while I keep investigating it.

@pracucci pracucci marked this pull request as draft September 26, 2025 09:27
Copy link
Contributor

@tacole02 tacole02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs look good! I left a few minor suggestions. Thank you!

* [ENHANCEMENT] Ingester: Improved the performance of active series custom trackers matchers. #12663
* [ENHANCEMENT] Compactor: Log sizes of downloaded and uploaded blocks. #12656
* [ENHANCEMENT] Ingester: Add `cortex_ingest_storage_reader_receive_and_consume_delay_seconds` metric tracking the time between when a write request is received in the distributor and its content is ingested in ingesters, when the ingest storage is enabled. #12751
* [ENHANCEMENT] Distributor: Add experimental configuration option `-distributor.sharding.exclude-classic-histogram-bucket-label` to compute the series hash – used for sharding – excluding the 'le' label. When this configuration option is set to true, all buckets of a classic histogram metric are stored in the same shard. #12815
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* [ENHANCEMENT] Distributor: Add experimental configuration option `-distributor.sharding.exclude-classic-histogram-bucket-label` to compute the series hashused for shardingexcluding the 'le' label. When this configuration option is set to true, all buckets of a classic histogram metric are stored in the same shard. #12815
* [ENHANCEMENT] Distributor: Add anexperimental configuration option, `-distributor.sharding.exclude-classic-histogram-bucket-label`, to compute the series hash, used for sharding, excluding the 'le' label. When you set this configuration option to true, all buckets of a classic histogram metric are stored in the same shard. #12815

"kind": "field",
"name": "exclude_classic_histogram_bucket_label",
"required": false,
"desc": "When set to true, the distributor computes the series hash – used for sharding – excluding the 'le' label. This means that, when this configuration option is set to true, all buckets of a classic histogram metric are stored in the same shard. This configuration option should be set for distributors, rulers and ingesters.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"desc": "When set to true, the distributor computes the series hashused for shardingexcluding the 'le' label. This means that, when this configuration option is set to true, all buckets of a classic histogram metric are stored in the same shard. This configuration option should be set for distributors, rulers and ingesters.",
"desc": "When set to true, the distributor computes the series hash, used for sharding, excluding the 'le' label. This means that when you set this configuration option to true, all buckets of a classic histogram metric are stored in the same shard. Set this configuration option for distributors, rulers, and ingesters.",

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants