@narqo narqo commented Jan 7, 2026

What this PR does

Following #13875 and #13963

This PR adds a new cortex_bucket_store_bucket_index_discovery_difference_total metric that tracks how often a store-gateway receives requests from querier instances whose bucket-index version differs from the version the store-gateway is aware of.
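
To make the classification concrete, here is a minimal stdlib-only sketch of how a request could be bucketed into "newer" or "older". It is not the PR's code: the label values, the `discoveryDiffCounter` type, the integer versions, and the sign convention (`diff = ours - theirs`, where "ours" is the store-gateway's version) are all assumptions made for illustration. A real implementation would presumably use a Prometheus counter vector rather than a map.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical label values; the PR only shows the constant names.
const (
	labelDiscoveryDiffNewer = "newer"
	labelDiscoveryDiffOlder = "older"
)

// discoveryDiffCounter is a stand-in for a labelled counter such as
// cortex_bucket_store_bucket_index_discovery_difference_total.
type discoveryDiffCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newDiscoveryDiffCounter() *discoveryDiffCounter {
	return &discoveryDiffCounter{counts: make(map[string]int)}
}

// observe compares the store-gateway's bucket-index version ("ours") with the
// version sent by the querier ("theirs"). It increments the matching group and
// returns its label, or returns "" when both sides agree.
// Assumption: versions are comparable integers and diff = ours - theirs.
func (c *discoveryDiffCounter) observe(ours, theirs int64) string {
	diff := ours - theirs
	if diff == 0 {
		return ""
	}

	var group string
	if diff > 0 {
		group = labelDiscoveryDiffNewer
	} else {
		group = labelDiscoveryDiffOlder
	}

	c.mu.Lock()
	c.counts[group]++
	c.mu.Unlock()
	return group
}

// count returns how many requests were recorded for the given group.
func (c *discoveryDiffCounter) count(group string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[group]
}

func main() {
	c := newDiscoveryDiffCounter()
	c.observe(100, 100) // same version: nothing recorded
	c.observe(120, 100) // store-gateway is ahead of the querier
	c.observe(90, 100)  // store-gateway is behind the querier
	fmt.Println("newer:", c.count(labelDiscoveryDiffNewer), "older:", c.count(labelDiscoveryDiffOlder))
}
```

Under this convention, the "older" group is the one the hypothesis expects to stay at zero.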

This is how the debug data looked in a live dev cell (I'm not a huge fan of the git-style "ours" vs. "theirs" naming, but that's what I came up with):

[Image: Screenshot 2026-01-07 at 21 19 34]

Our hypothesis is that (typically) there should be zero cases where queriers discover a new version of the bucket-index faster than store-gateways. To start, the new metric allows us to prove this theory.

I'm thinking that in the future we may move the passed bucket index from the gRPC context to the request's hints, to make it an (optional) part of the API contract. The theory is that we can leverage the fact that the querier and store-gateway agree on their view of the bucket-index (this is still hand-wavy for the moment; no concrete details here).

Which issue(s) this PR fixes or relates to

Relates to https://github.com/grafana/mimir-squad/issues/3373

@narqo narqo added the changelog-not-needed PRs that don't need a CHANGELOG.md entry label Jan 7, 2026
@narqo narqo force-pushed the vldmr/store-gw-recv-bucket-index-version branch 2 times, most recently from f7f4e2f to 8c1b022 Compare January 8, 2026 14:05
@narqo narqo force-pushed the vldmr/store-gw-recv-bucket-index-version branch from 688ca2c to db59721 Compare January 8, 2026 16:49
@narqo narqo changed the title wip! store-gateway: receive bucket index metadata from queriers store-gateway: receive bucket index metadata from queriers Jan 8, 2026
@narqo narqo marked this pull request as ready for review January 8, 2026 16:53
@narqo narqo requested a review from a team as a code owner January 8, 2026 16:53
@alexweav alexweav left a comment

LGTM

if diff > 0 {
	group = labelDiscoveryDiffNewer
} else {
	group = labelDiscoveryDiffOlder

> Our hypothesis that (typically) there should be zero cases, where queriers discover a new version of bucket-index faster than store-gateways

If this is the case, it might be cheaper to just log at warning level when it's unexpectedly older, instead of introducing new metrics that are always recorded but in theory will be entirely in the Newer category?

@narqo narqo Jan 9, 2026

Now that you mention it, I wonder if we really need this metric at all. How about we start with only a warning (and a debug log for the normal case)? WDYT? Ref 31d7e2d

narqo added 2 commits January 9, 2026 13:20
Signed-off-by: Vladimir Varankin <[email protected]>
Signed-off-by: Vladimir Varankin <[email protected]>
@narqo narqo force-pushed the vldmr/store-gw-recv-bucket-index-version branch from 0616302 to 0db1823 Compare January 9, 2026 12:25