Contributor

@zenador zenador commented Jan 5, 2026

What this PR does

Makes the delayed name removal feature configurable per tenant. This applies only to MQE. The feature stays disabled in the Prometheus engine, and this PR does not add a flag to control it there because that engine is only used as a fallback (please let me know if we should add one).

Tested locally in docker-compose.

Which issue(s) this PR fixes or relates to

Follow up to #12509

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]. If changelog entry is not needed, please add the changelog-not-needed label to the PR.
  • about-versioning.md updated with experimental features.

Note

Moves delayed name removal control from a global querier flag to a per-tenant limit (MQE-only) and threads it through planning, execution, and analysis.

  • Adds per-tenant limit enable_delayed_name_removal (CLI -querier.enable-delayed-name-removal); removes querier.enable_delayed_name_removal from querier config and disables Prometheus engine’s version
  • MQE now reads the setting via QueryLimitsProvider (new GetEnableDelayedNameRemoval); returns an error on mixed-tenant conflicts
  • QueryPlanner.NewQueryPlan now accepts enableDelayedNameRemoval; analysis handler takes limits provider and passes the setting through
  • Refactors eliminate-dedupe optimization to take the flag per-plan instead of storing it; updates help text, docs, config descriptor, and CHANGELOG
  • Updates tests/benchmarks: new NewStaticQueryLimitsProvider(limit, enable) signature and added boolean arg to planner calls
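
As a rough illustration of the mixed-tenant rule in the summary above, a per-tenant boolean could be resolved for a multi-tenant query along these lines. This is only a sketch: `resolveEnableDelayedNameRemoval`, `errConflictingSettings`, and the `map[string]bool` standing in for the per-tenant limits are hypothetical names, not the PR's actual `GetEnableDelayedNameRemoval` implementation, and rejecting an empty tenant list is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflictingSettings is a hypothetical sentinel error used only in this sketch.
var errConflictingSettings = errors.New("tenants have conflicting delayed name removal settings")

// resolveEnableDelayedNameRemoval returns the value shared by every tenant in
// the query, or an error if the tenants disagree. perTenant is a stand-in for
// the real per-tenant limits lookup; tenants missing from it count as disabled.
func resolveEnableDelayedNameRemoval(tenantIDs []string, perTenant map[string]bool) (bool, error) {
	if len(tenantIDs) == 0 {
		// Assumption: a query without tenants is treated as an error.
		return false, errors.New("no tenant IDs in query")
	}
	first := perTenant[tenantIDs[0]]
	for _, id := range tenantIDs[1:] {
		if perTenant[id] != first {
			return false, fmt.Errorf("%w: %q and %q", errConflictingSettings, tenantIDs[0], id)
		}
	}
	return first, nil
}
```

With limits of `user-1: false`, `user-2: true`, `user-3: true`, a query for `user-2|user-3` resolves to enabled, while `user-1|user-2` returns the conflict error.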

Written by Cursor Bugbot for commit 847fd12. This will update automatically on new commits.

@zenador zenador requested review from a team and tacole02 as code owners January 5, 2026 22:38
Contributor Author

zenador commented Jan 5, 2026

Previously, the feature was controlled at the engine level. Now that it is per tenant, we apply it per query (per query plan), based on the tenants in the query. One problem is that the setting is no longer available at the query planner level, so the optimization passes do not know whether the feature is enabled, and we use it in EnableEliminateDeduplicateAndMerge. I'm not sure what the best way to solve this is. For now, I have modified that optimization pass to work without this knowledge: to stay safe regardless of the delayed name removal setting, it eliminates fewer nodes, so it doesn't optimize as much as it used to. Please let me know if you have any suggestions on how to handle this better.

One alternative approach I considered and discarded is to create two engines, one with the feature enabled and one with it disabled, and direct each query to the appropriate engine based on its tenants, similar to the fallback engine mechanism. I decided against this for now because, if we did the same for future per-tenant toggles, we would need one engine per combination of settings, so the number of engines would grow exponentially.

Also, how can we get the tenants for a query in the analysis endpoint? The setting is just hardcoded as disabled there for now.

Fixed based on Charles' suggestions.

@tacole02 tacole02 left a comment

Docs look good! I left one minor suggestion. Thank you!

cursor bot commented Jan 19, 2026

Bugbot Autofix prepared a fix for the bug found in the latest run.

  • ✅ Fixed: Wrong context breaks per-tenant delayed name removal
    • Changed context.Background() to r.Context() in the GetEnableDelayedNameRemoval call so tenant IDs can be extracted from the request context.
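
The bug described above can be reproduced in miniature. The following is a sketch under stated assumptions: `tenantKey`, `tenantFromContext`, `newRequestForTenant`, `buggyLookup`, and `fixedLookup` are hypothetical names for illustration, and the real code resolves tenant IDs through its own helpers rather than a bare context key.

```go
package main

import (
	"context"
	"errors"
	"net/http"
)

// tenantKey is a hypothetical context key used only in this sketch.
type tenantKey struct{}

// tenantFromContext returns the tenant ID stored in ctx, if any.
func tenantFromContext(ctx context.Context) (string, error) {
	id, ok := ctx.Value(tenantKey{}).(string)
	if !ok || id == "" {
		return "", errors.New("no tenant ID in context")
	}
	return id, nil
}

// newRequestForTenant builds a request whose context carries the tenant ID,
// standing in for what authentication middleware would do.
func newRequestForTenant(id string) *http.Request {
	ctx := context.WithValue(context.Background(), tenantKey{}, id)
	r, err := http.NewRequestWithContext(ctx, http.MethodGet, "/", nil)
	if err != nil {
		panic(err)
	}
	return r
}

// buggyLookup ignores the request: a fresh background context has no tenant.
func buggyLookup(_ *http.Request) (string, error) {
	return tenantFromContext(context.Background())
}

// fixedLookup uses the request's own context, so the tenant is visible.
func fixedLookup(r *http.Request) (string, error) {
	return tenantFromContext(r.Context())
}
```

For a request built with `newRequestForTenant("user-1")`, `buggyLookup` returns an error while `fixedLookup` returns `user-1`, which is why a per-tenant setting cannot be resolved from `context.Background()`.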

@zenador zenador force-pushed the zenador/per-tenant-delayed-name-removal branch from 3be20a2 to 1c36681 Compare January 19, 2026 19:25
@zenador zenador force-pushed the zenador/per-tenant-delayed-name-removal branch 2 times, most recently from ac1ace3 to cba2b94 Compare January 22, 2026 00:14
@zenador zenador force-pushed the zenador/per-tenant-delayed-name-removal branch from cba2b94 to 850529d Compare January 22, 2026 00:20
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A Cloud Agent has been kicked off to fix the reported issue.

cursor bot commented Jan 22, 2026

Bugbot Autofix prepared a fix for the bug found in the latest run.

  • ✅ Fixed: Test expects success but implementation returns error for conflicting settings
    • Fixed the test by changing user-3 to have EnableDelayedNameRemoval: true and updating the test case to use user-2|user-3 (both with enabled=true), ensuring no conflicting settings that would cause GetEnableDelayedNameRemoval to return an error.

@zenador zenador force-pushed the zenador/per-tenant-delayed-name-removal branch from 9c4dbc9 to 847fd12 Compare January 22, 2026 01:02
Comment on lines +1017 to +1018
limitsProvider := querier.NewTenantQueryLimitsProvider(t.Overrides)
analysisHandler := analysis.Handler(t.QuerierQueryPlanner, limitsProvider)

#14086 has been merged since you opened this, so we should use the shared limits provider instance, t.QueryLimitsProvider, rather than creating a new one:

Suggested change
- limitsProvider := querier.NewTenantQueryLimitsProvider(t.Overrides)
- analysisHandler := analysis.Handler(t.QuerierQueryPlanner, limitsProvider)
+ analysisHandler := analysis.Handler(t.QuerierQueryPlanner, t.QueryLimitsProvider)

Comment on lines +1057 to +1058
limitsProvider := querier.NewTenantQueryLimitsProvider(t.Overrides)
analysisHandler := analysis.Handler(t.QueryFrontendQueryPlanner, limitsProvider)

Same as above:

Suggested change
- limitsProvider := querier.NewTenantQueryLimitsProvider(t.Overrides)
- analysisHandler := analysis.Handler(t.QueryFrontendQueryPlanner, limitsProvider)
+ analysisHandler := analysis.Handler(t.QueryFrontendQueryPlanner, t.QueryLimitsProvider)

Comment on lines +1929 to +1932
// If the first call succeeded but the second failed, use the second error, otherwise keep the first.
if actualErr == nil && actualErr2 != nil {
actualErr = actualErr2
}

It might be clearer to split this test in two: one test for GetMaxEstimatedMemoryConsumptionPerQuery, and one for GetEnableDelayedNameRemoval.
