
@fionaliao fionaliao commented Jan 7, 2026

Manual backport of #13944 (due to merge conflicts).


The querier API wraps its queryable with
`NewErrorTranslateSampleAndChunkQueryable`, which applies Mimir's error
mappings. Errors without a more specific mapping are wrapped in
`promql.ErrStorage`, which is later mapped to a 500.

https://github.com/grafana/mimir/blob/f5d064968c732ac49f72ac2551b4d98f596d21ed/pkg/api/handlers.go#L228

https://github.com/grafana/mimir/blob/da237be9f86efbd4a1daa38cb7d0db37dfe80daa/pkg/querier/error_translate_queryable.go#L105

https://github.com/grafana/mimir/blob/da237be9f86efbd4a1daa38cb7d0db37dfe80daa/pkg/querier/error_translate_queryable.go#L91

https://github.com/grafana/mimir/blob/068f3d023248d572b929234940cf981705cf8d82/pkg/api/error/error.go#L65-L68

MQE remote execution does not go through the querier API. It uses the
`Dispatcher`, which did not use the error-translating queryable. As a
result, some errors coming back from storage (e.g. `"too many unhealthy
instances in the ring"`) were incorrectly mapped to 422s: the dispatcher
falls back to an `apierror.TypeExec` error, which maps to 422, and it has
no custom mapping for raw storage errors.

https://github.com/grafana/mimir/blob/1fdbf332931c920fbf5ff0503003328761b74a74/pkg/querier/dispatcher.go#L193-L196
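
To illustrate the fallback, here is a simplified sketch. It assumes the dispatcher classifies errors through a chain of type checks ending in an "execution" default; `classify`, `statusFor`, `errorType`, `storageError`, and `errRing` are hypothetical names, not the dispatcher's real API.

```go
package main

import "errors"

// errorType is a hypothetical stand-in for the API error types.
type errorType string

const (
	typeInternal errorType = "internal"
	typeExec     errorType = "execution"
)

// storageError is a hypothetical stand-in for promql.ErrStorage.
type storageError struct{ Err error }

func (e storageError) Error() string { return e.Err.Error() }
func (e storageError) Unwrap() error { return e.Err }

// errRing is a stand-in for a raw ring error coming back from storage.
var errRing = errors.New("too many unhealthy instances in the ring")

// classify recognises translated storage errors; anything it does not
// recognise falls through to the execution type.
func classify(err error) errorType {
	var se storageError
	if errors.As(err, &se) {
		return typeInternal
	}
	return typeExec // fallback: no branch matches a raw storage error
}

// statusFor is an assumed mapping from error type to HTTP status.
func statusFor(t errorType) int {
	if t == typeInternal {
		return 500
	}
	return 422
}
```

In this sketch, `statusFor(classify(errRing))` yields 422 because the raw error matches no branch, whereas the same error wrapped in `storageError` yields 500.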

Fixed by wrapping the queryable used by the dispatcher with
`NewErrorTranslateSampleAndChunkQueryable`, the same as for the querier
API.

Fixes #<issue number>

- [x] Tests updated.
- [ ] Documentation added.
- [x] `CHANGELOG.md` updated - the order of entries should be
`[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`. If changelog entry
is not needed, please add the `changelog-not-needed` label to the PR.
- [ ]
[`about-versioning.md`](https://github.com/grafana/mimir/blob/main/docs/sources/mimir/configure/about-versioning.md)
updated with experimental features.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **BUGFIX: MQE remote execution error mapping**
>
> - Wraps `Dispatcher` queryable with
`NewErrorTranslateSampleAndChunkQueryable` and changes type to
`storage.SampleAndChunkQueryable` to ensure storage errors map correctly
(e.g., HTTP 500).
> - Adds `TestDispatcher_RingErrorTranslation` covering ring errors like
`ErrTooManyUnhealthyInstances` and `ErrEmptyRing` (including wrapped
cases).
> - Updates `CHANGELOG.md` with the MQE bugfix entry.
>
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
897a85f. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

(cherry picked from commit 0d601c4)
@fionaliao fionaliao requested a review from a team as a code owner January 7, 2026 19:33
@fionaliao fionaliao changed the title from "[r369] Fix issues running sharding inside MQE (#13536)" to "[r369] Map remote execution storage errors correctly (#13944)" Jan 7, 2026
@fionaliao fionaliao merged commit 940cba5 into r369 Jan 8, 2026
39 checks passed
@fionaliao fionaliao deleted the backport-13944-to-r369 branch January 8, 2026 09:18