# Attribution and profiling

- Associated: https://github.com/MaterializeInc/database-issues/issues/6551

## The Problem

It is difficult to understand query performance, and in particular to
answer the question: which part of a query is responsible for poor
performance?

There are three stakeholders who want to understand query performance:

- **customers**, who care about the queries they run;
- **field engineering**, who support customers;
- **the cluster team**, who support field engineering and
(independently) want to understand how queries behave to evaluate
their work.

Existing introspection views make it possible to understand dataflow
performance: writing WMR (`WITH MUTUALLY RECURSIVE`) queries over
these views suffices to extract useful information about how a
dataflow is behaving.
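
For example, one can roll per-operator elapsed time up the dataflow
operator tree with a query along these lines (a sketch only: the
`mz_internal` relation and column names are quoted from memory and may
not match the current schema exactly):

```
-- A sketch, not the exact mz_internal schema: charge each operator for
-- the elapsed time of its entire subtree of dataflow operators.
WITH MUTUALLY RECURSIVE
  -- (operator id, elapsed time charged to it or one of its descendants)
  rollup (id uint8, elapsed_ns uint8) AS (
    SELECT id, elapsed_ns FROM mz_internal.mz_scheduling_elapsed
    UNION ALL
    SELECT p.parent_id, r.elapsed_ns
    FROM rollup AS r
    JOIN mz_internal.mz_dataflow_operator_parents AS p ON r.id = p.id
  )
SELECT o.id, o.name, sum(r.elapsed_ns) AS subtree_elapsed_ns
FROM rollup AS r
JOIN mz_internal.mz_dataflow_operators AS o ON r.id = o.id
GROUP BY o.id, o.name
ORDER BY subtree_elapsed_ns DESC;
```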

With some (??? how much/how tolerable ???) work, the cluster team and
field engineering can attribute this profile information to parts of
MIR or even SQL. Improvements to rendering have made the names of
dataflows more meaningful, but there is still a substantial gap
between dataflows and the MIR.

It is not clear to what extent customers can understand the
introspection views or the WMR queries aggregating their information.

## Success Criteria

A stakeholder can run a command like:

```
EXPLAIN PLAN PROFILE FOR INDEX my_crazy_view_idx;
```

and receive output that explains how `my_crazy_view_idx` is
currently running, i.e., what operations it performs and how well
those operations are performing (in terms of, e.g., memory or CPU
usage, hydration status, number of input rows for each operator, or
worker skew).

In an ideal solution, all three stakeholders will receive output that
is legible at their level of sophistication.

There are three criteria that determine how legible the output is:

- **sophistication** Optimizer engineers are prepared to deal with
  low-level details of the process, but other stakeholders may not be.

- **education** Documentation and other materials can clarify
  meanings and offer examples and explanations.

- **design** Our choices in how we represent the output, and the
  quality of our implementation, determine how clear the output can
  be.

## Approaches

What structures do we attribute profiling information to? At one
extreme we could annotate dataflows; at the other, the original SQL
query. Our current architecture admits five possibilities along this
spectrum:

- (**dfl**) `EXPLAIN PLAN PROFILE` is a macro around a WMR analysis
query; it describes dataflows. Attribution is direct.

- (**lir**) `EXPLAIN PLAN PROFILE` annotates an LIR plan with
information mapped back from the dataflows. Attribution is more or
less direct.

- (**mir**) `EXPLAIN PLAN PROFILE` annotates an MIR plan with
information mapped back from the dataflows. Attribution requires
some mapping between MIR and LIR; the proposed 'address
predicting' forward mapping should support this.

- (**hir**) `EXPLAIN PLAN PROFILE` annotates an HIR plan with
information mapped back from the dataflows. Attribution requires
the MIR->LIR mapping as well as some mapping from HIR->MIR. The
current lowering is not careful about provenance; we would need to
add source location information at the HIR level and track
provenance at the MIR level, which would be a large change to the
`transform` crate.

- (**sql**) `EXPLAIN PLAN PROFILE` annotates the original SQL query
  with information mapped back from the dataflows. Attribution
  requires the full HIR->MIR->LIR mapping; we would additionally need
  source location information from the SQL parser, another large
  change.

Each of these approaches is differently legible, and implementation
complexity increases as we move closer to the SQL query itself. In
the table below, `+` marks output legible to that stakeholder, `~`
marginally legible, and `X` illegible.

| Approach | Cluster? | FieldEng? | Customer? | Impl. Complexity |
| -------- | -------- | --------- | --------- | ---------------- |
| dfl | ~ | ~ | X | low |
| lir | + | + | X | low |
| mir | + | + | ~ | medium |
| hir | ~ | ~ | + | high |
| sql | ~ | ~ | + | v high |

It seems unrealistic to expect customers to understand dataflows or
LIR as they exist now. MIR is a plausible thing to show customers (per
field engineering), though we lack documentation.

HIR and SQL will be more legible to customers, but the current
approach to lowering will make attribution at these levels _less_
useful for the cluster team and field engineering.

## Out of Scope

Designing a new level---DDIR, an MIR replacement---is too much for a
first cut. Refining an existing level is a good idea---but not the
first step.

## Solution Proposal

We should attribute at the MIR level: it offers the best mix of
abstraction and legibility at only a moderate level of complexity.

There is good precedent for choosing MIR: it is cognate to
`EXPLAIN (FORMAT TEXT)` in Postgres, `EXPLAIN FORMAT=TREE` in MySQL,
and `EXPLAIN USING TEXT` in Snowflake.

We should identify a partner customer who can help us refine the
output of `EXPLAIN` in general and `EXPLAIN PLAN PROFILE` in
particular, rather than attempting to guess at the right information
to show.

## Implementation strategy

We will use introspection sources to do the source mapping. At present,
we have introspection sources that let us map dataflow addresses to
their statistics and operators.
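
Concretely, something like the following join already works today
(treat the exact column names as assumptions; they may differ in
detail from the current `mz_internal` schema):

```
-- Which operator runs at which dataflow address.
SELECT a.id, a.address, o.name
FROM mz_internal.mz_dataflow_addresses AS a
JOIN mz_internal.mz_dataflow_operators AS o ON a.id = o.id;
```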

Each LIR operator has a `LirId`, a `u64` value that is unique within
its query, generated when lowering MIR to LIR.

Each MIR operator has a distinct index in a post-order traversal of
the MIR plan; we call this index its `MirId`.
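
For a toy plan, post-order numbering (children before parents)
assigns `MirId`s like so:

```
Project        -- MirId 3 (the root is numbered last)
  Join         -- MirId 2
    Get t1     -- MirId 0
    Get t2     -- MirId 1
```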

We will add introspection sources:

- mapping `GlobalId`s and `LirId`s to dataflow addresses
- mapping `GlobalId`s and `LirId`s to the LIR operator (think: a
single line of an `EXPLAIN PHYSICAL PLAN`)
- mapping `GlobalId`s and `MirId`s to an `LirId`
- mapping `GlobalId`s and `MirId`s to the MIR operator (think: a
single line of an `EXPLAIN PLAN`)
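
Sketched as relations, these might look as follows (all names and
column types here are hypothetical placeholders, not a final schema):

```
-- Hypothetical shapes for the four new introspection sources:
--   mz_internal.mz_lir_to_address   (global_id, lir_id, address)
--   mz_internal.mz_lir_to_operator  (global_id, lir_id, operator)
--   mz_internal.mz_mir_to_lir       (global_id, mir_id, lir_id)
--   mz_internal.mz_mir_to_operator  (global_id, mir_id, operator)

-- E.g., the LIR operator running at each dataflow address:
SELECT o.global_id, o.lir_id, o.operator, a.address
FROM mz_internal.mz_lir_to_operator AS o
JOIN mz_internal.mz_lir_to_address AS a USING (global_id, lir_id);
```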

With these introspection sources, we can implement `EXPLAIN PLAN
PROFILE FOR (INDEX|MATERIALIZED VIEW) [name]` as a query not unlike
the existing ad-hoc WMR query used to analyze dataflows. That query
will:

- resolve `name` to a `GlobalId`
- run a query that:
+ joins (for that `GlobalId`) the `MirId`, the MIR operator, the
`LirId`, the dataflow address, and the dataflow statistics
+ sorts appropriately (by `MirId` descending will give you
post-order)
+ indents appropriately (cf. the existing WMR query)
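
Putting these steps together, the attribution query might look
roughly like this (a sketch using the hypothetical sources above; the
statistics join and the indentation logic are simplified):

```
-- Sketch only: join the hypothetical mapping sources to (illustrative)
-- dataflow statistics for a single GlobalId, printing the plan
-- root-first by sorting on MirId descending.
SELECT
    repeat('  ', mo.depth) || mo.operator AS operator,  -- depth: hypothetical column
    st.size AS memory
FROM mz_internal.mz_mir_to_operator AS mo
JOIN mz_internal.mz_mir_to_lir AS ml USING (global_id, mir_id)
JOIN mz_internal.mz_lir_to_address AS la USING (global_id, lir_id)
LEFT JOIN mz_internal.mz_dataflow_arrangement_sizes AS st
    ON la.address = st.address  -- illustrative join key
WHERE mo.global_id = 'u123'     -- the GlobalId that [name] resolved to
ORDER BY mo.mir_id DESC;
```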

We might see output like the following (the numbers are made up; the
query is drawn from `joins.slt`, and its somewhat odd plan is not
relevant here):

```
> CREATE MATERIALIZED VIEW v AS SELECT name, id FROM v4362 WHERE name = (SELECT name FROM v4362 WHERE id = 1);
> EXPLAIN PLAN PROFILE FOR MATERIALIZED VIEW v;
|operator |memory|
|----------------------------------------------------------------|------|
|Project (#0, #1) | |
| Join on=(#0 = #2) type=differential | |
| ArrangeBy keys=[[#0]] | 12GB|
| ReadStorage materialize.public.t4362 | |
| ArrangeBy keys=[[#0]] | 10GB|
| Union | |
| Project (#0) | |
| Get l0 | |
| Map (error("more than one record produced in subquery"))| |
| Project () | |
| Filter (#0 > 1) | |
| Reduce aggregates=[count(*)] | 3GB|
| Project () | |
| Get l0 | |
|cte l0 = | |
| Filter (#1 = 1) | |
| ReadStorage materialize.public.t4362 | |
```

There are pros and cons to this approach.

+ Minimal new code (just the new tables and the logic for the
`EXPLAIN PLAN PROFILE` macro).
+ Avoids tricky reentrancy: if we implemented `EXPLAIN PLAN PROFILE`
  inside `environmentd`, we would need a way to read from
  introspection sources while running an `EXPLAIN`.
+ Adaptable: we can run custom queries or edit the macro easily. We
can therefore experiment a little more freely with how we render
different MIR operators.

- Doesn't hook into the existing `EXPLAIN` infrastructure.

If we find we miss features from the `EXPLAIN` infrastructure, we can
enrich what we put in the `MirId` operator mapping.

As an extension, we can support `EXPLAIN PLAN PROFILE ... AS SQL`,
which would generate and print the attribution query without actually
running it; this allows us to edit that query as we please.
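
For example (hypothetical syntax):

```
-- Hypothetical extension: print the attribution query instead of
-- running it, so it can be copied and edited by hand.
EXPLAIN PLAN PROFILE FOR MATERIALIZED VIEW v AS SQL;
```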

## Open questions

We must be careful to show profiles and plans for the existing,
running dataflows. For example, suppose the user creates
`my_crazy_view` on top of `my_perfectly_sane_view` and then creates
`my_crazy_view_idx`. Later on, they create an index for
`my_perfectly_sane_view` called `my_perfectly_sane_view_idx`. The
running dataflow for `my_crazy_view_idx` will not use
`my_perfectly_sane_view_idx`, but if we were to recreate the index it
might!

There are several possible profiling metrics to track---current memory
usage, peak memory usage, current number of rows, row throughput,
worker skew, latency (average, P50, P99, max), timestamp. Which should
we do first? (I would propose some form of memory usage and some form
of latency.)

If we commit to implementing `EXPLAIN PLAN PROFILE` at the MIR level,
how hard is it to _also_ implement it for LIR along the way?