docs: Improvements in EXPLAIN docs #31201

ggevay · 2025-01-27T14:51:59Z

- Update example output for fast path EXPLAIN
- CTE correspondence
- MIR Union text tweak:
  - Only _some_ Union stages force consolidation.
  - Eliminate "stage" terminology, because this is an internal-only terminology (coming from the dataflow world). Also, it's easy to confuse it with "EXPLAIN stages", which is a completely different thing, referring to the stages of compiling a query.
  - Note that it corresponds to UNION DISTINCT.
- Join
  - Correct mem usage, and link to the Optimization page.
  - Correct RAW PLAN example.
- Various tweaks for Reduce.
- Change the order of the "Plan operators" table, to have the default EXPLAIN first.
- I removed the "**Private preview**" marker from "filter pushdown". I'm pretty sure this is enabled globally for everybody, and is turned on by default.
- Various operators had the following: "Uses memory proportional to the number of input updates". This might be misleading in multiple ways:
  - The reader might think that this grows without bound, if there are new updates coming in continuously.
  - The reader might think that the initial snapshot is not included.

  I think we should simply say "Uses memory proportional to the input size".
- Flipped `Return ... With ...` to `With ... Return ...`, to follow Change CTE order in EXPLAIN output #30983 and Change HIR's and LIR's EXPLAIN CTE order #31132. (And tweaked the text.)
- And many other minor tweaks.

Additionally, in the Optimization page, tweak the Delta join section: Eliminate the DeltaQuery terminology, as it is not used externally in other parts of the docs. Plus some minor tweaks in the text.
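To illustrate why "input size" reads better than "number of input updates" in the memory notes above, here is a minimal Python sketch. It assumes a simplified model in which every update is a `(record, diff)` pair and the retained state is the consolidated collection. The `consolidate` helper and the sample data are hypothetical and are not Materialize code.

```python
from collections import Counter


def consolidate(updates):
    """Sum the diffs per record and drop records whose net diff is zero."""
    totals = Counter()
    for record, diff in updates:
        totals[record] += diff
    return {record: diff for record, diff in totals.items() if diff != 0}


# Hypothetical update stream: +1 inserts a record, -1 retracts it.
updates = [
    (("alice", 1), +1),
    (("bob", 2), +1),
    (("alice", 1), -1),  # retracts the earlier insert
    (("alice", 3), +1),
    (("bob", 2), -1),  # retracts the earlier insert
    (("bob", 4), +1),
]

state = consolidate(updates)
# Six updates arrived, but only two records remain after consolidation.
print(len(updates), len(state))
```

In this model the retained state tracks the current contents of the collection, not the running count of updates, which is the intuition behind the proposed wording.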

cc @ala2134

Motivation

https://www.notion.so/materialize/EXPLAIN-PLAN-Usability-17613f48d37b80139742d8c3ee710640

Tips for reviewer

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

ggevay · 2025-01-27T14:59:35Z

doc/user/content/sql/explain-plan.md

 **cardinality** | Annotate each subplan with a symbolic estimate of its cardinality.
 **join implementations** | Render details about the [implementation strategy of optimized MIR `Join` nodes](#explain-with-join-implementations).
-**keys** | Annotate each subplan with its unique keys.
+**keys** | Annotates each subplan with unique keys, presented as a list of unique keys (in parentheses), where each unique key (in brackets) is a list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, `([0], [1,2])` indicates that column zero is a unique key, and columns 1 and 2 also form a unique key. Materialize only reports the most succinct form of keys, so for example while `[0]` and `[0, 1]` might both be unique keys, the latter is implied by the former and omitted. `()` indicates that the collection does not have any unique keys, while `([])` indicates that the empty projection is a unique key, meaning that the collection consists of 0 or 1 rows.


See https://github.com/MaterializeInc/database-issues/issues/8569

ala2134

Content looks great! Added a few comments.

doc/user/content/sql/explain-plan.md

doc/user/data/explain_plan_operators.yml

kay-kim

Thanks much! I left a couple of questions, but overall it looks good.

We could probably call out certain things more (using boxes/section headings) for presentation/skimmability. Let me know if you want me to do a small patch to add these. (It can also wait until I refactor this page when redoing our SQL reference pages.)

kay-kim · 2025-01-30T22:36:03Z

doc/user/data/explain_plan_operators.yml

    uses_memory: True
    memory_details: |
-      Depends. When it does, uses memory proportional to the number of input updates.
+      Uses memory proportional to the input size. Note that in the LIR / physical plan.


Q: "Note that in the LIR / physical plan": is this statement a note to ourselves, or are we stating that in the LIR/physical plan, it's annotated as such?

Sorry, I forgot to finish this sentence... Corrected now! It says the following:
"Uses memory proportional to the input size. Note that in the LIR / physical plan, Arrange/ArrangeBy almost always means that an arrangement will actually be created. (This is in contrast to the "optimized" plan, where an ArrangeBy being present in the plan often does not mean that an arrangement will actually be created.)"

kay-kim · 2025-01-30T22:38:07Z

doc/user/data/explain_plan_operators.yml

@@ -261,7 +265,7 @@ operators:
      Alias for a `Reduce` with an empty aggregate list.
    uses_memory: True
    memory_details: |
-      Uses memory proportional to the number of input updates, twice.
+      Uses memory proportional to the input and output size.


For this one, not twice?

I looked into this more, and it turns out that it's output size twice. I went through all the other ones again, and made some more corrections in a separate commit.

Btw. there is doc/developer/arrangements.md, a very old internal document discussing mem usage of operators. I think the things there are still mostly true...

kay-kim · 2025-01-30T23:01:18Z

doc/user/content/sql/explain-plan.md

 **cardinality** | Annotate each subplan with a symbolic estimate of its cardinality.
 **join implementations** | Render details about the [implementation strategy of optimized MIR `Join` nodes](#explain-with-join-implementations).
-**keys** | Annotate each subplan with its unique keys.
+**keys** | Annotates each subplan with unique keys, presented as a list of unique keys (in parentheses), where each unique key (in brackets) is a list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, `([0], [1,2])` indicates that column zero is a unique key, and columns 1 and 2 also form a unique key. Materialize only reports the most succinct form of keys, so for example while `[0]` and `[0, 1]` might both be unique keys, the latter is implied by the former and omitted. `()` indicates that the collection does not have any unique keys, while `([])` indicates that the empty projection is a unique key, meaning that the collection consists of 0 or 1 rows.


Maybe more like:

Annotates each subplan with a parenthesized list of unique keys. Each unique key is presented as a bracketed list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, ([0], [1,2]) is a list of two unique keys: column zero is a unique key, and columns 1 and 2 also form a unique key. ...

Thanks, this sounds much better!
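As a concrete reading of the key notation discussed in this thread, here is a small Python sketch that brute-forces the definition over an in-memory list of rows. It only illustrates what the reported keys mean. It is not how Materialize derives keys, and the function names and sample rows are hypothetical.

```python
from itertools import combinations


def is_unique_key(rows, cols):
    """True if no two rows agree on every column in `cols`."""
    seen = set()
    for row in rows:
        projected = tuple(row[c] for c in cols)
        if projected in seen:
            return False
        seen.add(projected)
    return True


def minimal_unique_keys(rows, arity):
    """List unique keys smallest-first, omitting keys implied by smaller ones."""
    keys = []
    for size in range(arity + 1):
        for cols in combinations(range(arity), size):
            if any(set(k) <= set(cols) for k in keys):
                continue  # a subset is already a key, so this one is implied
            if is_unique_key(rows, cols):
                keys.append(list(cols))
    return keys


# Hypothetical three-column collection.
rows = [(1, "a", 10), (2, "a", 20), (3, "b", 10)]
print(minimal_unique_keys(rows, 3))          # [[0], [1, 2]]
print(minimal_unique_keys([], 3))            # [[]]
print(minimal_unique_keys([(1, 1), (1, 1)], 2))  # []
```

The three results line up with the notation in the proposed text: `([0], [1,2])` for two unique keys, `([])` for a collection of 0 or 1 rows, and `()` when no unique key exists.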

ggevay added the A-docs Area: documentation label Jan 27, 2025

ggevay requested a review from a team as a code owner January 27, 2025 14:52

ggevay requested a review from kay-kim January 27, 2025 14:53

ala2134 reviewed Jan 27, 2025

ggevay force-pushed the explain-docs-1 branch from e957f40 to 596e8ca Compare January 27, 2025 16:00

kay-kim approved these changes Jan 31, 2025

ggevay added 2 commits February 1, 2025 16:59

docs: Further corrections for mem usage of operators

9f0c107

ggevay force-pushed the explain-docs-1 branch from 596e8ca to 9f0c107 Compare February 1, 2025 15:59
