Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Improvements in EXPLAIN docs #31201

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ggevay
Copy link
Contributor

@ggevay ggevay commented Jan 27, 2025

  • Update example output for fast path EXPLAIN
  • CTE correspondence
  • MIR Union text tweak:
    • Only some Union stages force consolidation.
    • Eliminate "stage" terminology, because this is an internal-only terminology (coming from the dataflow world). Also, it's easy to confuse it with "EXPLAIN stages", which is a completely different thing, referring to the stages of compiling a query.
    • Note that it corresponds to UNION DISTINCT.
  • Join
    • Correct mem usage, and link to the Optimization page.
    • Correct RAW PLAN example.
  • Various tweaks for Reduce.
  • Change the order of the "Plan operators" table, to have the default EXPLAIN first.
  • I removed the "Private preview" marker from "filter pushdown". I'm pretty sure this is enabled globally for everybody, and is turned on by default.
  • Various operators had the following: "Uses memory proportional to the number of input updates" This might be misleading in multiple ways:
    • The reader might think that this grows without bound, if there are new updates coming in continuously.
    • The reader might think that the initial snapshot is not included. I think we should simply say "Uses memory proportional to the input size"
  • Flipped Return ... With ... to With ... Return ..., to follow Change CTE order in EXPLAIN output #30983 and Change HIR's and LIR's EXPLAIN CTE order #31132 (And tweaked the text.)
  • And many other minor tweaks.

Additionally, in the Optimization page, tweak the Delta join section: Eliminate the DeltaQuery terminology, as it is not used externally in other parts of the docs. Plus some minor tweaks in the text.

cc @ala2134

Motivation

https://www.notion.so/materialize/EXPLAIN-PLAN-Usability-17613f48d37b80139742d8c3ee710640

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ggevay ggevay added the A-docs Area: documentation label Jan 27, 2025
@ggevay ggevay requested a review from a team as a code owner January 27, 2025 14:52
@ggevay ggevay requested a review from kay-kim January 27, 2025 14:53
**cardinality** | Annotate each subplan with a symbolic estimate of its cardinality.
**join implementations** | Render details about the [implementation strategy of optimized MIR `Join` nodes](#explain-with-join-implementations).
**keys** | Annotate each subplan with its unique keys.
**keys** | Annotates each subplan with unique keys, presented as a list of unique keys (in parentheses), where each unique key (in brackets) is a list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, `([0], [1,2])` indicates that column zero is a unique key, and columns 1 and 2 also form a unique key. Materialize only reports the most succinct form of keys, so for example while `[0]` and `[0, 1]` might both be unique keys, the latter is implied by the former and omitted. `()` indicates that the collection does not have any unique keys, while `([])` indicates that the empty projection is a unique key, meaning that the collection consists of 0 or 1 rows.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

@ala2134 ala2134 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Content looks great! Added a few comments.

doc/user/content/sql/explain-plan.md Outdated Show resolved Hide resolved
doc/user/data/explain_plan_operators.yml Outdated Show resolved Hide resolved
Copy link
Contributor

@kay-kim kay-kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks much! I left couple of questions, but overall looks good.

We probably could call out certain things more (using boxes/section headings) for presentation/skimmability. Let me know if you want me to do a small patch to add these. (it can also wait until I refactor this page when redoing our sql reference pages)

uses_memory: True
memory_details: |
Depends. When it does, uses memory proportional to the number of input updates.
Uses memory proportional to the input size. Note that in the LIR / physical plan.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: "Note that in the LIR / physical plan" -- is this statement a note to ourselves? or are we stating that in the LIR/physical plan, it's annotated as such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I forgot to finish this sentence... Corrected now! It says the following:
"Uses memory proportional to the input size. Note that in the LIR / physical plan, Arrange/ArrangeBy almost always means that an arrangement will actually be created. (This is in contrast to the "optimized" plan, where an ArrangeBy being present in the plan often does not mean that an arrangement will actually be created.)"

@@ -261,7 +265,7 @@ operators:
Alias for a `Reduce` with an empty aggregate list.
uses_memory: True
memory_details: |
Uses memory proportional to the number of input updates, twice.
Uses memory proportional to the input and output size.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this one, not twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this more, and it turns out that it's output size twice. I went through all the other ones again, and made some more corrections in a separate commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. there is doc/developer/arrangements.md, a very old internal document discussing mem usage of operators. I think the things there are still mostly true...

doc/user/data/explain_plan_operators.yml Outdated Show resolved Hide resolved
**cardinality** | Annotate each subplan with a symbolic estimate of its cardinality.
**join implementations** | Render details about the [implementation strategy of optimized MIR `Join` nodes](#explain-with-join-implementations).
**keys** | Annotate each subplan with its unique keys.
**keys** | Annotates each subplan with unique keys, presented as a list of unique keys (in parentheses), where each unique key (in brackets) is a list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, `([0], [1,2])` indicates that column zero is a unique key, and columns 1 and 2 also form a unique key. Materialize only reports the most succinct form of keys, so for example while `[0]` and `[0, 1]` might both be unique keys, the latter is implied by the former and omitted. `()` indicates that the collection does not have any unique keys, while `([])` indicates that the empty projection is a unique key, meaning that the collection consists of 0 or 1 rows.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more like:

Annotates each subplan with a parenthesized list of unique keys. Each unique key is presented as a bracketed list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, ([0], [1,2]) is a list of two unique keys: column zero is a unique key, and columns 1 and 2 also form a unique key. ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this sounds much better!

- Update example output for fast path EXPLAIN
- CTE correspondence
- MIR Union text tweak:
  - Only _some_ Union stages force consolidation.
  - Eliminate "stage" terminology, because this is an internal-only
    terminology (coming from the dataflow world). Also, it's easy
    to confuse it with "EXPLAIN stages", which is a completely
    different thing, referring to the stages of compiling a query.
  - Note that it corresponds to UNION DISTINCT.
- Join
  - Correct mem usage, and link to the Optimization page.
  - Correct RAW PLAN example.
- Various tweaks for Reduce.
- Change the order of the "Plan operators" table, to have the default
  EXPLAIN first.
- I removed the "**Private preview**" marker from "filter pushdown". I'm
  pretty sure this is enabled globally for everybody, and is turned on
  by default.
- Various operators had the following:
  "Uses memory proportional to the number of input updates"
  This might be misleading in multiple ways:
  - The reader might think that this grows
    without bound, if there are new updates coming in continuously.
  - The reader might think that the initial snapshot is not included.
  I think we should simply say
  "Uses memory proportional to the input size"
- Flipped `Return ... With ...` to `With ... Return ...`, to follow
  MaterializeInc#30983 and
  MaterializeInc#31132
  (And tweaked the text.)
- And many other minor tweaks...

Additionally, in the Optimization page, tweak the Delta join section:
Eliminate the `DeltaQuery` terminology, as it is not used externally
in other parts of the docs. Plus some minor tweaks in the text.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants