-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Improvements in EXPLAIN docs #31201
base: main
Are you sure you want to change the base?
Conversation
doc/user/content/sql/explain-plan.md
Outdated
**cardinality** | Annotate each subplan with a symbolic estimate of its cardinality. | ||
**join implementations** | Render details about the [implementation strategy of optimized MIR `Join` nodes](#explain-with-join-implementations). | ||
**keys** | Annotate each subplan with its unique keys. | ||
**keys** | Annotates each subplan with unique keys, presented as a list of unique keys (in parentheses), where each unique key (in brackets) is a list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, `([0], [1,2])` indicates that column zero is a unique key, and columns 1 and 2 also form a unique key. Materialize only reports the most succinct form of keys, so for example while `[0]` and `[0, 1]` might both be unique keys, the latter is implied by the former and omitted. `()` indicates that the collection does not have any unique keys, while `([])` indicates that the empty projection is a unique key, meaning that the collection consists of 0 or 1 rows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Content looks great! Added a few comments.
e957f40
to
596e8ca
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks much! I left couple of questions, but overall looks good.
We probably could call out certain things more (using boxes/section headings) for presentation/skimmability. Let me know if you want me to do a small patch to add these. (it can also wait until I refactor this page when redoing our sql reference pages)
uses_memory: True | ||
memory_details: | | ||
Depends. When it does, uses memory proportional to the number of input updates. | ||
Uses memory proportional to the input size. Note that in the LIR / physical plan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: "Note that in the LIR / physical plan" -- is this statement a note to ourselves? or are we stating that in the LIR/physical plan, it's annotated as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I forgot to finish this sentence... Corrected now! It says the following:
"Uses memory proportional to the input size. Note that in the LIR / physical plan, Arrange
/ArrangeBy
almost always means that an arrangement will actually be created. (This is in contrast to the "optimized" plan, where an ArrangeBy
being present in the plan often does not mean that an arrangement will actually be created.)"
@@ -261,7 +265,7 @@ operators: | |||
Alias for a `Reduce` with an empty aggregate list. | |||
uses_memory: True | |||
memory_details: | | |||
Uses memory proportional to the number of input updates, twice. | |||
Uses memory proportional to the input and output size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this one, not twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this more, and it turns out that it's output size twice. I went through all the other ones again, and made some more corrections in a separate commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw. there is doc/developer/arrangements.md
, a very old internal document discussing mem usage of operators. I think the things there are still mostly true...
doc/user/content/sql/explain-plan.md
Outdated
**cardinality** | Annotate each subplan with a symbolic estimate of its cardinality. | ||
**join implementations** | Render details about the [implementation strategy of optimized MIR `Join` nodes](#explain-with-join-implementations). | ||
**keys** | Annotate each subplan with its unique keys. | ||
**keys** | Annotates each subplan with unique keys, presented as a list of unique keys (in parentheses), where each unique key (in brackets) is a list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, `([0], [1,2])` indicates that column zero is a unique key, and columns 1 and 2 also form a unique key. Materialize only reports the most succinct form of keys, so for example while `[0]` and `[0, 1]` might both be unique keys, the latter is implied by the former and omitted. `()` indicates that the collection does not have any unique keys, while `([])` indicates that the empty projection is a unique key, meaning that the collection consists of 0 or 1 rows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe more like:
Annotates each subplan with a parenthesized list of unique keys. Each unique key is presented as a bracketed list of column identifiers. A list of column identifiers is reported as a unique key when for each setting of those columns to values there is at most one record in the collection. For example, ([0], [1,2])
is a list of two unique keys: column zero is a unique key, and columns 1 and 2 also form a unique key. ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this sounds much better!
- Update example output for fast path EXPLAIN - CTE correspondence - MIR Union text tweak: - Only _some_ Union stages force consolidation. - Eliminate "stage" terminology, because this is an internal-only terminology (coming from the dataflow world). Also, it's easy to confuse it with "EXPLAIN stages", which is a completely different thing, referring to the stages of compiling a query. - Note that it corresponds to UNION DISTINCT. - Join - Correct mem usage, and link to the Optimization page. - Correct RAW PLAN example. - Various tweaks for Reduce. - Change the order of the "Plan operators" table, to have the default EXPLAIN first. - I removed the "**Private preview**" marker from "filter pushdown". I'm pretty sure this is enabled globally for everybody, and is turned on by default. - Various operators had the following: "Uses memory proportional to the number of input updates" This might be misleading in multiple ways: - The reader might think that this grows without bound, if there are new updates coming in continuously. - The reader might think that the initial snapshot is not included. I think we should simply say "Uses memory proportional to the input size" - Flipped `Return ... With ...` to `With ... Return ...`, to follow MaterializeInc#30983 and MaterializeInc#31132 (And tweaked the text.) - And many other minor tweaks... Additionally, in the Optimization page, tweak the Delta join section: Eliminate the `DeltaQuery` terminology, as it is not used externally in other parts of the docs. Plus some minor tweaks in the text.
596e8ca
to
9f0c107
Compare
Return ... With ...
toWith ... Return ...
, to follow Change CTE order in EXPLAIN output #30983 and Change HIR's and LIR's EXPLAIN CTE order #31132 (And tweaked the text.)Additionally, in the Optimization page, tweak the Delta join section: Eliminate the
DeltaQuery
terminology, as it is not used externally in other parts of the docs. Plus some minor tweaks in the text.cc @ala2134
Motivation
https://www.notion.so/materialize/EXPLAIN-PLAN-Usability-17613f48d37b80139742d8c3ee710640
Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.