-
Notifications
You must be signed in to change notification settings - Fork 66
Adds RANK, DENSE_RANK, LAG, LEAD, & ROW_NUMBER Window Functions & Operator #1746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1746 +/- ##
============================================
+ Coverage 31.10% 31.84% +0.74%
- Complexity 99 102 +3
============================================
Files 31 31
Lines 2016 2013 -3
Branches 152 151 -1
============================================
+ Hits 627 641 +14
+ Misses 1366 1350 -16
+ Partials 23 22 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
6111dc6
to
eb013ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finished Parser and Ast Package. As always, very delightful code : ).
One preliminary comment:
Perhaps it is worth to track the feature gap between components if anything exists.
i.e., if something is supported in parsing but not supported in planning, etc.
*/ | ||
@Builder(builderClassName = "Builder") | ||
@EqualsAndHashCode(callSuper = false) | ||
public static final class NoArg extends WindowFunctionType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, mind evaluating why the choice of NoArg
vs say Ranking
and RowNumber
.
So seemingly for one we are modeling using number of args, for the other we are modeling using "function characteristic" for lack of better words.
I understand that both Ranking
and RowNumber
must not contains argument... And no arg is, well, no arg... and you don't have to worry about argument type....
But perhaps using NoArg
is bit over-smart, in that it might cause difficulty when (especially a user is) constructing the ast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, well, a user can just use a factory method that we provide. We could do what you're proposing, but also the enum is also a small footprint. I'm not sold in either direction. If you feel strongly, I don't mind changing it to be that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decided to go ahead with your observation. Trying to thread the needle between your comments and @alancai98's.
partiql-parser/src/main/kotlin/org/partiql/parser/internal/PartiQLParserDefault.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving some initial comments for the parsing + AST modeling. I'll need some more time to read through the rest of the PR
partiql-eval/src/test/kotlin/org/partiql/eval/internal/WindowTests.kt
Outdated
Show resolved
Hide resolved
* EBNF 2023: | ||
* <window function> ::= <window function type> OVER <window name or specification> | ||
*/ | ||
windowFunction: funcType=windowFunctionType OVER spec=windowNameOrSpecification; | ||
|
||
/** | ||
* EBNF 2023: | ||
* <window function type> ::= | ||
* <rank function type> <left paren> <right paren> | ||
* | ROW_NUMBER <left paren> <right paren> | ||
* | <lead or lag function> | ||
* | and more... | ||
*/ | ||
windowFunctionType | ||
: rankFunctionType PAREN_LEFT PAREN_RIGHT # WindowFunctionTypeRank | ||
| ROW_NUMBER PAREN_LEFT PAREN_RIGHT # WindowFunctionTypeRowNumber | ||
| leadOrLagFunction # WindowFunctionTypeLeadOrLag | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than have the window function type be hard-coded, could we do something similar to what we do for scalar and aggregations (i.e. the modeling of functionCall
uses a qualifiedName
rather than any hard-coded names)? Doing so will make it a lot easier to add new window functions and also user-defined window functions in the future.
Relevant prior PRs to get rid of hard-coded scalar and aggregations from the antlr grammar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say let's do that when we get there. The EBNF does not contribute to API breaking changes. WindowFunctionType also is not a sealed interface. If we want to make it more explicit, I can make NoArg specific to RowNumber and Rank. Then, if we ever decided to allow user-defined window functions, we can do so by adding a variant specifically for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not fully-convinced that the modeling for the window functions needs to be hard-coded in the parser and AST. In the future if we are to support more window functions, we would need additional rules in the parser and variants within the WindowFunctionType
, which may be redundant. Trino, for instance, defines all scalar, aggregate, and window functions as a FunctionCall (see their grammar).
Since we've marked all the public APIs as experimental though, I'm still fine w/ proceeding and experimenting further as we get feedback from customers.
// TODO: In the future: @Nullable private final WindowFrameClause frameClause; | ||
|
||
@Nullable | ||
private final Identifier.Simple existingName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could we just call this name
? The SQL2023 spec uses <existing window name>
but it just wraps a <window name>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I like the fact that it matches the EBNF and that it makes it apparent that the WindowSpecification operates against a pre-defined (aka existing) window, effectively creating a new window.
partiql-ast/src/main/java/org/partiql/ast/WindowOrderClause.java
Outdated
Show resolved
Hide resolved
partiql-ast/src/main/java/org/partiql/ast/WindowSpecification.java
Outdated
Show resolved
Hide resolved
/** | ||
* Represents a window function type. | ||
* @see ExprWindowFunction#getFunctionType() | ||
*/ | ||
public abstract class WindowFunctionType extends AstNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the parser comment, I'm wondering if we should change the window functions to not be as hard-coded within the AST modeling and follow something more like scalar and aggregate function calls. IMO this can make it a lot more straightforward to add future support for user-defined window functions.
Trino for example models window functions (along with scalars and aggregates) using FunctionsCall. For all of the function names, they just use a QualifiedName
rather than any sort of hard-coded names within their AST, leaving the validation to a later stage. I'm not sure if we should lump all three of those under ExprCall
but modeling window functions with just an identifier name rather than the WindowFunctionType
could be a cleaner modeling as more window functions are added over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*/ | ||
@Builder(builderClassName = "Builder") | ||
@EqualsAndHashCode(callSuper = false) | ||
public static final class InLineSpecification extends WindowReference { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me why we need a separate in-line window specification class. Perhaps a simpler modeling could follow what Trino does with their Window interface.
The Window
interface is implemented by two separate classes
Following a similar modeling, we could have a Window
abstract class that is used by the ExprWindowFunction
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could put WindowSpecification to be a variant of WindowReference (and make it a nested class, or leave it out separately, doesn't matter to me). Alternatively, extract a WindowReference interface. Did this locally, and while it's nice, I'm really despising how AstNode is an abstract class. Should've just created a public AstNodeBase. I'll publish a rev, and it should be apparent why it's a burden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm actually going to take @yliuuuu's recommendation of removing WindowReference entirely. It removes all the shenanigans, and we get what you're describing also 😄
private final Expr extent; | ||
private final Long offset; | ||
private final Expr defaultValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If window functions were modeled more closely to ExprCall
, these could all be function arguments rather than hard-coded properties.
partiql-planner/src/main/kotlin/org/partiql/planner/internal/transforms/RelConverter.kt
Show resolved
Hide resolved
partiql-planner/src/main/kotlin/org/partiql/planner/internal/transforms/RelConverter.kt
Show resolved
Hide resolved
...anner/src/main/kotlin/org/partiql/planner/internal/window/WindowFunctionSignatureProvider.kt
Outdated
Show resolved
Hide resolved
partiql-eval/src/main/kotlin/org/partiql/eval/internal/window/NavigationFunction.kt
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me. Just one comment on the Typing of Lag/Lead Function.
*/ | ||
@Builder(builderClassName = "Builder") | ||
@EqualsAndHashCode(callSuper = false) | ||
public static final class LeadOrLag extends WindowFunctionType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection here, but might argue that by breaking it down into two classes, the API is more consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I mean I'm fine with that too 😄 Whatever floats your boat
partiql-eval/src/main/kotlin/org/partiql/eval/internal/operator/rel/RelOpWindow.kt
Show resolved
Hide resolved
partiql-eval/src/main/kotlin/org/partiql/eval/internal/operator/rel/RelOpWindow.kt
Show resolved
Hide resolved
partiql-eval/src/main/kotlin/org/partiql/eval/internal/operator/rel/RelOpWindow.kt
Show resolved
Hide resolved
returnTyper.accumulate(expr) | ||
returnTyper.accumulate(default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to accumulate or throw on mismatch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be missing something. The lag/lead's output type will be the expr or default if the offset row doesn't exist, so they don't need to match. In the case of coercible types, maybe we coerce them to their common super type, hence the TODO. But, I don't see any problem with the output type being ANY if the types aren't coercible.
The only validation I'd say is required is that the offset by coercible to BIGINT (or for now just BIGINT).
… clause Adds AST modeling for window functions Adds plan modeling for window functions Deprecates old window function AST APIs due to incompatibility Adds RANK, DENSE_RANK window functions Adds ROW_NUMBER window function Adds LAG and LEAD window functions Adds support for referencing window clause from window functions
Adds children check Leverages different warning type
Adds deprecation notices to APIs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating the PR and adding the experimental notices! Mostly left some minor comments/feedback that should be quick to address before we merge. A couple other things to highlight:
- I think we should try to use a separate tracking issue for things that are left as TODOs. Otherwise, the TODOs may be hidden in the code and hard to prioritize.
- The current modeling of window functions in the parser/ast may need further experimentation as we integrate w/ customers. Since everything that's public will be marked w/ experimental, I think we can still proceed with merging in its current modeling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partiql-tests submodule change should be reverted
@@ -27,6 +27,7 @@ Thank you to all who have contributed! | |||
|
|||
### Added | |||
- partiql-ast: add `With` and `WithListElement` to `AstVisitor` and `SqlDialect` | |||
- Adds end-to-end support for window functions including RANK, DENSE_RANK, LAG, LEAD, and ROW_NUMBER functions alongside the WINDOW clause. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should call out that the new window clause and window function support is experimental.
*/ | ||
@lombok.Builder(builderClassName = "Builder") | ||
@EqualsAndHashCode(callSuper = false) | ||
public static final class WindowDefinition extends AstNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can unnest from the WindowClause
or just call this Definition
. Currently the type is WindowClause.WindowDefinition
, which is pretty lengthy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also need a deprecated/experimental notice
* @deprecated This feature is experimental and is subject to change. | ||
*/ | ||
@Deprecated | ||
public abstract class WindowPartition extends AstNode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the WindowPartition
abstract
? If I'm matching this up properly to the SQL2023 spec, this lines up with <window partition column reference>
, which will have a <column reference>
and an optional <collate clause>
. Perhaps a simpler modeling is just as a final class with a columnReference
field.
@@ -27,6 +27,7 @@ Thank you to all who have contributed! | |||
|
|||
### Added | |||
- partiql-ast: add `With` and `WithListElement` to `AstVisitor` and `SqlDialect` | |||
- Adds end-to-end support for window functions including RANK, DENSE_RANK, LAG, LEAD, and ROW_NUMBER functions alongside the WINDOW clause. | |||
- **EXPERIMENTAL** partiql-plan: add representation of `RelWith` and `WithListElement` to the plan and the | |||
`OperatorVisitor` | |||
- **EXPERIMENTAL** partiql-planner: add planner builder function to control whether `With` table references are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably also call out that the prior modeling of window is deprecated and replaced with the newer modeling.
// if (t != null) { | ||
// return t; | ||
// } | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we returning null
for the cause here?
@@ -35,7 +35,7 @@ public PType getType() { | |||
public String toString() { | |||
return "DatumString{" + | |||
"_type=" + _type + | |||
", _value=" + _value + | |||
", _value='" + _value + '\'' + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for consistency and don't need the escape if we use double-quotes
", _value='" + _value + '\'' + | |
", _value='" + _value + "'" + |
@@ -324,6 +325,106 @@ public R visitExprArray(ExprArray node, C ctx) { | |||
return defaultVisit(node, ctx); | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For these functions, probably need an experimental notice.
@EqualsAndHashCode(callSuper = false) | ||
@Deprecated | ||
public final class WindowSpecification extends AstNode { | ||
// TODO: In the future: @Nullable private final WindowFrameClause frameClause; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we create a tracking issue for the TODOs brought up in this PR? In the code and in PR description there are a lot of TODOs mentioned (e.g. window frame clause, unsupported functions in WindowBuiltins
) but may be hard to discover unless we do a code search of the repo.
* EBNF 2023: | ||
* <window function> ::= <window function type> OVER <window name or specification> | ||
*/ | ||
windowFunction: funcType=windowFunctionType OVER spec=windowNameOrSpecification; | ||
|
||
/** | ||
* EBNF 2023: | ||
* <window function type> ::= | ||
* <rank function type> <left paren> <right paren> | ||
* | ROW_NUMBER <left paren> <right paren> | ||
* | <lead or lag function> | ||
* | and more... | ||
*/ | ||
windowFunctionType | ||
: rankFunctionType PAREN_LEFT PAREN_RIGHT # WindowFunctionTypeRank | ||
| ROW_NUMBER PAREN_LEFT PAREN_RIGHT # WindowFunctionTypeRowNumber | ||
| leadOrLagFunction # WindowFunctionTypeLeadOrLag | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not fully-convinced that the modeling for the window functions needs to be hard-coded in the parser and AST. In the future if we are to support more window functions, we would need additional rules in the parser and variants within the WindowFunctionType
, which may be redundant. Trino, for instance, defines all scalar, aggregate, and window functions as a FunctionCall (see their grammar).
Since we've marked all the public APIs as experimental though, I'm still fine w/ proceeding and experimenting further as we get feedback from customers.
Description
Deprecated
Window Design
WindowPartition
much easier (and fairly performant).WindowFunction
and the plan and evaluator's definition of aWindowFunctionNode
andWindowFunction
. This PR acts this way to maintain a separation of concerns. For now, we are only modeling the built-ins as provided by the database system, however, the APIs can be extended in the future to accept and converts SPI-provided window functions.Future Changes
RelWindow
,WindowFunctionNode
, andWindowFunction
will be need to be updated to support frames, which is totally fine. ForWindowFunction
, we'll likely need to add a new method that takes as input some frame information, which by default should call the existing (probably deprecated in the future)eval
method. This will not break the public API, nor would it break the existing customer flow.Other Information
License Information
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.