Benoit/eng-362-update-ndc-postgres-to-ndc_models-020 #666
base: main
Conversation
Note: we expect failing tests related to the deprecation of the root column comparison.
These will be fixed in a separate PR, to be merged into this one before merging to main.
This has now been merged.
},
)
})
.collect(),
)
}

/// Infer scalar type representation from scalar type name, if necessary. Defaults to JSON representation.
fn convert_or_infer_type_representation(
V0.2.0 requires type representation.
Type representation comes from the introspection configuration, and may be absent.
So, if the type representation is missing, we infer the type based on the name and fetch the corresponding type representation from the default introspection configuration.
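For illustration, a minimal sketch of that inference. This is not the ndc-postgres source: the `TypeRepresentation` variants and the name-to-representation mapping below are illustrative stand-ins, and the real mapping comes from the default introspection configuration.

```rust
// Hedged sketch: prefer the configured representation; otherwise infer
// one from the scalar type name, falling back to JSON.
#[derive(Debug, Clone)]
enum TypeRepresentation {
    Boolean,
    Int32,
    Float64,
    String,
    Json,
}

fn convert_or_infer_type_representation(
    configured: Option<TypeRepresentation>,
    scalar_type_name: &str,
) -> TypeRepresentation {
    configured.unwrap_or_else(|| match scalar_type_name {
        // Illustrative entries only; the real mapping is pulled from the
        // default introspection configuration.
        "bool" => TypeRepresentation::Boolean,
        "int4" => TypeRepresentation::Int32,
        "float8" => TypeRepresentation::Float64,
        "text" | "varchar" => TypeRepresentation::String,
        // Anything we don't recognize defaults to the JSON representation.
        _ => TypeRepresentation::Json,
    })
}
```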
Force-pushed from ed42b7e to 6fa57df.
Note: we are pointing to a specific SDK revision. We should tag a release and point to that.
…n does not include a type representation, we infer one based on the scalar type name. We default to JSON representation if we don't recognize the scalar type. The mapping is pulled from the default introspection configuration. This should enable a smooth upgrade, but we may later need to publish a new configuration version with a mechanism to guarantee type representations.
Note! This is a regression with regard to named scopes, which replace the previously supported RootTableColumn. There was technically no way to consume this API from the engine, so this is not a major issue; it will be addressed in an upcoming PR.
Type representations are no longer optional. The schema response now includes a reference to the scalar type to be used for count results. AggregateFunctionDefinition is now an enum, so we map based on function name.

Note! We are currently lying by omission about the return types: Postgres aggregates return NULL when aggregating over no rows, except COUNT. We should discuss whether we want to change aggregate function definitions to reflect this behavior, whether all these scalars will be implicitly nullable, or whether we want to change the SQL using COALESCE to default to some value when no rows are present. Arguably, there are no proper MAX, MIN, or AVG default values.

As for SUM, ndc-test expects all SUM return values to be represented as either 64-bit integers or 64-bit floats. Postgres has types like INTERVAL, which is represented as a string and can be aggregated with SUM; we cannot represent intervals as float64 or int64. We need to discuss whether any of the above needs to be revisited.
…, so that we may count nested properties using field_path
…eign key may be on a nested field. For now, we do not support relationships.nested, so we error out in that case.
Add reference to configuration.schema.json. Add missing type representations. Add missing scalar types (v4 did not require all referenced scalar types to be defined).
Note: this feature is still not implemented, so the test still fails.
…only non-null rows, instead of COUNT(*), which would count all rows
The ndc spec expects sum aggregates to return a scalar represented as either f64 or i64. Because ndc-postgres represents i64 as a string, we only mark sum aggregates returning an f64; any other sum aggregate will function as a custom aggregate and have no special meaning. Additionally, we wrap SUM with `COALESCE(SUM(col), 0)` to ensure we return 0 when aggregating over no rows. Similarly, we only mark avg functions returning an f64, and treat any other avg as a custom aggregate.
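As a rough illustration of the COALESCE wrapping this commit message describes, here is a hedged sketch using plain strings; the actual implementation builds `sql::ast` expression nodes rather than strings, and `render_aggregate` is a hypothetical helper name.

```rust
// Hedged sketch of wrapping SUM in COALESCE (not the real ndc-postgres code).
fn render_aggregate(function: &str, column: &str) -> String {
    let call = format!("{function}({column})");
    if function == "sum" {
        // Postgres returns NULL when summing zero rows; the ndc spec
        // expects 0, so fall back to 0 explicitly.
        format!("coalesce({call}, 0)")
    } else {
        call
    }
}

// render_aggregate("sum", r#""Invoice"."Total""#)
//   => coalesce(sum("Invoice"."Total"), 0)
```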
…y tables in scope for an exists, instead of only root and current. (#674)

### What
`ComparisonTarget::RootCollectionColumn` was removed, to be replaced by [named scopes](https://github.com/hasura/ndc-spec/blob/36855ff20dcbd7d129427794aee9746b895390af/rfcs/0015-named-scopes.md). This PR implements the replacement functionality.

### How
This PR replaces RootAndCurrentTables with TableScope, a struct that keeps track of the current table and any tables in scope for exists expressions; a hedged sketch of the idea follows. See the accompanying review for details on the code itself.
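A sketch of the `TableScope` idea with illustrative names and an assumed scope-numbering convention (an index walking outward from the current table); consult the named-scopes RFC for the exact semantics.

```rust
// Illustrative stand-in for the real table reference type.
struct TableSourceAndReference;

struct TableScope {
    /// The current (innermost) table.
    current_table: TableSourceAndReference,
    /// Tables brought into scope by enclosing `exists` expressions,
    /// innermost first.
    enclosing_scopes: Vec<TableSourceAndReference>,
}

impl TableScope {
    /// Resolve a named scope. Assumption: `None` (or 0) means the current
    /// table, and larger indices walk outward through enclosing scopes.
    fn scoped_table(
        &self,
        scope: Option<usize>,
    ) -> Result<&TableSourceAndReference, String> {
        match scope {
            None | Some(0) => Ok(&self.current_table),
            Some(n) => self
                .enclosing_scopes
                .get(n - 1)
                .ok_or_else(|| format!("no table in scope at level {n}")),
        }
    }
}
```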
Force-pushed from 4234509 to 829886f.
},
)
})
.collect(),
)
}

/// Infer scalar type representation from scalar type name, if necessary. Defaults to JSON representation.
fn convert_or_infer_type_representation(
V0.2.0 requires type representation, but the configuration did not require it to be present.
To maximize compatibility with older configuration versions, we infer missing type representations based on the scalar type name. If we cannot infer one, we default to JSON representation.
@@ -29,22 +29,22 @@ impl ComparisonOperatorMapping {
     ComparisonOperatorMapping {
         operator_name: "<=".to_string(),
         exposed_name: "_lte".to_string(),
-        operator_kind: OperatorKind::Custom,
+        operator_kind: OperatorKind::LessThanOrEqual,
The default introspection configuration changed to tag the lt(e)/gt(e) operators.
This will only affect new configurations, so any deployments with existing configurations will see no change in behavior.
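By analogy with the diff above, a sketch of what the tagged default mappings might look like; `default_ordering_operators` is a hypothetical helper, and the `OperatorKind` variants are trimmed to the ones relevant here.

```rust
// Illustrative sketch; the real OperatorKind and defaults live in the
// ndc-postgres configuration crate.
enum OperatorKind {
    LessThan,
    LessThanOrEqual,
    GreaterThan,
    GreaterThanOrEqual,
}

struct ComparisonOperatorMapping {
    operator_name: String,
    exposed_name: String,
    operator_kind: OperatorKind,
}

fn default_ordering_operators() -> Vec<ComparisonOperatorMapping> {
    [
        ("<", "_lt", OperatorKind::LessThan),
        ("<=", "_lte", OperatorKind::LessThanOrEqual),
        (">", "_gt", OperatorKind::GreaterThan),
        (">=", "_gte", OperatorKind::GreaterThanOrEqual),
    ]
    .into_iter()
    .map(|(op, exposed, kind)| ComparisonOperatorMapping {
        operator_name: op.to_string(),
        exposed_name: exposed.to_string(),
        operator_kind: kind,
    })
    .collect()
}
```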
    function_name.as_str(),
    function_definition.return_type.as_str(),
) {
    ("sum", "float8" | "int8") => {
v0.2.0 adds standard aggregate functions. These have specific expectations, such as `sum` needing to return a scalar represented as either `Float64` or `Int64`.
We check for specific aggregate functions returning matching data types, and mark applicable functions as such.
Non-compliant functions (e.g. `sum` on interval types, which are represented as strings) will be tagged as custom aggregate functions.
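A sketch of this classification, following the shape of the diff above; the `classify_aggregate` helper is hypothetical, and the variant set is trimmed (ndc-models' `AggregateFunctionDefinition` carries more variants and richer types).

```rust
// Illustrative sketch: standard aggregates are only recognized when the
// return type matches the spec's representation requirements; everything
// else stays a custom aggregate.
enum AggregateFunctionDefinition {
    Sum { result_type: String },
    Average { result_type: String },
    Custom { result_type: String },
}

fn classify_aggregate(
    function_name: &str,
    return_type: &str,
) -> AggregateFunctionDefinition {
    match (function_name, return_type) {
        // sum must return a scalar represented as Int64 or Float64
        ("sum", "float8" | "int8") => AggregateFunctionDefinition::Sum {
            result_type: return_type.to_string(),
        },
        // avg must return a scalar represented as Float64
        ("avg", "float8") => AggregateFunctionDefinition::Average {
            result_type: return_type.to_string(),
        },
        // anything else (e.g. sum over interval, represented as a string)
        // is tagged as a custom aggregate
        _ => AggregateFunctionDefinition::Custom {
            result_type: return_type.to_string(),
        },
    }
}
```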
All looks good - let's add these comments into the code where it makes sense.
Ok(models::SchemaResponse {
    collections,
    procedures,
    functions: vec![],
    object_types,
    scalar_types,
    capabilities: Some(models::CapabilitySchemaInfo {
Adding this is required, but also means we will see a change in returned schemas, even if configuration has not been changed.
This is fine, the entire schema types are going to change because of the `ndc-models` bump anyway.
    field_path,
    scope,
} => {
    let scoped_table = current_table_scope.scoped_table(scope)?;
Apply scope, if any, before traversing the path.
        args: vec![column],
    }
}
OrderByAggregate::CountStar | OrderByAggregate::Count => {
CountStar and Count actually behave the same here, in that we only count left-hand rows that actually exist.
This is important, as left joins + COUNT(*) would count all rows, even if there were no matching left-hand rows.
I believe those semantics are correct, but it's something to double-check.
I've looked through here and I'm not sure; perhaps a question for @daniel-chambers?
`CountStar` and `Count(column)` are not semantically the same. As mentioned in that doc, `CountStar` counts all rows, while `Count(column)` counts all rows that have a non-null value in `column`. This is actually consistent with the usual behaviour of `COUNT(*)` and `COUNT(column)` in SQL.
I'd need to see the SQL that this generates to know what's actually going on here... because this is counting across a join, you can't use `count(*)` as Benoit pointed out, but I'd like to know what we're actually counting then. I tried to follow the code and it exceeded my 10:30pm patience 😂. My gut says that in this situation a "count(*)" should be count("the join key column"), which may be what's going on here.
Regardless, it needs to satisfy the semantics I mentioned above.
The generated SQL for CountStar looks like this:
LEFT OUTER JOIN LATERAL (
SELECT
COUNT("%1_ORDER_PART_Album"."count") AS "count"
FROM
(
SELECT
1 AS "count"
FROM
"public"."Album" AS "%1_ORDER_PART_Album"
WHERE
(
"%0_Artist"."ArtistId" = "%1_ORDER_PART_Album"."ArtistId"
)
) AS "%1_ORDER_PART_Album"
) AS "%2_ORDER_FOR_Artist" ON ('true')
There's a previous step that generates the inner SQL and gives us a reference to the column we can count on: either a synthetic column with value 1 for `CountStar`, the column to count, or the column to aggregate on when aggregating with a custom function.
Which is to say, I'm pretty confident we are doing the right thing here.
}

enum OrderByAggregate {
We created a new enum for the various ordering aggregates.
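A guessed sketch of the enum, based only on the variants that appear in this review; the real definition may carry more data per variant.

```rust
// Hedged sketch of the new ordering-aggregate enum.
enum OrderByAggregate {
    /// COUNT over a synthetic constant column: counts matching rows.
    CountStar,
    /// COUNT over a concrete column: counts non-null values.
    Count,
    /// Any other aggregate, applied by name to the aggregated column.
    Function(String),
}
```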
@@ -703,10 +740,10 @@ fn translate_targets(
     // Aggregates do not have a field path.
     field_path: (&None).into(),
     expression: sql::ast::Expression::Value(sql::ast::Value::Int4(1)),
-    aggregate: Some(sql::ast::Function::Unknown("COUNT".to_string())),
+    aggregate: Some(OrderByAggregate::CountStar),
We used our new ordering aggregate enum instead of a direct SQL AST function.
@@ -53,15 +53,18 @@ nonempty = "0.10"
 percent-encoding = "2"
 prometheus = "0.13"
 ref-cast = "1"
-reqwest = "0.11"
+reqwest = "0.12"
Can we bump this separately before the PR goes in? Good not to mix up functional and non-functional changes.
No. `ndc-test` needs to be updated alongside `ndc-models` and `ndc-sdk`, and it uses `reqwest` 0.12, which is the reason for this change.
if let Some(representation) = representation {
    representation
} else {
    match scalar_type_name.as_str() {
We already have the `default_base_type_representations` to do this mapping from scalar type name to `TypeRepresentation`. If this function adds any more mappings we should add them there rather than adding a new layer of indirection. The idea that whatever mappings are in the configuration are what will be returned in the schema is a useful property for understanding all this, and we would need a compelling reason to remove that property.
v0.2 requires type representation in schema responses; however, type representation is optional in the configuration itself.
So this is a piece of backwards-compatibility code, to allow for configurations that are missing type representations.
We could instead create configuration v6, and refuse to run with older configuration versions.
As for the use of `default_base_type_representations`: that mapping only exists for configurations v4 and v5. Here in v3 we replicate that functionality instead.
I preferred copying the code but keeping the different modules separate.
And yes, it's a bit cheaty: we have a default when creating a new config, and reuse that default when type representation is missing.
It's probably best to leave version 3 configs doing whatever they were doing before, i.e. falling back to JSON where they have no `TypeRepresentation`, or where they used the deprecated `Number` or `Integer` types.
Also, there is no need for a version 6: it looks like we were already making any mention of `Number` and `Integer` fall back to JSON before this PR, and since we have our own type exposed in the config type, we won't be removing any items and thus won't be making any breaking changes.
Defaulting to JSON is a behavior change. Representation was optional in both the configuration and the schema response, so we just returned it as-is.
This applies to configurations v3, v4, and v5.
Can you confirm we'd rather default to JSON if the representation is missing, without first trying to infer it based on the type name?
@@ -30,7 +30,7 @@ pub struct ScalarType {
     pub description: Option<String>,
     pub aggregate_functions: BTreeMap<models::AggregateFunctionName, AggregateFunction>,
     pub comparison_operators: BTreeMap<models::ComparisonOperatorName, ComparisonOperator>,
-    pub type_representation: Option<TypeRepresentation>,
+    pub type_representation: TypeRepresentation,
I am starting to think we should just use `ndc_models::TypeRepresentation` here instead of this type, which appears to be a complete copy? I feel bugs will lurk in subtle differences between the two, and I don't understand what the indirection buys us.
Agreed, I'll take this chance to do that. I didn't change it originally to keep changes to a minimum.
This could be done in a separate PR to be fair, before or after this one.
// postgres SUM aggregate returns null if no input rows are provided
// however, the ndc spec requires that SUM aggregates over no input rows return 0
// we achieve this with COALESCE, falling back to 0 if the aggregate expression returns null
if function.as_str() == "sum" {
👍 Thank you for the explanatory comments.
@@ -31,15 +32,15 @@
 "collection_relationships": {
     "ArtistAlbums": {
         "column_mapping": {
-            "ArtistId": "ArtistId"
+            "ArtistId": ["ArtistId"]
This `column_mapping` type change seems to be the main change to request files - is there anything else I should expect to see?
You'll also see any requests with a left-handed path replaced with an `exists` expression, since left-handed paths were deprecated.
-pub struct RootAndCurrentTables {
-    /// The root (top-most) table in the query.
-    pub root_table: TableSourceAndReference,
+pub struct TableScope {
Does this `TableScope` refactor still make sense in `ndc-models` 0.1.0? If so I would really like us to apply it in a separate PR before this one, as it's the source of a lot of changes I can see.
If we could split out the `TableScope` changes from the `ndc-models` changes then I think this will become a much clearer atomic change.
TableScope is for v0.2, as it enables supporting named scopes, which replace root column references. We can apply it in a subsequent PR, but some tests won't pass until then.
@@ -97,11 +97,15 @@ struct Column(models::FieldName);
 /// An aggregate operation to select from a table used in an order by.
 #[derive(Debug)]
 enum Aggregate {
-    CountStarAggregate,
-    SingleColumnAggregate {
+    StarCount,
These new names are better, but again, this change is arbitrary and just more noise; this kind of thing should be a tiny gardening PR.
That's fair; the suffix was removed to make the linter happy.
@@ -23,7 +24,7 @@ FROM
     COUNT("%2_Invoice"."InvoiceId") AS "InvoiceId_count",
     min("%2_Invoice"."Total") AS "Total__min",
     max("%2_Invoice"."Total") AS "Total__max",
-    sum("%2_Invoice"."Total") AS "Total__sum",
+    coalesce(sum("%2_Invoice"."Total"), 0) AS "Total__sum",
👍
@@ -19,11 +19,8 @@
     "operator": "_in",
     "value": {
         "type": "column",
What is going on with this change?
ComparisonValue::Column no longer uses ComparisonTarget to pick the column. Instead, the necessary column and pathing details are inlined onto the enum variant; this got flattened.
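A rough sketch of the flattened variant described here, with simplified types and field names recalled from ndc-models v0.2; consult the spec for the authoritative definition.

```rust
// Hedged sketch of the flattened ComparisonValue, not the real ndc-models
// definition (which uses dedicated newtypes and a JSON value type).
enum ComparisonValue {
    Column {
        /// Relationship path to the table holding the column.
        path: Vec<String>,
        /// The column name itself.
        name: String,
        /// Path into nested fields within the column, if any.
        field_path: Option<Vec<String>>,
        /// Named scope to resolve the column against, if any.
        scope: Option<usize>,
    },
    Scalar {
        /// A literal value (serde_json::Value in the real models).
        value: String,
    },
    Variable {
        name: String,
    },
}
```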
### What
This PR updates ndc-postgres to ndc spec v0.2.0.
This includes a lot of changes to tests. These have been justified in individual commits.

### How