Skip to content

[Enhancement] Better error handling for JOIN query limitation #4058

@RyanL1997

Description

@RyanL1997

Description

According to the official documentation: OpenSearch doc - JOIN does not support aggregations on the joined result, JOIN queries in today's SQL plugin has certain limitations for aggregated join result. However, currently we do not have certain error handling to capture these limitations to warn users.

Reproduce of the issue

  • Env Setup - latest version of opensearch
  • Construct a JOIN query with some aggregate functions, e.g.
SELECT entity.country_code, count(interaction_aggregate.interaction_id), count(if(interaction_aggregate.is_1to1visit, interaction_aggregate.interaction_id, null))
FROM interaction_aggregate
JOIN entity on entity.entity_key = interaction_aggregate.entity_key
GROUP BY entity.country_code LIMIT 10
  • Execute the above query in the latest OS version, and from the response, it will output the column with aliasing issue - count(b.channel) c incorrectly generates duplicate columns a.c and b.c with null values instead of the expected aggregation result

  • In the query explanation, the symptoms are:

    • No aggregation operator in the logical or physical plan

      • "Project [ columns=[interaction_aggregate_0.COUNT, hcp_1.COUNT, hcp_1.country_code] ]" in response indicates that the SQL engine is treating the COUNT(...) expressions as literal field names, not as functions to be evaluated, because it cannot resolve an aggregation function over joined data.
    • JOIN followed by projection, not aggregation

      • The physical plan is using a BlockHashJoin, then directly projecting the COUNT columns from both sources
       "schema": [
         {"name": "interaction_aggregate_0.COUNT(interaction_id)", "type": "long"},
         ...
       ],
       "datarows": [[null, "CO", null], ...]

Expectation/Exit Criteria

  • SQL plugin can properly detect the usage of JOIN with aggregation
  • Throw an error and point to the correct official documentation for user's awareness.
  • Proper test cases for the enhanced behavior

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions