A query in scalar context should use the first column #75

Open
xitology opened this issue Jan 7, 2025 · 2 comments
Comments

@xitology
Member

xitology commented Jan 7, 2025

Consider a query used in a scalar context, such as a SELECT clause, an EXISTS expression, an IN expression, or an argument of a scalar function.

For convenience, assume a query that returns a single row, e.g., @funsql from(person).limit(1). It does, however, return multiple columns: person_id, gender_concept_id, etc. How should this query be interpreted when used in a scalar context, such as:

```julia
@funsql select(from(person).limit(1))
```

This is a challenge because in a scalar context a query must return exactly one column. Currently, the query evaluates to NULL unless its column is explicitly specified with a select() combinator. Thus, select(from(person).limit(1)) returns NULL, but select(from(person).limit(1).select(person_id)) returns the value of person_id.

This interpretation allows us to accept any query in a scalar context, which is particularly useful for EXISTS, since EXISTS only checks whether the query returns any rows. However, it may cause confusion when the query is used as an argument of IN or of a scalar function.

There is a better interpretation: A query used in a scalar context should return its first column.

This interpretation does not change the semantics of a query with an explicit select(). For a query without select(), it would pick the first column of the table, which is typically its primary key. This allows us to write, for example:

```julia
@funsql begin
    cohort() = begin
        from(person)
        filter(gender_concept_id == 8532)
    end

    relevant_visit() = begin
        from(visit_occurrence)
        filter(person_id in cohort()) # rather than `in cohort().select(person_id)`
    end
end
```

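A toy model in plain Julia may help compare the current rule with the proposed one. This is a hypothetical sketch, not FunSQL's implementation: ToyQuery, scalar_column_current, and scalar_column_first are invented names, a query is reduced to its ordered output columns plus the column named by an explicit select() (if any), and NULL is modeled as nothing.

```julia
# Hypothetical toy model, not FunSQL internals: a query is reduced to its
# output columns (in catalog order) and an optional explicit select() column.
struct ToyQuery
    columns::Vector{Symbol}           # output columns, in catalog order
    selected::Union{Nothing,Symbol}   # column named by an explicit select(), if any
end

# Convenience constructor for a query without an explicit select().
ToyQuery(columns::Vector{Symbol}) = ToyQuery(columns, nothing)

# Current rule described in the issue: without select(), the scalar value
# is NULL (modeled here as `nothing`).
scalar_column_current(q::ToyQuery) = q.selected

# Proposed rule: an explicit select() still wins; otherwise use the first column.
function scalar_column_first(q::ToyQuery)
    q.selected !== nothing && return q.selected
    isempty(q.columns) && error("query has no columns")
    return first(q.columns)
end

person = ToyQuery([:person_id, :gender_concept_id, :race_concept_id])
println(scalar_column_current(person))   # nothing
println(scalar_column_first(person))     # person_id
```

The fallback depends only on column order: a table that lists gender_concept_id before person_id would yield gender_concept_id. That is the catalog dependence raised in the next comment.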
@clarkevans
Contributor

clarkevans commented Jan 7, 2025

Could we instead use the label of the lhs of in() to select the column with the same label when the rhs doesn't have a select()?

e.g., (colname in query()) would be interpreted as (colname in query().select(colname)) if query() isn't scalar.

Here colname might also be an expression labeled with (expr).as(colname).

I'm hesitant to pick the first column since this makes the interpretation of a query depend on the catalog: tables with the same name and the same columns may be interpreted differently if the order of their columns somehow differs. Also, an implicit positional approach may lead to confusion. For example, with (race_concept_id in person()) one may guess that it matches on the "same" race. However, since person_id is the first column of person(), race_concept_id would be matched against person_id rather than raising an error.

Alternatively, perhaps we could use the primary key column, provided the PK has exactly one field. Or perhaps we could use a tuple when the PK has several fields? If there is no primary key, it's an error.
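The label-based and primary-key alternatives can be sketched the same way. Again, this is a hypothetical toy model in plain Julia, not FunSQL code: ToyTable, scalar_column_by_label, and scalar_column_by_pk are invented names, the tuple idea for composite keys is left out, and the concept table is assumed to expose concept_id, concept_name, and vocabulary_id.

```julia
# Hypothetical toy model, not FunSQL internals: a table is reduced to its
# output columns and its (possibly composite or empty) primary key.
struct ToyTable
    columns::Vector{Symbol}
    primary_key::Vector{Symbol}
end

# Label rule: `colname in query()` is read as `colname in query().select(colname)`,
# so the subquery must expose a column with the same label as the left-hand side.
function scalar_column_by_label(t::ToyTable, lhs::Symbol)
    lhs in t.columns || error("subquery has no column named $lhs")
    return lhs
end

# Primary-key rule: use the key only if it has exactly one column; a missing
# key is an error (this sketch also rejects composite keys).
function scalar_column_by_pk(t::ToyTable)
    length(t.primary_key) == 1 ||
        error("subquery needs a single-column primary key")
    return only(t.primary_key)
end

person  = ToyTable([:person_id, :gender_concept_id, :race_concept_id], [:person_id])
concept = ToyTable([:concept_id, :concept_name, :vocabulary_id], [:concept_id])

println(scalar_column_by_label(person, :race_concept_id))  # race_concept_id
println(scalar_column_by_pk(person))                       # person_id
```

Under the label rule, race_concept_id in person() compares race with race rather than with person_id. On the other hand, a concept query has no condition_concept_id column, so scalar_column_by_label(concept, :condition_concept_id) raises an error; this is the objection in the reply that follows. The primary-key rule picks concept_id there, but fails for any table without a single-column key.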

@xitology
Member Author

xitology commented Jan 7, 2025

My primary motivating example is concept matching:

```julia
filter(condition_concept_id == SNOMED("Essential hypertension"))
filter(condition_concept_id in SNOMED("Essential hypertension").with_descendands())
```

If you interpret colname in query() as colname in query().select(colname), this no longer works, since the query on the right has no condition_concept_id column. It is also not algebraic, because it makes in() depend too much on the structure of its arguments.

Making the first column the primary key is a very common convention, at least in ORM-generated schemas. This interpretation gives exactly what you need in the majority of cases, and you can always use an explicit select() when really needed.

@clarkevans clarkevans pinned this issue Jan 7, 2025
@clarkevans clarkevans unpinned this issue Jan 7, 2025