Skip to content

A query in scalar context should use the first column #75

Closed
@xitology

Description

@xitology

Consider a query used in a scalar context, such as SELECT clause, EXISTS expression, IN expression, or an argument of a scalar function.

For convenience, assume a query that returns one row, e.g., @funsql from(person).limit(1). It does return multiple columns: person_id, gender_concept_id, etc. How should this query be interpreted when used in a scalar context, such as:

@funsql select(from(person).limit(1))

This is a challenge because in a scalar context, a query must return exactly one column. Currently, the returned column is NULL unless the column is explicitly specified with a select() combinator. Thus, select(from(person).limit(1)) returns NULL, but select(from(person).limit(1).select(person_id)) returns the value of person_id.

This interpretation allows us to accept any query in a scalar context, which is particularly useful for EXISTS. However, it may cause confusion when the query is used as an argument of IN or a scalar function.

There is a better interpretation: A query used in a scalar context should return its first column.

This interpretation does not change the semantics of a query with an explicit select(). For queries without select(), it would pick the first column of a table, which is typically its primary key. This allows us to write, for example

@funsql begin
    cohort() = begin
        from(person)
        filter(gender_concept_id == 8532)
    end

    relevant_visit() = begin
        from(visit_occurrence)
        filter(person_id in cohort()) # rather than `in cohort().select(person_id)`
    end
end

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions