Skip to content

WITH ORDINALITY AS t(col, ord) alias gains a duplicate ord column when an Unnest node is rendered more than once in a single Generator.sql() pass #7566

@CPAPI-104

Description

@CPAPI-104

Bug: WITH ORDINALITY AS t(col, ord) alias gains a duplicate ord column when an Unnest node is rendered more than once in a single Generator.sql() pass

Before you file an issue

  • ✅ Read dialect specified: parse_one(sql, read="postgres")
  • ✅ Write dialect specified: ast.sql(dialect="postgres")
  • ✅ Reproduces on main (tested on sqlglot 25.0.0, 26.0.0, 27.21.0, 28.0.0, 28.5.0, 28.10.0, 29.0.1, 30.6.0)

Summary

When an Unnest node has both an offset and a TableAlias with explicit columns, Generator.unnest_sql mutates alias.columns by appending the offset identifier on every render of that node. If a generator codepath renders the same Unnest expression instance more than once during a single Generator.sql() pass — for example Postgres' single-arg TRIM(...) codepath, which calls self.sql(expression, "this") twice — the alias accumulates duplicate offset entries. The output SQL declares an alias column list one entry longer than the underlying SELECT, which Postgres then rejects with InvalidColumnReference: table "<alias>" has 2 columns available but 3 columns specified.

Fully reproducible code snippet

from sqlglot import parse_one

sql = """
SELECT TRIM(ARRAY_TO_STRING(ARRAY(
    SELECT tok
    FROM   (SELECT tok, rn
            FROM UNNEST(string_to_array('a b', ' ')) WITH ORDINALITY AS t(tok, rn)) d
    WHERE  NOT EXISTS (
        SELECT 1
        FROM UNNEST(string_to_array('a b', ' ')) WITH ORDINALITY AS s(tok2, rn2)
        WHERE s.tok2 = d.tok AND s.rn2 < d.rn
    )
    ORDER BY rn
), ' ')) AS comp
"""

print(parse_one(sql, read="postgres").sql(dialect="postgres"))

Expected output (semantics-preserving)

SELECT TRIM(ARRAY_TO_STRING(ARRAY(SELECT tok FROM (SELECT tok, rn FROM UNNEST(STRING_TO_ARRAY('a b', ' ')) WITH ORDINALITY AS t(tok, rn)) AS d WHERE NOT EXISTS(SELECT 1 FROM UNNEST(STRING_TO_ARRAY('a b', ' ')) WITH ORDINALITY AS s(tok2, rn2) WHERE s.tok2 = d.tok AND s.rn2 < d.rn) ORDER BY rn), ' ')) AS comp

Actual output

SELECT TRIM(ARRAY_TO_STRING(ARRAY(SELECT tok FROM (SELECT tok, rn FROM UNNEST(STRING_TO_ARRAY('a b', ' ')) WITH ORDINALITY AS t(tok, rn, rn)) AS d WHERE NOT EXISTS(SELECT 1 FROM UNNEST(STRING_TO_ARRAY('a b', ' ')) WITH ORDINALITY AS s(tok2, rn2, rn2) WHERE s.tok2 = d.tok AND s.rn2 < d.rn) ORDER BY rn), ' ')) AS comp

The alias column list (tok, rn) is emitted as (tok, rn, rn) and (tok2, rn2) as (tok2, rn2, rn2) — a duplicate offset column in each.

Root cause (mutation evidence)

Generator.unnest_sql appends the offset identifier into alias.columns on every call rather than rendering it transiently. Calling the generator on the same Unnest node N times produces N appended copies:

from sqlglot import exp, parse_one
from sqlglot.dialects.postgres import Postgres

tree = parse_one(
    "SELECT * FROM UNNEST(string_to_array('a b', ' ')) WITH ORDINALITY AS t(tok, rn)",
    read="postgres",
)
unnest = next(tree.find_all(exp.Unnest))
gen = Postgres().generator()

print([c.name for c in unnest.args["alias"].args.get("columns") or []])
# ['tok']
print(gen.sql(unnest))
# UNNEST(STRING_TO_ARRAY('a b', ' ')) WITH ORDINALITY AS t(tok, rn)
print([c.name for c in unnest.args["alias"].args.get("columns") or []])
# ['tok', 'rn']                                       ← mutated
print(gen.sql(unnest))
# UNNEST(STRING_TO_ARRAY('a b', ' ')) WITH ORDINALITY AS t(tok, rn, rn)
print([c.name for c in unnest.args["alias"].args.get("columns") or []])
# ['tok', 'rn', 'rn']                                 ← mutated again

In the bug scenario the whole tree is rendered exactly once via tree.sql(...), but the inner Unnest is visited twice due to Postgres' TRIM single-arg codepath (generator.py calls self.sql(expression, "this") twice for Trim), producing the (tok, rn, rn) output above.

Why it surfaces in nested contexts only

A top-level UNNEST(...) WITH ORDINALITY AS t(tok, rn) is rendered exactly once, so the mutation doesn't accumulate. The bug only surfaces when the Unnest is reached through a generator codepath that renders the same Expression instance more than once in a pass (e.g. inside a TRIM(...) argument).

Workaround

Pre-fold the offset column into alias.columns before rendering, then replace the offset argument on the Unnest with a non-Expression truthy sentinel so unnest_sql still emits WITH ORDINALITY but skips the mutation branch:

from sqlglot import exp, parse_one


def _fold_unnest_ordinality_offset(expression: exp.Expression) -> None:
    for unnest in expression.find_all(exp.Unnest):
        offset = unnest.args.get("offset")
        alias = unnest.args.get("alias")
        if not isinstance(offset, exp.Expression):
            continue
        if not isinstance(alias, exp.TableAlias):
            continue
        columns = alias.args.get("columns") or []
        if not any(
            isinstance(c, exp.Identifier) and c.name == offset.name for c in columns
        ):
            alias.append("columns", offset.copy())
        unnest.set("offset", True)


tree = parse_one(sql, read="postgres")
_fold_unnest_ordinality_offset(tree)
print(tree.sql(dialect="postgres"))
# correct output: AS t(tok, rn) ... AS s(tok2, rn2) ...

Suggested fix

Either (a) make unnest_sql non-mutating — e.g. emit WITH ORDINALITY AS {alias.name}({alias.columns}, {offset}) directly without modifying alias — or (b) make the codepaths that render the same Expression instance twice (Trim, etc.) operate on a copy.

Official Postgres documentation

Versions tested

sqlglot==25.0.0, 26.0.0, 27.21.0, 28.0.0, 28.5.0, 28.10.0, 29.0.1, 30.6.0 — bug present in all.

Environment

  • Python 3.13.5
  • macOS 25.4.0 (darwin arm64)
  • target dialect: postgres (server: PostgreSQL 17.7)

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions