Creating tables with aggregation with automatically named columns results in unnamed columns error. #2175

cottrell · 2023-03-14T15:09:02Z

What happened?

This one compiles without error:

let A = (
  from invoices
  filter invoice_date >= @1970-01-16
  derive [                        # This adds columns
    transaction_fees = 0.8,
    income = total - transaction_fees  # Columns can use other columns
  ]
  filter income > 1     # Transforms can be repeated.
  group customer_id (   # Use a nested pipeline on each group
    aggregate [         # Aggregate each group to a single row
      count = count,
    ]
  )
)

from A

But the PRQL input below, which differs only in writing count instead of count = count, does not.

Probably some minor naming issue.

PRQL input

let A = (
  from invoices
  filter invoice_date >= @1970-01-16
  derive [                        # This adds columns
    transaction_fees = 0.8,
    income = total - transaction_fees  # Columns can use other columns
  ]
  filter income > 1     # Transforms can be repeated.
  group customer_id (   # Use a nested pipeline on each group
    aggregate [         # Aggregate each group to a single row
      count,
    ]
  )
)

from A

SQL output

Error: 
    ╭─[:16:6]
    │
 16 │ from A
    ·      ┬  
    ·      ╰── This table contains unnamed columns that need to be referenced by name
    · 
    · Help: The name may have been overridden later in the pipeline.
────╯

Expected SQL output

WITH table_1 AS (
  SELECT
    customer_id,
    total - 0.8 AS _expr_0
  FROM
    invoices
  WHERE
    invoice_date >= DATE '1970-01-16'
),
"A" AS (
  SELECT
    customer_id,
    COUNT(*) AS count
  FROM
    table_1 AS table_0
  WHERE
    _expr_0 > 1
  GROUP BY
    customer_id
)
SELECT
  customer_id,
  count
FROM
  "A"

-- Generated by PRQL compiler version:0.6.1 (https://prql-lang.org)

MVCE confirmation

Minimal example
New issue

Anything else?

No response


max-sixty · 2023-03-15T04:55:18Z

This is surprising, thanks a lot @cottrell ...

Here's an even simpler example:

let a = (
  from invoices
  group customer_id (
    aggregate [count]
  )
)

from a

Error: 
   ╭─[:8:6]
   │
 8 │ from a
   ·      ┬  
   ·      ╰── This table contains unnamed columns that need to be referenced by name
   · 
   · Help: The name may have been overridden later in the pipeline.
───╯

aljazerzen · 2023-04-13T11:52:38Z

I think this message is quite self-explanatory, but as its author, of course I'm biased. Let me elaborate:

The table definition compiles to this:

WITH a AS (
    SELECT customer_id, COUNT(*) FROM invoices GROUP BY customer_id
)

If you try to write the main SELECT, you get something like:

WITH ...
SELECT customer_id, ? FROM a

... but because the second column doesn't have a name, you cannot reference it!

(a partial solution would be to use a SELECT * here, but you cannot do that if you append select ![customer_id] to the main query)

How should we change the error to make this more obvious?
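A minimal sketch of the case in the parenthetical, reusing the minimal example from the previous comment (the exclusion is taken from the comment above; the compiler's output for it is not reproduced here). Once the main query excludes customer_id, SELECT * is no longer enough, so the remaining aggregated column would have to be listed by a name it does not have:

```prql
let a = (
  from invoices
  group customer_id (
    aggregate [count]     # no explicit name for the COUNT(*) column
  )
)

from a
select ![customer_id]     # excluding customer_id means listing the other column by name
```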

max-sixty · 2023-04-16T02:05:16Z

> (a partial solution would be to use a SELECT * here, but you cannot do that if you append select ![customer_id] to the main query)

Just from the perspective of the result, without thinking at all about how it's built:

Sure, if someone appends select ![customer_id], then that breaks.
But it is possible to guarantee that from foo on its own never breaks: the compiler can use * until it no longer can.
Possibly related to "select ! should raise an error if it can't exclude" #2292?

aljazerzen · 2023-04-16T19:28:25Z

OK, I must admit that in this specific case it could produce a *.

cottrell · 2023-04-18T09:10:32Z

I can barely remember this one now, but it seems to me that it might be simpler to do away with the convenience of "nameless" or "implicitly named" aggregations, i.e. to require "count = count" rather than "count". I'm not sure whether it was more complicated than that. Being more explicit and having less surface area is usually better in the long run.
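For comparison, a sketch of the explicit form applied to the minimal example. It mirrors the first query in the report, where count = count compiles without error, so by analogy the aggregated column here should get the name count and from a should be able to reference it (not re-verified against a specific compiler version):

```prql
let a = (
  from invoices
  group customer_id (
    aggregate [count = count]   # explicit name for the aggregated column
  )
)

from a
```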

aljazerzen · 2023-04-18T13:25:22Z

Automatic name inference can be tricky. In this case it seems obvious that the inferred name should be count, but we have had different suggestions (which I cannot find now):

Infer the column name from the function name:

from orders
aggregate [count] # column named `count`

but also:

from orders
select [lag 1 created_at] # column named `lag`

Infer the column name from the aggregation function's only argument:

from orders
aggregate [sum total, sum discount_percent]
# columns `total` and `discount_percent`

There is also the option of combining these two rules, but I'm hesitant to choose something that would be too complicated. I don't want something unpredictable that people would come to rely on, only for it to change when the function definition changes.
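To make the difference between the two candidate rules concrete, a hypothetical sketch: the comments give the name each rule above would infer for each column. Neither rule is claimed to be what the compiler implements.

```prql
from orders
aggregate [
  count,        # function-name rule: count; argument rule: no argument, so no name
  sum total,    # function-name rule: sum;   argument rule: total
]
```

Any combination of the two would have to pick one answer per case (for example, falling back to the function name when there is no argument; this fallback is an assumption, not a proposal from the thread), which is the kind of case-by-case behaviour the comment is wary of.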

SQL engines each have their own rules around this, so there is no option to say "let's just do what most SQL engines do".

(this is a bit off topic, let's create a new issue for more discussion)

cottrell added the bug Invalid compiler output or panic label Mar 14, 2023

max-sixty added the priority label Mar 17, 2023

aljazerzen mentioned this issue Apr 14, 2023

select works after from_text but derive fails #2392 (closed)

aljazerzen added compiler and removed bug Invalid compiler output or panic labels Apr 18, 2023
