Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Queries with COUNT(*) OVER() aggregations return an error #228

Open
jeromefroe opened this issue Feb 10, 2025 · 1 comment
Open

Queries with COUNT(*) OVER() aggregations return an error #228

jeromefroe opened this issue Feb 10, 2025 · 1 comment

Comments

@jeromefroe
Copy link

Hi, I've noticed that queries with COUNT(*) OVER() aggregations return an error: mismatch rowid 1 != 2. An example of such a query is:

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'banana', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable')
SELECT item, purchases, COUNT(*) OVER () AS total_items
FROM Produce

If I run the same query in BigQuery it returns the following results:

item purchases total_items
kale 23 6
banana 2 6
cabbage 9 6
apple 8 6
leek 2 6
lettuce 10 6

I've noticed that the zetasql library transforms this query into the following:

WITH Produce AS (
    SELECT 
        `item#1` AS `item#19`, 
        `purchases#2` AS `purchases#20`, 
        `category#3` AS `category#21`
    FROM (
        ...
    )
)

SELECT 
    `item#22` AS `item`, 
    `purchases#23` AS `purchases`, 
    `category#24` AS `category`, 
    `total_items#26` AS `total_items`
FROM (
    SELECT 
        `item#22`, 
        `purchases#23`, 
        `category#24`, 
        (
            SELECT 
                zetasqlite_window_count_star(
                    zetasqlite_ignore_nulls(), 
                    zetasqlite_window_frame_unit(1), 
                    zetasqlite_window_boundary_start(1, 0), 
                    zetasqlite_window_boundary_end(5, 0), 
                    zetasqlite_window_rowid(`row_id`)
                )
            FROM (
                SELECT 
                    `item#19` AS `item#22`, 
                    `purchases#20` AS `purchases#23`, 
                    `category#21` AS `category#24`
                FROM `Produce`
            )
        ) AS `total_items#26`
    FROM (
        SELECT 
            *, 
            ROW_NUMBER() OVER() AS `row_id`
        FROM (
            SELECT 
                `item#19` AS `item#22`, 
                `purchases#20` AS `purchases#23`, 
                `category#21` AS `category#24`
            FROM `Produce`
        )
    )
);

I think the issue is that SQLite only callszetasqlite_window_count_star once for each row and produces a single row, whereas the desired behavior would be to create the same number of rows as the input data has. This is the behavior if the COUNT(*) OVER() aggregation is replaced with SUM(purchases) OVER(). I modified the query slightly to test the different behavior between BigQuery and SQLite:

WITH Produce AS
 (SELECT 'kale' as item, 23 as purchases, 'vegetable' as category
  UNION ALL SELECT 'banana', 2, 'fruit'
  UNION ALL SELECT 'cabbage', 9, 'vegetable'
  UNION ALL SELECT 'apple', 8, 'fruit'
  UNION ALL SELECT 'leek', 2, 'vegetable'
  UNION ALL SELECT 'lettuce', 10, 'vegetable')

SELECT 
    `item#22` AS `item`, 
    `purchases#23` AS `purchases`, 
    `category#24` AS `category`, 
    `total_items#26` AS `total_items`
FROM (
    SELECT 
        `item#22`, 
        `purchases#23`, 
        `category#24`, 
        (
            SELECT 
                SUM(`row_id`)
            FROM (
                SELECT 
                    `item` AS `item#22`,
                    `purchases` AS `purchases#23`,
                    `category` AS `category#24`
                FROM `Produce`
            )
        ) AS `total_items#26`
    FROM (
        SELECT 
            *, 
            ROW_NUMBER() OVER() AS `row_id`
        FROM (
            SELECT 
                `item` AS `item#22`,
                `purchases` AS `purchases#23`,
                `category` AS `category#24`
            FROM `Produce`
        )
    )
);

On BigQuery this query returns 6 results, but SQLite only returns a single result.

@ohaibbq
Copy link
Contributor

ohaibbq commented Feb 10, 2025

@jeromefroe - I tested this in our fork (Recidiviz/bigquery-emulator) with #169 merged and it works as expected.

@goccy I've got little hope that my upstream feature in mattn/go-sqlite will be merged as its already been a year. Do you think you could comment on the issue expressing your support for it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants