Describe the bug
UNION ALL combined with an aggregate function in a recursive CTE results in an infinite loop.
To Reproduce
Steps to reproduce the behavior:
import datafusion
from datafusion import SessionContext

ctx = SessionContext()
df = ctx.sql("""
with recursive t as (
    select value i, 1 lv from generate_series(1, 6, 1)
    union all
    select max(i), max(lv) + 1 from t where lv < 2
)
select * from t where lv = 2;
""")
print(df)
The same UNION ALL query also never returns in DuckDB:
❯ duckdb
v1.1.3 19864453f7
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D WITH RECURSIVE t(i, lv) AS (
SELECT generate_series AS i, 1 AS lv
FROM generate_series(1, 6, 1)
UNION ALL
SELECT MAX(i), MAX(lv) + 1
FROM t
WHERE lv < 2
)
SELECT *
FROM t
WHERE lv = 2;
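I think this isn't a bug in DataFusion-Python so much as a consequence of recursive-CTE semantics when UNION ALL is combined with an aggregate in the recursive member. An aggregate without GROUP BY always returns exactly one row, even when its input is empty. The first iteration produces (6, 2). After that, the filter lv < 2 removes every row, yet MAX over the empty input still emits (NULL, NULL), and because NULL < 2 is never true, that same row is re-emitted on every subsequent iteration. UNION ALL does not remove duplicates, so the working table never becomes empty and the CTE never reaches a fixpoint. Any engine that accepts this query and evaluates it iteratively should behave the same way; DuckDB likewise spins forever. (Some engines, such as PostgreSQL, reject aggregate functions in the recursive term outright.)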
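The loop can be reproduced without any database. The sketch below is a hypothetical model, not DataFusion or DuckDB code. It assumes the usual working-table evaluation of a recursive CTE and uses `None` for SQL NULL. The helper names (`sql_max`, `recursive_member`, `union_all_trace`) and the `max_iterations` safety cap are made up for illustration.

```python
from typing import Optional

# Hypothetical model: a row is (i, lv), and None stands in for SQL NULL.
Row = tuple[Optional[int], Optional[int]]


def sql_max(values: list[Optional[int]]) -> Optional[int]:
    """SQL MAX: ignores NULLs and returns NULL (None) for empty input."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def recursive_member(working: list[Row]) -> list[Row]:
    """Models: SELECT max(i), max(lv) + 1 FROM t WHERE lv < 2 (no GROUP BY)."""
    # NULL < 2 is unknown in SQL, so rows with lv = NULL are filtered out.
    filtered = [r for r in working if r[1] is not None and r[1] < 2]
    max_i = sql_max([r[0] for r in filtered])
    max_lv = sql_max([r[1] for r in filtered])
    # An aggregate without GROUP BY yields exactly one row, even if `filtered` is empty.
    return [(max_i, None if max_lv is None else max_lv + 1)]


def union_all_trace(max_iterations: int) -> list[list[Row]]:
    """Evaluates the UNION ALL CTE; max_iterations is an artificial safety cap."""
    working: list[Row] = [(v, 1) for v in range(1, 7)]  # anchor: generate_series(1, 6, 1)
    trace = [working]
    for _ in range(max_iterations):
        working = recursive_member(working)
        if not working:  # real termination condition: an empty working table
            break
        trace.append(working)  # UNION ALL keeps every row, duplicates included
    return trace


for step, rows in enumerate(union_all_trace(max_iterations=4)):
    print(step, rows)
```

Running it shows iteration 1 producing `(6, 2)` and every later iteration producing the same `(None, None)` row. Without the cap, the loop would never end.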
Changing UNION ALL to UNION makes the query complete in DuckDB:
❯ duckdb
v1.1.3 19864453f7
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D WITH RECURSIVE t(i, lv) AS (
SELECT generate_series AS i, 1 AS lv
FROM generate_series(1, 6, 1)
UNION
SELECT MAX(i), MAX(lv) + 1
FROM t
WHERE lv < 2
)
SELECT *
FROM t
WHERE lv = 2;
┌───────┬───────┐
│   i   │  lv   │
│ int64 │ int32 │
├───────┼───────┤
│     6 │     2 │
└───────┴───────┘
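A similar hypothetical model shows why UNION terminates. It assumes UNION discards any row already present in the result, treating NULLs as equal, as SQL duplicate elimination does. Under that assumption, the second `(NULL, NULL)` row is dropped, so that iteration adds nothing and the working table becomes empty. The helper names below are illustrative only, not DuckDB internals.

```python
from typing import Optional

# Hypothetical model: a row is (i, lv), and None stands in for SQL NULL.
Row = tuple[Optional[int], Optional[int]]


def sql_max(values: list[Optional[int]]) -> Optional[int]:
    """SQL MAX: ignores NULLs and returns NULL (None) for empty input."""
    non_null = [v for v in values if v is not None]
    return max(non_null) if non_null else None


def recursive_member(working: list[Row]) -> list[Row]:
    """Models: SELECT max(i), max(lv) + 1 FROM t WHERE lv < 2 (no GROUP BY)."""
    filtered = [r for r in working if r[1] is not None and r[1] < 2]
    max_i = sql_max([r[0] for r in filtered])
    max_lv = sql_max([r[1] for r in filtered])
    return [(max_i, None if max_lv is None else max_lv + 1)]


def union_cte() -> list[Row]:
    """Evaluates the CTE with UNION: a row is kept only if it was not seen before."""
    result: list[Row] = []
    seen: set[Row] = set()  # None == None in Python, like NULLs in duplicate elimination
    for row in [(v, 1) for v in range(1, 7)]:  # anchor member
        if row not in seen:
            seen.add(row)
            result.append(row)
    working = list(result)
    while working:  # stops once an iteration contributes no new rows
        new_rows: list[Row] = []
        for row in recursive_member(working):
            if row not in seen:
                seen.add(row)
                new_rows.append(row)
        result.extend(new_rows)
        working = new_rows
    return result


rows = union_cte()
print([r for r in rows if r[1] == 2])  # models the outer WHERE lv = 2
```

Filtering on `lv == 2` gives `[(6, 2)]`, which matches the DuckDB output above. The `(None, None)` row is still in the CTE result, but the outer filter removes it.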
But the same query is not yet supported in DataFusion:
>>> sql = """
... with recursive t as(select value i,1 lv from generate_series(1,6,1)
... union
... select max(i),max(lv)+1 from t where lv<2)
... select * from t where lv=2;
... """
>>> df = ctx.sql(sql)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/kosiew/GitHub/datafusion-python/python/datafusion/context.py", line 589, in sql
return DataFrame(self.ctx.sql(query))
^^^^^^^^^^^^^^^^^^^
Exception: DataFusion error: NotImplemented("Recursive queries with a distinct 'UNION' (in which the previous iteration's results will be de-duplicated) is not supported")
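Until distinct UNION is supported, one possible workaround is to keep UNION ALL and stop the recursive member from emitting its lone all-NULL row. For example, you could add `HAVING count(*) > 0` to the recursive SELECT, so that an empty input produces no row and the working table empties. This is an untested assumption: whether DataFusion accepts HAVING in the recursive term has not been checked here. The sketch below models only the intended semantics in plain Python, with hypothetical helper names.

```python
from typing import Optional

# Hypothetical model: a row is (i, lv), and None stands in for SQL NULL.
Row = tuple[Optional[int], Optional[int]]


def guarded_member(working: list[Row]) -> list[Row]:
    """Models (assumed rewrite, unverified in DataFusion):
    SELECT max(i), max(lv) + 1 FROM t WHERE lv < 2 HAVING count(*) > 0
    """
    filtered = [r for r in working if r[1] is not None and r[1] < 2]
    if not filtered:  # HAVING count(*) > 0 drops the single aggregate row
        return []
    max_i = max(r[0] for r in filtered)  # anchor rows have non-NULL i
    max_lv = max(r[1] for r in filtered)  # non-empty and NULL-free after the filter
    return [(max_i, max_lv + 1)]


def union_all_cte() -> list[Row]:
    """UNION ALL evaluation: no de-duplication, stop when an iteration is empty."""
    working: list[Row] = [(v, 1) for v in range(1, 7)]  # anchor: generate_series(1, 6, 1)
    result = list(working)
    while working:
        working = guarded_member(working)
        result.extend(working)
    return result


rows = union_all_cte()
print([r for r in rows if r[1] == 2])  # models the outer WHERE lv = 2
```

In this model, the second iteration's filter leaves no rows, the guarded aggregate emits nothing, and the recursion stops with the same `lv = 2` result as the UNION version.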
Expected behavior
it outputs