Skip to content

Wrong query results for filters that involve partition columns and data file columns and pushdown_filters is enabled #15912

Closed
@adriangb

Description

@adriangb

Describe the bug

Filters such as partition_col = col_from_file are never applied if datafusion.execution.parquet.pushdown_filters = true

To Reproduce

With datafusion-cli:

COPY  (
  SELECT arrow_cast('a', 'Utf8') AS val
)  TO 'test_files/scratch/test/part=a/123.parquet'
STORED AS PARQUET;
COPY  (
  SELECT arrow_cast('b', 'Utf8') AS val
)  TO 'test_files/scratch/test/part=b/123.parquet'
STORED AS PARQUET;
COPY  (
  SELECT arrow_cast('xyz', 'Utf8') AS val
)  TO 'test_files/scratch/test/part=c/123.parquet'
STORED AS PARQUET;

set datafusion.execution.parquet.pushdown_filters = true;

CREATE EXTERNAL TABLE test(part text, val text)
STORED AS PARQUET
PARTITIONED BY (part)
LOCATION 'test_files/scratch/test/';

SELECT * FROM test;

explain analyze
select * from test
where part != val;
> select * from test
where part != val;
+-----+------+
| val | part |
+-----+------+
| a   | a    |
| xyz | c    |
| b   | b    |
+-----+------+
3 row(s) fetched. 

Which is clearly wrong.

Expected behavior

> select * from test
where part != val;
+-----+------+
| val | part |
+-----+------+
| xyz | c    |
+-----+------+

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions