Commit a878ed4

Merge branch 'main' into dataframe-display-config
2 parents 71c64b9 + 09b929a commit a878ed4

11 files changed

+330
-10
lines changed

Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@

 [package]
 name = "datafusion-python"
-version = "45.2.0"
+version = "46.0.0"
 homepage = "https://datafusion.apache.org/python"
 repository = "https://github.com/apache/datafusion-python"
 authors = ["Apache DataFusion <[email protected]>"]

dev/changelog/46.0.0.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Apache DataFusion Python 46.0.0 Changelog
+
+This release consists of 21 commits from 11 contributors. See credits at the end of this changelog for more information.
+
+**Implemented enhancements:**
+
+- feat: reads using global ctx [#982](https://github.com/apache/datafusion-python/pull/982) (ion-elgreco)
+- feat: Implementation of udf and udaf decorator [#1040](https://github.com/apache/datafusion-python/pull/1040) (CrystalZhou0529)
+- feat: expose regex_count function [#1066](https://github.com/apache/datafusion-python/pull/1066) (nirnayroy)
+- feat: Update DataFusion dependency to 46 [#1079](https://github.com/apache/datafusion-python/pull/1079) (timsaucer)
+
+**Fixed bugs:**
+
+- fix: add to_timestamp_nanos [#1020](https://github.com/apache/datafusion-python/pull/1020) (chenkovsky)
+- fix: type checking [#993](https://github.com/apache/datafusion-python/pull/993) (chenkovsky)
+
+**Other:**
+
+- [infra] Fail Clippy on rust build warnings [#1029](https://github.com/apache/datafusion-python/pull/1029) (kevinjqliu)
+- Add user documentation for the FFI approach [#1031](https://github.com/apache/datafusion-python/pull/1031) (timsaucer)
+- build(deps): bump arrow from 54.1.0 to 54.2.0 [#1035](https://github.com/apache/datafusion-python/pull/1035) (dependabot[bot])
+- Chore: Release datafusion-python 45 [#1024](https://github.com/apache/datafusion-python/pull/1024) (timsaucer)
+- Enable Dataframe to be converted into views which can be used in register_table [#1016](https://github.com/apache/datafusion-python/pull/1016) (kosiew)
+- Add ruff check for missing futures import [#1052](https://github.com/apache/datafusion-python/pull/1052) (timsaucer)
+- Enable take comments to assign issues to users [#1058](https://github.com/apache/datafusion-python/pull/1058) (timsaucer)
+- Update python min version to 3.9 [#1043](https://github.com/apache/datafusion-python/pull/1043) (kevinjqliu)
+- feat/improve ruff test coverage [#1055](https://github.com/apache/datafusion-python/pull/1055) (timsaucer)
+- feat/making global context accessible for users [#1060](https://github.com/apache/datafusion-python/pull/1060) (jsai28)
+- Renaming Internal Structs [#1059](https://github.com/apache/datafusion-python/pull/1059) (Spaarsh)
+- test: add pytest asyncio tests [#1063](https://github.com/apache/datafusion-python/pull/1063) (jsai28)
+- Add decorator for udwf [#1061](https://github.com/apache/datafusion-python/pull/1061) (kosiew)
+- Add additional ruff suggestions [#1062](https://github.com/apache/datafusion-python/pull/1062) (Spaarsh)
+- Improve collection during repr and repr_html [#1036](https://github.com/apache/datafusion-python/pull/1036) (timsaucer)
+
+## Credits
+
+Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
+
+```
+     7  Tim Saucer
+     2  Kevin Liu
+     2  Spaarsh
+     2  jsai28
+     2  kosiew
+     1  Chen Chongchen
+     1  Chongchen Chen
+     1  Crystal Zhou
+     1  Ion Koutsouris
+     1  Nirnay Roy
+     1  dependabot[bot]
+```
+
+Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.
+

docs/source/user-guide/basics.rst

Lines changed: 2 additions & 2 deletions
@@ -20,8 +20,8 @@
 Concepts
 ========

-In this section, we will cover a basic example to introduce a few key concepts. We will use the same
-source file as described in the :ref:`Introduction <guide>`, the Pokemon data set.
+In this section, we will cover a basic example to introduce a few key concepts. We will use the
+2021 Yellow Taxi Trip Records ([download](https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-01.parquet)), from the [TLC Trip Record Data](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page).

 .. ipython:: python


python/datafusion/__init__.py

Lines changed: 4 additions & 2 deletions
@@ -26,8 +26,9 @@
 except ImportError:
     import importlib_metadata

-# Local module imports
-from . import functions, object_store, substrait
+from . import functions, object_store, substrait, unparser
+
+# The following imports are okay to remain as opaque to the user.
 from ._internal import Config
 from .catalog import Catalog, Database, Table
 from .common import DFSchema
@@ -85,6 +86,7 @@
     "udaf",
     "udf",
     "udwf",
+    "unparser",
 ]



python/datafusion/unparser.py

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""This module provides support for unparsing datafusion plans to SQL.
+
+For additional information about unparsing, see https://docs.rs/datafusion-sql/latest/datafusion_sql/unparser/index.html
+"""
+
+from ._internal import unparser as unparser_internal
+from .plan import LogicalPlan
+
+
+class Dialect:
+    """DataFusion unparser dialect."""
+
+    def __init__(self, dialect: unparser_internal.Dialect) -> None:
+        """This constructor is not typically called by the end user."""
+        self.dialect = dialect
+
+    @staticmethod
+    def default() -> "Dialect":
+        """Create a new default dialect."""
+        return Dialect(unparser_internal.Dialect.default())
+
+    @staticmethod
+    def mysql() -> "Dialect":
+        """Create a new MySQL dialect."""
+        return Dialect(unparser_internal.Dialect.mysql())
+
+    @staticmethod
+    def postgres() -> "Dialect":
+        """Create a new PostgreSQL dialect."""
+        return Dialect(unparser_internal.Dialect.postgres())
+
+    @staticmethod
+    def sqlite() -> "Dialect":
+        """Create a new SQLite dialect."""
+        return Dialect(unparser_internal.Dialect.sqlite())
+
+    @staticmethod
+    def duckdb() -> "Dialect":
+        """Create a new DuckDB dialect."""
+        return Dialect(unparser_internal.Dialect.duckdb())
+
+
+class Unparser:
+    """DataFusion unparser."""
+
+    def __init__(self, dialect: Dialect) -> None:
+        """This constructor is not typically called by the end user."""
+        self.unparser = unparser_internal.Unparser(dialect.dialect)
+
+    def plan_to_sql(self, plan: LogicalPlan) -> str:
+        """Convert a logical plan to a SQL string."""
+        return self.unparser.plan_to_sql(plan._raw_plan)
+
+    def with_pretty(self, pretty: bool) -> "Unparser":
+        """Set the pretty flag."""
+        self.unparser = self.unparser.with_pretty(pretty)
+        return self
+
+
+__all__ = [
+    "Dialect",
+    "Unparser",
+]
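
Reviewer note: the sketch below shows how the new `datafusion.unparser` module might be used from Python. It relies only on names that appear in this commit (`SessionContext`, `Dialect`, `Unparser`, `plan_to_sql`, `with_pretty`, `logical_plan`). The helper name `unparse_for_dialects` and its `pretty` parameter are hypothetical, and the sketch assumes a datafusion build that already includes this module.

```python
def unparse_for_dialects(query: str, pretty: bool = False) -> dict[str, str]:
    """Plan `query` and render its logical plan as SQL for several dialects.

    Hypothetical helper for illustration; it is not part of the commit.
    """
    # Imported lazily so this sketch can be loaded even where a datafusion
    # build containing the unparser module is not installed.
    from datafusion import SessionContext
    from datafusion.unparser import Dialect, Unparser

    ctx = SessionContext()
    df = ctx.sql(query)

    # The same four dialects exercised by test_unparser.py in this commit.
    dialects = {
        "mysql": Dialect.mysql(),
        "postgres": Dialect.postgres(),
        "sqlite": Dialect.sqlite(),
        "duckdb": Dialect.duckdb(),
    }

    rendered = {}
    for name, dialect in dialects.items():
        unparser = Unparser(dialect)
        if pretty:
            # with_pretty returns the same wrapper object, so it can be chained.
            unparser = unparser.with_pretty(True)
        rendered[name] = unparser.plan_to_sql(df.logical_plan())
    return rendered
```

Calling `unparse_for_dialects("SELECT 1")` should map each dialect name to `"SELECT 1"`, which is the output that `test_unparser.py` asserts for these four dialects. The output with `pretty=True` is not covered by the commit's tests, so no expectation is stated for it here.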

python/tests/test_dataframe.py

Lines changed: 2 additions & 4 deletions
@@ -1380,8 +1380,7 @@ def test_display_config_affects_repr(data):
     # The representation should show truncated data (3 rows as specified)
     assert (
         # 5 = 1 header row + 3 separator line + 1 truncation message
-        repr_str.count("\n")
-        <= max_table_rows_in_repr + 5
+        repr_str.count("\n") <= max_table_rows_in_repr + 5
     )
     assert "Data truncated" in repr_str

@@ -1397,8 +1396,7 @@ def test_display_config_affects_repr(data):
     # Should show all data without truncation message
     assert (
         # 4 = 1 header row + 3 separator lines
-        repr_str2.count("\n")
-        == max_table_rows_in_repr + 4
+        repr_str2.count("\n") == max_table_rows_in_repr + 4
     )  # All rows should be shown
     assert "Data truncated" not in repr_str2
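
Reviewer note: the newline arithmetic in these assertions can be illustrated with a small stand-in renderer. This is a sketch under stated assumptions, not DataFusion's actual `repr` code. The function `render_table`, the column name, and the exact table layout are hypothetical. The sketch assumes one header row, three separator lines, an optional truncation message, and a newline at the end of every line.

```python
def render_table(rows, max_rows):
    """Render `rows` as an ASCII table, showing at most `max_rows` data rows.

    Hypothetical layout: separator, header, separator, data rows, separator,
    then a truncation message if any rows were dropped.
    """
    separator = "+-----+"
    lines = [separator, "|  a  |", separator]
    lines += [f"| {value:>3} |" for value in rows[:max_rows]]
    lines.append(separator)
    if len(rows) > max_rows:
        lines.append("Data truncated.")
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


max_table_rows_in_repr = 3

# More rows than the limit: the truncation message adds one more line.
truncated = render_table([1, 2, 3, 4, 5], max_table_rows_in_repr)
# 5 = 1 header row + 3 separator lines + 1 truncation message
assert truncated.count("\n") == max_table_rows_in_repr + 5

# Exactly as many rows as the limit: no truncation message.
complete = render_table([1, 2, 3], max_table_rows_in_repr)
# 4 = 1 header row + 3 separator lines
assert complete.count("\n") == max_table_rows_in_repr + 4
```

Under these assumptions the truncated case hits the upper bound exactly. The test above uses `<=` for that case, which also tolerates output that shows fewer rows than the limit.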

python/tests/test_unparser.py

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from datafusion.context import SessionContext
+from datafusion.unparser import Dialect, Unparser
+
+
+def test_unparser():
+    ctx = SessionContext()
+    df = ctx.sql("SELECT 1")
+    for dialect in [
+        Dialect.mysql(),
+        Dialect.postgres(),
+        Dialect.sqlite(),
+        Dialect.duckdb(),
+    ]:
+        unparser = Unparser(dialect)
+        sql = unparser.plan_to_sql(df.logical_plan())
+        assert sql == "SELECT 1"

src/lib.rs

Lines changed: 5 additions & 0 deletions
@@ -52,6 +52,7 @@ pub mod pyarrow_util;
 mod record_batch;
 pub mod sql;
 pub mod store;
+pub mod unparser;

 #[cfg(feature = "substrait")]
 pub mod substrait;
@@ -104,6 +105,10 @@ fn _internal(py: Python, m: Bound<'_, PyModule>) -> PyResult<()> {
     expr::init_module(&expr)?;
     m.add_submodule(&expr)?;

+    let unparser = PyModule::new(py, "unparser")?;
+    unparser::init_module(&unparser)?;
+    m.add_submodule(&unparser)?;
+
     // Register the functions as a submodule
     let funcs = PyModule::new(py, "functions")?;
     functions::init_module(&funcs)?;

src/unparser/dialect.rs

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use std::sync::Arc;
+
+use datafusion::sql::unparser::dialect::{
+    DefaultDialect, Dialect, DuckDBDialect, MySqlDialect, PostgreSqlDialect, SqliteDialect,
+};
+use pyo3::prelude::*;
+
+#[pyclass(name = "Dialect", module = "datafusion.unparser", subclass)]
+#[derive(Clone)]
+pub struct PyDialect {
+    pub dialect: Arc<dyn Dialect>,
+}
+
+#[pymethods]
+impl PyDialect {
+    #[staticmethod]
+    pub fn default() -> Self {
+        Self {
+            dialect: Arc::new(DefaultDialect {}),
+        }
+    }
+    #[staticmethod]
+    pub fn postgres() -> Self {
+        Self {
+            dialect: Arc::new(PostgreSqlDialect {}),
+        }
+    }
+    #[staticmethod]
+    pub fn mysql() -> Self {
+        Self {
+            dialect: Arc::new(MySqlDialect {}),
+        }
+    }
+    #[staticmethod]
+    pub fn sqlite() -> Self {
+        Self {
+            dialect: Arc::new(SqliteDialect {}),
+        }
+    }
+    #[staticmethod]
+    pub fn duckdb() -> Self {
+        Self {
+            dialect: Arc::new(DuckDBDialect::new()),
+        }
+    }
+}
