@@ -23,17 +23,100 @@ DataFusion also offers a SQL API, read the full reference `here <https://arrow.a
23 23 .. ipython:: python
24 24
25 25     import datafusion
26-     from datafusion import col
27-     import pyarrow
26+     from datafusion import DataFrame, SessionContext
28 27
29 28     # create a context
30 29     ctx = datafusion.SessionContext()
31 30
32 31     # register a CSV
33-     ctx.register_csv('pokemon', 'pokemon.csv')
32+     ctx.register_csv("pokemon", "pokemon.csv")
34 33
35 34     # create a new statement via SQL
36 35     df = ctx.sql('SELECT "Attack"+"Defense", "Attack"-"Defense" FROM pokemon')
37 36
38 37     # collect and convert to pandas DataFrame
39-     df.to_pandas()
38+     df.to_pandas()
39+
40+ Parameterized queries
41+ ---------------------
42+
43+ In DataFusion-Python 51.0.0 we introduced the ability to pass parameters
44+ into a SQL query. These are similar in concept to
45+ `prepared statements <https://datafusion.apache.org/user-guide/sql/prepared_statements.html>`_,
46+ but they allow passing named parameters into a SQL query. Consider this simple
47+ example.
48+
49+ .. ipython:: python
50+
51+     def show_attacks(ctx: SessionContext, threshold: int) -> None:
52+         ctx.sql(
53+             'SELECT "Name", "Attack" FROM pokemon WHERE "Attack" > $val', val=threshold
54+         ).show(num=5)
55+     show_attacks(ctx, 75)
56+
57+ When parameters are passed as in the example above, we convert the Python objects
58+ into their string representations. We also have special-case handling
59+ for :py:class:`~datafusion.dataframe.DataFrame` objects, since they cannot simply
60+ be turned into string representations for a SQL query. In these cases we
61+ register a temporary view in the :py:class:`~datafusion.context.SessionContext`
62+ using a generated table name.
63+
64+ To mark a value for string replacement, precede the
65+ variable name with a single ``$``. This works for all dialects in
66+ the SQL parser except ``hive`` and ``mysql``. Since these dialects do not
67+ support named placeholders, we are unable to do this type of replacement.
68+ For them, we recommend either switching to another dialect or using Python
69+ f-string style replacement.
70+
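The ``$name`` replacement described above can be pictured with the standard library alone. The sketch below is only an illustration of string-based substitution, not DataFusion's implementation: ``render_query`` is a hypothetical helper, and ``string.Template`` is used because it happens to share the ``$name`` syntax.

```python
from string import Template


def render_query(sql: str, **params: object) -> str:
    """Replace each ``$name`` placeholder with ``str(value)``.

    Hypothetical helper for illustration; not part of the datafusion package.
    """
    return Template(sql).substitute({name: str(value) for name, value in params.items()})


query = 'SELECT "Name", "Attack" FROM pokemon WHERE "Attack" > $val'
rendered = render_query(query, val=75)
print(rendered)

# The f-string fallback suggested for the hive and mysql dialects
# builds the same text directly.
threshold = 75
fallback = f'SELECT "Name", "Attack" FROM pokemon WHERE "Attack" > {threshold}'
print(fallback)
```

Neither form in this sketch quotes or escapes the value it inserts, so both are only suitable for trusted input.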
71+ .. warning::
72+
73+     To support DataFrame parameterized queries, your session must support
74+     registration of temporary views. The default
75+     :py:class:`~datafusion.catalog.CatalogProvider` and
76+     :py:class:`~datafusion.catalog.SchemaProvider` have this capability.
77+     If you have implemented custom providers, it is important that temporary
78+     views do not persist across :py:class:`~datafusion.context.SessionContext`
79+     instances, or you may get unintended consequences.
80+
81+ The following example passes both a :py:class:`~datafusion.dataframe.DataFrame`
82+ object and a Python object to be used in parameterized replacement.
83+
84+ .. ipython:: python
85+
86+     def show_column(
87+         ctx: SessionContext, column: str, df: DataFrame, threshold: int
88+     ) -> None:
89+         ctx.sql(
90+             'SELECT "Name", $col FROM $df WHERE $col > $val',
91+             col=column,
92+             df=df,
93+             val=threshold,
94+         ).show(num=5)
95+     df = ctx.table("pokemon")
96+     show_column(ctx, '"Defense"', df, 75)
97+
98+ The approach used to convert variables into a SQL query
99+ relies on string conversion. This has the potential for data loss,
100+ particularly for floating-point numbers. If you need to pass
101+ variables into a parameterized query and it is important to maintain the
102+ original value without conversion to a string, you can use the
103+ optional parameter ``param_values`` to specify them. This parameter
104+ expects a dictionary mapping each parameter name to a Python
105+ object. Those objects will be cast into a
106+ `PyArrow Scalar Value <https://arrow.apache.org/docs/python/generated/pyarrow.Scalar.html>`_.
107+
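One way to see why a value's string form can lose information is sketched below with the standard library only. It is an illustration under the assumption that only the rendered text reaches the SQL parser; it does not describe how DataFusion itself parses numeric literals.

```python
from decimal import Decimal

as_float = 0.1               # binary floating point
as_decimal = Decimal("0.1")  # exact decimal value

# Both values render as the same text, so a query built from either is identical
# and the original type can no longer be recovered from the string.
assert str(as_float) == str(as_decimal) == "0.1"

# They are nevertheless different values: the float is only the nearest
# binary approximation of one tenth.
exact_float = Decimal(as_float)
print(exact_float)
print(exact_float == as_decimal)  # prints False
```

Passing the object itself, rather than its text, avoids this ambiguity because the value keeps its type.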
108+ Using ``param_values`` relies on the SQL dialect you have configured
109+ for your session. This can be set using the :ref:`configuration options <configuration>`
110+ of your :py:class:`~datafusion.context.SessionContext`. Similar to how
111+ `prepared statements <https://datafusion.apache.org/user-guide/sql/prepared_statements.html>`_
112+ work, these parameters are limited to places where you would pass in a
113+ scalar value, such as a comparison.
114+
115+ .. ipython:: python
116+
117+     def param_attacks(ctx: SessionContext, threshold: int) -> None:
118+         ctx.sql(
119+             'SELECT "Name", "Attack" FROM pokemon WHERE "Attack" > $val',
120+             param_values={"val": threshold},
121+         ).show(num=5)
122+     param_attacks(ctx, 75)