While running sqlframe:
import sqlframe
sqlframe.activate('duckdb')
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
session = SparkSession.builder.getOrCreate()
data = {"a": [[3, None, 2, 4, None]]}
rows = [{key: value[i] for key, value in data.items()} for i in range(len(data[next(iter(data.keys()))]))]
df = session.createDataFrame(rows)
df = df.withColumn("sorted_expr", F.sort_array("a"))
df.show()

Nones are placed last when sorting in ascending order:
+-----------------------+-----------------------+
| a | sorted_expr |
+-----------------------+-----------------------+
| [3, None, 2, 4, None] | [2, 3, 4, None, None] |
+-----------------------+-----------------------+

while PySpark, using the same code (with the activation line commented out):
import sqlframe
#sqlframe.activate('duckdb')
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
session = SparkSession.builder.getOrCreate()
data = {"a": [[3, None, 2, 4, None]]}
rows = [{key: value[i] for key, value in data.items()} for i in range(len(data[next(iter(data.keys()))]))]
df = session.createDataFrame(rows)
df = df.withColumn("sorted_expr", F.sort_array("a"))
df.show()

outputs NULLs first:
+--------------------+--------------------+
| a| sorted_expr|
+--------------------+--------------------+
|[3, NULL, 2, 4, N...|[NULL, NULL, 2, 3...|
+--------------------+--------------------+

Also, the documentation says "Null elements will be placed at the beginning of the returned array in ascending order" (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.sort_array.html).
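This looks consistent with DuckDB's own list sorting, which by default places NULLs last for ascending order. A minimal sketch against plain DuckDB (assuming the duckdb Python package's duckdb.sql API and the optional order/null-order arguments of list_sort):

import duckdb

# Default list_sort: ascending with NULLs last (matches the sqlframe output above)
print(duckdb.sql("SELECT list_sort([3, NULL, 2, 4, NULL]) AS s").fetchall())
# [([2, 3, 4, None, None],)]

# Spark-like ordering needs the null order spelled out explicitly
print(duckdb.sql("SELECT list_sort([3, NULL, 2, 4, NULL], 'ASC', 'NULLS FIRST') AS s").fetchall())
# [([None, None, 2, 3, 4],)]

If that holds, sqlframe would need to emit the explicit 'NULLS FIRST' form to match PySpark's sort_array semantics on DuckDB.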
Name: sqlframe
Version: 3.43.8
Name: duckdb
Version: 1.3.0
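For completeness, a minimal plain-PySpark sketch (no sqlframe) that checks the documented contract in both directions, per the linked sort_array docs:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([3, None, 2, 4, None],)], ["a"])
df.select(
    F.sort_array("a").alias("asc_sorted"),              # documented: NULLs placed first
    F.sort_array("a", asc=False).alias("desc_sorted"),  # documented: NULLs placed last
).show(truncate=False)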