
bug: inconsistent None placement in sort_array in ascending order #560

@raisadz


When running the following code with sqlframe (DuckDB backend activated):

import sqlframe
sqlframe.activate('duckdb')
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

session = SparkSession.builder.getOrCreate()

data = {"a": [[3, None, 2, 4, None]]}
# Turn the column-oriented dict into a list of row dicts.
rows = [dict(zip(data, values)) for values in zip(*data.values())]
df = session.createDataFrame(rows)

df = df.withColumn("sorted_expr", F.sort_array("a"))
df.show()

Nones are placed last while sorting in ascending order:

+-----------------------+-----------------------+
|           a           |      sorted_expr      |
+-----------------------+-----------------------+
| [3, None, 2, 4, None] | [2, 3, 4, None, None] |
+-----------------------+-----------------------+

whereas plain PySpark, running the same code with the sqlframe activation commented out:

import sqlframe
# sqlframe.activate('duckdb')  # left commented out so that plain PySpark is used
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

session = SparkSession.builder.getOrCreate()

data = {"a": [[3, None, 2, 4, None]]}
# Turn the column-oriented dict into a list of row dicts.
rows = [dict(zip(data, values)) for values in zip(*data.values())]
df = session.createDataFrame(rows)

df = df.withColumn("sorted_expr", F.sort_array("a"))
df.show()

places the NULLs first:

+--------------------+--------------------+
|                   a|         sorted_expr|
+--------------------+--------------------+
|[3, NULL, 2, 4, N...|[NULL, NULL, 2, 3...|
+--------------------+--------------------+

The PySpark documentation also states that "Null elements will be placed at the beginning of the returned array in ascending order" (https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.sort_array.html), so the sqlframe output with DuckDB does not match the documented behavior.
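For reference, the ordering I would expect can be sketched in plain Python. This is only an illustration of the documented PySpark semantics (nulls first when ascending, nulls last when descending), not sqlframe's or Spark's actual implementation; the helper name `spark_like_sort_array` is hypothetical.

```python
from typing import List, Optional, Sequence


def spark_like_sort_array(values: Sequence[Optional[int]], asc: bool = True) -> List[Optional[int]]:
    """Sort a list the way the PySpark docs describe sort_array.

    Hypothetical helper for illustration only: None values go to the
    beginning when ascending and to the end when descending.
    """
    # Sort only the non-null elements; None cannot be compared with int.
    non_null = sorted((v for v in values if v is not None), reverse=not asc)
    # Re-attach as many None values as were present in the input.
    nulls = [None] * (len(values) - len(non_null))
    return nulls + non_null if asc else non_null + nulls


print(spark_like_sort_array([3, None, 2, 4, None]))
# [None, None, 2, 3, 4]
```

With the input from this report, the ascending result matches the PySpark output above, while sqlframe with DuckDB currently returns the nulls at the end.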

Name: sqlframe
Version: 3.43.8

Name: duckdb
Version: 1.3.0
