SQL samples do not produce idempotent aggregate operations #543

arejula27 · 2025-02-19T10:57:59Z

What went wrong?

I ran the same SQL aggregate query with sampling several times and got different results.

How to reproduce?

Follow the tutorial in the documentation.

1. Code that triggered the bug, or steps to reproduce:


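```scala
// avg() is not imported by default in spark-shell
import org.apache.spark.sql.functions.avg

// Load the sample e-commerce CSV
val ecommerce = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("ecommerce100K_2019_Oct.csv")

// Write it as a Qbeast table indexed on user_id and product_id
val qbeastTablePath = "table/qbeast-test-data/qtable"
ecommerce.write.mode("overwrite").format("qbeast").option("columnsToIndex", "user_id,product_id").option("cubeSize", "500").save(qbeastTablePath)

// DataFrame API: average price over a 10% sample of the Qbeast table
val qbeastDf = spark.read.format("qbeast").load(qbeastTablePath)
qbeastDf.sample(0.1).agg(avg("price")).show()

// SQL: average price over a 10% TABLESAMPLE
// (the view is registered on the CSV-backed `ecommerce` DataFrame, not on `qbeastDf`)
ecommerce.createOrReplaceTempView("ecommerce_october")
spark.sql("SELECT avg(price) FROM ecommerce_october TABLESAMPLE(10 PERCENT)").show()
```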
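One possible explanation (an assumption, not confirmed in this issue) is that the sample is drawn with no fixed seed, so every run aggregates a different subset of rows. The plain-Scala sketch below models that effect with a hypothetical Bernoulli sampler over an in-memory price list; `SampleAvgSketch`, `bernoulliSample` and `sampledAvg` are illustrative names, not Spark or Qbeast APIs. A fresh RNG per run yields varying averages, while reusing the same seed yields the same average every run, in the same spirit as Spark's `Dataset.sample(fraction, seed)` overload.

```scala
import scala.util.Random

// Hypothetical model of row sampling; NOT Spark or Qbeast code.
object SampleAvgSketch {
  // Keep each row independently with probability `fraction`, driven by `rng`.
  def bernoulliSample(prices: Seq[Double], fraction: Double, rng: Random): Seq[Double] =
    prices.filter(_ => rng.nextDouble() < fraction)

  // Average of the sampled rows; None if the sample happens to be empty.
  def sampledAvg(prices: Seq[Double], fraction: Double, rng: Random): Option[Double] = {
    val sample = bernoulliSample(prices, fraction, rng)
    if (sample.isEmpty) None else Some(sample.sum / sample.size)
  }

  def main(args: Array[String]): Unit = {
    // Synthetic prices standing in for the `price` column
    val prices = (1 to 100000).map(i => (i % 1000).toDouble)

    // Unseeded: a new randomly seeded RNG per run (results will usually differ)
    val unseeded = (1 to 3).map(_ => sampledAvg(prices, 0.1, new Random()))

    // Seeded: the same seed per run, so the same rows are kept every time
    val seeded = (1 to 3).map(_ => sampledAvg(prices, 0.1, new Random(42L)))

    println(s"unseeded: $unseeded")
    println(s"seeded:   $seeded")
  }
}
```

Whether the Qbeast source or Spark's `TABLESAMPLE` actually behaves like the unseeded case here would need to be checked against their implementations.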

2. Branch and commit id:

--packages io.qbeast:qbeast-spark_2.12:0.6.0

3. Spark version:

res0: String = 3.5.0

4. Hadoop version:

res1: String = 3.3.4

5. How are you running Spark?

Locally, inside a Nix shell

6. Stack trace:

No error output
