SQL samples do not produce idempotent aggregate operations #543

Open
arejula27 opened this issue Feb 19, 2025 · 0 comments

@arejula27

What went wrong?

I ran the same SQL aggregate query with sampling several times and got different results across runs.
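A plausible explanation, which is an assumption and not confirmed in this issue, is that each run draws its sample with a fresh random seed. The Spark-free sketch below uses only the Scala standard library; `SampledAvgDemo`, `prices`, and `sampledAvg` are hypothetical names and the data is synthetic. It shows that an unseeded Bernoulli sample of identical data yields varying averages, while a fixed seed makes the result repeatable. It illustrates the general effect only and does not reproduce Spark's or Qbeast's actual sampler.

```scala
import scala.util.Random

object SampledAvgDemo {
  // Hypothetical synthetic "price" column: 100,000 deterministic values.
  val prices: Vector[Double] = Vector.tabulate(100000)(i => (i % 997) * 1.5)

  // Bernoulli sampling: keep each row independently with probability `fraction`,
  // then average the kept rows. `rng` alone decides which rows survive.
  def sampledAvg(data: Vector[Double], fraction: Double, rng: Random): Double = {
    val kept = data.filter(_ => rng.nextDouble() < fraction)
    kept.sum / kept.size
  }

  def main(args: Array[String]): Unit = {
    // Unseeded: each run draws a different sample, so the averages usually differ.
    val unseeded = (1 to 5).map(_ => sampledAvg(prices, 0.1, new Random()))
    // Seeded: the same seed selects the same rows, so every run agrees.
    val seeded = (1 to 5).map(_ => sampledAvg(prices, 0.1, new Random(42L)))
    println(s"unseeded distinct results: ${unseeded.distinct.size}")
    println(s"seeded distinct results:   ${seeded.distinct.size}")
  }
}
```

For Spark itself, `Dataset.sample(fraction, seed)` accepts an explicit seed, which is the DataFrame-side analogue of the seeded case above.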

How to reproduce?

Follow the tutorial in the project documentation.

1. Code that triggered the bug, or steps to reproduce:


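```scala
import org.apache.spark.sql.functions.avg

// Load the sample e-commerce CSV.
val ecommerce = (spark.read.format("csv")
  .option("header", "true").option("inferSchema", "true")
  .load("ecommerce100K_2019_Oct.csv"))

// Write it as a Qbeast table indexed on user_id and product_id.
val qbeastTablePath = "table/qbeast-test-data/qtable"
(ecommerce.write.mode("overwrite").format("qbeast")
  .option("columnsToIndex", "user_id,product_id").option("cubeSize", "500")
  .save(qbeastTablePath))

// DataFrame API: average price over a 10% sample of the Qbeast table.
val qbeastDf = spark.read.format("qbeast").load(qbeastTablePath)
qbeastDf.sample(0.1).agg(avg("price")).show()

// SQL: average price over a 10% TABLESAMPLE of a temp view on the CSV DataFrame.
ecommerce.createOrReplaceTempView("ecommerce_october")
spark.sql("SELECT avg(price) FROM ecommerce_october TABLESAMPLE(10 PERCENT)").show()
```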
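One way to get repeatable samples without passing any seed is to decide row membership from a hash of the indexed columns (here `user_id` and `product_id`) and keep only rows whose normalized hash falls below the sampling fraction. The sketch below is a hypothetical, Spark-free illustration that uses `scala.util.hashing.MurmurHash3`; `HashSampleDemo`, `Row`, `hashFraction`, and `hashSample` are made-up names. It is not a description of how qbeast-spark or Spark SQL's `TABLESAMPLE` actually sample.

```scala
import scala.util.hashing.MurmurHash3

object HashSampleDemo {
  // Hypothetical row shape carrying the two columns the issue indexes on.
  final case class Row(userId: Long, productId: Long, price: Double)

  // Map the indexed columns to a value in [0, 1). MurmurHash3.productHash uses a
  // fixed seed, so the same (userId, productId) pair always gets the same value.
  def hashFraction(r: Row): Double = {
    val h = MurmurHash3.productHash((r.userId, r.productId))
    (h.toLong & 0xffffffffL).toDouble / 4294967296.0
  }

  // Keep a row iff its hash value is below `fraction`; no RNG is involved.
  def hashSample(rows: Seq[Row], fraction: Double): Seq[Row] =
    rows.filter(r => hashFraction(r) < fraction)

  def avgPrice(rows: Seq[Row]): Double = rows.map(_.price).sum / rows.size

  def main(args: Array[String]): Unit = {
    val rows = (0 until 100000).map(i => Row(i.toLong, (i * 31 % 5000).toLong, (i % 997) * 1.5))
    val runs = (1 to 5).map(_ => avgPrice(hashSample(rows, 0.1)))
    println(s"distinct results across runs: ${runs.distinct.size}") // always 1
  }
}
```

Because membership depends only on column values, re-running the same query over the same data selects the same rows, and a smaller fraction always selects a subset of a larger one.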


2. Branch and commit id:

--packages io.qbeast:qbeast-spark_2.12:0.6.0

3. Spark version:

3.5.0

4. Hadoop version:

3.3.4

5. How are you running Spark?

Locally, inside a Nix shell.

6. Stack trace:

No error output
