[SPARK-53536][Core] Adding a Golden File Test With Randomly Generated SQL Scripts #52287
Conversation
sql/core/src/test/resources/sql-tests/inputs/scripting/randomly_generated_scripts.sql
How did we verify the results? Is there a reference system?
Every script was hand-picked and manually run using local Spark. Behavior and script code coverage were not tested, but every script does have its logical flow described beneath the script itself.
So you manually verified the results?
Yes, by manually running the scripts locally.
I'm a bit worried about the golden answers. How did you know whether a result was correct when you verified it? By analyzing the script manually and coming up with the result yourself? I would be more relieved if it were verified by an LLM...
Scripts output by the LLM were not edited manually at all; they were only run to check for parsing errors. Semantics and logic flow were not thoroughly analyzed; they were only reviewed for script diversity (i.e., the combinations of control-flow blocks such as IFs, FOR loops, WHILE loops, etc.). The final results of the scripts were not thoroughly analyzed or verified, and all the tags and the "expected" and "executes" comments were generated by the LLM.
Can we go a bit further and verify the results against a reference system like PostgreSQL? It's good to have more tests, but without verifying the test results, the tests do not prove anything.
What changes were proposed in this pull request?
This PR adds a new golden file test that contains 100 randomly generated SQL scripts. The scripts were generated using an LLM (Perplexity AI) and then hand-picked from the generated output. The criteria for selection were:
The following prompt was used:
Why are the changes needed?
These tests will be used to catch regressions in SQL scripting.
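For illustration, a minimal sketch of the kind of control-flow script the golden file exercises is shown below; the variable names, values, and expected result are hypothetical and are not taken from the actual golden file.

```sql
-- Hypothetical example only; not one of the 100 generated scripts.
-- Combines WHILE and IF blocks, the kind of control-flow mix the scripts cover.
BEGIN
  DECLARE counter INT DEFAULT 0;
  DECLARE total   INT DEFAULT 0;
  WHILE counter < 5 DO
    IF counter % 2 = 0 THEN
      -- accumulate the even counter values: 0 + 2 + 4
      SET total = total + counter;
    END IF;
    SET counter = counter + 1;
  END WHILE;
  -- expected result: 6
  SELECT total;
END;
```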
Does this PR introduce any user-facing change?
No.
How was this patch tested?
It was manually tested using Databricks notebooks and by inspecting the generated golden file output.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Perplexity AI