
sean-eyre

The current query used to fetch data objects in Athena becomes very slow as the number of models grows.

This PR uses a "CTE of VALUES with a join" strategy to speed the query up significantly: in my case, querying for 412 objects went from ~4 minutes to under 10 seconds.
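The description names the technique but does not show the query, so the following is only a rough sketch of its general shape. It assumes the lookup targets information_schema.tables and filters on schema and table name. The wanted (schema, name) pairs are listed once as an inline VALUES table inside a CTE and then equi-joined to the catalog, rather than being spelled out as one filter predicate per object (that contrast is an assumption, not necessarily what the old query did). The helpers build_data_objects_query and run_demo are hypothetical. SQLite stands in for Athena purely so the sketch runs locally, and it says nothing about Athena's performance. Athena's Trino-based engine should accept the same WITH name (cols) AS (VALUES ...) form.

```python
import sqlite3


def quote_literal(value: str) -> str:
    """Render a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_data_objects_query(schema: str, names: list[str]) -> str:
    """Hypothetical helper (not the PR's code): look up many tables in one
    schema with a CTE of VALUES joined to information_schema.tables."""
    if not names:
        raise ValueError("names must not be empty")
    # One (schema, name) row per wanted object, listed once as an inline table.
    rows = ",\n    ".join(
        f"({quote_literal(schema)}, {quote_literal(name)})" for name in names
    )
    return (
        "WITH wanted (table_schema, table_name) AS (\n"
        f"  VALUES\n    {rows}\n"
        ")\n"
        "SELECT t.table_schema, t.table_name, t.table_type\n"
        "FROM information_schema.tables AS t\n"
        # An equi-join on both key columns replaces per-object filter predicates.
        "JOIN wanted AS w\n"
        "  ON t.table_schema = w.table_schema\n"
        " AND t.table_name = w.table_name"
    )


def run_demo() -> list[tuple[str, str, str]]:
    """Run the generated SQL against SQLite, standing in for Athena's catalog."""
    conn = sqlite3.connect(":memory:")
    try:
        # An attached database named information_schema lets the same
        # "information_schema.tables" reference resolve in SQLite.
        conn.execute("ATTACH DATABASE ':memory:' AS information_schema")
        conn.execute(
            "CREATE TABLE information_schema.tables "
            "(table_schema TEXT, table_name TEXT, table_type TEXT)"
        )
        conn.executemany(
            "INSERT INTO information_schema.tables VALUES (?, ?, ?)",
            [
                ("analytics", "orders", "BASE TABLE"),
                ("analytics", "customers", "VIEW"),
                ("analytics", "events", "BASE TABLE"),
                ("staging", "orders", "BASE TABLE"),
            ],
        )
        sql = build_data_objects_query("analytics", ["orders", "customers", "missing"])
        # Sort so the result does not depend on the engine's join order.
        return sorted(conn.execute(sql).fetchall())
    finally:
        conn.close()
```

Here run_demo() returns only the analytics rows for orders and customers. The name "missing" simply produces no match, and staging.orders is excluded by the schema condition in the join. String formatting with quote doubling keeps the sketch dependency-free; a real implementation would more likely build the statement through an SQL builder than through f-strings.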

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

sean-eyre changed the title from "Se/improve athena get data objs" to "feat: quicker _get_data_objects query for athena" on Aug 14, 2025
erindru (Collaborator) commented Aug 14, 2025

Nice, thanks!

Can you please sign the CLA and also enable the Athena integration tests for this PR? You can do that by adjusting the engine_tests_cloud section in .circleci/continue_config.yml to something like:

```yaml
          matrix:
            parameters:
              engine:
#                - snowflake
#                - databricks
#                - redshift
#                - bigquery
#                - clickhouse-cloud
                - athena
#                - gcp-postgres
#          filters:
#            branches:
#              only:
#                - main
```

(That is, comment out the branch filter and the other engines so that the Athena tests run.)
