
add async to query method #45

Open
wants to merge 1 commit into
base: main

Collaborator

@danfrankj danfrankj commented Mar 30, 2018

Summary

Currently you can run an execute statement asynchronously, but then you have to format the cursor yourself. This PR lets you run a query asynchronously and receive a future as the result. I haven't thought this through entirely, but wanted to get your thoughts, @matthewwardrop.

FYI @ajfriend
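For readers following along, here is a minimal sketch of the idea described above: a query call that, when asked to run asynchronously, returns a future that resolves to the already-formatted result instead of a raw cursor. This is not the code in this PR and not omniduct's actual API; `SketchClient`, `_execute`, `_format`, and the `async_` flag are hypothetical names chosen for illustration, and the thread pool is just one way such a future could be produced.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SketchClient:
    """Hypothetical stand-in for a database client (not omniduct's API)."""

    def __init__(self, max_workers=2):
        # Worker threads that run queries in the background.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def _execute(self, statement):
        # Placeholder: a real client would run the statement and return a cursor.
        return iter([(1, "a"), (2, "b")])

    def _format(self, cursor):
        # Placeholder: a real client might build a DataFrame from the cursor.
        return list(cursor)

    def query(self, statement, async_=False):
        if not async_:
            # Synchronous path: execute, format, and return the result directly.
            return self._format(self._execute(statement))
        # Asynchronous path: run execution and formatting in the pool and
        # return a Future that resolves to the formatted result.
        return self._pool.submit(
            lambda: self._format(self._execute(statement))
        )
```

A caller would then do something like `future = client.query("SELECT 1", async_=True)` and later `future.result()` to block until the formatted rows are available.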

@matthewwardrop
Collaborator

Thanks for the patch, @danfrankj. It's quite timely, since I have started thinking about delegated execution (i.e. running jobs through an external service, and then asynchronously retrieving results). I'll let this one sit for a bit, since I need to give some thought to how best to achieve this. If it takes too long for me to get to this, we can merge this one in and revisit it later, but I'm currently aiming to have this done within a month.


staubda commented Feb 1, 2019

Hey @matthewwardrop, @danfrankj, wanted to see if there were any new thoughts around this issue, particularly the idea of "delegated execution (i.e. running jobs through an external service, and then asynchronously retrieving results)".

I recently wrote some utils to achieve a poor man's version of the above, but would love to have a more robust, better integrated way of doing things.

My implementation roughly does the following:

launch_query(query, query_name, remote_host)

  1. Prepend HQL for creating a Hive table to query (for temp storage of query results).
  2. Connect to remote_host using omniduct.SSHClient, execute query in a detached screen via the hive CLI.
  3. Pipe the hive CLI output into query_name.log.

get_query_info(remote_host)

  1. Return a dict listing all active/completed query_names running on remote_host (by looking at logs and screens matching the naming convention used by launch_query)

tail_query_log(query_name, remote_host)

  1. Use SSH to tail the log associated with query_name and forward stdout to the client.

get_query_results(query_name)

  1. Simple wrapper to execute a "SELECT *" on the temp table associated with query_name and return results as a DataFrame.
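The launch and retrieval steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the utilities described in the comment: the `tmp_` table naming convention, the use of `CREATE TABLE ... AS`, and the exact `screen`/`hive` command line are guesses at one way to do it, and all function names here are hypothetical. The sketch only builds the strings; sending the command to `remote_host` over SSH is left out.

```python
import shlex


def results_table(query_name):
    # Hypothetical naming convention for the temporary results table.
    return f"tmp_{query_name}"


def wrap_query(query, query_name):
    # Prepend HQL so the query's results are stored in a Hive table.
    return f"CREATE TABLE {results_table(query_name)} AS\n{query}"


def launch_command(query, query_name):
    # Build a shell command that runs the wrapped query through the hive CLI
    # inside a detached screen session, sending its output to <query_name>.log.
    hql = wrap_query(query, query_name)
    log_file = shlex.quote(query_name + ".log")
    inner = f"hive -e {shlex.quote(hql)} > {log_file} 2>&1"
    return f"screen -dmS {shlex.quote(query_name)} bash -c {shlex.quote(inner)}"


def results_query(query_name):
    # Query used later to pull the stored results back to the client.
    return f"SELECT * FROM {results_table(query_name)}"
```

Naming both the screen session and the log file after `query_name` is what would let a helper like `get_query_info` discover running and completed queries by convention alone.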

3 participants