Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Expose query results as an IO object. #3

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nelson-vantage
Copy link

@nelson-vantage nelson-vantage commented Dec 27, 2024

Sketch of one possible approach to exposing the query results as an IO object instead of a singular string. Rushed through the interface – I believe ideally we replace the current buf with buf_io.

@@ -19,6 +22,33 @@ typedef struct {
struct local_result_v2 *c_result;
} LocalResult;

VALUE rb_io_from_buffer(const char *buf, long len) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly generated this via ChatGPT. One downside here is that we're copying the buffer into a pipe, which means we're still duplicating the byte array rather than wrapping it in an IO object that knows how to read from it.

@@ -29,6 +29,7 @@ def build_query_string(query_str, output_format)
format_suffix = case output_format.downcase
when "csv" then " FORMAT CSVWithNames"
when "json" then " FORMAT JSON"
when "jsoneachrow" then " FORMAT JSONEachRow"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allows us to parse each line separately each time we yield it, rather than needing to parse a potentially large result set in one go.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think we can benefit from some type of enum for all supported formats as well.

@nelson-vantage
Copy link
Author

nelson-vantage commented Dec 27, 2024

Will get back to this for some proper cleanup - opening this PR early for any possible comments and suggestions.

@g3ortega
Copy link
Owner

g3ortega commented Jan 2, 2025

Nice addition, @nelson-vantage; I'll take a closer look today.

@nelson-vantage
Copy link
Author

nelson-vantage commented Jan 2, 2025

Nice addition, @nelson-vantage; I'll take a closer look today.

Gracias! I actually wouldn't merge this PR – it doesn't work as it should because the IO writer will block if the pipe object doesn't have enough space for the results. Still think we can improve the efficiency of the library by (a) not creating a new string every time we call Result#buf and (b) by being smart about how we parse CSV and JSON objects, using buf.each_line instead of trying to parse the entirety of the results in one go. Happy to take a stab at it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants