feat: Expose query results as an IO object. #3

nelson-vantage · 2024-12-27T23:42:40Z

Sketch of one possible approach to exposing the query results as an IO object instead of a single string. I rushed through the interface – ideally, I believe we would replace the current buf with buf_io.

nelson-vantage · 2024-12-27T23:43:37Z

ext/chdb/chdb.c

@@ -19,6 +22,33 @@ typedef struct {
    struct local_result_v2 *c_result;
 } LocalResult;

+VALUE rb_io_from_buffer(const char *buf, long len) {


Admittedly, I generated this via ChatGPT. One downside here is that we copy the buffer into a pipe, so we still duplicate the byte array rather than wrapping it in an IO object that knows how to read from it.
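
For comparison, here is a minimal Ruby-side sketch of the "wrap rather than copy" idea. It assumes the result bytes are already available as a Ruby String (the `buf` value below is a made-up stand-in, not output from the gem), so it does not avoid the initial copy out of the C buffer. It only shows that `StringIO` reads from the String it is given instead of duplicating it into a pipe.

```ruby
require "stringio"

# Hypothetical stand-in for the bytes returned by a query.
buf = "id,name\n1,alice\n2,bob\n"

# StringIO uses the given String as its backing store, so no second copy
# is made and reads never depend on pipe capacity.
io = StringIO.new(buf)

header = io.gets  # first line, including the trailing newline
rest   = io.read  # everything after the first line

# True when the IO object wraps the very same String object.
same_object = io.string.equal?(buf)
```

Whether something equivalent can be done directly over the C buffer without an intermediate String is left open here.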

nelson-vantage · 2024-12-27T23:44:12Z

lib/chdb.rb

@@ -29,6 +29,7 @@ def build_query_string(query_str, output_format)
      format_suffix = case output_format.downcase
                      when "csv" then " FORMAT CSVWithNames"
                      when "json" then " FORMAT JSON"
+                      when "jsoneachrow" then " FORMAT JSONEachRow"


Allows us to parse each line separately each time we yield it, rather than needing to parse a potentially large result set in one go.
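
A rough sketch of what line-by-line parsing could look like with JSONEachRow output. The `RowReader` module and the sample payload are hypothetical, used only for illustration; the point is that each line is a complete JSON document and can be parsed on its own.

```ruby
require "json"
require "stringio"

# Hypothetical helper, not part of the gem.
module RowReader
  # Yields one parsed row per non-empty line; returns an Enumerator
  # when called without a block.
  def self.each_row(io)
    return enum_for(:each_row, io) unless block_given?

    io.each_line do |line|
      line = line.strip
      next if line.empty?

      yield JSON.parse(line)
    end
  end
end

# Made-up JSONEachRow payload: one JSON object per line.
payload = %({"id":1,"name":"alice"}\n{"id":2,"name":"bob"}\n)

rows = RowReader.each_row(StringIO.new(payload)).to_a
```

With a block, rows are handed over one at a time and only the current line is parsed, which is the behaviour the comment above is after.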

Think we can benefit from some type of enum for all supported formats as well.
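
One way the "enum" could be approximated in Ruby is a frozen lookup table. The module and method names below are hypothetical; the three suffixes are the ones visible in the diff above. Raising on an unknown format is an assumption, since the fallback branch of the existing `case` is not shown here.

```ruby
# Hypothetical module; not the gem's actual API.
module OutputFormat
  SUFFIXES = {
    "csv"         => " FORMAT CSVWithNames",
    "json"        => " FORMAT JSON",
    "jsoneachrow" => " FORMAT JSONEachRow"
  }.freeze

  # Returns the SQL suffix for a format name (case-insensitive).
  # Assumption: unknown formats are rejected rather than passed through.
  def self.suffix_for(name)
    SUFFIXES.fetch(name.to_s.downcase) do
      raise ArgumentError, "unsupported output format: #{name.inspect}"
    end
  end

  # Lists the supported format names.
  def self.supported
    SUFFIXES.keys
  end
end

csv_suffix  = OutputFormat.suffix_for("CSV")
rows_suffix = OutputFormat.suffix_for(:jsoneachrow)
```

Keeping the table in one place would also let callers ask which formats are supported instead of reading the `case` statement.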

nelson-vantage · 2024-12-27T23:48:18Z

Will get back to this for some proper cleanup - opening this PR early for any possible comments and suggestions.

g3ortega · 2025-01-02T13:57:07Z

Nice addition, @nelson-vantage; I'll take a closer look today.

nelson-vantage · 2025-01-02T15:16:43Z

> Nice addition, @nelson-vantage; I'll take a closer look today.

Gracias! I actually wouldn't merge this PR – it doesn't work as it should, because the IO writer blocks if the pipe doesn't have enough space for the results. I still think we can improve the efficiency of the library by (a) not creating a new string every time we call Result#buf and (b) being smart about how we parse CSV and JSON results, using buf.each_line instead of trying to parse the entire result set in one go. Happy to take a stab at it.
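
The blocking behaviour can be reproduced in plain Ruby, independent of the C extension. This is only an illustration with a made-up payload size: a pipe has a bounded kernel buffer, so when nothing is reading from the other end, a non-blocking write accepts only part of a large payload, and the equivalent blocking write would wait indefinitely.

```ruby
reader, writer = IO.pipe

# Hypothetical "large result", chosen to exceed typical pipe buffer sizes.
payload = "x" * 4_000_000

# With no reader draining the pipe, only part of the payload is accepted.
# A plain writer.write(payload) here would block instead of returning.
written = writer.write_nonblock(payload, exception: false)

fits_in_pipe = written.is_a?(Integer) && written == payload.bytesize

writer.close
reader.close
```

Writing everything into the pipe before handing the read end back to the caller can therefore only work for results smaller than the pipe buffer.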

feat: Expose query results as an IO object.

340f45f

nelson-vantage mentioned this pull request Dec 27, 2024

Return IO object for LocalResult#buf #2

Open
