feat: add wrapper for reading table data using Storage API #431
Conversation
Some early results:
Added some more comments, addressed some minor issues, and pushed a new test using CTAS and `generate_array` to test it with more data. cc @shollyman @leahecole
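For context, a CTAS statement over `GENERATE_ARRAY` is a quick way to fabricate a large table for a test like this. A minimal sketch, assuming the `@google-cloud/bigquery` client and hypothetical dataset/table names:

```ts
import {BigQuery} from '@google-cloud/bigquery';

const bigquery = new BigQuery();

// Create a million-row table via CTAS; dataset/table names are hypothetical.
async function createLargeTestTable(): Promise<void> {
  const query = `
    CREATE TABLE \`my_dataset.storage_api_test\` AS
    SELECT num, CAST(num AS STRING) AS str
    FROM UNNEST(GENERATE_ARRAY(1, 1000000)) AS num
  `;
  const [job] = await bigquery.createQueryJob({query});
  await job.getQueryResults(); // wait for the CTAS job to finish
}
```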
```ts
return stream
  .pipe(new ArrowRawTransform())
  .pipe(new ArrowRecordReaderTransform(info!))
  .pipe(new ArrowRecordBatchTransform()) as ResourceStream<RecordBatch>;
```
Tried to use `pipeline`, but it is meant to be used when we have a destination. In this case we are just applying a series of transforms and don't know the destination beforehand.
The error that I got:

```
TypeError [ERR_INVALID_ARG_TYPE]: The "streams[stream.length - 1]" property must be of type function. Received an instance of ArrowRecordBatchTransform
```
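A minimal sketch of the difference: `stream.pipeline` expects either a terminal destination or a trailing callback, while chained `.pipe()` calls hand back the last transform as a readable for the caller to consume later. The transform below is a hypothetical stand-in for the Arrow ones in this PR:

```ts
import {pipeline, Readable, Transform} from 'stream';

// Hypothetical transform standing in for the Arrow transforms.
const upper = new Transform({
  transform(chunk, _enc, cb) {
    cb(null, String(chunk).toUpperCase());
  },
});

const source = Readable.from(['a', 'b']);

// Fails: pipeline() treats its last argument as a completion callback,
// so ending on a Transform with no destination raises the error above.
// pipeline(source, upper); // TypeError [ERR_INVALID_ARG_TYPE]

// Works: .pipe() returns the downstream transform, still readable,
// so the caller decides where the data ultimately goes.
const out = source.pipe(upper);
out.on('data', chunk => console.log(String(chunk)));
```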
```ts
return stream.pipe(
  new ArrowRecordBatchTableRowTransform()
) as ResourceStream<TableRow>;
```
Errors are handled by the consumer of the stream, and when the stream is used internally, like here, we handle the errors ourselves.
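Worth spelling out, since it motivates this: `.pipe()` does not forward errors downstream, so each stage and the final consumer need their own handlers. A minimal sketch with a hypothetical stand-in transform:

```ts
import {Readable, Transform} from 'stream';

// Hypothetical stand-in for ArrowRecordBatchTableRowTransform.
const toRow = new Transform({
  objectMode: true,
  transform(chunk, _enc, cb) {
    cb(null, {value: chunk});
  },
});

const source = Readable.from([1, 2, 3]);
const rows = source.pipe(toRow);

// Consumer-side error handling: .pipe() does not propagate errors,
// so upstream failures must be surfaced to the piped stream manually.
source.on('error', err => rows.destroy(err));
rows.on('error', err => console.error('row stream failed:', err));
rows.on('data', row => console.log(row));
```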
Add support for easily reading tables using the BigQuery Storage API instead of the BigQuery API. This will provide increased performance and reduced memory usage for most use cases, and will allow users to keep using the same interface they are used to on our main library, or to fetch data directly via a new veneer on the BigQuery Storage Read API.
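For reference, reading rows directly with the Storage Read API today involves roughly the following boilerplate, which this wrapper is meant to hide. A sketch using the existing `BigQueryReadClient`; the identifiers are placeholders:

```ts
import {BigQueryReadClient} from '@google-cloud/bigquery-storage';

async function readTable(projectId: string, datasetId: string, tableId: string) {
  const client = new BigQueryReadClient();

  // Create a read session for the table, requesting Arrow-encoded data.
  const [session] = await client.createReadSession({
    parent: `projects/${projectId}`,
    readSession: {
      table: `projects/${projectId}/datasets/${datasetId}/tables/${tableId}`,
      dataFormat: 'ARROW',
    },
    maxStreamCount: 1,
  });

  // Stream row batches from the first (and only) read stream.
  const rowStream = client.readRows({readStream: session.streams![0].name!});
  rowStream.on('data', response => {
    // response.arrowRecordBatch holds serialized Arrow data; decoding it
    // into record batches and rows is what the transforms in this PR do.
    console.log(response.arrowRecordBatch?.serializedRecordBatch?.length);
  });
  rowStream.on('error', console.error);
  rowStream.on('end', () => console.log('done'));
}
```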