@alvarowolfx alvarowolfx commented Mar 25, 2024

Add support for easily reading tables using the BigQuery Storage Read API instead of the BigQuery API. This should provide better performance and lower memory usage for most use cases. Users can either keep the same interface they already use in our main library or fetch data directly via a new veneer on the BigQuery Storage Read API.

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquerystorage Issues related to the googleapis/nodejs-bigquery-storage API. labels Mar 25, 2024
@alvarowolfx

Some early results:

  1. SELECT repository_url AS url, repository_owner AS owner, repository_forks AS forks FROM `bigquery-public-data.samples.github_timeline` WHERE repository_url IS NOT NULL LIMIT 300000
  • fetchGetQueryResultsI: 31.135s 🔴
  • fetchStorageAPI: 20.033s ⬆️ 36% faster
  2. SELECT repository_url AS url, repository_owner AS owner, repository_forks AS forks FROM `bigquery-public-data.samples.github_timeline` WHERE repository_url IS NOT NULL LIMIT 1000000
  • fetchGetQueryResultsI: 1:32.622 (m:ss.mmm) 🔴
  • fetchStorageAPI: 1:07.363 (m:ss.mmm) ⬆️ 27% faster
  3. SELECT name, number, state FROM `bigquery-public-data.usa_names.usa_1910_current`
  • fetchGetQueryResults: 5:00.514 (m:ss.mmm) 🔴
  • fetchStorageAPI: 3:20.987 (m:ss.mmm) ⬆️ 33% faster
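
For readers who want to check the percentages, here is a minimal sketch of the arithmetic, assuming each figure means "share of wall-clock time saved relative to the baseline". The helper names `toSeconds` and `percentFaster` are hypothetical and are not part of this PR.

```typescript
// Parse a timing such as "31.135s" or "1:32.622" (m:ss.mmm) into seconds.
function toSeconds(timing: string): number {
  const cleaned = timing.trim().replace(/s$/, "");
  const parts = cleaned.split(":").map(Number);
  // Fold left so that "1:32.622" becomes 1 * 60 + 32.622.
  return parts.reduce((total, part) => total * 60 + part, 0);
}

// Percentage of wall-clock time saved relative to the baseline run.
function percentFaster(baseline: string, candidate: string): number {
  const base = toSeconds(baseline);
  return ((base - toSeconds(candidate)) / base) * 100;
}

// Timings copied from the three runs listed above: [baseline, Storage API].
const runs: Array<[string, string]> = [
  ["31.135s", "20.033s"],
  ["1:32.622", "1:07.363"],
  ["5:00.514", "3:20.987"],
];

for (const [baseline, candidate] of runs) {
  console.log(`${percentFaster(baseline, candidate).toFixed(1)}% less time`);
}
```

Rounded to whole numbers, these come out to the 36%, 27% and 33% quoted in the list.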

@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Jul 17, 2024
@alvarowolfx alvarowolfx marked this pull request as ready for review July 24, 2024 20:26
@alvarowolfx alvarowolfx requested a review from a team as a code owner July 24, 2024 20:26

@alvarowolfx alvarowolfx left a comment

Added some more comments, addressed some minor issues, and pushed a new test that uses CTA and generate_array to exercise this with more data. cc @shollyman @leahecole

Comment on lines +92 to +95
  return stream
    .pipe(new ArrowRawTransform())
    .pipe(new ArrowRecordReaderTransform(info!))
    .pipe(new ArrowRecordBatchTransform()) as ResourceStream<RecordBatch>;

I tried to use pipeline, but it is meant to be used when we have a destination. In this case we are just applying a series of transforms, and we don't know the destination beforehand.

The error that I got:

  TypeError [ERR_INVALID_ARG_TYPE]: The "streams[stream.length - 1]" property must be of type function. Received an instance of ArrowRecordBatchTransform
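
The error message suggests that the callback form of `pipeline` expects a function as its last argument. As a rough illustration of the alternative used in this PR, the sketch below chains transforms with `.pipe()` and hands the last stage back to the caller, who decides where the data goes. `DoubleTransform` and `applyTransforms` are hypothetical stand-ins, not the Arrow transforms from the diff.

```typescript
import { Readable, Transform } from "node:stream";

// Hypothetical stand-in for one stage of the chain: doubles each number.
class DoubleTransform extends Transform {
  constructor() {
    super({ objectMode: true });
  }

  _transform(
    chunk: number,
    _encoding: BufferEncoding,
    done: (error?: Error | null, data?: number) => void
  ): void {
    done(null, chunk * 2);
  }
}

// Apply each stage in order and return the last one. pipe() returns its
// destination, so the caller receives a readable stream with no sink attached.
function applyTransforms(source: Readable, ...stages: Transform[]): Readable {
  let current: Readable = source;
  for (const stage of stages) {
    current = current.pipe(stage);
  }
  return current;
}

// Usage: the consumer picks the destination later, here a simple data listener.
const output = applyTransforms(
  Readable.from([1, 2, 3]),
  new DoubleTransform(),
  new DoubleTransform()
);
output.on("data", (value: number) => console.log(value));
```

Because the function returns a plain readable stream, the caller can still pipe it onward, iterate it with `for await`, or attach listeners.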

Comment on lines +99 to +101
  return stream.pipe(
    new ArrowRecordBatchTableRowTransform()
  ) as ResourceStream<TableRow>;

Errors are handled by the consumer of the stream; when the stream is used internally, as it is here, we handle the errors ourselves.
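
A minimal sketch of what consumer-side error handling can look like, assuming the consumer wants all rows or the first error. `RejectNegatives` and `collect` are hypothetical names for illustration, not code from this PR. Note that `.pipe()` does not forward errors between stages, so a listener on the returned stream only sees errors emitted by that stream.

```typescript
import { Readable, Transform } from "node:stream";

// Hypothetical stage that fails when it sees a negative number.
class RejectNegatives extends Transform {
  constructor() {
    super({ objectMode: true });
  }

  _transform(
    chunk: number,
    _encoding: BufferEncoding,
    done: (error?: Error | null, data?: number) => void
  ): void {
    if (chunk < 0) {
      done(new Error(`negative value: ${chunk}`));
    } else {
      done(null, chunk);
    }
  }
}

// Consumer-side helper: resolve with every row, or reject on the first error
// emitted by the stream it was given.
function collect<T>(stream: Readable): Promise<T[]> {
  return new Promise<T[]>((resolve, reject) => {
    const rows: T[] = [];
    stream.on("data", (row: T) => rows.push(row));
    stream.once("error", reject);
    stream.once("end", () => resolve(rows));
  });
}

// Usage: the consumer owns the error handling for the stream it receives.
collect<number>(Readable.from([1, 2, -3]).pipe(new RejectNegatives()))
  .then((rows) => console.log("rows:", rows))
  .catch((err: Error) => console.error("stream failed:", err.message));
```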

@alvarowolfx alvarowolfx added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 6, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 6, 2024
@alvarowolfx alvarowolfx added automerge Merge the pull request once unit tests and other checks pass. and removed automerge Merge the pull request once unit tests and other checks pass. labels Sep 20, 2024
@alvarowolfx alvarowolfx added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024
@alvarowolfx alvarowolfx added automerge Merge the pull request once unit tests and other checks pass. owlbot:run Add this label to trigger the Owlbot post processor. labels Sep 20, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024
@alvarowolfx alvarowolfx added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024
@alvarowolfx alvarowolfx removed the automerge Merge the pull request once unit tests and other checks pass. label Sep 20, 2024
@alvarowolfx alvarowolfx requested a review from a team as a code owner September 23, 2024 17:45
@alvarowolfx alvarowolfx added automerge Merge the pull request once unit tests and other checks pass. owlbot:run Add this label to trigger the Owlbot post processor. labels Sep 23, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 23, 2024
@gcf-merge-on-green gcf-merge-on-green bot merged commit 03f2b1f into googleapis:main Sep 23, 2024
14 checks passed
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Sep 23, 2024
Labels
api: bigquerystorage Issues related to the googleapis/nodejs-bigquery-storage API. size: xl Pull request size is extra large.
3 participants