feat: add wrapper for reading table data using Storage API #431

alvarowolfx · 2024-03-25T20:03:12Z

Add support for easily reading tables using the BigQuery Storage API instead of the BigQuery API. This provides increased performance and reduced memory usage for most use cases. Users can keep using the same interface they already use in our main library, or fetch data directly via a new veneer on the BigQuery Storage Read API.

alvarowolfx · 2024-06-04T16:27:34Z

Some early results:

SELECT repository_url as url, repository_owner as owner, repository_forks as forks FROM `bigquery-public-data.samples.github_timeline` where repository_url is not null LIMIT 300000

fetchGetQueryResults: 31.135s 🔴
fetchStorageAPI: 20.033s ⬆️ 36% speedup

SELECT repository_url as url, repository_owner as owner, repository_forks as forks FROM `bigquery-public-data.samples.github_timeline` where repository_url is not null LIMIT 1000000

fetchGetQueryResults: 1:32.622 (m:ss.mmm) 🔴
fetchStorageAPI: 1:07.363 (m:ss.mmm) ⬆️ 27% faster

SELECT name, number, state from `bigquery-public-data.usa_names.usa_1910_current`

fetchGetQueryResults: 5:00.514 (m:ss.mmm) 🔴
fetchStorageAPI: 3:20.987 (m:ss.mmm) ⬆️ 33% faster
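
For reference, the percentages above match the reduction in wall-clock time relative to the GetQueryResults baseline. Here is a small sketch that redoes the arithmetic from the reported timings. The helper names `toSeconds` and `timeReduction` are made up for illustration and are not part of this PR.

```typescript
// Convert a console.time-style duration ("31.135s" or "1:32.622") to seconds.
function toSeconds(duration: string): number {
  const cleaned = duration.replace(/s$/, '');
  const parts = cleaned.split(':').map(Number);
  // "m:ss.mmm" has two parts; a plain seconds value has one.
  return parts.length === 2 ? parts[0] * 60 + parts[1] : parts[0];
}

// Percentage of wall-clock time saved relative to the baseline, rounded.
function timeReduction(baseline: string, candidate: string): number {
  const base = toSeconds(baseline);
  const cand = toSeconds(candidate);
  return Math.round(((base - cand) / base) * 100);
}

const results = [
  timeReduction('31.135s', '20.033s'), // LIMIT 300000
  timeReduction('1:32.622', '1:07.363'), // LIMIT 1000000
  timeReduction('5:00.514', '3:20.987'), // usa_1910_current
];
console.log(results); // [ 36, 27, 33 ]
```

These are single runs, so the numbers are best read as rough indications rather than a benchmark.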

alvarowolfx

Added some more comments, addressed some minor issues, and pushed a new test using CTA and generate_array to test it with more data. cc @shollyman @leahecole

alvarowolfx · 2024-09-03T20:46:19Z

src/reader/arrow_reader.ts

+    return stream
+      .pipe(new ArrowRawTransform())
+      .pipe(new ArrowRecordReaderTransform(info!))
+      .pipe(new ArrowRecordBatchTransform()) as ResourceStream<RecordBatch>;


Tried to use pipeline, but it is meant to be used when we have a destination. In this case, we are just applying a bunch of transforms and we don't know the destination beforehand.

The error that I got:

TypeError [ERR_INVALID_ARG_TYPE]: The "streams[stream.length - 1]" property must be of type function. Received an instance of ArrowRecordBatchTransform
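
To illustrate the pattern being kept here, below is a minimal sketch that chains transforms with `.pipe()` and returns the last stage, leaving the destination up to the caller. It uses only Node's built-in `stream` module. `double`, `label`, `buildReader`, and `collect` are hypothetical stand-ins, not the Arrow transforms from this PR. As a side note, the error message above appears to come from the callback form of `stream.pipeline`, which expects a function as its last argument.

```typescript
import { Readable, Transform } from 'node:stream';

// Hypothetical stand-in for a transform stage: doubles each number.
function double(): Transform {
  return new Transform({
    objectMode: true,
    transform(chunk: number, _encoding, callback) {
      callback(null, chunk * 2);
    },
  });
}

// Hypothetical stand-in for a second stage: turns numbers into labels.
function label(): Transform {
  return new Transform({
    objectMode: true,
    transform(chunk: number, _encoding, callback) {
      callback(null, `row-${chunk}`);
    },
  });
}

// Chain the transforms with .pipe() and hand the last stage back to the
// caller, who decides where the data finally goes.
function buildReader(source: Readable): Readable {
  return source.pipe(double()).pipe(label());
}

// Collect a stream into an array without Readable.prototype.toArray,
// which is not available on older Node versions.
async function collect<T>(stream: Readable): Promise<T[]> {
  const out: T[] = [];
  for await (const item of stream) {
    out.push(item as T);
  }
  return out;
}

const rowsPromise = collect<string>(buildReader(Readable.from([1, 2, 3])));
rowsPromise.then(rows => console.log(rows)); // [ 'row-2', 'row-4', 'row-6' ]
```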

alvarowolfx · 2024-09-03T21:02:59Z

src/reader/table_reader.ts

+    return stream.pipe(
+      new ArrowRecordBatchTableRowTransform()
+    ) as ResourceStream<TableRow>;


Errors are handled by the consumer of the stream; when the stream is used internally, as it is here, we handle the errors ourselves.
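
A minimal sketch of that division of responsibility, using only Node's built-in streams: the library side builds and returns the stream, and the consumer attaches the `error` handler. `failOnNegative`, `readRows`, and `consume` are hypothetical names for illustration, not code from this PR. One thing to keep in mind is that `.pipe()` does not forward errors from earlier stages, so a handler on the returned stream only sees errors raised by that last stage.

```typescript
import { Readable, Transform } from 'node:stream';

// Hypothetical row-conversion stage that fails on a bad row.
function failOnNegative(): Transform {
  return new Transform({
    objectMode: true,
    transform(chunk: number, _encoding, callback) {
      if (chunk < 0) {
        callback(new Error(`bad row: ${chunk}`));
        return;
      }
      callback(null, chunk);
    },
  });
}

// Library side: only builds and returns the stream, no error handling here.
function readRows(source: Readable): Readable {
  return source.pipe(failOnNegative());
}

// Consumer side: owns the error handling for the stream it was given.
function consume(stream: Readable): Promise<Error | null> {
  return new Promise(resolve => {
    stream.on('error', (error: Error) => resolve(error));
    stream.on('end', () => resolve(null));
    stream.resume(); // start flowing; a real consumer would read the rows
  });
}

const outcome = consume(readRows(Readable.from([1, 2, -3, 4])));
outcome.then(error => console.log(error ? error.message : 'no error'));
```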

feat: add wrapper for reading table data using Storage API

485b33d

product-auto-label bot added size: l Pull request size is large. api: bigquerystorage Issues related to the googleapis/nodejs-bigquery-storage API. labels Mar 25, 2024

alvarowolfx added 9 commits March 28, 2024 16:59

feat: parse arrow record batches and convert to TableRow

39d3370

fix: set arrow to v14

1ef1c9b

feat: add bigquery as dep

4bf1511

feat: remove dep on @google-cloud/bigquery

45a4afa

fix: Stream.toArray not available on node < 17

aa57c03

fix: add paginator dep

72afe00

fix: lint issues

7a5847a

feat: move to stream transform instead of implementing Readable

6b98711

Merge branch 'main' into feat-storage-read-veneer

77fff01

alvarowolfx mentioned this pull request May 31, 2024

feat: use Storage Read API for faster data fetching googleapis/nodejs-bigquery#1368

Draft

feat: modular arrow streams and transforms

98546f3

alvarowolfx mentioned this pull request Jul 11, 2024

Memory Issue with bigQuery.createQueryStream in Node.js googleapis/nodejs-bigquery#1392

Closed

docs: update doc strings

bd67c85

product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Jul 17, 2024

alvarowolfx added 9 commits July 16, 2024 21:07

fix: lint issues

aaeb3bd

Merge branch 'main' into feat-storage-read-veneer

9fa976a

fix: read rows sample

7382c34

test: arrow transforms

63805b3

test: add reader package tests

182d323

fix: rollback arrow to v14

ac0a018

fix: add node 14 pollyfil for array.at

26d5b95

fix: properly close connection

456f2d4

fix: lint issue

a02c634

alvarowolfx marked this pull request as ready for review July 24, 2024 20:26

alvarowolfx requested a review from a team as a code owner July 24, 2024 20:26

fix: address pr comments and add bigger table test

898ae4b

alvarowolfx commented Sep 3, 2024

shollyman approved these changes Sep 4, 2024

Merge branch 'main' into feat-storage-read-veneer

6a86580

alvarowolfx added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 6, 2024

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 6, 2024

gcf-owl-bot bot and others added 2 commits September 6, 2024 15:03

🦉 Updates from OwlBot post-processor

872f0f3

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

Merge branch 'main' into feat-storage-read-veneer

052713b

alvarowolfx added automerge Merge the pull request once unit tests and other checks pass. and removed automerge Merge the pull request once unit tests and other checks pass. labels Sep 20, 2024

build: update types/node to fix build

94886cc

alvarowolfx added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024

alvarowolfx added automerge Merge the pull request once unit tests and other checks pass. owlbot:run Add this label to trigger the Owlbot post processor. labels Sep 20, 2024

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024

Merge branch 'main' into feat-storage-read-veneer

6b9f68c

alvarowolfx added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 20, 2024

alvarowolfx removed the automerge Merge the pull request once unit tests and other checks pass. label Sep 20, 2024

leahecole approved these changes Sep 23, 2024

Merge branch 'main' into feat-storage-read-veneer

67baeca

alvarowolfx requested a review from a team as a code owner September 23, 2024 17:45

alvarowolfx added automerge Merge the pull request once unit tests and other checks pass. owlbot:run Add this label to trigger the Owlbot post processor. labels Sep 23, 2024

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 23, 2024

gcf-merge-on-green bot merged commit 03f2b1f into googleapis:main Sep 23, 2024
14 checks passed

gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Sep 23, 2024

release-please bot mentioned this pull request Sep 23, 2024

chore(main): release 4.10.0 #479

Merged

release-please bot mentioned this pull request Aug 13, 2025

chore(main): release 5.1.1 #583

Open
