feat: add DataFrame.to_pandas_batches() to download large DataFrame objects #136

Merged
merged 23 commits into main from b280662868-to_pandas_batches
Oct 26, 2023

@tswast tswast commented Oct 24, 2023

Builds on #132

Towards internal issue 280662868
🦕

@product-auto-label product-auto-label bot added the "size: l" (pull request size is large) and "api: bigquery" (issues related to the googleapis/python-bigquery-dataframes API) labels Oct 24, 2023
@tswast tswast force-pushed the b280662868-to_pandas_batches branch from 68f50ef to 359a90c Compare October 26, 2023 16:21
@tswast tswast changed the base branch from main to b280662868-to_arrow October 26, 2023 16:21
@tswast tswast force-pushed the b280662868-to_pandas_batches branch from 359a90c to 1a7b2d7 Compare October 26, 2023 16:23
@product-auto-label product-auto-label bot added the "size: s" label (pull request size is small) and removed the "size: l" label (pull request size is large) Oct 26, 2023
Base automatically changed from b280662868-to_arrow to main October 26, 2023 16:48
@tswast tswast marked this pull request as ready for review October 26, 2023 17:15
@tswast tswast requested review from a team as code owners October 26, 2023 17:15
@tswast tswast requested a review from ashleyxuu October 26, 2023 17:15
@product-auto-label product-auto-label bot added the "size: m" label (pull request size is medium) and removed the "size: s" label (pull request size is small) Oct 26, 2023
        """
        if self.index_columns:
            df.set_index(list(self.index_columns), inplace=True)
            df.index.names = self.index.names  # type: ignore
Contributor

Just curious, what's the error here?

Collaborator Author

bigframes/core/blocks.py:434: error: Incompatible types in assignment (expression has type "Sequence[Hashable]", variable has type "list[str]") [assignment]

Collaborator Author

Even though list[str] is a sequence of hashable objects, I think maybe type covariance can't kick in. I'll try with some casts.

Collaborator Author

Oh, I had it opposite. pandas says "names" is list[str], but actually I don't think that's true.

Collaborator Author

Filed pandas-dev/pandas-stubs#804 and commented with an explanation.
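
For readers following the thread, here is a minimal sketch of the mismatch being discussed. It is not the code from this PR: `index_columns` and `index_names` are hypothetical stand-ins for the block's attributes, and the example assumes stubs that annotate `Index.names` as `list[str]`, as described above. At runtime pandas accepts any hashable level name (including `None` and integers), so a `cast` is one possible alternative to `# type: ignore` that keeps the intent visible to the type checker.

```python
from typing import Hashable, List, Sequence, cast

import pandas as pd

# Hypothetical stand-ins for the block's index columns and index names.
index_columns: Sequence[str] = ("col_a", "col_b")
# Level names are hashables, not necessarily strings: here an int and None.
index_names: Sequence[Hashable] = [0, None]

df = pd.DataFrame({"col_a": [1, 2], "col_b": ["x", "y"], "value": [0.1, 0.2]})
df = df.set_index(list(index_columns))

# Assumption: the installed stubs declare `names` as list[str], so a plain
# assignment of Sequence[Hashable] is rejected by mypy. The cast only affects
# static checking; pandas stores the names unchanged at runtime.
df.index.names = cast(List[str], index_names)

restored_names = list(df.index.names)
print(restored_names)
```

The cast documents that the declared type is narrower than what pandas actually accepts, which is the same point raised in the pandas-stubs issue.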

@tswast tswast added the "automerge" label (merge the pull request once unit tests and other checks pass) Oct 26, 2023
@gcf-merge-on-green gcf-merge-on-green bot merged commit 3afd4a3 into main Oct 26, 2023
@gcf-merge-on-green gcf-merge-on-green bot deleted the b280662868-to_pandas_batches branch October 26, 2023 23:02
@gcf-merge-on-green gcf-merge-on-green bot removed the "automerge" label (merge the pull request once unit tests and other checks pass) Oct 26, 2023
ashleyxuu pushed a commit that referenced this pull request Oct 30, 2023
Labels
"api: bigquery" (issues related to the googleapis/python-bigquery-dataframes API), "size: m" (pull request size is medium)