@kylebarron kylebarron commented Nov 13, 2024

This takes just 1.1s for the stream to start and then another 1.0s for the first record batch to be fetched, whereas downloading the full file takes >60s on my internet connection.

from time import time

t0 = time()
url = "https://overturemaps-us-west-2.s3.amazonaws.com/release/2024-03-12-alpha.0/theme=buildings/type=building/part-00217-4dfc75cd-2680-4d52-b5e0-f4cc9f36b267-c000.zstd.parquet"
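# HTTPStore and read_parquet_async are assumed to be imported already; top-level await needs an async context such as a notebook.
store = HTTPStore.from_url(url)
# Open the stream (timed as t1 - t0).
stream = await read_parquet_async("", store=store)
t1 = time()
# Fetch the first record batch (timed as t2 - t1).
first = await stream.__anext__()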
t2 = time()

print(t1 - t0) # 1.1302871704101562
print(t2 - t1) # 1.0420188903808594
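
The same time-to-first-batch versus total-time comparison can be sketched as a reusable helper for any async iterator. This is a minimal illustration, not code from this PR: `fake_batches`, `time_stream`, and `run_demo` are hypothetical names, and the stand-in stream just sleeps instead of fetching Parquet data over HTTP.

```python
import asyncio
from time import perf_counter


async def fake_batches(n, delay):
    """Hypothetical stand-in for a record batch stream: yields n items, sleeping before each."""
    for i in range(n):
        await asyncio.sleep(delay)
        yield i


async def time_stream(stream):
    """Return (seconds until the first item, seconds for the whole stream, item count)."""
    t0 = perf_counter()
    first_at = None
    count = 0
    async for _ in stream:
        if first_at is None:
            # Record the latency of the first item only.
            first_at = perf_counter() - t0
        count += 1
    total = perf_counter() - t0
    return first_at, total, count


def run_demo():
    # Five simulated batches, 10 ms apart.
    return asyncio.run(time_stream(fake_batches(5, 0.01)))
```

Calling `run_demo()` returns the first-item latency, the total time, and the number of items; with a real stream, a large gap between the first two values is what makes streaming worthwhile.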

@kylebarron kylebarron enabled auto-merge (squash) November 13, 2024 22:28
@kylebarron kylebarron disabled auto-merge November 13, 2024 22:28
@kylebarron kylebarron marked this pull request as draft November 13, 2024 22:28
@kylebarron
Owner Author

Superseded by #313.

@kylebarron kylebarron closed this Mar 25, 2025