Use rasteret to read COG? #348

Open
TomNicholas opened this issue Dec 15, 2024 · 6 comments
Labels
enhancement New feature or request readers

Comments

@TomNicholas
Member

> We have made a custom python code for byte-range calculation code based on GDAL’s C++ approach.

> We're currently working on an open-source library which will be called “Rasteret”

Sounds like these people may have just written a COG reader for us.

https://blog.terrafloww.com/efficient-cloud-native-raster-data-access-an-alternative-to-rasterio-gdal/

TomNicholas added the enhancement and readers labels Dec 15, 2024
@maxrjones
Member

> we look forward to sharing the library and more technical details in an upcoming deep dive blog. Stay tuned!

It's a cool article, I look forward to checking out Rasteret once they open source it.

It'll also be worth considering the differences in byte range serialization approaches and whether VirtualiZarr should support some sort of to_stac option.

@TomNicholas
Member Author

TomNicholas commented Jan 11, 2025

Turns out that, although they haven't published their blog post yet, you can already see the code of the Rasteret library.

However I'm not sure how useful this is to us in its current form. I found the part that defines byte range requests:

https://github.com/terrafloww/rasteret/blob/main/src%2Frasteret%2Ffetch%2Fcog.py

But this whole library appears to do so many steps in one go that I'm struggling to see how we could pull out only the one part we need.

IIUC they have structured it around the user asking for a specific polygon of a specific data source (e.g. Sentinel2), then immediately turning that into the byte range requests they want, and then submitting those requests. If they had some intermediate file-level abstraction layer between generating the byte ranges and submitting them, it would be easier for us to relate the COGs to the Zarr model.

@maxrjones
Member

ICYMI here's the newer blog post on rasteret - https://blog.terrafloww.com/rasteret-a-library-for-faster-and-cheaper-open-satellite-data-access/

@TomNicholas
Member Author

Nice! @print-sid8 it looks like we all have thoughts along very similar lines!

Do you think it's possible for us to import just the part of Rasteret that would be useful to VirtualiZarr (i.e. a function that accepts a url to a COG and returns some structure containing all the metadata, byte ranges and offsets for all the variables in that COG)?
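To make the request concrete, here is a rough sketch of the kind of structure such a function might return. It is not Rasteret or VirtualiZarr code: the function name `tile_manifest` and its parameters are hypothetical, and the `path`/`offset`/`length` entries are only loosely modeled on a chunk manifest. It assumes the tile offsets and byte counts have already been read from the COG header, and relies on TIFF storing tiles left-to-right, top-to-bottom.

```python
import math


def tile_manifest(url, image_width, tile_width, tile_offsets, tile_byte_counts):
    """Map each TIFF tile to a "row.col" chunk key with its byte range.

    Hypothetical helper for illustration only. The inputs are assumed to
    come from the TileOffsets and TileByteCounts tags of a single band.
    """
    if len(tile_offsets) != len(tile_byte_counts):
        raise ValueError("offsets and byte counts must have the same length")

    # Number of tile columns; the last column may be only partly filled.
    tiles_across = math.ceil(image_width / tile_width)

    manifest = {}
    for i, (offset, length) in enumerate(zip(tile_offsets, tile_byte_counts)):
        # Tiles are ordered row-major, so the flat index splits into (row, col).
        row, col = divmod(i, tiles_across)
        manifest[f"{row}.{col}"] = {"path": url, "offset": offset, "length": length}
    return manifest


# Example: a 512-pixel-wide image with 256-pixel tiles has two tile columns.
manifest = tile_manifest(
    "s3://example-bucket/scene.tif",  # placeholder URL
    image_width=512,
    tile_width=256,
    tile_offsets=[1000, 1500, 2100, 2400],
    tile_byte_counts=[500, 600, 300, 250],
)
```

Note that the byte ranges point at tiles as stored in the file, so compressed tiles would still need a matching codec on the Zarr side.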

@print-sid8

print-sid8 commented Jan 13, 2025

> Nice! @print-sid8 it looks like we all have thoughts along very similar lines!

Oh yes we do, I loved learning about Kerchunk a few months ago, and VirtualiZarr too! @norlandrhagen gave a nice talk in the CNG Virtual Conference!

> IIUC they have structured it around the user asking for a specific polygon of a specific data source (e.g. Sentinel2), then immediately turn that into the byte range requests they want then submit those requests.

> Do you think it's possible for us to import just the part of Rasteret that would be useful to VirtualiZarr (i.e. a function that accepts a url to a COG and returns some structure containing all the metadata, byte ranges and offsets for all the variables in that COG)?

@TomNicholas you got it right. I wrote that part of the library, the 'fetch' module and 'cog.py', to be pretty much user-request oriented.

You guys should look at 'parser.py' inside the 'stac' module.

Specifically, look at the async method called parse_cog_headers. This is the piece of code I basically converted from GDAL's C++; it parses the byte ranges based on the original GeoTIFF standard/spec.

The only thing it takes is one COG URL.

It uses two other functions/methods, which I kept separate just for the sake of sanity.

With that, parse_cog_headers reads everything it can about a COG file and returns a CogMetadata dataclass.
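For readers unfamiliar with what header parsing involves, the following is a minimal sketch of reading tile byte ranges from the first IFD of a classic TIFF. It is not the Rasteret implementation: `parse_first_ifd` and `TileIndex` are hypothetical names, the input is header bytes that have already been fetched (the real method is async and takes a URL), and BigTIFF, striped layouts and overviews are deliberately ignored.

```python
import struct
from dataclasses import dataclass

# Tag ids defined by the TIFF 6.0 specification.
TAG_IMAGE_WIDTH = 256
TAG_IMAGE_LENGTH = 257
TAG_TILE_WIDTH = 322
TAG_TILE_LENGTH = 323
TAG_TILE_OFFSETS = 324
TAG_TILE_BYTE_COUNTS = 325

# Field type -> (struct code, size in bytes). Only SHORT and LONG are handled.
FIELD_TYPES = {3: ("H", 2), 4: ("I", 4)}


@dataclass
class TileIndex:
    """Hypothetical stand-in for a metadata dataclass; not Rasteret's CogMetadata."""
    width: int
    height: int
    tile_width: int
    tile_height: int
    tile_offsets: list
    tile_byte_counts: list


def parse_first_ifd(buf: bytes) -> TileIndex:
    # Bytes 0-1 give the byte order, bytes 2-3 the magic number 42,
    # bytes 4-7 the offset of the first image file directory (IFD).
    order = {b"II": "<", b"MM": ">"}[buf[:2]]
    magic, ifd_offset = struct.unpack(order + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")

    (n_entries,) = struct.unpack_from(order + "H", buf, ifd_offset)
    tags = {}
    for i in range(n_entries):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack_from(order + "HHI", buf, entry)
        if ftype not in FIELD_TYPES:
            continue  # skip field types this sketch does not decode
        code, size = FIELD_TYPES[ftype]
        if count * size <= 4:
            start = entry + 8  # small values are stored inline in the entry
        else:
            (start,) = struct.unpack_from(order + "I", buf, entry + 8)
        tags[tag] = list(struct.unpack_from(f"{order}{count}{code}", buf, start))

    # A KeyError here means the file is not tiled (or a tag was skipped above).
    return TileIndex(
        width=tags[TAG_IMAGE_WIDTH][0],
        height=tags[TAG_IMAGE_LENGTH][0],
        tile_width=tags[TAG_TILE_WIDTH][0],
        tile_height=tags[TAG_TILE_LENGTH][0],
        tile_offsets=tags[TAG_TILE_OFFSETS],
        tile_byte_counts=tags[TAG_TILE_BYTE_COUNTS],
    )
```

Each pair of `tile_offsets[i]` and `tile_byte_counts[i]` is one byte range, which is the information a virtual Zarr reference needs.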

Hope this helps!

P.S.
I use _get_asset_url early/higher up in Indexer.py, and in Scene.py, to create signed URLs for paid buckets before they reach either parser.py or cog.py.

Thanks for all the interesting comments in this issue! Glad to see you guys have been following my blogs for a while, haha!

@TomNicholas
Member Author

Thank you for the explanation @print-sid8! This looks very promising.

I won't have time to look at this in any more detail for a couple of weeks, but I know @maxrjones is interested 😉
