Use rasteret to read COG? #348
Comments
It's a cool article, I look forward to checking out Rasteret once they open source it. It'll also be worth considering the differences in byte range serialization approaches and whether VirtualiZarr should support some sort of |
It turns out that although they haven't published their blog post yet, you can already see the code of the Rasteret library. However, I'm not sure how useful it is to us in its current form. I found the part that defines byte range requests: https://github.com/terrafloww/rasteret/blob/main/src%2Frasteret%2Ffetch%2Fcog.py But the library does so many steps in one go that I'm struggling to see how we could pull out only the part we need. IIUC they have structured it around the user asking for a specific polygon of a specific data source (e.g. Sentinel2), then immediately turning that into the byte range requests they want and submitting those requests. If there were an intermediate file-level abstraction layer between generating the byte ranges and submitting them, it would be easier for us to relate the COGs to the Zarr model.
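To make the "intermediate file-level abstraction" idea concrete, here is a minimal sketch of what I mean: describe every tile's byte range for a file first, with no I/O, and only later decide which ranges to submit. All names here (`ByteRange`, `plan_tile_ranges`, `to_http_range_header`) are hypothetical and are not part of Rasteret or VirtualiZarr; the tile offsets and byte counts are assumed to have already been read from the COG header.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class ByteRange:
    """One contiguous chunk of bytes inside a remote file (hypothetical type)."""
    url: str
    offset: int
    length: int


def plan_tile_ranges(
    url: str,
    tile_offsets: Sequence[int],
    tile_byte_counts: Sequence[int],
    tiles_across: int,
) -> Dict[Tuple[int, int], ByteRange]:
    """Map each (row, col) tile index to its byte range, without doing any I/O.

    Assumes tiles are listed in row-major order, as in a tiled TIFF.
    """
    manifest = {}
    for i, (off, n) in enumerate(zip(tile_offsets, tile_byte_counts)):
        manifest[divmod(i, tiles_across)] = ByteRange(url, off, n)
    return manifest


def to_http_range_header(r: ByteRange) -> str:
    """Turn a planned range into an HTTP Range value (the end byte is inclusive)."""
    return f"bytes={r.offset}-{r.offset + r.length - 1}"
```

With a manifest like this, the (row, col) keys line up naturally with chunk indices in the Zarr model, and the fetching step becomes a separate concern.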
ICYMI here's the newer blog post on rasteret - https://blog.terrafloww.com/rasteret-a-library-for-faster-and-cheaper-open-satellite-data-access/
Nice! @print-sid8 it looks like we all have thoughts along very similar lines! Do you think it's possible for us to import just the part of Rasteret that would be useful to VirtualiZarr (i.e. a function that accepts a URL to a COG and returns some structure containing all the metadata, byte ranges and offsets for all the variables in that COG)?
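For illustration, the kind of information I have in mind lives in the TIFF image file directory (IFD). Below is a rough sketch, using only the standard library, that pulls the tile offsets and byte counts out of the first IFD of a classic (non-BigTIFF) file whose header bytes are already in memory. The function name `read_first_ifd` is hypothetical and this is not Rasteret's API; it only handles SHORT and LONG fields and ignores overviews, so treat it as an outline rather than a real reader.

```python
import struct
from typing import Dict, List

# Tag ids from the TIFF 6.0 specification
TAGS = {256: "ImageWidth", 257: "ImageLength", 322: "TileWidth",
        323: "TileLength", 324: "TileOffsets", 325: "TileByteCounts"}
# Field type -> (struct code, size in bytes); only SHORT and LONG are handled
TYPES = {3: ("H", 2), 4: ("I", 4)}


def read_first_ifd(buf: bytes) -> Dict[str, List[int]]:
    """Return selected tags from the first IFD of a classic TIFF header."""
    order = {b"II": "<", b"MM": ">"}[buf[:2]]  # little- or big-endian
    magic, ifd_offset = struct.unpack(order + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic 43)")
    (n_entries,) = struct.unpack_from(order + "H", buf, ifd_offset)
    out: Dict[str, List[int]] = {}
    for i in range(n_entries):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack_from(order + "HHI", buf, entry)
        if tag not in TAGS or ftype not in TYPES:
            continue
        code, size = TYPES[ftype]
        if count * size <= 4:
            pos = entry + 8  # values that fit in 4 bytes are stored inline
        else:
            (pos,) = struct.unpack_from(order + "I", buf, entry + 8)
        out[TAGS[tag]] = list(struct.unpack_from(f"{order}{count}{code}", buf, pos))
    return out
```

In practice `buf` would come from a range request for the first few kilobytes of the file, and `TileOffsets` paired with `TileByteCounts` gives exactly the per-chunk offset and length that a virtual reference needs.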
Oh yes we do! I loved learning about Kerchunk a few months ago, and VirtualiZarr too! @norlandrhagen gave a nice talk at the CNG Virtual Conference!
@TomNicholas you got it right. I wrote that part of the library, the 'fetch' module and 'cog.py', to be pretty much user-request oriented. You guys should look at 'parser.py' inside the 'stac' module. Specifically this async method called - The only thing it takes is one COG URL. It uses two other functions/methods, which I kept separate just for the sake of sanity.
With that, parse_cog_headers reads everything it can about a COG file and returns a CogMetadata dataclass. Hope this helps! P.S. Thanks for all the interesting comments in this issue! Glad to see you guys have been following my blogs for a while, haha!
Thank you for the explanation @print-sid8! This looks very promising. I won't have time to look at this in any more detail for a couple of weeks, but I know @maxrjones is interested 😉
Sounds like these people may have just written a COG reader for us
https://blog.terrafloww.com/efficient-cloud-native-raster-data-access-an-alternative-to-rasterio-gdal/