This repository was archived by the owner on Jun 2, 2025. It is now read-only.
Add Visualization Options #231
Closed as not planned
Labels
enhancement, good first issue, help wanted
Description
Detailed Description
We want to be able to easily see what our batches look like and have utilities that plot them to help with debugging and ensuring that our pipelines are doing what we expect.
We have had multiple one-off visualization scripts before. The goal here is to build them into datapipes, keep them up to date, and possibly run them on PRs to give a quick, automatic view of the output whenever a datapipe is changed or updated.
I think the steps would be:
- Make `visualization` module in datapipes
- Add visualizing a whole example of all modalities as an image
- Add visualizing examples as little videos (to see the timeseries in the videos)
- Add option to save out batches in a more interpretable format (e.g. NetCDF or something that keeps coordinates and the like, rather than PyTorch tensors)
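
As a rough illustration of the last step, here is a minimal sketch of wrapping batch arrays in an xarray `Dataset` so that coordinates stay attached to the data. The function name `batch_to_dataset`, the variable names, and the `(time, y, x)` layout are all assumptions made for this example, not the actual batch structure in datapipes.

```python
import numpy as np
import xarray as xr


def batch_to_dataset(satellite, nwp, times, y, x):
    """Wrap raw batch arrays in an xarray Dataset so coordinates travel with the data.

    Hypothetical helper: the argument names and the (time, y, x) layout
    are assumptions for this sketch.
    """
    return xr.Dataset(
        data_vars={
            "satellite": (("time", "y", "x"), satellite),
            "nwp": (("time", "y", "x"), nwp),
        },
        coords={"time": times, "y": y, "x": x},
    )


# Toy example: 3 timesteps on a 4 x 5 grid
times = np.array(
    ["2023-01-01T00:00", "2023-01-01T00:05", "2023-01-01T00:10"],
    dtype="datetime64[ns]",
)
ds = batch_to_dataset(
    satellite=np.zeros((3, 4, 5), dtype=np.float32),
    nwp=np.ones((3, 4, 5), dtype=np.float32),
    times=times,
    y=np.arange(4),
    x=np.arange(5),
)

# Writing to disk needs a NetCDF backend (e.g. netCDF4 or h5netcdf) installed:
# ds.to_netcdf("batch_example.nc")
```

Once the data is in this form, selecting by coordinate (for example `ds.sel(time=...)`) and the built-in xarray plotting helpers could cover much of the debugging use case without extra code.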
Possible Implementation
Satip used to have a workflow step that ran visualization code on the outputs of some processing steps on PRs. It was quite helpful for knowing whether changes broke the end-to-end processing pipelines, and whether the images coming out still looked correct.
Notes
Goal:
- To show what is in the batches right before the model runs
- To show in training what is going in at any timestep
- User can step through periods
- Time and space are aligned
Users:
- ML team only
- Prototype examples
- NWP data wasn’t aligned with GSP data - James found this when plotting these out
- Early on, Jacob found the satellite data was 500 km off
Effort to build:
- Make it so people don’t need to rebuild anything from scratch
- Build something a bit less ad-hoc than before
Effort to run:
- Hopefully takes someone <1 min to run this from Datapipes
- It would be useful for training & production use cases
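
To make the "time and space are aligned" goal concrete, a simple programmatic check could sit next to the plots. The sketch below only compares two time axes. The helper name `misaligned_times` and the toy GSP/NWP timestamps are hypothetical and not taken from datapipes.

```python
from datetime import datetime, timedelta


def misaligned_times(reference, other, tolerance=timedelta(0)):
    """Return (index, reference_time, other_time) for each pair differing by more than `tolerance`."""
    if len(reference) != len(other):
        raise ValueError("time axes have different lengths")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(reference, other))
        if abs(a - b) > tolerance
    ]


start = datetime(2023, 1, 1, 12, 0)
gsp_times = [start + timedelta(minutes=30 * i) for i in range(4)]
# Hypothetical NWP axis that is shifted by one hour from the third step onwards
nwp_times = [t if i < 2 else t + timedelta(hours=1) for i, t in enumerate(gsp_times)]

problems = misaligned_times(gsp_times, nwp_times)
for i, a, b in problems:
    print(f"step {i}: GSP {a:%H:%M} vs NWP {b:%H:%M}")
```

A similar comparison on the spatial coordinates, with a distance tolerance, might have flagged something like the 500 km satellite offset mentioned above before anyone needed to spot it by eye.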