Add plotting functionality #25
@treigerm Have you looked at earthkit-plots before?
    df = rename_error_bounds(df, bound_names)
    normalized_df = normalize(df, bound_normalize="mid")
All the metrics can vary by several orders of magnitude between different variables. If we don't normalize them somehow, the variable with the largest magnitude will simply dominate. So we need some way to bring the metrics of all variables onto the same scale.
Thanks for the explanation - this also needs to be clear in the plot. What I usually like to do is show both the normalised version and the actual values that we normalise against (e.g. in a second side subplot).
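A minimal sketch of such a two-panel layout, assuming matplotlib and made-up values (the variable names and numbers are hypothetical, not benchmark results). The left panel shows the metric relative to a reference, and the right panel shows the raw reference values on a log axis:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Made-up values for illustration only.
variables = ["t2m", "pressure", "humidity"]
relative_mae = [1.5, 1.25, 0.8]      # metric relative to the reference
reference_mae = [0.25, 400.0, 1e-4]  # raw values we normalise against

# Two panels that share the variable axis.
fig, (ax_rel, ax_ref) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)

ax_rel.barh(variables, relative_mae)
ax_rel.axvline(1.0, color="k", linestyle="--")  # parity with the reference
ax_rel.set_xlabel("MAE relative to reference")

ax_ref.barh(variables, reference_mae)
ax_ref.set_xscale("log")  # raw values span several orders of magnitude
ax_ref.set_xlabel("Reference MAE (raw)")

fig.tight_layout()
```

Sharing the y-axis keeps each variable on the same row in both panels, so the reader can see what a relative value of 1.5 means in absolute terms.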
For the aggregate plots I have started to rename the axes from "Normalized {metric}" to "Median {metric} Relative to SZ3". I'll change the naming of the variables as well, because "normalized" is a bit of a misnomer on second thought: it really converts all the metrics into relative measurements.
Yeah, I have started making plots that just show the raw compression ratios and errors for each variable (this is just for a single error bound):
Any other ideas for visualising the results are greatly appreciated! I think we'll always need several plots to interpret the results.
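As a rough illustration of the relative measurements discussed in this thread, the sketch below divides each metric by the value a baseline compressor achieves for the same variable and error bound, then takes the median over variables. The column names, the `to_relative` helper, the second compressor, and the toy numbers are all assumptions for illustration; this is not the PR's `normalize` function.

```python
import pandas as pd


def to_relative(df: pd.DataFrame, metric: str, baseline: str = "sz3") -> pd.DataFrame:
    """Express `metric` relative to the baseline compressor (hypothetical helper)."""
    keys = ["variable", "error_bound"]
    # One baseline value per (variable, error bound) pair.
    base = df.loc[df["compressor"] == baseline, keys + [metric]].rename(
        columns={metric: "baseline_value"}
    )
    out = df.merge(base, on=keys, how="left")
    out[f"relative_{metric}"] = out[metric] / out["baseline_value"]
    return out


# Toy data (made up): MAE differs by orders of magnitude between variables.
df = pd.DataFrame(
    {
        "variable": ["t2m", "t2m", "pressure", "pressure"],
        "compressor": ["sz3", "zfp", "sz3", "zfp"],
        "error_bound": ["mid"] * 4,
        "mae": [0.25, 0.375, 400.0, 500.0],
    }
)

rel = to_relative(df, "mae")
# After the division every variable is on the same scale, so the median
# over variables is no longer dominated by the largest-magnitude one.
median_rel = rel.groupby("compressor")["relative_mae"].median()
print(median_rel)
```

By construction the baseline itself always comes out at 1, which is why an axis label such as "Median {metric} Relative to SZ3" describes the quantity more accurately than "Normalized {metric}".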
    for metric in ["Normalized_MAE", "Normalized_DSSIM", "Normalized_MaxAbsError"]:
        plot_aggregated_rd_curve(
            normalized_df,
            plots_path / f"rd_curve_{metric.lower()}.png",
Can we save plots as vector graphics in general? If the plots contain maps, we can still rasterize just the map content, but especially for text and lines, storing to e.g. PDF is simply better for both appearance and file size.
I kept the plots of the maps as PNG for now but converted the rest to PDF. The maps for the individual variables are only meant as a brief visualization anyway to get an idea of how the compressors behave (rather than putting them into presentations/papers).
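For the map plots, one possible middle ground is to rasterize only the map artist and keep everything else as vectors. The sketch below assumes matplotlib and uses a synthetic field in place of real data; the output file name and the temporary `plots_path` directory are placeholders:

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic field standing in for a global map (made-up data).
lon = np.linspace(-180, 180, 361)
lat = np.linspace(-90, 90, 181)
field = np.cos(np.deg2rad(lat))[:, None] * np.sin(np.deg2rad(lon))[None, :]

fig, ax = plt.subplots(figsize=(6, 3))
# rasterized=True embeds only this artist as a bitmap in vector outputs.
mesh = ax.pcolormesh(lon, lat, field, shading="auto", rasterized=True)
fig.colorbar(mesh, ax=ax, label="field value")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")

# dpi sets the resolution of the rasterized artist; text and axes stay vector.
plots_path = Path(tempfile.mkdtemp())  # placeholder output directory
out_file = plots_path / "map_example.pdf"
fig.savefig(out_file, dpi=200)
```

This keeps labels and tick marks sharp at any zoom level while avoiding a PDF that stores one vector patch per grid cell.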
@juntyr thanks for the review! I have addressed your comments. I hadn't seen earthkit-plots before, it looks great! I think for any publications we will need more polished versions of the "input - output - error" plots, so we will want to adjust those plots at some point. Especially for the high-resolution fields, it will also make sense to zoom into some local regions (rather than showing only the whole global view).
I think now would be a good time to start maintaining a current state of results for e.g. the tiny datasets. We should create a top-to-bottom script or notebook that handles all tiny dataset loading, compression, metrics, and plotting. For each change to benchmark behaviour, we should rerun the notebook, keeping existing outputs where possible and regenerating the ones that were affected. This notebook could live in a new repo that we never make public but that always shows us where we're at and how the outputs evolve.
Later we could then automate this process, but for now the semi-manual notebook seems like a good approach.
Thanks @juntyr, I made the final adjustments. I agree with the push to have a script/notebook that allows us to reproduce the current results. I think it would be good to have a way to specify a processing pipeline. Right now, the whole process from downloading the data to making the plots would probably take a couple of hours. We have already broken the process up into separate scripts, but when running the pipeline it would be good to have mechanisms in place so that we only re-run the parts that are affected by a change.
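One lightweight way to get this behaviour, sketched here with only the standard library, is to store a fingerprint of each stage's inputs and parameters and skip the stage when the fingerprint is unchanged. The `fingerprint` and `run_stage` helpers, the stage name, and the file names are hypothetical and not part of the existing scripts; existing workflow tools such as GNU Make or Snakemake cover the same idea more thoroughly.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(inputs: list[Path], params: dict) -> str:
    """Hash the parameters and input file contents of a stage."""
    digest = hashlib.sha256(json.dumps(params, sort_keys=True).encode())
    for path in sorted(inputs):
        digest.update(path.read_bytes())
    return digest.hexdigest()


def run_stage(name, func, inputs, output: Path, params: dict, cache_dir: Path) -> bool:
    """Run `func` only if inputs or params changed; return True if it ran."""
    stamp = cache_dir / f"{name}.sha256"
    current = fingerprint(inputs, params)
    if output.exists() and stamp.exists() and stamp.read_text() == current:
        return False  # up to date, skip the stage
    func(inputs, output, params)
    stamp.write_text(current)
    return True


def compute_metrics(inputs, output, params):
    """Toy stage: sum the numbers in the input file and scale the result."""
    values = [float(v) for v in inputs[0].read_text().split()]
    output.write_text(str(sum(values) * params["scale"]))


# Toy usage: a "metrics" stage that depends on one input file.
work = Path(tempfile.mkdtemp())
raw = work / "compressed.txt"
raw.write_text("1 2 3")

args = ("metrics", compute_metrics, [raw], work / "metrics.txt", {"scale": 2.0}, work)
first = run_stage(*args)   # runs: no output or stamp exists yet
second = run_stage(*args)  # skipped: nothing changed
raw.write_text("4 5 6")
third = run_stage(*args)   # runs again: the input file changed
```

Hashing file contents rather than modification times makes the check robust to re-downloads that produce identical data, at the cost of reading every input on each run.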
juntyr left a comment
LGTM, feel free to squash and merge :)

This adds the code to create the plots I had sent in Slack. I have adapted it based on how the other scripts were adapted in #23. I haven't tried to run it in the online lab; it might need further refactoring to easily create the plots there.