Add plotting functionality #25

Merged
treigerm merged 5 commits into main from plot_metrics
Apr 22, 2025

@treigerm
Member

This adds the code to create the plots I had sent in Slack. I have adapted it based on how the other scripts were adapted in #23. I haven't tried to run it in the online lab; it might need further refactoring to easily create the plots there.

@treigerm treigerm requested a review from juntyr April 16, 2025 08:57
Comment thread pyproject.toml Outdated
@juntyr
Collaborator

juntyr commented Apr 16, 2025

@treigerm Have you looked at earthkit-plots before?

Comment thread src/climatebenchpress/compressor/plotting/plot_metrics.py Outdated
)

df = rename_error_bounds(df, bound_names)
normalized_df = normalize(df, bound_normalize="mid")

Collaborator

Why the normalisation?

Member Author

All the metrics can vary by several orders of magnitude between different variables. If we don't normalize them somehow, the variable with the largest magnitude will simply dominate. So we need some way to bring the metrics of all variables onto the same scale.
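To make the scale problem concrete, here is a minimal sketch of converting each variable's metric into a ratio against a reference compressor and then taking the median across variables. It assumes SZ3 as the reference (matching the "Median {metric} Relative to SZ3" axis label discussed in this thread); the compressor names, the metric values, and the helpers `relative_to_reference` and `median_across_variables` are hypothetical and do not reproduce the repository's `normalize(df, bound_normalize="mid")`.

```python
from statistics import median

# Made-up per-variable MAE values for two compressors (illustration only).
# Note how the raw values differ by many orders of magnitude between variables.
mae = {
    "sz3": {"temperature": 0.5, "precipitation": 2e-6, "pressure": 40.0},
    "zfp": {"temperature": 0.25, "precipitation": 4e-6, "pressure": 20.0},
}


def relative_to_reference(metrics, reference="sz3"):
    """Divide every compressor's metric by the reference compressor's metric
    for the same variable, giving dimensionless ratios (1.0 = same as reference)."""
    ref = metrics[reference]
    return {
        compressor: {var: value / ref[var] for var, value in per_var.items()}
        for compressor, per_var in metrics.items()
    }


def median_across_variables(relative):
    """Aggregate each compressor's relative metric over all variables."""
    return {c: median(per_var.values()) for c, per_var in relative.items()}


rel = relative_to_reference(mae)
print(median_across_variables(rel))  # {'sz3': 1.0, 'zfp': 0.5}
```

Without the division, the pressure MAE (40) would outweigh the precipitation MAE (2e-6) by seven orders of magnitude; after it, every variable contributes a ratio on the same scale, and the median keeps one extreme variable from dominating the aggregate.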

Collaborator

Thanks for the explanation; this also needs to be clear in the plot. What I usually like to do is show both the normalised version and the actual values we normalise against (e.g. in a second subplot alongside it).
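One way to follow this suggestion, sketched with matplotlib under stated assumptions: the left panel shows the relative values and the right panel shows the raw reference values they were divided by, on a log axis because those span orders of magnitude. The variable names, the numbers, and the output file name `relative_and_reference.pdf` are made up for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# Made-up values for illustration: per-variable MAE of one compressor and of
# the reference compressor its values are divided by.
variables = ["temperature", "precipitation", "pressure"]
compressor_mae = [0.25, 4e-6, 20.0]
reference_mae = [0.5, 2e-6, 40.0]
relative = [c / r for c, r in zip(compressor_mae, reference_mae)]

fig, (ax_rel, ax_ref) = plt.subplots(1, 2, figsize=(9, 3.5))

# Left panel: relative values on one shared, dimensionless scale.
ax_rel.bar(variables, relative)
ax_rel.axhline(1.0, color="grey", linestyle="--", linewidth=1)
ax_rel.set_ylabel("MAE relative to reference")

# Right panel: the raw reference values, on a log axis because they span
# several orders of magnitude (each bar is in its own variable's units).
ax_ref.bar(variables, reference_mae)
ax_ref.set_yscale("log")
ax_ref.set_ylabel("Reference MAE (variable units)")

for ax in (ax_rel, ax_ref):
    ax.tick_params(axis="x", labelrotation=30)

fig.tight_layout()
fig.savefig("relative_and_reference.pdf")
```

Keeping the reference values visible next to the ratios lets a reader check whether a large ratio comes from a genuinely worse compressor or from a reference value that is itself tiny.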

Member Author

For the aggregate plots I have started to rename the axes from "Normalized {metric}" to "Median {metric} Relative to SZ3". I'll change the variable names the same way, because on second thought "normalized" is a bit of a misnomer: what the step really does is convert all the metrics into relative measurements.

Yeah, I have started making plots that just show the raw compression ratios and errors for each variable (this is for a single error bound only):

[Image: raw compression ratios and errors for each variable, single error bound]

Member Author

Any other ideas to visualise the results are greatly appreciated! I think we'll always need several plots to interpret the results.

for metric in ["Normalized_MAE", "Normalized_DSSIM", "Normalized_MaxAbsError"]:
    plot_aggregated_rd_curve(
        normalized_df,
        plots_path / f"rd_curve_{metric.lower()}.png",

Collaborator

Can we save plots as vector graphics in general? If a plot contains maps, we can still rasterize just the map content, but for all text and lines, storing to e.g. PDF is better for both appearance and file size.
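A minimal matplotlib sketch of this idea, assuming a plain `pcolormesh` map (real map plots may go through cartopy or earthkit-plots instead): passing `rasterized=True` to the map artist stores only that mesh as a bitmap inside the PDF, while the axes, labels and title stay as vector graphics, and the `dpi` argument of `savefig` sets the resolution of the rasterized part. The field is synthetic and the file name `map_example.pdf` is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic global field standing in for a map (illustration only).
lon = np.linspace(-180, 180, 361)
lat = np.linspace(-90, 90, 181)
lon2d, lat2d = np.meshgrid(lon, lat)
field = np.cos(np.deg2rad(lat2d)) * np.sin(np.deg2rad(2 * lon2d))

fig, ax = plt.subplots(figsize=(6, 3))
# rasterized=True embeds only this mesh as a bitmap in the PDF; the axes,
# tick labels and title remain vector graphics.
mesh = ax.pcolormesh(lon, lat, field, shading="auto", rasterized=True)
fig.colorbar(mesh, ax=ax, label="value")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title("Rasterized map content inside a vector PDF")

# For vector outputs, dpi only controls the resolution of rasterized artists.
fig.savefig("map_example.pdf", dpi=200)
```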

Member Author

I kept the map plots as PNG for now but converted the rest to PDF. The maps for the individual variables are only meant as a quick visualization to get an idea of how the compressors behave (rather than for use in presentations/papers).

Comment thread src/climatebenchpress/compressor/plotting/variable_plotters.py Outdated
@treigerm
Member Author

@juntyr, thanks for the review! I have addressed your comments. I hadn't seen earthkit-plots before; it looks great! For any publications we will need more polished versions of the "input - output - error" plots, so we will want to adjust those plots at some point. Especially for the high-resolution fields, it will make sense to also zoom into some local regions (rather than only showing the whole global view).
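For the zoomed-in views, a small helper can cut the same bounding box out of the input, the output and the error field so that the three panels stay aligned. This is a sketch assuming a regular latitude/longitude grid held in plain NumPy arrays with longitudes in [-180, 180); `crop_region` is a hypothetical helper, the "decompressed" field is a stand-in, and the box is an arbitrary example. Boxes that cross the antimeridian are not handled.

```python
import numpy as np


def crop_region(field, lat, lon, lat_bounds, lon_bounds):
    """Cut a lat/lon bounding box out of a 2-D field on a regular grid.

    Assumes `field` has shape (len(lat), len(lon)) and that the longitude
    box does not cross the antimeridian (longitudes in [-180, 180)).
    Returns the sub-field and its latitude/longitude coordinates.
    """
    lat_mask = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    lon_mask = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    # np.ix_ turns the two 1-D masks into an index for the 2-D sub-block.
    return field[np.ix_(lat_mask, lon_mask)], lat[lat_mask], lon[lon_mask]


# Synthetic 1-degree global grid (illustration only).
lat = np.arange(-90.0, 91.0, 1.0)
lon = np.arange(-180.0, 180.0, 1.0)
original = np.outer(np.sin(np.deg2rad(lat)), np.cos(np.deg2rad(lon)))
decompressed = original + 0.01  # stand-in for a compressor's output
error = decompressed - original

# Apply the same box to input, output and error so the panels line up.
box = {"lat_bounds": (43.0, 49.0), "lon_bounds": (5.0, 17.0)}
panels = [crop_region(f, lat, lon, **box)[0] for f in (original, decompressed, error)]
print([p.shape for p in panels])  # [(7, 13), (7, 13), (7, 13)]
```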

Collaborator

@juntyr juntyr left a comment

With just one more nit, this should be good to go! Thanks @treigerm!

Comment thread src/climatebenchpress/compressor/plotting/variable_plotters.py
@juntyr
Collaborator

juntyr commented Apr 17, 2025

I think now would be a good time to start maintaining a current state of results for, e.g., the tiny datasets. We should create a top-to-bottom script or notebook that handles all tiny-dataset loading, compression, metrics, and plotting. For each change to benchmark behaviour, we should rerun the notebook, keeping existing outputs where possible and regenerating the ones that were affected. This notebook could live in a new repo that we never make public but that always shows us where we're at and how the outputs evolve.

@juntyr
Collaborator

juntyr commented Apr 17, 2025

Later we could automate this process, but for now the semi-manual notebook seems like a good approach.

@treigerm
Member Author

Thanks @juntyr, made the final adjustments.

I agree with the push to have a script/notebook which allows us to reproduce the current results.

I think it would be good to have a way to specify a processing pipeline. Right now, the whole process from downloading the data to making the plots would probably take a couple of hours. We have already broken it up into separate scripts, but when running the pipeline it would be good to have mechanisms in place so that we only re-run the parts of the pipeline affected by a change, etc.
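The "only re-run what changed" mechanism can be sketched with content fingerprints: each step's key hashes its own configuration together with the keys of the steps it depends on, so changing one step invalidates exactly that step and everything downstream of it. This is a minimal in-memory sketch; the step names, configs and functions are hypothetical stand-ins for the real scripts, and a real pipeline would persist the cache to disk and also account for code changes (workflow tools such as Make or Snakemake provide this kind of incremental re-execution).

```python
import hashlib
import json


def step_key(name, config, upstream_keys):
    """Fingerprint a step from its name, its config and its inputs' fingerprints."""
    payload = json.dumps(
        {"name": name, "config": config, "upstream": upstream_keys}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(steps, cache):
    """Run (name, config, depends_on, func) steps in order, reusing cached results.

    Changing one step's config changes its key and, transitively, the keys of
    every step that depends on it, so exactly those steps are re-run.
    """
    keys, outputs, ran = {}, {}, []
    for name, config, depends_on, func in steps:
        key = step_key(name, config, [keys[d] for d in depends_on])
        keys[name] = key
        if key not in cache:
            cache[key] = func(config, *[outputs[d] for d in depends_on])
            ran.append(name)
        outputs[name] = cache[key]
    return outputs, ran


def make_steps(error_bound):
    # Hypothetical stand-ins for the download, compression and metrics scripts.
    return [
        ("download", {"dataset": "tiny"}, [], lambda cfg: list(range(10))),
        ("compress", {"error_bound": error_bound}, ["download"],
         lambda cfg, data: [round(x / cfg["error_bound"]) * cfg["error_bound"] for x in data]),
        ("metrics", {}, ["download", "compress"],
         lambda cfg, data, rec: max(abs(a - b) for a, b in zip(data, rec))),
    ]


cache = {}
print(run_pipeline(make_steps(0.5), cache)[1])  # ['download', 'compress', 'metrics']
print(run_pipeline(make_steps(0.5), cache)[1])  # []
print(run_pipeline(make_steps(3.0), cache)[1])  # ['compress', 'metrics']
```

Changing only the error bound leaves the download step cached while compression and metrics re-run, which is the behaviour described above for the tiny-dataset notebook.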

Collaborator

@juntyr juntyr left a comment

LGTM, feel free to squash and merge :)

@treigerm treigerm merged commit 5552506 into main Apr 22, 2025
3 checks passed
@treigerm treigerm deleted the plot_metrics branch April 22, 2025 07:08
2 participants