Skip to content

Add notebook that explains neural-lam pytorch tensors graph disk format#52

Open
leifdenby wants to merge 14 commits into
mllam:mainfrom
leifdenby:doc/neural-lam-graphs
Open

Add notebook that explains neural-lam pytorch tensors graph disk format#52
leifdenby wants to merge 14 commits into
mllam:mainfrom
leifdenby:doc/neural-lam-graphs

Conversation

@leifdenby
Copy link
Copy Markdown
Member

@leifdenby leifdenby commented Oct 28, 2025

Describe your changes

Add notebook that describes the current neural-lam pickled pytorch tensors graph disk format. I feel like this documentation is currently missing, and since we write to this format in weather-model-graphs I think this would be a place where we could document it. This documentation could maybe also exist within neural-lam itself. But I feel like having this notebook could be a starting point at least.

NOTE: The notebook also shows that there is currently a bug in the index ordering in adjacency lists stored into pickle pytorch Tensors.

No change of dependencies needed.

Issue Link

n/a

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@leifdenby leifdenby changed the title Doc/neural lam graphs Add notebook that explains neural-lam pytorch tensors graph disk format Oct 28, 2025
@leifdenby
Copy link
Copy Markdown
Member Author

leifdenby commented Oct 28, 2025

I've cleared the notebook output, but here's a rendered file from the most recent commit leifdenby@daa5f02:
saving_graphs_for_neural_lam.html

Creating these plots @joeloskarsson (see end of notebook) I think what is currently implemented when saving to neural-lam format actually has a bug, the one which you're fixing in https://github.com/mllam/weather-model-graphs/pull/34/files#diff-bdf567174bcb3d8da552f28644f860a91c5019a07b6238b7d85e35bd346c9edbL118. What do you think?

@leifdenby
Copy link
Copy Markdown
Member Author

Adding link to comment @joeloskarsson made which I think is important here:

assumptions wrt node index order and subgraph order (in terms of m2g, g2m and m2m, and its levels) that neural-lam assumes?

@joeloskarsson
Copy link
Copy Markdown
Contributor

This should be very useful indeed! Looking at the output there seems to a bug yes. It could very well be the label sorting that's missing, that you linked to as fixed in #34. Have you tried rebasing this on top of that PR and see if you have the same issue?

Comment thread docs/saving_graphs_for_neural_lam.ipynb Outdated
"cell_type": "markdown",
"metadata": {},
"source": [
"`neural-lam` expects the _subgraphs_ that are individually used for subsequent message passing operations to be saved separately. For the Keisler graph, this means that we need to split the graph into subgraphs that do the mesh-to-grid (`m2g`), mesh-to-mesh (`m2m`), grid-to-mesh (`g2m`), and grid-to-grid (`g2g`) operations. In `weather-model-graphs` edges have a `component` attribute that indicates which of these operations they belong to, so we can use this to split the graph."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grid-to-grid? What would that be? I think this is mentioned at other places in the notebook as well, but maybe just a typo.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that was a typo 😳 thanks for catching this!

Co-authored-by: Joel Oskarsson <joel.oskarsson@outlook.com>
@leifdenby
Copy link
Copy Markdown
Member Author

Have you tried rebasing this on top of that PR and see if you have the same issue?

No, I will try that and see :) thanks

@leifdenby leifdenby added this to the v0.4.0 (proposed) milestone Oct 29, 2025
@leifdenby
Copy link
Copy Markdown
Member Author

leifdenby commented Oct 29, 2025

Now that #34 is in main (with the sorting that you implemented @joeloskarsson) I merged main into my branch for this work.

The edges in the loaded files (neural-lam pickled pytorch tensor files) was previously:
Screenshot 2025-10-29 at 19 49 54

Now it looks like this:
billede

And here the edges from the networkx.DiGraph as defined by the node labels:
billede

So I think there is still something going wrong since the mesh indexes for the m2m subgraph edges start at zero (of course this might not matter currently in neural-lam since we always offset the indexes so that the first index is 0, and then assume that the index values are consecutive - that's right, right?)

I think a bit more digging is needed here...

I've included the cell outputs in the notebook in the most recent commit (a7a8e9e).

@joeloskarsson
Copy link
Copy Markdown
Contributor

Ok, I think that maybe we viewed different things as wrong. Or maybe I just did not pay close enough attention. I was mostly worried that not all mesh nodes seemed to be part of an edge in g2m. But maybe that was just how you set up the graph?

If I understand you correctly, you view the problem as m2m being saved with the first mesh node having index 0 in edge_index? But I don't think this is something you can get away from, if converting from networkx to pyg using pyg.utils. There are no grid nodes in the m2m graph, so naturally the first mesh node will have index 0. I suppose you could offset all mesh node indices with num_grid_nodes in the tensor, but that seems like a bit of a hack. But why is this a problem? As you wrote, neural-lam anyhow needs the m2m node indices to start with 0, and if they don't we now reindex them there.

@joeloskarsson joeloskarsson modified the milestones: v0.4.0 (proposed), v0.4.0 Nov 10, 2025
@leifdenby
Copy link
Copy Markdown
Member Author

leifdenby commented Mar 8, 2026

There are no grid nodes in the m2m graph, so naturally the first mesh node will have index 0. I suppose you could offset all mesh node indices with num_grid_nodes in the tensor, but that seems like a bit of a hack. But why is this a problem? As you wrote, neural-lam anyhow needs the m2m node indices to start with 0, and if they don't we now reindex them there.

I had expected that the node-indices in the tensors saved to disk would be globally unique integers. Rereading what you've written here makes me realise we should document as part of the spec in mllam/neural-lam#323 the node-index values in the adjacency-list tensors and how they relate to the indices of the feature tensors (in effect what we are discussing here: mllam/neural-lam#323 (comment))

I think it would make most sense if the way we reference nodes in the edge_index values was if each nodeset (e.g. grid or mesh nodes) would start with the first node being at index value 0 and the last at N_grid-1 (for grid nodes) or N_mesh-1 (for mesh nodes). That isn't what is happening in the image I posted above (middle one) where I have loaded the tensor files produced by wmg. In that image for example the g2m edges the node-index for the mesh (g2m dst) nodes starts at 12 rather than 0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants