Add diagnostic to check that all grid nodes connect to mesh in g2m graph #59

Merged
leifdenby merged 18 commits into mllam:main from AdMub:fix/issue-42-node-assertion
Mar 24, 2026

Conversation

@AdMub
Contributor

@AdMub AdMub commented Feb 20, 2026

Describe your changes

This PR adds a safety assertion in create_all_graph_components after the creation of the Grid-to-Mesh (g2m) graph.

Previously, if g2m_connectivity_kwargs contained a connection radius that was too small, or if the mesh was too sparse, grid nodes could be left entirely disconnected from the mesh. This would fail silently, leading to unconnected grid nodes being passed into the model.

The new logic explicitly checks the degree of every grid node in the G_g2m graph. If any grid nodes have a degree of 0, it raises a descriptive ValueError explaining that the grid nodes failed to connect to the mesh, helping the user debug their radius/resolution parameters.

Testing performed:
Verified locally by generating a 10x10 grid and passing an intentionally small max_dist of 0.1 for g2m_connectivity.

Output:

ValueError: 100 grid node(s) are not connected to any mesh nodes in the g2m graph. This usually happens if the connection radius is too small or the mesh resolution is too sparse.
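
For readers who want to reproduce the failure, the degree check and the local test described above can be sketched roughly as follows. This is a dependency-free illustration, not the code in this PR: a plain edge list stands in for the G_g2m graph, the helper names `connect_within_radius` and `check_grid_nodes_connected` are hypothetical, and the mesh layout is an assumption made only so the example runs.

```python
import math


def connect_within_radius(grid_xy, mesh_xy, max_dist):
    """Return (grid_index, mesh_index) edges for every pair within max_dist."""
    return [
        (gi, mi)
        for gi, g in enumerate(grid_xy)
        for mi, m in enumerate(mesh_xy)
        if math.dist(g, m) <= max_dist
    ]


def check_grid_nodes_connected(n_grid, g2m_edges):
    """Raise ValueError if any grid node has degree 0 in the g2m edge list."""
    degree = [0] * n_grid
    for gi, _ in g2m_edges:
        degree[gi] += 1
    n_isolated = sum(1 for d in degree if d == 0)
    if n_isolated > 0:
        raise ValueError(
            f"{n_isolated} grid node(s) are not connected to any mesh nodes "
            "in the g2m graph. This usually happens if the connection radius "
            "is too small or the mesh resolution is too sparse."
        )


# 10x10 grid on integer coordinates (as in the local test above)
grid_xy = [(float(x), float(y)) for x in range(10) for y in range(10)]
# Assumed mesh layout: a coarse 4x4 mesh offset by 0.5 from the grid
mesh_xy = [(x + 0.5, y + 0.5) for x in range(0, 10, 3) for y in range(0, 10, 3)]

try:
    edges = connect_within_radius(grid_xy, mesh_xy, max_dist=0.1)
    check_grid_nodes_connected(len(grid_xy), edges)
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

With this assumed layout the closest mesh node is about 0.71 away from any grid node, so a `max_dist` of 0.1 leaves every grid node isolated, while a radius of 3.0 connects all of them.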

Issue Link

Closes #42

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section reflecting type of change (add section where missing)

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)

@joeloskarsson
Contributor

Please have a look at what you are actually trying to merge here. It is much more appreciated (from my side at least) that you work carefully on fewer PRs than that you do many things a bit sloppily, which just creates more reviewing work.

@AdMub AdMub force-pushed the fix/issue-42-node-assertion branch from 68e1c18 to d75c272 Compare February 27, 2026 16:56
@AdMub
Contributor Author

AdMub commented Feb 27, 2026

Hi @joeloskarsson, I sincerely apologize for the messy commit history. You are completely right. I mistakenly used git add . on my local machine and accidentally pushed my local visualization/scratchpad scripts into this PR alongside the actual assertion fix, severely bloating the file diff.

I completely understand how that creates unnecessary review overhead. I have just force-pushed a clean history to this branch that strips out the unrelated files. It now contains a single, focused commit with only the g2m assertion logic in base.py. I will be much more careful with my staging moving forward!

Member

@leifdenby leifdenby left a comment

I've made some suggestions for how to improve this :) Thanks for working on it. Much easier to review now

Comment thread src/weather_model_graphs/create/base.py Outdated
Comment thread convex_hull_verification.png Outdated
Comment thread test_issue_40.py Outdated
Comment thread test_issue_44.py Outdated
@joeloskarsson
Contributor

Something I have found incredibly useful in these situations is to actually plot the disconnected nodes (or rather plot all nodes, with the disconnected ones marked) when this happens. I think that is something we should really add as well, at least optionally.

@Diya910

Diya910 commented Mar 1, 2026

@joeloskarsson @leifdenby,

I read your comment on my PR about m2g, that isolated mesh nodes in m2g can actually be desirable in LAM models where the mesh covers an interior+boundary area. That's a really important distinction across the graph types (g2m, m2m, m2g) and something I hadn't fully considered when building my warning logic.

With that in mind, I'd love to contribute. Based on everything discussed across both PRs, I think the following would be valuable additions:

  1. Visualization of disconnected nodes: As you suggested, optionally plotting all nodes with isolated ones highlighted, which would make debugging radius/resolution parameters much faster than just a count or warning message.

  2. Graph-type-aware behavior: g2m isolated nodes should remain a hard error (as @AdMub has implemented), but for m2g a warning would be more appropriate since isolated mesh nodes there can be valid by design. m2m could follow similar logic.

That said, I'm not sure raising a hard error is the right approach for any graph type until we also provide a way to actually fix the problem. For example, by automatically increasing the connection radius, falling back to a nearest-neighbor connection for stranded nodes, or at minimum clearly guiding the user on which parameter to adjust. Throwing an error without a clear recovery path can be frustrating. What do you think?
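
To make one of the recovery options above concrete, a nearest-neighbor fallback for stranded grid nodes might look roughly like this. It is only a plain-Python sketch: `connect_with_fallback` is a hypothetical name, coordinates are simple tuples, and nothing like this exists in weather_model_graphs as far as this thread shows.

```python
import math


def connect_with_fallback(grid_xy, mesh_xy, max_dist):
    """Radius-based g2m edges, plus one nearest-mesh edge per stranded grid node.

    Returns the edge list and the indices of grid nodes that needed the fallback.
    Assumes mesh_xy is non-empty.
    """
    edges, fallback_nodes = [], []
    for gi, g in enumerate(grid_xy):
        dists = [math.dist(g, m) for m in mesh_xy]
        within = [mi for mi, d in enumerate(dists) if d <= max_dist]
        if not within:
            # Stranded node: connect it to the single closest mesh node instead
            within = [min(range(len(mesh_xy)), key=dists.__getitem__)]
            fallback_nodes.append(gi)
        edges.extend((gi, mi) for mi in within)
    return edges, fallback_nodes


grid_xy = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
mesh_xy = [(0.5, 0.0), (4.0, 4.0)]
edges, fallback_nodes = connect_with_fallback(grid_xy, mesh_xy, max_dist=0.6)
print(edges)
print(fallback_nodes)
```

Returning the list of nodes that used the fallback keeps the behavior visible, so a caller could still warn about them rather than have the radius silently overridden.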

I'd be happy to implement either or both of these. Should I collaborate with @AdMub or make a new PR?

@AdMub
Contributor Author

AdMub commented Mar 1, 2026

Hi @Diya910, thanks for jumping in and reviewing! I completely agree that adding a plotting feature (as @joeloskarsson suggested) and graph-type-aware warnings would be massive quality-of-life improvements for users.

However, following the maintainers' preference for smaller, carefully scoped PRs, I think we should keep this current PR strictly focused on fixing the immediate silent-failure bug for g2m graphs.

Once we get this foundational safety check merged, I would absolutely love to collaborate with you on a follow-up PR to introduce the visualization features and discuss fallback behaviors!

@leifdenby - Thanks for the architectural feedback! I am working right now on extracting this logic into a dedicated connectivity_checks.py module, and I will convert those stray visual scripts into proper Pytest CI tests. I'll ping you once the updated branch is pushed!

@AdMub
Contributor Author

AdMub commented Mar 5, 2026

Hi @leifdenby, thanks for the architectural direction! I have extracted the safety assertion into a dedicated src/weather_model_graphs/create/connectivity_checks.py module as requested. I also cleaned out the old scratchpad files and replaced them with a proper Pytest suite in tests/test_connectivity_checks.py.

The code is now formatted, modularized, and ready for another look! I would love to collaborate with @joeloskarsson and @Diya910 on a follow-up PR to add the plotting features once this foundational check is merged.

Member

@leifdenby leifdenby left a comment

I think you might have rushed this; the refactor still includes some checks in the create_all_graph_components call

Comment thread src/weather_model_graphs/create/base.py Outdated
@AdMub
Contributor Author

AdMub commented Mar 7, 2026

Hi @leifdenby, you are completely right, I definitely rushed that last push and failed to fully clean up base.py after moving the logic into the new module. I apologize for the extra review overhead that caused.

I have just pushed an update that entirely removes the duplicate logic block from base.py. Now, base.py simply calls the isolated function inside connectivity_checks.py, keeping the core generation pipeline clean. I've thoroughly checked the diff this time to ensure no artifacts were left behind.

Member

@leifdenby leifdenby left a comment

Getting there :) you have still left files not related to fixing the issue. Also please add a changelog entry

Comment thread src/weather_model_graphs/create/base.py Outdated
graph_components["g2m"] = G_g2m

# Run safety assertion to catch isolated grid nodes
check_g2m_connectivity(G_g2m)
Member

Maybe you could rename the function as check_for_unconnected_grid_nodes or something like that? Then we don't need the comment either :)

Comment thread test_issue_40.py Outdated
Member

did you mean to leave this file in the PR? looks like experimentation to me

Comment thread test_issue_44.py Outdated
Member

did you mean to leave this file in the PR? looks like experimentation to me

Comment thread visualize_fix_40.py Outdated
Member

did you mean to leave this file in the PR? looks like experimentation to me

Comment thread convex_hull_verification.png Outdated
Member

This shouldn't be in the PR either

@leifdenby leifdenby self-assigned this Mar 8, 2026
@leifdenby leifdenby requested a review from joeloskarsson March 8, 2026 10:09
@leifdenby
Member

Once @AdMub has addressed the changes I requested, it would be helpful to have a review from you @joeloskarsson, just to check that you also think the issue you raised has been resolved :) thank you

@leifdenby leifdenby added this to the v0.4.0 milestone Mar 8, 2026
@ArnabTechiee

Hi @AdMub, thanks for pushing this forward! Building on @leifdenby's review, I wanted to flag a quick architectural edge-case that will likely cause some CI test failures.

While the new ValueError for zero-degree grid nodes is exactly what we want for production LAMs, it will break several fixtures in tests/test_graph_creation.py (like test_create_lat_lon). These tests use create_fake_irregular_coords(), which intentionally generates spread-out coordinates that will fail the mesh radius check.

If you look at @joeloskarsson's original prototype branch for Issue #42, he solved this by threading an allow_zero_degree=False flag through create_all_graph_components(). We'll likely need to add that flag to the new module so those specific tests can bypass the assertion and keep the CI green.

@AdMub
Contributor Author

AdMub commented Mar 9, 2026

Hi @leifdenby and @ArnabTechiee,

Thank you both for the guidance! I have completely purged those accidental scratchpad files from the branch history, renamed the function to check_for_unconnected_grid_nodes as requested, and added the CHANGELOG.md entry.

@ArnabTechiee - Fantastic catch regarding the CI test fixtures! I have implemented an allow_unconnected_grid_nodes=False bypass flag in create_all_graph_components to ensure the existing sparse-data tests can run without throwing the new ValueError.

The PR should now be totally clean and ready for review!

@AdMub AdMub requested a review from leifdenby March 9, 2026 06:06
@AdMub
Contributor Author

AdMub commented Mar 15, 2026

Hi @leifdenby and @joeloskarsson, just a quick ping! I noticed a merge conflict had popped up in CHANGELOG.md due to recent merges, so I have just resolved it and updated the branch.

The PR is fully clean, the scratchpad files are gone, the module is extracted, and the allow_unconnected_grid_nodes flag is in place for the CI tests. Let me know if there is anything else you need before this is good to go!

Member

@leifdenby leifdenby left a comment

I've made a suggestion to the API - what do you think?

Comment thread src/weather_model_graphs/create/base.py Outdated
graph_crs: pyproj.crs.CRS | None = None,
decode_mask: Iterable[bool] | None = None,
return_components: bool = False,
allow_unconnected_grid_nodes: bool = False,
Member

This is a design choice of course, but I would actually prefer that we don't introduce any guarantees/checks on the graph connectivity into create_all_graph_components since that function already has many (possibly too many) arguments. Instead I think we should have a submodule that handles checking for these "graph health" issues.

So I would instead make this a two-step process:

  1. Call create_all_graph_components, which returns a networkx.DiGraph for the entire graph.
  2. Call check_graph_consistency(graph, allow_unconnected_grid_nodes=False), which will raise an exception or return False if the graph appears to have issues (although in the latter case the function should be called graph_has_consistency_errors(...) or something like that).
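
As a rough illustration of the two-step idea and of the naming question raised above, the raising and boolean variants could sit side by side like this. It is a sketch under assumptions: a plain dict with "nodes" and "edges" keys stands in for the networkx.DiGraph, the helper `find_unconnected_grid_nodes` is hypothetical, and the boolean variant is written to return True when problems are found so that its name reads naturally.

```python
def find_unconnected_grid_nodes(graph):
    """Return grid nodes (type == "grid") with degree 0, i.e. touched by no edge."""
    touched = {node for edge in graph["edges"] for node in edge}
    return [
        node
        for node, attrs in graph["nodes"].items()
        if attrs.get("type") == "grid" and node not in touched
    ]


def graph_has_consistency_errors(graph, allow_unconnected_grid_nodes=False):
    """Boolean variant: True when a problem is found, never raises."""
    if allow_unconnected_grid_nodes:
        return False
    return bool(find_unconnected_grid_nodes(graph))


def check_graph_consistency(graph, allow_unconnected_grid_nodes=False):
    """Raising variant: ValueError when a problem is found."""
    if allow_unconnected_grid_nodes:
        return
    isolated = find_unconnected_grid_nodes(graph)
    if isolated:
        raise ValueError(
            f"{len(isolated)} grid node(s) are not connected to any mesh nodes: {isolated}"
        )


# Step 1 (stand-in for the output of create_all_graph_components)
graph = {
    "nodes": {"g0": {"type": "grid"}, "g1": {"type": "grid"}, "m0": {"type": "mesh"}},
    "edges": [("g0", "m0")],
}
# Step 2: validate separately from construction
has_errors = graph_has_consistency_errors(graph)
print(has_errors)
```

Keeping the flag on the validation function rather than on the builder means new health checks add keyword arguments here, not to create_all_graph_components.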

Member

My reason for this is that there are many aspects one could imagine implementing for checking the "health" of the graph, and I think we are likely to add more over time. Instead of having to add an argument every time we think of a new way of measuring that the graph is good/bad, we can separate that out into a different module and function.

@AdMub
Contributor Author

AdMub commented Mar 21, 2026

Hi @leifdenby, this is a brilliant architectural point!

You are completely right, bloating create_all_graph_components with individual bypass flags for every future 'health metric' we invent is a bad design pattern. Separating graph construction from graph validation into a two-step process makes the API significantly cleaner and much more scalable.

I love the idea of a standalone check_graph_consistency(graph, **kwargs) function. In fact, building a comprehensive 'Topological Graph Diagnostics' module to catch isolated nodes and fragmented subgraphs is actually Pillar B of my GSoC draft proposal!

I will gladly remove the allow_unconnected_grid_nodes flag from the base builder. Should I create a new diagnostics.py module to house this check_graph_consistency function, or would you prefer I just rename the connectivity_checks.py file I made earlier to house it?

@AdMub AdMub requested a review from leifdenby March 21, 2026 17:07
@leifdenby
Member

> I will gladly remove the allow_unconnected_grid_nodes flag from the base builder. Should I create a new diagnostics.py module to house this check_graph_consistency function, or would you prefer I just rename the connectivity_checks.py file I made earlier to house it?

@AdMub great! Let's just rename your connectivity_checks.py module as diagnostics.py :)

@AdMub
Contributor Author

AdMub commented Mar 24, 2026

Hi @leifdenby, the refactor is complete!

  1. I resolved the CHANGELOG.md conflict.
  2. Renamed the module to diagnostics.py.
  3. Completely removed the validation logic and the allow_unconnected_grid_nodes flag from create_all_graph_components.
  4. Implemented the standalone check_graph_consistency function in the new module, ensuring it safely handles both merged DiGraph and return_components=True dictionary outputs by filtering for type="grid" nodes.
  5. Updated the Pytest fixtures to use the new two-step (build -> validate) process.
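
For anyone following along, accepting both output shapes (item 4 above) could be sketched like this. It is not the merged implementation: the small `Graph` dataclass stands in for networkx.DiGraph, `unconnected_grid_nodes` is a hypothetical helper, and the "mesh" type label and the rule "a grid node needs at least one outgoing edge to a mesh node" are assumptions. Only the type="grid" filter and the "g2m" component key come from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Tiny stand-in for networkx.DiGraph: node attributes plus directed edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)


def unconnected_grid_nodes(graph_or_components):
    """Grid nodes with no outgoing edge to a mesh node.

    Accepts either one merged graph or a dict of components (the
    return_components=True shape), in which case only "g2m" is inspected.
    """
    if isinstance(graph_or_components, dict):
        graph = graph_or_components["g2m"]
    else:
        graph = graph_or_components

    def has_type(node, node_type):
        return graph.nodes.get(node, {}).get("type") == node_type

    # Grid nodes that are the source of at least one grid -> mesh edge
    connected = {u for u, v in graph.edges if has_type(u, "grid") and has_type(v, "mesh")}
    return sorted(n for n in graph.nodes if has_type(n, "grid") and n not in connected)


nodes = {"g0": {"type": "grid"}, "g1": {"type": "grid"}, "m0": {"type": "mesh"}}
# Merged graph: g1 only receives a mesh -> grid edge, so it has no g2m connection
merged = Graph(nodes=nodes, edges=[("g0", "m0"), ("m0", "g1")])
components = {
    "g2m": Graph(nodes=nodes, edges=[("g0", "m0")]),
    "m2g": Graph(nodes=nodes, edges=[("m0", "g1")]),
}
print(unconnected_grid_nodes(merged))
print(unconnected_grid_nodes(components))
```

Filtering on node type is what lets the same rule work on the merged graph, where m2m and m2g edges are mixed in with the g2m ones.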

The core builder API is perfectly clean again, and this sets up a great foundation for the diagnostics module to handle future health metrics. Let me know if it looks good to merge!

Member

@leifdenby leifdenby left a comment

Two minor tweaks and then this is a bulls-eye 🎯 :)

Member

I think it would make more sense to have this "diagnostics" module just in weather_model_graphs.diagnostics rather than weather_model_graphs.create.diagnostics - what do you think? People might want to apply diagnostics to a graph they have loaded rather than just created.

Comment thread CHANGELOG.md Outdated

### Added

- Added a safety assertion in g2m graph creation to ensure all grid nodes connect to the mesh (#42).
Member

this isn't quite right since the consistency check isn't applied during graph creation anymore but is a separate tool and step. Also, I would explicitly mention the module and function name, say wmg.diagnostics.check_graph_consistency

@AdMub
Contributor Author

AdMub commented Mar 24, 2026

Done! 🎯 I used git mv to move diagnostics.py to the root weather_model_graphs module so it's easily accessible for loaded graphs, updated the internal test imports, and adjusted the changelog to reflect the new standalone API. Thank you so much for the architectural guidance on this PR, it feels much more robust now!

@AdMub AdMub requested a review from leifdenby March 24, 2026 13:59
@leifdenby leifdenby merged commit f14a771 into mllam:main Mar 24, 2026
3 checks passed
@leifdenby
Member

Well done @AdMub 🚀 thanks for your contribution!

@leifdenby leifdenby changed the title from "fix: assert that all grid nodes connect to mesh in g2m graph" to "Add diagnostic to check that all grid nodes connect to mesh in g2m graph" Mar 24, 2026
Development

Successfully merging this pull request may close these issues.

Assert that all nodes are in g2m

5 participants