Add diagnostic to check that all grid nodes connect to mesh in g2m graph#59
Conversation
|
Please have a look at what you actually are trying to merge here. It is much more appreciated (from my side at least) that you work carefully on fewer PRs than to do many things a bit sloppy, which just creates more reviewing work. |
68e1c18 to
d75c272
Compare
|
Hi @joeloskarsson, I sincerely apologize for the messy commit history. You are completely right. I mistakenly used I completely understand how that creates unnecessary review overhead. I have just force-pushed a clean history to this branch that strips out the unrelated files. It now contains a single, focused commit with only the |
leifdenby
left a comment
There was a problem hiding this comment.
I've made some suggestions for how to improve this :) Thanks for working on it. Much easier to review now
|
Something I have found incredibly useful in these situations is to actually plot the disconnected nodes (or rather plot all nodes, with the disconnected ones marked) when this happens. I think that is something we should really add as well, at least optionally. |
|
I read your comment on my PR about m2g, that isolated mesh nodes in m2g can actually be desirable in LAM models where the mesh covers an interior+boundary area. That's a really important distinction across the graph types (g2m, m2m, m2g) and something I hadn't fully considered when building my warning logic. With that in mind, I'd love to contribute based on everything discussed across both PRs, I think the following would be valuable additions:
That said, I'm not sure raising a hard error is the right approach for any graph type until we also provide a way to actually fix the problem. For example, by automatically increasing the connection radius, falling back to a nearest-neighbor connection for stranded nodes, or at minimum clearly guiding the user on which parameter to adjust. Throwing an error without a clear recovery path can be frustrating. What do you think? I'd be happy to implement either or both of these. Should I collaborate with @AdMub or make a new PR? |
|
Hi @Diya910, thanks for jumping in and reviewing! I completely agree that adding a plotting feature (as @joeloskarsson suggested) and graph-type-aware warnings would be massive quality-of-life improvements for users. However, following the maintainers' preference for smaller, carefully scoped PRs, I think we should keep this current PR strictly focused on fixing the immediate silent-failure bug for Once we get this foundational safety check merged, I would absolutely love to collaborate with you on a follow-up PR to introduce the visualization features and discuss fallback behaviors! @leifdenby - Thanks for the architectural feedback! I am working right now on extracting this logic into a dedicated |
|
Hi @leifdenby, thanks for the architectural direction! I have extracted the safety assertion into a dedicated The code is now formatted, modularized, and ready for another look! I would love to collaborate with @joeloskarsson and @Diya910 on a follow-up PR to add the plotting features once this foundational check is merged. |
leifdenby
left a comment
There was a problem hiding this comment.
I think you might have rushed this, the refactor still includes some checks in the create_all_graph_components call
|
Hi @leifdenby, you are completely right, I definitely rushed that last push and failed to fully clean up I have just pushed an update that entirely removes the duplicate logic block from |
leifdenby
left a comment
There was a problem hiding this comment.
Getting there :) you have still left files not related to fixing the issue. Also please add a changelog entry
| graph_components["g2m"] = G_g2m | ||
|
|
||
| # Run safety assertion to catch isolated grid nodes | ||
| check_g2m_connectivity(G_g2m) |
There was a problem hiding this comment.
Maybe you could rename the function as check_for_unconnected_grid_nodes or something like that? Then we don't need the comment either :)
There was a problem hiding this comment.
did you mean to leave this file in the PR? looks like experimentation to me
There was a problem hiding this comment.
did you mean to leave this file in the PR? looks like experimentation to me
There was a problem hiding this comment.
did you mean to leave this file in the PR? looks like experimentation to me
There was a problem hiding this comment.
This shoulnd't be in the PR either
|
Once @AdMub has addressed the changes I requested it would be helpful with a review from you @joeloskarsson just to check that you also think the issue you raised has been resolved :) thank you |
|
Hi @AdMub, thanks for pushing this forward! Building on @leifdenby's review, I wanted to flag a quick architectural edge-case that will likely cause some CI test failures. While the new ValueError for zero-degree grid nodes is exactly what we want for production LAMs, it will break several fixtures in tests/test_graph_creation.py (like test_create_lat_lon). These tests use create_fake_irregular_coords(), which intentionally generate spread-out coordinates that will fail the mesh radius check. If you look at @joeloskarsson's original prototype branch for Issue #42, he solved this by threading an allow_zero_degree=False flag through create_all_graph_components(). We'll likely need to add that flag to the new module so those specific tests can bypass the assertion and keep the CI green. |
|
Hi @leifdenby and @ArnabTechiee, Thank you both for the guidance! I have completely purged those accidental scratchpad files from the branch history, renamed the function to @ArnabTechiee - Fantastic catch regarding the CI test fixtures! I have implemented an The PR should now be totally clean and ready for review! |
|
Hi @leifdenby and @joeloskarsson, just a quick ping! I noticed a merge conflict had popped up in The PR is fully clean, the scratchpad files are gone, the module is extracted, and the |
… resolve NetworkXError
leifdenby
left a comment
There was a problem hiding this comment.
I've made a suggestion to the API - what do you think?
| graph_crs: pyproj.crs.CRS | None = None, | ||
| decode_mask: Iterable[bool] | None = None, | ||
| return_components: bool = False, | ||
| allow_unconnected_grid_nodes: bool = False, |
There was a problem hiding this comment.
This is a design choice of course, but I would actually prefer that we don't introduce any guarantees/checks on the graph connectivity into create_all_graph_components since that function already has many (possibly too many) arguments. Instead I think we should have a submodule that handles these checking for these "graph health" issues.
So I would instead make this a two-steps process:
- Call
create_all_graph_componentswhich returns anetworkx.DiGraphfor the entire graph - Call
check_graph_consistency(graph, allow_unconnected_grid_nodes=False)will raise an exception or returnFalseif the graph appears to have issues (although in that case the function should be called somethinggraph_has_consistency_errors(...)or something like that.
There was a problem hiding this comment.
My reason for this that there are many aspects one could imagine implementing for checking the "health" of the graph, and I think we are likely to add more over time. Instead of having to add an argument every time we think of a new way of measuring that the graph is good/bad we can separate that out into a different module and function.
|
Hi @leifdenby, this is a brilliant architectural point! You are completely right, bloating I love the idea of a standalone I will gladly remove the |
@AdMub great! Let's just rename your |
|
Hi @leifdenby, the refactor is complete!
The core builder API is perfectly clean again, and this sets up a great foundation for the diagnostics module to handle future health metrics. Let me know if it looks good to merge! |
leifdenby
left a comment
There was a problem hiding this comment.
Two minor tweaks and then this is a bulls-eye 🎯 :)
There was a problem hiding this comment.
I think it would make more sense to have this "diagnostics" module just in weather_model_graphs.diagnostics" rather than weather_model_graphs.create.diagnostics` - what do you think? People might want to apply diagnostics to a graph they have loaded but just created.
|
|
||
| ### Added | ||
|
|
||
| - Added a safety assertion in g2m graph creation to ensure all grid nodes connect to the mesh (#42). |
There was a problem hiding this comment.
this isn't quite right since the consistency check isn't applied during graph creation anymore but is a separate tool and step. Also, I would explicitly mention the module and function name, say wmg.diagnostics.check_graph_consistency
|
Done! 🎯 I used |
|
Well done @AdMub 🚀 thanks for your contribution! |
Describe your changes
This PR adds a safety assertion in
create_all_graph_componentsafter the creation of the Grid-to-Mesh (g2m) graph.Previously, if
g2m_connectivity_kwargscontained a connection radius that was too small, or if the mesh was too sparse, grid nodes could be left entirely disconnected from the mesh. This would fail silently, leading to unconnected grid nodes being passed into the model.The new logic explicitly checks the degree of every grid node in the
G_g2mgraph. If any grid nodes have a degree of0, it raises a descriptiveValueErrorexplaining that the grid nodes failed to connect to the mesh, helping the user debug their radius/resolution parameters.Testing performed:
Verified locally by generating a 10x10 grid and passing an intentionally small
max_distof0.1forg2m_connectivity.Output:
Issue Link
Closes #42
Type of change
Checklist before requesting a review
pullwith--rebaseoption if possible).Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
Checklist for assignee
added,changedorfixed)