feat: auto-detect distance metric from CRS via SpatialCoordinateValue…#86

Open
FAbdullah17 wants to merge 22 commits into mllam:main from FAbdullah17:feature/distance-engine

Conversation

@FAbdullah17

Describe your changes

Replaces all scipy.spatial.KDTree usage with a new metric-aware spatial index class that automatically selects the correct distance metric (euclidean or haversine) based on the supplied coordinate reference system (CRS). This is the core implementation for #75.

New file: src/weather_model_graphs/spatial.py

  • SpatialCoordinateValuesSelector wraps sklearn.neighbors.BallTree to provide fast k‑nearest‑neighbour and radius queries.
  • Supports "euclidean" (for projected CRS) and "haversine" (for geographic CRS).
  • Haversine:
    • Expects input coordinates as longitude/latitude in degrees (first column = longitude, second = latitude).
    • Internally the BallTree uses [latitude_rad, longitude_rad] and the "haversine" metric.
    • Returned distances are in metres (arc‑length × Earth’s mean radius 6 371 000 m).
  • Factory method for_crs(crs, coords) reads crs.is_geographic to select the metric automatically; falls back to "euclidean" when crs is None.
  • If "haversine" is requested but scikit-learn is not installed, an ImportError with a clear installation hint is raised.
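
As a sanity check on the haversine convention described above (longitude/latitude in degrees on input, radians internally, metres on output), here is a minimal standalone sketch. It uses only the standard library and a direct haversine formula rather than BallTree, and the function name `haversine_m` is illustrative, not part of the PR.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius quoted in the description


def haversine_m(lon1_deg, lat1_deg, lon2_deg, lat2_deg, radius_m=EARTH_RADIUS_M):
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    # Convert to radians first: the haversine formula works on angles in radians.
    lat1, lat2 = math.radians(lat1_deg), math.radians(lat2_deg)
    dlat = lat2 - lat1
    dlon = math.radians(lon2_deg - lon1_deg)
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    central_angle_rad = 2 * math.asin(min(1.0, math.sqrt(a)))
    # Arc length = central angle (radians) x sphere radius.
    return central_angle_rad * radius_m


# 10 degrees of longitude along the equator.
distance_m = haversine_m(0.0, 0.0, 10.0, 0.0)
print(f"{distance_m:.0f} m")
```

The result is roughly 1 112 km, which is the order of magnitude the haversine tests check for.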

Modified: src/weather_model_graphs/create/base.py

  • Removed import scipy.spatial; added import of SpatialCoordinateValuesSelector.
  • In create_all_graph_components():
    • Build a spatial_index via SpatialCoordinateValuesSelector.for_crs().
    • Emit a UserWarning when a rectilinear mesh kind (flat, flat_multiscale, hierarchical) is combined with a geographic CRS (lat/lon). This alerts users that equally‑spaced lon/lat mesh nodes are not equally spaced on a sphere.
  • In connect_nodes_across_graphs():
    • Added a new optional keyword argument spatial_index: SpatialCoordinateValuesSelector | None.
    • Replaced kdt.query / kdt.query_ball_point with spatial_index.k_nearest_to() / spatial_index.with_radius().
    • Edge "len" attributes are now taken directly from the tree’s returned distances, guaranteeing correctness for both metrics.
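
To illustrate why the rectilinear-mesh warning matters, the sketch below computes the east-west length of a one-degree longitude step at several latitudes. It assumes a spherical Earth of radius 6371 km and measures along the parallel; the helper name `lon_step_km` is made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth


def lon_step_km(lat_deg, dlon_deg=1.0, radius_km=EARTH_RADIUS_KM):
    """East-west length (km) of a longitude step, measured along the parallel at lat_deg."""
    # The circle of latitude shrinks by cos(latitude) relative to the equator.
    return radius_km * math.cos(math.radians(lat_deg)) * math.radians(dlon_deg)


for lat in (0, 30, 60, 80):
    print(f"lat {lat:2d} deg: 1 deg of longitude spans {lon_step_km(lat):6.1f} km")
```

At 60° latitude the step is half its equatorial length, so mesh nodes that are evenly spaced in lon/lat become progressively denser towards the poles.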

Modified: src/weather_model_graphs/create/mesh/kinds/hierarchical.py

  • Removed import scipy; added import of SpatialCoordinateValuesSelector.
  • In create_hierarchical_multiscale_mesh_graph():
    • Added a new distance_metric: str parameter (default "euclidean").
    • Replaced the inter‑level KDTree with a SpatialCoordinateValuesSelector built from the coarser‑level nodes; the metric is passed down from create_all_graph_components.

Modified: src/weather_model_graphs/__init__.py

  • Exported SpatialCoordinateValuesSelector at package top‑level for easy access.

Modified: pyproject.toml

  • Added an optional extras group [global] that installs scikit-learn>=1.3.0.
    Users can now install with pip install weather-model-graphs[global] when they need haversine support for global/geographic graphs. The core package remains lightweight for projected‑only use.

New tests: tests/test_spatial_index.py (27 tests)

  • TestInit – valid/invalid metric, coordinate dtype.
  • TestEuclideanKNearest / TestEuclideanWithRadius – basic Euclidean queries, known distances, sorting, radius behaviour.
  • TestHaversineKNearest / TestHaversineWithRadius – haversine distances in metres, known values (e.g. 10° along equator ≈ 1 111 945 m), radius inclusive/exclusive.
  • TestForCrs – verifies that for_crs returns haversine for geographic CRS (pyproj.CRS("EPSG:4326"), ccrs.Geodetic()) and euclidean for projected CRS (ccrs.LambertConformal(), etc.).
  • TestRectilinearGeographicWarning – ensures the warning is raised when a rectilinear mesh is built with a geographic CRS, and absent for projected CRS.
  • TestIntegrationGraphCreation – smoke‑test building a full graph with a geographic CRS; checks that g2m/m2g edge lengths are in metres and lie in a physically reasonable range (1 km – 2000 km for a ~10° domain).

New documentation: docs/distance_metric_auto_detection.ipynb

  • A complete Jupyter notebook demonstrating:
    • Euclidean distance queries with projected coordinates.
    • Haversine distance queries with lon/lat coordinates (distances in metres).
    • The for_crs factory method applied to several CRS types.
    • Full graph integration example using create_all_graph_components with a geographic CRS.
    • The rectilinear‑mesh warning and how to trigger it.

Backward compatibility

  • All existing code paths default to "euclidean" when no CRS is supplied (graph_crs=None), so existing scripts continue to work unchanged.
  • The new spatial_index argument in connect_nodes_across_graphs() is optional; callers that do not provide it fall back to a Euclidean selector, preserving the original behaviour.
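
The fallback behaviour described here can be sketched as follows. This is an assumption-laden illustration, not the PR's `for_crs` implementation: `FakeCRS` is a hypothetical stand-in for a CRS object that exposes `is_geographic`, and `metric_for_crs` is an invented helper name.

```python
from dataclasses import dataclass


@dataclass
class FakeCRS:
    """Hypothetical stand-in for a CRS object exposing `is_geographic`."""

    is_geographic: bool


def metric_for_crs(crs):
    """Return "haversine" for a geographic CRS, otherwise "euclidean" (including None)."""
    # getattr with a default keeps the None case on the Euclidean path.
    return "haversine" if getattr(crs, "is_geographic", False) else "euclidean"


print(metric_for_crs(None))                          # no CRS supplied
print(metric_for_crs(FakeCRS(is_geographic=False)))  # projected CRS
print(metric_for_crs(FakeCRS(is_geographic=True)))   # geographic (lon/lat) CRS
```

Because a missing CRS resolves to "euclidean", callers that never pass a CRS see the same metric as before the change.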

Motivation:
Graphs for global or sparse irregular data need the haversine metric to correctly measure distances on the sphere. Previously the code used only Euclidean distances, which are incorrect for lon/lat coordinates. This change introduces automatic metric selection based on the CRS, making it easy to build correct graphs for any coordinate system.

New dependencies:

  • scikit-learn is now an optional dependency (only needed for haversine). Installed via pip install weather-model-graphs[global].

Issue Link

Closes #75


Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

…sSelector (mllam#75)

Replaces all scipy.spatial.KDTree usage with a new metric-aware spatial index class that selects euclidean or haversine automatically based on the supplied coordinate reference system.

New file: src/weather_model_graphs/spatial.py
- SpatialCoordinateValuesSelector wraps sklearn.neighbors.BallTree
- Supports euclidean (projected CRS) and haversine (geographic CRS)
- Haversine: expects lon/lat in degrees, returns distances in metres (arc-length * 6,371,000 m); BallTree internally uses [lat_rad, lon_rad]
- Factory method `for_crs(crs, coords)` reads `crs.is_geographic` to pick the metric automatically; falls back to euclidean when CRS is None
- ImportError with install hint when scikit-learn is absent and haversine is requested

src/weather_model_graphs/create/base.py
- Remove scipy.spatial import; import SpatialCoordinateValuesSelector
- create_all_graph_components(): build spatial_index via for_crs(); emit UserWarning when a rectilinear mesh kind is combined with a geographic CRS (flat/flat_multiscale/hierarchical + is_geographic)
- connect_nodes_across_graphs(): accept spatial_index kwarg; replace kdt.query/query_ball_point with spatial_index.k_nearest_to/with_radius

src/weather_model_graphs/create/mesh/kinds/hierarchical.py
- Remove scipy import; import SpatialCoordinateValuesSelector
- create_hierarchical_multiscale_mesh_graph(): add distance_metric param; replace inter-level KDTree with SpatialCoordinateValuesSelector

src/weather_model_graphs/__init__.py
- Export SpatialCoordinateValuesSelector at package top-level

pyproject.toml
- Add [global] optional extras group: scikit-learn>=1.3.0
- Install with: pip install weather-model-graphs[global]

tests/test_spatial_index.py (new, 27 tests)
- TestInit, TestEuclideanKNearest, TestEuclideanWithRadius
- TestHaversineKNearest, TestHaversineWithRadius
- TestForCrs: pyproj.CRS(EPSG:4326) and ccrs.Geodetic() → haversine; ccrs.PlateCarree() and None → euclidean
- TestRectilinearGeographicWarning
- TestIntegrationGraphCreation: g2m/m2g edge len values in [1 km, 2000 km] for a haversine graph built over a real lat/lon domain

docs/distance_metric_auto_detection.ipynb (new)
- Walkthrough: euclidean demo → haversine demo → for_crs auto-detection table → full graph integration example → rectilinear+geographic warning

Backward-compatible: all paths default to euclidean when graph_crs=None.
@FAbdullah17
Author

Hi @leifdenby,

I've opened this PR as discussed in #75. It introduces the SpatialCoordinateValuesSelector class with automatic metric selection (euclidean/haversine) based on CRS, replaces all scipy.spatial.KDTree usage, adds 27 tests, and includes a demonstration notebook. All tests pass locally and the PR is ready for review.

Could you please take a look when you have time? Thanks!

Member

@leifdenby leifdenby left a comment

This is looking good! Nice work 🌟

I've added a few suggestions for changes and some questions too.

@leifdenby leifdenby added this to the v0.4.0 (proposed) milestone Mar 4, 2026
@FAbdullah17
Author

Hi @leifdenby, thank you for the detailed review and the kind words! Please find my responses to each point below.


1. hierarchical.py: d_euclidean line
Good catch. The edge "len" attribute is already being set from the distance returned by spatial_idx.k_nearest_to(), so the d_euclidean line is redundant and will be removed.


2 & 3. spatial.py — native metric distances (radians for haversine)
Great point on the sphere-radius assumption — keeping distances in the metric's native units (radians for haversine) is clearly the more flexible and correct design. Will remove _EARTH_RADIUS_M, _to_output_distances, and _to_tree_radius so that both k_nearest_to and with_radius return raw BallTree units. Docstrings and the integration test will be updated to reflect this.


4. pyproject.toml: scikit-learn as a core dependency
Agreed, that simplifies the install experience considerably. Will move scikit-learn>=1.3.0 into [project.dependencies] and remove the [global] extras group and all associated references.


5 & 6. base.py: _in_source_mesh suffix on inner functions
Will restore the _find_neighbour_node_idxs_in_source_mesh suffix on both inner closures to stay consistent with the original naming convention.


7. base.py — clarifying comment on euclidean default
Will add an explanatory comment on the else branch making it explicit that euclidean is the default because the absence of graph_crs implies projected (Cartesian) coordinates.


8. base.py — rename to spatial_coord_selector
Will rename the variable and parameter throughout base.py accordingly.


9. base.py — passing full object vs. metric string
connect_nodes_across_graphs internally rebuilds the index from source-node coordinates, so only the metric string is carried through from the passed object. Since create_hierarchical_multiscale_mesh_graph already follows the distance_metric: str pattern for the same reason, aligning connect_nodes_across_graphs to accept distance_metric: str as well would give both functions a consistent interface, with create_all_graph_components passing spatial_coord_selector.distance_metric to both. Does that direction work for you, or would you prefer keeping the full object? Happy to go either way.


We will begin implementing all of the above once you confirm the preferred approach for point 9. Thank you again!

@leifdenby
Member

> Hi @leifdenby, thank you for the detailed review and the kind words! Please find my responses to each point below.

Could you add your responses directly to my comments instead of in a single comment, please? :) Thank you!

…n
- hierarchical: use metric-aware distance from SpatialCoordinateValuesSelector for inter-level edge len (remove Euclidean overwrite)
- spatial: return native haversine units (radians) for k-nearest and radius queries; remove Earth-radius conversion helpers
- base: rename spatial_index to spatial_coord_selector for clarity
- base: document Euclidean default when graph_crs is not provided
- base: align connect_nodes_across_graphs API with hierarchical path by passing distance_metric string instead of selector object
- base: restore _find_neighbour_node_idxs_in_source_mesh helper naming for nearest_neighbours / within_radius paths
- pyproject: promote scikit-learn to core dependency, remove [global] extra
- tests(spatial): update haversine assertions/docs to radians and adjust integration expectations
- tests(windows): fix tempfile PermissionError by using fig.savefig(fileobj) instead of fig.savefig(path) in plotting tests
@FAbdullah17
Author

Hi @leifdenby ,

Review feedback addressed

Thanks again for the detailed review — I’ve now applied all requested follow-ups.

✅ 1) hierarchical.py: remove Euclidean overwrite

Inter-level edge length now uses the distance returned by SpatialCoordinateValuesSelector directly (d), so the active metric is respected.

✅ 2–3) spatial.py: native haversine units

k_nearest_to and with_radius now use native BallTree haversine units (radians).
Removed _EARTH_RADIUS_M, _to_output_distances, and _to_tree_radius.

✅ 4) pyproject.toml: make scikit-learn core dependency

Moved scikit-learn>=1.3.0 into main dependencies and removed the [global] extra.

✅ 5–6) base.py: restore _in_source_mesh suffix

Restored helper naming to _find_neighbour_node_idxs_in_source_mesh in nearest-neighbours and within-radius paths.

✅ 7) base.py: clarify Euclidean default

Added explicit inline comment that Euclidean is used by default when graph_crs is not provided (projected/Cartesian assumption).

✅ 8) base.py: rename spatial_index → spatial_coord_selector

Renamed variable/usage for clearer semantics.

✅ 9) base.py: pass metric string consistently

connect_nodes_across_graphs now takes distance_metric: str, matching the hierarchical flow.
Callers pass spatial_coord_selector.distance_metric.


These changes are applied in the current commit. Please review them and let me know if any further updates are required.

Member

@leifdenby leifdenby left a comment

This is getting really close!

I've added some general comments and suggestions for changes. Can you address each comment in a separate commit this time and then link to the commit in a reply to my comment, please? It was quite hard to track where my comments were addressed, so I had to re-review everything :)

Also, a general point: reading this, I was thinking that it might be nicer to give distances in degrees rather than radians. What do you think? Radians are pretty hard to reason about, but everyone knows that a circle is 360 degrees. Also, we're assuming the lat/lon values are given in degrees, so it matches the units of the provided geographical coordinates.

G_down.add_nodes_from(G_to.nodes(data=True))

# build kd tree for mesh point pos
# build spatial index for source (coarser) mesh node positions
Member

This shouldn't be called "spatial index" :) Wherever you create a SpatialCoordinateValuesSelector, call the variable spatial_coord_selector instead.

Author

Addressed in 7ca16dd: renamed the SpatialCoordinateValuesSelector variable to spatial_coord_selector in hierarchical.py (including corresponding usage/comment wording).

# sphere, so the mesh node density will vary strongly with latitude.
_is_geographic = getattr(graph_crs, "is_geographic", False)
if _is_geographic and m2m_connectivity in ("flat", "flat_multiscale", "hierarchical"):
warnings.warn(
Member

We use loguru for logging throughout; please replace this with logger.warning from loguru, as elsewhere in the codebase.

Author

@FAbdullah17 FAbdullah17 Mar 7, 2026

Addressed in 718afe5: replaced warnings.warn(...) with loguru logger.warning(...) for the rectilinear/geographic notice to match logging style used elsewhere.

Addressed in a4c5a50: updated warning-based tests to assert loguru warning messages after switching from warnings.warn to logger.warning.

"rectilinear (equally-spaced lon/lat) grid, but the graph CRS is "
"geographic. Equally-spaced longitude/latitude values are NOT equally "
"spaced on a sphere — mesh node density will vary with latitude. "
"Consider projecting to a suitable projected CRS (e.g. via graph_crs) "
Member

I would remove these last two lines. Saying that they should provide a "projected CRS" is basically just saying "don't provide a graph_crs that is geographic", but the other lines in your warning adequately explain why that might be a bad idea (given that the only available mesh right now is rectilinear). Also, they can't use an icosahedral mesh.

Author

Addressed in f5bb64a: trimmed the rectilinear/geographic warning text by removing the final two suggestion lines, while keeping the core explanation.

Maximum number of neighbours to search for in `G_target` for each node in `G_source`
distance_metric : str, optional
Distance metric used for neighbour search. Supported values are
``"euclidean"`` and ``"haversine"``. Defaults to ``"euclidean"``.
Member

instead of defaulting to euclidean I think it would be safer to require that this is explicitly provided (i.e. no default)

Author

Addressed in e050204: connect_nodes_across_graphs now requires distance_metric explicitly (no default), and direct caller usage was updated accordingly.
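
One way to make a metric argument mandatory in Python is a keyword-only parameter with no default, sketched below. The function `connect_nodes` and its signature are hypothetical and only illustrate the idea; the thread does not say whether the PR makes `distance_metric` keyword-only or positional.

```python
def connect_nodes(source_coords, target_coords, *, distance_metric):
    """Hypothetical signature: `distance_metric` is keyword-only and has no default."""
    if distance_metric not in ("euclidean", "haversine"):
        raise ValueError(f"Unsupported distance_metric: {distance_metric!r}")
    # A real implementation would build the neighbour search here.
    return distance_metric


# Omitting the metric now fails loudly instead of silently using Euclidean.
try:
    connect_nodes([], [])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

chosen = connect_nodes([], [], distance_metric="haversine")
```

With this shape, forgetting the metric is a TypeError at the call site rather than a quiet Euclidean computation on lon/lat values.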

assert dists[1] == pytest.approx(expected_rad, rel=1e-4)

def test_distances_are_native_haversine_radians(self):
"""For geographic coords, haversine returns unit-sphere radians."""
Member

"unit-sphere radians" -> "radians", it will be radians whatever the radius of the sphere is

Author

Addressed in 8da3663 : updated the wording from “unit-sphere radians” to “radians” in tests/test_spatial_index.py.

@FAbdullah17
Author

Thanks @leifdenby, this is really helpful feedback.

Absolutely — this round I’ll address each review comment in a separate commit, and I’ll reply on each thread with the exact commit link/hash so the mapping is clear and easy to review.

On the units point: degrees are definitely easier to reason about for users, but for the core haversine API I think radians are the more robust technical choice. BallTree(metric="haversine") natively computes/query-filters in radians, so keeping radians at the interface avoids hidden conversions and keeps nearest-neighbour/radius behavior consistent with the underlying implementation.

So my plan is:

  1. keep radians as the core computational contract for correctness/consistency, and
  2. if needed, add a convenience conversion layer in a follow-up so user-facing workflows can opt into degree-friendly inputs/outputs without changing the metric core.

@leifdenby leifdenby self-assigned this Mar 8, 2026
@joeloskarsson joeloskarsson modified the milestones: v0.4.0 (proposed), v0.4.0 Mar 9, 2026
@Raj-Taware

Hey @leifdenby and @FAbdullah17 ! Great work with this PR.

I was taking a look at this implementation and had a few thoughts.
About the radians point mentioned above: tracing the with_radius execution path, and as @FAbdullah17 said, the sklearn BallTree haversine metric strictly evaluates in radians, but I think there is currently a unit trap here.

Currently, connect_nodes_across_graphs passes max_dist (derived from the user's configured mesh_node_distance) directly into self.tree.query_radius(tree_point, r=radius) with no conversion.
When a raw value such as 3 degrees is passed, it will be interpreted as 3 radians, which is a major mismatch. The graph builder will attempt to connect almost every node in the mesh and hit an out-of-memory (OOM) error.

Am I following this execution path correctly, or is there a unit conversion for max_dist happening upstream that I might have missed?

If we want the interface to cleanly accept degrees as you suggested, SpatialCoordinateValuesSelector just needs to isolate the sklearn tree by applying np.deg2rad(radius) to the input before query.
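
The size of this mismatch can be estimated with a short standalone calculation. The sketch below uses the spherical-cap area formula to compare the share of the sphere covered by a 3° search radius with the share covered when the same number is read as 3 radians; `fraction_of_sphere_within` is an illustrative helper, not code from the PR.

```python
import math

MAX_SEPARATION_RAD = math.pi  # two points on a sphere are never further apart than this


def fraction_of_sphere_within(radius_rad):
    """Fraction of a sphere's surface within an angular radius of a point (spherical cap)."""
    r = min(radius_rad, MAX_SEPARATION_RAD)
    return (1.0 - math.cos(r)) / 2.0


intended = fraction_of_sphere_within(math.radians(3.0))  # 3 degrees, converted properly
misread = fraction_of_sphere_within(3.0)                 # 3 degrees passed straight through as radians

print(f"3 rad is about {math.degrees(3.0):.1f} degrees")
print(f"intended coverage: {intended:.4%}, misread coverage: {misread:.2%}")
```

A radius of 3 radians is close to the maximum possible separation of π, so on a global mesh nearly every node would fall inside the search radius.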

@leifdenby
Member

> 2. if needed, add a convenience conversion layer in a follow-up so user-facing workflows can opt into degree-friendly inputs/outputs without changing the metric core.

@FAbdullah17 and @Raj-Taware: I see your point. How about we make it so that everything external to SpatialCoordinateValuesSelector uses degrees, but we convert to radians internally? That way, wherever the arguments to graph creation use geographical projections, angles are always assumed to be in degrees?

Member

@leifdenby leifdenby left a comment

Thanks for all your hard work here! Only very minor changes suggested, and then this can go in 🚀

mesh_node_distance: float,
level_refinement_factor: float,
max_num_levels: int,
distance_metric: str = "euclidean",
Member

Should we maybe not have a default here? As in, force the user to provide xy and a distance metric? You have already helpfully described in the docstring what the options are and when people should use each. Or maybe it would be better to have a bool argument called xy_is_geographic, and then we construct a distance measurer based on this?

Author

Addressed in 96f2a99 : removed the default distance_metric in the hierarchical mesh builder so callers must pass it explicitly; kept explicit metric selection in the existing CRS-driven flow rather than introducing a parallel xy_is_geographic switch.

distances = raw_dists.flatten()
return indices, distances

def with_radius(
Member

I would rename this as within_radius

Author

Addressed in 854e062 : renamed with_radius to within_radius in SpatialCoordinateValuesSelector and updated call sites/tests accordingly.

Query point (same coordinate convention as for :meth:`k_nearest_to`).
radius : float
Search radius. For ``"euclidean"`` this is in the same units as
*coords*; for ``"haversine"`` this is in **radians**.
Member

as I mentioned in my comment I would make this argument be in degrees

tree_point, r=radius, return_distance=True
)
indices = raw_idxs[0]
distances = raw_dists[0]
Member

This should convert to degrees here if using the haversine distance, since the BallTree query will return distances in radians.

Author

Addressed in bce4d8c : converted haversine within_radius returned distances from radians to degrees (BallTree still computes internally in radians) and added test coverage for degree-valued outputs.

tree_point = self._prepare_query_point(point)
raw_dists, raw_idxs = self._tree.query(tree_point, k=k)
indices = raw_idxs.flatten()
distances = raw_dists.flatten()
Member

convert to degrees when using haversine distance

Author

Addressed in 8dbfee1 : updated within_radius so the haversine radius argument is now interpreted in degrees at the API boundary (with internal conversion to radians for BallTree), and updated the related tests accordingly.

Member

the raw_dists need to be converted back to degrees here so that everything external to SpatialCoordinateValuesSelector is in degrees as you've done for within_radius

Author

Addressed in 8c1a610 : converted haversine k_nearest_to returned distances from radians to degrees so it matches the external degree-based API used by within_radius

@FAbdullah17
Author

> @FAbdullah17 and @Raj-Taware: I see your point. How about we make so that everything external to SpatialCoordinateValuesSelector uses degrees but we internally convert to radians? That way anywhere where in arguments to graph creation are using geographical projections the angle is always assumed to be in degrees?

@leifdenby I agree, that's a good approach. I'll update SpatialCoordinateValuesSelector so that:

  • External API: all haversine inputs (radius) and outputs (distances) are in degrees.
  • Internal: convert degrees ↔ radians as needed for the BallTree.

This keeps geographic workflows intuitive while preserving the correct radian‑based computations under the hood.
Does this plan sound good? I'll start implementing once I have your approval.
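
The degrees-outside, radians-inside contract proposed here can be sketched with a small brute-force class. This is a hypothetical stand-in that assumes lon/lat input in degrees and uses a direct haversine formula instead of BallTree; `DegreeHaversineSelector` and its internals are invented for illustration and are not the PR's SpatialCoordinateValuesSelector.

```python
import math


class DegreeHaversineSelector:
    """Hypothetical brute-force stand-in, not the PR's SpatialCoordinateValuesSelector."""

    def __init__(self, coords_lonlat_deg):
        # Store points as (lat_rad, lon_rad): radians are used internally only.
        self._points = [
            (math.radians(lat), math.radians(lon)) for lon, lat in coords_lonlat_deg
        ]

    def _angles_rad(self, point_lonlat_deg):
        """Central angle in radians from the query point to every stored point."""
        lon, lat = point_lonlat_deg
        lat1, lon1 = math.radians(lat), math.radians(lon)
        angles = []
        for lat2, lon2 in self._points:
            a = (
                math.sin((lat2 - lat1) / 2) ** 2
                + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
            )
            angles.append(2 * math.asin(min(1.0, math.sqrt(a))))
        return angles

    def k_nearest_to(self, point_lonlat_deg, k=1):
        angles = self._angles_rad(point_lonlat_deg)
        order = sorted(range(len(angles)), key=angles.__getitem__)[:k]
        # Convert back to degrees at the API boundary.
        return order, [math.degrees(angles[i]) for i in order]

    def within_radius(self, point_lonlat_deg, radius_deg):
        radius_rad = math.radians(radius_deg)  # degrees in, radians inside
        angles = self._angles_rad(point_lonlat_deg)
        hits = sorted((ang, i) for i, ang in enumerate(angles) if ang <= radius_rad)
        return [i for _, i in hits], [math.degrees(ang) for ang, _ in hits]


selector = DegreeHaversineSelector([(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)])
idxs, dists_deg = selector.within_radius([0.0, 0.0], radius_deg=15.0)
nearest_idx, nearest_deg = selector.k_nearest_to([-10.0, 0.0], k=1)
print(idxs, dists_deg, nearest_idx, nearest_deg)
```

Both query methods accept and return degrees, so a caller never sees a radian value; only the comparison and the angle computation happen in radians.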

Member

@leifdenby leifdenby left a comment

One minor thing to fix :) Also, remember to add a changelog entry

The number of levels in the hierarchical mesh graph.
distance_metric : {'euclidean', 'haversine'}
Distance metric used when computing inter-level nearest-neighbour edges
and storing edge ``"len"`` attributes. Pass ``'haversine'`` when *xy*
Member

Sorry for only picking up on this now, but I think we should write coords rather than xy, since xy may suggest that we are expecting Cartesian grid coordinates, whereas these routines work on both Cartesian and geographical coordinates.

Member

actually, let's keep this for now, we can fix this when we merge in #81

Author

Agreed – let's keep it as is for now and address naming consistently in #81.

tree_point = self._prepare_query_point(point)
raw_dists, raw_idxs = self._tree.query(tree_point, k=k)
indices = raw_idxs.flatten()
distances = raw_dists.flatten()
Member

the raw_dists need to be converted back to degrees here so that everything external to SpatialCoordinateValuesSelector is in degrees as you've done for within_radius

sel = SpatialCoordinateValuesSelector("haversine", simple_geo_coords)
idxs, dists = sel.k_nearest_to([0.0, 0.0], k=1)
assert idxs[0] == 0
assert dists[0] == pytest.approx(0.0, abs=1e-3)
Member

Can you change this test so we expect the value to be non-zero? That way you can check that the value is the expected value in degrees, i.e. query with [-10.0, 0.0] and check that the distance is 10.0 (since the distances should be returned in degrees).

Author

Addressed in 0d75615 : updated the haversine k_nearest_to test to use a non-zero query ([-10.0, 0.0]) and assert a 10.0 degree distance, so the test directly verifies degree-based distance outputs.

sel = SpatialCoordinateValuesSelector("haversine", simple_geo_coords)
idxs, dists = sel.k_nearest_to([0.0, 0.0], k=2)
# nearest is self (0 rad), second is [10, 0] = deg2rad(10)
expected_rad = np.deg2rad(10.0)
Member

compare with k_nearest_to() you can see that there isn't need to convert from radians to degrees there. These methods should be symmetric, both should return distances in degrees

Author

Addressed in cbd09e9: aligned remaining haversine k_nearest_to/wrap-around/integration expectations and wording to degree-based outputs for symmetry with within_radius.

@FAbdullah17
Author

> One minor thing to fix :) Also, remember to add a changelog entry

Addressed in 97f0e94: added a changelog entry for CRS-aware degree-based metric handling.

Member

@leifdenby leifdenby left a comment

You are a 🌟 ! Just one last very small suggestion to change the test, using any is a bit too lenient here, let's check the actual values

idxs, dists = sel.within_radius([0.0, 0.0], radius=radius_deg)
assert 0 in idxs # self
assert 1 in idxs # 10° lon away
assert any(d == pytest.approx(10.0, rel=1e-4) for d in dists)
Member

Rather than any here, can't we just assert with numpy.testing.assert_almost_equal how far away we expect the two points to lie, i.e. [0.0, 10.0]? Presumably within_radius will return the points ordered by distance, no? Otherwise we could maybe sort the indexes and distances by distance here in the test.

Author

Addressed in ee70506: replaced the any(...) check with a deterministic assertion by sorting returned neighbours by distance in the test, then asserting expected indices [0, 1] and distances [0.0, 10.0] using NumPy test helpers
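
The sort-then-assert pattern described in this reply might look like the sketch below. It uses only the standard library and a made-up query result in arbitrary order; with NumPy available, `np.argsort` and `np.testing.assert_allclose` could play the same roles. The helper name `sort_by_distance` is an assumption for this example.

```python
def sort_by_distance(indices, distances):
    """Return indices and distances reordered by ascending distance."""
    pairs = sorted(zip(distances, indices))
    return [i for _, i in pairs], [d for d, _ in pairs]


# Made-up radius-query result in arbitrary order, for illustration only.
raw_idxs, raw_dists = [1, 0], [10.0, 0.0]

idxs, dists = sort_by_distance(raw_idxs, raw_dists)

# Deterministic checks on every value, instead of `any(...)` over the distances.
assert idxs == [0, 1]
assert all(abs(d - e) < 1e-4 for d, e in zip(dists, [0.0, 10.0]))
```

Sorting first means the test no longer depends on the order in which the radius query happens to return its hits.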

@leifdenby
Member

@FAbdullah17 could you have a look at fixing the failing tests please and I will do another review?

@FAbdullah17
Author

FAbdullah17 commented Mar 25, 2026

> @FAbdullah17 could you have a look at fixing the failing tests please and I will do another review?

@leifdenby I've fixed the failing tests and hope all checks will pass now; addressed in 44b2cd7.

@leifdenby
Member

there are still two failing tests @FAbdullah17 :)

@FAbdullah17
Author

> there are still two failing tests @FAbdullah17 :)

Addressed in 949a71d: re-checked the CI failures. The root cause was a missing mesh_node_distance in the distance-metric notebook cells. Fixed that, and a full local run now passes: 197 passed, 0 failed.

Development

Successfully merging this pull request may close these issues.

[Feat] Auto-detect distance metric (haversine/euclidean) from coordinate CRS

4 participants