Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support (Multi)Point,FeatureCollection and GeometryCollection in bboxClip #2814

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

stevage
Copy link
Collaborator

@stevage stevage commented Jan 15, 2025

Fixes #2813

Also, small change to existing behaviour to return Multi* objects with no coordinates instead of defective Polygon and LineString objects. That is, instead of a Polygon with coordinates: [] which is in violation of the GeoJSON spec, you get a MultiPolygon with coordinates: [] which is ok.

Possibly this would count as a breaking change? But would anyone really rely on that?

Please provide the following when creating a PR:

…Clip.

Return Multi* objects with no coordinates instead of defective Polygon and LineString objects.
@stevage
Copy link
Collaborator Author

stevage commented Jan 15, 2025

Oh, a note: it would be nice to have the type signature capture this logic:

  • if you pass a Point, LineString or Polygon, you get the same thing back, or the Multi version
  • otherwise you get the same thing back

I tried. TypeScript defeated me though.

@smallsaucepan
Copy link
Member

instead of a Polygon with coordinates: [] which is in violation of the GeoJSON spec

3.1 suggests that is actually ok? "GeoJSON processors MAY interpret Geometry objects with empty "coordinates" arrays as null objects."

Not saying that's any more intuitive. Just seems odd to go from 0 results = multipoint, 1 result = point, 2 or more results = back to multipoint. Would we be better off returning null? Would be easier to check in calling code.

Possibly this would count as a breaking change? But would anyone really rely on that?

It would be breaking if we started returning a wider variety of types than before.

I tried. TypeScript defeated me though.

Where did this come unstuck? Were you trying function overloading? Or some other mechanism?

@stevage
Copy link
Collaborator Author

stevage commented Jan 17, 2025

3.1 suggests that is actually ok? "GeoJSON processors MAY interpret Geometry objects with empty "coordinates" arrays as null objects."

Yeah, but since Turf is generating the data, that would essentially be forcing all apps using Turf to adopt the non-standard interpretation.

Not saying that's any more intuitive. Just seems odd to go from 0 results = multipoint, 1 result = point, 2 or more results = back to multipoint. Would we be better off returning null? Would be easier to check in calling code.

It's intuitive if you think of the rules of English:

  • 0 pointS
  • 1 point
  • 2 pointS
  • ...

The Multi- strategy is imho the best for consistency.

  • Return null for a polygon with no points does not work for a FeatureCollection, because it can't contain null values. So Polygons in a FeatureCollection would have to behave differently from a single Polygon.
  • It's pretty questionable whether a Point can have a [] coordinates array. The spec in 3.1.1 says "one position in the case of a Point geometry, " and it is clear that a position must be at least two numbers. So potentially the Point case would have to behave differently from LineString and Polygon.

There is theoretically also the option of returning a feature with a null geometry, which would be legal, but weird in its own way.

What I like about my approach:

  • the output features are 100% legal and extremely likely to be handled correctly by any application without special handling. (Returning a null could easily break code and cause exceptions when people try to dereference properties on the returned feature).
  • it is already the case that LineStrings and Polygons might get turned into Multi- geometries, so any application author already has to consider that possibility.

I did consider that perhaps this behaviour could be an option, like { returnNulls: true} as a third argument. If true, it would return null for a single feature outside the Bbox, or just drop them from a FeatureCollection. That could be a future enhancement though.

It would be breaking if we started returning a wider variety of types than before.

It's already true that you can pass a Polygon and get back a MultiPolygon. By coincidence, if anyone is using logic like if (returnedPolygon.geometry.coordinates.length === 0) that will actually still work.

Where did this come unstuck? Were you trying function overloading? Or some other mechanism?

It was a big chain of conditional generics (... ? ... : ... ). I should have made a copy of the attempt, oops.

@stevage
Copy link
Collaborator Author

stevage commented Jan 17, 2025

Pinging @mourner as the original author in case he has any opinions.

@smallsaucepan
Copy link
Member

All fair points about null geometries in features, etc @stevage.

It's intuitive if you think of the rules of English:

0 pointS
...

Hadn't thought of it that way. I suppose I'm looking at it from the POV of a user who wants to know "was there anything returned in the clip area?" They would have to do this right?

if (result.type === "Feature" &&
  result.geometry.type = "MultiWhatever" &&
  result.geometry.coordinates.length === 0) {}

I understand the approach you're suggesting makes sense if you just feed the result straight into a loop to process the points somehow. If it's empty the loop runs zero times, no dramas.

If the user wants to make a decision at that point though they need to know how to detect an empty result. That first test above would require a pretty good understanding of geojson to get right.

How about returning null for the entire result e.g.

if (result === null) {}

We do that in other places (e.g. intersect, difference) so there is a precedent.

It would be breaking if we started returning a wider variety of types than before.

It's already true that you can pass a Polygon and get back a MultiPolygon.

True, however now we're also talking about returning Points, MultiPoints, and FeatureCollections. If client code isn't handling those already (and I don't think they would as 7.2.0 doesn't mention them) that would probably start generating type errors.

Where did this come unstuck? Were you trying function overloading? Or some other mechanism?

It was a big chain of conditional generics (... ? ... : ... ). I should have made a copy of the attempt, oops.

Oh god. I hope you've recovered. Something like this could be an option (generics omitted for line length purposes):

// Overloaded typedefs
function bboxClip(feature: Feature<Point | MultiPoint>, bbox: BBox): Feature<Point | MultiPoint> | FeatureCollection;
function bboxClip(feature: Feature<Line | MultiLine>, bbox: BBox): Feature<Line | MultiLine> | FeatureCollection
function bboxClip(feature: Feature<Polygon | MultiPolygon>, bbox: BBox): Feature<Line | MultiLine> | FeatureCollection;
// plus options for FeatureCollections and GeometryCollections ...

// Actual function implementation is a superset of all the above
function bboxClip(feature: Feature<Point | MultiPoint | Line | MultiLine | Polygon | MultiPolygon>, bbox: BBox): Feature<Point | MultiPoint | Line | MultiLine | Polygon | MultiPolygon> | FeatureCollection {
  // do the stuff
}

It's reasonably readable - definitely more so than 15 levels of nested ternary statements!

Big picture, thinking of Turf-wide consistency, I think it's useful to have a tidy way to indicate to users there was no result, whatever that means. I believe this PR with its current scope will be a breaking change (cause of other return types), so wondering if that changes the conversation at all?

@stevage
Copy link
Collaborator Author

stevage commented Jan 21, 2025

I suppose I'm looking at it from the POV of a user who wants to know "was there anything returned in the clip area?"

It's a pity we don't also have a function that more directly answers that question, a "booleanCrossesBbox" or something.

They would have to do this right?

if (result.type === "Feature" &&
  result.geometry.type = "MultiWhatever" &&
  result.geometry.coordinates.length === 0) {}

I think they just have to do something like this:

if (! result?.geometry?.coordinates?.length > 0) {  }

If the user wants to make a decision at that point though they need to know how to detect an empty result. That first test above would require a pretty good understanding of geojson to get right.

I don't think this is true.

How about returning null for the entire result e.g.

This is much more problematic, because now we force the user to do the check, because it's very likely any subsequent code will throw errors. I really don't see "nothing in the clipped area" as a special case warranting special handling or thrown exceptions, and I think the function should behave accordingly.

True, however now we're also talking about returning Points, MultiPoints, and FeatureCollections. If client code isn't handling those already (and I don't think they would as 7.2.0 doesn't mention them) that would probably start generating type errors.

Those types are never returned because those types aren't accepted as arguments currently. If we can get the type signature of the function correct, it won't cause any generated type errors, I think.

Something like this could be an option (generics omitted for line length purposes):

Ah yes, overloads looks definitely like the right approach here.

I believe this PR with its current scope will be a breaking change (cause of other return types), so wondering if that changes the conversation at all?

Again, if we get the type signature right, I don't think that's the case.

Before: Pass a Polygon, get either a MultiPolygon or Polygon back. After, the same. But also if you happen to pass a Point, now you get either a Point or a MultiPoint instead of it just being a type error. I don't think that's a breaking change?

Incidentally, what do you think about the idea of an options flag to request nulls instead of Multis?

@smallsaucepan
Copy link
Member

Again, if we get the type signature right, I don't think that's the case.

Ok I'm willing to be optimistic, though think we'll have to take the overloading approach to stand a chance.

if (! result?.geometry?.coordinates?.length > 0) { }

As long as the user can narrow down which types might come back, agree that should be enough. So the possible permutations would be (types in -> types out):

  1. Point | MultiPoint -> Point | MultiPoint
  2. LineString | MultiLineString -> LineString | MultiLineString
  3. Polygon | MultiPolygon -> Polygon | MultiPolygon
  4. GeometryCollection -> GeometryCollection
  5. FeatureCollection -> FeatureCollection

And the user adjusts their "empty result test" to suit. That cover all the bases?

Incidentally, what do you think about the idea of an options flag to request nulls instead of Multis?

Inclined to just settle on one approach and keep the interface simple. Let's press ahead with Multis as you originally suggested.

@stevage
Copy link
Collaborator Author

stevage commented Jan 22, 2025

Ok I'm willing to be optimistic, though think we'll have to take the overloading approach to stand a chance.

I think I have achieved this now in my latest commit.

I haven't thoroughly tested the type resolution, but my basic tests seem to give the right result. I'm guessing we don't have any actual TypeScript type testing in Turf?

The new type definitions are more specific than previously. Before there wasn't even a defined return type.

A wrinkle I hadn't considered was that if you pass a non-specific geometry type in, you should just get a non-specific geometry out. Fortunately, that case seems to work without losing the specificity of the other cases.

And the user adjusts their "empty result test" to suit. That cover all the bases?

I don't think think the user has to adjust anything. If they are doing this:

const out = bboxClip(myPolygon, bbox);
if (out.geometry.coordinates.length === 0) console.log('nothing in clip area')

it will still work.

If for some reason they have written:

const out = bboxClip(myPolygon, bbox);
if (out.geometry.type === 'Polygon' && out.geometry.coordinates.length === 0) console.log('nothing in clip area')

then this will not work.

Worth noting that the documentation did not make any promises about what happens when a polygon is not within the clip area. So now it is much clearer. Or to put it differently, it changes an undefined (and incorrect) behaviour to a defined, and correct behaviour.

@stevage
Copy link
Collaborator Author

stevage commented Jan 22, 2025

Btw after doing a bit more close reading, I should correct some of my statements:

  • a Polygon with coordinates [] is not in violation of the spec, as far as I can tell.
  • a LineString with coordinates [] is: "3.1.4. LineString
    For type "LineString", the "coordinates" member is an array of two or more positions."

FWIW, geojson.io takes the same interpretation: a LineString with [] coordinates is invalid.

Also, the current behaviour of bboxClip is that if you pass a MultiLineString containing a single LineString it will return a LineString back. (Just an oversight in the original implementation I think). So maintaining the original geometry type does not appear particularly sacrosanct.

@stevage
Copy link
Collaborator Author

stevage commented Jan 22, 2025

So...after all that (sorry!), I've reverted back to returning a Polygon with empty coordinates array when the Polygon is outside the clip region. I think I was confusing each ring (which needs at least 4 members) with the coordinates array itself (which doesn't seem to require more than 0 rings).

It seems I also at certain points thought mistakenly that bboxClip was currently returning a LineString with [] coordinates in that case, but it's not - it already returned a MultiLineString. So no behaviour change required there.

I did change the behaviour so that if you pass a single-item MultiLineString, you get a single-item MultiLineString back. That just seemed like a bug.

@smallsaucepan
Copy link
Member

Lol. All good. Taking a fresh look at the latest commit.

A wrinkle I hadn't considered was that if you pass a non-specific geometry type in, you should just get a non-specific geometry out.

Would you mind dropping an example of what that would look like? Had a go locally and TS is refusing to accept anything I'm doing to prompt the situation.

So maintaining the original geometry type does not appear particularly sacrosanct.

Not sure we could even if we wanted to. For example, what would be your intuition for the return type of the below (passing in a Polygon)?

Screenshot 2025-01-22 at 22 29 45

Btw, seems like the current behaviour merges the bbox with the clipped multiple polygons to be able to return a polygon. Not sure if that was intentional to keep the same return type or some other reason. See #2816.

@stevage
Copy link
Collaborator Author

stevage commented Jan 22, 2025

A wrinkle I hadn't considered was that if you pass a non-specific geometry type in, you should just get a non-specific geometry out.

Would you mind dropping an example of what that would look like? Had a go locally and TS is refusing to accept anything I'm doing to prompt the situation.

Well, this:

const test = { type: "Point", coordinates: [0, 0] } as Geometry;
const out = bboxClip(test, bbox);

Hovering over out in VS Code gives

image

So maintaining the original geometry type does not appear particularly sacrosanct.

Not sure we could even if we wanted to. For example, what would be your intuition for the return type of the below (passing in a Polygon)?

Indeed. To be more precise, I should have said "maintaining the original geometry type, in cases where it's possible, .."

Btw, seems like the current behaviour merges the bbox with the clipped multiple polygons to be able to return a polygon. Not sure if that was intentional to keep the same return type or some other reason. See #2816.

Yeah, I made some assumptions about the current behaviour in this whole PR, but I suspect the current behaviour produces defective results in a lot of places. It's really just a very simplistic wrapper around lineClip which doesn't deal with the complexities of clipping polygons.

@smallsaucepan
Copy link
Member

Well, this:

Of course. Thanks.

I suspect the current behaviour produces defective results in a lot of places

Guess we cross the bridge when we come to it. Feel the return types are generous enough now that when we fix the behaviour the types won't have to change again.

Do you have any thoughts on how to adapt the JSDoc to handle the multiple overloads? There should be a params / returns combo for each to make clear the output types are constrained by the inputs e.g. put in a Polygon and you know you're never going to get a FeatureCollection.

Screenshot 2025-01-23 at 13 01 31

@stevage
Copy link
Collaborator Author

stevage commented Jan 23, 2025 via email

@smallsaucepan
Copy link
Member

Will take a look. Thanks for working through it 👍

@stevage
Copy link
Collaborator Author

stevage commented Jan 29, 2025

Ok, I have updated the doc.

Also, since it seems that currently polygons are not split into MultiPolygons I have updated the doc to reflect that, and added test cases. I do think it should be fixed, but that's covered in #2816.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Suggestion: Generalise bboxClip to support all GeoJSON object types
2 participants