Pure Julia WKT2 to PROJJSON conversion #156

Omar-Elrefaei · 2025-04-24T20:16:47Z

Working towards #150
/claim #150

Todo:

Better and consistent error msgs
Figure out the proper workaround of the semi_minor_axis inverse_flattening discrepancy (4275, 4267)
- Elaborate on the find_diff_paths workaround.
Adjust test/jsonutils.jl to project style
~~Decide whether tests can actually depend on DeepDiffs~~ Found better alternative solution.
Comply with JuliaFormatter

Tools for live development and debugging

# Use the DeepDiffs package to view any differences between GDAL's projjson (j1) and ours (j2).
# Differences are shown in red/green. A few insignificant keys are filtered from j1,
# though there are still more false-negatives than with the deltapaths filtering.
function deepdiffprojjson(j1::J, j2::J) where {J<:Union{Dict}}
  j1 = deepcopy(j1)
  keystodelete = ["bbox", "area", "scope", "usages", "\$schema"]
  for k in keystodelete
    delete!(j1, k)
  end
  try
    delete!(j1["base_crs"], "coordinate_system")
  catch
    nothing
  end
  diff = deepdiff(j1, j2)
  return diff
end

# For live development or debugging.
# Run checkprojjson(4275) to see differences between the current and expected JSON output.
# The presence of red/green colored output does not necessarily mean that there is a bug to be fixed.
# If the last printed line is an empty vector, the colored diff is likely a superfluous difference.
function checkprojjson(crs::Int; verbose=true)
  gdaljson = gdalprojjsondict(EPSG{crs})
  wktdict = GeoIO.epsg2wktdict(crs)
  jsondict = GeoIO.wkt2json(wktdict)

  # Show pretty-printed WKT if possible
  if verbose && isdefined(Main, :PrettyPrinting)
    @info "Parsed WKT"
    pprintln(wktdict)
  elseif verbose
    @warn "Formatted printing of WKT or JSON is unavailable because PrettyPrinting is not loaded"
  end

  # Show deep diff if possible
  if verbose && isdefined(Main, :DeepDiffs)
    @info "DeepDiff"
    display(deepdiffprojjson(gdaljson, jsondict))
  elseif verbose
    @warn "Detailed colored output is unavailable because DeepDiffs is not loaded."
  end

  diffkeys = deltaprojjson(gdaljson, jsondict)
  @info "JSON keys with a potentially significant difference from expected output:"
  display(diffkeys)
end

juliohm · 2025-04-28T17:18:27Z

@Omar-Elrefaei this looks promising. Please let me know when it is ready for review.

nsajko · 2025-04-28T18:07:27Z

Some deep issues with the design here:

Calling Meta.parse on the WKT string can't be correct and is obviously a huge hack, no offence. It's parsing foreign code as Julia code!
- In particular WKT strings have completely different escaping than Julia strings, the former escape by repeating the double quotes, while the latter escape with backslashes.
- It could also perhaps be considered a security vulnerability, unless the WKT strings are completely vetted upfront on each update of the EPSG data base.
WKT keywords are case-insensitive, furthermore in some cases several alternative names are possible for the same keyword, so hardcoding the names like here can't be correct.

My design is more complex in part because it tackles these issues and more.

juliohm · 2025-04-28T18:28:09Z

Thank you for sharing @nsajko. Appreciate your inputs.

If you can evolve your PR to a final version to show the technical advantages in practice, we will happily consider it.

Both PRs are still work in progress, but it is really nice to see the high quality of the attempts already.

Omar-Elrefaei · 2025-05-01T17:30:23Z

I think this is functionally mostly there, save for a few rare edge cases.
The code itself would take another round of cleanup though, especially the comments.

A few of the concerns brought up by nsajko are justified, some are design tradeoffs, and some are a non-issue, I think. I'll take time to elaborate soon.

Omar-Elrefaei · 2025-05-05T18:54:27Z

> Calling Meta.parse on the WKT string can't be correct and is obviously a huge hack, no offence. It's parsing foreign code as Julia code!

> It could also perhaps be considered a security vulnerability, unless the WKT strings are completely vetted upfront on each update of the EPSG data base.

It's kind of a hack, I won't disagree. But I think it is an elegant one.
I wouldn't say it is parsing WKT as "actual" Julia code. It is more like parsing/mapping the WKT into an Expr object. We never call eval. Instead, I walk through that Expr object and populate a Dict based on what I find.
To illustrate:

julia> """hello[123, "world"]""" |> Meta.parse
:(hello[123, "world"])

julia> """hello[123, "world"]""" |> Meta.parse |> eval
ERROR: UndefVarError: `hello` not defined in `Main`
...

julia> """hello[123, "world"]""" |> Meta.parse |> GeoIO.expr2dict
Dict{Symbol, Vector{Any}} with 1 entry:
  :hello => [123, "world"]

I'm not a security expert, but I doubt there is any potential for a security vulnerability here.
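As a rough illustration of the "walk the Expr and populate a Dict" idea above, here is a minimal sketch. The name `sketch_expr2dict` is hypothetical and this is not the PR's `GeoIO.expr2dict`; it only assumes that a WKT-like node `KEYWORD[...]` parses to an `Expr` with head `:ref`, and it never calls `eval`.

```julia
# Hypothetical sketch, not the PR's implementation.
# Meta.parse turns `KEYWORD[a, b, ...]` into Expr(:ref, :KEYWORD, a, b, ...).
# We only inspect that tree; nothing is ever evaluated.
function sketch_expr2dict(ex::Expr)
  ex.head === :ref || error("expected a KEYWORD[...] expression, got head $(ex.head)")
  key = ex.args[1]::Symbol
  # Nested KEYWORD[...] nodes are converted recursively; literals are kept as-is.
  vals = Any[a isa Expr ? sketch_expr2dict(a) : a for a in ex.args[2:end]]
  Dict{Symbol,Vector{Any}}(key => vals)
end

d = sketch_expr2dict(Meta.parse("""hello[123, "world"]"""))
```

Anything that does not have the `KEYWORD[...]` shape is rejected with an error rather than executed, which is the property the reply above relies on.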

> In particular WKT strings have completely different escaping than Julia strings, the former escape by repeating the double quotes, while the latter escape with backslashes.

> WKT keywords are case-insensitive, furthermore in some cases several alternative names are possible for the same keyword, so hardcoding the names like here can't be correct.

While your design is definitely more academically proper, I took a much more test-driven approach with it. Before starting with anything JSON-related, I ran all 7000+ WKT files in the dataset through that expr2dict function and they all passed. grep shows zero occurrences of "" in the wkt directory. Yes, if there's ever one in the future, that might be a problem.
Regarding alternate names, it doesn't seem to be a problem for the files we have on hand. Working around it wouldn't be too hard, even if it would make the code uglier.

Definitely a different set of tradeoffs, it is up to @juliohm to decide which is more appropriate for his project. But I'm definitely impressed at how quickly you wrote up that full-fledged parser.

It is partially my bad for not having any clarifications about design decisions up front in the PR.
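One possible shape for the alternate-name workaround mentioned above is to normalize every keyword before looking it up. This is only a sketch: `WKT_ALIASES` and `canonicalkeyword` are hypothetical names, and the alias table is a small, non-exhaustive assumption based on WKT2 keyword pairs such as GEOGCRS/GEOGRAPHICCRS.

```julia
# Hypothetical, non-exhaustive alias table: long or alternative WKT2 keywords
# mapped to the short form that the rest of the code would match on.
const WKT_ALIASES = Dict(
  :GEODETICCRS => :GEODCRS,
  :GEOGRAPHICCRS => :GEOGCRS,
  :PROJECTEDCRS => :PROJCRS,
  :SPHEROID => :ELLIPSOID,
  :PRIMEMERIDIAN => :PRIMEM,
  :GEODETICDATUM => :DATUM,
  :TRF => :DATUM
)

# Uppercase first (keywords are case-insensitive), then resolve aliases.
function canonicalkeyword(k::Symbol)
  u = Symbol(uppercase(string(k)))
  get(WKT_ALIASES, u, u)
end

canonicalkeyword(:GeographicCRS)
```

Normalizing once at the point where the Expr is walked would keep the hardcoded names elsewhere unchanged, at the cost of maintaining the table.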

juliohm · 2025-05-05T21:02:27Z

Looking forward to evaluating the pros and cons of each approach. Thank you all for the amazing work and considerations shared so far. It really helps!

Omar-Elrefaei · 2025-05-22T19:25:23Z

Do you mean that GDAL is sometimes picking a different alternative?

Yes.

This is totally independent from the floating point discussion.

nsajko · 2025-05-22T21:16:34Z

I wonder if testing against PROJ instead of against GDAL might be simpler.

juliohm · 2025-05-23T12:47:47Z

@Omar-Elrefaei is there any way to adjust the comparison function to check for these alternative representations? I understand that GDAL is arbitrary in these choices, so we don't have much choice other than checking that any alternative matches.

Please let me know if you need any additional input from me before making the final adjustments. The suggestion by @nsajko might be interesting to explore also.

Omar-Elrefaei · 2025-05-23T19:47:50Z

So you want us to be producing projjson that matches GDAL regarding these alternatives?

> @Omar-Elrefaei is there any way to adjust the comparison function to check for these alternative representations? I understand that GDAL is arbitrary in these choices, so we don't have much choice other than checking that any alternative matches.

I'm not sure what you are asking exactly.
We have to write our projjson totally independently from GDAL. WKT only supplies semi_major_axis and inverse_flattening, so we will always be writing them like that to projjson.
Is your suggestion the following? During testing, if GDAL provides semi_major_axis and inverse_flattening, compare their values to ours. If GDAL provides semi_major_axis and semi_minor_axis, calculate a semi_minor_axis from our values and make the comparison.

That feels like complicating it beyond what is needed, to be honest.

If you mean instead that we check that we produced at least one of the alternative representations: that is what happens in the isvalid schema check. I was thinking of actually proposing to add JSONSchema as an actual project dependency so we can always do a schema check before returning projjson to the user.
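For reference, the comparison described above could be sketched as follows. The helper names `semiminor` and `sameellipsoid` are hypothetical and not part of the PR; the only fact relied on is the standard relation b = a(1 - f), with f = 1/inverse_flattening.

```julia
# Hypothetical helpers, not part of the PR.
# Semi-minor axis b from semi-major axis a and inverse flattening 1/f: b = a * (1 - f).
semiminor(a, invf) = a * (1 - 1 / invf)

# `ours` is assumed to always carry semi_major_axis and inverse_flattening;
# `theirs` may carry either inverse_flattening or semi_minor_axis.
function sameellipsoid(ours::AbstractDict, theirs::AbstractDict; rtol=1e-8)
  a = ours["semi_major_axis"]
  isapprox(a, theirs["semi_major_axis"]; rtol) || return false
  if haskey(theirs, "inverse_flattening")
    isapprox(ours["inverse_flattening"], theirs["inverse_flattening"]; rtol)
  elseif haskey(theirs, "semi_minor_axis")
    isapprox(semiminor(a, ours["inverse_flattening"]), theirs["semi_minor_axis"]; rtol)
  else
    false
  end
end

# Illustration with the commonly published Clarke 1866 parameters.
ours = Dict("semi_major_axis" => 6378206.4, "inverse_flattening" => 294.978698213898)
theirs = Dict("semi_major_axis" => 6378206.4, "semi_minor_axis" => 6356583.8)
sameellipsoid(ours, theirs)
```

The relative tolerance is an arbitrary choice for the sketch; a real test would need to pick one that matches the precision of the stored values.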

juliohm · 2025-05-23T19:54:01Z

I believe we can always use inverse_flattening as you suggested. I'm just wondering how tests will pass in this case if the GDAL output has something else. My suggestion was to make the test comparison more elaborate for the ellipsoid parameters, but I believe that is not trivial to do given the way diffpaths simply scans the tree without special cases. How about we delay this decision until after the other remaining fixes?

Omar-Elrefaei · 2025-05-23T20:01:38Z

Yes we can delay the conversation.

> I'm just wondering how tests will pass in this case if the GDAL output has something else.

I simply ignored inverse_flattening in the comparison.

So yes, if there is a hypothetical instance where the numerical value of our inverse_flattening is different from GDAL's inverse_flattening, that will not be caught by the tests because inverse_flattening is always ignored. While it arguably ought to be caught, I assumed that is a trivial price we are willing to pay.

Is that what you want to avoid? Don't ignore it when we can do the comparison, and ignore it when we can't.
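To make the "always ignored" behavior concrete, here is a minimal sketch of a tree scan that reports the paths where two nested Dicts differ while skipping a set of keys. `sketchdiffpaths` is a hypothetical name; it is not the PR's diffpaths/deltaprojjson code and only illustrates the general idea.

```julia
# Hypothetical sketch of a path-based diff over nested Dicts and Vectors.
# Keys listed in `ignore` are skipped wherever they occur in the tree.
function sketchdiffpaths(a, b; ignore=Set{String}(), path=String[])
  paths = Vector{Vector{String}}()
  if a isa AbstractDict && b isa AbstractDict
    for k in union(collect(keys(a)), collect(keys(b)))
      k in ignore && continue
      sub = [path; k]
      if haskey(a, k) && haskey(b, k)
        append!(paths, sketchdiffpaths(a[k], b[k]; ignore, path=sub))
      else
        push!(paths, sub) # key present on one side only
      end
    end
  elseif a isa AbstractVector && b isa AbstractVector && length(a) == length(b)
    for (i, (x, y)) in enumerate(zip(a, b))
      append!(paths, sketchdiffpaths(x, y; ignore, path=[path; string(i)]))
    end
  elseif a != b
    push!(paths, copy(path)) # differing leaves, or containers of different shape
  end
  paths
end

ours = Dict("name" => "x", "ellipsoid" => Dict("semi_major_axis" => 1.0, "inverse_flattening" => 2.0))
gdal = Dict("name" => "x", "ellipsoid" => Dict("semi_major_axis" => 1.0, "semi_minor_axis" => 0.5))
sketchdiffpaths(ours, gdal; ignore=Set(["inverse_flattening", "semi_minor_axis"]))
```

With both ellipsoid keys in `ignore`, the scan reports nothing for this pair, which is exactly why a wrong inverse_flattening value would go unnoticed.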

juliohm · 2025-05-23T20:05:43Z

Yes, something along the lines of your last sentence. If we are producing the wrong ellipsoid parameters and the tests don't catch that, we might take too long to uncover the bug. But if it is something that requires too much more coding, please ignore this idea for now. We can come back to it in a separate issue.

juliohm · 2025-05-23T20:35:59Z

Thinking about it more carefully... If GDAL doesn't match the ellipsoid parameters with or without conversion of inverse_flattening, then this is a bug in GDAL. We are consuming the official WKT from the database, so there is no way the bug is on our side. Please ignore my previous comments about trying to compare the parameters like this. I think we can simply ignore them. If we were writing projjson from scratch, then that would be a different story. We are just converting strings, assuming the input string is undeniably correct.

juliohm · 2025-05-26T14:07:43Z

@Omar-Elrefaei I believe we only have two remaining issues to discuss before the merge:

The need for the jsonroundtrip test utility. We can ask the community for help with a MWE that is unrelated to CRS.
The try-catch block in the projjson utility. I believe we should simply remove the block as it is hiding potential issues upon saving in GPQ.save in extra/gis.jl.

Omar-Elrefaei · 2025-05-26T14:42:37Z

> The need for the jsonroundtrip test utility. We can ask the community for help with a MWE that is unrelated to CRS.

I'll try.

> The try-catch block in the projjson utility. I believe we should simply remove the block as it is hiding potential issues upon saving in GPQ.save in extra/gis.jl.

I agree. I didn't add that. It was in the original code and it did annoy me with some silent failures at some point.

juliohm · 2025-05-26T14:48:41Z

I will remove the try-catch to see if tests pass 👍🏽

@nsajko any idea of what might be causing the JSON behavior in the jsonroundtrip here?

juliohm · 2025-05-26T14:54:47Z

I remember now that this try-catch block is handling the fallback Cartesian CRS. We can sort this out in a separate PR later. Let's just focus on the jsonroundtrip issue.

Omar-Elrefaei · 2025-05-26T16:34:55Z

Argh!!! 😫 🤯 Caught it.
(Screenshots of the debugging session; conclusion at the end. I will push fixes in less than an hour.)

I.e., at the end, the difference between pre- and post-jsonroundtrip is:

[screenshot]

I had used split(...)[1] at some point in the code, which returns a SubString. I just need to pass that to string.

juliohm · 2025-05-26T16:44:13Z

Wow! That is annoying! I don't know why the strings are considered different just because of their types. Maybe a bug in Julia? Maybe expected behavior that is counter-intuitive? Thanks for diving into it!

…he jsondict I found the reason `jsonroundtrip` was needed for `JSONSchema.validate` to work properly. Turns out there is a bug in JSONSchema that makes it faultily deal with the underlying String behind a SubStrings

Omar-Elrefaei · 2025-05-26T17:50:00Z

Not a bug in Julia, but in JSONSchema. A String is == to an equal SubString, but not ===. That is expected.
What was not expected is JSONSchema not taking the SubString at face value as equivalent to a String.

Peeking at their code, it seems that they should use AbstractString in one specific place. I'll open an issue.
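The Julia behavior described above can be reproduced without JSONSchema. The snippet below is only an illustration of String versus SubString semantics; the input string "EPSG:4326" is an arbitrary example, not taken from the PR.

```julia
# split returns SubStrings that view into the original String.
parts = split("EPSG:4326", ":")
code = parts[1]

code isa SubString{String}   # true
code isa String              # false
code == "EPSG"               # true: same characters
code === "EPSG"              # false: different types, so never identical

# Converting yields a plain String that type-sensitive code will accept.
fixed = string(code)
fixed isa String             # true
```

Code that dispatches on `String` rather than `AbstractString` will therefore treat the two values differently even though they compare equal with `==`.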

juliohm · 2025-05-26T18:04:43Z

> Peeking at their code, it seems that they should use AbstractString in one specific place. I'll open an issue.

Thank you! Please link the issue here for future reference.

juliohm · 2025-05-26T18:18:28Z

Thank you again for another amazing contribution @Omar-Elrefaei. That was a tough one. Be assured that this will have a huge impact in our ecosystem ❤️ 🫶

add testing against json schema

e052fa8

Omar-Elrefaei changed the title ~~WKT2 to Julia Dict to PROJJSON~~ Pure Julia WKT2 to PROJJSON conversion Apr 24, 2025

Omar-Elrefaei force-pushed the wkt2expr2json branch from 00449b1 to f4d782e Compare April 24, 2025 20:21

half-complete implementation for processing GEOGCRS

24b4323

Omar-Elrefaei force-pushed the wkt2expr2json branch from f4d782e to 24b4323 Compare April 24, 2025 20:25

Omar-Elrefaei added 2 commits April 24, 2025 16:25

use DeepDiffs.jl for semantic comparison between nested Dicts

5626679

add processing for GEOGCRS(Variant 1: ENSEMBLE)

5e63081

Omar-Elrefaei added 8 commits April 30, 2025 07:45

add processing for GEOGCRS(Variant 2: DATUM)

7d55128

handle UNIT optionally occurring in coordinate_system or each axis

65f6d2b

testing utils

27af91c

testing

39814a3

add processing for GEODCRS

1a5e749

add processing for DYNAMIC entries is GEOG and GEOD.

4bac566

These end up effecting the "datum" json items

uncommitted changes

02e91a1

add processing for PROJCRS

bf44043

Omar-Elrefaei marked this pull request as ready for review May 1, 2025 17:27

replace the actual projjsonstring function

016f543

Omar-Elrefaei added 7 commits May 5, 2025 17:46

remove ArchGDAL dependency

33fac70

Revert "remove ArchGDAL dependency"

d0a3eba

This reverts commit 33fac70. turns out ArchGDAL is still needed for other functionality

solid cleanup round

65ef7e0

refactor testing utils

65a49f1

side step EPSGs unsupported by the WKT dataset

822213f

Fix 8% of cases where axis.direction should not start with uppercase

844f960

add support for meridian and prime_meridian optional nodes

8975ebb

Omar-Elrefaei and others added 6 commits May 23, 2025 18:10

minor adjustments as per PR review

b86df1a

organize and elaborate on testing edge cases

d1024b6

remove helper debugging functions

34ef409

Final cleanup

54ed56f

Final code style fixes

1cac3ed

Move functions around

5eb7466

juliohm reviewed May 26, 2025

src/crsstrings.jl (outdated, resolved)

Additional cleanup

9931b58

juliohm reviewed May 26, 2025

src/crsstrings.jl (resolved)

Omar-Elrefaei requested a review from juliohm May 26, 2025 17:50

juliohm approved these changes May 26, 2025

juliohm merged commit 25fcaf3 into JuliaEarth:master May 26, 2025
8 checks passed
