Skip to content

Conversation

Omar-Elrefaei
Copy link
Contributor

@Omar-Elrefaei Omar-Elrefaei commented Apr 24, 2025

Working towards #150
/claim #150

Todo:

  • Better and consistent error msgs
  • Figure out the proper workaround of the semi_minor_axis inverse_flattening discrepancy (4275, 4267)
    • Elaborate on the find_diff_paths workaround.
  • Adjust test/jsonutils.jl to project style
  • Decide whether tests can actually depend on DeepDiffs Found better alternative solution.
  • Comply with JuliaFormatter
Tools for live development and debugging
# Use DeepDiffs package to view any differences between GDAL's projjson (j1) and our's (j2).
# Differences are shown red/green color. Few insignificant keys are filtered from (j1),
# though there are still more false-negatives than deltapaths filtering
function deepdiffprojjson(j1::J, j2::J) where {J<:Union{Dict}}
  j1 = deepcopy(j1)
  keystodelete = ["bbox", "area", "scope", "usages", "\$schema"]
  for k in keystodelete
    delete!(j1, k)
  end
  try
    delete!(j1["base_crs"], "coordinate_system")
  catch
    nothing
  end
  diff = deepdiff(j1, j2)
  return diff
end

# For live development or debugging.
# run checkprojjson(4275) to see differences between current and expected json output
# The presence of red/green colored output does not neccesarly mean that there is a bug to be fixed.
# If the last printed line is an empty vector, it means the colored diff is likely a superfluous difference.
function checkprojjson(crs::Int; verbose=true)
  gdaljson = gdalprojjsondict(EPSG{crs})
  wktdict = GeoIO.epsg2wktdict(crs)
  jsondict = GeoIO.wkt2json(wktdict)

  # Show pretty-printed WKT if possible
  if verbose && isdefined(Main, :PrettyPrinting)
    @info "Parsed WKT"
    pprintln(wktdict)
  elseif verbose
    @warn "Formatted printing of WKT or JSON is unavailable because PrettyPrinting is not loaded"
  end

  # Show deep diff if possible
  if verbose && isdefined(Main, :DeepDiffs)
    @info "DeepDiff"
    display(deepdiffprojjson(gdaljson, jsondict))
  elseif verbose
    @warn "Detailed colored output is unavailable because DeepDiffs is not loaded."
  end

  diffkeys = deltaprojjson(gdaljson, jsondict)
  @info "JSON keys with a potentially significant difference from expected output:"
  display(diffkeys)
end

@Omar-Elrefaei Omar-Elrefaei changed the title WKT2 to Julia Dict to PROJJSON Pure Julia WKT2 to PROJJSON conversion Apr 24, 2025
@juliohm
Copy link
Member

juliohm commented Apr 28, 2025

@Omar-Elrefaei this looks promising. Please let me know when it is ready for review.

@nsajko
Copy link

nsajko commented Apr 28, 2025

Some deep issues with the design here:

  • Calling Meta.parse on the WKT string can't be correct and is obviously a huge hack, no offence. It's parsing foreign code as Julia code!
    • In particular WKT strings have completely different escaping than Julia strings, the former escape by repeating the double quotes, while the latter escape with backslashes.
    • It could also perhaps be considered a security vulnerability, unless the WKT strings are completely vetted upfront on each update of the EPSG data base.
  • WKT keywords are case-insensitive, furthermore in some cases several alternative names are possible for the same keyword, so hardcoding the names like here can't be correct.

My design is more complex in part because it tackles these issues and more.

@juliohm
Copy link
Member

juliohm commented Apr 28, 2025

Thank you for sharing @nsajko. Appreciate your inputs.

If you can evolve your PR to a final version to show the technical advantages in practice, we will happily consider it.

Both PRs are still work in progress, but it is really nice to see the high quality of the attempts already.

@Omar-Elrefaei Omar-Elrefaei marked this pull request as ready for review May 1, 2025 17:27
@Omar-Elrefaei
Copy link
Contributor Author

I think this is functionally mostly there, save for few rare edge cases.
The code itself would take another round of cleanup tho, especially the comments.

few of the concerns brought up by nsajko are justifies, some are design tradeoffs, and some are a non-issue, I think. I'll take time to elaborate soon.

@Omar-Elrefaei
Copy link
Contributor Author

  • Calling Meta.parse on the WKT string can't be correct and is obviously a huge hack, no offence. It's parsing foreign code as Julia code!
    • It could also perhaps be considered a security vulnerability, unless the WKT strings are completely vetted upfront on each update of the EPSG data base.

It's kindof a hack, I won't disagree. But I think it is an elegant one.
I wouldn't say it is parsing WKT as "actual" Julia code. more like parsing/mapping the WKT into and Expr object. We never call eval. Instead, I walk through that Expr obj and populate a Dict based on what I find.
To illustrate

julia> """hello[123, "world"]""" |> Meta.parse
:(hello[123, "world"])

julia> """hello[123, "world"]""" |> Meta.parse |> eval
ERROR: UndefVarError: `hello` not defined in `Main`
...

julia> """hello[123, "world"]""" |> Meta.parse |> GeoIO.expr2dict
Dict{Symbol, Vector{Any}} with 1 entry:
  :hello => [123, "world"]

I'm not a security expert but I doubt there is any potential for a security vulnerability here

  • In particular WKT strings have completely different escaping than Julia strings, the former escape by repeating the double quotes, while the latter escape with backslashes.
  • WKT keywords are case-insensitive, furthermore in some cases several alternative names are possible for the same keyword, so hardcoding the names like here can't be correct.

While your design is definitely more academically proper, I took a much more test-driven approach with it. Before starting with anything JSON related I ran all 7000+ WKT files in the dataset through that expr2dict function and they all passed. grep shows zero occurrences of a "" in wkt directory. Yes, if there's ever one in the future that might be a problem.
Regarding alternate names, it doesn't seem to be a problem for the files we have on hand. Working around it wouldn't be too hard, even if it would make the code uglier.


Definitely a different set of tradeoffs, it is up to @juliohm to decide which is more appropriate for his project. But I'm definitely impressed at how quickly you wrote up that full-fledged parser.

It is partially my bad for not having any clarifications about design decisions up front in the PR.

@juliohm
Copy link
Member

juliohm commented May 5, 2025

Looking forward to evaluate the pros and cons of each approach. Thank you all for the amazing work and considerations shared so far. It really helps!

@Omar-Elrefaei
Copy link
Contributor Author

Do you mean that GDAL is sometimes picking a different alternative?

Yes.

This is totally independent from the floating point discussion.

@nsajko
Copy link

nsajko commented May 22, 2025

I wonder if testing against PROJ instead of against GDAL might be simpler.

@juliohm
Copy link
Member

juliohm commented May 23, 2025

@Omar-Elrefaei is there any way to adjust the comparison function to check for these alternative representations? I understand that GDAL is arbitrary in these choices, so we don't have much choice other than checking that any alternative matches.

Please let me know if you need any additional input from me before making the final adjustments. The suggestion by @nsajko might be interesting to explore also.

@Omar-Elrefaei
Copy link
Contributor Author

So you want us to be producing projjson that matches GDAL regarding these alternatives?

@Omar-Elrefaei is there any way to adjust the comparison function to check for these alternative representations? I understand that GDAL is arbitrary in these choices, so we don't have much choice other than checking that any alternative matches.

I'm not sure what are you asking exactly.
We have to write our projjson totally independently from GDAL. WKT only supplies semi_major_axis and inverse_flattening, so we will always be writing them like that to projjson.
Now is your suggestion the following: during testing if GDAL provides semi_major_axis and inverse_flattening, compare their values to ours. If GDAL provides semi_major_axis and semi_minor_axis, we calculate a semi_minor_axis from our values and make the comparison.

That feels like complicating it beyond what is needed to be honest.

If maybe you mean that we check that we produced at least of the alternative representations: that is what happens in the isvalid schema check. I as thinking of actually proposing add JSONSchema as an actual project dependency so we can always do a schema check before returning projjson to the user.

@juliohm
Copy link
Member

juliohm commented May 23, 2025 via email

@Omar-Elrefaei
Copy link
Contributor Author

Yes we can delay the conversation.

I'm just wondering how tests will pass in this case if the GDAL output has something else.

I simply ignored inverse_flattening from the comparison.

So yes, if there is a hypothetical instance where the numerical value of our inverse_flattening is different from GDAL's inverse_flattening: that will not be caught with the tests because inverse_flattening is always ignored. While it arguable ought to be caught; I assumed that is a trivial price we are willing to pay.

Is that what you want to avoid? Don't ignore it when we can do the comparison, and ignore it when we can't.

@juliohm
Copy link
Member

juliohm commented May 23, 2025 via email

@juliohm
Copy link
Member

juliohm commented May 23, 2025 via email

@juliohm
Copy link
Member

juliohm commented May 26, 2025

@Omar-Elrefaei I believe we only have two remaining issues to discuss before the merge:

  1. The need for the jsonroundtrip test utility. We can ask the community for help with a MWE that is unrelated to CRS.
  2. The try-catch block in the projjson utility. I believe we should simply remove the block as it is hiding potential issues upon saving in GPQ.save in extra/gis.jl.

@Omar-Elrefaei
Copy link
Contributor Author

Omar-Elrefaei commented May 26, 2025

  1. The need for the jsonroundtrip test utility. We can ask the community for help with a MWE that is unrelated to CRS.

I'll try.

  1. The try-catch block in the projjson utility. I believe we should simply remove the block as it is hiding potential issues upon saving in GPQ.save in extra/gis.jl.

I agree. I didn't add that. It was in the original code and it did annoy me with some silent failures at some point.

@juliohm
Copy link
Member

juliohm commented May 26, 2025

I will remove the try-catch to see if tests pass 👍🏽

@nsajko any idea of what might be causing the JSON behavior in the jsonrountrip here?

@juliohm
Copy link
Member

juliohm commented May 26, 2025

I remember now that this try-catch block is handling the fallback Cartesian CRS. We can sort this out in a separate PR later. Let's just focus on the jsonroundtrip issue.

@Omar-Elrefaei
Copy link
Contributor Author

Argh!!! 😫 🤯 Caught it.
(screenshot of debugging session. conclusion at the end. will send push fixes in less than an hour)

image
image
ie, at the end, the difference between pre and post jsonroundtrip is
image
I had used split(...)[1] at some point in the code, which returns a substring. just need to pass that to string

@juliohm
Copy link
Member

juliohm commented May 26, 2025 via email

…he jsondict

I found the reason `jsonroundtrip` was needed for `JSONSchema.validate`
to work properly. Turns out there is a bug in JSONSchema that makes it
faultily deal with the underlying String behind a SubStrings
@Omar-Elrefaei
Copy link
Contributor Author

Not a bug in Julia, but in JSONSchema. a string is == substring, but not ===. That is expected.
What was not expected, is JSONSchema not taking the SubString at face value as equivalent to a String.

Peaking at their code, seems that they should use AbstractString in one specific place. I'll open an issue.

@Omar-Elrefaei Omar-Elrefaei requested a review from juliohm May 26, 2025 17:50
@juliohm
Copy link
Member

juliohm commented May 26, 2025

Peaking at their code, seems that they should use AbstractString in one specific place. I'll open an issue.

Thank you! Please link the issue here for future reference.

@juliohm juliohm merged commit 25fcaf3 into JuliaEarth:master May 26, 2025
8 checks passed
@juliohm
Copy link
Member

juliohm commented May 26, 2025

Thank you again for another amazing contribution @Omar-Elrefaei. That was a tough one. Be assured that this will have a huge impact in our ecosystem ❤️ 🫶

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants