Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sortKeys: Boolean flag to ujson/upickle APIs #553

Merged
merged 8 commits into from
Feb 9, 2024
Merged

Add sortKeys: Boolean flag to ujson/upickle APIs #553

merged 8 commits into from
Feb 9, 2024

Conversation

lihaoyi
Copy link
Member

@lihaoyi lihaoyi commented Feb 3, 2024

This PR allows anyone writing JSON to pass in sortKeys = true to make the object keys get output in sorted order.

Major changes:

  1. Introduced a new upickle.core.BufferedValue type. This is a sealed trait hierarchy that can be used to capture all the method calls received by a upickle.core.Visitor, and re-play them later. It replaces IndexedValue, which has JSON-specific idiosyncracies (e.g. converting MsgPack visitor calls to their JSON equivalents) whereas BufferedValue precisely captures the Visitor method calls unchanged. IndexedValue is left for binary compatibility but no longer used internally, even its ordering use case of buffering up things until a $type tag is found has been replaced by BufferedValue

  2. As part of the generalization from IndexedValue to BufferedValue, we now need to handle non-JSON usage patterns such as non-string-keyed objects (e.g. found in msgpack). This makes sorting these objects non-trivial, as their keys can now be arbitrary BufferedValues. BufferedValue.valueToSortKey is a rough hack to traverse any structured keys and make them sortable.

  3. Intercept all transform calls in the primary upickle/ujson APIs with BufferedValue.maybeSortKeysTransform. This checks the new sortKeys boolean flag, and if true it intercepts the visitor calls, captures them in a BufferedValue, recursively traverses and sorts the BufferedValue's object keys, then re-plays it back out

  4. Binary compatibility stubs have been added to all methods who received a new sortKeys: Boolean = true parameter. These stubs were not marked as @Deprecated, because in some cases the Scala compiler would preferentially resolve the stub over the method-with-default-value, causing spurious deprecation warnings.

Testing

  1. Sorting functionality is covered by new sortKeys unit tests in uJson and uPickle, and has been added to the TestUtil.scala round-trip tests to ensure that all existing tests exercise the a subset of the sortKeys = true codepaths for JSON and MsgPack and verify that the round-tripping is still successful

Notes

  1. It took a surprising amount of code to make this work well, as it goes against uPickle's "streaming-no-intermediate-data-structures" way of doing things. sortKeys = true has to buffer up some kind of intermediate JSON AST somewhere. I would expect performance to be worse than the default zero-intermediate-datastructures visitor approach, and so optimizing the sortKeys = true code paths was not a priority in this PR. If anyone wants maximum performance, they can leave it as the default sortKeys = false

  2. The whole Transformer/transform abstraction in the codebase is kind of half-baked and unsatisfactory, but I left it mostly as is for now with only small tweaks to make things work (e.g. allowing the implementation of BufferedValue.maybeSortKeysTransform to live in upickle.core). Realistically, the whole name Transformer and the concept it represents probably needs to be overhauled, but that's out of scope for this PR

  3. TestUtil.scala is getting big and messy, and I'm adding to the mess. It's probably OK to leave for now, but it could definitely use a cleanup

@lihaoyi lihaoyi changed the title Add sortKeys: Boolean flag to ujson/upack/upickle APIs Add sortKeys: Boolean flag to ujson/upickle APIs Feb 3, 2024
@lihaoyi lihaoyi requested review from lefou and lolgab February 3, 2024 13:49
Copy link
Member

@lefou lefou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't dug deep into the upickle code, and can't comment on the design choices. I have some general remarks though.

@@ -10,10 +10,12 @@ import upickle.core.{Visitor, ObjVisitor, ArrVisitor, Abort, AbortException}
* want to work with an AST but still provide source-index error positions if
* something goes wrong
*/
@deprecated("Left for binary compatibility, use upickle.core.BufferedValue instead")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecations should always come with a version, so we can clean them up at some point later.

Suggested change
@deprecated("Left for binary compatibility, use upickle.core.BufferedValue instead")
@deprecated("Left for binary compatibility, use upickle.core.BufferedValue instead", "upickle after 3.1.4")

}

// Binary Compatibility Stub
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be a @deprecated annotation with version, so we can better clean up later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4. Binary compatibility stubs have been added to all methods who received a new sortKeys: Boolean = true parameter. These stubs were not marked as @Deprecated, because in some cases the Scala compiler would preferentially resolve the stub over the method-with-default-value, causing spurious deprecation warnings.

Do you have an example where this happens? This can be most likely worked around by specifying all parameters of the new def, including the defaults. Or we could add a : @nowarn if we think it's ok from a bincompat perspective.

If the compiler is still using a deprecated def, than it's clearly not just a stub to keep backward compatibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the case was something like

// actual method
def foo(a: Int, b: Int = 0) = ...

// bincompat stub
def foo(a: Int) = foo(a, b = 0)

foo(1) // This calls the bincompat stub

The issue is that this would cause warnings in downstream user code, not just within uPickle's own codebase. I don't want to have users have to sprinkle @nowarn annotations to silence meaningless deprecation warnings

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a vague memory, the hack to make that work was to reduce visibility with some scoped protected/private stuff, which probably works, as these restrictions are (only) compile-time enforced but not at runtime. So code that links to these methods, will be able to use it, but the compiler will not. But I think MiMa will still complain. But I'm not 100% sure anymore.

We could probable just remove the default argument and go with classic overloading. Maybe, leave a note for when we prepare the next breakage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't manage to get it to work using various combinations of private; either the deprecation warning would fire, or mima would complain

For now just left comments for all the binary compatibility stub methods. Should be easy to grep for them later

@lihaoyi lihaoyi merged commit 0b81593 into main Feb 9, 2024
8 checks passed
@lefou lefou deleted the sortkeys branch February 9, 2024 17:54
lihaoyi added a commit that referenced this pull request Feb 16, 2024
Turn on tests to verify
#386

Scala 3 tests were accidentally disabled in
#553, this re-enables them

The big-case-class tests do work, with the caveat you need to set
`-Xmax-inlines` to be greater than the width of your case class, and the
compilation is notably slower on 3.3.1 than in Scala 213.11. However,
the performance seems to be better in 3.4.0, and `-Xmax-inlines` is no
longer necessary.

For now, I just add `-Xmax-inlines` and accept the compilation slowness,
so as to not break anyone using Scala 3.3.1. Eventually, as people
upgrade to 3.4.0, the problem can be expected to go away
@julienrf
Copy link

Hey @lihaoyi, thank you for improving upickle. However such a change should lead to a new minor release as recommended here.

This is unfortunate because your library claims to follow the versioning scheme semver-spec but in practice it does not follow it, which could lead to bad surprises.

@lihaoyi
Copy link
Member Author

lihaoyi commented Feb 16, 2024

This change is meant to be source and binary compatible. Your link states that such a change should be a patch version

@ckipp01 has raised some issues with our mima config that caused some breakage to slip through. That's a seprate issue: it's just a bug and I will fix it and cut a new issue

@julienrf
Copy link

In general, adding a new method can break the source compatibility. However, since in your case you added an overload maybe the situation is different, I am not 100% sure.

@lefou
Copy link
Member

lefou commented Feb 16, 2024

sortkeys is new API, hence we need to release it as a new minor version. The binary compatibility shims added here are just technical details. But if they would be not there, we would need to bump the major version, as it results in some API removals.

In a patch release, no API changes are allowed, only internal changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants