Skip to content

Conversation

tgregg
Copy link
Contributor

@tgregg tgregg commented Sep 19, 2025

Issue #, if available:
Resolves #381

Description of changes:
This optimization allows for byte-copy from the reader's binary Ion stream to the writer's binary Ion stream under the following conditions:

  • The reader's data source is a byte array
  • The reader's symbol table is a subset of the writer's symbol table (i.e. all of the symbol IDs in the source point to the same text in the destination context, though the destination context may have additional symbol IDs with mappings)
  • The value to be copied is not annotated (sidesteps annotation wrapper merging)
  • The reader is not in a struct (sidesteps verbatim field name transfer)

These requirements can be difficult to satisfy (particularly the symbol table requirement, because it requires using some advanced APIs), but when satisfied and the optimization is enabled (via IonBinaryWriterBuilder.withStreamCopyOptimized(true)), the performance benefits are substantial because it avoids deserialization/re-serialization of the values that qualify. In 1.11.0 we dropped support for this optimization to keep code complexity as low as possible, based on research that revealed minimal usage of the optimization. We recently became aware of a user that had not upgraded from 1.10.2 that had come to rely on the performance of the optimization. This PR adds back the optimization to allow such users to achieve the same or better performance as before 1.11.0 was released.

Reviewers can review each of the three commits in this PR individually to see how it evolved.

  1. "Resurrects the stream copy optimization feature..." - contains the minimal amount of code to make the feature work as before and satisfy the existing tests. However, the implementation using Facets was ugly (addressed in the second commit), the reference-comparison used to compare reader and writer symbol tables was not correct, and the comparisons were not optimized for the internal SymbolTable implementations used by the writer and reader (addressed in the third commit).
  2. "Don't implement byte transfer as a facet" - there was no need to use the Facet system for this, though this was how it was implemented pre-1.10.5. Simply implementing the interface in the reader works just fine and is cleaner.
  3. "Improves performance when stream copy optimization is enabled..." - This adds complexity but achieves two things: 1) corrects the "symtab extends cache" logic, which previously stored and compared references to the writer and reader symbol tables to optimize the comparisons. The writer had moved to a single "final" internal SymbolTable reference that it mutated internally, which would have caused this check to pass even if the symbol table mappings changed. 2) optimizes the comparisons between the writer and reader tables by using a private interface that allows for zero-copy access to the internals.

This change has the following goals:

  1. Provide the same or better performance as 1.10.5 when stream copy optimization is enabled and used.
  2. Provide the same or better performance as 1.11.10 (the current version) when string copy optimization is not used.

Goal 1:

The benchmark uses IonWriter.writeValue(IonReader), with the stream copy optimization enabled, to merge nested containers from one binary Ion stream into another. The symbol table of the writer is manually set to be the same as the reader's.

1.10.5:
Throughput: 9.984 ops/s
Allocation rate: 231 MB/op

1.11.0 (optimization removed):
Throughput: 4.268 ops/s
Allocation rate: 258 MB/op

This PR:
Throughput: 10.344 ops/s ✅
Allocation rate: 226 MB/op ✅

Goal 2:

Benchmarking code: ion-java-benchmark-cli, modified so that the read command tests IonWriter.writeValues(IonReader) with stream copy optimization disabled. I will formally add an option to the tool that does this in a separate PR.

CLI command: ./benchmark-cli read --mode AverageTime --time-unit microseconds --iterations 2 --warmups 2 --forks 2 --ion-reader non_incremental --io-type buffer <file>

Deeply nested single value

1.11.10

Time: 44.937 us/op
Allocation rate: 32 KB/op

This PR

Time: 40.474 us/op ✅ (note: this speedup is aided by an optimization in IonWriter.writeValues that I will include in a separate PR)
Allocation rate: 30 KB/op ✅

Large stream of Ion log data

1.11.10

Time: 878604 us/op
Allocation rate: 236 MB/op

This PR

Time: 779020 us/op ✅ (note: this speedup is aided by an optimization in IonWriter.writeValues that I will include in a separate PR)
Allocation rate: 236 MB/op ✅

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Copy link

@philstrong philstrong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved by Amazon Profiler team

Comment on lines +234 to +243
boolean compareSymbolsArrayToCollection(String[] arr, int arrayLength, Collection<String> collection) {
// Precondition: the collection contains at least as many elements as the array.
Iterator<String> collectionIterator = collection.iterator();
for (int i = 0; i < arrayLength; i++) {
if (!safeEquals(arr[i], collectionIterator.next())) {
return false;
}
}
return true;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would i < arrayLength && i.hasNext() be too expensive? How about a check if (collection.size() < arrayLength) return false;? I assume the precondition is actually that the collection contains at least arrayLength elements, not array.length elements? Isn't another precondition here that arrayLength <= arr.length?

Suggested change
boolean compareSymbolsArrayToCollection(String[] arr, int arrayLength, Collection<String> collection) {
// Precondition: the collection contains at least as many elements as the array.
Iterator<String> collectionIterator = collection.iterator();
for (int i = 0; i < arrayLength; i++) {
if (!safeEquals(arr[i], collectionIterator.next())) {
return false;
}
}
return true;
}
boolean compareSymbolsArrayToCollection(String[] arr, int arrayLength, Collection<String> collection) {
if (arr.length < arrayLength || collection.size() < arrayLength) return false;
Iterator<String> collectionIterator = collection.iterator();
for (int i = 0; i < arrayLength; i++) {
if (!safeEquals(arr[i], collectionIterator.next())) {
return false;
}
}
return true;
}

This suggestion ought to cover all these, please straighten me out if I've got it wrong :)

Comment on lines +309 to +315
// Superset must have same/more declared (local) symbols than subset.
if (numberOfLocalSymbols > otherLocal.getNumberOfLocalSymbols()) return false;

Collection<String> otherSymbols = otherLocal.getLocalSymbolsNoCopy();
if (!compareSymbolsArrayToCollection(symbols, numberOfLocalSymbols, otherSymbols)) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see that one of the checks I added in my suggestion lives outside the method. Is it correct to push it down? It looks to me like it should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave it as-is. otherLocal.getNumberOfLocalSymbols() is subtly different than otherSymbols.size() (allowing for the no-copy collection to overwrite without clearing if that's what it wants to do). I'm also comfortable leaving out some of the normal safety checks given that these methods are internal and we can rely on other internal constraints, such as the one that requires our arrays/collections to be at least as large as the related "getNumberOf.." methods say they are.

@tgregg tgregg force-pushed the resurrect-copy-optimization branch from 364465e to 3a5681f Compare September 25, 2025 23:16
@tgregg tgregg merged commit 6d406b1 into master Sep 30, 2025
16 checks passed
@tgregg tgregg deleted the resurrect-copy-optimization branch September 30, 2025 19:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement _Private_ByteTransferReader facet in IonReaderContinuableCoreBinary
3 participants