Resurrect copy optimization #1093

tgregg · 2025-09-19T23:31:36Z

Issue #, if available:
Resolves #381

Description of changes:
This optimization allows for byte-copy from the reader's binary Ion stream to the writer's binary Ion stream under the following conditions:

The reader's data source is a byte array
The reader's symbol table is a subset of the writer's symbol table (i.e. all of the symbol IDs in the source point to the same text in the destination context, though the destination context may have additional symbol IDs with mappings)
The value to be copied is not annotated (sidesteps annotation wrapper merging)
The reader is not in a struct (sidesteps verbatim field name transfer)

These requirements can be difficult to satisfy (particularly the symbol table requirement, because it requires using some advanced APIs), but when satisfied and the optimization is enabled (via IonBinaryWriterBuilder.withStreamCopyOptimized(true)), the performance benefits are substantial because it avoids deserialization/re-serialization of the values that qualify. In 1.11.0 we dropped support for this optimization to keep code complexity as low as possible, based on research that revealed minimal usage of the optimization. We recently became aware of a user that had not upgraded from 1.10.2 that had come to rely on the performance of the optimization. This PR adds back the optimization to allow such users to achieve the same or better performance as before 1.11.0 was released.
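The four conditions listed above can be read as a single predicate. The following is a minimal sketch of that reading, not ion-java's actual implementation: the class `StreamCopyEligibility`, both method names, and the modeling of a symbol table as a plain `List<String>` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the eligibility rules; not ion-java's actual implementation. */
final class StreamCopyEligibility {

    /**
     * True when every symbol ID in the reader's table maps to the same text in the
     * writer's table, i.e. the reader's symbol list is a prefix of the writer's.
     */
    static boolean isSymbolTableSubset(List<String> readerSymbols, List<String> writerSymbols) {
        if (readerSymbols.size() > writerSymbols.size()) {
            return false;
        }
        // List.equals compares element by element and tolerates null entries.
        return writerSymbols.subList(0, readerSymbols.size()).equals(readerSymbols);
    }

    /** Combines the four conditions from the PR description. */
    static boolean canByteCopy(boolean sourceIsByteArray,
                               List<String> readerSymbols,
                               List<String> writerSymbols,
                               boolean valueIsAnnotated,
                               boolean readerIsInStruct) {
        return sourceIsByteArray
            && !valueIsAnnotated
            && !readerIsInStruct
            && isSymbolTableSubset(readerSymbols, writerSymbols);
    }

    public static void main(String[] args) {
        List<String> reader = Arrays.asList("name", "id");
        List<String> writer = Arrays.asList("name", "id", "extra");
        System.out.println(canByteCopy(true, reader, writer, false, false)); // all conditions hold
        System.out.println(canByteCopy(true, writer, reader, false, false)); // destination lacks "extra"
        System.out.println(canByteCopy(true, reader, writer, true, false));  // annotated value
    }
}
```

Only the first call qualifies. In the other two, the value would go through the ordinary deserialize and re-serialize path.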

Reviewers can review each of the three commits in this PR individually to see how it evolved.

"Resurrects the stream copy optimization feature..." - contains the minimal amount of code to make the feature work as before and satisfy the existing tests. However, the implementation using Facets was ugly (addressed in the second commit), the reference-comparison used to compare reader and writer symbol tables was not correct, and the comparisons were not optimized for the internal SymbolTable implementations used by the writer and reader (addressed in the third commit).
"Don't implement byte transfer as a facet" - there was no need to use the Facet system for this, though this was how it was implemented pre-1.10.5. Simply implementing the interface in the reader works just fine and is cleaner.
"Improves performance when stream copy optimization is enabled..." - This adds complexity but achieves two things: 1) corrects the "symtab extends cache" logic, which previously stored and compared references to the writer and reader symbol tables to optimize the comparisons. The writer had moved to a single "final" internal SymbolTable reference that it mutated internally, which would have caused this check to pass even if the symbol table mappings changed. 2) optimizes the comparisons between the writer and reader tables by using a private interface that allows for zero-copy access to the internals.
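The stale-cache problem described in the third commit can be reproduced in isolation. The sketch below is a simplified model, not the code from this PR: `MutableTable`, `ReferenceCache`, and `writerExtendsReader` are hypothetical names, and symbol tables are reduced to lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StaleReferenceCacheDemo {

    /** Hypothetical stand-in for a symbol table that is mutated in place. */
    static final class MutableTable {
        final List<String> symbols = new ArrayList<>();

        void resetTo(String... newSymbols) {
            symbols.clear();
            symbols.addAll(Arrays.asList(newSymbols));
        }
    }

    /** Ground truth: is the reader's symbol list a prefix of the writer's? */
    static boolean writerExtendsReader(MutableTable writer, List<String> reader) {
        return reader.size() <= writer.symbols.size()
            && writer.symbols.subList(0, reader.size()).equals(reader);
    }

    /** Flawed cache: remembers only which objects were compared last time. */
    static final class ReferenceCache {
        private MutableTable lastWriter;
        private List<String> lastReader;
        private boolean lastResult;

        boolean check(MutableTable writer, List<String> reader) {
            if (writer == lastWriter && reader == lastReader) {
                return lastResult; // Stale if either table changed in place.
            }
            lastWriter = writer;
            lastReader = reader;
            lastResult = writerExtendsReader(writer, reader);
            return lastResult;
        }
    }

    public static void main(String[] args) {
        MutableTable writer = new MutableTable();
        writer.resetTo("a", "b", "c");
        List<String> reader = Arrays.asList("a", "b");
        ReferenceCache cache = new ReferenceCache();

        System.out.println(cache.check(writer, reader));         // computed fresh
        writer.resetTo("x", "y", "z");                           // same reference, new mappings
        System.out.println(cache.check(writer, reader));         // cached answer, now wrong
        System.out.println(writerExtendsReader(writer, reader)); // recomputed answer
    }
}
```

Because the writer object's identity never changes, the second `check` returns the cached `true` even though the mappings no longer match. A cache keyed on references alone cannot detect in-place mutation.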

This change has the following goals:

Provide the same or better performance as 1.10.5 when stream copy optimization is enabled and used.
Provide the same or better performance as 1.11.10 (the current version) when stream copy optimization is not used.

Goal 1:

The benchmark uses IonWriter.writeValue(IonReader), with the stream copy optimization enabled, to merge nested containers from one binary Ion stream into another. The symbol table of the writer is manually set to be the same as the reader's.

1.10.5:
Throughput: 9.984 ops/s
Allocation rate: 231 MB/op

1.11.0 (optimization removed):
Throughput: 4.268 ops/s
Allocation rate: 258 MB/op

This PR:
Throughput: 10.344 ops/s ✅
Allocation rate: 226 MB/op ✅

Goal 2:

Benchmarking code: ion-java-benchmark-cli, modified so that the read command tests IonWriter.writeValues(IonReader) with stream copy optimization disabled. I will formally add an option to the tool that does this in a separate PR.

CLI command: ./benchmark-cli read --mode AverageTime --time-unit microseconds --iterations 2 --warmups 2 --forks 2 --ion-reader non_incremental --io-type buffer <file>

Deeply nested single value

1.11.10

Time: 44.937 us/op
Allocation rate: 32 KB/op

This PR

Time: 40.474 us/op ✅ (note: this speedup is aided by an optimization in IonWriter.writeValues that I will include in a separate PR)
Allocation rate: 30 KB/op ✅

Large stream of Ion log data

1.11.10

Time: 878604 us/op
Allocation rate: 236 MB/op

This PR

Time: 779020 us/op ✅ (note: this speedup is aided by an optimization in IonWriter.writeValues that I will include in a separate PR)
Allocation rate: 236 MB/op ✅

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

philstrong

Approved by Amazon Profiler team

jobarr-amzn · 2025-09-24T15:26:42Z

src/main/java/com/amazon/ion/impl/IonReaderContinuableApplicationBinary.java

+    boolean compareSymbolsArrayToCollection(String[] arr, int arrayLength, Collection<String> collection) {
+        // Precondition: the collection contains at least as many elements as the array.
+        Iterator<String> collectionIterator = collection.iterator();
+        for (int i = 0; i < arrayLength; i++) {
+            if (!safeEquals(arr[i], collectionIterator.next())) {
+                return false;
+            }
+        }
+        return true;
+    }


Would i < arrayLength && collectionIterator.hasNext() be too expensive? How about a check if (collection.size() < arrayLength) return false;? I assume the precondition is actually that the collection contains at least arrayLength elements, not arr.length elements? Isn't another precondition here that arrayLength <= arr.length?

Suggested change

-    boolean compareSymbolsArrayToCollection(String[] arr, int arrayLength, Collection<String> collection) {
-        // Precondition: the collection contains at least as many elements as the array.
-        Iterator<String> collectionIterator = collection.iterator();
-        for (int i = 0; i < arrayLength; i++) {
-            if (!safeEquals(arr[i], collectionIterator.next())) {
-                return false;
-            }
-        }
-        return true;
-    }
+    boolean compareSymbolsArrayToCollection(String[] arr, int arrayLength, Collection<String> collection) {
+        if (arr.length < arrayLength || collection.size() < arrayLength) return false;
+        Iterator<String> collectionIterator = collection.iterator();
+        for (int i = 0; i < arrayLength; i++) {
+            if (!safeEquals(arr[i], collectionIterator.next())) {
+                return false;
+            }
+        }
+        return true;
+    }

This suggestion ought to cover all these, please straighten me out if I've got it wrong :)
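To make the failure mode behind this suggestion concrete, here is a small self-contained sketch comparing the unguarded and guarded forms. It is illustrative only: `Objects.equals` stands in for the PR's `safeEquals` (whose definition is not shown here), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Objects;

final class GuardedCompareDemo {

    /** Unguarded form: relies on callers to honor the precondition. */
    static boolean compareUnguarded(String[] arr, int arrayLength, Collection<String> collection) {
        Iterator<String> it = collection.iterator();
        for (int i = 0; i < arrayLength; i++) {
            // Objects.equals is an assumed stand-in for safeEquals.
            if (!Objects.equals(arr[i], it.next())) {
                return false;
            }
        }
        return true;
    }

    /** Guarded form: rejects inputs that violate either precondition. */
    static boolean compareGuarded(String[] arr, int arrayLength, Collection<String> collection) {
        if (arr.length < arrayLength || collection.size() < arrayLength) {
            return false;
        }
        return compareUnguarded(arr, arrayLength, collection);
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b"};
        Collection<String> tooShort = Arrays.asList("a");

        System.out.println(compareGuarded(arr, 2, tooShort)); // guard rejects the short collection
        try {
            compareUnguarded(arr, 2, tooShort);
        } catch (NoSuchElementException e) {
            System.out.println("unguarded form ran off the end of the collection");
        }
    }
}
```

The guard costs one `size()` call per comparison. Whether that is acceptable on a hot path depends on the collection implementation.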

jobarr-amzn · 2025-09-24T16:36:16Z

src/main/java/com/amazon/ion/impl/IonReaderContinuableApplicationBinary.java

+        // Superset must have same/more declared (local) symbols than subset.
+        if (numberOfLocalSymbols > otherLocal.getNumberOfLocalSymbols()) return false;
+
+        Collection<String> otherSymbols = otherLocal.getLocalSymbolsNoCopy();
+        if (!compareSymbolsArrayToCollection(symbols, numberOfLocalSymbols, otherSymbols)) {
+            return false;
+        }


Now I see that one of the checks I added in my suggestion lives outside the method. Is it correct to push it down? It looks to me like it should be.

tgregg · 2025-09-24

I'm going to leave it as-is. otherLocal.getNumberOfLocalSymbols() is subtly different from otherSymbols.size(): the no-copy collection is allowed to overwrite entries without clearing, if that's what it wants to do. I'm also comfortable leaving out some of the normal safety checks, given that these methods are internal and we can rely on other internal constraints, such as the one that requires our arrays/collections to be at least as large as the related "getNumberOf.." methods say they are.
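One way a logical symbol count and a collection's `size()` can diverge is sketched below. This is a hypothetical buffer written for illustration, not ion-java's internal SymbolTable: only the two accessor names are borrowed from the diff above, and the reuse-without-clearing behavior is an assumption based on the comment.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical buffer that reuses its backing storage without clearing it. */
final class ReusableSymbolBuffer {
    private final List<String> backing = new ArrayList<>();
    private int count = 0;

    void add(String symbol) {
        if (count < backing.size()) {
            backing.set(count, symbol); // overwrite a stale slot
        } else {
            backing.add(symbol);
        }
        count++;
    }

    /** Logically empties the buffer; stale entries stay in the backing list. */
    void reset() {
        count = 0;
    }

    /** Number of valid symbols. */
    int getNumberOfLocalSymbols() {
        return count;
    }

    /** Zero-copy view; only the first getNumberOfLocalSymbols() entries are valid. */
    Collection<String> getLocalSymbolsNoCopy() {
        return Collections.unmodifiableList(backing);
    }

    public static void main(String[] args) {
        ReusableSymbolBuffer buffer = new ReusableSymbolBuffer();
        buffer.add("a");
        buffer.add("b");
        buffer.add("c");
        buffer.reset();
        buffer.add("x");
        System.out.println(buffer.getNumberOfLocalSymbols());      // valid symbols
        System.out.println(buffer.getLocalSymbolsNoCopy().size()); // backing entries, including stale ones
    }
}
```

In such a design, a check based on `size()` would be looser than one based on the declared count, which is consistent with keeping the count comparison at the call site.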

tgregg mentioned this pull request Sep 19, 2025

Shortens AbstractIonWriter.writeValues, allowing the JIT to optimize it more efficiently, improving performance by up to 12%. #1094

Merged

philstrong approved these changes Sep 24, 2025

jobarr-amzn approved these changes Sep 24, 2025

tgregg added 3 commits September 25, 2025 16:16

Resurrects the stream copy optimization feature that was dropped in v1.11.0.

a9f5380

Don't implement byte transfer as a facet.

b937612

Improves performance when stream copy optimization is enabled by not requiring symbol table copies.

3a5681f

tgregg force-pushed the resurrect-copy-optimization branch from 364465e to 3a5681f September 25, 2025 23:16

Merge branch 'master' into resurrect-copy-optimization

5e5b868

tgregg merged commit 6d406b1 into master Sep 30, 2025
16 checks passed

tgregg deleted the resurrect-copy-optimization branch September 30, 2025 19:07
