@mmusgrov mmusgrov commented Oct 28, 2025

Draft PR testing Franz's fix for JBTM-4014. Do not merge, but please review if you are interested.

The perf tests are to compare pr/2414 against main. The results on my laptop indicate:

STMBenchmark.baselineBenchmark: pr/2414 is 0.2% worse
LocalJTABenchmark.benchmark: pr/2414 is 10% better
STMBenchmark.benchmark: pr/2414 is 0.25% worse

though, as always with JMH, there is significant run-to-run variance.

@mmusgrov mmusgrov added the Hold label Oct 28, 2025
@jbosstm-bot

Tests failed (https://jenkins-csb-narayana-ci.dno.corp.redhat.com/job/btny-pulls-performance/35/): Performance rebase on main failed. Please rebase it manually

@@ -0,0 +1,57 @@
/*
* Copyright (c) 2014, Oracle America, Inc.
Member

Copyright change

Member Author

Sorted. It comes from the command:

mvn archetype:generate -DarchetypeGroupId=org.openjdk.jmh -DarchetypeArtifactId=jmh-java-benchmark-archetype

which is a bit annoying: the header is generated whenever -DarchetypeArtifactId=jmh-java-benchmark-archetype is used.

@mmusgrov mmusgrov force-pushed the JBTM-4014-perf-test branch 2 times, most recently from 58b523d to 0023ea3 Compare October 28, 2025 17:13
@jbosstm-bot

Tests failed (https://jenkins-csb-narayana-ci.dno.corp.redhat.com/job/btny-pulls-performance/36/): Performance rebase on main failed. Please rebase it manually

@mmusgrov mmusgrov force-pushed the JBTM-4014-perf-test branch from 0023ea3 to d2e55f8 Compare October 28, 2025 17:21

@mmusgrov mmusgrov force-pushed the JBTM-4014-perf-test branch from d2e55f8 to a34de71 Compare October 28, 2025 17:27

@jbosstm-bot

Tests failed (https://jenkins-csb-narayana-ci.dno.corp.redhat.com/job/btny-pulls-performance/37/): Product comparison benchmark failed

franz1981 commented Oct 30, 2025

I've created franz1981@549ba66 to benchmark before/after jbosstm/narayana#2414 (including the second commit to reduce GC while creating file break string form of Uid).

I've run the test with JDK 21 and

-gc true --jvmArgsAppend="-XX:+UseParallelGC -Xms16g -Xmx16g -XX:+AlwaysPreTouch"

in order to:

  • have a simple GC algorithm that shows the amount of work performed by the GC clearly
  • run a GC on every benchmark iteration, since the benchmark relies heavily on finalization and produces a lot of garbage
  • preset a heap large enough that the young generation is not collected too often

The results before jbosstm/narayana#2414:

Benchmark                                   Mode  Cnt      Score      Error   Units
STMBenchmark.benchmark                     thrpt   20  51389.836 ± 2670.038   ops/s
STMBenchmark.benchmark:gc.alloc.rate       thrpt   20   1162.278 ±   61.568  MB/sec
STMBenchmark.benchmark:gc.alloc.rate.norm  thrpt   20  23794.455 ±    7.439    B/op

whilst after:

Benchmark                                   Mode  Cnt      Score      Error   Units
STMBenchmark.benchmark                     thrpt   20  54270.687 ± 2843.686   ops/s
STMBenchmark.benchmark:gc.alloc.rate       thrpt   20   1112.618 ±   58.198  MB/sec
STMBenchmark.benchmark:gc.alloc.rate.norm  thrpt   20  21567.033 ±   20.929    B/op

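This shows a clear improvement: allocation per operation drops by about 9.4% (23794 to 21567 B/op) and throughput rises by about 5.6%, although the throughput error intervals of the two runs overlap.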
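The size of the change can be checked directly from the two result tables above. The following is a minimal sketch, not part of the benchmark itself: `JmhScoreComparison` and its methods are hypothetical names, and the overlap test simply compares the `score ± error` intervals that JMH prints.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Narayana or JMH) for comparing two JMH scores. */
final class JmhScoreComparison {

    /** Relative change from before to after in percent; positive means "after" is larger. */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    /** True when the intervals [score - error, score + error] of the two runs overlap. */
    static boolean intervalsOverlap(double scoreA, double errorA, double scoreB, double errorB) {
        return scoreA - errorA <= scoreB + errorB && scoreB - errorB <= scoreA + errorA;
    }

    public static void main(String[] args) {
        // Scores and errors copied from the before/after tables in this comment.
        double throughput = percentChange(51389.836, 54270.687);
        double allocPerOp = percentChange(23794.455, 21567.033);

        System.out.printf(Locale.ROOT, "throughput: %+.1f%%%n", throughput);   // prints "throughput: +5.6%"
        System.out.printf(Locale.ROOT, "alloc/op:   %+.1f%%%n", allocPerOp);   // prints "alloc/op:   -9.4%"

        // Throughput intervals overlap, so that gain is within the reported error.
        System.out.println(intervalsOverlap(51389.836, 2670.038, 54270.687, 2843.686)); // prints "true"
        // The normalized allocation intervals do not overlap.
        System.out.println(intervalsOverlap(23794.455, 7.439, 21567.033, 20.929));      // prints "false"
    }
}
```

The normalized allocation rate (B/op) is the more stable signal here, since its error is small relative to the difference between the two runs.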

While running this I've noticed a few other possible improvements (but, to be fair, I have no more cycles to add them...):

  1. too much finalization going on: this is likely finalizing the Lock instances. A more deterministic way to release them (e.g. AutoCloseable or something similar) would be better, especially since finalization is already deprecated.
  2. too many syscalls at https://github.com/jbosstm/narayana/blob/main/ArjunaCore/arjuna/classes/com/arjuna/ats/internal/arjuna/objectstore/ShadowingStore.java#L469-L493: it could rely on https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/nio/file/Files.html#deleteIfExists(java.nio.file.Path), which would make it a single syscall (see https://github.com/openjdk/jdk/blob/2c07214d7c075da5dd4a4e872aef29f58cef2bae/src/java.base/unix/native/libnio/fs/UnixNativeDispatcher.c#L1026 and https://linux.die.net/man/2/unlinkat)
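Both suggestions can be illustrated with plain JDK APIs. This is only a sketch under stated assumptions: `CleanupSketch` and `ScopedLock` are hypothetical names, `ReentrantLock` stands in for the STM Lock class (whose real lifecycle is not shown in this thread), and the code is not taken from ShadowingStore.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration; none of these names exist in Narayana. */
final class CleanupSketch {

    /**
     * Removes a file with one call instead of a separate existence check
     * followed by a delete. Returns false if the file was not there.
     */
    static boolean remove(Path file) throws IOException {
        return Files.deleteIfExists(file);
    }

    /**
     * Scoped resource released by try-with-resources rather than by a finalizer.
     * ReentrantLock is only a stand-in for whatever the real lock must release.
     */
    static final class ScopedLock implements AutoCloseable {
        private final ReentrantLock lock;

        ScopedLock(ReentrantLock lock) {
            this.lock = lock;
            lock.lock();
        }

        @Override
        public void close() {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("store-entry", ".tmp");
        System.out.println(remove(file)); // prints "true": the file existed and was deleted
        System.out.println(remove(file)); // prints "false": nothing left to delete

        ReentrantLock lock = new ReentrantLock();
        try (ScopedLock ignored = new ScopedLock(lock)) {
            System.out.println(lock.isHeldByCurrentThread()); // prints "true"
        }
        System.out.println(lock.isHeldByCurrentThread()); // prints "false": released at scope exit
    }
}
```

With try-with-resources the release happens at a known point in the code, so the garbage collector no longer has to track and finalize each instance.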

@marcosgopen
Member

@franz1981 I tested the STMBenchmark (with your improvements) against your PR (franz1981/narayana@57127b3) locally, but I didn't find a clear improvement. In my case the ops/s were "926.713 ± 61.033" without your commit and "920.995 ± 47.619" with it.

@franz1981

Did you use the modified benchmark at franz1981@549ba66 and verify that it runs on tmpfs?
I think it doesn't, because tmpfs is way faster than just 1K ops/sec.

@marcosgopen
Member

way faster than just 1K ops/sec

Oh, right. I probably missed that bit. I was using your updated version of the benchmark but I needed to create the folder "/tmp/stm-benchmark-store" otherwise the tmpfs was not used for me. Now I get

Benchmark                        Mode  Cnt           Score          Error  Units
STMBenchmark.baseLineBenchmark  thrpt   20  1821057287.839 ± 22972700.217  ops/s
STMBenchmark.benchmark          thrpt   20       28111.032 ±      687.338  ops/s

I will test the before/after versions again. Thanks!

@marcosgopen
Member

I now see an improvement with your PR. I now understand that tmpfs is needed; otherwise the disk becomes the performance bottleneck.

@marcosgopen
Member

My results (local environment) show an improvement of around 15-20% in STMBenchmark throughput:


    First run:
        with Franz's improvement: STMBenchmark.benchmark  thrpt   20  23035.442 ± 2802.177  ops/s
        without: STMBenchmark.benchmark  thrpt   20  19821.852 ± 3158.973  ops/s

    Second run:
        with Franz's improvement: STMBenchmark.benchmark  thrpt   20  28350.225 ± 559.184  ops/s
        without: STMBenchmark.benchmark  thrpt   20  23600.040 ± 2540.165  ops/s

Considerations:

  • I tested Franz's version of the benchmark, franz1981@549ba66
  • the command used is java -jar target/benchmarks.jar org.jboss.narayana.stm.STMBenchmark.benchmark -gc true --jvmArgsAppend="-XX:+UseParallelGC -Xms16g -Xmx16g -XX:+AlwaysPreTouch"
  • I needed to create the folder 'stm-benchmark-store' under /tmp to actually use the tmpfs filesystem.
  • I checked the /tmp folder with the command df -h, which reports: tmpfs **G ***M **G *% /tmp
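The directory and filesystem checks in the considerations above can also be done from Java before a run. This is a rough sketch, not part of the benchmark: `StoreDirCheck` is a hypothetical name, the default path is the one mentioned earlier in this thread, and the string returned by `FileStore.type()` is platform-dependent (on Linux it is expected to be the mount's filesystem type, such as `tmpfs`).

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-flight check; not part of the Narayana benchmark sources. */
final class StoreDirCheck {

    /** Creates the directory if it is missing and returns the type of the file store backing it. */
    static String prepare(Path dir) throws IOException {
        Files.createDirectories(dir); // no-op when the directory already exists
        FileStore store = Files.getFileStore(dir);
        return store.type();
    }

    public static void main(String[] args) throws IOException {
        // Default taken from the discussion above; pass another path as the first argument.
        Path dir = Path.of(args.length > 0 ? args[0] : "/tmp/stm-benchmark-store");
        String type = prepare(dir);
        System.out.println(dir + " is backed by file store type: " + type);
        if (!"tmpfs".equals(type)) {
            System.out.println("Warning: not tmpfs, so disk I/O may dominate the benchmark.");
        }
    }
}
```

Running this once before the benchmark avoids the situation described above, where a missing directory silently moved the store off tmpfs and throughput dropped to around 1K ops/s.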

Sanne commented Nov 6, 2025

Awesome work, all! Looking forward to updating Quarkus and our downstream benchmarks ;)
