Skip to content

Conversation

@JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Oct 29, 2025

Pull Request Description

Adding support for "assert GC mode" into ContextUtils. The assertGC is invoked after each @Rule or @ClassRuleusage of ContextUtils is being closed. This default behavior can be manually configured via ContextUtils.Builder.assertGC(boolean).

This work is based on investigation of a non-deterministic CI failure. By watching the process performing runtime-integration-tests/test execution in VisualVM at least one memory leak was revealed: That problem was fixed by replacing static final field by EnsoContext.Extra per context value in d21d2fd. That way those cached EnsoMultiTypes shall garbage collect at the same time as EnsoContext.

Important Notes

Originally there seemed to be 173 of instances of org.graalvm.polyglot.Context during test execution. There seemed to be millions of instances of TreeMap$Entry held in the memory:

obrazek

and they seem to leak via EnsoMultiType.ALL_TYPES:

obrazek

After this PR this debris will be completely gone!

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • All code follows the
    Scala,
    Java,
  • Unit tests have been written where possible.

@JaroslavTulach JaroslavTulach self-assigned this Oct 29, 2025
@JaroslavTulach JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label Oct 29, 2025
var ref = new WeakReference<>(ensoContext());
context.close();
context = null;
assertGC("EnsoContext can be GCed when context is closed", true, ref);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The ultimate goal is to make sure that once org.graalvm.polyglot.Context is closed
  • all references to EnsoContext are closed
  • that's not really the case right now and fixes may be needed all over the place to make that happen

@JaroslavTulach JaroslavTulach changed the title Verify context can be GCed after evaluation Verify EnsoContext can be GCed after cleaning up ContextUtils rule Oct 29, 2025
@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Oct 30, 2025

The CI is failing in JVM unit tests (as expected) with a ton of error messages:

There are memory leaks holding EnsoContext in memory somewhere in the core engine code. Next step is to run one of these failing tests, generate heap dump and find out why the context is still held in the memory.

build.sbt Outdated
"-Dpolyglot.engine.AllowExperimentalOptions=true"
"-Dpolyglot.engine.AllowExperimentalOptions=true",
"-XX:+HeapDumpOnOutOfMemoryError",
"-XX:HeapDumpPath=" + (Compile / packageBin / artifactPath).value
Copy link
Member Author

@JaroslavTulach JaroslavTulach Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update on Nov 6, 2025

How to Analyze Heap Dump

When a test fails because of assertGC, then following message is printed:

java.lang.OutOfMemoryError: Java heap space
Dumping heap to /home/devel/NetBeansProjects/enso/enso/engine/runtime-integration-tests/target/scala-2.13/runtime-integration-tests.jar ...
Heap dump file created [16382020915 bytes in 18,771 secs]
java.lang.OutOfMemoryError: Java heap space
  | => rat org.enso.test.utils.MemoryUtils.checkAndAlloc(MemoryUtils.java:60)
        at org.enso.test.utils.MemoryUtils.assertGC(MemoryUtils.java:17)
        at org.enso.test.utils.ContextUtils.close(ContextUtils.java:134)
        at org.enso.test.utils.ContextUtils$CustomStatement.evaluate(ContextUtils.java:481)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)

with that .hprof file at hand, one can open it in VisualVM as

jvisualvm --openfile ./engine/runtime-integration-tests/java_pid969397.hprof

and then

obrazek

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem in previous picture shows a reference from CleanableReference.QUEUE - that may go away eventually, but df08637 does its best to force the processing of the QUEUE.

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Oct 30, 2025

With recent changes, there is now 130 failing tests:

   Failed tests:
  	org.enso.interpreter.test.imports.ImportSymbolsTest
  	org.enso.interpreter.test.exports.ExportResolutionOrderingTest
  	org.enso.interpreter.test.meta.MetaIsATest
  	org.enso.compiler.test.ExportCycleDetectionTest
  	org.enso.interpreter.test.privateaccess.PrivateConstructorAccessTest
  	org.enso.interpreter.test.interop.GuestJavaInteropTest
  	org.enso.compiler.test.TypeInferenceTest
  	org.enso.interpreter.test.meta.MetaTypeMethodsTest
  	org.enso.interpreter.test.LazyAtomFieldTest
  	org.enso.compiler.dump.test.DocsGenerateTest
  	org.enso.interpreter.test.ListTest
  	org.enso.interpreter.test.ExtensionMethodResolutionTest
  	org.enso.interpreter.test.privateaccess.PrivateMethodAccessTest
  	org.enso.interpreter.test.interop.TypesExposeConstructorsTest
  	org.enso.interpreter.test.EnsoMultiValueTest
  	org.enso.interpreter.test.VectorSortTest
  	org.enso.interpreter.test.WarningsTest
  	org.enso.interpreter.runtime.ManagedResourceTest
  	org.enso.interpreter.node.expression.builtin.meta.TypeOfNodeTest
  	org.enso.interpreter.caches.HelloWorldCacheTest
  	org.enso.compiler.dump.test.DocsGenerateOrderTest
  	org.enso.interpreter.test.exports.ExportConstructorTest
  	org.enso.interpreter.test.BigNumberTest
  	org.enso.interpreter.test.exports.ExportStaticMethodTest
  	org.enso.interpreter.test.interop.InvokeMemberConsistencyTest
  	org.enso.interpreter.test.builtins.BuiltinTypesExposeMethodsTest
  	org.enso.interpreter.runtime.ResourceManagerTest
  	org.enso.interpreter.test.AtomConstructorTest
  	org.enso.interpreter.test.EnsoMultiValueInteropTest
  	org.enso.interpreter.test.FindExceptionMessageTest
  	org.enso.interpreter.runtime.progress.ProgressTest
  	org.enso.interpreter.test.VectorBuilderTest
  	org.enso.interpreter.test.BinaryOpIntegerTest
  	org.enso.interpreter.test.BinaryDispatchTest
  	org.enso.interpreter.test.privateaccess.PrivateCheckDisabledTest
  	org.enso.interpreter.test.exports.ExportExtensionMethodTest
  	org.enso.compiler.test.BindingsMapResolutionTest
  	org.enso.interpreter.test.interop.MethodInvocationOnTypeConsistencyTest
  	org.enso.interpreter.test.VectorTest
  	org.enso.compiler.test.ExportedSymbolsTest
  	org.enso.interpreter.test.meta.QualifiedNameTest
  	org.enso.interpreter.test.exports.ExportModuleTest
  	org.enso.compiler.test.passes.MethodCallsTest
  	org.enso.interpreter.node.expression.builtin.meta.TypeChainTest
  	org.enso.interpreter.test.exports.ExportConversionMethodTest
  	org.enso.interpreter.test.interop.MetaObjectTest
  	org.enso.compiler.test.passes.resolve.NameResolutionTest
  	org.enso.interpreter.test.scope.ModuleScopeTest
  	org.enso.interpreter.test.BinaryOpFloatTest
  	org.enso.interpreter.node.expression.builtin.meta.TypeOfNodeMultiValueTest
  	org.enso.interpreter.test.meta.EnsoProjectTest
  	org.enso.interpreter.test.instrument.InsightForEnsoTest
  Error during tests:
    org.enso.compiler.test.semantic.ImportExportTest

Fix or make the assertGC check of EnsoContext an opt-in or allow an opt-out?

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Oct 31, 2025

There is 55 unit test failures on Windows remaining:

  Failed: Total 27291, Failed 55, Errors 0, Passed 27234, Skipped 2, Ignored 32, Canceled 1, Pending 34
  Failed tests:
  	org.enso.interpreter.test.interop.GuestJavaInteropTest
  	org.enso.interpreter.test.interop.MetaObjectPolyglotTest
  	org.enso.interpreter.test.meta.MetaTypeMethodsTest
  	org.enso.interpreter.test.hash.HashCodeTest
  	org.enso.interpreter.test.EqualsTest
  	org.enso.interpreter.test.VectorSortTest
  	org.enso.interpreter.test.WarningsTest
  	org.enso.interpreter.caches.HelloWorldCacheTest
  	org.enso.interpreter.test.exports.ExportConstructorTest
  	org.enso.interpreter.test.exports.ExportStaticMethodTest
  	org.enso.interpreter.test.interop.InvokeMemberConsistencyTest
  	org.enso.interpreter.test.builtins.BuiltinTypesExposeMethodsTest
  	org.enso.interpreter.test.EnsoMultiValueInteropTest
  	org.enso.interpreter.test.VectorBuilderTest
  	org.enso.interpreter.test.BinaryDispatchTest
  	org.enso.interpreter.test.exports.ExportExtensionMethodTest
  	org.enso.compiler.test.BindingsMapResolutionTest
  	org.enso.interpreter.test.interop.MethodInvocationOnTypeConsistencyTest
  	org.enso.interpreter.test.exports.ExportModuleTest
  	org.enso.compiler.test.passes.MethodCallsTest
  	org.enso.interpreter.node.expression.builtin.meta.TypeChainTest
  	org.enso.interpreter.test.exports.ExportConversionMethodTest
  	org.enso.interpreter.test.interop.MetaObjectTest
  	org.enso.compiler.test.passes.resolve.NameResolutionTest
  	org.enso.interpreter.test.scope.ModuleScopeTest
  	org.enso.interpreter.node.expression.builtin.meta.TypeOfNodeMultiValueTest
  	org.enso.interpreter.test.meta.EnsoProjectTest

Some of these failures will hopefully go away after fc3401b that disables the assertGC check when ContextUtils are used "manually" (e.g. not as a JUnit @Rule).

There are failures in runtime-benchmarks - they should be solved with the new default introduced in fc3401b.

@JaroslavTulach JaroslavTulach moved this from 🌟 Q/A review to 👁️ Code review in Issues Board Oct 31, 2025
@enso-bot
Copy link

enso-bot bot commented Oct 31, 2025

Jaroslav Tulach reports a new STANDUP for yesterday (2025-10-30):

Progress: .

Discord
Discord is great for playing games and chilling with friends, or even building a worldwide community. Customize your own space to talk, play, and hang out.

@JaroslavTulach
Copy link
Member Author

Down to seventeen failures:

Failed: Total 27298, Failed 17, Errors 0, Passed 27281, Ignored 32, Pending 30
Failed tests:
  	org.enso.interpreter.test.interop.GuestJavaInteropTest
  	org.enso.interpreter.test.interop.MetaObjectPolyglotTest
  	org.enso.interpreter.test.meta.MetaTypeMethodsTest
  	org.enso.interpreter.test.hash.HashCodeTest
  	org.enso.interpreter.test.EqualsTest
  	org.enso.interpreter.test.VectorSortTest
  	org.enso.interpreter.test.WarningsTest
  	org.enso.interpreter.caches.HelloWorldCacheTest
  	org.enso.interpreter.test.interop.InvokeMemberConsistencyTest
  	org.enso.interpreter.test.builtins.BuiltinTypesExposeMethodsTest
  	org.enso.interpreter.test.EnsoMultiValueInteropTest
  	org.enso.interpreter.test.VectorBuilderTest
  	org.enso.interpreter.test.BinaryDispatchTest
  	org.enso.interpreter.test.interop.MethodInvocationOnTypeConsistencyTest
  	org.enso.interpreter.node.expression.builtin.meta.TypeChainTest
  	org.enso.interpreter.test.interop.MetaObjectTest
  	org.enso.interpreter.node.expression.builtin.meta.TypeOfNodeMultiValueTest

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Oct 31, 2025

  • there is the last remaining failure:
  • org.enso.interpreter.test.interop.GuestJavaInteropTest
  • and this time I think it is a real memory leak
  • maybe just a temporary one, but real
obrazek

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Oct 31, 2025

I believe all tests are fixed. They run in seven minutes:

[info] ScalaTest
[info] Run completed in 6 minutes, 54 seconds.
[info] Total number of tests run: 1372
[info] Suites: completed 101, aborted 0
[info] Tests: succeeded 1372, failed 0, canceled 0, ignored 3, pending 30
[info] All tests passed.
[info] Passed: Total 27282, Failed 0, Errors 0, Passed 27282, Ignored 32, Pending 30
[success] Total time: 415 s (06:55), completed 31. 10. 2025 16:22:42
obrazek The memory usage when running `runtime-integration-tests/test` seems to _not leak_ anything. After each test it returns back to previous memory state. The TruffleTCK tests require enormous amount of heap. Not sure why, but then they clean after itself, so the purpose of this PR should be met.

Reviewers, @Akirathan, @hubertp, @4e6 please act!

Copy link
Member

@Akirathan Akirathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The amount of explicit null assignment to local variables is surprising. I understand why that is necessary for static fields, but for local variables used inside try-with-resources block?

@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🌟 Q/A review in Issues Board Oct 31, 2025
@enso-bot
Copy link

enso-bot bot commented Oct 31, 2025

Jaroslav Tulach reports a new STANDUP for today (2025-10-31):

Progress: .

Discord
Discord is great for playing games and chilling with friends, or even building a worldwide community. Customize your own space to talk, play, and hang out.

@JaroslavTulach JaroslavTulach added the CI: Ready to merge This PR is eligible for automatic merge label Nov 1, 2025
@enso-bot
Copy link

enso-bot bot commented Nov 2, 2025

Jaroslav Tulach reports a new STANDUP for yesterday (2025-11-01):

Progress: .

@enso-bot
Copy link

enso-bot bot commented Nov 3, 2025

Jaroslav Tulach reports a new STANDUP for yesterday (2025-11-02):

Progress: .

@mergify mergify bot merged commit a92e98b into develop Nov 3, 2025
143 of 156 checks passed
@mergify mergify bot deleted the wip/jtulach/ContextGCTest branch November 3, 2025 12:39
@github-project-automation github-project-automation bot moved this from 🌟 Q/A review to 🟢 Accepted in Issues Board Nov 3, 2025
"-Dpolyglot.engine.AllowExperimentalOptions=true"
"-Dpolyglot.engine.AllowExperimentalOptions=true",
"-XX:+HeapDumpOnOutOfMemoryError",
"-XX:HeapDumpPath=" + (Compile / packageBin).value.getParentFile
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now there should be .hprof file in the engine/runtime-integration-tests/target directory when any of the tests yields OutOfMemoryException.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

-compiler CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge

Projects

Status: 🟢 Accepted

Development

Successfully merging this pull request may close these issues.

3 participants