fix(kotlin): Kill Gradle daemons on KLS shutdown #1082

Open
mareurs wants to merge 2 commits into oraios:main from mareurs:fix/kill-gradle-daemons-on-shutdown

Conversation

@mareurs mareurs commented Feb 24, 2026

Summary

  • KLS triggers Gradle during project indexing, which spawns persistent daemon processes (~500MB RSS each, 3-hour idle timeout)
  • These daemons detach from KLS (reparented to init/systemd) and are invisible to psutil.Process.children(recursive=True) cleanup
  • Override stop() in KotlinLanguageServer to find and terminate matching Gradle daemons after normal shutdown
  • Identification: match the java binary in the daemon's cmdline[0] against JAVA_HOME/bin/java from this KLS instance
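The identification rule in the last bullet could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function names are hypothetical, and the `GRADLE_DAEMON_MARKER` string is an assumption about how a Gradle daemon shows up on its command line. The matching logic is kept separate from the psutil snapshot so it can be exercised without live processes.

```python
from __future__ import annotations

import os
from typing import Iterable, Sequence

# Assumption: a Gradle daemon carries this main class name on its command line.
GRADLE_DAEMON_MARKER = "org.gradle.launcher.daemon.bootstrap.GradleDaemon"


def is_matching_gradle_daemon(cmdline: Sequence[str], java_home: str) -> bool:
    """Return True if cmdline looks like a Gradle daemon started by java_home's JVM."""
    if not cmdline:
        return False
    expected_java = os.path.realpath(os.path.join(java_home, "bin", "java"))
    uses_our_java = os.path.realpath(cmdline[0]) == expected_java
    is_daemon = any(GRADLE_DAEMON_MARKER in arg for arg in cmdline[1:])
    return uses_our_java and is_daemon


def find_gradle_daemon_pids(
    processes: Iterable[tuple[int, Sequence[str]]], java_home: str
) -> list[int]:
    """Filter (pid, cmdline) pairs down to the PIDs of matching daemons."""
    return [pid for pid, cmdline in processes if is_matching_gradle_daemon(cmdline, java_home)]


def running_processes() -> list[tuple[int, list[str]]]:
    """Snapshot all (pid, cmdline) pairs via psutil (third-party, imported lazily)."""
    import psutil

    # cmdline is None when access to the process is denied, hence the "or []".
    return [
        (p.info["pid"], p.info["cmdline"] or [])
        for p in psutil.process_iter(["pid", "cmdline"])
    ]
```

A caller would pass `running_processes()` and the KLS instance's JAVA_HOME to `find_gradle_daemon_pids`, then terminate the returned PIDs after the normal shutdown has finished.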

Root cause investigation

Investigation performed in #1079 (discussion thread) confirmed:

  1. KLS itself cleans up properly — even under SIGKILL, the stdio pipe EOF causes KLS to exit naturally
  2. The real culprit: Gradle daemons spawned during KLS project indexing:
    • Run independently (parent = systemd)
    • Use KLS's bundled JRE (JAVA_HOME points to ~/.serena/language_servers/static/KotlinLanguageServer/...)
    • 3-hour default idle timeout
    • ~500MB RSS each
  3. Verified deterministically via /proc/PID/environ and controlled before/after test runs
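For readers who want to repeat the `/proc/PID/environ` check from step 3, a small helper along these lines may be enough. It is a sketch under the assumption of a Linux host and sufficient permissions to read the target process's environment; `parse_environ` and `java_home_of` are illustrative names, not part of the PR.

```python
from __future__ import annotations

from pathlib import Path


def parse_environ(raw: bytes) -> dict[str, str]:
    """Parse the NUL-separated KEY=VALUE records used by /proc/<pid>/environ."""
    env: dict[str, str] = {}
    for record in raw.split(b"\0"):
        key, sep, value = record.partition(b"=")
        if sep and key:  # skip empty or malformed records
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def java_home_of(pid: int) -> str | None:
    """Read JAVA_HOME from a live process; returns None if the file is unreadable."""
    try:
        raw = Path(f"/proc/{pid}/environ").read_bytes()
    except OSError:
        return None
    return parse_environ(raw).get("JAVA_HOME")
```

Comparing `java_home_of(pid)` for a suspected daemon against the JAVA_HOME of the KLS instance gives the same evidence the investigation describes.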

Test results

Full test suite run with pytest -n auto (32 workers):

  • 667 passed, 82 skipped, 1 xfailed
  • 23 failed — all pre-existing, unrelated to this change:
    • Missing runtimes: .NET 10, Go/gopls, Nix, Ruby-dev, Zig, Perl, Swift
    • Flaky LS tests: Vue cross-file refs, SystemVerilog cross-file, Elm document symbols
    • CLI edge cases: project index/create
  • 121 errors — all missing runtime environments (dotnet, zig, perl, swift, ruby-dev)
  • All 5 Kotlin tests pass
  • mypy type-check: no issues

Gradle daemon before/after verification

| | Before fix | After fix |
|---|---|---|
| New Gradle daemons after pytest -m kotlin | 1 new (~500MB) | 0 |
| Pre-existing system Gradle daemons | Untouched | Untouched |

Closes #1081

🤖 Generated with Claude Code

KLS triggers Gradle during project indexing, which spawns persistent daemon
processes (~500MB RSS each, 3h idle timeout). These daemons detach from KLS
(reparented to init) and are invisible to normal process-tree cleanup via
psutil.Process.children(recursive=True).

Override stop() to find and terminate Gradle daemons matching this KLS
instance's JAVA_HOME after normal shutdown. Identification is done by
matching the java binary in the daemon's cmdline against JAVA_HOME/bin/java.

Closes oraios#1081

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
        if java_home:
            self._kill_gradle_daemons(java_home)

    def _get_java_home_from_dependency_provider(self) -> str | None:
Contributor

neither this nor the None check for java_home above seem necessary

Contributor Author

Agreed. On main, _dependency_provider is always DependencyProvider and _java_home_path is always set after _get_or_install_core_dependency(). Will simplify — access the attribute directly without the isinstance/None guards.

        the JAVA_HOME used by this KLS instance.
        """
        java_bin = os.path.join(java_home, "bin", "java")
        try:
Contributor

this also seems too defensive. When could realpath go wrong?

Contributor Author

Correct, os.path.realpath() in its default non-strict mode doesn't raise for a missing path; it resolves what it can and returns the rest of the path as given. Will remove both try/except blocks.
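The realpath behaviour discussed in this thread can be checked with a short experiment. The snippet below is only an illustration: `same_binary` is a hypothetical helper, and the symlink step assumes a POSIX system (on Windows, creating symlinks may require extra privileges).

```python
import os
import tempfile


def same_binary(path_a: str, path_b: str) -> bool:
    """Compare two paths after resolving symlinks, without any try/except."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "java")
    link = os.path.join(tmp, "java-link")
    open(target, "w").close()
    os.symlink(target, link)  # POSIX assumption

    # A symlink and its target resolve to the same real path.
    links_match = same_binary(link, target)

    # A path that does not exist is returned as a path instead of raising.
    missing = os.path.join(tmp, "does-not-exist")
    missing_resolved = os.path.realpath(missing)
```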

@MischaPanch
Contributor

@mareurs

These daemons detach from KLS (reparented to init/systemd) and are invisible to psutil.Process.children(recursive=True) cleanup

Isn't this completely f*d up?

    def stop(self, shutdown_timeout: float = 2.0) -> None:
        java_home = self._get_java_home_from_dependency_provider()
        super().stop(shutdown_timeout=shutdown_timeout)
        if java_home:
Contributor

@MischaPanch MischaPanch Feb 24, 2026

We can't do that in general once (if) we allow users to use their system java. The only safe way to kill the gradle daemons is to know they correspond to Serena's own java. So we will need to wait for #1079 to be merged and then to distinguish if we're using Serena's java or another one. In the advanced config docs we can then tell users that they might need to clean zombies by themselves if they enable using system java.

Meanwhile, I will raise an issue in the KLS repo and ask what's up with all of this

Contributor

We should probably close #1079 instead. Being able to reliably terminate the LS is more important than reusing a system JRE. I can't think of a good reason why using the system JRE would be important.

Contributor Author

Valid concern. With the bundled JRE, the JAVA_HOME/bin/java path is unique to this KLS instance, so matching is safe. With a system JRE, we'd potentially match unrelated Gradle daemons — that's dangerous.

For now this PR only targets the current behavior (bundled JRE). If #1079 is merged in the future and system JRE becomes possible, we'd need to either skip daemon cleanup for system JRE or use additional matching criteria (e.g. the project directory from /proc/PID/environ).
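If a system JRE does become possible, the "skip daemon cleanup" option might be guarded by a check like the one below. This is a hypothetical sketch: `is_bundled_jre` and the `managed_root` parameter are illustrative names, and the real location of the bundled JRE would have to come from the project's own configuration.

```python
from __future__ import annotations

from pathlib import Path


def is_bundled_jre(java_home: str, managed_root: str) -> bool:
    """Return True if java_home resolves to managed_root or a directory inside it."""
    home = Path(java_home).resolve()
    root = Path(managed_root).resolve()
    # Comparing path components avoids the prefix trap of plain string matching,
    # e.g. "/srv/tools-other" must not count as being inside "/srv/tools".
    return home == root or root in home.parents
```

Cleanup would then run only when `is_bundled_jre(...)` is true, leaving daemons started from a user's own JDK alone.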

Separately, our investigation confirmed that the Gradle Tooling API (used by JetBrains KLS internally) always spawns a daemon — there's no --no-daemon equivalent. So post-hoc cleanup is the only option.

@mareurs
Contributor Author

mareurs commented Feb 24, 2026

Isn't this completely f*d up?

Yes — it's a fundamental constraint of the Gradle Tooling API. The API mandates daemon usage; there's no --no-daemon mode. Every IntelliJ-based tool (IDEA, Android Studio, KLS) leaves these daemons behind. Even setting org.gradle.daemon=false in gradle.properties doesn't affect Tooling API calls. So post-hoc cleanup is the only viable approach.

- Remove _get_java_home_from_dependency_provider() indirection; use
  assert + direct access instead (dependency_provider and java_home_path
  are always set for KLS)
- Remove try/except OSError around os.path.realpath() calls; in its default
  non-strict mode realpath does not raise for a missing path and returns a
  best-effort resolved path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
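
The shape of the simplified stop() described in this commit might look roughly like the sketch below. The classes are hypothetical stand-ins built only from names mentioned in the thread; the real base class, constructor, and daemon-killing logic are not reproduced here.

```python
from __future__ import annotations


class DependencyProvider:
    """Hypothetical stand-in; models only the attribute named in the review."""

    def __init__(self, java_home_path: str) -> None:
        self._java_home_path = java_home_path


class BaseLanguageServer:
    """Hypothetical stand-in for the real base class."""

    def __init__(self) -> None:
        self.stopped = False

    def stop(self, shutdown_timeout: float = 2.0) -> None:
        self.stopped = True


class SketchKotlinLanguageServer(BaseLanguageServer):
    def __init__(self, dependency_provider: DependencyProvider) -> None:
        super().__init__()
        self._dependency_provider = dependency_provider
        self.cleaned_java_homes: list[str] = []

    def stop(self, shutdown_timeout: float = 2.0) -> None:
        # Direct attribute access plus an assert replaces the helper method
        # and the "if java_home:" guard.
        java_home = self._dependency_provider._java_home_path
        assert java_home is not None, "java_home_path is always set for KLS"
        super().stop(shutdown_timeout=shutdown_timeout)
        self._kill_gradle_daemons(java_home)

    def _kill_gradle_daemons(self, java_home: str) -> None:
        # Placeholder: records the call instead of terminating processes.
        self.cleaned_java_homes.append(java_home)
```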

Development

Successfully merging this pull request may close these issues.

Kotlin LS spawns heavy zombie processes

3 participants