Skip to content

Codex/fix sandbox gc expiry#1016

Open
FicoHu wants to merge 21 commits intowecode-ai:mainfrom
FicoHu:codex/fix-sandbox-gc-expiry
Open

Codex/fix sandbox gc expiry#1016
FicoHu wants to merge 21 commits intowecode-ai:mainfrom
FicoHu:codex/fix-sandbox-gc-expiry

Conversation

@FicoHu
Copy link
Copy Markdown
Collaborator

@FicoHu FicoHu commented Apr 22, 2026

Summary by CodeRabbit

  • New Features

    • Added a configurable sandbox inactive timeout (default 7200s) for background cleanup.
  • Bug Fixes

    • Garbage collector now more reliably removes idle and orphaned sandboxes by checking each sandbox’s activity and cleaning stale active-set entries.
  • Documentation

    • Clarified timeout and sandbox lifecycle semantics across API responses and configuration docs; updated descriptions for active-timeout and time-remaining fields.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 22, 2026

📝 Walkthrough

Walkthrough

GC logic changed: sandboxes are now evaluated for inactivity using persisted lifecycle timestamps (last_activity_at, expires_at) loaded per-active-sandbox; repository now persists those timestamps; tests and API docstrings updated; a new sandbox_inactive_timeout config value was added.

Changes

Cohort / File(s) Summary
Configuration
executor_manager/common/config.py
Added sandbox_inactive_timeout (from SANDBOX_INACTIVE_TIMEOUT, default 7200s). Clarified redis_ttl docstring as Redis retention window, not execution timeout.
Sandbox GC & Manager
executor_manager/services/sandbox/manager.py
Reworked _collect_expired_sandboxes to iterate get_active_sandbox_ids(), early-return if empty, load each sandbox, compute idle_seconds vs timeout.sandbox_inactive_timeout, select expirations, remove orphaned active-set entries when load_sandbox returns None, and retain per-sandbox termination with existing locking and error handling.
Repository Persistence
executor_manager/services/sandbox/repository.py
save_sandbox now persists started_at, last_activity_at, and expires_at into the stored session JSON; updated get_expired_sandbox_ids docstring to note expiration is driven by expires_at (method remains a ZSet activity-age helper).
Tests
executor_manager/tests/services/test_sandbox_manager.py
Added test_save_sandbox_round_trip_preserves_timing_fields; updated GC tests to mock zrange membership and assert behavior for idle, orphaned, and unexpired sandboxes; adjusted fixtures to set last_activity_at/expires_at accordingly.
API Docstrings / Schemas
executor_manager/routers/sandbox.py, executor_manager/schemas/sandbox.py
Updated endpoint and schema field descriptions to clarify expires_at / time_remaining now refer to the active-timeout expiration semantics rather than direct auto-termination timestamps.

Sequence Diagram

sequenceDiagram
    autonumber
    participant GC as Garbage Collector\n(_collect_expired_sandboxes)
    participant Repo as Repository
    participant Redis as Redis
    participant Sandbox as Sandbox Object

    rect rgba(100, 150, 200, 0.5)
    Note over GC,Repo: Old flow (TTL-based ZSET query)
    GC->>Repo: get_expired_sandbox_ids(redis_ttl)
    Repo->>Redis: ZRANGEBYSCORE (by activity age)
    Redis-->>Repo: expired_ids
    Repo-->>GC: expired_ids
    end

    rect rgba(150, 200, 100, 0.5)
    Note over GC,Repo: New flow (per-sandbox expires_at / inactivity)
    GC->>Repo: get_active_sandbox_ids()
    Repo->>Redis: SMEMBERS active_set
    Redis-->>Repo: active_ids
    Repo-->>GC: active_ids
    loop For each active id
        GC->>Repo: load_sandbox(id)
        Repo->>Redis: HGET session_hash
        Redis-->>Repo: sandbox_data
        Repo-->>Sandbox: construct sandbox (includes last_activity_at, expires_at)
        GC->>Sandbox: compute idle_seconds / is_expired(sandbox_inactive_timeout)
        Sandbox-->>GC: expired? (true/false)
        alt orphan (load_sandbox -> None)
            GC->>Repo: remove_from_active_set(id)
        end
        alt expired
            GC->>GC: terminate_sandbox(id)
        end
    end
    end
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hop through sandboxes, timestamps in tow,
I guard the idle ones when their last moments slow,
started_at, last_activity_at, expires_at saved tight,
The GC wakes, checks each one through the night,
A tidy clean sweep — hop, nibble, delight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Codex/fix sandbox gc expiry' is directly related to the main changes in the PR, which implement a new sandbox inactivity timeout mechanism and update sandbox garbage collection expiry logic.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d85b483e3b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1008 to +1009
if sandbox.is_expired():
expired_task_ids.append(task_id_str)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Handle sandboxes without expires_at during GC

The new GC path only expires entries when sandbox.is_expired() is true, but is_expired() returns false when expires_at is missing. Any active sandbox records that were written without expires_at (for example, pre-deploy data or mixed-version writers) will now be skipped forever, so they are never passed to terminate_sandbox and their containers can keep running until manual cleanup.

Useful? React with 👍 / 👎.

Comment on lines +999 to +1001
for task_id_str in active_task_ids:
sandbox = self._repository.load_sandbox(task_id_str)
if sandbox is None:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use async repository reads in the GC scan loop

_collect_expired_sandboxes is an async scheduled job, but it now iterates every active sandbox and calls synchronous load_sandbox for each ID. With many active sandboxes this introduces N blocking Redis round-trips on the event loop, which can delay other async jobs (like heartbeat checks) and API responsiveness during GC runs.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
executor_manager/tests/services/test_sandbox_manager.py (1)

990-1017: ⚠️ Potential issue | 🟠 Major

Add @pytest.mark.asyncio decorator to this async test.

This async test is missing @pytest.mark.asyncio, unlike the neighboring async tests in the same class (test_handle_executor_dead_terminates_sandbox at line 104 and test_collect_expired_sandboxes_cleans_orphaned at line 169). Without the decorator, it may not run correctly under pytest-asyncio.

Proposed fix
+    `@pytest.mark.asyncio`
     async def test_collect_expired_sandboxes_terminates_old(
         self,
         sandbox_manager_with_mock_redis,

Note: The initial verification also identified 20 other async tests across the codebase missing this decorator (in backend/tests/, executor/tests/, and other test files). Consider running a systematic fix across all test files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor_manager/tests/services/test_sandbox_manager.py` around lines 990 -
1017, The async test function test_collect_expired_sandboxes_terminates_old is
missing the pytest-asyncio decorator; add `@pytest.mark.asyncio` above the test
definition so pytest runs the coroutine correctly when invoking
manager._collect_expired_sandboxes; ensure the import for pytest is present in
the test module if not already, and apply the same `@pytest.mark.asyncio` pattern
to any other async tests missing it (e.g., other tests in this file and the
noted tests across the codebase).
🧹 Nitpick comments (2)
executor_manager/services/sandbox/manager.py (1)

977-1027: Consider splitting GC code out of this oversized module.

This file is now over the 1000-line project limit; moving GC/background-task logic into a dedicated module would keep SandboxManager focused. As per coding guidelines, "File size limit: if a file exceeds 1000 lines, split it into multiple sub-modules".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor_manager/services/sandbox/manager.py` around lines 977 - 1027, The
_collect_expired_sandboxes method and its GC loop should be moved out of the
oversized SandboxManager file into a new dedicated module (e.g., sandbox_gc.py)
to keep SandboxManager focused; refactor by creating a SandboxGC class or
standalone async function that owns the logic currently in
_collect_expired_sandboxes (including acquiring
get_distributed_lock("sandbox_gc"), calling
self._repository.get_active_sandbox_ids(), load_sandbox, remove_from_active_set,
checking sandbox.is_expired(), and invoking _terminate_expired_sandbox for each
expired id), inject or pass the same repository and logger dependencies and
expose a simple start/stop or run method, update SandboxManager to delegate to
the new module (call the new SandboxGC.run or similar) and remove the GC code
from manager.py so the original file drops below the 1000-line limit.
executor_manager/tests/services/test_sandbox_manager.py (1)

15-1096: Consider splitting this oversized test module.

The test file exceeds the project’s 1000-line limit; separating GC/repository round-trip tests into dedicated modules would keep future sandbox tests easier to maintain. As per coding guidelines, "File size limit: if a file exceeds 1000 lines, split it into multiple sub-modules".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor_manager/tests/services/test_sandbox_manager.py` around lines 15 -
1096, The file is over 1000 lines; split the large TestSandboxManager module by
moving the Redis/repository and GC-related tests into one or more new test
modules (e.g., tests for repository persistence and garbage-collection logic).
Specifically, extract tests referencing repository and GC behavior such as
test_save_sandbox_success, test_save_sandbox_round_trip_preserves_timing_fields,
test_save_sandbox_missing_task_id, test_load_sandbox_*, test_save_execution_*,
test_load_execution_*, test_dict_to_execution_conversion, and
_collect_expired_sandboxes tests into a new module; keep SandboxManager
API/behavior tests (create_sandbox, get_sandbox, create_execution,
scheduler/heartbeat integration) in the original file. Ensure shared
fixtures/reset logic (reset_singletons fixture and
sandbox_manager_with_mock_redis/mock_redis_client) are available to both modules
(move them into conftest.py if necessary) and update imports/pytest discovery so
tests run unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@executor_manager/services/sandbox/manager.py`:
- Around line 998-1009: The GC loop must handle legacy sandboxes with
sandbox.expires_at == None so they aren't skipped; in the loop that loads
sandboxes (use _repository.load_sandbox and the sandbox variable) change the
expiry check to first detect expires_at is None and then apply a fallback TTL
(e.g. compute fallback_expired = sandbox.created_at + DEFAULT_SANDBOX_TTL < now)
or treat None as expired immediately depending on intended policy, otherwise
call sandbox.is_expired(); add or reuse a DEFAULT_SANDBOX_TTL constant (or
Sandbox.default_ttl) and update the expired_task_ids append logic to use this
fallback branch.

---

Outside diff comments:
In `@executor_manager/tests/services/test_sandbox_manager.py`:
- Around line 990-1017: The async test function
test_collect_expired_sandboxes_terminates_old is missing the pytest-asyncio
decorator; add `@pytest.mark.asyncio` above the test definition so pytest runs the
coroutine correctly when invoking manager._collect_expired_sandboxes; ensure the
import for pytest is present in the test module if not already, and apply the
same `@pytest.mark.asyncio` pattern to any other async tests missing it (e.g.,
other tests in this file and the noted tests across the codebase).

---

Nitpick comments:
In `@executor_manager/services/sandbox/manager.py`:
- Around line 977-1027: The _collect_expired_sandboxes method and its GC loop
should be moved out of the oversized SandboxManager file into a new dedicated
module (e.g., sandbox_gc.py) to keep SandboxManager focused; refactor by
creating a SandboxGC class or standalone async function that owns the logic
currently in _collect_expired_sandboxes (including acquiring
get_distributed_lock("sandbox_gc"), calling
self._repository.get_active_sandbox_ids(), load_sandbox, remove_from_active_set,
checking sandbox.is_expired(), and invoking _terminate_expired_sandbox for each
expired id), inject or pass the same repository and logger dependencies and
expose a simple start/stop or run method, update SandboxManager to delegate to
the new module (call the new SandboxGC.run or similar) and remove the GC code
from manager.py so the original file drops below the 1000-line limit.

In `@executor_manager/tests/services/test_sandbox_manager.py`:
- Around line 15-1096: The file is over 1000 lines; split the large
TestSandboxManager module by moving the Redis/repository and GC-related tests
into one or more new test modules (e.g., tests for repository persistence and
garbage-collection logic). Specifically, extract tests referencing repository
and GC behavior such as test_save_sandbox_success,
test_save_sandbox_round_trip_preserves_timing_fields,
test_save_sandbox_missing_task_id, test_load_sandbox_*, test_save_execution_*,
test_load_execution_*, test_dict_to_execution_conversion, and
_collect_expired_sandboxes tests into a new module; keep SandboxManager
API/behavior tests (create_sandbox, get_sandbox, create_execution,
scheduler/heartbeat integration) in the original file. Ensure shared
fixtures/reset logic (reset_singletons fixture and
sandbox_manager_with_mock_redis/mock_redis_client) are available to both modules
(move them into conftest.py if necessary) and update imports/pytest discovery so
tests run unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aad11176-c945-4bd7-b895-f7358d807350

📥 Commits

Reviewing files that changed from the base of the PR and between 0b0958a and d85b483.

📒 Files selected for processing (4)
  • executor_manager/common/config.py
  • executor_manager/services/sandbox/manager.py
  • executor_manager/services/sandbox/repository.py
  • executor_manager/tests/services/test_sandbox_manager.py

Comment on lines +998 to +1009
expired_task_ids = []
for task_id_str in active_task_ids:
sandbox = self._repository.load_sandbox(task_id_str)
if sandbox is None:
self._repository.remove_from_active_set(task_id_str)
logger.debug(
f"[SandboxManager] Cleaned orphaned ZSet entry: {task_id_str}"
)
continue

if sandbox.is_expired():
expired_task_ids.append(task_id_str)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Add a legacy fallback for sandboxes without expires_at.

Records saved before this change can load with expires_at=None; sandbox.is_expired() then returns False, so GC will skip those running containers until the Redis hash expires and the metadata needed to terminate them is gone.

🛠️ Suggested fallback for legacy records
             expired_task_ids = []
+            legacy_cutoff = time.time() - self._config.timeout.redis_ttl
             for task_id_str in active_task_ids:
                 sandbox = self._repository.load_sandbox(task_id_str)
                 if sandbox is None:
                     self._repository.remove_from_active_set(task_id_str)
                     logger.debug(
                         f"[SandboxManager] Cleaned orphaned ZSet entry: {task_id_str}"
                     )
                     continue
 
-                if sandbox.is_expired():
+                if sandbox.expires_at is None:
+                    if sandbox.last_activity_at <= legacy_cutoff:
+                        expired_task_ids.append(task_id_str)
+                    continue
+
+                if sandbox.is_expired():
                     expired_task_ids.append(task_id_str)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor_manager/services/sandbox/manager.py` around lines 998 - 1009, The GC
loop must handle legacy sandboxes with sandbox.expires_at == None so they aren't
skipped; in the loop that loads sandboxes (use _repository.load_sandbox and the
sandbox variable) change the expiry check to first detect expires_at is None and
then apply a fallback TTL (e.g. compute fallback_expired = sandbox.created_at +
DEFAULT_SANDBOX_TTL < now) or treat None as expired immediately depending on
intended policy, otherwise call sandbox.is_expired(); add or reuse a
DEFAULT_SANDBOX_TTL constant (or Sandbox.default_ttl) and update the
expired_task_ids append logic to use this fallback branch.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
executor_manager/tests/services/test_sandbox_manager.py (1)

990-1018: ⚠️ Potential issue | 🔴 Critical

Missing @pytest.mark.asyncio — this test will be skipped/pass trivially.

test_collect_expired_sandboxes_terminates_old is defined as async def but no @pytest.mark.asyncio decorator is applied (unlike the sibling test_collect_expired_sandboxes_cleans_orphaned at line 1020 and the new test_collect_expired_sandboxes_skips_unexpired at line 1035). Depending on pytest-asyncio mode, this test will either emit a warning and be skipped, or return an un-awaited coroutine so mock_terminate.assert_called_once_with("12345") is never reached — meaning the core assertion for the fixed idle-based GC path is effectively not exercised.

🐛 Proposed fix
+    `@pytest.mark.asyncio`
     async def test_collect_expired_sandboxes_terminates_old(
         self,
         sandbox_manager_with_mock_redis,
         mock_redis_client,
         mocker,
         sample_sandbox,
     ):
         """Test terminates sandboxes idle for more than two hours."""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor_manager/tests/services/test_sandbox_manager.py` around lines 990 -
1018, The async test test_collect_expired_sandboxes_terminates_old is missing
the pytest async marker, so pytest may skip or not run the coroutine; add the
`@pytest.mark.asyncio` decorator above
test_collect_expired_sandboxes_terminates_old (and ensure pytest is imported in
the test module if not already) so the coroutine runs and the core assertion
that manager._collect_expired_sandboxes triggers
manager.terminate_sandbox("12345") is actually executed.
🧹 Nitpick comments (1)
executor_manager/tests/services/test_sandbox_manager.py (1)

1000-1001: Couple the idle threshold to TimeoutConfig.sandbox_inactive_timeout instead of hard-coding 2h.

Hard-coding 2 * 3600 here duplicates the default of sandbox_inactive_timeout and will silently go stale if the default is ever changed (e.g., to 1h or 4h), making the test pass for the wrong reason. Prefer sourcing the threshold from config so the test stays aligned with production behavior.

♻️ Proposed refactor
-        mock_redis_client.zrange.return_value = ["12345"]
-        sample_sandbox.last_activity_at = time.time() - (2 * 3600) - 60
-        sample_sandbox.expires_at = time.time() + 3600
+        from executor_manager.common.config import get_config
+
+        inactive_timeout = get_config().timeout.sandbox_inactive_timeout
+        mock_redis_client.zrange.return_value = ["12345"]
+        sample_sandbox.last_activity_at = time.time() - inactive_timeout - 60
+        sample_sandbox.expires_at = time.time() + 3600
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@executor_manager/tests/services/test_sandbox_manager.py` around lines 1000 -
1001, The test sets sample_sandbox.last_activity_at using a hard-coded 2*3600;
replace that with the configured idle threshold by importing and using
TimeoutConfig.sandbox_inactive_timeout (e.g., set last_activity_at = time.time()
- TimeoutConfig.sandbox_inactive_timeout - 60) and ensure TimeoutConfig is
imported into executor_manager/tests/services/test_sandbox_manager.py so the
test remains aligned with the production timeout logic used by the sandbox
expiration code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@executor_manager/tests/services/test_sandbox_manager.py`:
- Around line 990-1018: The async test
test_collect_expired_sandboxes_terminates_old is missing the pytest async
marker, so pytest may skip or not run the coroutine; add the
`@pytest.mark.asyncio` decorator above
test_collect_expired_sandboxes_terminates_old (and ensure pytest is imported in
the test module if not already) so the coroutine runs and the core assertion
that manager._collect_expired_sandboxes triggers
manager.terminate_sandbox("12345") is actually executed.

---

Nitpick comments:
In `@executor_manager/tests/services/test_sandbox_manager.py`:
- Around line 1000-1001: The test sets sample_sandbox.last_activity_at using a
hard-coded 2*3600; replace that with the configured idle threshold by importing
and using TimeoutConfig.sandbox_inactive_timeout (e.g., set last_activity_at =
time.time() - TimeoutConfig.sandbox_inactive_timeout - 60) and ensure
TimeoutConfig is imported into
executor_manager/tests/services/test_sandbox_manager.py so the test remains
aligned with the production timeout logic used by the sandbox expiration code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8bf5ca37-0f4a-4e1a-b115-79cbf57daabb

📥 Commits

Reviewing files that changed from the base of the PR and between d85b483 and 51ab0c4.

📒 Files selected for processing (6)
  • executor_manager/common/config.py
  • executor_manager/routers/sandbox.py
  • executor_manager/schemas/sandbox.py
  • executor_manager/services/sandbox/manager.py
  • executor_manager/services/sandbox/repository.py
  • executor_manager/tests/services/test_sandbox_manager.py
✅ Files skipped from review due to trivial changes (2)
  • executor_manager/routers/sandbox.py
  • executor_manager/schemas/sandbox.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • executor_manager/services/sandbox/repository.py
  • executor_manager/services/sandbox/manager.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant