[Bug] SDK Sandbox.create() triggers 504 Gateway Timeout due to incorrect /v1 endpoint routing #591

@zson-two

Describe the bug

When using the Python SDK (opensandbox>=0.1.6) to create a new sandbox instance via Sandbox.create(), the SDK frequently triggers a 504 Gateway Timeout when the Kubernetes cluster takes longer than 60 seconds to pull the image and assign a Pod IP.

The root cause is a mismatch between the SDK design and the API endpoints. Sandbox.create() is designed to call the asynchronous POST /sandboxes endpoint (which returns HTTP 202 Accepted immediately) and then wait locally in sandbox.check_ready(). However, ConnectionConfig unconditionally appends _API_VERSION = "v1" to the base URL, so the SDK sends the request to the blocking, synchronous POST /v1/sandboxes endpoint instead.

When the backend takes longer than the reverse proxy/gateway timeout (usually 60s) to fully start the Pod, the call fails with SandboxApiException: Create sandbox failed: HTTP 504, regardless of the timeouts configured on the client.
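To illustrate why a large client-side request_timeout does not help here, the following minimal sketch treats each hop on the request path as having its own timeout. The 60-second gateway value is the "usually 60s" figure from this report, not a measured setting, and effective_timeout is a hypothetical helper, not part of the SDK.

```python
from datetime import timedelta


def effective_timeout(*hop_timeouts: timedelta) -> timedelta:
    """A blocking request is cut off by the shortest timeout along its path."""
    return min(hop_timeouts)


client = timedelta(seconds=3600)  # request_timeout used in the reproduction script
gateway = timedelta(seconds=60)   # assumed reverse proxy/gateway timeout

# The gateway gives up first, so the client sees a 504 after about 60 seconds.
limit = effective_timeout(client, gateway)
print(limit.total_seconds())
```

As long as the create call blocks until the Pod is ready, the shortest timeout in the chain bounds how long startup may take.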

To Reproduce

Run the following script, which creates a sandbox using the standard opensandbox Python SDK.

import asyncio
from datetime import timedelta
from opensandbox import Sandbox
from opensandbox.config import ConnectionConfig

async def main():
    config = ConnectionConfig(
        domain="http://sandbox-api.example.com",
        request_timeout=timedelta(seconds=3600),
        use_server_proxy=True,
    )
    
    # This will trigger a 504 timeout if the Pod takes >60s to start
    # as it attempts to synchronously await readiness due to hitting the /v1/ API.
    sandbox = await Sandbox.create(
        "ubuntu:24.04", # Assumes image takes slightly over 60s to pull on the Node
        resource={"cpu": "500m", "memory": "512Mi"},
        connection_config=config,
        timeout=timedelta(minutes=10),
    )
    
asyncio.run(main())

Traceback:

Traceback (most recent call last):
  File "/path/to/project/.venv/lib/python3.14/site-packages/opensandbox/adapters/sandboxes_adapter.py", line 147, in create_sandbox
    handle_api_error(response_obj, "Create sandbox")
  File "/path/to/project/.venv/lib/python3.14/site-packages/opensandbox/adapters/converter/response_handler.py", line 130, in handle_api_error
    raise SandboxApiException(
    ...<3 lines>...
    )
opensandbox.exceptions.sandbox.SandboxApiException: Create sandbox failed: HTTP 504

Expected behavior

The SDK should correctly target the POST /sandboxes endpoint, receive the 202 Accepted response promptly, and perform the timeout checking asynchronously through the sandbox.check_ready(ready_timeout, health_check_polling_interval) cycle as intended by the client design.
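The intended create-then-poll flow can be sketched without the SDK. In the example below, FakeBackend and create_and_wait are hypothetical stand-ins written for illustration only; they are not the SDK's implementation of Sandbox.create() or check_ready(), and the parameter names merely echo ready_timeout and the polling interval mentioned above.

```python
import asyncio
import time
from typing import Optional


class FakeBackend:
    """Hypothetical in-memory stand-in for the sandbox API (not part of the SDK)."""

    def __init__(self, startup_seconds: float) -> None:
        self._startup_seconds = startup_seconds
        self._created_at: Optional[float] = None

    async def post_sandboxes(self) -> int:
        # Asynchronous create: record the request and answer immediately with 202.
        self._created_at = time.monotonic()
        return 202

    async def is_ready(self) -> bool:
        if self._created_at is None:
            return False
        return time.monotonic() - self._created_at >= self._startup_seconds


async def create_and_wait(
    backend: FakeBackend, ready_timeout: float, polling_interval: float
) -> bool:
    """Create, then poll readiness locally until ready or ready_timeout elapses."""
    status = await backend.post_sandboxes()
    if status != 202:
        raise RuntimeError(f"Create sandbox failed: HTTP {status}")
    deadline = time.monotonic() + ready_timeout
    while time.monotonic() < deadline:
        if await backend.is_ready():
            return True
        await asyncio.sleep(polling_interval)
    return False


ready = asyncio.run(
    create_and_wait(FakeBackend(startup_seconds=0.2), ready_timeout=2.0, polling_interval=0.05)
)
print(ready)
```

Because every individual request returns quickly, no single HTTP call stays open long enough to hit a gateway timeout; only the client-side ready_timeout bounds the overall wait.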

Root Cause Analysis

  1. In opensandbox/config/connection.py, the ConnectionConfig sets _API_VERSION = "v1" and unconditionally appends it in get_base_url():
    def get_base_url(self) -> str:
        domain = self.get_domain()
        if domain.startswith("http://") or domain.startswith("https://"):
            return f"{domain}/{self._API_VERSION}"
        return f"{self.protocol}://{domain}/{self._API_VERSION}"
  2. Because of this, the automatically generated API client in post_sandboxes.py points to http://sandbox-api.example.com/v1/sandboxes instead of http://sandbox-api.example.com/sandboxes.
  3. The /v1/sandboxes endpoint is a server-side synchronous method designed to block until Kubernetes achieves POD_READY_WITH_IP.
  4. The API client already handles a 202 response internally (if response.status_code == 202: return CreateSandboxResponse.from_dict(...)), but it never receives one because it is forced onto the /v1/ route.
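The URL construction described in the steps above can be reproduced in isolation. This is a sketch that mirrors the get_base_url() logic quoted in step 1; versioned_base_url, unversioned_base_url, and create_url are hypothetical names introduced here, and the assumption that the generated client appends /sandboxes to the base URL is taken from step 2.

```python
API_VERSION = "v1"  # mirrors ConnectionConfig._API_VERSION as quoted in this report


def versioned_base_url(domain: str, protocol: str = "http") -> str:
    """Stand-in for the quoted get_base_url(): always appends the version."""
    if domain.startswith(("http://", "https://")):
        return f"{domain}/{API_VERSION}"
    return f"{protocol}://{domain}/{API_VERSION}"


def unversioned_base_url(domain: str, protocol: str = "http") -> str:
    """Same logic with the version segment dropped."""
    if domain.startswith(("http://", "https://")):
        return domain
    return f"{protocol}://{domain}"


def create_url(base_url: str) -> str:
    # Assumption: the generated client appends the resource path to the base URL.
    return f"{base_url}/sandboxes"


print(create_url(versioned_base_url("http://sandbox-api.example.com")))
# http://sandbox-api.example.com/v1/sandboxes
print(create_url(unversioned_base_url("http://sandbox-api.example.com")))
# http://sandbox-api.example.com/sandboxes
```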

Workaround

Currently, developers can override the ConnectionConfig behavior in their application logic to drop the /v1 component:

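from datetime import timedelta
from opensandbox.config import ConnectionConfig

class AsyncConnectionConfig(ConnectionConfig):
    def get_base_url(self) -> str:
        # Return the bare domain so requests go to /sandboxes rather than /v1/sandboxes.
        domain = self.get_domain()
        if domain.startswith(("http://", "https://")):
            return domain
        return f"{self.protocol}://{domain}"

config = AsyncConnectionConfig(
    domain="http://sandbox-api.example.com",
    request_timeout=timedelta(seconds=3600),
    use_server_proxy=True,
)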

System Information

  • Python Version: 3.14
  • OpenSandbox SDK Version: 0.1.6
