Description
Describe the bug
When using the Python SDK (opensandbox>=0.1.6) to create a new sandbox instance via Sandbox.create(), the SDK frequently triggers a 504 Gateway Timeout when the Kubernetes cluster takes longer than 60 seconds to pull the image and assign a Pod IP.
The root cause is a mismatch between the SDK design and the API endpoint it actually calls. The Sandbox.create() method expects to hit the asynchronous POST /sandboxes endpoint (which returns HTTP 202 Accepted immediately) and then wait locally in sandbox.check_ready(). However, ConnectionConfig unconditionally appends _API_VERSION = "v1" to the base_url, causing the SDK to send the request to the blocking/synchronous POST /v1/sandboxes endpoint instead.
When the backend takes longer than the reverse proxy/gateway timeout (usually 60s) to fully start the Pod, it results in an inescapable SandboxApiException: Create sandbox failed: HTTP 504.
To Reproduce
Run the following script, which initializes a sandbox using the standard opensandbox Python SDK.
import asyncio
from datetime import timedelta

from opensandbox import Sandbox
from opensandbox.config import ConnectionConfig


async def main():
    config = ConnectionConfig(
        domain="http://sandbox-api.example.com",
        request_timeout=timedelta(seconds=3600),
        use_server_proxy=True,
    )
    # This will trigger a 504 timeout if the Pod takes >60s to start
    # as it attempts to synchronously await readiness due to hitting the /v1/ API.
    sandbox = await Sandbox.create(
        "ubuntu:24.04",  # Assumes image takes slightly over 60s to pull on the Node
        resource={"cpu": "500m", "memory": "512Mi"},
        connection_config=config,
        timeout=timedelta(minutes=10),
    )


asyncio.run(main())

Traceback:
Traceback (most recent call last):
  File "/path/to/project/.venv/lib/python3.14/site-packages/opensandbox/adapters/sandboxes_adapter.py", line 147, in create_sandbox
    handle_api_error(response_obj, "Create sandbox")
  File "/path/to/project/.venv/lib/python3.14/site-packages/opensandbox/adapters/converter/response_handler.py", line 130, in handle_api_error
    raise SandboxApiException(
    ...<3 lines>...
    )
opensandbox.exceptions.sandbox.SandboxApiException: Create sandbox failed: HTTP 504
Expected behavior
The SDK should correctly target the POST /sandboxes endpoint, receive the 202 Accepted response promptly, and perform the timeout checking asynchronously through the sandbox.check_ready(ready_timeout, health_check_polling_interval) cycle as intended by the client design.
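The accept-then-poll flow described here can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: FakeSandboxBackend, wait_until_ready, and create_and_wait are illustrative stand-ins and not part of the opensandbox SDK, and the real check_ready() implementation may differ. The point is only that the create call returns right away and the readiness deadline is enforced on the client, so no single HTTP request has to outlive a gateway timeout.

```python
import asyncio
import time
from typing import Optional, Tuple


class FakeSandboxBackend:
    """Hypothetical stand-in for the server; not part of the opensandbox SDK."""

    def __init__(self, startup_seconds: float) -> None:
        self._startup_seconds = startup_seconds
        self._created_at: Optional[float] = None

    async def create(self) -> int:
        # Accept the request immediately, as an asynchronous endpoint would.
        self._created_at = time.monotonic()
        return 202

    async def is_ready(self) -> bool:
        # Ready once the simulated image pull and IP assignment have finished.
        if self._created_at is None:
            raise RuntimeError("create() must be called first")
        return time.monotonic() - self._created_at >= self._startup_seconds


async def wait_until_ready(
    backend: FakeSandboxBackend, ready_timeout: float, polling_interval: float
) -> bool:
    """Poll until the backend reports ready or the client-side deadline passes."""
    deadline = time.monotonic() + ready_timeout
    while time.monotonic() < deadline:
        if await backend.is_ready():
            return True
        await asyncio.sleep(polling_interval)
    return False


async def create_and_wait(
    startup_seconds: float, ready_timeout: float, polling_interval: float = 0.01
) -> Tuple[int, bool]:
    backend = FakeSandboxBackend(startup_seconds)
    status = await backend.create()  # returns at once, before the sandbox is ready
    ready = await wait_until_ready(backend, ready_timeout, polling_interval)
    return status, ready
```

With this shape, a slow start surfaces as a client-side readiness timeout (ready is False) rather than as an HTTP 504 from an intermediate proxy.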
Root Cause Analysis
- In opensandbox/config/connection.py, the ConnectionConfig sets _API_VERSION = "v1" and unconditionally appends it in get_base_url():

      def get_base_url(self) -> str:
          domain = self.get_domain()
          if domain.startswith("http://") or domain.startswith("https://"):
              return f"{domain}/{self._API_VERSION}"
          return f"{self.protocol}://{domain}/{self._API_VERSION}"

- Because of this, the automatically generated API client in post_sandboxes.py points to http://sandbox-api.example.com/v1/sandboxes instead of http://sandbox-api.example.com/sandboxes.
- The /v1/sandboxes endpoint is a server-side synchronous method designed to block until Kubernetes achieves POD_READY_WITH_IP.
- The API client actually correctly anticipates a 202 response internally (if response.status_code == 202: return CreateSandboxResponse.from_dict(...)), but it will never gracefully get it due to being forced to point to the /v1/ route.
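To make the URL difference concrete, here is a small standalone sketch that mirrors the get_base_url() logic quoted in the list above. The names build_base_url and create_endpoint are hypothetical helpers written for illustration, and the optional api_version parameter is an assumption about one way the suffix could be made configurable; none of this is SDK code.

```python
from typing import Optional


def build_base_url(
    domain: str, protocol: str = "http", api_version: Optional[str] = "v1"
) -> str:
    """Hypothetical mirror of get_base_url() with the version suffix made optional."""
    # Add the protocol only when the domain does not already carry a scheme.
    if not domain.startswith(("http://", "https://")):
        domain = f"{protocol}://{domain}"
    # Append the version segment only when one is requested.
    if api_version:
        return f"{domain}/{api_version}"
    return domain


def create_endpoint(base_url: str) -> str:
    """Path that the create-sandbox POST request would target."""
    return f"{base_url}/sandboxes"


versioned = create_endpoint(build_base_url("http://sandbox-api.example.com"))
unversioned = create_endpoint(
    build_base_url("http://sandbox-api.example.com", api_version=None)
)

print(versioned)    # http://sandbox-api.example.com/v1/sandboxes
print(unversioned)  # http://sandbox-api.example.com/sandboxes
```

The first URL is the one the SDK currently produces; the second is the route this report expects Sandbox.create() to use.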
Workaround
Currently, developers can override the ConnectionConfig behavior in their application logic to drop the /v1 component:
from datetime import timedelta

from opensandbox.config import ConnectionConfig


class AsyncConnectionConfig(ConnectionConfig):
    def get_base_url(self) -> str:
        # Return the bare domain so requests go to /sandboxes rather than /v1/sandboxes
        domain = self.get_domain()
        return domain if domain.startswith("http") else f"{self.protocol}://{domain}"


config = AsyncConnectionConfig(
    domain="http://sandbox-api.example.com",
    request_timeout=timedelta(seconds=3600),
    use_server_proxy=True,
)

System Information
- Python Version: 3.14
- OpenSandbox SDK Version: 0.1.6