feat(local): Gemini CLI OpenAI bridge (SSE) + REAL LLM Playwright smoke + SSE unit tests #173

symbiosis-institute · 2025-12-30T23:32:32Z

Summary

Integrates gemini-cli-openai as an OpenAI-compatible LLM provider for local ii-agent development.

What changed

Handle gemini-cli-openai default SSE streaming in OpenAI provider by consuming SSE and building a minimal synthetic non-stream response for agent-mode compatibility.
Add backend unit tests for SSE stream consumption (7 tests).
Add Playwright smoke for chat + agent mode with REAL LLM (E2E_REAL_LLM=1), plus trace-on-failure config.
Harden .gitignore to exclude Playwright auth state and test artifacts.
Refactor all SSE consumers to use list+join for O(n) performance.

How to run (repro)

Worker: curl -s http://localhost:3888/v1/models | jq .
Backend tests: python3 -m pytest tests/llm/ -v
API smoke: ./scripts/smoke-openai-base-url.sh
UI smoke (REAL LLM): cd frontend && E2E_REAL_LLM=1 npx playwright test chat-smoke.spec.ts

Evidence (Option A)

✅ Backend tests: 59/59 PASS (incl. 7 SSE tests)
✅ API smoke: PASS (/v1/models, /health, frontend)
✅ Playwright REAL LLM: PASS (chat + agent)
✅ Working tree clean; secrets/artifacts ignored

Notes / Limitations

Local development only: This integration is designed for local development with gemini-cli-openai worker, not production OpenAI API usage.
Tool call aggregation: Assumes complete tool calls (gemini-cli-openai behavior). For production OpenAI with incremental deltas, additional merging logic would be needed.
Token tracking: Synthetic response sets usage to 0; acceptable for local dev where cost tracking is not critical.
MCP review: codex-mcp PARTIAL PASS (concerns documented as not applicable to local use case); gemini-cli WAIVED (rate limit).

- Add DockerSandbox provider for air-gapped/local deployments - Add PortPoolManager for centralized port allocation (30000-30999) - Add LocalStorage providers for ii_agent and ii_tool - Add MCP tool image processing from sandbox containers - Add storage factory functions with local/GCS support - Add test suite (143 tests passing) - Fix connect() to register ports preventing conflicts on reconnect - Fix delete() to cleanup orphaned volumes - Update docs with port management and local sandbox setup

Chat file handling: - Fix file_search filtering by user_id only (not session_id) for cross-session access - Add SHA-256 content hash deduplication in OpenAI vector store - Reduce file_search max results to 3 to prevent context overflow - Add file corpus discovery so AI knows which files are searchable - Fix reasoning.effort parameter only sent to reasoning models - Add hasattr guard for text attribute on image-only messages Sandbox management: - Add orphan cleanup loop (5min interval) to remove containers without active sessions - Add /internal/sandboxes/{id}/has-active-session endpoint for session verification - Add port_manager.scan_existing_containers() to recover state on restart - Add LOCAL_MODE config with orphan cleanup settings Resource limits: - Add MAX_TABS=20 limit in browser with force-close of oldest tabs - Add MAX_SHELL_SESSIONS=10 limit in shell tool Tests: Add 248 unit tests covering all changes

## New Features - expose_port(external) parameter: external=True returns localhost:port for browser access, external=False returns internal Docker IP for container-to-container communication - LLMConfig.get_max_output_tokens(): Model-specific output token limits (64K Claude 4, 100K o1, 16K GPT-4, 8K Gemini) - Browser MAX_TABS=20 limit with automatic cleanup of oldest tabs - Shell session MAX_SHELL_SESSIONS=15 limit with clear error messages - Anthropic native thinking blocks support via beta endpoint - Extended context (1M tokens) support for Claude models ## Frontend Improvements - Added selectIsStopped selector for proper stopped state UI handling - Fixed agent task state transitions for cancelled sessions - Improved subagent container with session awareness ## New Test Coverage (343 tests total) - tests/llm/test_llm_config.py: LLMConfig.get_max_output_tokens() tests - tests/tools/test_browser_tab_limit.py: Browser MAX_TABS enforcement - tests/tools/test_resource_limits.py: Browser and shell session limits - tests/tools/test_generation_config_factory.py: Image/video generation configs - tests/tools/test_openai_dalle.py: DALL-E 3 image generation client - tests/tools/test_openai_sora.py: Sora video generation client - tests/storage/test_local_storage.py: LocalStorage.get_permanent_url() - tests/storage/test_tool_local_storage.py: Tool server LocalStorage ## Code Quality - Removed debug print statements from anthropic.py - Removed trailing whitespace from all files - Fixed test assertions to match implementation behavior

…proposed changes.

…am PR172 Combines: - Upstream PR172 (local-docker-sandbox from mdear) - Our UX fixes: Google GSI guard, login route crash fix, dev autologin Changes: - frontend: Add Google GSI guard (don't init without client_id; skip when dev autologin enabled) - frontend: Fix /login route crash when Google OAuth is disabled - docker: Pass VITE_DEV_AUTH_AUTOLOGIN via local-only compose build args - test: Add Playwright smoke that fails on runtime/console errors - chore: Ignore Playwright test artifacts (playwright-report/, test-results/) Gates: - Backend tests: 52/52 PASSED - API health: 200/200 - Playwright smoke: 4/4 PASSED - codex-mcp: PASS (with notes for future improvements) - gemini-cli: Reviewed (minor findings noted) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

…itespace guard Fix A - Dev auto-login resilience: - Add 10-second timeout with AbortController - Proper cleanup of isAutoLoggingIn on timeout - Clear timeout in all code paths (success/error/abort) Fix B - Safer dev-auth defaults: - Change DEV_AUTH_ENABLED from hardcoded "true" to ${DEV_AUTH_ENABLED:-false} - Add prominent security warning in .stack.env.local.example - Dev auth is now OPT-IN only Fix C - Whitespace client_id guard: - Add .trim() to VITE_GOOGLE_CLIENT_ID in provider.tsx - Align googleEnabled logic in login.tsx with trimmed value Gates: - Backend tests: 52/52 PASSED - API health: 200/200 - Playwright smoke: 4/4 PASSED - codex-mcp: PASS (all findings resolved) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

…enai worker This change enables ii-agent local Docker stack to work with OpenAI-compatible LLM workers like gemini-cli-openai (GewoonJaap/gemini-cli-openai) running on the host machine. Changes: - Add OPENAI_BASE_URL env var documentation to .stack.env.local.example - Add chat smoke test (frontend/e2e/chat-smoke.spec.ts) with mocked SSE for deterministic testing; supports real provider testing via E2E_REAL_LLM=1 - Add API smoke script (scripts/smoke-openai-base-url.sh) to validate /v1/models and /v1/chat/completions endpoints The backend already supports base_url via LLMConfig.base_url, so no backend changes were needed. The smoke tests provide coverage for the chat flow with both mocked and real LLM providers. QA: - Backend tests: 52/52 PASSED - Playwright tests: 5/5 PASSED (including chat-smoke) - Codex MCP: P2 issue fixed (console error filtering logic) - Gemini CLI: No blocking issues 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Change pyproject.toml: ddgs>=9.9.1 → duckduckgo-search>=8.1.1 - The 'ddgs' package name was incorrect; actual PyPI package is 'duckduckgo-search' - Import statements already updated: from duckduckgo_search import DDGS - This fixes P1 issue identified by codex-mcp QA gate QA: all 4 import files verified working with uv run pytest 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Add SSE stream consumption logic in OpenAI provider agenerate() * gemini-cli-openai worker returns SSE by default * Use stream=True and consume synchronously with list+join for O(n) performance * Build synthetic response object compatible with non-streaming interface - Security: Add .gitignore for auth artifacts and test results * frontend/e2e/.auth/ (contains session tokens) * playwright-report/, test-results/ - Performance: Refactor all SSE consumers to use list+join * agenerate() - agent mode SSE consumption * Streaming chat consumers - Tests: Add backend unit tests for SSE stream consumption * tests/llm/test_sse_stream_consumption.py (7 tests) * Cover: multi-chunk, tool calls, finish reason, list+join pattern - E2E: Add agent mode smoke test with REAL LLM * Requires E2E_REAL_LLM=1 (worker integration validated) * Backend unit tests cover SSE logic, mocked E2E not needed Test Results: - Backend: 59 tests PASSED (including 7 new SSE tests) - API smoke: PASSED - Playwright: 2/2 PASSED (chat + agent mode with REAL LLM) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <[email protected]>

mdear and others added 12 commits December 24, 2025 05:40

Added some unit tests and one fix to file_system/utils.py

af19ded

Added additional documentation to explain architecture and design of …

6e87a60

…proposed changes.

Merge PR172 local-docker-sandbox into integration branch

53455cf

chore: add missing Playwright ignore entries

ad05ded

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(local): Gemini CLI OpenAI bridge (SSE) + REAL LLM Playwright smoke + SSE unit tests #173

feat(local): Gemini CLI OpenAI bridge (SSE) + REAL LLM Playwright smoke + SSE unit tests #173

Uh oh!

symbiosis-institute commented Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

feat(local): Gemini CLI OpenAI bridge (SSE) + REAL LLM Playwright smoke + SSE unit tests #173

Are you sure you want to change the base?

feat(local): Gemini CLI OpenAI bridge (SSE) + REAL LLM Playwright smoke + SSE unit tests #173

Uh oh!

Conversation

symbiosis-institute commented Dec 30, 2025

Summary

What changed

How to run (repro)

Evidence (Option A)

Notes / Limitations

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants