feat: use ollama launch claude for agentic cloud models

caiopizzol · caiopizzol · commit 714a28e4a7e1 · 2026-02-17T15:56:28.000-03:00
Cloud models now use `ollama launch claude -- --print` instead of
`ollama run`, enabling tools, web search, and subagents. Updated
model versions to latest (minimax-m2.5, glm-5, kimi-k2.5). Synced
sample review prompt with improved comment style and output format.
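
The command selection and model injection described above can be sketched as follows. This is a minimal illustration, not code from this commit: `build_ollama_cmd` is a hypothetical helper, and treating both `:cloud` and `-cloud` endings as cloud models is an assumption based on the model names listed in `config/tools.example.json`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this commit): pick the Ollama command
# pattern for a model name and inject the model in the right position.
build_ollama_cmd() {
    local model="$1"
    case "$model" in
        *:cloud|*-cloud)
            # Assumed cloud naming: the name ends in ":cloud" or "-cloud".
            # The --model flag goes before the `--` separator.
            printf 'ollama launch claude --model %s -- --print\n' "$model"
            ;;
        *)
            # Local model: appended directly, no flag.
            printf 'ollama run %s\n' "$model"
            ;;
    esac
}

build_ollama_cmd "minimax-m2.5:cloud"
build_ollama_cmd "qwen3-coder:480b-cloud"
build_ollama_cmd "qwen2.5-coder:7b"
```

The helper only prints the command string; a caller would still need to add `CLAUDECODE=0` for the cloud pattern when running inside Claude Code.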
diff --git a/README.md b/README.md
@@ -125,9 +125,9 @@ Review results are saved as JSON files containing raw model outputs, timestamps,
 | Qwen    | `coder-model` (default), `vision-model`                                                                                        | [Qwen Code Docs](https://qwenlm.github.io/qwen-code-docs/)     |
 | Mistral | Config-based (`~/.vibe/config.toml`)                                                                                           | [Mistral Vibe Docs](https://docs.mistral.ai/mistral-vibe/)     |
 | Grok    | `grok-code-fast-1`, `grok-4-1-fast-*`, `grok-4-fast-*`, `grok-3`, `grok-3-mini`                                                 | [xAI API Models](https://docs.x.ai/docs/models)                |
-| Ollama  | `qwen3-coder:480b-cloud`, `devstral-2:123b-cloud`, or any model from library | [Ollama Library](https://ollama.com/library) |
+| Ollama  | Cloud (recommended): `minimax-m2.5:cloud`, `glm-5:cloud`, `kimi-k2.5:cloud`, or any model from library | [Ollama Library](https://ollama.com/library) |
 
-> **Note:** Ollama cloud models use `:cloud` suffix and require `OLLAMA_API_KEY` environment variable. Get your API key at [ollama.com](https://ollama.com). You can also run local models (e.g., `qwen2.5-coder:7b`), but they are slow and require significant memory (~8GB+ RAM for 7B models).
+> **Note:** Ollama cloud models (`:cloud` suffix) run via `ollama launch claude` with full agentic capabilities — tools, web search, and subagents. Requires `OLLAMA_API_KEY`. Get your API key at [ollama.com](https://ollama.com). Local models (e.g., `qwen2.5-coder:7b`) use `ollama run` (text-only), are slower, and require significant memory (~8GB+ RAM for 7B models).
 
 > **Note:** Mistral and Grok use command-line argument passing (not stdin), which has a ~200KB limit on macOS. Very large diffs may cause these tools to fail while other tools succeed.
 
@@ -152,7 +152,7 @@ Customize prompts for each command:
 | Qwen    | `npm install -g @qwen-code/qwen-code`                                         |
 | Mistral | `pipx install mistral-vibe`                                                   |
 | Grok    | `bun add -g @vibe-kit/grok-cli`; `export GROK_API_KEY="key"` in `~/.zshrc`    |
-| Ollama  | [ollama.com/download](https://ollama.com/download); cloud: `export OLLAMA_API_KEY="key"` in `~/.zshrc`; local: `ollama pull <model>` |
+| Ollama  | [ollama.com/download](https://ollama.com/download); cloud (agentic): `export OLLAMA_API_KEY="key"` in `~/.zshrc`; local: `ollama pull <model>` |
 
 ## Usage
 
diff --git a/agents/multi-model-executor.md b/agents/multi-model-executor.md
@@ -43,10 +43,14 @@ For each tool in the config:
 
 **Environment override for nested Claude Code**:
 
-When running inside Claude Code, the `CLAUDECODE=1` environment variable prevents spawning nested `claude` sessions. For any tool whose command starts with `claude`, prefix the command with `CLAUDECODE=0` to allow it to run:
+When running inside Claude Code, `CLAUDECODE=1` prevents spawning nested sessions. For any tool whose command starts with `claude` or uses `ollama launch claude`, put `CLAUDECODE=0` directly before that command (after the pipe), so the override applies to it rather than to `cat`:
 
 ```bash
-CLAUDECODE=0 cat /tmp/conclave-prompt.md | claude --print 2>&1
+# Claude tools
+cat /tmp/conclave-prompt.md | CLAUDECODE=0 claude --print 2>&1
+
+# Ollama cloud tools (ollama launch claude wraps Claude Code)
+cat /tmp/conclave-prompt.md | CLAUDECODE=0 ollama launch claude --model minimax-m2.5:cloud -- --print 2>&1
 ```
 
 **Stdin-based tools** (most):
@@ -70,11 +74,28 @@ If a tool has a `model` field, inject it:
 | gemini   | `-m`      | Appended                   |
 | qwen     | `-m`      | Appended                   |
 | mistral  | N/A       | Config-based               |
-| ollama   | N/A       | Appended directly (no flag)|
+| ollama   | varies    | See Ollama section below   |
 | grok     | `-m`      | Appended                   |
 
 Skip injection if command already contains a model flag.
 
+### Ollama Command Pattern
+
+Ollama has two command patterns depending on model type:
+
+**Cloud models** (`:cloud` suffix) — use `ollama launch claude` (agentic, with tools/web search):
+- Command: `ollama launch claude -- --print`
+- Model injection: `--model` flag inserted before `--`
+- Requires `CLAUDECODE=0` prefix
+- Example: `cat /tmp/conclave-prompt.md | CLAUDECODE=0 ollama launch claude --model qwen3-coder:480b-cloud -- --print 2>&1`
+
+**Local models** (no `:cloud` suffix) — use `ollama run` (text-only):
+- Command: `ollama run`
+- Model injection: Appended directly (no flag)
+- Example: `cat /tmp/conclave-prompt.md | ollama run qwen2.5-coder:7b 2>&1`
+
+Detection: If the tool's `model` field ends with `:cloud` or `-cloud` (e.g., `minimax-m2.5:cloud`, `qwen3-coder:480b-cloud`), use the cloud pattern.
+
 ### Step 3: Collect Results
 
 Use TaskOutput for each background task to wait for completion.
diff --git a/commands/consult.md b/commands/consult.md
@@ -125,10 +125,14 @@ PROMPT_EOF
 
 **Step 5b - Run consultation commands in background** (run ALL in parallel with `run_in_background: true`):
 
-**Environment override for nested Claude Code**: When running inside Claude Code, `CLAUDECODE=1` prevents spawning nested `claude` sessions. For any tool whose command starts with `claude`, prefix with `CLAUDECODE=0`:
+**Environment override for nested Claude Code**: When running inside Claude Code, `CLAUDECODE=1` prevents spawning nested sessions. Prefix with `CLAUDECODE=0` for any tool whose command starts with `claude` or uses `ollama launch claude`:
 
 ```bash
-CLAUDECODE=0 cat /tmp/conclave-consult-prompt.md | claude --print --model opus 2>&1
+# Claude tools
+cat /tmp/conclave-consult-prompt.md | CLAUDECODE=0 claude --print --model opus 2>&1
+
+# Ollama cloud tools
+cat /tmp/conclave-consult-prompt.md | CLAUDECODE=0 ollama launch claude --model minimax-m2.5:cloud -- --print 2>&1
 ```
 
 For stdin-based tools:
@@ -239,7 +243,8 @@ Same as `/review` - see `~/.config/conclave/tools.json` for enabled tools.
 | Qwen     | `qwen -o text`                     | `-m` (append)            |
 | Mistral  | `vibe --output text -p`            | Config-based             |
 | Grok     | `grok -p`                          | `-m` (append)            |
-| Ollama   | `ollama run`                       | Appended directly        |
+| Ollama (local) | `ollama run`                  | Appended directly        |
+| Ollama (cloud) | `ollama launch claude -- --print` | `--model` (before `--`) |
 
 ## Error Handling
 
diff --git a/commands/review.md b/commands/review.md
@@ -152,10 +152,14 @@ PROMPT_EOF
 
 **Step 4b - Run review commands in background** (run ALL in parallel with `run_in_background: true`):
 
-**Environment override for nested Claude Code**: When running inside Claude Code, `CLAUDECODE=1` prevents spawning nested `claude` sessions. For any tool whose command starts with `claude`, prefix with `CLAUDECODE=0`:
+**Environment override for nested Claude Code**: When running inside Claude Code, `CLAUDECODE=1` prevents spawning nested sessions. Prefix with `CLAUDECODE=0` for any tool whose command starts with `claude` or uses `ollama launch claude`:
 
 ```bash
-CLAUDECODE=0 cat /tmp/conclave-review-prompt.md | claude --print --model opus 2>&1
+# Claude tools
+cat /tmp/conclave-review-prompt.md | CLAUDECODE=0 claude --print --model opus 2>&1
+
+# Ollama cloud tools
+cat /tmp/conclave-review-prompt.md | CLAUDECODE=0 ollama launch claude --model minimax-m2.5:cloud -- --print 2>&1
 ```
 
 For most tools (stdin-based):
@@ -177,7 +181,7 @@ For Mistral Vibe and Grok (command substitution - do not accept stdin):
 | gemini   | `-m`       | Appended to command         |
 | qwen     | `-m`       | Appended to command         |
 | mistral  | N/A        | Model set via `~/.vibe/config.toml` |
-| ollama   | N/A        | Appended directly (no flag) |
+| ollama   | varies     | See Ollama examples below   |
 | grok     | `-m`       | Appended to command         |
 
 **Notes**:
@@ -214,9 +218,14 @@ Original: grok -p
 With model: grok -p -m grok-code-fast-1
 grok -p -m grok-code-fast-1 "$(cat /tmp/conclave-review-prompt.md)"
 
-# Ollama (model appended directly, no flag)
+# Ollama local (model appended directly, no flag)
 Original: ollama run
 With model: ollama run qwen2.5-coder:7b
+
+# Ollama cloud (--model flag before --, requires CLAUDECODE=0)
+Original: ollama launch claude -- --print
+With model: ollama launch claude --model qwen3-coder:480b-cloud -- --print
+Final: cat /tmp/conclave-review-prompt.md | CLAUDECODE=0 ollama launch claude --model qwen3-coder:480b-cloud -- --print 2>&1
 ```
 
 Use `timeout: 300000` (5 minutes) for each command since AI tools can be slow.
@@ -643,13 +652,15 @@ Most tools receive the prompt via stdin: `cat prompt.md | {command}`
 | Qwen     | `qwen -o text`                     | `-m` (append)            | Reads prompt from stdin, `-o text` for plain output        |
 | Mistral  | `vibe --output text -p`            | Config-based             | Uses command substitution: `vibe --output text -p "$(cat file)"` |
 | Grok     | `grok -p`                          | `-m` (append)            | Uses command substitution: `grok -p -m model "$(cat file)"` |
-| Ollama   | `ollama run`                       | Appended directly        | Model appended without flag: `ollama run <model>`          |
+| Ollama (local) | `ollama run`                  | Appended directly        | Model appended without flag: `ollama run <model>`          |
+| Ollama (cloud) | `ollama launch claude -- --print` | `--model` (before `--`) | Agentic mode with tools/web search. Requires `CLAUDECODE=0` |
 
 **Notes**:
 - All tools read from the same prompt file (`/tmp/conclave-review-prompt.md`) written once in Step 4a.
 - Mistral Vibe does not accept stdin; prompt must be passed via `-p` flag using command substitution.
 - Mistral model selection is done via `~/.vibe/config.toml` (`active_model` setting), not CLI flags.
 - Grok CLI does not accept stdin; prompt must be passed via `-p` flag using command substitution (like Mistral).
+- Ollama cloud models (`:cloud` suffix) use `ollama launch claude`, which runs a full agentic session: the model gets access to file read, grep, bash, web search, and subagents.
 - **Limitation**: Mistral and Grok's command-line argument passing has a ~200KB limit (ARG_MAX). Very large diffs may fail.
 
 ---
diff --git a/config/prompt.example.md b/config/prompt.example.md
@@ -14,14 +14,42 @@ When reviewing the diff:
 2. **Consider readability** - Is the code clear and maintainable? Does it follow best practices?
 3. **Evaluate performance** - Are there obvious performance concerns or optimizations?
 4. **Assess test coverage** - Are there adequate tests for these changes?
-5. **Ask clarifying questions** - Ask for clarification if unsure about the changes.
-6. **Don't be overly pedantic** - Nitpicks are fine, but only if relevant.
+5. **Don't be overly pedantic** - Nitpicks are fine, but only if relevant.
 
-In your output:
+## Comment Style
 
-- Provide a summary overview of the general code quality.
-- Present issues in a table with columns: index, line number(s), code, issue, and potential solution(s).
-- If no issues are found, briefly state that the code meets best practices.
+Write each finding as a short comment (1-3 sentences). Think teammate leaving a quick note, not writing a paper.
+
+**Rules**:
+- Concrete consequence first, then the technical detail
+- End with a question when it's a design decision
+- Lowercase start, no prefixes like "nit:" or "suggestion:"
+- Use simple words -- say "pick one place" not "canonicalize", "cut in half" not "halve", "differs from" not "diverges from"
+- Don't hedge ("I think maybe this could potentially...") -- just say what the issue is
+- Don't over-explain -- if the code is right there, trust the reader to follow
+- Skip pleasantries and filler
+
+## Output Format
+
+Start with a 2-3 sentence summary of overall code quality.
+
+Then list each finding:
+
+```
+**<file>:<lines>** -- <short title>
+
+<1-3 sentence comment>
+```
+
+End with a summary table:
+
+```
+| Finding | Severity | Action |
+|---------|----------|--------|
+| <short title> | Low/Medium/High | <what to do> |
+```
+
+If no issues are found, briefly state that the code looks good.
 
 ## Full Diff
 
diff --git a/config/tools.example.json b/config/tools.example.json
@@ -51,68 +51,68 @@
 			"model": "grok-code-fast-1",
 			"description": "xAI Grok CLI (community)"
 		},
-		"ollama-qwen": {
+		"ollama-minimax": {
 			"enabled": false,
 			"scope": ["review", "consult"],
-			"command": "ollama run",
-			"model": "qwen3-coder:480b-cloud",
-			"description": "Ollama (Qwen3 Coder 480B)"
+			"command": "ollama launch claude -- --print",
+			"model": "minimax-m2.5:cloud",
+			"description": "Ollama (MiniMax M2.5, agentic)"
 		},
-		"ollama-devstral": {
+		"ollama-glm": {
 			"enabled": false,
 			"scope": ["review", "consult"],
-			"command": "ollama run",
-			"model": "devstral-2:123b-cloud",
-			"description": "Ollama (Devstral 2 123B)"
+			"command": "ollama launch claude -- --print",
+			"model": "glm-5:cloud",
+			"description": "Ollama (GLM-5, agentic)"
 		},
-		"ollama-local": {
+		"ollama-kimi": {
 			"enabled": false,
-			"scope": ["review"],
-			"command": "ollama run",
-			"model": "qwen2.5-coder:7b",
-			"description": "Ollama (Qwen 2.5 Coder 7B, local)"
+			"scope": ["review", "consult"],
+			"command": "ollama launch claude -- --print",
+			"model": "kimi-k2.5:cloud",
+			"description": "Ollama (Kimi K2.5, agentic)"
 		},
-		"ollama-kimi": {
+		"ollama-qwen": {
 			"enabled": false,
 			"scope": ["review", "consult"],
-			"command": "ollama run",
-			"model": "kimi-k2:1t-cloud",
-			"description": "Ollama (Kimi K2 1T)"
+			"command": "ollama launch claude -- --print",
+			"model": "qwen3-coder:480b-cloud",
+			"description": "Ollama (Qwen3 Coder 480B, agentic)"
 		},
-		"ollama-glm": {
+		"ollama-devstral": {
 			"enabled": false,
 			"scope": ["review", "consult"],
-			"command": "ollama run",
-			"model": "glm-4.7:cloud",
-			"description": "Ollama (GLM-4.7)"
+			"command": "ollama launch claude -- --print",
+			"model": "devstral-2:123b-cloud",
+			"description": "Ollama (Devstral 2 123B, agentic)"
 		},
 		"ollama-deepseek": {
 			"enabled": false,
 			"scope": ["review", "consult"],
-			"command": "ollama run",
+			"command": "ollama launch claude -- --print",
 			"model": "deepseek-v3.2:cloud",
-			"description": "Ollama (DeepSeek V3.2)"
+			"description": "Ollama (DeepSeek V3.2, agentic)"
 		},
 		"ollama-rnj": {
 			"enabled": false,
 			"scope": ["review", "consult"],
-			"command": "ollama run",
+			"command": "ollama launch claude -- --print",
 			"model": "rnj-1:8b-cloud",
-			"description": "Ollama (RNJ-1 8B, code/STEM optimized)"
+			"description": "Ollama (RNJ-1 8B, agentic)"
 		},
 		"ollama-devstral-small": {
 			"enabled": false,
 			"scope": ["review", "consult"],
-			"command": "ollama run",
+			"command": "ollama launch claude -- --print",
 			"model": "devstral-small-2:24b-cloud",
-			"description": "Ollama (Devstral Small 2 24B, vision+tools)"
+			"description": "Ollama (Devstral Small 2 24B, agentic)"
 		},
-		"ollama-minimax": {
+		"ollama-local": {
 			"enabled": false,
-			"scope": ["review", "consult"],
+			"scope": ["review"],
 			"command": "ollama run",
-			"model": "minimax-m2:cloud",
-			"description": "Ollama (MiniMax M2, coding/agentic)"
+			"model": "qwen2.5-coder:7b",
+			"description": "Ollama (Qwen 2.5 Coder 7B, local)"
 		}
 	},
 	"prompts": {
diff --git a/tests/cli-live.sh b/tests/cli-live.sh
@@ -5,7 +5,7 @@
 # WARNING: This runs real API calls and incurs costs.
 # Only run when you need to verify tools are working end-to-end.
 #
-# NOTE: Ollama cloud models require OLLAMA_API_KEY environment variable.
+# NOTE: Ollama cloud models require OLLAMA_API_KEY and use `ollama launch claude` (agentic mode).
 # NOTE: Grok requires GROK_API_KEY environment variable.
 
 set -eo pipefail
@@ -46,10 +46,13 @@ test_tool() {
         return 2
     fi
 
+    # Build env prefix (e.g., CLAUDECODE=0 for nested Claude Code sessions)
+    local env_prefix="${4:-}"
+
     if [ "$use_stdin" = "true" ]; then
-        result=$(echo "$PROMPT" | $timeout_cmd $TIMEOUT $cmd 2>&1)
+        result=$(echo "$PROMPT" | env $env_prefix $timeout_cmd $TIMEOUT $cmd 2>&1)
     else
-        result=$($timeout_cmd $TIMEOUT $cmd "$PROMPT" 2>&1)
+        result=$(env $env_prefix $timeout_cmd $TIMEOUT $cmd "$PROMPT" 2>&1)
     fi
 
     exit_code=$?
@@ -97,11 +100,15 @@ else
     echo "  ○ grok skipped (GROK_API_KEY not set)"
     ((skipped++))
 fi
-# Skip ollama cloud models if API key not set
-if [[ "$MODEL_OLLAMA" == *"-cloud"* ]] && [[ -z "$OLLAMA_API_KEY" ]]; then
-    echo "Testing ollama..."
-    echo "  ○ ollama skipped (cloud model requires OLLAMA_API_KEY)"
-    ((skipped++))
+# Ollama: cloud models use `ollama launch claude` (agentic), local uses `ollama run`
+if [[ "$MODEL_OLLAMA" == *":cloud" || "$MODEL_OLLAMA" == *"-cloud" ]]; then
+    if [[ -z "$OLLAMA_API_KEY" ]]; then
+        echo "Testing ollama..."
+        echo "  ○ ollama skipped (cloud model requires OLLAMA_API_KEY)"
+        ((skipped++))
+    else
+        test_tool "ollama" "ollama launch claude --model $MODEL_OLLAMA -- --print" true "CLAUDECODE=0"
+    fi
 else
     test_tool "ollama" "ollama run $MODEL_OLLAMA" true
 fi
diff --git a/tests/cli-models.sh b/tests/cli-models.sh
@@ -5,7 +5,8 @@
 # WARNING: This runs many API calls (one per model) and incurs significant costs.
 # Only run when validating README model documentation is accurate.
 #
-# NOTE: Ollama cloud models (`:cloud` suffix) require OLLAMA_API_KEY environment variable.
+# NOTE: Ollama cloud models (`:cloud` suffix) require OLLAMA_API_KEY environment variable
+# and use `ollama launch claude` (agentic mode with tools/web search).
 # Get your API key at https://ollama.com
 #
 # NOTE: Grok models require GROK_API_KEY environment variable.
@@ -48,10 +49,13 @@ test_model() {
         return 2
     fi
 
+    # Build env prefix (e.g., CLAUDECODE=0 for nested Claude Code sessions)
+    local env_prefix="${5:-}"
+
     if [ "$use_stdin" = "true" ]; then
-        result=$(echo "$PROMPT" | $timeout_cmd $TIMEOUT $cmd 2>&1)
+        result=$(echo "$PROMPT" | env $env_prefix $timeout_cmd $TIMEOUT $cmd 2>&1)
     else
-        result=$($timeout_cmd $TIMEOUT $cmd "$PROMPT" 2>&1)
+        result=$(env $env_prefix $timeout_cmd $TIMEOUT $cmd "$PROMPT" 2>&1)
     fi
 
     exit_code=$?
@@ -137,15 +141,18 @@ fi
 echo ""
 
 # Ollama models
-# Cloud models require OLLAMA_API_KEY, local models must be pulled first
+# Cloud models use `ollama launch claude` (agentic), local models use `ollama run`
 echo "--- Ollama ---"
 if [[ -n "$OLLAMA_API_KEY" ]]; then
-    test_model "ollama" "qwen3-coder:480b-cloud" "ollama run qwen3-coder:480b-cloud" true
-    test_model "ollama" "devstral-2:123b-cloud" "ollama run devstral-2:123b-cloud" true
+    # Recommended cloud models (agentic via ollama launch claude)
+    test_model "ollama" "minimax-m2.5:cloud" "ollama launch claude --model minimax-m2.5:cloud -- --print" true "CLAUDECODE=0"
+    test_model "ollama" "glm-5:cloud" "ollama launch claude --model glm-5:cloud -- --print" true "CLAUDECODE=0"
+    test_model "ollama" "kimi-k2.5:cloud" "ollama launch claude --model kimi-k2.5:cloud -- --print" true "CLAUDECODE=0"
 else
     echo "  ○ cloud models skipped (OLLAMA_API_KEY not set)"
-    ((skipped+=2))
+    ((skipped+=3))
 fi
+# Local model (text-only via ollama run)
 test_model "ollama" "qwen2.5-coder:7b" "ollama run qwen2.5-coder:7b" true
 echo ""
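
A note on the `env_prefix` argument used by the test scripts above: a `VAR=value` string held in a variable is not treated as an assignment after expansion, because the shell identifies assignments before it expands variables. One way to apply such a prefix is to route it through `env`. The sketch below uses a hypothetical helper, `run_with_prefix`, which is not part of this commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with an optional VAR=value prefix
# that is stored in a variable.
run_with_prefix() {
    local env_prefix="$1"
    shift
    if [ -n "$env_prefix" ]; then
        # `env` parses VAR=value arguments itself, so the assignment
        # still takes effect after variable expansion.
        env "$env_prefix" "$@"
    else
        "$@"
    fi
}

# The child process sees the overridden value.
run_with_prefix "CLAUDECODE=0" sh -c 'echo "CLAUDECODE=$CLAUDECODE"'
```

With an empty prefix the helper runs the command unchanged, so the same call site can serve tools that need no override.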