[copilot-cli-research] Copilot CLI Deep Research - 2026-03-11 #20595

2026-03-11T21:29:38Z

github-actions[bot]
bot Mar 11, 2026

Analysis Date: 2026-03-11
Repository: github/gh-aw
Scope: 166 total workflows, 79 using Copilot engine (48%)
Run: §22974875231

📊 Executive Summary

This is the first comprehensive Copilot CLI deep-research report for the github/gh-aw repository. It compares every documented Copilot CLI feature against actual usage in the 79 Copilot-powered workflows.

Key findings: The codebase shows strong adoption of core patterns (strict: true at 64%, copilot-requests at 51%, cache-memory at 44%), but several powerful features remain nearly untouched. max-continuations is used in only 1 workflow, plugins and startup-timeout have zero adoption, and the AWF security sandbox is used in only 16% of workflows. Model selection is left entirely to the org-level variable (or the engine default), with no per-workflow overrides, so cost-optimization opportunities go unused.

Primary recommendation: Expand AWF sandbox adoption to high-sensitivity workflows (security scans, secret analysis, malicious code detection). Currently 84% of Copilot workflows (66/79) run without network firewalling.

🔴 High Priority Issues

Security gap: Workflows such as daily-malicious-code-scan.md, daily-secrets-analysis.md, and security-compliance.md handle sensitive data but do not use the AWF sandbox. AWF restricts outbound network traffic to an explicit allowlist, which limits data exfiltration. Even so, only 13/79 (16%) Copilot workflows enable it.

Missed multi-turn potential: max-continuations is used in only 1 workflow (smoke-copilot.md). Complex analysis workflows such as daily-compiler-quality.md, ci-coach.md, and repository-quality-improver.md could benefit significantly from multi-turn autopilot continuation.

🟡 Medium Priority Opportunities

Custom agent files (engine.agent): only 12/79 (15%) workflows use them. More domain-specific workflows could adopt specialized agent personas.
Specific GitHub toolsets: some workflows request the broad [default] toolset where narrower toolsets would reduce the attack surface.
Model cost optimization: no Copilot workflow overrides the model. Cheap tasks could use gpt-5.1-codex-mini explicitly.

1️⃣ Current State Analysis

Copilot CLI Capabilities Inventory

Copilot CLI Engine Features (from source code analysis)

CLI Execution Flags (auto-configured by the compiler):

--add-dir — Directories the agent can access (auto-managed)
--disable-builtin-mcps — Always applied
--log-level all --log-dir — Always applied for logging
--allow-tool / --allow-all-tools — Generated from tools: config
--allow-all-paths — Applied when edit: tool is enabled
--autopilot --max-autopilot-continues — Applied when max-continuations > 1
--agent — Applied when engine.agent is set
--prompt — Always applied

Engine Configuration Fields (frontmatter):

engine: copilot — Simple form
engine.id: copilot — Extended form with subfields
engine.version — Pin CLI version (e.g., "0.0.422")
engine.model — Override model (e.g., gpt-5.1-codex-mini)
engine.command — Custom executable path
engine.args — Inject custom CLI arguments
engine.agent — Reference .github/agents/*.agent.md file
engine.env — Custom environment variables
engine.max-continuations — Enable autopilot with continuation limit
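The following minimal Python sketch makes the flag rules above concrete by deriving the auto-configured flags from parsed frontmatter. It is a hypothetical helper, not gh-aw's Go compiler. Several details are illustrative assumptions:

- the function name and the `log_dir` placeholder;
- that `--max-autopilot-continues` takes the numeric limit as its value;
- that `engine.max-continuations` takes precedence over the top-level shorthand.

`--add-dir` and `--allow-tool`/`--allow-all-tools` are omitted because their generation rules are not spelled out here.

```python
from typing import Any


def build_copilot_args(frontmatter: dict[str, Any], prompt: str, log_dir: str) -> list[str]:
    """Approximate the auto-configured Copilot CLI flags listed above.

    Hypothetical helper: mirrors the documented rules, not gh-aw's compiler.
    """
    engine = frontmatter.get("engine") or {}
    if isinstance(engine, str):  # simple form: `engine: copilot`
        engine = {"id": engine}
    tools = frontmatter.get("tools") or {}

    # Always applied: no built-in MCPs, full logging.
    args = ["--disable-builtin-mcps", "--log-level", "all", "--log-dir", log_dir]

    # Applied when the edit tool is enabled.
    if "edit" in tools:
        args.append("--allow-all-paths")

    # Assumption: the engine-level field wins over the top-level shorthand.
    continuations = engine.get("max-continuations", frontmatter.get("max-continuations", 1))
    if continuations > 1:
        args += ["--autopilot", "--max-autopilot-continues", str(continuations)]

    # Applied when engine.agent is set.
    if "agent" in engine:
        args += ["--agent", engine["agent"]]

    # engine.args are injected before the prompt; --prompt is always applied.
    args += list(engine.get("args", []))
    args += ["--prompt", prompt]
    return args
```

Keeping `engine.args` just ahead of `--prompt` mirrors the description of engine.args as arguments injected before the prompt.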

Sandbox Options:

sandbox.agent: awf — AWF network firewall (string shorthand)
sandbox.agent.mounts — Custom volume mounts into AWF container
sandbox.agent: false — Disable sandbox explicitly

Special Features:

features.copilot-requests: true — Use github.token instead of COPILOT_GITHUB_TOKEN
strict: true — Strict validation mode
plugins: — Install Copilot plugins before execution
max-continuations: — Top-level shorthand for multi-turn autopilot

Available Tools for Copilot:

bash, edit, web-fetch, github, playwright, serena
cache-memory, repo-memory, agentic-workflows
mcp-scripts (with mcp-scripts feature flag)
Custom MCP servers via HTTP/SSE/command

Network Configuration:

network.allowed — List of allowed domain presets or hostnames (with AWF)
Presets: defaults, go, node, python, github, playwright, npm

Usage Statistics

Engine Distribution (166 total workflows)

Engine	Count	%
`copilot`	79	48%
`claude`	34	20%
`codex`	9	5%
Other/default	44	27%
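As a quick consistency check, the engine counts sum to the 166-workflow total and round to the stated shares:

```python
# Engine counts from the table above (166 workflows in total).
engine_counts = {"copilot": 79, "claude": 34, "codex": 9, "other/default": 44}

total = sum(engine_counts.values())
# Percentage share of each engine, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in engine_counts.items()}

print(total)   # 166
print(shares)  # {'copilot': 48, 'claude': 20, 'codex': 5, 'other/default': 27}
```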

Feature Usage in 79 Copilot Workflows

Feature	Used	%	Notes
`strict: true`	51	64%	✅ Good adoption
`features.copilot-requests`	40	51%	✅ Good adoption
`cache-memory`	35	44%	✅ Good adoption
`repo-memory`	15	19%	🟡 Moderate
`playwright`	15	19%	🟡 Moderate
`sandbox/AWF`	13	16%	🔴 Low - security gap
`engine.agent`	12	15%	🟡 Low
`serena`	10	13%	🟡 Low
`web-fetch`	9	11%	🟡 Low
`max-continuations`	1	1%	🔴 Near zero
`mcp-scripts`	1	1%	🔴 Near zero
`model override`	0	0%	🔴 Unused
`engine.args`	0	0%	🔴 Unused
`version pinning`	0	0%	🔴 Unused
`plugins`	0	0%	🔴 Unused
`startup-timeout`	0	0%	🔴 Unused

Timeout Distribution (Copilot Workflows)

Timeout	Count
5 min	4
10 min	17
15 min	16
20 min	19
30 min	17
45 min	4
60 min	2

Average: ~20.5 minutes (1,620 total minutes across 79 workflows); median: 20 minutes. Most workflows (69/79) use timeouts between 10 and 30 minutes.
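The weighted average and the 10-to-30-minute share can be recomputed directly from the timeout table:

```python
# Timeout (minutes) -> number of Copilot workflows, from the table above.
timeouts = {5: 4, 10: 17, 15: 16, 20: 19, 30: 17, 45: 4, 60: 2}

workflows = sum(timeouts.values())                       # 79
total_minutes = sum(t * n for t, n in timeouts.items())  # 1620
average = total_minutes / workflows                      # ~20.5

# Workflows whose timeout falls in the 10-30 minute band.
in_10_to_30 = sum(n for t, n in timeouts.items() if 10 <= t <= 30)  # 69

print(f"{workflows} workflows, average {average:.1f} min, {in_10_to_30} between 10 and 30 min")
# prints "79 workflows, average 20.5 min, 69 between 10 and 30 min"
```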

2️⃣ Feature Usage Matrix

Feature Category	Config Key	Used	Not Used	Usage Rate
Security (AWF)	`sandbox.agent: awf`	13	66	16%
Multi-turn	`max-continuations`	1	78	1%
Agent files	`engine.agent`	12	67	15%
Model selection	`engine.model`	0	79	0%
Version pinning	`engine.version`	0	79	0%
Custom CLI args	`engine.args`	0	79	0%
Plugin system	`plugins:`	0	79	0%
Startup timeout	`startup-timeout`	0	79	0%
MCP scripts	`mcp-scripts`	1	78	1%
Web fetch	`web-fetch`	9	70	11%
Playwright	`playwright`	15	64	19%
Token feature	`copilot-requests`	40	39	51% ✅
Strict mode	`strict: true`	51	28	64% ✅
Cache memory	`cache-memory`	35	44	44% ✅

3️⃣ Missed Opportunities

High Priority Opportunities

🔴 Opportunity 1: AWF Sandbox for Security-Sensitive Workflows

What: 84% of Copilot workflows (66/79) run without the AWF network firewall.

Why It Matters: Workflows handling sensitive data (security scans, secret analysis, malicious code detection) are at risk of unintended data exfiltration. AWF provides network isolation, restricting outbound connections to explicit allowlists.

Affected Workflows: daily-malicious-code-scan.md, daily-secrets-analysis.md, security-compliance.md, code-scanning-fixer.md, bot-detection.md (uses local GitHub MCP but no AWF)

How to Implement:

sandbox:
  agent: awf  # Enable AWF network firewall
network:
  allowed:
    - defaults  # github.com + standard endpoints
    # Add specific domains as needed

Expected Benefits: Limits what a prompt-injected agent can exfiltrate, because outbound traffic is restricted to the allowlist. It also provides network-isolated execution and an audit trail of network access.
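The following minimal sketch shows allowlist-style egress filtering. It assumes that presets such as defaults have already been expanded into hostnames. It also assumes that a hostname is allowed when it equals an allowed domain or is a subdomain of one. Both the matching rule and the domains used are illustrative assumptions, not AWF's documented behavior or preset contents.

```python
def is_allowed(host: str, allowed: list[str]) -> bool:
    """Return True if `host` equals an allowed domain or is a subdomain of one.

    Illustrative only: assumed matching rule, not AWF's implementation.
    """
    host = host.lower().rstrip(".")
    for domain in allowed:
        domain = domain.lower().rstrip(".")
        # Require a dot boundary so "evilgithub.com" does not match "github.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False


allowed = ["github.com"]  # hypothetical, already-expanded allowlist
print(is_allowed("api.github.com", allowed))           # True
print(is_allowed("github.com.evil.example", allowed))  # False
```

Checking for a dot boundary rather than a bare suffix is what stops lookalike hosts from slipping through an otherwise tight allowlist.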

🔴 Opportunity 2: max-continuations for Complex Analysis Workflows

What: max-continuations enables the Copilot CLI's --autopilot --max-autopilot-continues mode, allowing multi-turn agentic execution. Only smoke-copilot.md uses it.

Why It Matters: Many complex analysis workflows (code quality, architecture diagrams, performance analysis) require multiple reasoning steps. With only a single turn, the agent may give up or produce shallow analysis.

Affected Workflows: daily-compiler-quality.md, daily-cli-performance.md, repository-quality-improver.md, ci-coach.md, dead-code-remover.md, code-simplifier.md

How to Implement:

# Simple form (top-level)
max-continuations: 3

# Or extended engine form
engine:
  id: copilot
  max-continuations: 5

Expected Benefits: Deeper analysis (each continuation gives the agent another autopilot turn), the ability to iterate on findings, and more complete code changes.
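The following is a conceptual model of the continuation limit. It assumes the limit counts continuation turns after the initial turn, and that the agent stops early once it reports completion. `run_autopilot` and the `step` callback are hypothetical names; this is not the Copilot CLI's implementation.

```python
from typing import Callable


def run_autopilot(step: Callable[[int], bool], max_continuations: int) -> int:
    """Run one initial turn plus up to `max_continuations` continuation turns.

    `step(turn)` returns True when the agent reports the task is complete.
    Returns the number of turns actually executed.
    """
    turns = 0
    for turn in range(1 + max_continuations):
        turns += 1
        if step(turn):  # agent signalled completion; stop early
            break
    return turns


# A task that needs three turns completes within max-continuations: 3.
print(run_autopilot(lambda turn: turn >= 2, max_continuations=3))  # 3
```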

Medium Priority Opportunities

🟡 Opportunity 3: Custom Agent Files for Domain-Specific Workflows

What: engine.agent references a .github/agents/*.agent.md file that provides specialized persona instructions. Currently only 12 workflows use this (notably hourly-ci-cleaner.md with agent: ci-cleaner).

Why It Matters: Custom agent files allow persistent, reusable persona definitions that can be version-controlled separately from workflow prompts. They can encode domain expertise (e.g., "senior Go engineer specializing in compiler design") without bloating individual workflow files.

Affected Workflows: Any workflow with domain-specific expertise requirements: daily-compiler-quality.md, grumpy-reviewer.md, breaking-change-checker.md, docs-noob-tester.md

How to Implement:

engine:
  id: copilot
  agent: go-compiler-expert  # References .github/agents/go-compiler-expert.agent.md

Create .github/agents/go-compiler-expert.agent.md with specialized instructions.
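The following small sketch shows how an `engine.agent` value maps to its file under the `.github/agents/*.agent.md` convention. It also checks for agent names with no backing file. The helper names are hypothetical. A check like this could run before compiling workflows that reference new personas.

```python
from pathlib import Path


def agent_file_for(agent: str, repo_root: Path = Path(".")) -> Path:
    """Map an `engine.agent` value to its expected agent definition file."""
    return repo_root / ".github" / "agents" / f"{agent}.agent.md"


def missing_agent_files(agents: list[str], repo_root: Path = Path(".")) -> list[str]:
    """Return the agent names whose definition file does not exist."""
    return [a for a in agents if not agent_file_for(a, repo_root).is_file()]


print(agent_file_for("go-compiler-expert"))  # .github/agents/go-compiler-expert.agent.md
```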

🟡 Opportunity 4: Model Selection for Cost Optimization

What: No Copilot workflow overrides the model. All rely on the org-level GH_AW_MODEL_AGENT_COPILOT variable or the engine default.

Why It Matters: Simple analysis tasks (counting files, basic summaries, label suggestions) don't need a powerful model. Using gpt-5.1-codex-mini for simple tasks and reserving gpt-5 for complex ones could significantly reduce token costs.

Affected Workflows: auto-triage-issues.md (label suggestions), draft-pr-cleanup.md (status check), sub-issue-closer.md (simple automation)

How to Implement:

engine:
  id: copilot
  model: gpt-5.1-codex-mini  # Cost-effective for simple tasks

Use the full model (or omit engine.model) for complex analysis that requires deeper reasoning.
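The following sketch shows the model-resolution order this opportunity relies on:

1. per-workflow `engine.model`;
2. the org-level `GH_AW_MODEL_AGENT_COPILOT` variable;
3. the engine default, represented here as `None`.

This precedence is an assumption inferred from the description above. The org variable is passed in as a plain mapping for illustration.

```python
import os
from typing import Optional


def effective_model(engine: dict, env: Optional[dict] = None) -> Optional[str]:
    """Pick the model a Copilot workflow would run with (assumed precedence).

    1. per-workflow engine.model
    2. org-level GH_AW_MODEL_AGENT_COPILOT
    3. None, meaning the engine default
    """
    env = os.environ if env is None else env
    return engine.get("model") or env.get("GH_AW_MODEL_AGENT_COPILOT") or None


print(effective_model({"model": "gpt-5.1-codex-mini"}, {"GH_AW_MODEL_AGENT_COPILOT": "gpt-5"}))
# gpt-5.1-codex-mini
```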

🟡 Opportunity 5: Specific GitHub Toolsets Instead of Broad Access

What: Many workflows use toolsets: [default] which grants broad GitHub API access. More specific toolsets reduce attack surface.

Why It Matters: Principle of least privilege — workflows should only access the GitHub API surfaces they need.

Examples of better toolset selection:

auto-triage-issues.md: Only needs issues toolset, not full default
copilot-pr-merged-report.md: Only needs repos + pull_requests
weekly-issue-summary.md: Only needs issues

How to Implement:

tools:
  github:
    toolsets: [issues]  # Instead of [default]

Available toolsets: default, repos, issues, pull_requests, actions, discussions, code_security, orgs, users
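The following minimal audit sketch finds workflows that still request the broad `default` toolset. It assumes frontmatter has already been parsed into dicts; YAML parsing is out of scope because the standard library has no YAML parser. It only flags explicit `default` entries. How gh-aw treats a `github:` tool with no `toolsets` key is not covered here.

```python
def broad_toolset_workflows(configs: dict[str, dict]) -> list[str]:
    """List workflows whose GitHub tool requests the broad `default` toolset.

    `configs` maps workflow file names to already-parsed frontmatter dicts.
    """
    flagged = []
    for name, fm in sorted(configs.items()):
        github = (fm.get("tools") or {}).get("github") or {}
        if "default" in github.get("toolsets", []):
            flagged.append(name)
    return flagged
```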

🟡 Opportunity 6: startup-timeout for MCP-Heavy Workflows

What: startup-timeout configures how long gh-aw waits for MCP servers to be ready before sending the prompt. Currently 0 workflows use this.

Why It Matters: Workflows that use multiple MCP servers (playwright + github + serena + safeoutputs) may experience startup delays. Without an explicit startup-timeout, the workflow may silently proceed before all servers are ready.

Affected Workflows: Any workflow with 3+ MCP servers: smoke-copilot.md, brave.md, research.md

How to Implement:

tools:
  startup-timeout: 30  # seconds
  timeout: 120
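The following conceptual sketch shows what a startup timeout provides. It waits for a bounded time and reports which servers never became ready, rather than proceeding silently. The `is_ready` probe and the function name are hypothetical; this is not gh-aw's implementation.

```python
import time
from typing import Callable, Iterable


def wait_for_servers(
    is_ready: Callable[[str], bool],
    servers: Iterable[str],
    startup_timeout_s: float,
    poll_interval_s: float = 0.5,
) -> list[str]:
    """Poll until every server is ready or the timeout expires.

    Returns the servers still not ready (an empty list means all started).
    """
    pending = set(servers)
    deadline = time.monotonic() + startup_timeout_s
    while pending:
        pending = {s for s in pending if not is_ready(s)}
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(poll_interval_s)
    return sorted(pending)
```

A non-empty result gives the workflow something concrete to log or fail on, which is the point of setting the timeout explicitly.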

Low Priority Opportunities

🟢 Opportunity 7: Version Pinning for Reproducible Builds

What: No Copilot workflows pin the engine version. All use latest.

Why It Matters: Copilot CLI updates can introduce breaking changes. Pinning enables reproducible builds and controlled upgrades.

How to Implement:

engine:
  id: copilot
  version: "0.0.422"  # Pin to tested version

Best practice: Pin in production workflows, use latest in development/smoke tests.
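The following sketch checks a pinning policy by listing workflows without an exact `engine.version`. It treats only `MAJOR.MINOR.PATCH` strings as pinned, which is an assumption for illustration. Quoting the value in frontmatter, as in `version: "0.0.422"`, also keeps YAML from turning numeric-looking versions such as `1.10` into floats.

```python
import re

# Matches an exact version such as "0.0.422"; "latest" or a missing value does not.
PINNED = re.compile(r"^\d+\.\d+\.\d+$")


def unpinned(configs: dict[str, dict]) -> list[str]:
    """Return workflows whose engine does not pin an exact CLI version."""
    result = []
    for name, fm in sorted(configs.items()):
        engine = fm.get("engine")
        version = engine.get("version") if isinstance(engine, dict) else None
        if not (isinstance(version, str) and PINNED.match(version)):
            result.append(name)
    return result
```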

🟢 Opportunity 8: mcp-scripts for GitHub CLI Automation

What: mcp-scripts exposes GitHub CLI (gh) commands as MCP tools. Only 1 Copilot workflow uses it, even though it is well suited to GitHub automation.

Why It Matters: Many Copilot workflows manually instruct the agent to run gh commands via bash. Using mcp-scripts with specific gh command whitelists is cleaner and safer.

Affected Workflows: Workflows that use bash: ["*"] just for gh commands: contribution-check.md, grumpy-reviewer.md

How to Implement:

tools:
  mcp-scripts:
    mcpscripts-gh:
      - "pr list --repo ${{ github.repository }} --limit 10 --json number,title"
      - "issue view ${{ github.event.issue.number }}"

🟢 Opportunity 9: Custom engine.args for Diagnostic Flags

What: engine.args allows injecting custom CLI arguments before the prompt. No Copilot workflows use this.

Why It Matters: Advanced diagnostics like custom log levels, debug flags, or pre-prompt setup could benefit from custom args.

How to Implement:

engine:
  id: copilot
  args: ["--verbose", "--add-dir", "/custom/data/"]

Use with caution — most production configurations don't need custom args as the compiler handles them automatically.

4️⃣ Workflow-Specific Recommendations


`daily-malicious-code-scan.md`

Current: No AWF sandbox, full bash access
Recommended: Add sandbox.agent: awf + network.allowed: [defaults, github]
Benefit: Limits a compromised agent's ability to exfiltrate discovered secrets

`daily-compiler-quality.md`

Current: Single-turn execution, relies on cache for iteration
Recommended: Add max-continuations: 3 to enable multi-step analysis
Benefit: Agent can analyze → reflect → improve recommendations iteratively

`repository-quality-improver.md`

Current: Standard single-turn with bash access
Recommended: Add max-continuations: 2, consider engine.agent: code-quality-expert
Benefit: Can implement and validate fixes in multiple steps

`hourly-ci-cleaner.md` (Best Practice Example ⭐)

Current: Uses engine.id: copilot, engine.agent: ci-cleaner, custom sandbox mounts
Status: Already the most advanced Copilot configuration in the repo
Share pattern: This workflow demonstrates best practices that others should follow

`dev.md`

Current: No github: tools configured, uses copilot-requests: true
Recommended: Add github: toolsets: [repos, issues] for direct API access
Benefit: Agent can query real repo data instead of relying on context

`smoke-copilot.md` (Best Practice Example ⭐)

Current: The only workflow using max-continuations (set to 2); it also uses the extended engine syntax
Status: Good reference for advanced engine configuration
Pattern: Extended engine object + max-continuations + network config

5️⃣ Trends & Insights

Historical Trends

This is the first analysis run — no historical data available for trend comparison.

Baseline established for future tracking:

AWF adoption: 16% (13/79)
max-continuations adoption: 1% (1/79)
strict adoption: 64% (51/79)
copilot-requests adoption: 51% (40/79)

Expected improvements to track in future runs:

AWF adoption should increase as security workflows are migrated
max-continuations should appear in more complex analysis workflows
Per-workflow model selection as cost becomes a priority

6️⃣ Best Practice Guidelines

Based on this research, here are recommended patterns:

Security: Use sandbox.agent: awf for any workflow accessing sensitive code, credentials, or external data. Pair with explicit network.allowed lists.
Multi-turn Analysis: For complex tasks (code quality, refactoring, analysis), set max-continuations: 2-5 to enable iterative autopilot execution.
Principle of Least Privilege: Specify exact GitHub toolsets (issues, pull_requests) rather than broad default. Use read-only: true for read-only workflows.
Custom Agent Personas: Create .github/agents/*.agent.md files for domain-specific workflows (compiler expert, security analyst, documentation writer).
Cost Optimization: Use engine.model: gpt-5.1-codex-mini for simple automation tasks; reserve full models for complex reasoning tasks.
Graceful Degradation: Add tools: startup-timeout: 30 when using 3+ MCP servers to handle slow startup gracefully.

7️⃣ Action Items

Immediate Actions (this week):

Add sandbox.agent: awf to daily-malicious-code-scan.md, daily-secrets-analysis.md
Add max-continuations: 3 to daily-compiler-quality.md and repository-quality-improver.md

Short-term (this month):

Audit all security/analysis workflows for AWF adoption (target: 40%+)
Create template engine.agent files for common personas (code-reviewer, documentation-writer)
Identify top 5 simple workflows that could use gpt-5.1-codex-mini for cost savings

Long-term (this quarter):

Build cost dashboard tracking model usage across workflows
Establish version pinning policy for critical production workflows
Explore plugins: capability for specialized tool integration
Track AWF adoption rate monthly

Supporting Evidence & Methodology

Research Methodology

Phase 1 (Feature Discovery): Analyzed pkg/workflow/copilot_engine*.go (copilot_engine.go, copilot_engine_execution.go, copilot_engine_tools.go), copilot_mcp.go, and the documentation in docs/src/content/docs/reference/engines.md to identify all available CLI flags and configuration options.

Phase 2 (Usage Analysis): Scanned all 166 .github/workflows/*.md files with grep/pattern matching to identify feature usage.

Phase 3 (Gap Analysis): Cross-referenced available features against actual usage to identify missed opportunities, ranked by security/performance impact.
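The Phase 2 scan can be approximated with line-based regular expressions over each file's frontmatter. The patterns below are illustrative assumptions, since the exact grep expressions used for this report are not published. They can over- or under-count; for example, a nested `model:` key under another tool would also match. Parsing the YAML properly would be more robust, and this sketch does not filter to Copilot workflows.

```python
import re

# Illustrative patterns only; not the report's actual grep expressions.
FEATURE_PATTERNS = {
    "strict": re.compile(r"^strict:\s*true\s*$", re.M),
    "max-continuations": re.compile(r"^\s*max-continuations:\s*\d+", re.M),
    "sandbox/AWF": re.compile(r"^\s*agent:\s*awf\b", re.M),
    "engine.model": re.compile(r"^\s*model:\s*\S+", re.M),
}


def frontmatter(text: str) -> str:
    """Return the YAML frontmatter between the leading '---' fences, or ''."""
    match = re.match(r"^---\n(.*?)\n---\n", text, re.S)
    return match.group(1) if match else ""


def count_features(documents: list[str]) -> dict[str, int]:
    """Count how many documents' frontmatter matches each feature pattern."""
    fms = [frontmatter(doc) for doc in documents]
    return {name: sum(1 for fm in fms if pat.search(fm))
            for name, pat in FEATURE_PATTERNS.items()}
```

Restricting matches to the frontmatter keeps prompt text in the markdown body (for example, a sentence that mentions `strict: true`) from being counted as usage.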

Data Sources:

Source code: pkg/workflow/copilot_engine*.go — CLI flags, engine options
Documentation: docs/src/content/docs/reference/engines.md — official feature docs
Workflows: github/gh-aw/.github/workflows/*.md — 166 workflow markdown files

Notable Workflows Examined in Detail

smoke-copilot.md: the only workflow using max-continuations (it also uses the extended engine syntax)
hourly-ci-cleaner.md — Most advanced configuration (engine.agent + custom mounts)
daily-compiler-quality.md — Good pattern: serena + cache-memory + strict + copilot-requests
archie.md — Good pattern: engine.agent + specific toolsets + message customization


AI generated by Copilot CLI Deep Research Agent

expires on Mar 12, 2026, 9:29 PM UTC

2026-03-12T22:47:47Z

github-actions[bot]
bot Mar 12, 2026
Author

This discussion was automatically closed because it expired on 2026-03-12T21:29:37.754Z.

Closed by Workflow
