[copilot-cli-research] Copilot CLI Deep Research - 2026-03-12 #20724

2026-03-12T21:27:07Z

github-actions[bot]
bot Mar 12, 2026

Analysis Date: 2026-03-12
Repository: github/gh-aw
Scope: 167 total workflows, 80 using Copilot engine (48%), 34 Claude, 9 Codex

📊 Executive Summary

This analysis compares the full capabilities of the GitHub Copilot CLI engine (as implemented in pkg/workflow/copilot_engine*.go) against actual usage across the 80 Copilot-engine workflows in .github/workflows/. The repository shows strong adoption of GitHub MCP toolsets and memory tools, but several high-value features remain nearly unused.

Key Findings:

max-continuations (autopilot mode) is almost completely unused — only 1 of 80 workflows uses it, despite many multi-step analytical workflows that would benefit
Custom agent files are underutilized — 9 agent files exist in .github/agents/ but only 3 workflows reference them via engine.agent
AWF sandbox adoption is low — only 13/80 Copilot workflows (16%) use the firewall sandbox despite it being the security best practice
rate-limit is barely used: only 3 workflows define rate limiting, even though many event-triggered workflows could spawn a flood of runs during event bursts
skip-if-match is underused — only 12 workflows use it, yet many scheduled workflows create issues/PRs that could pile up

Primary Recommendation: Enable max-continuations: 3-5 for complex research/analysis workflows (daily reporters, quality analyzers) to allow Copilot to self-continue through multi-phase work without manual re-triggering.

🔴 Critical Findings

High Priority Issues

1. AWF Sandbox Only 16% Adoption (Security Gap)
Only 13 of 80 Copilot workflows use sandbox: { agent: awf }. Workflows that use bash: tools (shell execution) without AWF sandbox have unrestricted network access during agent execution, creating a potential exfiltration risk.

Affected pattern: workflows with bash: true or bash: ["*"] but no sandbox.agent: awf.

2. max-continuations Nearly Unused (1/80)
The autopilot feature (max-continuations → --autopilot --max-autopilot-continues N) allows Copilot to self-continue complex tasks. Only smoke-copilot.md uses it. Workflows like agent-performance-analyzer.md, daily-architecture-diagram.md, and daily-compiler-quality.md perform deep multi-phase analysis that would benefit from 3-5 continuations.

Medium Priority Opportunities

3. Custom Agent Files Underused (3/80)
Nine agent files exist in .github/agents/, but only glossary-maintainer.md, hourly-ci-cleaner.md, and technical-doc-writer.md use engine.agent. Agents such as contribution-checker, interactive-agent-designer, and w3c-specification-writer have no corresponding workflows.

4. rate-limit Gaps (3/80)
Only auto-triage-issues.md, bot-detection.md, and one other use rate-limiting. Event-triggered workflows (slash commands, issue/PR events) can burst under high activity. Workflows responding to issues: [opened] or issue_comment should have rate-limit: { max: 5, window: 60 }.

1️⃣ Current State Analysis

View Copilot CLI Capabilities Inventory

Available Copilot Engine Configuration Options

Config Field	Description	Default
`engine.id`	Engine identifier (`copilot` for this engine)	`copilot`
`engine.version`	Pin CLI version (e.g., `"0.0.422"`)	`latest`
`engine.model`	Override AI model	`claude-sonnet-4`
`engine.agent`	Custom agent file (`.github/agents/*.agent.md`)	none
`engine.args`	Custom CLI args injected before prompt	none
`engine.command`	Custom executable path	auto-installed
`engine.env`	Custom environment variables	none
`engine.max-continuations`	Enable autopilot with N continuations	1 (disabled)
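
As a rough illustration of how the fields in this table combine, the frontmatter sketch below sets several of them at once. The values are borrowed from elsewhere in this report (the version string, the technical-doc-writer agent, the DEBUG_ANALYSIS variable) and are illustrative only. engine.args and engine.command are left out because the report gives no concrete values for them.

```yaml
engine:
  id: copilot
  version: "0.0.422"            # pin the CLI version instead of using latest
  model: gpt-5                  # override the default model
  agent: technical-doc-writer   # .github/agents/technical-doc-writer.agent.md
  max-continuations: 3          # values above 1 enable autopilot
  env:
    DEBUG_ANALYSIS: "true"      # illustrative variable name
```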

CLI Flags Automatically Managed by the Compiler

Flag	When Used
`--add-dir /tmp/gh-aw/`	Always
`--add-dir "${GITHUB_WORKSPACE}"`	When AWF sandbox enabled
`--disable-builtin-mcps`	Always
`--log-level all --log-dir`	Always
`--agent (id)`	When `engine.agent` is set
`--autopilot --max-autopilot-continues N`	When `max-continuations > 1`
`--allow-tool shell(cmd)`	Per bash tool entry
`--allow-tool write`	When `edit:` tool enabled
`--allow-all-tools`	When `bash: ["*"]` or `bash: [":*"]`
`--allow-all-paths`	When `edit:` tool enabled
`COPILOT_MODEL` env var	When `engine.model` set
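
Read together with the configuration options, the flag table implies a mapping like the one sketched below. The comments only restate the table rows. The `git status` bash entry is a hypothetical example, and the exact quoting of flags in the compiled workflow may differ.

```yaml
engine:
  id: copilot
  agent: technical-doc-writer   # compiler adds: --agent technical-doc-writer
  max-continuations: 3          # compiler adds: --autopilot --max-autopilot-continues 3
  model: gpt-5                  # compiler sets the COPILOT_MODEL env var
tools:
  edit:                         # compiler adds: --allow-tool write, --allow-all-paths
  bash: ["git status"]          # compiler adds: --allow-tool shell(git status)
```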

Sandbox/Security Options

Option	Description
`sandbox.agent: awf`	Network firewall (AWF binary wrap)
`sandbox.agent: srt`	Process isolation sandbox
`sandbox.agent: false`	Disable sandbox (not recommended)
`network.allowed: [...]`	Allowlist domains when AWF active
`strict: true`	Enable strict mode validation

View Usage Statistics

Engine Distribution (167 total workflows)

Engine	Count	%
`copilot`	80	48%
`claude`	34	20%
`codex`	9	5%
other/unspecified	44	26%

Feature Usage Across 80 Copilot Workflows

Feature	Used	% of Copilot
`github:` MCP tool	~60	75%
`cache-memory:` tool	56	70%
`strict: true`	47	59%
`network:` config	38	48%
`repo-memory:` tool	25	31%
`sandbox.agent: awf`	13	16%
`serena` MCP	11	14%
`web-fetch:` tool	7	9%
`agentic-workflows:` MCP	9	11%
`engine.agent` field	3	4%
`max-continuations`	1	1.25%
`rate-limit:`	3	4%
`playwright:` MCP	4	5%
Version pinning	0	0%
`concurrency:`	~7	9%

Most Common GitHub Toolsets

Toolset Config	Count
`[default]`	15
Inline list	12
`[default, discussions]`	6
`[pull_requests, repos]`	4
`[default, actions]`	3

Model Usage

7 workflows use gpt-5.1-codex-mini (cost-effective detection model)
1 workflow uses gpt-5
Rest use default claude-sonnet-4

2️⃣ Feature Usage Matrix

Feature	Available	Used	Not Used	Rate
`max-continuations`	✅	1	79	1%
`engine.agent` (custom)	✅	3	77	4%
AWF sandbox	✅	13	67	16%
`rate-limit`	✅	3	77	4%
`skip-if-match`	✅	12	155	7% (of all 167 workflows)
`engine.version` pinning	✅	0	80	0%
`engine.env`	✅	~3	~77	~4%
`engine.args`	✅	~2	~78	~3%
`web-fetch:`	✅	7	73	9%
`playwright:` MCP	✅	4	76	5%
`copilot-requests` feature	✅	~25	~55	~31%
`concurrency:`	✅	~7	~73	~9%

3️⃣ Missed Opportunities

View High Priority Opportunities

🔴 Opportunity 1: Enable `max-continuations` for Multi-Phase Workflows

What: The max-continuations field triggers --autopilot --max-autopilot-continues N, letting Copilot self-continue complex tasks without stopping.

Why It Matters: Many analytical workflows (daily reports, code quality analyzers, research agents) do multi-phase work that often gets truncated in a single pass. Autopilot allows Copilot to complete all phases.

Where: Strong candidates include agent-performance-analyzer.md, daily-compiler-quality.md, daily-mcp-concurrency-analysis.md, daily-architecture-diagram.md, copilot-pr-nlp-analysis.md, and research.md.

How to Implement:

engine:
  id: copilot
  max-continuations: 3  # nested under engine, per the engine.max-continuations field
timeout-minutes: 45  # increase timeout proportionally

Expected Benefits: Richer, more complete analysis outputs; fewer truncated reports; better utilization of available compute time.

🔴 Opportunity 2: Expand AWF Sandbox Adoption

What: Only 13/80 Copilot workflows use sandbox: { agent: awf }. The AWF sandbox provides network firewalling to prevent data exfiltration.

Why It Matters: Workflows with bash: access have unrestricted network egress during execution without AWF. Any prompt injection could exfiltrate data.

Where: All workflows with bash: true or broad bash permissions without a sandbox — especially those processing external user content (slash commands, issue events).

How to Implement:

sandbox:
  agent: awf  # Firewall enabled
network:
  allowed:
    - defaults
    - github

Example from artifacts-summary.md (already correct):

network:
  allowed:
    - defaults
    - node
sandbox:
  agent: awf

View Medium Priority Opportunities

🟡 Opportunity 3: Custom Agent Files for Specialized Roles

What: 9 .github/agents/*.agent.md files exist defining specialized Copilot behaviors, but only 3 workflows use them.

Unused Agent Files:

contribution-checker.agent.md — no workflow using it
interactive-agent-designer.agent.md — no workflow
w3c-specification-writer.agent.md — no workflow
create-safe-output-type.agent.md — no workflow
agentic-workflows.agent.md — used as custom agent but not via engine.agent
grumpy-reviewer.agent.md — has a grumpy-reviewer.md but doesn't use engine.agent

How to Implement:

engine:
  id: copilot
  agent: grumpy-reviewer  # references .github/agents/grumpy-reviewer.agent.md

Expected Benefits: Consistent persona/behavior across workflow executions; centralized agent instruction management.

🟡 Opportunity 4: Rate-Limiting for Event-Triggered Workflows

What: Only 3 workflows use rate-limit, yet many respond to issues:, pull_request:, or slash_command: events that can burst.

Why It Matters: Without rate-limiting, a burst of 20 new issues would trigger 20 simultaneous workflow runs, consuming credits and potentially creating 20 duplicate safe-outputs.

Where: Especially needed for pr-nitpick-reviewer.md, contribution-check.md, sub-issue-closer.md, and all slash-command workflows. (auto-triage-issues.md already has it ✅.)

How to Implement:

rate-limit:
  max: 5
  window: 60  # 5 runs per 60 minutes

🟡 Opportunity 5: `skip-if-match` for Idempotent Scheduled Workflows

What: Only 12 workflows use skip-if-match to prevent duplicate issue/PR creation.

Why It Matters: Scheduled workflows that create issues or PRs can pile up if they don't check for existing open items. skip-if-match prevents redundant work.

Where: Any scheduled workflow creating issues or PRs — e.g., dead-code-remover.md, daily-syntax-error-quality.md, daily-testify-uber-super-expert.md, layout-spec-maintainer.md.

How to Implement:

skip-if-match: 'is:pr is:open in:title "[dead-code]"'

🟡 Opportunity 6: `web-fetch:` for Workflows Needing External Data

What: Copilot CLI has built-in web-fetch support (supportsWebFetch: true), but only 7 Copilot workflows enable it.

Where: Workflows doing research, checking external URLs, fetching documentation — e.g., blog-auditor.md, cli-version-checker.md (uses network but not explicitly web-fetch tool), daily-news.md.

How to Implement:

tools:
  web-fetch:
  github:
    toolsets: [default]

View Low Priority Opportunities

🟢 Opportunity 7: Engine Version Pinning for Stability

What: Zero production Copilot workflows pin to a specific version. Only smoke test workflows use version pinning.

Why It Matters: Critical workflows (CI, security) could break when a new Copilot CLI version ships with behavior changes.

How to Implement (for critical workflows):

engine:
  id: copilot
  version: "0.0.422"  # pin after successful testing

🟢 Opportunity 8: `concurrency:` for Long-Running Workflows

What: Only ~7 workflows define concurrency control. Long-running workflows (30-45 min) could queue up.

How to Implement:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

🟢 Opportunity 9: `engine.env` for Per-Workflow Configuration

What: The engine.env field allows passing custom environment variables to the Copilot CLI process. Almost no workflows use this for customization.

Use Cases: Debug flags, feature toggles, custom API endpoints for testing, workflow-specific configuration without modifying the prompt.

How to Implement:

engine:
  id: copilot
  env:
    MY_WORKFLOW_MODE: "strict"
    DEBUG_ANALYSIS: "true"

🟢 Opportunity 10: Third-Party MCPs Widely Available but Underused

What: 22 shared MCP configs exist in .github/workflows/shared/mcp/ (including arxiv, azure, brave, chroma, context7, datadog, deepwiki, drain3, fabric-rti, jupyter, markitdown, microsoft-docs, notion, semgrep, sentry, serena-go, server-memory, skillz, slack, svelte, and tavily), and many of them are unused in Copilot workflows.

Where: Research workflows could use arxiv, context7, deepwiki; quality workflows could use semgrep; analysis workflows could use chroma for vector search.

How to Implement (example):

imports:
  - shared/mcp/arxiv.md
  - shared/mcp/context7.md
tools:
  arxiv:
  context7:

4️⃣ Specific Workflow Recommendations

View Workflow-Specific Recommendations

`agent-performance-analyzer.md`

Current: Complex meta-orchestrator, no max-continuations
Recommended: Add max-continuations: 4, increase timeout-minutes: 60
Expected: Complete multi-agent analysis in one run vs. truncating

`grumpy-reviewer.md`

Current: Has a grumpy-reviewer.agent.md in .github/agents/ but workflow doesn't use engine.agent
Recommended: Add engine: { id: copilot, agent: grumpy-reviewer }
Expected: Consistent grumpy persona via centralized agent file

`pr-nitpick-reviewer.md`, `contribution-check.md`

Current: Event-triggered (PR events), no rate-limit
Recommended: Add rate-limit: { max: 5, window: 60 }
Expected: Prevent credit exhaustion during PR bursts

`dead-code-remover.md`, `daily-file-diet.md`

Current: Scheduled, creates PRs; dead-code-remover.md has skip-if-match while daily-file-diet.md does not
Check: Ensure all PR-creating scheduled workflows have skip-if-match

`daily-architecture-diagram.md`, `daily-compiler-quality.md`

Current: Deep analytical work, single pass
Recommended: max-continuations: 3, sandbox: { agent: awf }
Expected: Richer diagrams and quality reports
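
A combined frontmatter sketch for these two workflows might look like the following. It nests max-continuations under engine, matching the engine.max-continuations field name in the configuration table, and reuses the network allowlist from Opportunity 2. Whether that allowlist is sufficient for each workflow is an assumption that needs checking.

```yaml
engine:
  id: copilot
  max-continuations: 3   # allow up to 3 autopilot continuations
sandbox:
  agent: awf             # network firewall sandbox
network:
  allowed:
    - defaults
    - github
```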

`research.md`

Current: Research workflow, minimal config
Recommended: Add max-continuations: 5 for thorough research, web-fetch: tool, consider arxiv or context7 MCPs
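
Put together, the recommendation could be sketched as the frontmatter below. The import paths and tool names are taken from the Opportunity 10 example, and max-continuations is nested under engine per the configuration table. The rest of research.md's existing configuration is not shown here and is assumed to stay unchanged.

```yaml
imports:
  - shared/mcp/arxiv.md
  - shared/mcp/context7.md
engine:
  id: copilot
  max-continuations: 5   # thorough, multi-phase research
tools:
  web-fetch:             # built-in web fetch support
  arxiv:
  context7:
```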

5️⃣ Trends & Insights

View Historical Trends

This is the first comprehensive analysis using this research framework. Future runs (via copilot-cli-deep-research.md) will track:

Adoption rates of max-continuations, AWF sandbox, custom agents
Feature flag (copilot-requests) rollout progress
New MCP server integrations
Model distribution changes
Whether these recommendations are implemented

Baseline established: 2026-03-12 (run §23024359071)

Notable Architecture Observations:

The repo uses a sophisticated shared imports system (36 shared files, 22 MCP configs)
Cache-memory (56 workflows) and repo-memory (25 workflows) adoption is strong
GitHub MCP toolsets are well-configured with specific scoping in most workflows
strict: true is used in 59% of Copilot workflows — good security hygiene
AWF sandbox migration is in progress (comments show "migrated from network.firewall")

6️⃣ Best Practice Guidelines

Based on this research:

Enable max-continuations: 3 for analytical workflows — Any workflow doing multi-phase research, report generation, or code analysis benefits from autopilot continuation.
Use AWF sandbox for all workflows with bash: access — Add sandbox: { agent: awf } + network: { allowed: [defaults, github] } to prevent exfiltration.
Register custom agents for specialized roles — Create .github/agents/ files for recurring personas, then reference via engine.agent for consistent behavior.
Add rate-limit to all event-triggered workflows — Especially issues:, pull_request:, and slash_command: triggers. Standard: max: 5, window: 60.
Use skip-if-match for scheduled PR/issue creators — Prevents accumulation of duplicate issues/PRs from daily/weekly workflows.
Scope GitHub toolsets precisely — Use [repos] instead of [default] when only repo access is needed; reduces MCP attack surface.
Enable web-fetch: when workflows need external data — Copilot CLI supports it natively; explicitly declare it in tools:.
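
As a small illustration of the toolset-scoping guideline, a workflow that only needs repository access might declare the following. This is a sketch based on the toolset names listed under Most Common GitHub Toolsets, not a drop-in replacement for any specific workflow.

```yaml
tools:
  github:
    toolsets: [repos]   # instead of [default] when only repo access is needed
```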

7️⃣ Action Items

Immediate (this week):

Add max-continuations: 3 to agent-performance-analyzer.md, daily-compiler-quality.md, research.md
Wire engine.agent: grumpy-reviewer in grumpy-reviewer.md
Add rate-limit to pr-nitpick-reviewer.md and contribution-check.md

Short-term (this month):

Audit all event-triggered Copilot workflows for missing rate-limit
Add AWF sandbox to workflows with bash: true that handle external user content
Create workflows for unused agent files (contribution-checker, interactive-agent-designer)
Add skip-if-match to all scheduled PR-creating workflows

Long-term (this quarter):

Evaluate version pinning for critical production workflows
Expand third-party MCP usage (arxiv/context7 for research, semgrep for security)
Complete AWF sandbox migration across all Copilot workflows
Track adoption of these recommendations in future deep-research runs

View Supporting Evidence & Methodology

Research Methodology

Data sources examined:

pkg/workflow/copilot_engine.go — Engine capabilities, supported features
pkg/workflow/copilot_engine_execution.go — CLI flag generation logic
pkg/workflow/copilot_engine_tools.go — Tool argument computation
pkg/workflow/copilot_mcp.go — MCP configuration rendering
docs/src/content/docs/reference/engines.md — Feature documentation
All 167 .github/workflows/*.md files — Actual usage patterns

Analysis approach: grep-based inventory of 80 Copilot workflows cross-referenced against engine implementation to identify gaps between available capabilities and actual usage.

Confidence: High for quantitative counts (direct file analysis). Medium for qualitative recommendations (based on workflow purpose/complexity assessment).

Repo memory saved: /tmp/gh-aw/repo-memory/default/latest.json

AI generated by Copilot CLI Deep Research Agent · history

expires on Mar 13, 2026, 9:27 PM UTC
