[copilot-cli-research] Copilot CLI Deep Research - 2026-03-11 #20595
This discussion was automatically closed because it expired on 2026-03-12T21:29:37.754Z.
Analysis Date: 2026-03-11
Repository: github/gh-aw
Scope: 166 total workflows, 79 using Copilot engine (48%)
Run: §22974875231
📊 Executive Summary
This is the first comprehensive Copilot CLI deep-research analysis of the github/gh-aw repository. It compares every documented Copilot CLI feature against actual usage in 79 Copilot-powered workflows.
Key findings: The codebase shows strong adoption of core patterns (strict: true at 64%, copilot-requests at 51%, cache-memory at 44%), but several powerful features remain nearly untouched: max-continuations is used in only 1 workflow, plugins and startup-timeout have zero adoption, and the AWF security sandbox is used in only 16% of workflows. Model selection is left entirely to org-level variables with no per-workflow overrides, which misses cost-optimization opportunities.
Primary recommendation: Expand AWF sandbox adoption to high-sensitivity workflows (security scans, secret analysis, malicious code detection). Currently 84% of Copilot workflows run without network firewalling.
🔴 High Priority Issues
Security gap: Workflows like daily-malicious-code-scan.md, daily-secrets-analysis.md, and security-compliance.md handle sensitive data but do not use the AWF sandbox. AWF provides network isolation that helps prevent data exfiltration, yet only 13/79 (16%) Copilot workflows enable it.
Multi-turn potential missed: max-continuations is used in only 1 workflow (smoke-copilot.md). Complex analysis workflows like daily-compiler-quality.md, ci-coach.md, and repository-quality-improver.md could benefit significantly from multi-turn autopilot continuation.
🟡 Medium Priority Opportunities
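- Custom agents (engine.agent): 12/79 (15%) use this; more domain-specific workflows could use specialized agent personas.
- GitHub toolsets: Many workflows use toolsets: [default] when more specific toolsets would reduce the attack surface.
- Model selection: No workflow overrides the model; simple automation tasks could specify gpt-5.1-codex-mini explicitly.
1️⃣ Current State Analysis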
View Copilot CLI Capabilities Inventory
Copilot CLI Engine Features (from source code analysis)
CLI Execution Flags (auto-configured by compiler):
- --add-dir (directories the agent can access; auto-managed)
- --disable-builtin-mcps (always applied)
- --log-level all --log-dir (always applied for logging)
- --allow-tool / --allow-all-tools (generated from the tools: config)
- --allow-all-paths (applied when the edit: tool is enabled)
- --autopilot --max-autopilot-continues (applied when max-continuations > 1)
- --agent (applied when engine.agent is set)
- --prompt (always applied)
Engine Configuration Fields (frontmatter):
- engine: copilot (simple form)
- engine.id: copilot (extended form with subfields)
- engine.version (pins the CLI version, e.g., "0.0.422")
- engine.model (overrides the model, e.g., gpt-5.1-codex-mini)
- engine.command (custom executable path)
- engine.args (injects custom CLI arguments)
- engine.agent (references a .github/agents/*.agent.md file)
- engine.env (custom environment variables)
- engine.max-continuations (enables autopilot with a continuation limit)
Sandbox Options:
- sandbox.agent: awf (AWF network firewall, string shorthand)
- sandbox.agent.mounts (custom volume mounts into the AWF container)
- sandbox.agent: false (disables the sandbox explicitly)
Special Features:
- features.copilot-requests: true (uses github.token instead of COPILOT_GITHUB_TOKEN)
- strict: true (strict validation mode)
- plugins: (installs Copilot plugins before execution)
- max-continuations: (top-level shorthand for multi-turn autopilot)
Available Tools for Copilot:
- bash, edit, web-fetch, github, playwright, serena
- cache-memory, repo-memory, agentic-workflows
- mcp-scripts (requires the mcp-scripts feature flag)
Network Configuration:
- network.allowed (list of allowed domain presets or hostnames, used with AWF)
- Presets: defaults, go, node, python, github, playwright, npm
View Usage Statistics
Engine Distribution (166 total workflows)
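- copilot: 79 (48%)
- claude
- codex
Feature Usage in 79 Copilot Workflows
- strict: true (64%)
- features.copilot-requests (51%)
- cache-memory (44%)
- repo-memory
- playground/playwright
- engine.agent (12/79, 15%)
- sandbox/AWF (13/79, 16%)
- web-fetch
- serena
- max-continuations (1 workflow)
- mcp-scripts (1 workflow)
- model override (0)
- engine.args (0)
- version pinning (0)
- plugins (0)
- startup-timeout (0)
Timeout Distribution (Copilot Workflows)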
Average: ~18 minutes. Most workflows cluster around 10-30 minutes.
2️⃣ Feature Usage Matrix
Features covered: sandbox.agent: awf, max-continuations, engine.agent, engine.model, engine.version, engine.args, plugins:, startup-timeout, mcp-scripts, web-fetch, playwright, copilot-requests, strict: true, cache-memory
3️⃣ Missed Opportunities
View High Priority Opportunities
🔴 Opportunity 1: AWF Sandbox for Security-Sensitive Workflows
What: 84% of Copilot workflows (66/79) run without the AWF network firewall.
Why It Matters: Workflows handling sensitive data (security scans, secret analysis, malicious code detection) are at risk of unintended data exfiltration. AWF provides network isolation, restricting outbound connections to explicit allowlists.
Affected Workflows:
daily-malicious-code-scan.md, daily-secrets-analysis.md, security-compliance.md, code-scanning-fixer.md, bot-detection.md (uses local GitHub MCP but no AWF)
How to Implement:
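A minimal frontmatter sketch for daily-malicious-code-scan.md, assuming the sandbox.agent and network.allowed keys behave as listed in the capabilities inventory. The [defaults, github] allowlist mirrors the workflow-specific recommendation for this file; all other frontmatter keys are omitted:

```yaml
# Frontmatter excerpt (sketch): on:, permissions:, tools: and other keys omitted
engine: copilot
sandbox:
  agent: awf        # run the agent inside the AWF network firewall
network:
  allowed:          # outbound traffic limited to these presets
    - defaults
    - github
```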
Expected Benefits: Network-isolated execution that keeps prompt-injection attacks from exfiltrating data, plus an audit trail of network access.
🔴 Opportunity 2: max-continuations for Complex Analysis Workflows
What: max-continuations enables the Copilot CLI's --autopilot --max-autopilot-continues mode, allowing multi-turn agentic execution. Only smoke-copilot.md uses it.
Why It Matters: Many complex analysis workflows (code quality, architecture diagrams, performance analysis) require multiple reasoning steps. With only a single turn, the agent may give up or produce shallow analysis.
Affected Workflows:
daily-compiler-quality.md, daily-cli-performance.md, repository-quality-improver.md, ci-coach.md, dead-code-remover.md, code-simplifier.md
How to Implement:
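A sketch of the extended engine form with a continuation limit. Per the inventory, the compiler adds --autopilot --max-autopilot-continues only when the value is greater than 1, and a top-level max-continuations: key exists as shorthand. The value 3 follows the recommendation for daily-compiler-quality.md:

```yaml
# Frontmatter excerpt (sketch) for daily-compiler-quality.md
engine:
  id: copilot
  max-continuations: 3   # greater than 1, so autopilot continuation is enabled
```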
Expected Benefits: Deeper analysis (estimated at 2-5x), the ability to iterate on findings, and more complete code changes.
View Medium Priority Opportunities
🟡 Opportunity 3: Custom Agent Files for Domain-Specific Workflows
What: engine.agent references a .github/agents/*.agent.md file that provides specialized persona instructions. Currently only 12 workflows use this (notably hourly-ci-cleaner.md with agent: ci-cleaner).
Why It Matters: Custom agent files allow persistent, reusable persona definitions that can be version-controlled separately from workflow prompts. They can encode domain expertise (e.g., "senior Go engineer specializing in compiler design") without bloating individual workflow files.
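As a sketch, a workflow selects a persona through engine.agent. This assumes the name resolves to .github/agents/<name>.agent.md, as hourly-ci-cleaner.md's agent: ci-cleaner suggests; go-compiler-expert is the file name proposed for this opportunity:

```yaml
# Frontmatter excerpt (sketch); assumes .github/agents/go-compiler-expert.agent.md exists
engine:
  id: copilot
  agent: go-compiler-expert   # compiler applies --agent when engine.agent is set
```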
Affected Workflows: Any workflow with domain-specific expertise requirements:
daily-compiler-quality.md, grumpy-reviewer.md, breaking-change-checker.md, docs-noob-tester.md
How to Implement:
Create .github/agents/go-compiler-expert.agent.md with specialized instructions.
🟡 Opportunity 4: Model Selection for Cost Optimization
What: Zero Copilot workflows override the model. All rely on the org-level GH_AW_MODEL_AGENT_COPILOT variable or the default.
Why It Matters: Simple analysis tasks (counting files, basic summaries, label suggestions) don't need a powerful model. Using gpt-5.1-codex-mini for simple tasks and gpt-5 for complex ones could significantly reduce token costs.
Affected Workflows:
auto-triage-issues.md (label suggestions), draft-pr-cleanup.md (status check), sub-issue-closer.md (simple automation)
How to Implement:
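A sketch of a per-workflow override for one of the simple automation workflows listed above. It relies only on the engine.model field from the inventory; when the field is omitted, the org-level GH_AW_MODEL_AGENT_COPILOT variable (or the default) applies:

```yaml
# Frontmatter excerpt (sketch) for auto-triage-issues.md
engine:
  id: copilot
  model: gpt-5.1-codex-mini   # lighter model for label suggestions
```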
Use the full model (or omit engine.model) for complex analysis that requires deeper reasoning.
🟡 Opportunity 5: Specific GitHub Toolsets Instead of Broad Access
What: Many workflows use toolsets: [default], which grants broad GitHub API access. More specific toolsets reduce the attack surface.
Why It Matters: Principle of least privilege: workflows should access only the GitHub API surfaces they need.
Examples of better toolset selection:
- auto-triage-issues.md: only needs the issues toolset, not the full default
- copilot-pr-merged-report.md: only needs repos + pull_requests
- weekly-issue-summary.md: only needs issues
How to Implement:
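A sketch of narrowing the GitHub tool for the first example above, assuming toolsets sits under tools.github as in the toolsets: [default] form this report describes:

```yaml
# Frontmatter excerpt (sketch) for auto-triage-issues.md
tools:
  github:
    toolsets: [issues]   # replaces the broader [default]
```

For copilot-pr-merged-report.md, the same key would read toolsets: [repos, pull_requests].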
Available toolsets:
default, repos, issues, pull_requests, actions, discussions, code_security, orgs, users
🟡 Opportunity 6: startup-timeout for MCP-Heavy Workflows
What: startup-timeout configures how long gh-aw waits for MCP servers to be ready before sending the prompt. Currently 0 workflows use it.
Why It Matters: Workflows using multiple MCP servers (playwright + github + serena + safeoutputs) may experience startup delays. Without a timeout, the workflow silently proceeds if servers aren't ready.
Affected Workflows: Any workflow with 3+ MCP servers:
smoke-copilot.md, brave.md, research.md
How to Implement:
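A sketch following the tools: startup-timeout: 30 form used in this report's best-practice guidelines. The unit is assumed to be seconds, and the workflow's existing MCP tool entries stay unchanged:

```yaml
# Frontmatter excerpt (sketch) for an MCP-heavy workflow such as smoke-copilot.md
tools:
  startup-timeout: 30   # wait for MCP servers before sending the prompt (assumed seconds)
  # existing github, playwright, serena, etc. entries stay as they are
```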
View Low Priority Opportunities
🟢 Opportunity 7: Version Pinning for Reproducible Builds
What: No Copilot workflows pin the engine version. All use latest.
Why It Matters: Copilot CLI updates can introduce breaking changes. Pinning enables reproducible builds and controlled upgrades.
How to Implement:
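A sketch of pinning, reusing the example version string from the capabilities inventory (not a recommendation of that particular release). Quoting keeps the value an explicit string:

```yaml
# Frontmatter excerpt (sketch) for a production workflow
engine:
  id: copilot
  version: "0.0.422"   # pinned CLI version; bump deliberately after testing
```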
Best practice: Pin the version in production workflows, and use latest in development and smoke tests.
🟢 Opportunity 8: mcp-scripts for GitHub CLI Automation
What: mcp-scripts exposes GitHub CLI (gh) commands as MCP tools. Only 1 Copilot workflow uses it, despite it being very powerful for GitHub automation.
Why It Matters: Many Copilot workflows manually instruct the agent to run gh commands via bash. Using mcp-scripts with specific gh command allowlists is cleaner and safer.
Affected Workflows: Workflows that use bash: ["*"] just for gh commands: contribution-check.md, grumpy-reviewer.md
How to Implement:
🟢 Opportunity 9: Custom engine.args for Diagnostic Flags
What: engine.args allows injecting custom CLI arguments before the prompt. No Copilot workflows use it.
Why It Matters: Advanced diagnostics such as custom log levels, debug flags, or pre-prompt setup could benefit from custom args.
How to Implement:
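A sketch of the shape only, assuming engine.args takes a list of strings that is injected before the prompt. --add-dir is a real Copilot CLI flag that the compiler already manages; the directory path here is hypothetical:

```yaml
# Frontmatter excerpt (sketch); /tmp/extra-context is a hypothetical path
engine:
  id: copilot
  args: ["--add-dir", "/tmp/extra-context"]   # one extra directory the agent may access
```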
Use with caution: most production configurations don't need custom args, because the compiler handles them automatically.
4️⃣ Workflow-Specific Recommendations
View Workflow-Specific Recommendations
- daily-malicious-code-scan.md: sandbox.agent: awf + network.allowed: [defaults, github]
- daily-compiler-quality.md: max-continuations: 3 to enable multi-step analysis
- repository-quality-improver.md: max-continuations: 2; consider engine.agent: code-quality-expert
- hourly-ci-cleaner.md (Best Practice Example ⭐): engine.id: copilot, engine.agent: ci-cleaner, custom sandbox mounts
- dev.md: github: tools configured, uses copilot-requests: true; github: toolsets: [repos, issues] for direct API access
- smoke-copilot.md (Best Practice Example ⭐): max-continuations: 2 and extended engine syntax
5️⃣ Trends & Insights
View Historical Trends
This is the first analysis run — no historical data available for trend comparison.
Baseline established for future tracking:
Expected improvements to track in future runs:
6️⃣ Best Practice Guidelines
Based on this research, here are recommended patterns:
- Security: Use sandbox.agent: awf for any workflow that accesses sensitive code, credentials, or external data, and pair it with explicit network.allowed lists.
- Multi-turn Analysis: For complex tasks (code quality, refactoring, analysis), set max-continuations to a value from 2 to 5 to enable iterative autopilot execution.
- Principle of Least Privilege: Specify exact GitHub toolsets (issues, pull_requests) rather than the broad default, and use read-only: true for read-only workflows.
- Custom Agent Personas: Create .github/agents/*.agent.md files for domain-specific workflows (compiler expert, security analyst, documentation writer).
- Cost Optimization: Use engine.model: gpt-5.1-codex-mini for simple automation tasks, and reserve full models for complex reasoning tasks.
- Graceful Degradation: Add tools: startup-timeout: 30 when using 3+ MCP servers so that slow startups are handled gracefully.
7️⃣ Action Items
Immediate Actions (this week):
- Add sandbox.agent: awf to daily-malicious-code-scan.md and daily-secrets-analysis.md
- Add max-continuations: 3 to daily-compiler-quality.md and repository-quality-improver.md
Short-term (this month):
- Create engine.agent files for common personas (code-reviewer, documentation-writer)
- Use gpt-5.1-codex-mini in simple automation workflows for cost savings
Long-term (this quarter):
- Evaluate the plugins: capability for specialized tool integration
View Supporting Evidence & Methodology
Research Methodology
Phase 1 (Feature Discovery): Analyzed pkg/workflow/copilot_engine*.go, copilot_mcp.go, and copilot_engine_execution.go to identify all available CLI flags and configuration options.
Phase 2 (Usage Analysis): Scanned all 166 .github/workflows/*.md files using grep/pattern matching to identify feature usage. Files analyzed: copilot_engine.go, copilot_engine_execution.go, copilot_engine_tools.go, copilot_mcp.go, and the documentation in docs/src/content/docs/reference/engines.md.
Phase 3 (Gap Analysis): Cross-referenced available features against actual usage to identify missed opportunities, ranked by security/performance impact.
Data Sources:
- pkg/workflow/copilot_engine*.go (CLI flags, engine options)
- docs/src/content/docs/reference/engines.md (official feature docs)
- github/gh-aw/.github/workflows/*.md (166 workflow markdown files)
Notable Workflows Examined in Detail
- smoke-copilot.md: the only workflow with max-continuations + extended engine syntax
- hourly-ci-cleaner.md: the most advanced configuration (engine.agent + custom mounts)
- daily-compiler-quality.md (good pattern): serena + cache-memory + strict + copilot-requests
- archie.md (good pattern): engine.agent + specific toolsets + message customization