189 changes: 189 additions & 0 deletions agents/undertaker/README.md
@@ -0,0 +1,189 @@
# The Undertaker - Dead Code Detection Agent

Find unused functions, classes, variables, imports, and unreachable code with confidence scoring across your repository.

## Overview

The Undertaker is a dead code detection agent that uses static analysis to identify unused code elements. It assigns each finding a rule-based confidence score so that you can prioritize what to remove and limit false positives.

## Features

- **Comprehensive Detection**: Identifies unused functions, classes, methods, variables, imports, types, enums, and unreachable code
- **Multi-Language Support**: Analyzes multiple programming languages
- **Confidence Scoring**: Deterministic scoring (50-100%) based on reference counts and export status
- **Safe Analysis**: Read-only static analysis that never modifies your code
- **Unreachable Code Detection**: Finds code after return/throw/break statements
- **Export-Aware**: Distinguishes between private and exported/public code
- **Detailed Reporting**: JSON output with actionable findings and reasoning

## Quick Start

### Basic Usage

```bash
# Run default analysis with 70% confidence threshold
qodo undertaker

# Use custom confidence threshold
qodo undertaker --min_confidence=80

# Include test files in analysis
qodo undertaker --include_tests=true

# Use both options together
qodo undertaker --min_confidence=85 --include_tests=true
```

## Configuration

The agent accepts the following parameters:

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `min_confidence` | number | 70 | Minimum confidence threshold (50-100). Only results meeting or exceeding this threshold are included. |
| `include_tests` | boolean | false | Whether to include test files in the dead code analysis. |

## How It Works

### Analysis Process

1. **Project Discovery**: Scans source files while excluding generated and vendor directories
2. **Definition Detection**: Identifies function/class/variable/import definitions using language-specific patterns
3. **Reference Counting**: Counts actual usage of each identifier across the codebase (excluding its definition)
4. **Export Analysis**: Determines whether code is exported/public, which affects confidence scoring
5. **Unreachable Code Detection**: Finds code after terminating statements (return/throw/break)
6. **Confidence Scoring**: Applies deterministic rules to generate confidence scores
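
The reference-counting step (step 3) can be illustrated with a minimal Python sketch. This is not the agent's implementation, which relies on ripgrep and language-specific patterns. The helper names `strip_line_comments` and `count_references` are hypothetical, and the comment stripping is deliberately naive: it does not understand string literals or block comments.

```python
from __future__ import annotations

import re


def strip_line_comments(source: str) -> str:
    """Drop `//` and `#` line comments (naive: ignores string literals)."""
    return re.sub(r"(//|#).*", "", source)


def count_references(
    identifier: str,
    files: dict[str, str],
    definition: tuple[str, int],
) -> int:
    """Count whole-word occurrences of `identifier`, skipping its definition line.

    `files` maps a path to its source text; `definition` is (path, 1-based line).
    """
    pattern = re.compile(rf"\b{re.escape(identifier)}\b")
    total = 0
    for path, source in files.items():
        lines = strip_line_comments(source).splitlines()
        for lineno, line in enumerate(lines, start=1):
            if (path, lineno) == definition:
                continue  # the definition itself is not a usage
            total += len(pattern.findall(line))
    return total


# Hypothetical two-file project used only for illustration.
files = {
    "src/utils.ts": (
        "export function used() {}\n"
        "function unused() {}\n"
        "// unused is mentioned here\n"
    ),
    "src/main.ts": "import { used } from './utils';\nused();\n",
}
```

The word-boundary pattern keeps `used` from matching inside `unused`, and the mention of `unused` in a comment is discarded before counting.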

### Confidence Scoring

The agent uses the following rules to calculate confidence scores:

| Condition | Confidence | Tier |
|-----------|-----------|------|
| No references + not exported | 100% | Very High |
| No references + exported | 90% | Very High |
| 1 reference + not exported | 75% | High |
| 1 reference + exported | 70% | High |
| 2+ references | 50-60% | Medium |
| Unreachable code | 100% | Very High |
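
The table can be read as a small decision function. The sketch below is one way to express it, assuming a hypothetical function name `confidence_score`. The table gives only an upper bound of 60% for items with two or more references, so returning exactly 60 in that case is an assumption.

```python
def confidence_score(
    reference_count: int,
    is_exported: bool,
    unreachable: bool = False,
) -> int:
    """Map the scoring table to a percentage."""
    if unreachable:
        return 100  # unreachable code is always scored 100%
    if reference_count == 0:
        return 90 if is_exported else 100
    if reference_count == 1:
        return 70 if is_exported else 75
    # Assumption: the table only says "up to 60%" here; this sketch uses 60.
    return 60
```

Because each branch depends only on the reference count and export status, the same inputs always yield the same score.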

## Output Format

The agent writes its results to `dead_code_analysis.json` and returns the same JSON object on standard output. The object has the following structure:

```json
{
  "summary": {
    "total_files_scanned": 42,
    "total_dead_code_items": 5,
    "confidence_counts": {
      "very_high": 3,
      "high": 1,
      "medium": 1
    },
    "estimated_lines_removable": 127
  },
  "dead_code_items": [
    {
      "identifier": "unusedFunction",
      "type": "function",
      "location": "src/utils.ts:42-55",
      "confidence_score": 100,
      "reference_count": 0,
      "is_exported": false,
      "reasoning": "Function is not referenced anywhere in the codebase and is not exported"
    }
  ],
  "warnings": [],
  "success": true
}
```
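
A consumer of this report may want to check that the tier counts in `summary` agree with the individual scores. The following Python sketch shows one way to do that, assuming the tier boundaries 90-100, 70-89, and 50-69; the function names `tier_for`, `recount_tiers`, and `summary_is_consistent` are hypothetical.

```python
from collections import Counter

TIERS = ("very_high", "high", "medium")


def tier_for(score: float) -> str:
    """Map a confidence score to its tier name."""
    if score >= 90:
        return "very_high"
    if score >= 70:
        return "high"
    return "medium"


def recount_tiers(report: dict) -> dict:
    """Recompute confidence_counts from the listed dead_code_items."""
    counts = Counter(
        tier_for(item["confidence_score"]) for item in report["dead_code_items"]
    )
    return {tier: counts.get(tier, 0) for tier in TIERS}


def summary_is_consistent(report: dict) -> bool:
    """True when the summary's tier counts match the item scores."""
    return recount_tiers(report) == report["summary"]["confidence_counts"]
```

The sample report above is abbreviated (its summary counts five items but only one is listed), so it would not pass this check as printed.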

## Interpreting Results

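- **Very High Confidence (90-100%)**: Strong candidates for removal. No references were found: the item is private (100%), exported but unused within the repository (90%), or unreachable (100%). For exported items, confirm that no external consumer depends on them.
- **High Confidence (70-89%)**: Likely removable. Exactly one reference was found besides the definition (75% if private, 70% if exported). Check whether that reference is itself dead or a false match.
- **Medium Confidence (50-69%)**: Exercise caution. Two or more references were found, so the item may still be live. Review before removing.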

## Use Cases

### Clean Up Your Codebase
Remove unused code that accumulates over time as features are refactored or deprecated.

### Pre-Refactoring Analysis
Identify what can be safely removed before major refactoring efforts.

### Code Review
Use in your CI/CD pipeline to flag potential dead code during code reviews.

### Dependency Reduction
Identify unused exports that can be made private or removed entirely.

## Tools Used

- **Git**: Version control operations
- **Filesystem**: Directory and file traversal
- **Ripgrep**: Efficient pattern matching and searching

## Error Handling

The agent handles errors gracefully:
- If tools fail, analysis continues with warnings
- Falls back to filesystem reading if pattern matching fails
- Returns `success: true` with warnings rather than failing entirely
- Handles cross-platform compatibility issues automatically

## Technical Details

The agent uses ripgrep for efficient cross-repository searching with language-specific patterns. It filters out comments and strings to minimize false positives and deduplicates results for accuracy.

## Limitations

- Static analysis only - cannot detect runtime dead code
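- May produce false positives when code is referenced only dynamically (for example, through reflection or string-based lookup), because a static search cannot see such references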
- External library references may be missed if not directly imported in source files
- Consider running multiple times with different `min_confidence` values for comprehensive analysis
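
The dynamic-reference limitation is easiest to see with a concrete case. In the Python sketch below, `handle_create` is called at runtime through a name built from a string, yet a textual search finds no reference to it. All names here are hypothetical and exist only for illustration.

```python
import re

# A small module kept as text so it can be both searched and executed.
SOURCE = '''
def handle_create(payload):
    return f"created {payload}"

def handle_delete(payload):
    return f"deleted {payload}"

def dispatch(action, payload):
    handler = globals()[f"handle_{action}"]  # name is assembled at runtime
    return handler(payload)
'''


def static_reference_count(identifier: str, source: str) -> int:
    """Whole-word occurrences minus the single `def` that defines the name."""
    return len(re.findall(rf"\b{re.escape(identifier)}\b", source)) - 1


# Execute the module text so dispatch() can be called for real.
namespace: dict = {}
exec(SOURCE, namespace)

static_count = static_reference_count("handle_create", SOURCE)
runtime_result = namespace["dispatch"]("create", "order-1")
```

Here `static_count` is zero while `runtime_result` shows the function is live, so a finding like this should be reviewed by hand even when its confidence score is high.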

## Examples

### Example 1: Basic Analysis

```bash
qodo undertaker
```

Scans the repository and returns all dead code with confidence >= 70%.

### Example 2: High Confidence Only

```bash
qodo undertaker --min_confidence=90
```

Returns only the most reliable dead code detections (90-100% confidence).

### Example 3: Including Tests

```bash
qodo undertaker --include_tests=true
```

Includes test files in the analysis, which is useful for identifying unused test helpers or fixtures.

## Integration

The JSON output can be easily integrated into:
- CI/CD pipelines for automated reporting
- Code review tools
- Custom analysis scripts
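
As one possible integration, a pipeline step could read the report and fail the build when high-scoring items are present. The Python sketch below assumes the report has been written to `dead_code_analysis.json` in the working directory; the function names and the default threshold of 90 are hypothetical choices, not part of the agent.

```python
import json
from pathlib import Path


def items_at_or_above(report: dict, threshold: float) -> list:
    """Return the dead code items whose score meets the threshold."""
    return [
        item
        for item in report.get("dead_code_items", [])
        if item["confidence_score"] >= threshold
    ]


def gate(report: dict, threshold: float = 90) -> int:
    """Return an exit code: 1 if the analysis failed or items were flagged."""
    if not report.get("success", False):
        return 1
    flagged = items_at_or_above(report, threshold)
    for item in flagged:
        print(f'{item["location"]}: {item["identifier"]} ({item["confidence_score"]}%)')
    return 1 if flagged else 0


def main(path: str = "dead_code_analysis.json") -> int:
    """Load the report from disk and apply the gate."""
    report = json.loads(Path(path).read_text(encoding="utf-8"))
    return gate(report)
```

A CI job could then call `sys.exit(main())` after running the agent, so the step fails only when items at or above the chosen threshold appear.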

## Best Practices

1. Start with the default `min_confidence=70` threshold
2. Review "Very High" confidence items first as candidates for removal
3. Use version control to safely remove dead code in isolated commits
4. Run the agent periodically to maintain code quality

## Support

For issues or questions about the Undertaker agent, please refer to the project documentation or create an issue in the repository.
164 changes: 164 additions & 0 deletions agents/undertaker/agent.toml
@@ -0,0 +1,164 @@
# The Undertaker - Dead Code Detection Agent
version = "1.0"

[commands.undertaker]
description = "Reliable dead code detection agent that identifies unused functions, classes, variables, imports, and unreachable code with confidence scoring"

instructions = """
You are a dead code detection agent. Find unused code elements in the repository using static analysis.

CORE MISSION:
Identify unused functions, classes, variables, imports, types, enums, methods, and unreachable code.
Provide confidence scoring based on reference counts and export status.
Be reliable and dependable within reasonable limits.
Output clean JSON results for integration.

ANALYSIS PROCESS:
1. Project Discovery: Scan ALL source files throughout the entire repository, excluding only generated/vendor folders. Do not limit analysis to specific routes, directories, or file patterns. Use the filesystem tool for comprehensive directory traversal starting from the project root and identify all files. Analyze all source directories and subdirectories when available for complete coverage.
2. Definition Detection: Find function/class/variable/import definitions using language-specific patterns. Use ripgrep for efficient pattern matching across all source files in the codebase.
3. Reference Counting: For each identifier, count actual usage across the codebase (excluding its definition). Use ripgrep with precise patterns to count references and exclude false positives.
4. Export Analysis: Detect if code is exported/public, which affects confidence scoring.
5. Unreachable Code: Find code after return/throw/break statements in the same block.
6. Confidence Scoring: Apply deterministic confidence rules based on usage patterns.

CONFIDENCE RULES:
No references + not exported = 100% confidence (very_high: 90-100).
No references + exported = 90% confidence (very_high: 90-100).
1 reference + not exported = 75% confidence (high: 70-89).
1 reference + exported = 70% confidence (high: 70-89).
2+ references = 60% or lower (medium: 50-69).
Unreachable code = 100% confidence (very_high: 90-100).

IMPORTANT: When calculating summary counts, count each item under the confidence tier that matches its confidence_score:
very_high: scores 90-100
high: scores 70-89
medium: scores 50-69

TECHNICAL REQUIREMENTS:
Use ripgrep for efficient cross-repository searching.
Filter out comments, strings, and false positives.
Handle multiple programming languages with appropriate patterns.
Deduplicate results and merge when appropriate.
Only include items with confidence >= min_confidence threshold.

OUTPUT REQUIREMENTS:
First, you MUST use the filesystem tool to write the final JSON output to a file named 'dead_code_analysis.json'.
Second, you MUST return the same valid JSON matching the output_schema to standard output.
Include summary statistics and detailed findings.
After writing the file and returning the JSON, stop all operations. Do not perform any additional analysis or processing.

ERROR HANDLING:
If tools fail, add warnings but continue analysis.
Fall back to filesystem reading if ripgrep patterns fail.
Mark success=true with warnings rather than failing entirely.
Handle cross-platform compatibility issues gracefully.

FOCUS ON RELIABILITY:
Prioritize working correctly over handling every edge case.
Use simple, proven patterns over complex regex.
Provide useful results even with partial data.
Be dependable for common dead code scenarios.
"""

# Arguments for customizing the analysis
arguments = [
  { name = "min_confidence", type = "number", required = false, default = 70, description = "Minimum confidence threshold (50-100)" },
  { name = "include_tests", type = "boolean", required = false, default = false, description = "Whether to include test files in analysis" }
]

# Tools the agent can use
tools = ["git", "filesystem", "ripgrep"]

# Use plan strategy for multi-step analysis
execution_strategy = "plan"

# Simplified but comprehensive output schema
output_schema = """
{
  "type": "object",
  "required": ["summary", "dead_code_items", "success"],
  "properties": {
    "summary": {
      "type": "object",
      "description": "Summary statistics of the dead code analysis.",
      "required": ["total_files_scanned", "total_dead_code_items", "confidence_counts", "estimated_lines_removable"],
      "properties": {
        "total_files_scanned": {
          "type": "number",
          "description": "Number of source files analyzed."
        },
        "total_dead_code_items": {
          "type": "number",
          "description": "Total dead code items found."
        },
        "confidence_counts": {
          "type": "object",
          "description": "Breakdown of dead code items by confidence tier.",
          "properties": {
            "very_high": { "type": "number" },
            "high": { "type": "number" },
            "medium": { "type": "number" }
          }
        },
        "estimated_lines_removable": {
          "type": "number",
          "description": "Estimated lines that can be safely removed."
        }
      }
    },
    "dead_code_items": {
      "type": "array",
      "description": "List of dead code items found, sorted by confidence.",
      "items": {
        "type": "object",
        "required": ["identifier", "type", "location", "confidence_score", "reference_count", "is_exported", "reasoning"],
        "properties": {
          "identifier": {
            "type": "string",
            "description": "Name of the code element."
          },
          "type": {
            "type": "string",
            "enum": ["function", "class", "method", "variable", "interface", "type", "enum", "import", "unreachable_code", "file"],
            "description": "Type of code element."
          },
          "location": {
            "type": "string",
            "description": "File path and line range, e.g., 'src/helpers.ts:42-55' or 'lib/api.py:10-18'."
          },
          "confidence_score": {
            "type": "number",
            "minimum": 50,
            "maximum": 100,
            "description": "Confidence this is dead code (50-100)."
          },
          "reference_count": {
            "type": "number",
            "description": "Number of times referenced in the codebase (excluding definition)."
          },
          "is_exported": {
            "type": "boolean",
            "description": "Whether the code is exported/public."
          },
          "reasoning": {
            "type": "string",
            "description": "Explanation for why this is considered dead code."
          }
        }
      }
    },
    "warnings": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Warnings encountered during analysis."
    },
    "success": {
      "type": "boolean",
      "description": "Whether the analysis completed successfully."
    }
  }
}
"""

# Success condition for CI/CD integration
exit_expression = "success"