@anuj123upadhyay anuj123upadhyay commented Oct 17, 2025

User description

CodeGraph Navigator Agent for Complex Codebase Analysis

User Description / Summary

Introducing the CodeGraph Navigator, a Microservice Dependency Intelligence Agent designed for complex enterprise codebases.

This agent automates high-effort workflows across multiple repositories and services, providing developers with detailed insights into service dependencies, criticality, and potential impact of changes.


Description of Changes

  • Added the CodeGraph Navigator agent with comprehensive repository scanning and knowledge graph construction.
  • Features include:
    • Multi-language repository support: Go, JavaScript/TypeScript, Python, Java, C#
    • Knowledge graph construction across microservices
    • Criticality assessment with risk levels
    • Impact analysis to predict downstream effects before changes
    • Natural language querying for dependency questions
    • Visual risk matrix and detailed reporting
    • Incremental learning as repositories evolve
  • Updated documentation with setup instructions, CLI examples, and usage scenarios.
  • Defined output schemas for analyze_codebase, impact_analysis, and criticality_matrix commands.
  • Added system requirements and installation instructions for Python, Node.js, and Qodo CLI tools.

Why This Change Is Needed

Managing large-scale microservice architectures is error-prone and time-consuming.
This agent helps developers:

  • Understand dependencies between services quickly
  • Assess risks before implementing changes
  • Plan deployments effectively and safely
  • Automate repetitive analysis tasks across repositories

Testing Performed

  • Ran the agent locally on a sample multi-service repository.
  • Validated knowledge graph generation, dependency analysis, and criticality scoring.
  • Tested CLI commands for analyze_codebase, impact_analysis, and criticality_matrix.
  • Verified reports and output schema accuracy.

Additional Notes

  • Future improvements could include:
    • Integration with CI/CD pipelines for automated impact assessments
    • Visual dashboards for cross-repo dependency maps
    • Enhanced language/framework support


PR Type

Enhancement


Description

  • Introduces CodeGraph Navigator agent for microservice dependency intelligence

  • Implements multi-language repository scanning (Go, JavaScript/TypeScript, Python, Java, C#)

  • Builds knowledge graph with criticality assessment and impact analysis

  • Provides natural language querying for dependency relationships

  • Includes comprehensive documentation and CLI examples


Diagram Walkthrough

flowchart LR
  A["Repository Scanner<br/>scanner.py"] -->|JSON output| B["Knowledge Graph Builder<br/>graph_builder.py"]
  B -->|Updates| C["Knowledge Graph<br/>knowledge_graph.json"]
  C -->|Queries| D["Query Engine<br/>query_engine.py"]
  D -->|Analyzes| E["Criticality Assessment<br/>Impact Analysis"]
  E -->|Generates| F["Reports & Recommendations"]
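
The flow in the diagram can be sketched in memory as follows. This is a minimal illustration rather than the PR's code: the node and edge shapes follow the graph_builder.py and query_engine.py excerpts quoted in the review comments, while `update_graph_in_memory` and the sample scan data are hypothetical.

```python
def update_graph_in_memory(graph, scan_results):
    """Merge one scanner result into the graph dict (hypothetical helper)."""
    nodes = graph.setdefault("nodes", [])
    edges = graph.setdefault("edges", [])
    known_ids = {node["id"] for node in nodes}
    known_edges = {(edge["source"], edge["target"]) for edge in edges}

    repo_name = scan_results["name"]
    if repo_name not in known_ids:
        nodes.append({"id": repo_name, "type": "service",
                      "language": scan_results["language"]})
        known_ids.add(repo_name)

    for imp in scan_results.get("imports", []):
        if imp not in known_ids:
            # Mirrors the reviewed code, which types every new target as a library
            nodes.append({"id": imp, "type": "library"})
            known_ids.add(imp)
        if (repo_name, imp) not in known_edges:
            edges.append({"source": repo_name, "target": imp, "type": "imports"})
            known_edges.add((repo_name, imp))
    return graph


def find_dependents(target_service, graph):
    """Return the services that have an edge pointing at target_service."""
    return [e["source"] for e in graph["edges"] if e["target"] == target_service]


# Invented scanner output, including a duplicate import to show deduplication
scan = {"name": "checkoutservice", "language": "Go",
        "imports": ["paymentservice", "paymentservice", "cartservice"]}
graph = update_graph_in_memory({"nodes": [], "edges": []}, scan)
print(find_dependents("paymentservice", graph))  # prints ['checkoutservice']
```

Keeping the scan result, the graph, and the query as plain dicts makes each stage easy to test without touching data/knowledge_graph.json.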

File Walkthrough

Relevant files

Enhancement (3 files)
  • scanner.py (+66/-0): Repository scanner for multi-language dependency extraction
  • graph_builder.py (+63/-0): Knowledge graph construction and node/edge management
  • query_engine.py (+56/-0): Natural language query processor for dependency analysis

Configuration changes (4 files)
  • agent.toml (+1058/-0): Agent configuration with comprehensive command definitions
  • knowledge_graph.json (+333/-0): Initial knowledge graph with sample microservices data
  • graph-builder.tool.json (+21/-0): Tool configuration for graph builder integration
  • query-engine.tool.json (+20/-0): Tool configuration for query engine integration

Documentation (2 files)
  • README.md (+461/-0): Complete documentation with usage examples and architecture
  • analysis_report_2025-10-17T00-00-00Z.txt (+75/-0): Sample analysis report demonstrating output format

…e in complex codebases (#QodoAgentChallenge)
@qodo-merge-for-open-source
Contributor

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Concurrent file write

Description: The code opens and truncates the global knowledge graph file at
'data/knowledge_graph.json' without any file-locking or concurrency control, which can
corrupt the graph if multiple processes update it concurrently.
graph_builder.py [13-51]

Referred Code
with open(GRAPH_PATH, 'r+') as f:
    graph = json.load(f)
    nodes = graph.get('nodes', [])
    edges = graph.get('edges', [])

    # --- Node Management ---
    repo_name = scan_results['name']
    node_exists = any(node['id'] == repo_name for node in nodes)

    if not node_exists:
        nodes.append({
            "id": repo_name,
            "type": "service",
            "language": scan_results['language']
        })

    # --- Edge Management ---
    existing_edges = {(edge['source'], edge['target']) for edge in edges}

    for imp in scan_results.get('imports', []):
        target_node_exists = any(node['id'] == imp for node in nodes)


 ... (clipped 18 lines)
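
One way to address the concurrency concern is to serialize writers with an exclusive lock and swap the file in atomically. The sketch below is an assumption-laden illustration, not the PR's implementation: `locked_graph` and the sidecar `.lock` file are hypothetical, and `fcntl` is only available on POSIX systems.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def locked_graph(graph_path):
    """Yield the graph dict under an exclusive lock, then write it back atomically."""
    lock_path = graph_path + ".lock"  # hypothetical sidecar lock file
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other writers finish
        try:
            with open(graph_path, "r") as f:
                graph = json.load(f)
            yield graph  # caller mutates the dict in place
            # Write to a temp file in the same directory, then rename over the original
            directory = os.path.dirname(os.path.abspath(graph_path))
            fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
            try:
                with os.fdopen(fd, "w") as tmp:
                    json.dump(graph, tmp, indent=2)
                os.replace(tmp_path, graph_path)
            except BaseException:
                os.unlink(tmp_path)  # do not leave a partial temp file behind
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "knowledge_graph.json")
    with open(path, "w") as f:
        json.dump({"nodes": [], "edges": []}, f)
    with locked_graph(path) as graph:
        graph["nodes"].append({"id": "paymentservice", "type": "service"})
    with open(path) as f:
        print(json.load(f)["nodes"])
```

Because the read, the mutation, and the write all happen while the lock is held, two processes cannot overwrite each other's updates, and a crash mid-write leaves the previous file intact. If the body of the `with` block raises, the graph file is not rewritten.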
Unvalidated input usage

Description: The query engine trusts the contents of 'data/knowledge_graph.json' without validation and
prints interpolated values, enabling potential log/message injection if the graph is
attacker-controlled.
query_engine.py [24-26]

Referred Code
with open(GRAPH_PATH, 'r') as f:
    graph = json.load(f)
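
A possible mitigation is to check the graph's shape after loading and to neutralize control characters before printing any value taken from it. The following is a minimal sketch under the assumption that nodes carry a string `id` and edges carry string `source` and `target` fields, as in the quoted excerpts; `validate_graph` and `sanitize` are hypothetical names.

```python
import json


def sanitize(value, max_len=200):
    """Replace non-printable characters (newlines, escape codes) and cap the length."""
    text = str(value)[:max_len]
    return "".join(ch if ch.isprintable() else "?" for ch in text)


def validate_graph(graph):
    """Raise ValueError unless the graph has the expected nodes/edges shape."""
    if not isinstance(graph, dict):
        raise ValueError("graph must be a JSON object")
    nodes = graph.get("nodes", [])
    edges = graph.get("edges", [])
    if not isinstance(nodes, list) or not isinstance(edges, list):
        raise ValueError("'nodes' and 'edges' must be lists")
    for node in nodes:
        if not isinstance(node, dict) or not isinstance(node.get("id"), str):
            raise ValueError("every node needs a string 'id'")
    for edge in edges:
        if not isinstance(edge, dict):
            raise ValueError("every edge must be an object")
        for key in ("source", "target"):
            if not isinstance(edge.get(key), str):
                raise ValueError(f"every edge needs a string '{key}'")
    return graph


# A node id containing an embedded newline, as an attacker-controlled file might
raw = '{"nodes": [{"id": "pay\\nservice"}], "edges": []}'
graph = validate_graph(json.loads(raw))
print(sanitize(graph["nodes"][0]["id"]))  # prints pay?service
```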
Unrestricted path scan

Description: The repository scanner reads arbitrary files under a provided path and prints JSON to
stdout without path allowlisting or sandboxing, which can be risky if paths are
user-supplied; at minimum, path validation or restrictions should be added.
scanner.py [31-45]

Referred Code
for file in files:
    if file.endswith('.go'):
        file_path = os.path.join(root, file)
        try:
            with open(file_path, 'r', errors='ignore') as f:
                content = f.read()
                # This regex finds patterns like pb.NewPaymentServiceClient(conn)
                matches = re.findall(r'pb\.New(\w+?)ServiceClient', content)
                for match in matches:
                    # Convert "Payment" to "paymentservice"
                    # THIS IS THE CORRECTED LINE:
                    service_name = match.lower() + "service" 
                    results["imports"].append(service_name)
        except Exception:
            continue
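
The path restriction mentioned above could look roughly like this. It is a sketch, not part of the PR: `resolve_within` and the notion of a single allowed root directory are assumptions about how the scanner might be deployed.

```python
import os


def resolve_within(allowed_root, repo_path):
    """Return the real path of repo_path, or raise ValueError if it escapes allowed_root."""
    root = os.path.realpath(allowed_root)
    # realpath resolves symlinks and ".." segments before the comparison
    candidate = os.path.realpath(os.path.join(root, repo_path))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError(f"Path escapes allowed root: {repo_path!r}")
    if not os.path.isdir(candidate):
        raise ValueError(f"Not a directory: {repo_path!r}")
    return candidate
```

The scanner would then call `os.walk` only on the returned path, so inputs such as `../..` or an absolute path outside the root are rejected before any file is read.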
Ticket Compliance
🎫 No ticket provided
- [ ] Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
No custom compliance provided

Follow the guide to enable custom compliance check.

Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-merge-for-open-source
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
High-level
Implement core logic in tools

The agent's core analysis logic, like criticality calculation and report
generation, is defined as instructions for an LLM in agent.toml instead of being
implemented in the Python tool scripts. This logic should be moved into the
Python tools for robustness and efficiency.

Examples:

agents/codegraph-navigator-agent/agent.toml [10-87]
instructions = """
You are CodeGraph Navigator, an intelligent software analysis agent designed to understand complex microservice architectures.

Your mission is to help developers understand dependency relationships across multiple repositories by:
1. Scanning repositories to discover their programming language and dependencies
2. Building a central knowledge graph that maps all service relationships
3. Analyzing criticality scores to assess risk of changes
4. Answering natural language questions about the codebase architecture
5. Generating comprehensive reports with impact analysis


 ... (clipped 68 lines)
agents/codegraph-navigator-agent/tools/query_engine.py [6-21]
def find_dependents(target_service, graph):
    """Finds all services that depend on the target_service."""
    dependents = []
    for edge in graph['edges']:
        if edge['target'] == target_service:
            dependents.append(edge['source'])
    return dependents

def find_dependencies(target_service, graph):
    """Finds all libraries/services that the target_service depends on."""

 ... (clipped 6 lines)

Solution Walkthrough:

Before:

# agent.toml (instructions for LLM)
"""
STEP 1: REPOSITORY SCANNING
- Execute: python3 tools/scanner.py <repository_path>

STEP 2: KNOWLEDGE GRAPH UPDATE
- Execute: python3 tools/graph_builder.py and pipe the JSON to it

STEP 3: CRITICALITY ANALYSIS
- Calculate dependent count (how many services depend on this service)
- Assign criticality level based on the CRITICALITY SCALE...

STEP 4: REPORT GENERATION
- Create: data/reports/analysis_report_<timestamp>.txt
- Include: Executive summary, Services analyzed, etc.
"""

# tools/query_engine.py
def find_dependents(target_service, graph):
    # ... simple logic to find direct dependents ...
    return dependents

After:

# agent.toml (simplified instructions for LLM)
"""
When asked to analyze a repository, run the analysis tool.
- Execute: python3 tools/analyzer.py analyze --repository_path <path>
- Display the summary and report path from the tool's output.
"""

# tools/analyzer.py (new or enhanced tool)
class Analyzer:
    def analyze_repo(self, path):
        scan_results = self.scanner.scan(path)
        self.graph_builder.update(scan_results)
        # ...
    
    def calculate_criticality(self, service_name):
        # Implements logic to count dependents and assign risk level
        # ...
        return criticality_report

    def generate_report(self, analysis_data):
        # Implements logic to format and write the full report file
        # ...
        return report_path
Suggestion importance[1-10]: 10

__

Why: This suggestion correctly identifies a critical architectural flaw where core analysis logic is defined in prompts within agent.toml instead of being implemented in the Python tools, making the agent fragile, inefficient, and hard to maintain.

High
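
To make the proposed `calculate_criticality` step concrete, here is a minimal sketch of counting dependents and mapping the count to a risk level inside a Python tool. The thresholds and level names are placeholders, since the actual CRITICALITY SCALE in agent.toml is not shown here, and `transitive_dependents` is a hypothetical addition for impact analysis.

```python
from collections import deque

# Placeholder thresholds (minimum direct dependents, level); not the PR's real scale
CRITICALITY_THRESHOLDS = [(5, "CRITICAL"), (3, "HIGH"), (1, "MEDIUM")]


def direct_dependents(service, graph):
    """Services with an edge pointing directly at `service`."""
    return sorted({e["source"] for e in graph["edges"] if e["target"] == service})


def transitive_dependents(service, graph):
    """Breadth-first walk over reversed edges: every service a change could reach."""
    reverse = {}
    for e in graph["edges"]:
        reverse.setdefault(e["target"], set()).add(e["source"])
    seen, queue = set(), deque([service])
    while queue:
        current = queue.popleft()
        for dep in reverse.get(current, ()):
            if dep not in seen and dep != service:  # guards against cycles
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


def calculate_criticality(service, graph):
    """Count direct dependents and map the count to a risk level."""
    count = len(direct_dependents(service, graph))
    level = next((name for minimum, name in CRITICALITY_THRESHOLDS
                  if count >= minimum), "LOW")
    return {"service": service, "dependent_count": count, "level": level,
            "impacted": transitive_dependents(service, graph)}


# Invented sample graph for illustration
sample = {"nodes": [], "edges": [
    {"source": "frontend", "target": "checkoutservice", "type": "imports"},
    {"source": "checkoutservice", "target": "paymentservice", "type": "imports"},
    {"source": "checkoutservice", "target": "cartservice", "type": "imports"},
]}
print(calculate_criticality("paymentservice", sample))
```

Computing this deterministically in the tool means the LLM only has to present the result, which is the robustness argument the suggestion makes.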
Possible issue
Validate repository path exists

Add validation to scan_repository to ensure the provided repo_path exists and is
a directory before the scan begins, preventing errors with invalid paths.

agents/codegraph-navigator-agent/tools/scanner.py [8-12]

 def scan_repository(repo_path: str):
     """Scans a repository to find dependencies and other info."""
+    
+    if not os.path.exists(repo_path):
+        print(json.dumps({"error": f"Path does not exist: {repo_path}"}), file=sys.stderr)
+        sys.exit(1)
+    
+    if not os.path.isdir(repo_path):
+        print(json.dumps({"error": f"Path is not a directory: {repo_path}"}), file=sys.stderr)
+        sys.exit(1)
     
     clean_path = os.path.normpath(repo_path)
     repo_name = os.path.basename(clean_path)
Suggestion importance[1-10]: 8

__

Why: The suggestion adds crucial input validation that is missing from the script but explicitly required by the agent's instructions in agent.toml, making the tool more robust and user-friendly.

Medium
Handle missing knowledge graph file

To prevent a FileNotFoundError, add a check to create data/knowledge_graph.json
with a default empty structure if it does not already exist.

agents/codegraph-navigator-agent/tools/graph_builder.py [13-51]

+import os
+
+# Ensure the file exists with a valid empty graph
+if not os.path.exists(GRAPH_PATH):
+    os.makedirs(os.path.dirname(GRAPH_PATH), exist_ok=True)
+    with open(GRAPH_PATH, 'w') as f:
+        json.dump({"nodes": [], "edges": []}, f, indent=2)
+
 with open(GRAPH_PATH, 'r+') as f:
     graph = json.load(f)
     nodes = graph.get('nodes', [])
     edges = graph.get('edges', [])
     
     # ... node and edge updates ...
     
     # Write the updated graph back to the file
     f.seek(0)
     f.truncate()
     json.dump(graph, f, indent=2)

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the script will crash if knowledge_graph.json is missing and provides a robust solution to prevent this by creating the file if needed.

Medium
Add error handling for missing file

Add error handling to query_graph to manage cases where knowledge_graph.json is
missing or contains invalid JSON, preventing the script from crashing.

agents/codegraph-navigator-agent/tools/query_engine.py [22-25]

 def query_graph(query):
     """Processes a natural language query against the knowledge graph."""
-    with open(GRAPH_PATH, 'r') as f:
-        graph = json.load(f)
+    if not os.path.exists(GRAPH_PATH):
+        print(f"Error: Knowledge graph not found at {GRAPH_PATH}. Please run a scan first.")
+        return
+    
+    try:
+        with open(GRAPH_PATH, 'r') as f:
+            graph = json.load(f)
+    except json.JSONDecodeError:
+        print("Error: Knowledge graph file is corrupted. Please rebuild the graph.")
+        return
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that the script will crash if the graph file is missing or corrupted and proposes adding necessary error handling for both scenarios.

Medium
Handle invalid JSON input

Add a try-except block around json.loads in update_graph to gracefully handle
potential JSONDecodeError from invalid input.

agents/codegraph-navigator-agent/tools/graph_builder.py [9-11]

 def update_graph(scan_results_json: str):
     """Updates the knowledge graph with new scan results."""
-    scan_results = json.loads(scan_results_json)
+    try:
+        scan_results = json.loads(scan_results_json)
+    except json.JSONDecodeError as e:
+        print(f"Error: Invalid JSON input - {e}", file=sys.stderr)
+        sys.exit(1)
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that the script will crash on malformed JSON input and proposes adding a try-except block, which is a standard and necessary improvement for robustness.

Medium
General
Correctly categorize services vs libraries

Fix the node creation logic to correctly differentiate between "service" and
"library" types for new dependencies, ensuring accurate categorization in the
knowledge graph.

agents/codegraph-navigator-agent/tools/graph_builder.py [32-42]

 for imp in scan_results.get('imports', []):
     target_node_exists = any(node['id'] == imp for node in nodes)
     if not target_node_exists:
-         nodes.append({"id": imp, "type": "library"})
+        # Check if this is a service (ends with 'service') or a library
+        node_type = "service" if imp.endswith('service') else "library"
+        nodes.append({"id": imp, "type": node_type})
 
     if (repo_name, imp) not in existing_edges:
         edges.append({
             "source": repo_name,
             "target": imp,
             "type": "imports"
         })
Suggestion importance[1-10]: 8

__

Why: This suggestion fixes a logic bug where all new dependencies are incorrectly typed as "library", leading to an inaccurate knowledge graph. The fix correctly categorizes nodes, which is critical for the agent's functionality.

Medium