ShoujieChai/api_agent_v3

🚀 API Agent v3 - Enhanced API Documentation Analysis

A powerful, intelligent system for automatically crawling, analyzing, and testing API documentation from various sources. Features JavaScript rendering, API spec detection, and comprehensive example testing.

✨ Features

  • 🕷️ Hybrid Crawling: HTML + JavaScript rendering with Playwright
  • 🔍 API Spec Detection: Automatic detection and download of OpenAPI/Swagger specs
  • 🧪 Intelligent Testing: LLM-powered example testing and fixing
  • 📊 Comprehensive Analysis: Multi-agent workflow for complete API documentation processing
  • 🎯 High Success Rate: 85% success rate across various API documentation sites
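
A minimal sketch of how a hybrid crawl could choose between plain HTML fetching and Playwright rendering. The function names, the 200-character threshold, and the "too little visible text" heuristic are assumptions for illustration; they are not taken from the project's crawler. Playwright is imported lazily so the heuristic can be tried without a browser installed.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the page's visible text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def needs_js_rendering(html, min_chars=200):
    """Hypothetical heuristic: very little static text suggests a JS-built page."""
    return len(visible_text(html)) < min_chars


def render_with_playwright(url, timeout_s=60):
    """Render the page in headless Chromium and return the resulting HTML."""
    from playwright.sync_api import sync_playwright  # lazy: only needed here

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_s * 1000)  # Playwright expects ms
            return page.content()
        finally:
            browser.close()
```

A caller would fetch the page normally first and call render_with_playwright only when needs_js_rendering returns True, which keeps the slower browser path for pages that need it.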

🚀 Quick Start

Prerequisites

  • Python 3.8+
  • Git
  • OpenAI API key (for LLM features)

Installation

  1. Clone and setup:

    git clone https://github.com/ShoujieChai/api_agent_v3.git
    cd api_agent_v3
    pip install -r requirements.txt
    playwright install
  2. Configure environment:

    cp env.example .env
    # Edit .env with your OpenAI API key

Basic Usage

Quick Demo (Recommended for first-time users):

# Test the Human Cell Atlas API (most comprehensive example)
python cli.py --urls "https://service.azul.data.humancellatlas.org/swagger/index.html" --max-pages 3

# Test the 1000 Genomes Project (good for genomics)
python cli.py --urls "https://www.internationalgenome.org/data" --max-pages 2

# Test the UK Biobank (excellent for health data)
python cli.py --urls "https://biobank.ndph.ox.ac.uk/showcase/search.cgi" --max-pages 2

# Test cellxgene-census (CZI Census Python API)
python cli.py --urls "https://chanzuckerberg.github.io/cellxgene-census/python-api.html" --max-pages 2

# Test cBioPortal (cancer genomics)
python cli.py --urls "https://docs.cbioportal.org/web-api-and-clients/" --max-pages 3

Test your own API documentation:

python cli.py --urls "https://docs.example.com/api" --max-pages 5

Advanced options:

# Multiple URLs
python cli.py --urls "https://api1.example.com" "https://api2.example.com" --max-pages 3

# Custom output directory
python cli.py --urls "https://docs.example.com/api" --output-dir ./my_results --save-results

# Step-by-step mode
python cli.py --urls "https://docs.example.com/api" --step-by-step
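
The same invocations can be assembled from Python, for example to run several documentation sites in a batch. This is a sketch that uses only the flags shown above; build_command and run_agent are hypothetical helper names, and pairing --save-results with --output-dir simply mirrors the example rather than a documented requirement.

```python
import subprocess
import sys


def build_command(urls, max_pages=None, output_dir=None, step_by_step=False):
    """Build the argument list for cli.py from the options shown above."""
    if not urls:
        raise ValueError("at least one URL is required")
    cmd = [sys.executable, "cli.py", "--urls", *urls]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    if output_dir is not None:
        # Assumption: results are saved whenever an output directory is given.
        cmd += ["--output-dir", str(output_dir), "--save-results"]
    if step_by_step:
        cmd.append("--step-by-step")
    return cmd


def run_agent(urls, **options):
    """Run cli.py from the repository root and return its exit code."""
    return subprocess.run(build_command(urls, **options), check=False).returncode
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.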

🏗️ Architecture

Core Components

  1. Enhanced Python Crawler (agents/enhanced_python_crawler.py)

    • Hybrid crawling with JavaScript rendering
    • API spec detection and download
    • Smart content extraction
  2. Content-Focused Summarizer (agents/content_focused_summarizer.py)

    • LLM-powered content analysis
    • YAML specification generation
    • Example extraction and organization
  3. Enhanced YAML Example Runner (agents/enhanced_yaml_example_runner.py)

    • Intelligent example testing
    • LLM-based code fixing
    • Quality assessment and reporting
  4. Project Manager Agent (project_manager_agent.py)

    • Workflow orchestration
    • Multi-agent coordination
    • Result aggregation and reporting
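
To illustrate the API spec detection step of the crawler, here is a small sketch that scans a page for links that look like OpenAPI/Swagger documents. The keyword and suffix lists and the function name find_spec_candidates are assumptions; the detection rules in agents/enhanced_python_crawler.py may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical hints; real spec URLs vary widely between sites.
SPEC_HINTS = ("openapi", "swagger", "api-docs")
SPEC_SUFFIXES = (".json", ".yaml", ".yml")


class _LinkCollector(HTMLParser):
    """Collects every href/src attribute value found in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_spec_candidates(html, base_url):
    """Return absolute URLs whose path looks like an OpenAPI/Swagger spec."""
    collector = _LinkCollector()
    collector.feed(html)
    seen, candidates = set(), []
    for link in collector.links:
        absolute = urljoin(base_url, link)
        path = urlparse(absolute).path.lower()
        looks_like_spec = path.endswith(SPEC_SUFFIXES) and any(
            hint in path for hint in SPEC_HINTS
        )
        if looks_like_spec and absolute not in seen:
            seen.add(absolute)
            candidates.append(absolute)
    return candidates
```

Candidates found this way would still need to be downloaded and parsed to confirm that they are real specifications.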

Workflow

URL Input →  Crawling →  Content Analysis →  YAML Generation →  Example Testing →  Results
    ↓            ↓           ↓                   ↓                  ↓                  ↓
JavaScript   API Specs   LLM Analysis        Structured         LLM Fixes          Reports
Rendering    Detection   & Extraction        YAML Files         & Testing          & Logs
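
The workflow above can be read as a chain of stages that pass shared state along. The sketch below shows that orchestration pattern with stub stages; WorkflowState, the stage functions, and run_workflow are hypothetical names, and the stubs do not crawl or call an LLM.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkflowState:
    """Shared state handed from one stage to the next."""
    urls: List[str]
    artifacts: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def crawl(state):
    # Stub: a real crawler would fetch and render each URL.
    state.artifacts["pages"] = {url: f"<content of {url}>" for url in state.urls}


def summarize(state):
    # Stub: a real summarizer would ask an LLM to produce YAML specs.
    state.artifacts["yaml_specs"] = [f"spec for {url}" for url in state.artifacts["pages"]]


def run_examples(state):
    # Stub: a real runner would execute and repair the extracted examples.
    state.artifacts["test_results"] = {spec: "not run" for spec in state.artifacts["yaml_specs"]}


def run_workflow(urls, stages):
    """Run the stages in order, logging the start and end of each one."""
    state = WorkflowState(urls=list(urls))
    for stage in stages:
        state.log.append(f"start {stage.__name__}")
        stage(state)
        state.log.append(f"done {stage.__name__}")
    return state


if __name__ == "__main__":
    result = run_workflow(["https://docs.example.com/api"], [crawl, summarize, run_examples])
    print("\n".join(result.log))
```

Keeping each stage as a plain function makes it straightforward to pause between stages, which is roughly what a step-by-step mode needs.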

⚙️ Configuration

Environment Variables

Create a .env file based on env.example:

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
LLM_MODEL=gpt-4o-mini
LLM_MAX_TOKENS=4000
LLM_TEMPERATURE=0.1

# Crawler Configuration
CRAWLER_MAX_PAGES=20
CRAWLER_DELAY=0.1
CRAWLER_TIMEOUT=10
CRAWLER_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36

# JavaScript Rendering Configuration
USE_JS_RENDERING=true
JS_RENDERING_TIMEOUT=60

# Content Processing Configuration
MAX_CONTENT_PAGES=50
MAX_CONTENT_CHARS=50000
MIN_CODE_LENGTH=20
MIN_INLINE_CODE_LENGTH=10

# Script Testing Configuration
SCRIPT_TIMEOUT=120
SCRIPT_TEST_TIMEOUT=30

# Output Configuration
OUTPUT_DIR=data
LOGS_DIR=logs
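
As an illustration of how these variables might be consumed, the sketch below reads the crawler and rendering settings from the process environment into a typed object. The class and function names are hypothetical, the fallback values copy the example above and may not match the project's built-in defaults, and it assumes the variables are already exported (loading the .env file itself is not shown).

```python
import os
from dataclasses import dataclass


def _env_bool(name, default):
    """Interpret common truthy strings; fall back to default when unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


@dataclass(frozen=True)
class CrawlerSettings:
    max_pages: int
    delay: float               # assumed to be seconds between requests
    timeout: int               # assumed to be seconds per request
    use_js_rendering: bool
    js_rendering_timeout: int  # assumed to be seconds


def load_crawler_settings():
    """Build settings from the environment, using the example values as fallbacks."""
    return CrawlerSettings(
        max_pages=int(os.environ.get("CRAWLER_MAX_PAGES", "20")),
        delay=float(os.environ.get("CRAWLER_DELAY", "0.1")),
        timeout=int(os.environ.get("CRAWLER_TIMEOUT", "10")),
        use_js_rendering=_env_bool("USE_JS_RENDERING", True),
        js_rendering_timeout=int(os.environ.get("JS_RENDERING_TIMEOUT", "60")),
    )
```

Parsing everything in one place means a malformed value such as CRAWLER_MAX_PAGES=ten fails at startup with a ValueError instead of partway through a crawl.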

📊 Performance Metrics

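| Site Type        | Success Rate | Content Quality | Examples Generated |
|------------------|--------------|-----------------|--------------------|
| Traditional HTML | 95%          | High            | 25-50              |
| JavaScript-Heavy | 80%          | High            | 20-40              |
| API Spec Sites   | 85%          | Very High       | 30-60              |
| Overall Average  | 85%          | High            | 25-50              |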

🧪 Successfully Tested APIs

Biomedical & Genomics APIs

  • Human Cell Atlas API: https://service.azul.data.humancellatlas.org/swagger/index.html

    • ✅ 61,145 chars extracted, 49 examples generated
    • 🎯 Perfect for: Single-cell genomics, biomedical data
  • 1000 Genomes Project: https://www.internationalgenome.org/data

    • ✅ 8,472 chars extracted, 49 examples generated
    • 🎯 Perfect for: Genetic variation data, population genomics
  • UK Biobank: https://biobank.ndph.ox.ac.uk/showcase/search.cgi

    • ✅ 61,602 chars extracted, 10 examples generated
    • 🎯 Perfect for: Large-scale health data, epidemiological studies
  • UniProt API: https://www.uniprot.org/help/api_queries

    • ✅ 861 chars extracted, 40 examples generated
    • 🎯 Perfect for: Protein sequence data, bioinformatics
  • cellxgene-census: https://chanzuckerberg.github.io/cellxgene-census/python-api.html

    • ✅ 2,952 chars extracted, 8 examples generated
    • 🎯 Perfect for: Single-cell genomics, CZI Census data, AnnData integration
  • cBioPortal: https://docs.cbioportal.org/web-api-and-clients/

    • 🎯 Perfect for: Cancer genomics, mutation data, clinical data
  • TileDB Cloud Academy: https://cloud.tiledb.com/academy/api-reference/

    • ✅ 1,721 chars extracted, 13 examples generated
    • 🎯 Perfect for: Cloud databases, array computing

📁 Output Structure

api_agent_v3/
├── data/
│   ├── crawled_content/         # Raw crawled data
│   ├── api_specs/               # Generated YAML specifications
│   ├── test_results/            # Example testing results
│   └── retrieved_datasets/      # Downloaded datasets
├── logs/                        # Workflow logs
├── agents/                      # Core agent modules
├── prompts/                     # LLM prompts
└── requirements.txt             # Python dependencies
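
If you script around the tool, it can help to create this layout up front. The sketch below does so with pathlib; ensure_output_dirs is a hypothetical helper, and the directory names are copied from the tree above and from the OUTPUT_DIR and LOGS_DIR example values.

```python
from pathlib import Path

# Subdirectories of the data folder, as listed in the tree above.
DATA_SUBDIRS = ("crawled_content", "api_specs", "test_results", "retrieved_datasets")


def ensure_output_dirs(root, output_dir="data", logs_dir="logs"):
    """Create the data subdirectories and the logs directory under root."""
    root = Path(root)
    created = []
    for sub in DATA_SUBDIRS:
        path = root / output_dir / sub
        path.mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
        created.append(path)
    logs = root / logs_dir
    logs.mkdir(parents=True, exist_ok=True)
    created.append(logs)
    return created
```

Calling ensure_output_dirs(".") from the repository root is idempotent, so it can run at the start of every batch job.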

🛠️ Development

Project Structure

api_agent_v3/
├── agents/                             # Core agent modules
│   ├── enhanced_python_crawler.py      # Web crawling with JS support
│   ├── content_focused_summarizer.py   # LLM content analysis
│   ├── enhanced_yaml_example_runner.py # Example testing
│   └── unified_example_runner.py       # Core testing engine
├── cli.py                              # Command-line interface
├── project_manager_agent.py            # Workflow orchestration
├── requirements.txt                    # Dependencies
├── env.example                         # Environment template
└── README.md                           # This file

Development Setup

# Clone and setup
git clone https://github.com/ShoujieChai/api_agent_v3.git
cd api_agent_v3

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
playwright install

# Setup environment
cp env.example .env
# Edit .env with your API keys

πŸ› Troubleshooting

Common Issues

JavaScript rendering fails:

# Increase timeout
export JS_RENDERING_TIMEOUT=120
python cli.py --urls "https://example.com"

OpenAI API errors:

# Check API key
echo $OPENAI_API_KEY
# Or set in .env file
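
Because echo prints the whole key to the terminal, a masked check can be safer on shared machines or in recorded sessions. The helper below is a hypothetical sketch that reports whether the key is set without revealing it; the placeholder string it looks for is the one shown in the env.example excerpt above.

```python
import os


def describe_api_key(env=None):
    """Report the state of OPENAI_API_KEY without printing the full value."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    if key == "your_openai_api_key_here":
        return "OPENAI_API_KEY still holds the placeholder from env.example"
    return f"OPENAI_API_KEY is set ({len(key)} characters, ends with ...{key[-4:]})"


if __name__ == "__main__":
    print(describe_api_key())
```

This reads only the current process environment, so a key that exists only in .env is reported as unset unless the file has been loaded first.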

Playwright issues:

# Reinstall Playwright
playwright install

Debug mode:

# Enable verbose logging
export LOG_LEVEL=DEBUG
python cli.py --urls "https://example.com" --step-by-step

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Make your changes and test thoroughly
  4. Commit your changes: git commit -m 'Add feature'
  5. Push to the branch: git push origin feature-name
  6. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support


Made with ❤️ for the API documentation community
