🚀 API Agent v3 - Enhanced API Documentation Analysis

A powerful, intelligent system for automatically crawling, analyzing, and testing API documentation from various sources. Features JavaScript rendering, API spec detection, and comprehensive example testing.

✨ Features

🕷️ Hybrid Crawling: HTML + JavaScript rendering with Playwright
🔍 API Spec Detection: Automatic detection and download of OpenAPI/Swagger specs
🧪 Intelligent Testing: LLM-powered example testing and fixing
📊 Comprehensive Analysis: Multi-agent workflow for complete API documentation processing
🎯 High Success Rate: roughly 85% average success rate across the API documentation sites tested so far (see Performance Metrics)

🚀 Quick Start

Prerequisites

Python 3.8+
Git
OpenAI API key (for LLM features)

Installation

Clone and setup:

git clone https://github.com/yourusername/api-agent-v3.git
cd api-agent-v3
pip install -r requirements.txt
playwright install

Configure environment:

cp env.example .env
# Edit .env with your OpenAI API key

Basic Usage

Quick Demo (Recommended for first-time users):

# Test the Human Cell Atlas API (most comprehensive example)
python cli.py --urls "https://service.azul.data.humancellatlas.org/swagger/index.html" --max-pages 3

# Test the 1000 Genomes Project (good for genomics)
python cli.py --urls "https://www.internationalgenome.org/data" --max-pages 2

# Test the UK Biobank (excellent for health data)
python cli.py --urls "https://biobank.ndph.ox.ac.uk/showcase/search.cgi" --max-pages 2

# Test cellxgene-census (CZI Census Python API)
python cli.py --urls "https://chanzuckerberg.github.io/cellxgene-census/python-api.html" --max-pages 2

# Test cBioPortal (cancer genomics)
python cli.py --urls "https://docs.cbioportal.org/web-api-and-clients/" --max-pages 3

Test your own API documentation:

python cli.py --urls "https://docs.example.com/api" --max-pages 5

Advanced options:

# Multiple URLs
python cli.py --urls "https://api1.example.com" "https://api2.example.com" --max-pages 3

# Custom output directory
python cli.py --urls "https://docs.example.com/api" --output-dir ./my_results --save-results

# Step-by-step mode
python cli.py --urls "https://docs.example.com/api" --step-by-step
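To run several of the demo targets in one go, a small driver script can assemble the command lines and hand them to `subprocess`. This is a sketch, not part of the project: `build_command` and `run_batch` are hypothetical helpers, and they only use the flags shown above (`--urls`, `--max-pages`, `--output-dir`, `--save-results`).

```python
import subprocess
import sys
from typing import List, Optional


def build_command(urls: List[str], max_pages: int = 3,
                  output_dir: Optional[str] = None,
                  save_results: bool = False) -> List[str]:
    """Assemble an argv list for cli.py using only the flags documented above."""
    cmd = [sys.executable, "cli.py", "--urls", *urls, "--max-pages", str(max_pages)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if save_results:
        cmd.append("--save-results")
    return cmd


def run_batch(jobs, dry_run: bool = True) -> List[List[str]]:
    """Build one cli.py command per job; execute them only when dry_run is False."""
    commands = [build_command(**job) for job in jobs]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the batch at the first run that exits non-zero
            subprocess.run(cmd, check=True)
    return commands


demo_jobs = [
    {"urls": ["https://service.azul.data.humancellatlas.org/swagger/index.html"], "max_pages": 3},
    {"urls": ["https://docs.cbioportal.org/web-api-and-clients/"], "max_pages": 3,
     "output_dir": "./my_results", "save_results": True},
]
run_batch(demo_jobs)  # dry run: prints the commands without executing them
```

Call `run_batch(demo_jobs, dry_run=False)` from the repository root to actually launch the runs one after another.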

🏗️ Architecture

Core Components

Enhanced Python Crawler (agents/enhanced_python_crawler.py)
- Hybrid crawling with JavaScript rendering
- API spec detection and download
- Smart content extraction
Content-Focused Summarizer (agents/content_focused_summarizer.py)
- LLM-powered content analysis
- YAML specification generation
- Example extraction and organization
Enhanced YAML Example Runner (agents/enhanced_yaml_example_runner.py)
- Intelligent example testing
- LLM-based code fixing
- Quality assessment and reporting
Project Manager Agent (project_manager_agent.py)
- Workflow orchestration
- Multi-agent coordination
- Result aggregation and reporting
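The crawler's spec detection is only summarized above. As a rough illustration of the idea, the sketch below scans fetched HTML for quoted links that look like OpenAPI/Swagger files, then confirms a downloaded document is a spec by its top-level `openapi` (3.x) or `swagger` (2.0) field. The URL heuristics and helper names are assumptions, not the logic in `enhanced_python_crawler.py`.

```python
import json
import re
from typing import List
from urllib.parse import urljoin

# Hypothetical heuristic: a quoted URL mentioning openapi/swagger/api-docs
# and ending in .json, .yaml or .yml (e.g. the `url:` option of Swagger UI).
SPEC_LINK = re.compile(
    r"""["']([^"'\s]*(?:openapi|swagger|api-docs)[^"'\s]*?\.(?:json|ya?ml))["']""",
    re.IGNORECASE,
)


def find_spec_links(html: str, base_url: str) -> List[str]:
    """Return absolute URLs of candidate spec files, in first-seen order."""
    found: List[str] = []
    for match in SPEC_LINK.finditer(html):
        url = urljoin(base_url, match.group(1))  # resolve relative links
        if url not in found:
            found.append(url)
    return found


def is_openapi_document(text: str) -> bool:
    """True if the JSON text has a top-level 'openapi' or 'swagger' field."""
    try:
        doc = json.loads(text)
    except ValueError:
        return False  # YAML specs would need a YAML parser such as PyYAML
    return isinstance(doc, dict) and ("openapi" in doc or "swagger" in doc)


page = '<script>SwaggerUIBundle({url: "/openapi.json", dom_id: "#swagger-ui"})</script>'
print(find_spec_links(page, "https://service.example.org/swagger/index.html"))
```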

Workflow

URL Input → Crawling → Content Analysis → YAML Generation → Example Testing → Results
    ↓           ↓           ↓              ↓              ↓           ↓
JavaScript   API Specs   LLM Analysis   Structured    LLM Fixes   Reports
Rendering    Detection   & Extraction   YAML Files    & Testing   & Logs
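The stages in this diagram run in sequence, each consuming the previous stage's output. A minimal sketch of that orchestration pattern follows; the stage functions are hypothetical stand-ins for the real agents, and the actual `project_manager_agent.py` interface may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Stage = Callable[[Any], Any]


@dataclass
class PipelineRun:
    """Output of each stage keyed by stage name, plus a simple progress log."""
    outputs: Dict[str, Any] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def run_pipeline(initial: Any, stages: List[Tuple[str, Stage]]) -> PipelineRun:
    run = PipelineRun()
    data = initial
    for name, stage in stages:
        data = stage(data)  # each stage receives the previous stage's result
        run.outputs[name] = data
        run.log.append(f"{name}: done")
    return run


# Hypothetical stand-ins for crawling, LLM analysis, YAML generation and testing.
def crawl(urls):
    return [{"url": u, "text": f"docs for {u}"} for u in urls]


def analyze(pages):
    return [{"url": p["url"], "examples": ["print('hello')"]} for p in pages]


def build_spec(analyses):
    # stands in for YAML generation: one entry per documentation URL
    return {a["url"]: {"examples": a["examples"]} for a in analyses}


def run_examples(spec):
    return {url: {"passed": len(entry["examples"])} for url, entry in spec.items()}


result = run_pipeline(
    ["https://docs.example.com/api"],
    [("crawl", crawl), ("analyze", analyze), ("spec", build_spec), ("run", run_examples)],
)
print(result.log)
```

Keeping every intermediate output makes it easy to pause between stages, which is roughly what a step-by-step mode needs.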

⚙️ Configuration

Environment Variables

Create a .env file based on env.example:

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
LLM_MODEL=gpt-4o-mini
LLM_MAX_TOKENS=4000
LLM_TEMPERATURE=0.1

# Crawler Configuration
CRAWLER_MAX_PAGES=20
CRAWLER_DELAY=0.1
CRAWLER_TIMEOUT=10
CRAWLER_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36

# JavaScript Rendering Configuration
USE_JS_RENDERING=true
JS_RENDERING_TIMEOUT=60

# Content Processing Configuration
MAX_CONTENT_PAGES=50
MAX_CONTENT_CHARS=50000
MIN_CODE_LENGTH=20
MIN_INLINE_CODE_LENGTH=10

# Script Testing Configuration
SCRIPT_TIMEOUT=120
SCRIPT_TEST_TIMEOUT=30

# Output Configuration
OUTPUT_DIR=data
LOGS_DIR=logs
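These settings are plain `KEY=value` lines. The sketch below reads them into typed settings, letting real environment variables override the file. The parser and `Settings` class are hypothetical (the project may well use a library such as python-dotenv instead), only a subset of keys is shown, and the defaults copy the values above.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


def parse_env_file(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments; values may contain spaces."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop matching surrounding quotes
        values[key.strip()] = value
    return values


def _bool(text: str) -> bool:
    return text.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    llm_model: str = "gpt-4o-mini"
    llm_temperature: float = 0.1
    crawler_max_pages: int = 20
    crawler_delay: float = 0.1
    use_js_rendering: bool = True
    js_rendering_timeout: int = 60
    output_dir: str = "data"

    @classmethod
    def load(cls, file_values: Mapping[str, str],
             environ: Optional[Mapping[str, str]] = None) -> "Settings":
        env = dict(file_values)
        env.update(os.environ if environ is None else environ)  # real env wins
        d = cls()
        return cls(
            llm_model=env.get("LLM_MODEL", d.llm_model),
            llm_temperature=float(env.get("LLM_TEMPERATURE", d.llm_temperature)),
            crawler_max_pages=int(env.get("CRAWLER_MAX_PAGES", d.crawler_max_pages)),
            crawler_delay=float(env.get("CRAWLER_DELAY", d.crawler_delay)),
            use_js_rendering=_bool(env.get("USE_JS_RENDERING", str(d.use_js_rendering))),
            js_rendering_timeout=int(env.get("JS_RENDERING_TIMEOUT", d.js_rendering_timeout)),
            output_dir=env.get("OUTPUT_DIR", d.output_dir),
        )


sample = """
# Crawler Configuration
CRAWLER_MAX_PAGES=5
CRAWLER_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
USE_JS_RENDERING=false
"""
settings = Settings.load(parse_env_file(sample), environ={"JS_RENDERING_TIMEOUT": "120"})
print(settings)
```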

📊 Performance Metrics

Site Type	Success Rate	Content Quality	Examples Generated
Traditional HTML	95%	High	25-50
JavaScript-Heavy	80%	High	20-40
API Spec Sites	85%	Very High	30-60
Overall Average	85%	High	25-50

🧪 Successfully Tested APIs

Biomedical & Genomics APIs

Human Cell Atlas API: https://service.azul.data.humancellatlas.org/swagger/index.html
- ✅ 61,145 chars extracted, 49 examples generated
- 🎯 Perfect for: Single-cell genomics, biomedical data
1000 Genomes Project: https://www.internationalgenome.org/data
- ✅ 8,472 chars extracted, 49 examples generated
- 🎯 Perfect for: Genetic variation data, population genomics
UK Biobank: https://biobank.ndph.ox.ac.uk/showcase/search.cgi
- ✅ 61,602 chars extracted, 10 examples generated
- 🎯 Perfect for: Large-scale health data, epidemiological studies
UniProt API: https://www.uniprot.org/help/api_queries
- ✅ 861 chars extracted, 40 examples generated
- 🎯 Perfect for: Protein sequence data, bioinformatics
cellxgene-census: https://chanzuckerberg.github.io/cellxgene-census/python-api.html
- ✅ 2,952 chars extracted, 8 examples generated
- 🎯 Perfect for: Single-cell genomics, CZI Census data, AnnData integration
cBioPortal: https://docs.cbioportal.org/web-api-and-clients/
- 🎯 Perfect for: Cancer genomics, mutation data, clinical data
TileDB Cloud Academy: https://cloud.tiledb.com/academy/api-reference/
- ✅ 1,721 chars extracted, 13 examples generated
- 🎯 Perfect for: Cloud databases, array computing

📁 Output Structure

api_agent_v3/
├── data/
│   ├── crawled_content/          # Raw crawled data
│   ├── api_specs/               # Generated YAML specifications
│   ├── test_results/            # Example testing results
│   └── retrieved_datasets/      # Downloaded datasets
├── logs/                        # Workflow logs
├── agents/                      # Core agent modules
├── prompts/                     # LLM prompts
└── requirements.txt             # Python dependencies
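The files in `test_results/` come from executing extracted examples. The runner's real logic (including its LLM-based fixing) is not shown in this README. Under that caveat, a sketch of the core check: run each snippet in a fresh interpreter with a timeout such as `SCRIPT_TEST_TIMEOUT` and record pass/fail. `run_example` and the `ExampleResult` fields are hypothetical.

```python
import subprocess
import sys
from dataclasses import asdict, dataclass


@dataclass
class ExampleResult:
    passed: bool
    returncode: int
    stdout: str
    stderr: str
    timed_out: bool = False


def run_example(code: str, timeout: float = 30.0) -> ExampleResult:
    """Execute a Python snippet in a separate interpreter and capture the outcome."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        return ExampleResult(False, -1, "", f"timed out after {timeout}s", timed_out=True)
    return ExampleResult(proc.returncode == 0, proc.returncode, proc.stdout, proc.stderr)


ok = run_example("print(sum(range(5)))")
bad = run_example("import nonexistent_module_xyz")
print(asdict(ok)["stdout"].strip(), ok.passed, bad.passed)
```

A record like `asdict(result)` serializes directly with `json.dumps`, which suits a per-example results file.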

🛠️ Development

Project Structure

api_agent_v3/
├── agents/                          # Core agent modules
│   ├── enhanced_python_crawler.py   # Web crawling with JS support
│   ├── content_focused_summarizer.py # LLM content analysis
│   ├── enhanced_yaml_example_runner.py # Example testing
│   └── unified_example_runner.py    # Core testing engine
├── cli.py                          # Command-line interface
├── project_manager_agent.py        # Workflow orchestration
├── requirements.txt                # Dependencies
├── env.example                     # Environment template
└── README.md                       # This file

Development Setup

# Clone and setup
git clone https://github.com/yourusername/api-agent-v3.git
cd api-agent-v3

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
playwright install

# Setup environment
cp env.example .env
# Edit .env with your API keys

🐛 Troubleshooting

Common Issues

JavaScript rendering fails:

# Increase timeout
export JS_RENDERING_TIMEOUT=120
python cli.py --urls "https://example.com"
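If you call Playwright directly to check whether a page renders at all, note that Playwright's `timeout` arguments are in milliseconds, while `JS_RENDERING_TIMEOUT` is assumed here to be in seconds like the other timeouts in `.env`. A sketch; `render_timeout_ms` and `render_page` are hypothetical helpers, and `render_page` needs `playwright install` to have been run.

```python
import os


def render_timeout_ms(default_seconds: float = 60.0) -> int:
    """Read JS_RENDERING_TIMEOUT (assumed seconds) and convert to Playwright milliseconds."""
    seconds = float(os.environ.get("JS_RENDERING_TIMEOUT", default_seconds))
    return int(seconds * 1000)


def render_page(url: str) -> str:
    """Return the page HTML after JavaScript has run, using Playwright's sync API."""
    from playwright.sync_api import sync_playwright  # imported lazily: optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url, timeout=render_timeout_ms(), wait_until="networkidle")
            return page.content()
        finally:
            browser.close()
```

For example, `len(render_page("https://example.com"))` returning promptly suggests rendering itself works and the problem lies elsewhere.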

OpenAI API errors:

# Check API key
echo $OPENAI_API_KEY
# Or set in .env file
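`echo` prints the whole key to the terminal and shows nothing when the key only lives in `.env`. A small preflight sketch that checks both places and prints only a masked form; the helper names are hypothetical and the `.env` handling is deliberately minimal.

```python
import os
from pathlib import Path
from typing import Optional


def find_openai_key(env_path: str = ".env") -> Optional[str]:
    """Return OPENAI_API_KEY from the environment, else from a KEY=value .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    path = Path(env_path)
    if path.is_file():
        for line in path.read_text().splitlines():
            name, sep, value = line.strip().partition("=")
            if sep and name.strip() == "OPENAI_API_KEY":
                return value.strip().strip("\"'") or None
    return None


def mask(key: str) -> str:
    """Show only the first and last four characters of a secret."""
    return key[:4] + "..." + key[-4:] if len(key) > 8 else "*" * len(key)


key = find_openai_key()
if key is None or key == "your_openai_api_key_here":
    print("OPENAI_API_KEY is not set (or is still the env.example placeholder)")
else:
    print("Using key", mask(key))
```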

Playwright issues:

# Reinstall the browser binaries Playwright uses
playwright install

Debug mode:

# Enable verbose logging
export LOG_LEVEL=DEBUG
python cli.py --urls "https://example.com" --step-by-step
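How `LOG_LEVEL` is consumed isn't shown here. With Python's standard `logging` module, an environment-driven level is typically wired up as below; `configure_logging` is a hypothetical helper, not necessarily what `cli.py` does.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Set the root log level from LOG_LEVEL, falling back to default for unknown names."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = getattr(logging, default)
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers installed earlier (Python 3.8+)
    )
    return level


configure_logging()
logging.getLogger("crawler").debug("visible only when LOG_LEVEL=DEBUG")
```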

🤝 Contributing

Fork the repository
Create a feature branch: git checkout -b feature-name
Make your changes and test thoroughly
Commit your changes: git commit -m 'Add feature'
Push to the branch: git push origin feature-name
Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📞 Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: chaishoujie@gmail.com

Made with ❤️ for the API documentation community
