This directory contains example code for using the GeoPlan Benchmark.
Task generation example demonstrating how to:
- Generate a single task
- Generate tasks in batch
- View task statistics
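The generation steps above can be sketched with a toy, self-contained generator (the real pipeline uses LLM calls; the function names and task fields here are purely illustrative):

```python
import random

# Toy stand-in for task generation; the real GeoPlan generator
# calls an LLM. Everything below is illustrative only.
def generate_task(rng):
    a, b = rng.randrange(10), rng.randrange(10)
    return {"query": f"Plan a route from point {a} to point {b}", "answer": (a, b)}

def generate_batch(n, seed=0):
    """Generate n tasks reproducibly from a seed."""
    rng = random.Random(seed)
    return [generate_task(rng) for _ in range(n)]

batch = generate_batch(3)
print(len(batch), "tasks generated")
```

Seeding the batch makes runs reproducible, which is useful when comparing filtering statistics across runs.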
Run with:
python examples/example_generate_tasks.py

Task filtering example demonstrating how to:
- Load raw tasks
- Execute filtering pipeline
- View filtering statistics
- Save filtered tasks
- Analyze tool importance
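A filtering pipeline of this shape can be sketched generically (the actual GeoPlan filters and task schema are assumptions here; each filter drops tasks that fail it and records how many it removed):

```python
# Generic filtering-pipeline sketch; filter names and task fields
# are illustrative, not the benchmark's real schema.
def run_filters(tasks, filters):
    """Apply (name, predicate) filters in order; return kept tasks and per-filter drop counts."""
    stats = {}
    for name, keep in filters:
        before = len(tasks)
        tasks = [t for t in tasks if keep(t)]
        stats[name] = before - len(tasks)  # tasks removed by this filter
    return tasks, stats

filters = [
    ("has_query", lambda t: bool(t.get("query"))),
    ("uses_tools", lambda t: len(t.get("tools", [])) > 0),
]
tasks = [{"query": "q1", "tools": ["map"]}, {"query": "", "tools": []}]
kept, stats = run_filters(tasks, filters)
print(kept, stats)
```

Recording per-filter drop counts is what makes the "view filtering statistics" step possible afterwards.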
Run with:
python examples/example_filter_tasks.py

Task evaluation example demonstrating how to:
- Load tasks
- Initialize evaluator
- Evaluate a single task
- View evaluation results
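Evaluating a single task boils down to producing a prediction and scoring it against a reference; a minimal sketch (the evaluator API and result shape are assumptions, not GeoPlan's actual interface):

```python
# Minimal single-task evaluation sketch; "evaluate_task" and the
# task/result dict shapes are assumptions for illustration.
def evaluate_task(task, answer_fn, score_fn):
    """Run the agent's answer function on one task and score the prediction."""
    prediction = answer_fn(task)
    return {"task_id": task["id"], "score": score_fn(prediction, task["reference"])}

task = {"id": 1, "query": "What is 2+2?", "reference": "4"}
result = evaluate_task(task, lambda t: "4", lambda p, r: float(p == r))
print(result)  # {'task_id': 1, 'score': 1.0}
```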
Run with:
python examples/example_evaluate.py

Custom agent example demonstrating how to:
- Create a custom agent
- Implement BaseAgent interface
- Add tools
- Integrate into evaluation pipeline
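The steps above could look roughly like this; note that the `BaseAgent` stand-in, the `run`/`register_tool` method names, and the task dict shape are all assumptions for illustration — consult the benchmark's actual base class for the real interface:

```python
from abc import ABC, abstractmethod

# Hypothetical stand-in for the benchmark's BaseAgent interface;
# the real class and method signatures may differ.
class BaseAgent(ABC):
    def __init__(self):
        self.tools = {}

    def register_tool(self, name, fn):
        """Make a callable available to the agent by name."""
        self.tools[name] = fn

    @abstractmethod
    def run(self, task):
        """Solve one task and return the agent's answer as a string."""

class EchoAgent(BaseAgent):
    """Minimal custom agent: applies a registered tool to the task query."""
    def run(self, task):
        upper = self.tools.get("upper", str)
        return upper(task.get("query", ""))

agent = EchoAgent()
agent.register_tool("upper", str.upper)
print(agent.run({"query": "plan a route"}))  # PLAN A ROUTE
```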
Run with:
python examples/example_custom_agent.py

Custom evaluation metric example demonstrating how to:
- Create a custom evaluation metric
- Compute evaluation scores
- Integrate into evaluation pipeline
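A custom metric can be as small as a class with a scoring method; this sketch assumes a `compute(prediction, reference) -> float` shape, which may differ from GeoPlan's actual metric interface:

```python
# Hypothetical metric sketch; the "compute" method name and float
# return convention are assumptions, not the benchmark's real API.
class ExactMatchMetric:
    name = "exact_match"

    def compute(self, prediction, reference):
        """Score 1.0 when the normalized strings match, else 0.0."""
        return float(prediction.strip().lower() == reference.strip().lower())

metric = ExactMatchMetric()
print(metric.compute("Paris ", "paris"))  # 1.0
```

Normalizing (strip + lowercase) before comparing keeps the metric from penalizing trivial formatting differences.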
Run with:
python examples/example_custom_metric.py

Setup:
- Create and activate a virtual environment (venv or conda):
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
pip install -e .  # Optional, for command-line tools
- Configure environment variables (create a .env file):
OPENAI_API_KEY=your_key
OPENAI_API_BASE=https://api.openai.com/v1
GEMINI_API_KEY=your_key
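Inside a script, these variables can be read from the process environment (loading the .env file itself can be done with python-dotenv or by exporting the variables in your shell); the helper below is a small sketch, not part of the benchmark:

```python
import os

# Small helper sketch for reading the keys configured above;
# not part of GeoPlan, just illustrative.
def require_env(name, default=None):
    """Return an environment variable, or fail with a clear message."""
    value = os.getenv(name, default)
    if value is None:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value

# Example: fall back to the public endpoint if OPENAI_API_BASE is unset.
api_base = require_env("OPENAI_API_BASE", "https://api.openai.com/v1")
```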
Notes:
- Running the task generation and evaluation examples requires LLM API calls and incurs costs
- Paths in the example code may need adjusting for your environment
- Run the simple examples first to confirm your environment is configured correctly before running the full pipeline
Based on these examples, you can:
- Create your own agent implementations
- Add new evaluation metrics
- Customize task generation pipeline
- Integrate into your own projects
For more details, please refer to: