This directory contains example code for using the GeoPlan Benchmark.
Task generation example demonstrating how to:
- Generate a single task
- Generate tasks in batch
- View task statistics
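The generation steps above can be sketched with a toy, self-contained generator (the real pipeline uses LLM calls; the function names and task fields here are purely illustrative):

```python
import random

# Toy stand-in for task generation; the real GeoPlan generator
# calls an LLM. Everything below is illustrative only.
def generate_task(rng):
    a, b = rng.randrange(10), rng.randrange(10)
    return {"query": f"Plan a route from point {a} to point {b}", "answer": (a, b)}

def generate_batch(n, seed=0):
    """Generate n tasks reproducibly from a seed."""
    rng = random.Random(seed)
    return [generate_task(rng) for _ in range(n)]

batch = generate_batch(3)
print(len(batch), "tasks generated")
```

Seeding the batch makes runs reproducible, which is useful when comparing filtering statistics across runs.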
Run with:
python examples/example_generate_tasks.py

Task filtering example demonstrating how to:
- Load raw tasks
- Execute filtering pipeline
- View filtering statistics
- Save filtered tasks
- Analyze tool importance
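A filtering pipeline of this shape can be sketched generically (the actual GeoPlan filters and task schema are assumptions here; each filter drops tasks that fail it and records how many it removed):

```python
# Generic filtering-pipeline sketch; filter names and task fields
# are illustrative, not the benchmark's real schema.
def run_filters(tasks, filters):
    """Apply (name, predicate) filters in order; return kept tasks and per-filter drop counts."""
    stats = {}
    for name, keep in filters:
        before = len(tasks)
        tasks = [t for t in tasks if keep(t)]
        stats[name] = before - len(tasks)  # tasks removed by this filter
    return tasks, stats

filters = [
    ("has_query", lambda t: bool(t.get("query"))),
    ("uses_tools", lambda t: len(t.get("tools", [])) > 0),
]
tasks = [{"query": "q1", "tools": ["map"]}, {"query": "", "tools": []}]
kept, stats = run_filters(tasks, filters)
print(kept, stats)
```

Recording per-filter drop counts is what makes the "view filtering statistics" step possible afterwards.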
Run with:
python examples/example_filter_tasks.py

Task evaluation example demonstrating how to:
- Load tasks
- Initialize evaluator
- Evaluate a single task
- View evaluation results
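Evaluating a single task boils down to producing a prediction and scoring it against a reference; a minimal sketch (the evaluator API and result shape are assumptions, not GeoPlan's actual interface):

```python
# Minimal single-task evaluation sketch; "evaluate_task" and the
# task/result dict shapes are assumptions for illustration.
def evaluate_task(task, answer_fn, score_fn):
    """Run the agent's answer function on one task and score the prediction."""
    prediction = answer_fn(task)
    return {"task_id": task["id"], "score": score_fn(prediction, task["reference"])}

task = {"id": 1, "query": "What is 2+2?", "reference": "4"}
result = evaluate_task(task, lambda t: "4", lambda p, r: float(p == r))
print(result)  # {'task_id': 1, 'score': 1.0}
```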
Run with:
python examples/example_evaluate.py

Custom agent example demonstrating how to:
- Create a custom agent
- Implement BaseAgent interface
- Add tools
- Integrate into evaluation pipeline
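The steps above could look roughly like this; note that the `BaseAgent` stand-in, the `run`/`register_tool` method names, and the task dict shape are all assumptions for illustration — consult the benchmark's actual base class for the real interface:

```python
from abc import ABC, abstractmethod

# Hypothetical stand-in for the benchmark's BaseAgent interface;
# the real class and method signatures may differ.
class BaseAgent(ABC):
    def __init__(self):
        self.tools = {}

    def register_tool(self, name, fn):
        """Make a callable available to the agent by name."""
        self.tools[name] = fn

    @abstractmethod
    def run(self, task):
        """Solve one task and return the agent's answer as a string."""

class EchoAgent(BaseAgent):
    """Minimal custom agent: applies a registered tool to the task query."""
    def run(self, task):
        upper = self.tools.get("upper", str)
        return upper(task.get("query", ""))

agent = EchoAgent()
agent.register_tool("upper", str.upper)
print(agent.run({"query": "plan a route"}))  # PLAN A ROUTE
```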
Run with:
python examples/example_custom_agent.py

Custom evaluation metric example demonstrating how to:
- Create a custom evaluation metric
- Compute evaluation scores
- Integrate into evaluation pipeline
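A custom metric can be as small as a class with a scoring method; this sketch assumes a `compute(prediction, reference) -> float` shape, which may differ from GeoPlan's actual metric interface:

```python
# Hypothetical metric sketch; the "compute" method name and float
# return convention are assumptions, not the benchmark's real API.
class ExactMatchMetric:
    name = "exact_match"

    def compute(self, prediction, reference):
        """Score 1.0 when the normalized strings match, else 0.0."""
        return float(prediction.strip().lower() == reference.strip().lower())

metric = ExactMatchMetric()
print(metric.compute("Paris ", "paris"))  # 1.0
```

Normalizing (strip + lowercase) before comparing keeps the metric from penalizing trivial formatting differences.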
Run with:
python examples/example_custom_metric.py

Setup:
- Create and activate a virtual environment (venv or conda):
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
pip install -e .  # Optional, for command-line tools
- Configure environment variables (create a .env file):
OPENAI_API_KEY=your_key
OPENAI_API_BASE=https://api.openai.com/v1
GEMINI_API_KEY=your_key
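Inside a script, these variables can be read from the process environment (loading the .env file itself can be done with python-dotenv or by exporting the variables in your shell); the helper below is a small sketch, not part of the benchmark:

```python
import os

# Small helper sketch for reading the keys configured above;
# not part of GeoPlan, just illustrative.
def require_env(name, default=None):
    """Return an environment variable, or fail with a clear message."""
    value = os.getenv(name, default)
    if value is None:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value

# Example: fall back to the public endpoint if OPENAI_API_BASE is unset.
api_base = require_env("OPENAI_API_BASE", "https://api.openai.com/v1")
```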
Notes:
- Running the task generation and evaluation examples requires LLM API calls and incurs costs
- Paths in the example code may need adjusting for your environment
- Run the simple examples first to confirm your environment is configured correctly before running the full pipeline
Based on these examples, you can:
- Create your own agent implementations
- Add new evaluation metrics
- Customize task generation pipeline
- Integrate into your own projects
For more details, please refer to: