Lighteval allows you to evaluate custom model implementations by creating a custom model class that inherits from `LightevalModel`. This is useful when you want to evaluate models that aren't directly supported by the standard backends (transformers, vllm, etc.).
- Create a Python file containing your custom model implementation. The model must inherit from `LightevalModel` and implement all required methods. Here's a basic example:
```python
from lighteval.models.abstract_model import LightevalModel


class MyCustomModel(LightevalModel):
    def __init__(self, config, env_config):
        super().__init__(config, env_config)
        # Initialize your model here...

    def greedy_until(self, requests, max_tokens=None, stop_sequences=None):
        # Implement generation logic
        pass

    def loglikelihood(self, requests, log=True):
        # Implement loglikelihood computation
        pass

    def loglikelihood_rolling(self, requests):
        # Implement rolling loglikelihood computation
        pass

    def loglikelihood_single_token(self, requests):
        # Implement single token loglikelihood computation
        pass
```
- The custom model file should contain exactly one class that inherits from `LightevalModel`. This class will be automatically detected and instantiated when loading the model.
Tip: You can find a complete example of a custom model implementation in `examples/custom_models/google_translate_model.py`.
You can evaluate your custom model using either the command line interface or the Python API.
```bash
python -m lighteval custom \
    "google-translate" \
    "examples/custom_models/google_translate_model.py" \
    "lighteval|wmt20:fr-de|0|0" \
    --output-dir results \
    --max-samples 10
```
The command takes three required arguments:
- The model name (used for tracking in results/logs)
- The path to your model implementation file
- The tasks to evaluate on (same format as other backends)
```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.custom.custom_model import CustomModelConfig
from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

# Set up evaluation tracking
evaluation_tracker = EvaluationTracker(
    output_dir="results",
    save_details=True,
)

# Configure the pipeline
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.CUSTOM,
    env_config=EnvConfig(cache_dir="tmp/"),
)

# Configure your custom model
model_config = CustomModelConfig(
    model="my-custom-model",
    model_definition_file_path="path/to/my_model.py",
)

# Create and run the pipeline
pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)

pipeline.evaluate()
pipeline.save_and_push_results()
```
Your custom model must implement these core methods:

- `greedy_until`: for generating text until a stop sequence or the maximum number of tokens is reached (see the sketch below)
- `loglikelihood`: for computing log probabilities of specific continuations
- `loglikelihood_rolling`: for computing rolling log probabilities of sequences
- `loglikelihood_single_token`: for computing log probabilities of single tokens

See the `LightevalModel` base class documentation for detailed method signatures and requirements.
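For instance, the generation method of an API- or pipeline-backed model typically just loops over the incoming requests and collects one completion per prompt. The following is a minimal, self-contained sketch of that loop using a Hugging Face `text-generation` pipeline; it is not a full `LightevalModel` subclass, the `request.context` attribute and the plain-string return value are assumptions (the exact request and response objects are defined by the `LightevalModel` base class in your installed version), and stop-sequence handling is omitted for brevity:

```python
from transformers import pipeline


class GenerationSketch:
    """Illustrative only — a real model would subclass LightevalModel."""

    def __init__(self, model_name: str = "gpt2"):
        # Greedy decoding via a standard text-generation pipeline.
        self.generator = pipeline("text-generation", model=model_name)

    def greedy_until(self, requests, max_tokens=None, stop_sequences=None):
        results = []
        for request in requests:
            prompt = request.context  # assumed attribute name on the request object
            output = self.generator(
                prompt,
                max_new_tokens=max_tokens or 256,
                do_sample=False,         # greedy decoding
                return_full_text=False,  # keep only the generated continuation
            )
            results.append(output[0]["generated_text"])
        return results
```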
- Error Handling: Implement robust error handling in your model methods to gracefully handle edge cases.
- Batching: Consider implementing efficient batching in your model methods to improve performance.
- Resource Management: Properly manage any resources (e.g., API connections, model weights) in your model's `__init__` and `__del__` methods (see the sketch after this list).
- Documentation: Add clear docstrings to your model class and methods explaining any specific requirements or limitations.
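As an illustration of the resource-management point, a model that talks to a remote service can open a single session in `__init__` and release it in `__del__`. This is a minimal sketch with illustrative names (the `ApiBackedModel` class and its endpoint are not part of lighteval):

```python
import requests


class ApiBackedModel:
    def __init__(self, base_url: str, api_key: str):
        # Reuse one HTTP session (connection pool) for all evaluation requests.
        self.base_url = base_url
        self.session = requests.Session()
        self.session.headers.update({"Authorization": f"Bearer {api_key}"})

    def __del__(self):
        # Release the connection pool when the model object is garbage-collected.
        try:
            self.session.close()
        except Exception:
            pass
```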
Custom models are particularly useful for:
- Evaluating models accessed through custom APIs
- Wrapping models with specialized preprocessing/postprocessing
- Testing novel model architectures
- Evaluating ensemble models (see the sketch after this list)
- Integrating with external services or tools
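To make the ensemble case concrete, the core idea is that your custom class can dispatch each prompt to several underlying models and combine their outputs before returning a single answer. A rough, framework-agnostic sketch (all names here are hypothetical, not lighteval APIs):

```python
from collections import Counter


def majority_vote(prompt: str, sub_models: list) -> str:
    # Ask every sub-model for a completion (hypothetical .generate method)
    # and return the most frequent answer; ties resolve to the first seen.
    completions = [model.generate(prompt) for model in sub_models]
    return Counter(completions).most_common(1)[0][0]
```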
For a complete example of a custom model that wraps the Google Translate API, see `examples/custom_models/google_translate_model.py`.