Explainability Toolbox for Predictive and Generative AI
18 predictive AI explainers.
44 generative AI evaluators.
1,000,000+ curated eval prompts.
Automatic best model selection.
Used by H2O Eval Studio and H2O Driverless AI.
H2O Sonar is a Python library for AI model risk management (MRM) across predictive and generative systems. It provides explainers and evaluators that validate models, detect bias, assess fairness and privacy, and generate audit documentation. Built for regulated industries, H2O Sonar enables risk, compliance, and validation teams to quantify model risk, meet regulatory requirements, and maintain robust governance throughout the model lifecycle.
H2O Sonar is used by H2O.ai products such as H2O Eval Studio and H2O Driverless AI.
H2O Driverless AI, H2O-3, and scikit-learn predictive models can be explained by H2O Sonar.
H2O Sonar explanation report examples:
- Explainers overview (HTML)
- Credit card use case (PDF)
Examples:
Approximate model behavior:
Feature importance:
- Shapley Values for Original Features (Kernel SHAP Method)
- Shapley Values for Transformed Features of MOJO Models
- Morris Sensitivity Analysis
Feature behavior:
- Partial Dependence/Individual Conditional Expectations (PD/ICE)
- Partial Dependence for 2 Features
- Friedman's H-statistic
- Summary SHAP
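To illustrate the idea behind partial dependence (a generic sketch of the technique, not H2O Sonar's implementation): clamp one feature to each value on a grid, then average the model's predictions over the rest of the dataset.

```python
def partial_dependence(predict, rows, feature_idx, grid):
    """Average prediction with one feature clamped to each grid value."""
    curve = []
    for value in grid:
        clamped = [row[:feature_idx] + [value] + row[feature_idx + 1:] for row in rows]
        curve.append(sum(predict(r) for r in clamped) / len(clamped))
    return curve

# toy model: prediction depends linearly on feature 0 and ignores feature 1
predict = lambda row: 2.0 * row[0]
rows = [[1.0, 5.0], [2.0, -3.0], [3.0, 0.0]]
print(partial_dependence(predict, rows, 0, [0.0, 1.0, 2.0]))  # [0.0, 2.0, 4.0]
```

The resulting curve shows how the prediction responds to that one feature on average; ICE plots keep the per-row curves instead of averaging them.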
Fairness:
Model debugging:
Model validity testing:
- Adversarial Similarity
- Backtesting
- Calibration Score
- Drift Detection
- Segment Performance
- Size Dependency
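As a flavor of the kind of check a drift detector performs, here is a generic Population Stability Index (PSI) sketch (not H2O Sonar's algorithm): compare the binned distribution of a feature between training and scoring data.

```python
import math

def population_stability_index(expected_fracs, actual_fracs, eps=1e-6):
    """PSI: sum over bins of (a - e) * ln(a / e); near 0 means no drift."""
    psi = 0.0
    for e, a in zip(expected_fracs, actual_fracs):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi

same = [0.25, 0.25, 0.25, 0.25]
shifted = [0.10, 0.20, 0.30, 0.40]
print(round(population_stability_index(same, same), 6))  # 0.0
print(population_stability_index(same, shifted) > 0.05)  # True: distribution drifted
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, though thresholds are use-case dependent.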
Supported environments & Python version(s):
| OS / Python | Python 3.11 |
|---|---|
| Linux x86 64b | Driverless AI MOJO, Driverless AI REST, H2O-3, scikit-learn |
Explain your predictive model by running an interpretation from Python or Jupyter Notebook:
```python
# dataset
import pandas
dataset = pandas.read_csv(dataset_path)
(X, y) = dataset.drop(target_column, axis=1), dataset[target_column]

# model
from sklearn import ensemble
model = ensemble.GradientBoostingClassifier(learning_rate=0.1)
model.fit(X, y)

# interpretation
from h2o_sonar import interpret
interpretation = interpret.run_interpretation(
    dataset=dataset_path,
    model=model,
    used_features=list(X.columns),
    target_col=target_column,
    results_location=results_path,
)

# interpretation result
print(interpretation)

# get explanation created by the first explainer of the interpretation
explanation = interpretation.get_explainer_result(
    interpretation.get_finished_explainer_ids()[0]
)

# show explanation summary
print(explanation.summary())

# show explanation data
print(explanation.data(feature_name="EDUCATION", category="disparity"))

# get explanation plot
explanation.plot(feature_name="EDUCATION")

# show explainer log
print(explanation.log(path=results_path))

# store all explanation artifacts as a ZIP archive
explanation.zip(file_path=archive_path)
```

Alternatively, you can run the interpretation using the command line interface - check help:
```shell
h2o-sonar --help
```

Explain your model:
```shell
h2o-sonar run interpretation \
  --dataset dataset.csv \
  --model model.mojo \
  --target-col SATISFACTION
```

Check out the interpretation report and explanations:
The set of techniques and methods provided by H2O Sonar can be extended with custom explainers, as H2O Sonar supports BYOE (Bring Your Own Explainer) recipes. A BYOE recipe is a Python code snippet; with it, you can use your own explainers in combination with, or instead of, the H2O Sonar built-in explainers.
Open source recipe examples - also used in the documentation to demonstrate the H2O Sonar explainer API - can be found in `examples/predictive/byoe/examples` in the H2O Sonar distribution directory.
Open source recipe templates - which let you quickly create new explainers by choosing the desired explainer type / explanation type (such as feature importance, decision tree, or partial dependence) and replacing the mock data with a real calculation - can be found in `examples/predictive/byoe/templates` in the H2O Sonar distribution directory.
See documentation for more details.
H2O Sonar supports the following RAG and LLM hosts: h2oGPTe, h2oGPT, H2O LLMOps/MLOps, OpenAI, Microsoft Azure OpenAI, Anthropic Claude, Amazon Bedrock, and ollama.
H2O Sonar evaluation report examples:
- h2oGPTe's LLMs comparison (HTML)
- SR 11-7 English embedding models evaluation report (HTML)
- SR 11-7 multilingual embedding models evaluation report (HTML)
Examples:
Agent:
Generation:
- Answer Accuracy (Semantic Similarity)
- Answer Correctness
- Answer Relevancy
- Answer Relevancy (Sentence Similarity)
- Answer Semantic Sentence Similarity
- Answer Semantic Similarity
- Fact-Check (Agent-based)
- Faithfulness
- Groundedness (Semantic Similarity)
- Hallucination
- JSON Schema
- Language Mismatch (Judge)
- Looping Detection
- Machine Translation (GPTScore)
- Parameterizable BYOP
- Perplexity
- Questions Drift
- Question Answering (GPTScore)
- RAGAS
- Self-Consistency
- Step Alignment and Completeness
- Text Matching
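Several of the similarity-based evaluators above reduce to comparing an actual answer with an expected one in some vector space. A toy sketch using bag-of-words cosine similarity (real evaluators use learned embeddings, not token counts):

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity of bag-of-words vectors; 1.0 = identical token counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

print(cosine_similarity("the model is accurate", "the model is accurate"))  # 1.0
print(cosine_similarity("the model is accurate", "bananas are yellow"))     # 0.0
```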
Retrieval:
- Context Mean Reciprocal Rank
- Context Precision
- Context Recall
- Context Relevancy
- Context Relevancy (Soft Recall and Precision)
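Context Mean Reciprocal Rank, for example, scores how high the first relevant chunk appears in each retrieval. A minimal sketch of the metric itself (generic MRR, independent of H2O Sonar's implementation):

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """MRR: average of 1/rank of the first relevant item per query (0 if none found)."""
    total = 0.0
    for query, ranking in ranked_results.items():
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant[query]:
                total += 1.0 / rank
                break
    return total / len(ranked_results)

ranked = {"q1": ["d3", "d1", "d2"], "q2": ["d2", "d5"]}
relevant = {"q1": {"d1"}, "q2": {"d2"}}
print(mean_reciprocal_rank(ranked, relevant))  # (1/2 + 1/1) / 2 = 0.75
```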
Privacy:
Fairness:
Summarization:
- BERTScore
- BLEU
- ROUGE
- Summarization (Completeness and Faithfulness)
- Summarization (Judge)
- Summarization with reference (GPTScore)
- Summarization without reference (GPTScore)
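As a flavor of what the n-gram summarization metrics measure, here is a minimal ROUGE-1 recall sketch: the fraction of reference unigrams that the candidate summary covers (full ROUGE also handles longer n-grams, stemming, and F-scores).

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams covered by the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[t], ref[t]) for t in ref)
    return overlap / max(sum(ref.values()), 1)

print(rouge1_recall("the cat sat", "the cat sat on the mat"))  # 3/6 = 0.5
```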
Classification:
H2O Sonar provides a comprehensive library featuring 1,000,000+ curated prompts specifically designed for LLM, RAG, and AI Agent evaluation.
The library includes ready-to-use versions of trusted industry benchmarks such as:
- MMLU (Massive Multitask Language Understanding)
- ARC (AI2 Reasoning Challenge)
- CUAD (Contract Understanding Atticus Dataset)
- HellaSwag (Common Sense Reasoning)
- GSM8K (Grade School Math 8K)
The library's 700+ test suites cover key domains including Question Answering, Privacy, Fairness, Security, Summarization, and Classification.
- Standardized format:
- All data is provided in a normalized H2O Sonar JSON format.
- Flexible workflows:
- Test suites can be combined, sampled, perturbed, and customized to meet your specific evaluation requirements.
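These workflows can be sketched as plain list operations; the test-case structure below is a simplified stand-in, not the actual H2O Sonar JSON schema:

```python
import random

# hypothetical simplified test cases; the real format is the H2O Sonar JSON schema
suite_a = [{"prompt": "What is PSI?", "expected": "Population Stability Index"}]
suite_b = [{"prompt": "Define MRR.", "expected": "Mean Reciprocal Rank"},
           {"prompt": "What is RAG?", "expected": "Retrieval-Augmented Generation"}]

combined = suite_a + suite_b           # combine suites
random.seed(0)
sample = random.sample(combined, k=2)  # sample a subset
# a trivial perturbation: change prompt casing to probe model robustness
perturbed = [{**case, "prompt": case["prompt"].upper()} for case in sample]
print(len(combined), len(perturbed))   # 3 2
```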
H2O Sonar can evaluate standalone LLMs and LLMs used by RAG systems hosted by the following products and services:
RAG:
- Amazon Bedrock
- h2oGPTe
- OpenAI Assistants with File Search Tool
LLM:
- Amazon Bedrock
- Anthropic Claude Chat
- h2oGPT
- h2oGPTe
- H2O LLMOps
- Microsoft Azure OpenAI Chat
- ollama
- OpenAI Chat
- OpenAI Chat API Compatible Hosts
Explain your generative model(s) by running an evaluation from Python or Jupyter Notebook:
```python
# LLM models to be evaluated
model_host = h2o_sonar_config.ConnectionConfig(
    connection_type=h2o_sonar_config.ConnectionConfigType.H2O_GPT_E.name,
    name="H2O GPT Enterprise",
    description="H2O GPT Enterprise model host.",
    server_url="https://h2ogpte.h2o.ai/",
    token="YOUR_API_TOKEN_HERE",
    token_use_type=h2o_sonar_config.TokenUseType.API_KEY.name,
)
llm_models = genai.H2oGpteRagClient(model_host).list_llm_model_names()

# evaluation dataset
# test suite: RAG corpus, prompts, expected answers
rag_test_suite = testing.RagTestSuiteConfig.load_from_json(
    test_utils.find_locally("data/generative/demo_doc_test_suite.json")
)

# test lab: resolved test suite w/ actual values from the LLM models host
test_lab = testing.RagTestLab.from_rag_test_suite(
    rag_connection=model_host,
    rag_test_suite=rag_test_suite,
    rag_model_type=models.ExplainableModelType.h2ogpte,
    llm_model_names=llm_models,
    docs_cache_dir=tmp_path,
)

# deploy the test lab: upload corpus and create RAG collections/knowledge bases
test_lab.build()

# complete the test lab: actual values - answers, duration, cost, ...
test_lab.complete_dataset()

# EVALUATION
evaluation = evaluate.run_evaluation(
    # test lab as the evaluation dataset (prompts, expected and actual answers)
    dataset=test_lab.dataset,
    # models to be evaluated ~ compared in the evaluation leaderboard
    models=test_lab.evaluated_models.values(),
    # evaluators
    evaluators=[
        rag_hallucination_evaluator.RagHallucinationEvaluator().evaluator_id()
    ],
    # where to save the report
    results_location=tmp_path,
)

# HTML report and the evaluation data (JSON, CSV, data frames, ...)
print(f"HTML report: file://{evaluation.result.get_html_report_location()}")
```

Check out the evaluation report:
The H2O Sonar evaluations comparator is a decision-support tool that streamlines LLM, RAG, and Agent selection, including automated best-model selection. It lets you move beyond raw numbers with side-by-side analysis and automated model recommendations based on your specific evaluation data - examples:
- Call Center use case embedding models comparison (HTML)
- SR 11-7 use case embedding models comparison (HTML)
The tool performs intelligent cross-model comparison by identifying "comparable models" via intersection of evaluation data, ensuring your benchmarks are sound:
- Prompt Alignment: Matches models that share the same questions / prompts.
- Metric Consistency: Identifies common metric scores to ensure an "apples-to-apples" comparison.
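The "comparable models" idea can be sketched as intersecting prompt sets and metric sets across per-model results (the data layout below is hypothetical, for illustration only):

```python
from functools import reduce

# hypothetical per-model evaluation data: model name -> (prompt ids, metric names)
results = {
    "model-a": ({"p1", "p2", "p3"}, {"accuracy", "faithfulness"}),
    "model-b": ({"p1", "p2"}, {"accuracy", "faithfulness", "perplexity"}),
}

# models are only compared on the prompts and metrics they all share
shared_prompts = reduce(set.intersection, (prompts for prompts, _ in results.values()))
shared_metrics = reduce(set.intersection, (metrics for _, metrics in results.values()))
print(sorted(shared_prompts))  # ['p1', 'p2']
print(sorted(shared_metrics))  # ['accuracy', 'faithfulness']
```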
The evaluations comparator performs automated best-model selection by applying multi-objective optimization to:
- Rank Performance: Automatically suggest the "best model" based on weighted priority of your chosen metrics.
- Identify Strengths: Pinpoint which model excels at retrieval (RAG) vs. reasoning (agents).
- Detect Regressions: Compare new model versions against your established baselines to prevent quality drift.
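Ranking by weighted metric priority can be sketched as a weighted sum over normalized scores (the models, scores, and weights below are illustrative, not the comparator's actual algorithm):

```python
# hypothetical metric scores per model, all scaled to [0, 1], higher is better
scores = {
    "model-a": {"accuracy": 0.90, "faithfulness": 0.70},
    "model-b": {"accuracy": 0.85, "faithfulness": 0.95},
}
weights = {"accuracy": 0.6, "faithfulness": 0.4}  # your chosen metric priorities

def weighted_score(metrics):
    """Weighted sum of metric scores under the chosen priority weights."""
    return sum(weights[m] * v for m, v in metrics.items())

best = max(scores, key=lambda m: weighted_score(scores[m]))
print(best)  # model-b: 0.85*0.6 + 0.95*0.4 = 0.89 beats model-a's 0.82
```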
The evaluations comparator also provides exportable insight reports that transform complex evaluation data into stakeholder-ready assets. The tool generates comprehensive reports in two standard formats:
- HTML
- Leaderboards, color-coded heatmaps, and detailed per-test case visualizations.
- JSON
- Machine-readable data structure for CI/CD pipelines, custom dashboards, and archival.
The set of techniques and methods provided by H2O Sonar for the generative AI models evaluation can be extended with custom evaluators as H2O Sonar supports BYOE recipes - the ability to Bring Your Own Evaluator. BYOE recipe is a Python code snippet. With BYOE recipe, you can use your evaluators in combination with or instead of H2O Sonar built-in evaluators.
Prepare prerequisites:
- Operating system: Linux
- Python 3.11
- Pip 25.0+
- CUDA-compatible GPU, NVIDIA drivers (optional - speed up generative evaluations)
- Java 1.7+ (optional - needed for predictive H2O-3 backend only)
- Graphviz (optional - needed for predictive visualizations only)
GPU acceleration (optional):
- GPU support accelerates certain evaluators such as BERTScore, GPTScore, or Perplexity.
- The CUDA runtime is provided via the PyTorch/ONNX dependencies and installed automatically with the `[evaluators]` extras.
- Configure via the `H2O_SONAR_CFG_DEVICE="gpu"` environment variable (default: auto-detect).
- Supported on Linux (x86) only.
Download distribution or Python wheels:
Install Python wheel with only core dependencies for your platform:
- Download the appropriate wheel file for your platform from the Releases page.
- Install using: `pip install h2o_sonar-<version>.whl`
Package extras:
- Install H2O Sonar with all dependencies: `pip install h2o_sonar-<version>.whl[explainers,evaluators]`
- Install H2O Sonar with predictive model explainer dependencies: `pip install h2o_sonar-<version>.whl[explainers]`
- Install H2O Sonar with generative model evaluator dependencies: `pip install h2o_sonar-<version>.whl[evaluators]`
- Install the H2O Sonar Generative AI clients package only: `pip install h2o_sonar-<version>.whl[genaiclient]`
- Install the H2O Sonar core package only: `pip install h2o_sonar-<version>.whl`
Troubleshooting:
- If an H2O Sonar dependency fails to install, you may need to upgrade `pip` using `python -m pip install --upgrade pip` or `curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11`.
H2O Sonar resources:
- Documentation
- Examples
- Report an issue: https://github.com/h2oai/h2o-sonar/issues/new/choose
Do not hesitate to contribute - join us in evolving H2O Sonar and helping the AI/ML community thrive!
Prerequisites:
- See Installation section.
Build project .whl:

```shell
git clone git@github.com:h2oai/h2o-sonar.git
cd h2o-sonar
make clean setup TARGET_PYTHON_VERSION=3.11
. .venv/bin/activate
make help
make diagnostics
make clean dist_src
```

The H2O Sonar .whl can be found in the dist/ directory.
Contribute from H2O.ai
Key H2O Sonar contributors:
- Munish Bhardwaj
- Predictive AI testing (Quality Assurance Engineer).
- Martin Dvorak
- Predictive AI explainers and Generative AI evaluators (Software Engineer).
- Mateusz Dymczyk
- Predictive AI methods (Software Engineer and Data Scientist).
- Tomas Fryda
- Generative AI evaluators (Data Scientist and Software Engineer).
- Navdeep Gill
- Predictive AI methods (Data Scientist and Software Engineer).
- Patrick Hall
- Predictive AI data science vision and methods (Data Scientist).
- Kim Montgomery
- Generative AI methods (Kaggle Grand Master Data Scientist).
- Erik Stoklasa
- Generative AI methods (Software Engineer/internship).
- Agus Sudjianto
- Generative AI data science vision and methods (Data Science geek who can speak).