
H2O Sonar


Explainability Toolbox for Predictive and Generative AI

18 predictive AI explainers.
44 generative AI evaluators.
1,000,000+ curated eval prompts.
Automatic best model selection.
Used by H2O Eval Studio and H2O Driverless AI.

interpretation report


H2O Sonar is a Python library for AI model risk management (MRM) across predictive and generative systems. It provides explainers and evaluators that validate models, detect bias, assess fairness and privacy, and generate audit documentation. Built for regulated industries, H2O Sonar enables risk, compliance, and validation teams to quantify model risk, meet regulatory requirements, and maintain robust governance throughout the model lifecycle.

H2O Sonar is used by H2O.ai products including H2O Eval Studio and H2O Driverless AI.

Predictive AI

H2O Driverless AI, H2O-3, and scikit-learn predictive models can be explained by H2O Sonar.

H2O Sonar explanation report examples:


Explainers

Approximate model behavior:

Feature importance:

Feature behavior:

Fairness:

Model debugging:

Model validity testing:

Supported environments & Python version(s):

OS / Python       Python 3.11
Linux x86 64b     Driverless AI MOJO, Driverless AI REST, H2O-3, scikit-learn

Getting Started with Predictive Models

Explain your predictive model by running an interpretation from Python or Jupyter Notebook:

# dataset (dataset_path, target_column, and results_path are defined by you)

import pandas

dataset = pandas.read_csv(dataset_path)
(X, y) = dataset.drop(target_column, axis=1), dataset[target_column]

# model

from sklearn import ensemble

model = ensemble.GradientBoostingClassifier(learning_rate=0.1)
model.fit(X, y)

# interpretation

from h2o_sonar import interpret

interpretation = interpret.run_interpretation(
    dataset=dataset_path,
    model=model,
    used_features=list(X.columns),
    target_col=target_column,
    results_location=results_path,
)

# interpretation result

print(interpretation)

# get explanation created by the first explainer of the interpretation
explanation = interpretation.get_explainer_result(
    interpretation.get_finished_explainer_ids()[0]
)

# show explanation summary
print(explanation.summary())
# show explanation data
print(explanation.data(feature_name="EDUCATION", category="disparity"))
# get explanation plot
explanation.plot(feature_name="EDUCATION")
# show explainer log
print(explanation.log(path=results_path))
# store all explanation artifacts as ZIP archive
explanation.zip(file_path=archive_path)

Alternatively, you can run the interpretation from the command-line interface; check the help:

h2o-sonar --help

Explain your model:

h2o-sonar run interpretation \
  --dataset dataset.csv \
  --model model.mojo \
  --target-col SATISFACTION

Check out the interpretation report and explanations:

interpretation report

Bring Your Own Explainer

The set of techniques and methods provided by H2O Sonar can be extended with custom explainers, as H2O Sonar supports BYOE recipes: the ability to Bring Your Own Explainer. A BYOE recipe is a Python code snippet. With a BYOE recipe, you can use your own explainers alongside or instead of the H2O Sonar built-in explainers.
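The kind of computation a custom explainer recipe typically wraps can be illustrated with a self-contained permutation-importance sketch. The class and method names below (`MyPermutationExplainer`, `explain`) are hypothetical; the actual BYOE base classes and registration API are described in the H2O Sonar documentation.

```python
# Hypothetical, self-contained sketch of the kind of computation a BYOE
# recipe might wrap: permutation feature importance for a fitted model.
# The class/method names are illustrative, not the H2O Sonar BYOE API.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score


class MyPermutationExplainer:  # hypothetical name
    def __init__(self, model, n_repeats=5, random_state=0):
        self.model = model
        self.n_repeats = n_repeats
        self.rng = np.random.default_rng(random_state)

    def explain(self, X, y):
        """Per-feature importance: mean accuracy drop when the feature is shuffled."""
        baseline = accuracy_score(y, self.model.predict(X))
        importances = []
        for col in range(X.shape[1]):
            drops = []
            for _ in range(self.n_repeats):
                X_perm = X.copy()
                self.rng.shuffle(X_perm[:, col])  # destroy this feature's signal
                drops.append(baseline - accuracy_score(y, self.model.predict(X_perm)))
            importances.append(float(np.mean(drops)))
        return importances


X, y = make_classification(n_samples=300, n_features=4, n_informative=2, random_state=0)
model = GradientBoostingClassifier().fit(X, y)
scores = MyPermutationExplainer(model).explain(X, y)
print(scores)  # one importance value per feature
```

A real BYOE recipe would package logic like this behind the explainer interface so its output lands in the interpretation report next to the built-in explainers.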


Open-source recipe examples, which are also used in the documentation to demonstrate the H2O Sonar explainer API, can be found in:

  • the examples/predictive/byoe/examples directory of the H2O Sonar distribution


Open-source recipe templates, which let you quickly create new explainers by choosing the desired explainer/explanation type (such as feature importance, decision tree, or partial dependence) and replacing the mock data with a real calculation, can be found in:

  • the examples/predictive/byoe/templates directory of the H2O Sonar distribution

See the documentation for more details.

Generative AI

H2O Sonar supports RAG and LLM hosts including h2oGPTe, h2oGPT, H2O LLMOps/MLOps, OpenAI, Microsoft Azure OpenAI, Anthropic Claude, Amazon Bedrock, and Ollama.

H2O Sonar evaluation report examples:


Evaluators

Agent:

Generation:

Retrieval:

Privacy:

Fairness:

Summarization:

Classification:

Evals Library

H2O Sonar provides a comprehensive library featuring 1,000,000+ curated prompts specifically designed for LLM, RAG, and AI Agent evaluation.

The library includes ready-to-use versions of trusted industry benchmarks such as:

  • MMLU (Massive Multitask Language Understanding)
  • ARC (AI2 Reasoning Challenge)
  • CUAD (Contract Understanding Atticus Dataset)
  • HellaSwag (Common Sense Reasoning)
  • GSM8K (Grade School Math 8K)

The library's 700+ test suites cover key domains including Question Answering, Privacy, Fairness, Security, Summarization, and Classification.

  • Standardized format:
    • All data is provided in a normalized H2O Sonar JSON format.
  • Flexible workflows:
    • Test suites can be combined, sampled, perturbed, and customized to meet your specific evaluation requirements.
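The combine/sample/perturb workflow can be sketched with plain Python over the normalized JSON format. The minimal layout below (a top-level `prompts` list with `prompt`/`expected` fields) is a hypothetical stand-in; the actual H2O Sonar schema is documented with the Evals Library.

```python
# Sketch of combining, sampling, and perturbing test suites.
# The JSON layout here is hypothetical, not the actual H2O Sonar schema.
import json
import random

suite_a = json.loads(
    '{"prompts": [{"prompt": "What is 2+2?", "expected": "4"},'
    ' {"prompt": "Capital of France?", "expected": "Paris"}]}'
)
suite_b = json.loads('{"prompts": [{"prompt": "Largest planet?", "expected": "Jupiter"}]}')

# combine two suites into one prompt pool
combined = suite_a["prompts"] + suite_b["prompts"]

# sample a subset for a cheaper evaluation run
random.seed(0)
sample = random.sample(combined, k=2)

# perturb prompts (here: simple lowercasing as a toy robustness perturbation)
perturbed = [{**p, "prompt": p["prompt"].lower()} for p in sample]
print(len(combined), len(perturbed))  # 3 2
```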

Host Types

H2O Sonar can evaluate standalone LLMs and LLMs used by RAG systems hosted by the following products and services:

RAG:

LLM:

Getting Started with Generative Models

Evaluate your generative model(s) by running an evaluation from Python or a Jupyter notebook:

# LLM models to be evaluated
# (assumes the h2o_sonar modules used below - h2o_sonar_config, genai,
# testing, test_utils, models, evaluate, and the evaluators - have been
# imported from the h2o_sonar package; see the documentation for exact paths)

model_host = h2o_sonar_config.ConnectionConfig(
    connection_type=h2o_sonar_config.ConnectionConfigType.H2O_GPT_E.name,
    name="H2O GPT Enterprise",
    description="H2O GPT Enterprise model host.",
    server_url="https://h2ogpte.h2o.ai/",
    token="YOUR_API_TOKEN_HERE",
    token_use_type=h2o_sonar_config.TokenUseType.API_KEY.name,
)
llm_models = genai.H2oGpteRagClient(model_host).list_llm_model_names()

# evaluation dataset

# test suite: RAG corpus, prompts, expected answers
rag_test_suite = testing.RagTestSuiteConfig.load_from_json(
    test_utils.find_locally("data/generative/demo_doc_test_suite.json")
)
# test lab: resolved test suite w/ actual values from the LLM models host
test_lab = testing.RagTestLab.from_rag_test_suite(
    rag_connection=model_host,
    rag_test_suite=rag_test_suite,
    rag_model_type=models.ExplainableModelType.h2ogpte,
    llm_model_names=llm_models,
    docs_cache_dir=tmp_path,
)
# deploy the test lab: upload corpus and create RAG collections/knowledge bases
test_lab.build()
# complete the test lab: actual values - answers, duration, cost, ...
test_lab.complete_dataset()

# EVALUATION

evaluation = evaluate.run_evaluation(
    # test lab as the evaluation dataset (prompts, expected and actual answers)
    dataset=test_lab.dataset,
    # models to be evaluated ~ compared in the evaluation leaderboard
    models=test_lab.evaluated_models.values(),
    # evaluators
    evaluators=[
        rag_hallucination_evaluator.RagHallucinationEvaluator().evaluator_id()
    ],
    # where to save the report
    results_location=tmp_path,
)

# HTML report and the evaluation data (JSON, CSV, data frames, ...)

print(f"HTML report: file://{evaluation.result.get_html_report_location()}")

Check out the evaluation report:

evaluation report

Auto Best Model Selection

comparator

The H2O Sonar evaluations comparator is a decision-support tool that streamlines LLM, RAG, and Agent selection and automates best-model selection. It allows you to move beyond raw numbers by providing side-by-side analysis and automated model recommendations based on your specific evaluation data.

The tool performs intelligent cross-model comparison by identifying "comparable models" via intersection of evaluation data, ensuring your benchmarks are sound:

  • Prompt Alignment: Matches models that share the same questions / prompts.
  • Metric Consistency: Identifies common metric scores to ensure an "apples-to-apples" comparison.
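The intersection idea behind "comparable models" can be sketched in a few lines: only prompts and metrics shared by every candidate enter the comparison. The data below is illustrative, not the comparator's actual data model.

```python
# Sketch of identifying "comparable models": intersect the prompts and
# metrics each model was evaluated on. Data is illustrative.
results = {
    "model-a": {"prompts": {"p1", "p2", "p3"}, "metrics": {"accuracy", "faithfulness"}},
    "model-b": {"prompts": {"p2", "p3", "p4"}, "metrics": {"accuracy"}},
}

# only shared prompts/metrics allow an "apples-to-apples" comparison
shared_prompts = set.intersection(*(r["prompts"] for r in results.values()))
shared_metrics = set.intersection(*(r["metrics"] for r in results.values()))
print(sorted(shared_prompts), sorted(shared_metrics))  # ['p2', 'p3'] ['accuracy']
```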

The evaluations comparator performs automated best-model selection by applying multi-objective optimization to:

  • Rank Performance: Automatically suggest the "best model" based on weighted priority of your chosen metrics.
  • Identify Strengths: Pinpoint which model excels at retrieval (RAG) vs. reasoning (agents).
  • Detect Regressions: Compare new model versions against your established baselines to prevent quality drift.
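The weighted-priority ranking can be sketched as a scalarization over per-metric scores. The metric names and weights below are illustrative assumptions, not the comparator's built-in defaults.

```python
# Sketch of weighted multi-objective ranking: each model's per-metric
# scores are collapsed into one weighted score, then sorted.
scores = {
    "model-a": {"faithfulness": 0.90, "answer_relevance": 0.70},
    "model-b": {"faithfulness": 0.80, "answer_relevance": 0.95},
}
weights = {"faithfulness": 0.7, "answer_relevance": 0.3}  # your priorities

def weighted_score(metric_scores):
    return sum(weights[m] * s for m, s in metric_scores.items())

ranking = sorted(scores, key=lambda m: weighted_score(scores[m]), reverse=True)
print(ranking)  # ['model-b', 'model-a']
```

Changing the weights changes the winner; with faithfulness weighted at 0.8 instead of 0.7, model-a would come out on top.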

The evaluations comparator also produces exportable insight reports that transform complex evaluation data into stakeholder-ready assets. The tool generates comprehensive reports in two standard formats:

  • HTML
    • Leaderboards, color-coded heatmaps, and detailed per-test case visualizations.
  • JSON
    • Machine-readable data structure for CI/CD pipelines, custom dashboards, and archival.
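A typical CI/CD use of the JSON output is a regression gate that fails the pipeline when a candidate model falls behind the baseline. The report keys below are hypothetical; adapt them to the actual comparator JSON schema.

```python
# Sketch of a CI regression gate over comparator JSON output.
# The report structure here is hypothetical.
import json

baseline_report = json.loads('{"model": "rag-v1", "accuracy": 0.91}')
candidate_report = json.loads('{"model": "rag-v2", "accuracy": 0.88}')

TOLERANCE = 0.02  # allowed drop before the gate fails
regressed = candidate_report["accuracy"] < baseline_report["accuracy"] - TOLERANCE
print("regression detected" if regressed else "ok")  # regression detected
```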

Bring Your Own Evaluator

The set of techniques and methods provided by H2O Sonar for generative AI model evaluation can be extended with custom evaluators, as H2O Sonar supports BYOE recipes: the ability to Bring Your Own Evaluator. A BYOE recipe is a Python code snippet. With a BYOE recipe, you can use your own evaluators alongside or instead of the H2O Sonar built-in evaluators.

Installation

Prepare prerequisites:

  • Operating system: Linux
  • Python 3.11
  • Pip 25.0+
  • CUDA-compatible GPU, NVIDIA drivers (optional - speed up generative evaluations)
  • Java 1.7+ (optional - needed for predictive H2O-3 backend only)
  • Graphviz (optional - needed for predictive visualizations only)

GPU acceleration (optional):

  • GPU support accelerates certain evaluators such as BERTScore, GPTScore, or Perplexity.
  • The CUDA runtime is provided via the PyTorch/ONNX dependencies, which are installed automatically with the [evaluators] extras.
  • Configure via the environment variable H2O_SONAR_CFG_DEVICE="gpu" (default: auto-detect).
  • Supported on Linux (x86) only.
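Since the device setting is an environment variable, it can be exported in the shell or set from Python, typically before H2O Sonar is imported so it is picked up at startup:

```python
# Select the device via the documented environment variable.
import os

os.environ["H2O_SONAR_CFG_DEVICE"] = "gpu"  # or "cpu"; default: auto-detect
print(os.environ["H2O_SONAR_CFG_DEVICE"])  # gpu
```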

Download the distribution or Python wheels, then install the wheel with only the core dependencies for your platform:

  • Download the appropriate wheel file for your platform from the Releases page.
  • Install using: pip install h2o_sonar-<version>.whl

Package extras:

  • Install H2O Sonar with all dependencies:
    • pip install h2o_sonar-<version>.whl[explainers,evaluators]
  • Install H2O Sonar with the predictive model explainers dependencies:
    • pip install h2o_sonar-<version>.whl[explainers]
  • Install H2O Sonar with the generative model evaluators dependencies:
    • pip install h2o_sonar-<version>.whl[evaluators]
  • Install the H2O Sonar generative AI clients package only:
    • pip install h2o_sonar-<version>.whl[genaiclient]
  • Install the H2O Sonar core package only:
    • pip install h2o_sonar-<version>.whl

Troubleshooting:

  • If an H2O Sonar dependency fails to install, you may need to upgrade pip using python -m pip install --upgrade pip or curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11.

Documentation

H2O Sonar resources:

Feature Requests and Bugs

https://github.com/h2oai/h2o-sonar/issues/new/choose

Contribute

Do not hesitate to contribute - join us in evolving H2O Sonar and helping the AI/ML community thrive!

Prerequisites:

Build project .whl:

git clone git@github.com:h2oai/h2o-sonar.git
cd h2o-sonar
make clean setup TARGET_PYTHON_VERSION=3.11
. .venv/bin/activate
make help
make diagnostics
make clean dist_src

The H2O Sonar .whl can be found in the dist/ directory.

Contribute from H2O.ai

Credits

Key H2O Sonar contributors:

  • Munish Bhardwaj
    • Predictive AI testing (Quality Assurance Engineer).
  • Martin Dvorak
    • Predictive AI explainers and Generative AI evaluators (Software Engineer).
  • Mateusz Dymczyk
    • Predictive AI methods (Software Engineer and Data Scientist).
  • Tomas Fryda
    • Generative AI evaluators (Data Scientist and Software Engineer).
  • Navdeep Gill
    • Predictive AI methods (Data Scientist and Software Engineer).
  • Patrick Hall
    • Predictive AI data science vision and methods (Data Scientist).
  • Kim Montgomery
    • Generative AI methods (Kaggle Grand Master Data Scientist).
  • Erik Stoklasa
    • Generative AI methods (Software Engineer/internship).
  • Agus Sudjianto
    • Generative AI data science vision and methods (Data Science geek who can speak).

About

A Toolbox for Responsible Predictive and Generative AI.
