From 980f38d3ca10873354f82e0efc6d717e8fd998d9 Mon Sep 17 00:00:00 2001
From: Laura Summers
Date: Tue, 21 Apr 2026 17:45:37 +0200
Subject: [PATCH 1/2] Add concept overview for Hosted Datasets

Add a "What is a Hosted Dataset?" section with ASCII art diagrams
showing the data model structure and relationships to experiments,
tasks, and evaluators, plus a short pydantic-evals code example.
Improves searchability by including the phrase "hosted dataset"
multiple times.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/evaluate/datasets/index.md | 70 +++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)

diff --git a/docs/evaluate/datasets/index.md b/docs/evaluate/datasets/index.md
index a65441a3d..346611e3e 100644
--- a/docs/evaluate/datasets/index.md
+++ b/docs/evaluate/datasets/index.md
@@ -22,6 +22,76 @@ A dataset can be both: if you create a hosted dataset with the same name as one
 
 You can filter between these types using the **Hosted** and **Local** tabs at the top of the datasets list.
 
+## What is a Hosted Dataset?
+
+A hosted dataset is a collection of test cases stored on the Logfire server. Each row of a hosted dataset is one **case** with inputs, an expected output, and optional metadata. You can also define a **schema** for the whole hosted dataset that constrains each case, ensuring every case has the correct structure.
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│                            Hosted Dataset                           │
+│                                                                     │
+│  ┌──────────────────────────────────┐  ┌─────────────────────────┐  │
+│  │ Case #1                          │  │ Schema (Optional)       │  │
+│  │   Input                          │  │   Input                 │  │
+│  │   Expected Output                │  │   Expected Output       │  │
+│  │   Metadata                       │  │   Metadata              │  │
+│  └──────────────────────────────────┘  │                         │  │
+│  ┌──────────────────────────────────┐  │                         │  │
+│  │ Case #2                          │  │                         │  │
+│  └──────────────────────────────────┘  │                         │  │
+│  ┌──────────────────────────────────┐  │                         │  │
+│  │ Case #3                          │  │                         │  │
+│  └──────────────────────────────────┘  └─────────────────────────┘  │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+Hosted datasets integrate into the broader [pydantic-evals](https://ai.pydantic.dev/evals/) data model:
+
+```
+Hosted Dataset (1) ─────────── (Many) Case
+     │                             │
+     │                             │
+     └── (Many) Experiment ─── (Many) Case results
+         │
+         ├── (1) Task
+         │
+         └── (Many) Evaluator
+```
+
+A single hosted dataset contains many cases. Over time, you run multiple experiments against the same hosted dataset; each experiment executes every case against a task and scores the results with evaluators.
+
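+To make this concrete, here is a minimal sketch of the same structure as a local [pydantic-evals](https://ai.pydantic.dev/evals/) dataset; the task function and evaluator are illustrative stand-ins, not part of the hosted dataset itself:
+
+```python
+from pydantic_evals import Case, Dataset
+from pydantic_evals.evaluators import EqualsExpected
+
+# Each Case mirrors one row of a hosted dataset: inputs,
+# an expected output, and optional metadata.
+dataset = Dataset(
+    cases=[
+        Case(
+            name='capital-question',
+            inputs='What is the capital of France?',
+            expected_output='Paris',
+            metadata={'difficulty': 'easy'},
+        ),
+    ],
+    evaluators=[EqualsExpected()],
+)
+
+
+async def answer(question: str) -> str:
+    # A trivial stand-in for the system under test.
+    return 'Paris'
+
+
+# Running every case against the task and scoring the results with
+# the evaluators is one experiment.
+report = dataset.evaluate_sync(answer)
+report.print()
+```
+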
 ## Why Datasets?
 
 When evaluating AI systems, you need test cases that reflect real-world usage. Datasets solve several problems:

From 899d1a03cb93c38c48757292076032aa4717204b Mon Sep 17 00:00:00 2001
From: Laura Summers
Date: Tue, 21 Apr 2026 18:02:05 +0200
Subject: [PATCH 2/2] Add "How Cases Get Into a Hosted Dataset" section and
 fix diagram

Add an overview of the ways to populate hosted datasets (Live View,
UI, SDK) with links to detailed guides, plus a sketch of the local
SDK workflow. Also switch the ASCII art diagram to plain characters
for cleaner rendering.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 docs/evaluate/datasets/index.md | 69 ++++++++++++++++++++++++---------
 1 file changed, 53 insertions(+), 16 deletions(-)

diff --git a/docs/evaluate/datasets/index.md b/docs/evaluate/datasets/index.md
index 346611e3e..e8b604577 100644
--- a/docs/evaluate/datasets/index.md
+++ b/docs/evaluate/datasets/index.md
@@ -27,22 +27,22 @@ You can filter between these types using the **Hosted** and **Local** tabs at th
 A hosted dataset is a collection of test cases stored on the Logfire server. Each row of a hosted dataset is one **case** with inputs, an expected output, and optional metadata. You can also define a **schema** for the whole hosted dataset that constrains each case, ensuring every case has the correct structure.
 
 ```
-┌─────────────────────────────────────────────────────────────────────┐
-│                            Hosted Dataset                           │
-│                                                                     │
-│  ┌──────────────────────────────────┐  ┌─────────────────────────┐  │
-│  │ Case #1                          │  │ Schema (Optional)       │  │
-│  │   Input                          │  │   Input                 │  │
-│  │   Expected Output                │  │   Expected Output       │  │
-│  │   Metadata                       │  │   Metadata              │  │
-│  └──────────────────────────────────┘  │                         │  │
-│  ┌──────────────────────────────────┐  │                         │  │
-│  │ Case #2                          │  │                         │  │
-│  └──────────────────────────────────┘  │                         │  │
-│  ┌──────────────────────────────────┐  │                         │  │
-│  │ Case #3                          │  │                         │  │
-│  └──────────────────────────────────┘  └─────────────────────────┘  │
-└─────────────────────────────────────────────────────────────────────┘
++---------------------------------------------------------------------+
+|                            Hosted Dataset                           |
+|                                                                     |
+|  +----------------------------------+  +-------------------------+  |
+|  | Case #1                          |  | Schema (Optional)       |  |
+|  |   Input                          |  |   Input                 |  |
+|  |   Expected Output                |  |   Expected Output       |  |
+|  |   Metadata                       |  |   Metadata              |  |
+|  +----------------------------------+  |                         |  |
+|  +----------------------------------+  |                         |  |
+|  | Case #2                          |  |                         |  |
+|  +----------------------------------+  |                         |  |
+|  +----------------------------------+  |                         |  |
+|  | Case #3                          |  |                         |  |
+|  +----------------------------------+  +-------------------------+  |
++---------------------------------------------------------------------+
 ```
 
 Hosted datasets integrate into the broader [pydantic-evals](https://ai.pydantic.dev/evals/) data model:
@@ -92,6 +92,43 @@ report = dataset.evaluate_sync(answer)
 report.print()
 ```
 
+## How Cases Get Into a Hosted Dataset
+
+There are several ways to populate a hosted dataset with cases:
+
+- **From Live View**: Find an interesting trace or span in production and save it as a single case. Pick an existing hosted dataset or create a new one, review the extracted inputs and outputs, then add it. This is the easiest way to turn real-world usage into test cases. See [Adding Cases from Traces](ui.md#adding-cases-from-traces) for a walkthrough.
+- **Manually in the UI**: Add cases one by one through the dataset's Cases tab. This is useful when you want to hand-craft specific edge cases. See [Managing Cases](ui.md#managing-cases) for details.
+- **Via the SDK**: Create cases programmatically in Python, either by pushing a full local `pydantic-evals` dataset or by adding individual cases. See the [SDK Guide](sdk.md) for details.
+
+Adding from Live View usually creates one new case from one span; importing via the SDK can be done in bulk, as the sketch below shows.
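+
+As a sketch of the bulk path, the example below builds a local `pydantic-evals` dataset and saves it to YAML; the push to the Logfire server itself is left to the [SDK Guide](sdk.md), which documents the actual entry point:
+
+```python
+from pydantic_evals import Case, Dataset
+
+# Build a local pydantic-evals dataset; each Case corresponds to
+# one row of a hosted dataset.
+dataset = Dataset(
+    cases=[
+        Case(
+            name='refund-request',
+            inputs='I was charged twice, can I get a refund?',
+            expected_output='refund',
+        ),
+        Case(
+            name='greeting',
+            inputs='Hi there!',
+            expected_output='greeting',
+        ),
+    ],
+)
+
+# Serialize locally; by default this also writes a JSON schema file
+# next to the YAML, mirroring the hosted dataset's schema.
+dataset.to_file('support_triage.yaml')
+```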
+
 ## Why Datasets?
 
 When evaluating AI systems, you need test cases that reflect real-world usage. Datasets solve several problems: