31 changes: 0 additions & 31 deletions docs/code_reference/analysis.md

This file was deleted.

8 changes: 0 additions & 8 deletions docs/code_reference/column_configs.md

This file was deleted.

31 changes: 31 additions & 0 deletions docs/code_reference/config/analysis.md
@@ -0,0 +1,31 @@
# Analysis

Profiling result objects and report helpers returned after generation.

## Column Statistics

`DataDesigner.create()` and `DataDesigner.preview()` run the dataset profiler after generation. The profiler computes statistics for each configured column; side-effect columns are recorded separately in `DatasetProfilerResults.side_effect_column_names`.

Statistics result classes store computed metrics for each column type and format those metrics for reports.

::: data_designer.config.analysis.column_statistics

## Column Profilers

Column profilers are optional analysis tools that provide deeper insights into specific column types. Currently, the only column profiler available is the Judge Score Profiler.

Profiler result classes store computed profiler output and format it for reports.

::: data_designer.config.analysis.column_profilers

## Dataset Profiler

The [DatasetProfilerResults](#data_designer.config.analysis.dataset_profiler.DatasetProfilerResults) class stores profiling results for a generated dataset. It aggregates column-level statistics, side-effect column names, and optional profiler results, and provides methods to:

- Compute dataset-level metrics (completion percentage, column type summary)
- Filter statistics by column type
- Generate formatted analysis reports via the `to_report()` method

Reports can be displayed in the console or exported to HTML/SVG formats.
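
The column type summary and per-type filtering listed above can be illustrated on plain records. This is a minimal sketch: the `ColumnStats` dataclass, both helper functions, and the column type strings are hypothetical stand-ins and are not the `DatasetProfilerResults` API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnStats:
    # Hypothetical stand-in for a per-column statistics result.
    column_name: str
    column_type: str


def column_type_summary(stats: list[ColumnStats]) -> dict[str, int]:
    # Count how many profiled columns exist for each column type.
    return dict(Counter(s.column_type for s in stats))


def filter_by_column_type(stats: list[ColumnStats], column_type: str) -> list[ColumnStats]:
    # Keep only the statistics whose column type matches.
    return [s for s in stats if s.column_type == column_type]


stats = [
    ColumnStats("topic", "sampler"),
    ColumnStats("summary", "llm-text"),
    ColumnStats("review", "llm-text"),
]
print(column_type_summary(stats))  # {'sampler': 1, 'llm-text': 2}
print([s.column_name for s in filter_by_column_type(stats, "llm-text")])  # ['summary', 'review']
```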

::: data_designer.config.analysis.dataset_profiler
18 changes: 18 additions & 0 deletions docs/code_reference/config/column_configs.md
@@ -0,0 +1,18 @@
# Column Configurations

Column configs declare Data Designer's built-in column types. Each configuration inherits from [SingleColumnConfig](#data_designer.config.base.SingleColumnConfig), which provides shared arguments like the column `name`, whether to `drop` the column after generation, and the `column_type`.

For column generator implementation classes, see [column_generators](../engine/column_generators.md).

!!! info "`column_type` is a discriminator field"
    The `column_type` argument is used to identify column types when deserializing the [Data Designer Config](data_designer_config.md) from JSON/YAML. It acts as the discriminator in a [discriminated union](https://docs.pydantic.dev/latest/concepts/unions/#discriminated-unions), allowing Pydantic to automatically determine which column configuration class to instantiate.
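
The dispatch a discriminated union performs can be sketched in plain Python: read the `column_type` value, then build the matching class. This is not how Pydantic implements it; the two column classes and the `COLUMN_TYPES` registry are hypothetical and only mimic the idea.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SamplerColumn:
    # Hypothetical column config; not a Data Designer class.
    name: str
    column_type: str = "sampler"


@dataclass
class ExpressionColumn:
    # Hypothetical column config; not a Data Designer class.
    name: str
    expr: str
    column_type: str = "expression"


# Map each discriminator value to the class it selects.
COLUMN_TYPES = {"sampler": SamplerColumn, "expression": ExpressionColumn}


def load_column(data: dict[str, Any]):
    # The discriminator value alone decides which class is instantiated.
    try:
        cls = COLUMN_TYPES[data["column_type"]]
    except KeyError as exc:
        raise ValueError(f"unknown or missing column_type: {data.get('column_type')!r}") from exc
    return cls(**data)


col = load_column({"name": "total", "column_type": "expression", "expr": "{{ a + b }}"})
print(type(col).__name__)  # ExpressionColumn
```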

## `SingleColumnConfig` {#data_designer.config.base.SingleColumnConfig}

::: data_designer.config.base.SingleColumnConfig
    options:
        show_root_toc_entry: false

## Column configurations

::: data_designer.config.column_configs
10 changes: 10 additions & 0 deletions docs/code_reference/config/config_builder.md
@@ -0,0 +1,10 @@
# Data Designer's Config Builder

Use [DataDesignerConfigBuilder](#data_designer.config.config_builder.DataDesignerConfigBuilder) to construct [DataDesignerConfig](data_designer_config.md#data_designer.config.data_designer_config.DataDesignerConfig) objects. The builder accumulates model configs, tool configs, column configs, constraints, seed settings, processors, and profilers.

You can start from an empty builder, or load an existing config from a `dict`, a [BuilderConfig](#data_designer.config.config_builder.BuilderConfig), a local YAML/JSON file, or an HTTP(S) YAML/JSON URL via [`from_config()`](#data_designer.config.config_builder.DataDesignerConfigBuilder.from_config). Use [`build()`](#data_designer.config.config_builder.DataDesignerConfigBuilder.build) to create a [DataDesignerConfig](data_designer_config.md#data_designer.config.data_designer_config.DataDesignerConfig), or [`write_config()`](#data_designer.config.config_builder.DataDesignerConfigBuilder.write_config) to serialize the current builder config to YAML or JSON.
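
The input kinds accepted by `from_config()` could be told apart roughly as follows. `classify_config_source` is a hypothetical helper, not part of Data Designer; it omits `BuilderConfig` objects, and the real method may check its inputs differently.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_config_source(source) -> str:
    # Hypothetical helper: report which kind of config input was passed.
    if isinstance(source, dict):
        return "dict"
    text = str(source)
    if urlparse(text).scheme in ("http", "https"):
        return "url"
    suffix = Path(text).suffix.lower()
    if suffix in (".yaml", ".yml"):
        return "yaml-file"
    if suffix == ".json":
        return "json-file"
    raise ValueError(f"unsupported config source: {text!r}")


print(classify_config_source({"columns": []}))  # dict
print(classify_config_source("https://example.com/config.yaml"))  # url
print(classify_config_source("configs/my_config.yml"))  # yaml-file
```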

!!! info "Model config loading"
    [DataDesignerConfigBuilder](#data_designer.config.config_builder.DataDesignerConfigBuilder) accepts model configs as a list of [ModelConfig](models.md#data_designer.config.models.ModelConfig) objects, a YAML/JSON config path, or `None`. When `model_configs=None`, the builder loads default model configs if Data Designer can run locally; otherwise initialization raises `BuilderConfigurationError`. Model configs define the aliases referenced by model-backed columns such as [`LLMTextColumnConfig`](column_configs.md#data_designer.config.column_configs.LLMTextColumnConfig), [`LLMCodeColumnConfig`](column_configs.md#data_designer.config.column_configs.LLMCodeColumnConfig), [`LLMStructuredColumnConfig`](column_configs.md#data_designer.config.column_configs.LLMStructuredColumnConfig), [`LLMJudgeColumnConfig`](column_configs.md#data_designer.config.column_configs.LLMJudgeColumnConfig), [`EmbeddingColumnConfig`](column_configs.md#data_designer.config.column_configs.EmbeddingColumnConfig), and [`ImageColumnConfig`](column_configs.md#data_designer.config.column_configs.ImageColumnConfig).
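
Because model-backed columns refer to models by alias, a column alias with no matching model config is an easy mistake. The check below is a hypothetical sketch over plain dicts; Data Designer performs its own validation, and the `model_alias` key used here is an assumption.

```python
def find_unknown_aliases(model_aliases: set[str], columns: list[dict]) -> list[str]:
    # Hypothetical check: names of columns whose model alias is not configured.
    return [
        col["name"]
        for col in columns
        if "model_alias" in col and col["model_alias"] not in model_aliases
    ]


aliases = {"text-model", "judge-model"}
columns = [
    {"name": "topic", "column_type": "sampler"},  # not model-backed, so skipped
    {"name": "summary", "model_alias": "text-model"},
    {"name": "score", "model_alias": "judgee-model"},  # typo in the alias
]
print(find_unknown_aliases(aliases, columns))  # ['score']
```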

::: data_designer.config.config_builder
7 changes: 7 additions & 0 deletions docs/code_reference/config/data_designer_config.md
@@ -0,0 +1,7 @@
# Data Designer Configuration

[DataDesignerConfig](#data_designer.config.data_designer_config.DataDesignerConfig) is the top-level configuration object passed to Data Designer. It declares the columns to generate and may include model configs, tool configs, seed settings, sampler constraints, processors, and profiler configs.

Prefer [DataDesignerConfigBuilder](config_builder.md#data_designer.config.config_builder.DataDesignerConfigBuilder) for programmatic construction. Direct [DataDesignerConfig](#data_designer.config.data_designer_config.DataDesignerConfig) instantiation is also supported.

::: data_designer.config.data_designer_config
7 changes: 7 additions & 0 deletions docs/code_reference/config/index.md
@@ -0,0 +1,7 @@
# Config Package

The `data-designer-config` package provides `data_designer.config`, the configuration layer of Data Designer. It contains the objects used to describe dataset structure, model access, tool access, seed data, sampler parameters, validators, processors, run settings, plugin registrations, and analysis results.

This package is the base of the dependency chain. Engine and interface code consume these config objects, but config objects do not execute generation directly.

For programmatic configuration work, start with [config_builder](config_builder.md) and [data_designer_config](data_designer_config.md). Use the more specific pages for the exact constructor fields of columns, models, MCP tools, seeds, processors, samplers, validators, or profiling results.
16 changes: 16 additions & 0 deletions docs/code_reference/config/mcp.md
@@ -0,0 +1,16 @@
# MCP Configuration

MCP config objects tell Data Designer which Model Context Protocol providers exist and which tools an LLM column may use.

[MCPProvider](#data_designer.config.mcp.MCPProvider) configures remote MCP servers over SSE or Streamable HTTP transport. [LocalStdioMCPProvider](#data_designer.config.mcp.LocalStdioMCPProvider) configures local MCP servers that run as subprocesses over stdio transport. [ToolConfig](#data_designer.config.mcp.ToolConfig) specifies which tools LLM columns can use and how tool use is constrained.
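
One way to picture "which tools are available and how they are constrained" is an allowlist plus a call budget. This is a sketch under that assumption only: `ToolPolicy`, `allowed_tools`, and `max_tool_calls` are hypothetical names, and the real `ToolConfig` fields may differ.

```python
from dataclasses import dataclass


@dataclass
class ToolPolicy:
    # Hypothetical policy object; not the ToolConfig API.
    allowed_tools: frozenset[str]
    max_tool_calls: int
    calls_made: int = 0

    def authorize(self, tool_name: str) -> bool:
        # Reject tools outside the allowlist and calls beyond the budget.
        if tool_name not in self.allowed_tools:
            return False
        if self.calls_made >= self.max_tool_calls:
            return False
        self.calls_made += 1
        return True


policy = ToolPolicy(allowed_tools=frozenset({"search", "fetch"}), max_tool_calls=2)
print([policy.authorize(t) for t in ["search", "delete", "fetch", "search"]])
# [True, False, True, False]
```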

For MCP execution internals, see [Engine MCP](../engine/mcp.md). Related guides:

- **[MCP Providers](../../concepts/mcp/mcp-providers.md)** - Configure local or remote MCP providers
- **[Tool Configs](../../concepts/mcp/tool-configs.md)** - Define tool permissions and limits
- **[Enabling Tools](../../concepts/mcp/enabling-tools.md)** - Use tools in LLM columns
- **[Traces](../../concepts/traces.md)** - Capture full conversation history

## API Reference

::: data_designer.config.mcp
12 changes: 12 additions & 0 deletions docs/code_reference/config/models.md
@@ -0,0 +1,12 @@
# Models

[ModelProvider](#data_designer.config.models.ModelProvider) stores connection and authentication details for model providers. [ModelConfig](#data_designer.config.models.ModelConfig) stores a model alias, model identifier, provider settings, and inference parameters. [Inference Parameters](../../concepts/models/inference-parameters.md) control model behavior. Chat-completion parameters include `temperature`, `top_p`, and `max_tokens`; `temperature` and `top_p` can be fixed values or configured distributions. [ImageContext](#data_designer.config.models.ImageContext) provides image inputs to multimodal models, and [ImageInferenceParams](#data_designer.config.models.ImageInferenceParams) configures image generation models.
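
The difference between a fixed `temperature`/`top_p` value and a configured distribution can be sketched as follows. `UniformDistribution` and `resolve` are hypothetical names, and drawing a fresh value for every request is an assumption about how distributions are applied.

```python
import random
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class UniformDistribution:
    # Hypothetical stand-in for a configured distribution.
    low: float
    high: float

    def sample(self, rng: random.Random) -> float:
        return rng.uniform(self.low, self.high)


ParamValue = Union[float, UniformDistribution]


def resolve(value: ParamValue, rng: random.Random) -> float:
    # Fixed values pass through unchanged; distributions are sampled on each call.
    if isinstance(value, UniformDistribution):
        return value.sample(rng)
    return float(value)


rng = random.Random(0)
temperature = UniformDistribution(low=0.5, high=0.9)
top_p = 0.95
samples = [resolve(temperature, rng) for _ in range(3)]
print(all(0.5 <= t <= 0.9 for t in samples), resolve(top_p, rng))  # True 0.95
```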

Related guides:

- **[Model Providers](../../concepts/models/model-providers.md)**
- **[Model Configs](../../concepts/models/model-configs.md)**
- **[Image Context](../../notebooks/4-providing-images-as-context.ipynb)**
- **[Generating Images](../../notebooks/5-generating-images.ipynb)**

::: data_designer.config.models
17 changes: 17 additions & 0 deletions docs/code_reference/config/plugins.md
@@ -0,0 +1,17 @@
# Plugins

Plugin packages register [Plugin](#data_designer.plugins.plugin.Plugin) objects through entry points in the `data_designer.plugins` group. A plugin registration ties a config class to its implementation class and declares its [PluginType](#data_designer.plugins.plugin.PluginType).

Related pages: [Build Your Own](../../plugins/build_your_own.md), [Column Generators](../engine/column_generators.md), [Seed Readers](../engine/seed_readers.md), [Engine Processors](../engine/processors.md), and [Processor Configurations](processors.md).

## `Plugin` {#data_designer.plugins.plugin.Plugin}

::: data_designer.plugins.plugin.Plugin
    options:
        show_root_toc_entry: false

## `PluginType` {#data_designer.plugins.plugin.PluginType}

::: data_designer.plugins.plugin.PluginType
    options:
        show_root_toc_entry: false
7 changes: 7 additions & 0 deletions docs/code_reference/config/processors.md
@@ -0,0 +1,7 @@
# Processor Configurations

Processor configs request data transformations after generation. Add them to a `DataDesignerConfig` or `DataDesignerConfigBuilder`; the engine later compiles them into runtime processor implementations.
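
The split between declarative processor configs and the runtime code that executes them can be sketched with plain rows. Both config classes, `compile_processor`, and `run_processors` are hypothetical; they are not Data Designer's processor configs or engine, and applying processors in configuration order is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class DropColumnsConfig:
    # Hypothetical declarative config: columns to remove.
    column_names: tuple[str, ...]


@dataclass(frozen=True)
class RenameColumnConfig:
    # Hypothetical declarative config: rename one column.
    old: str
    new: str


def compile_processor(config) -> Callable[[list[Row]], list[Row]]:
    # Turn a declarative config into a runtime function over rows.
    if isinstance(config, DropColumnsConfig):
        return lambda rows: [
            {k: v for k, v in row.items() if k not in config.column_names} for row in rows
        ]
    if isinstance(config, RenameColumnConfig):
        return lambda rows: [
            {(config.new if k == config.old else k): v for k, v in row.items()} for row in rows
        ]
    raise TypeError(f"no processor for {type(config).__name__}")


def run_processors(configs, rows: list[Row]) -> list[Row]:
    # Apply the compiled processors in the order they were configured.
    for config in configs:
        rows = compile_processor(config)(rows)
    return rows


rows = [{"id": 1, "draft": "x", "answer": "y"}]
configs = [DropColumnsConfig(("draft",)), RenameColumnConfig("answer", "response")]
print(run_processors(configs, rows))  # [{'id': 1, 'response': 'y'}]
```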

Related pages: [engine processors](../engine/processors.md) and [Build Your Own](../../plugins/build_your_own.md).

::: data_designer.config.processors
@@ -1,14 +1,14 @@
# Run Config

The `run_config` module defines runtime settings that control dataset generation behavior,
including early shutdown thresholds, batch sizing, non-inference worker concurrency,
and the Jinja rendering engine used by the runtime.
`RunConfig` controls dataset generation behavior, including early shutdown thresholds,
batch sizing, non-inference worker concurrency, and the Jinja rendering engine used by
the runtime.

`JinjaRenderingEngine.SECURE` is the default. Set `JinjaRenderingEngine.NATIVE`
when you want Jinja2's broader built-in sandbox behavior instead of Data Designer's
hardened renderer.

For guidance on when to use each mode, see [Security](../concepts/security.md).
For guidance on when to use each mode, see [Security](../../concepts/security.md).

## Usage

@@ -1,6 +1,6 @@
# Sampler Parameters

The `sampler_params` module defines parameter configuration objects for all Data Designer sampler types. Sampler parameters are used within the [SamplerColumnConfig](column_configs.md#data_designer.config.column_configs.SamplerColumnConfig) to specify how values should be generated for sampled columns.
Sampler parameter classes configure Data Designer's built-in samplers. Use them in [SamplerColumnConfig](column_configs.md#data_designer.config.column_configs.SamplerColumnConfig) to specify how sampled column values are generated.
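
As an illustration of a parameter object driving how sampled values are generated, here is a sketch of a weighted categorical draw. `CategoryParams` and `sample_column` are hypothetical names, not Data Designer's sampler parameter classes.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CategoryParams:
    # Hypothetical parameter object: candidate values and optional weights.
    values: tuple[str, ...]
    weights: Optional[tuple[float, ...]] = None


def sample_column(params: CategoryParams, num_records: int, seed: int) -> list[str]:
    # Draw num_records values with replacement, honoring weights when given.
    rng = random.Random(seed)
    return rng.choices(params.values, weights=params.weights, k=num_records)


params = CategoryParams(values=("easy", "medium", "hard"), weights=(0.5, 0.3, 0.2))
sampled = sample_column(params, num_records=5, seed=42)
print(len(sampled), set(sampled) <= set(params.values))  # 5 True
```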

!!! tip "Displaying available samplers and their parameters"
The config builder has an `info` attribute that can be used to display the
19 changes: 19 additions & 0 deletions docs/code_reference/config/seeds.md
@@ -0,0 +1,19 @@
# Seeds

Seed configs declare existing data used as input during generation. A [SeedConfig](#data_designer.config.seed.SeedConfig) combines a seed source with optional row sampling and selection settings. Seed source objects declare where seed data comes from; the engine reads them through seed readers.

Use these objects with `DataDesignerConfigBuilder.with_seed_dataset()`. Related pages: [Seed Datasets](../../concepts/seed-datasets.md) and [seed readers](../engine/seed_readers.md).

Built-in seed sources include local files, Hugging Face paths, in-memory DataFrames, directories, file contents, and agent rollout traces. Plugin seed sources can extend the same discriminated union through the plugin system.
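
The "optional row sampling and selection settings" of a seed config can be pictured as: restrict the seed rows to a range, then draw rows in order or shuffled. `draw_seed_rows`, its parameters, and the strategy names are hypothetical; cycling through the selection when more records are requested than it holds is also an assumption, not documented behavior.

```python
import random
from typing import Optional


def draw_seed_rows(
    rows: list[dict],
    num_records: int,
    strategy: str = "ordered",
    start: int = 0,
    end: Optional[int] = None,
    seed: Optional[int] = None,
) -> list[dict]:
    # Hypothetical sketch: restrict to rows[start:end], then draw num_records rows.
    selected = rows[start:end]
    if not selected:
        raise ValueError("selection is empty")
    if strategy == "shuffle":
        selected = selected[:]  # copy so the caller's list is not reordered
        random.Random(seed).shuffle(selected)
    elif strategy != "ordered":
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Cycle through the selection if more records are requested than available.
    return [selected[i % len(selected)] for i in range(num_records)]


seed_rows = [{"city": c} for c in ["Oslo", "Lima", "Pune", "Kyiv"]]
print(draw_seed_rows(seed_rows, num_records=3, start=1))
# [{'city': 'Lima'}, {'city': 'Pune'}, {'city': 'Kyiv'}]
```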

## Seed Config

::: data_designer.config.seed

## Built-In Seed Sources

::: data_designer.config.seed_source

## DataFrame Seed Source

::: data_designer.config.seed_source_dataframe
6 changes: 6 additions & 0 deletions docs/code_reference/config/validator_params.md
@@ -0,0 +1,6 @@
# Validator Parameters

`ValidationColumnConfig` selects a validator with `validator_type` and configures it with `validator_params`.
The `validator_type` field can be `code`, `local_callable`, or `remote`. The matching `validator_params` objects are:

::: data_designer.config.validator_params
10 changes: 0 additions & 10 deletions docs/code_reference/config_builder.md

This file was deleted.

7 changes: 0 additions & 7 deletions docs/code_reference/data_designer_config.md

This file was deleted.
