Releases · NVIDIA-NeMo/DataDesigner

08 Jan 22:53

johnnygreco

v0.3.1

52aecfa

v0.3.1 2026-01-08

What's Changed

fix: Stray validate calls in notebooks by @mikeknep in #192
fix: exclude df from seed source serialization by @johnnygreco in #193

Full Changelog: v0.3.0...v0.3.1

Contributors

mikeknep and johnnygreco

08 Jan 21:15

johnnygreco

v0.3.0

de417c8

v0.3.0 2026-01-08

🎨 NeMo Data Designer v0.3.0 Release Notes

DataDesigner v0.3.0 introduces three breaking changes, covering config validation, seed datasets, and plugins. Each is highlighted below.

💥 Breaking Change: config validation

The Data Designer config validation method .validate has been moved from the config builder to the DataDesigner object.

Before (v0.2.x):

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

# ... build your config ...

# validate config
config_builder.validate()

After (v0.3.x):

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

# ... build your config ...

# validate config
data_designer.validate(config_builder)
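
For code that has to run against both v0.2.x and v0.3.x during a migration, a small helper can pick whichever call is available. This is only a sketch: `validate_config` is a hypothetical name, not part of Data Designer, and it assumes the v0.2.x DataDesigner object has no `validate` attribute, which these notes do not state.

```python
from typing import Any


def validate_config(data_designer: Any, config_builder: Any) -> Any:
    """Validate a config with whichever API the installed version exposes.

    Hypothetical helper, not part of Data Designer.
    """
    new_style = getattr(data_designer, "validate", None)
    if callable(new_style):
        # v0.3.x: validation lives on the DataDesigner object.
        return new_style(config_builder)
    # v0.2.x: validation lives on the config builder.
    return config_builder.validate()
```

The helper passes through whatever the underlying call returns, so it makes no assumption about the return value of either method.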

💥 Breaking Change: seed datasets

Working with seed datasets has been simplified with the introduction of SeedSource objects, which are passed directly to config_builder.with_seed_dataset. This removes the step of making a seed reference with datastore settings (when needed).

Before (v0.2.x):

Seed from a local file:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

seed_dataset_reference = data_designer.make_seed_reference_from_file("my_seed_dataset.parquet")
config_builder.with_seed_dataset(seed_dataset_reference)

Seed from a DataFrame:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder

# define dataframe `df`

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

# the dataframe must be written to file in v0.2.x
seed_dataset_reference = data_designer.make_seed_reference_from_dataframe(df, "my_seed_dataset.parquet")

config_builder.with_seed_dataset(seed_dataset_reference)

After (v0.3.x):

Seed from a local file:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder, LocalFileSeedSource

config_builder = DataDesignerConfigBuilder()
config_builder.with_seed_dataset(LocalFileSeedSource(path="my_seed_dataset.parquet"))

Seed from a DataFrame:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder, DataFrameSeedSource

# define dataframe `df`

config_builder = DataDesignerConfigBuilder()

# no need to specify a file, as the dataframe will be sampled directly in memory
config_builder.with_seed_dataset(DataFrameSeedSource(df=df))

Seed from Hugging Face Hub:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder, HuggingFaceSeedSource

config_builder = DataDesignerConfigBuilder()
config_builder.with_seed_dataset(HuggingFaceSeedSource(path="datasets/my-username/my-dataset/data/*.parquet"))

💥 Breaking Change: plugins

When defining plugins, there are two important updates:

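- The "task" naming has been renamed to "impl" (e.g., the task_cls argument is replaced by impl_qualified_name).
- The arguments of the Plugin object are now given as fully-qualified object names (e.g., "my_plugin.module.PluginObject") rather than the actual objects.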

Before (v0.2.x):

from my_plugin.multiple_column_generator import IndexMultiplierColumnGenerator, IndexMultiplierColumnConfig
from data_designer.plugins import Plugin, PluginType

plugin = Plugin(
    task_cls=IndexMultiplierColumnGenerator,
    config_cls=IndexMultiplierColumnConfig,
    plugin_type=PluginType.COLUMN_GENERATOR,
    emoji="🔌",
)

After (v0.3.x):

from data_designer.plugins import Plugin, PluginType

plugin = Plugin(
    impl_qualified_name="my_plugin.multiple_column_generator.IndexMultiplierColumnGenerator",
    config_qualified_name="my_plugin.multiple_column_generator.IndexMultiplierColumnConfig",
    plugin_type=PluginType.COLUMN_GENERATOR,
    emoji="🔌",
)
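
To illustrate what a fully-qualified object name refers to, the sketch below resolves such a string into the object it names using the standard library. It shows the general technique only; how Data Designer resolves these names internally is not described in these notes, and `resolve_qualified_name` is a hypothetical helper.

```python
import importlib
from typing import Any


def resolve_qualified_name(qualified_name: str) -> Any:
    """Import the module part of a dotted name and return the named attribute.

    Hypothetical helper, not part of Data Designer.
    """
    # Split "package.module.ObjectName" into ("package.module", "ObjectName").
    module_path, _, attr_name = qualified_name.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected a dotted name like 'pkg.module.Object', got {qualified_name!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# A standard-library class is used here so the example runs without any plugin installed.
ordered_dict_cls = resolve_qualified_name("collections.OrderedDict")
```

Passing names as strings means the module containing the plugin classes does not have to be imported at the point where the Plugin object is declared.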

What's Changed

fix: make doc building workflow use python 3.11 by @johnnygreco in #170
refactor: plugin system updates by @mikeknep in #168
feat: add OpenRouter as one of the default providers by @nabinchha in #161
feat: Allow defining extra headers on model providers by @mikeknep in #174
docs: fix documentation on max_tokens by @nabinchha in #176
docs: Add extra_headers to model provider docs by @mikeknep in #178
fix: Decimal in structured generation leads to errors by @andreatgretel in #171
fix: litellm max callbacks override by @nabinchha in #180
fix: deserializing instantiates seed columns twice by @andreatgretel in #188
chore: deprecate InferenceParameters by @nabinchha in #183
refactor: Overhaul to seed datasets by @mikeknep in #167
refactor: Plugins rename task to impl by @mikeknep in #189
chore: limit update upper bound on litellm version by @johnnygreco in #190
feat: Expose shutdown options as RunConfig by @eric-tramel in #186

New Contributors

@eric-tramel made their first contribution in #186

Full Changelog: v0.2.2...v0.3.0

Contributors

eric-tramel, mikeknep, and 3 other contributors

07 Jan 21:51

johnnygreco

v0.2.3

6c242a1

v0.2.3 2026-01-07

What's Changed

fix: make doc building workflow use python 3.11 by @johnnygreco in #170
fix: litellm max callbacks override by @nabinchha in #180

Full Changelog: v0.2.2...v0.2.3

Contributors

nabinchha and johnnygreco

30 Dec 21:37

johnnygreco

v0.2.2

d7e93c5

v0.2.2 2025-12-30

What's Changed

chore: change ruff parsing to JSON + relax ruff version by @andreatgretel in #156
chore: refresh dependency list by @johnnygreco in #154
fix: seed datasets replace existing columns when names collide by @andreatgretel in #158
fix: limit imports in base generators module by @johnnygreco in #166

Full Changelog: v0.2.1...v0.2.2

Contributors

johnnygreco and andreatgretel

19 Dec 01:56

johnnygreco

v0.2.1

b71c6c1

v0.2.1 2025-12-18

What's Changed

docs: some updates for nano3 by @johnnygreco in #149
chore: initial telemetry impl by @johntmyers in #118
docs: just some tutorial notebook tweaks and a docstring update by @johnnygreco in #150
docs: add cli instructions to person sampling docs by @johnnygreco in #151
docs: fix links and tweak person sampling by @johnnygreco in #152

New Contributors

@johntmyers made their first contribution in #118

Full Changelog: v0.2.0...v0.2.1

Contributors

johntmyers and johnnygreco

17 Dec 22:19

johnnygreco

v0.2.0

8540529

v0.2.0 2025-12-17

Notable Additions

New native column type!
- 🧬 We now have native support for embedding generation
New CLI command to download the Nemotron-Personas datasets from NGC
Nemotron 3 Nano is our new default NVIDIA model for LLM-text columns
New processor documentation

What's Changed

fix: don't lowercase score names when using the judge score factory pipeline by @nabinchha in #122
docs: add initial plugin documentation by @johnnygreco in #107
docs: Updated Person Sampling docs by @kirit93 in #120
docs: add option to open notebook directly in Colab by @andreatgretel in #126
fix: typo on path to colab notebook by @andreatgretel in #129
fix: analysis report when there is a column with mixed data types by @johnnygreco in #131
docs: fix links on notebooks and add %%capture on install cell by @andreatgretel in #134
feat: support native embedding generation by @nabinchha in #106
docs: add documentation on how to configure custom model settings by @nabinchha in #124
chore: Update nvidia text default model alias to nano v3 by @nabinchha in #133
fix: handling of different inference params in info display by @nabinchha in #141
chore: update default model config settings by @johnnygreco in #142
docs: add processors by @andreatgretel in #147
chore: add explicit discriminator field for processors by @andreatgretel in #145
feat: Add download personas command to the CLI by @johnnygreco in #146
chore: update type hints to 3.10+ by @johnnygreco in #148

Full Changelog: v0.1.5...v0.2.0

Contributors

nabinchha, kirit93, and 2 other contributors

11 Dec 02:30

johnnygreco

v0.1.5

1cea292

v0.1.5 2025-12-10

What's Changed

fix: add git user/email and allow manual trigger for docs pipeline by @andreatgretel in #105
docs: fix links in readme by @imadreamerboy in #104
docs: add footer navigation by @johnnygreco in #108
docs: add nvidia to citation so it plays nice with bibtex by @johnnygreco in #111
docs: set up initial recipe section by @johnnygreco in #114
feat: processor to easily export part of dataset to JSONL by @andreatgretel in #26
docs: fix link and some rephrasing by @johnnygreco in #119
chore: support max parallel requests on non-llm-based configs by @johnnygreco in #116

New Contributors

@imadreamerboy made their first contribution in #104

Full Changelog: v0.1.4...v0.1.5

Contributors

johnnygreco, imadreamerboy, and andreatgretel

08 Dec 15:29

johnnygreco

v0.1.4

32515ba

v0.1.4 2025-12-08

What's Changed

chore: moving notebooks to jupytext and cleaning up workflows by @andreatgretel in #91
fix: remove broken download links in notebooks and add download button instead by @andreatgretel in #94
docs: move models docs to concepts > models by @nabinchha in #93
fix: small typo on text file by @andreatgretel in #95
fix: update Python version to 3.11 on build notebooks CI by @andreatgretel in #96
fix: allow docs CI to be manually triggered, better download button by @andreatgretel in #99
docs: Add example notebook showing how to use image contexts by @nabinchha in #97
docs: add models module to code reference by @nabinchha in #101
refactor: config, essentials, and plugins by @mikeknep in #100
docs: add versioning using mike by @andreatgretel in #102
style: import sorting by @mikeknep in #103

Full Changelog: v0.1.3...v0.1.4

Contributors

mikeknep, nabinchha, and andreatgretel

03 Dec 02:59

johnnygreco

v0.1.3

1946410

v0.1.3 2025-12-02

What's Changed

chore: fix example notebook drop=False -> drop=True to match comment by @nabinchha in #78
fix: Fix starting index for batch processing by @nabinchha in #80
fix: timestamp dataset name if there's a collision by @nabinchha in #77
docs: Drop state from PersonSamplerParams docstring by @mikeknep in #81
chore: update person fields by @johnnygreco in #83
docs: fix basic tutorial to use faker person sampling by @johnnygreco in #86

New Contributors

@mikeknep made their first contribution in #81

Full Changelog: v0.1.2...v0.1.3

Contributors

mikeknep, nabinchha, and johnnygreco

25 Nov 00:00

johnnygreco

v0.1.2

91e2764

v0.1.2 2025-11-24

What's Changed

docs: more broken links because we are awesome by @johnnygreco in #67
fix: update get_file_column_names to take a file reference by @johnnygreco in #68
fix: make sampler type a discriminated union with injection validator by @johnnygreco in #71
docs: update link to notebooks in readme by @johnnygreco in #69
fix: small fixes by @johnnygreco in #72

Full Changelog: v0.1.1...v0.1.2

Contributors

johnnygreco
