Releases: NVIDIA-NeMo/DataDesigner

v0.3.1 2026-01-08

08 Jan 22:53
52aecfa

What's Changed

Full Changelog: v0.3.0...v0.3.1

v0.3.0 2026-01-08

08 Jan 21:15
de417c8

🎨 NeMo Data Designer v0.3.0 Release Notes

DataDesigner v0.3.0 introduces three breaking changes, highlighted below: config validation, seed datasets, and plugins.

💥 Breaking Change: config validation

The config validation method, validate, has moved from the config builder to the DataDesigner object, which now takes the config builder as its argument.

Before (v0.2.x):

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

# ... build your config ...

# validate config
config_builder.validate()

After (v0.3.x):

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

# ... build your config ...

# validate config
data_designer.validate(config_builder)
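
For code that has to run against both v0.2.x and v0.3.x during a migration, a small shim can dispatch to whichever object exposes validate. This is only a sketch: validate_config is a hypothetical helper, not part of Data Designer, and it assumes that the v0.2.x DataDesigner object has no validate method of its own.

```python
def validate_config(data_designer, config_builder):
    """Validate a config using whichever call shape is available.

    Hypothetical helper, not part of Data Designer. Assumes that only the
    v0.3.x DataDesigner object exposes a `validate` method.
    """
    if hasattr(data_designer, "validate"):
        # v0.3.x: validation lives on the DataDesigner object
        return data_designer.validate(config_builder)
    # v0.2.x: validation lives on the config builder
    return config_builder.validate()
```

Once every environment is on v0.3.x, the shim can be dropped in favor of calling data_designer.validate(config_builder) directly.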

💥 Breaking Change: seed datasets

Working with seed datasets has been simplified with the introduction of SeedSource objects, which are passed directly to config_builder.with_seed_dataset. This removes the separate step of creating a seed reference, along with the datastore settings it sometimes required.

Before (v0.2.x):

Seed from a local file:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

# create a seed reference from the file, then pass it to the config builder
seed_dataset_reference = data_designer.make_seed_reference_from_file("my_seed_dataset.parquet")
config_builder.with_seed_dataset(seed_dataset_reference)

Seed from a DataFrame:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder

# define dataframe `df`

data_designer = DataDesigner()
config_builder = DataDesignerConfigBuilder()

# the dataframe must be written to file in v0.2.x
seed_dataset_reference = data_designer.make_seed_reference_from_dataframe(df, "my_seed_dataset.parquet")

config_builder.with_seed_dataset(seed_dataset_reference)

After (v0.3.x):

Seed from a local file:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder, LocalFileSeedSource

config_builder = DataDesignerConfigBuilder()
config_builder.with_seed_dataset(LocalFileSeedSource(path="my_seed_dataset.parquet"))

Seed from a DataFrame:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder, DataFrameSeedSource

# define dataframe `df`

config_builder = DataDesignerConfigBuilder()

# no need to specify a file, as the dataframe will be sampled directly in memory
config_builder.with_seed_dataset(DataFrameSeedSource(df=df))

Seed from Hugging Face Hub:

from data_designer.essentials import DataDesigner, DataDesignerConfigBuilder, HuggingFaceSeedSource

config_builder = DataDesignerConfigBuilder()
config_builder.with_seed_dataset(HuggingFaceSeedSource(path="datasets/my-username/my-dataset/data/*.parquet"))

💥 Breaking Change: plugins

When defining plugins, there are two important updates:

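  • task -> impl: the task_cls argument is replaced by impl_qualified_name, and config_cls by config_qualified_name.
  • The implementation and config classes are now given as fully-qualified object names (e.g., "my_plugin.module.PluginObject") rather than as the class objects themselves.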

Before (v0.2.x):

from my_plugin.multiple_column_generator import IndexMultiplierColumnGenerator, IndexMultiplierColumnConfig
from data_designer.plugins import Plugin, PluginType

plugin = Plugin(
    task_cls=IndexMultiplierColumnGenerator,
    config_cls=IndexMultiplierColumnConfig,
    plugin_type=PluginType.COLUMN_GENERATOR,
    emoji="🔌",
)

After (v0.3.x):

from data_designer.plugins import Plugin, PluginType

plugin = Plugin(
    impl_qualified_name="my_plugin.multiple_column_generator.IndexMultiplierColumnGenerator",
    config_qualified_name="my_plugin.multiple_column_generator.IndexMultiplierColumnConfig",
    plugin_type=PluginType.COLUMN_GENERATOR,
    emoji="🔌",
)
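
Passing names as strings means the plugin classes do not have to be imported where the Plugin object is declared. These notes do not say how Data Designer resolves the names internally; the following is a generic sketch of how a fully-qualified name can be turned back into an object with the standard library. resolve_qualified_name is a hypothetical helper, not a Data Designer function.

```python
import importlib


def resolve_qualified_name(qualified_name: str):
    """Import and return the object named by 'package.module.ObjectName'.

    Hypothetical helper for illustration; not part of Data Designer.
    """
    # Split on the last dot: everything before it is the module path
    module_path, _, object_name = qualified_name.rpartition(".")
    if not module_path:
        raise ValueError(f"expected 'module.ObjectName', got {qualified_name!r}")
    module = importlib.import_module(module_path)
    return getattr(module, object_name)


# Standard-library example, so the sketch runs without any plugin installed
ordered_dict_cls = resolve_qualified_name("collections.OrderedDict")
```

A check like this can also be used before registering a plugin, to confirm that both qualified names point at importable objects.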

What's Changed

New Contributors

Full Changelog: v0.2.2...v0.3.0

v0.2.3 2026-01-07

07 Jan 21:51

What's Changed

Full Changelog: v0.2.2...v0.2.3

v0.2.2 2025-12-30

30 Dec 21:37
d7e93c5

What's Changed

Full Changelog: v0.2.1...v0.2.2

v0.2.1 2025-12-18

19 Dec 01:56
b71c6c1

What's Changed

New Contributors

Full Changelog: v0.2.0...v0.2.1

v0.2.0 2025-12-17

17 Dec 22:19
8540529

Notable Additions

  • New native column type!
    • 🧬 We now have native support for embedding generation
  • New CLI command to download the Nemotron-Personas datasets from NGC
  • Nemotron 3 Nano is our new default NVIDIA model for LLM-text columns
  • New processor documentation

What's Changed

Full Changelog: v0.1.5...v0.2.0

v0.1.5 2025-12-10

11 Dec 02:30
1cea292

What's Changed

New Contributors

Full Changelog: v0.1.4...v0.1.5

v0.1.4 2025-12-08

08 Dec 15:29
32515ba

What's Changed

Full Changelog: v0.1.3...v0.1.4

v0.1.3 2025-12-02

03 Dec 02:59
1946410

What's Changed

  • chore: fix example notebook drop=False -> drop=True to match comment by @nabinchha in #78
  • fix: Fix starting index for batch processing by @nabinchha in #80
  • fix: timestamp dataset name if there's a collision by @nabinchha in #77
  • docs: Drop state from PersonSamplerParams docstring by @mikeknep in #81
  • chore: update person fields by @johnnygreco in #83
  • docs: fix basic tutorial to use faker person sampling by @johnnygreco in #86

New Contributors

Full Changelog: v0.1.2...v0.1.3

v0.1.2 2025-11-24

25 Nov 00:00
91e2764

What's Changed

Full Changelog: v0.1.1...v0.1.2