fix(config): round-trip processors and profilers #605

Merged
johnnygreco merged 3 commits into main from johnny/fix-config-builder-from-config-roundtrip
May 5, 2026
Conversation

@johnnygreco
Contributor

📋 Summary

DataDesignerConfigBuilder.from_config() did not restore processors or profilers from an existing DataDesignerConfig, so round-tripping through a builder silently dropped those settings. This PR restores those fields and adds a regression test for the behavior.

🔗 Related Issue

N/A

🔄 Changes

  • Restore processor configs when building a DataDesignerConfigBuilder from an existing config.
  • Restore profiler configs when building a DataDesignerConfigBuilder from an existing config.
  • Add a regression test covering processor and profiler round-trip behavior.
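
The restoration described above can be sketched with toy stand-ins. This is not the library's code: `ToyConfig` and `ToyBuilder` are hypothetical classes that only mimic the shape of `DataDesignerConfig` and `DataDesignerConfigBuilder`, and plain strings stand in for the real column, processor, and profiler configs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyConfig:
    """Hypothetical stand-in for DataDesignerConfig."""

    columns: list[str] = field(default_factory=list)
    processors: list[str] = field(default_factory=list)
    profilers: list[str] = field(default_factory=list)


class ToyBuilder:
    """Hypothetical stand-in for DataDesignerConfigBuilder."""

    def __init__(self) -> None:
        self._columns: list[str] = []
        self._processors: list[str] = []
        self._profilers: list[str] = []

    def add_column(self, column: str) -> "ToyBuilder":
        self._columns.append(column)
        return self

    def add_processor(self, processor: str) -> "ToyBuilder":
        self._processors.append(processor)
        return self

    def add_profiler(self, profiler: str) -> "ToyBuilder":
        self._profilers.append(profiler)
        return self

    @classmethod
    def from_config(cls, config: ToyConfig) -> "ToyBuilder":
        builder = cls()
        for column in config.columns:
            builder.add_column(column)
        # The two loops below mirror the fix: without them, processors and
        # profilers on the incoming config are silently left behind.
        for processor in config.processors:
            builder.add_processor(processor)
        for profiler in config.profilers:
            builder.add_profiler(profiler)
        return builder

    def build(self) -> ToyConfig:
        return ToyConfig(
            columns=list(self._columns),
            processors=list(self._processors),
            profilers=list(self._profilers),
        )


original = ToyConfig(columns=["name"], processors=["drop_tmp"], profilers=["judge_score"])
restored = ToyBuilder.from_config(original).build()
```

Replaying through the existing `add_*` methods, rather than assigning the lists directly, keeps whatever behavior those methods already implement in one place.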

🧪 Testing

  • make test passes (not run; focused test suite below)
  • Unit tests added/updated
  • E2E tests added/updated (N/A)

Focused checks run:

  • uv run pytest packages/data-designer-config/tests/config/test_config_builder.py -q
  • uv run ruff check packages/data-designer-config/src/data_designer/config/config_builder.py packages/data-designer-config/tests/config/test_config_builder.py
  • uv run ruff format --check packages/data-designer-config/src/data_designer/config/config_builder.py packages/data-designer-config/tests/config/test_config_builder.py
  • git diff --check

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (N/A)

Signed-off-by: Johnny Greco <jogreco@nvidia.com>
@johnnygreco johnnygreco requested a review from a team as a code owner May 5, 2026 14:49
@greptile-apps
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

This PR fixes a silent data-loss bug in DataDesignerConfigBuilder.from_config() where processors and profilers present in an existing DataDesignerConfig were never restored, causing a round-trip through the builder to drop those settings entirely.

  • Adds two loops in from_config to replay add_processor and add_profiler calls onto the fresh builder, correctly inheriting the upsert semantics and DropColumnsProcessor side-effects that already exist on those methods.
  • Adds three regression tests: basic processor/profiler restoration, DropColumnsProcessorConfig glob-pattern side-effects (drop=True on matched columns), and a full field-coverage round-trip asserting every DataDesignerConfig field survives the builder cycle.
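
A rough sketch of the two behaviors mentioned above, upsert by name and the drop side-effect on glob-matched columns, assuming nothing about the real implementation. `ToyColumn`, `ToyDropColumnsProcessor`, and `ToyBuilder` are hypothetical names, and the standard library's `fnmatchcase` is used here as an assumed stand-in for whatever glob matching the library performs.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class ToyColumn:
    """Hypothetical column config with a per-column drop flag."""

    name: str
    drop: bool = False


@dataclass
class ToyDropColumnsProcessor:
    """Hypothetical stand-in for DropColumnsProcessorConfig."""

    name: str
    column_names: list[str]


class ToyBuilder:
    """Hypothetical builder showing upsert plus the drop side-effect."""

    def __init__(self) -> None:
        self._columns: dict[str, ToyColumn] = {}
        self._processors: list[ToyDropColumnsProcessor] = []

    def add_column(self, column: ToyColumn) -> "ToyBuilder":
        self._columns[column.name] = column
        return self

    def add_processor(self, processor: ToyDropColumnsProcessor) -> "ToyBuilder":
        # Upsert: a processor with an existing name replaces the old entry
        # in place (an assumption); otherwise it is appended.
        for index, existing in enumerate(self._processors):
            if existing.name == processor.name:
                self._processors[index] = processor
                break
        else:
            self._processors.append(processor)
        # Side-effect: mark every column matching a pattern as dropped.
        for pattern in processor.column_names:
            for column in self._columns.values():
                if fnmatchcase(column.name, pattern):
                    column.drop = True
        return self


builder = ToyBuilder()
for column_name in ("col_a", "col_b", "keep_me"):
    builder.add_column(ToyColumn(column_name))

builder.add_processor(ToyDropColumnsProcessor("cleanup", ["col_*"]))
# Re-adding under the same name replaces the entry instead of duplicating it.
builder.add_processor(ToyDropColumnsProcessor("cleanup", ["col_a"]))

drop_flags = {name: column.drop for name, column in builder._columns.items()}
processor_count = len(builder._processors)
```

Because the side-effect lives in `add_processor`, a `from_config` that replays processors after columns picks up the drop flags without any extra logic.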

Confidence Score: 5/5

The change is a minimal, targeted fix — two loops added to an existing restoration method — with no side-effects on paths that did not previously touch processors or profilers.

The fix is small and correctly delegates to existing, already-tested add_processor and add_profiler methods, which handle upsert semantics and drop-column side-effects. The new tests exercise the exact scenarios described in the PR and cover the full field round-trip as a regression guard.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-config/src/data_designer/config/config_builder.py | Adds processor and profiler restoration loops to from_config; fix is minimal and correct against a fresh builder instance |
| packages/data-designer-config/tests/config/test_config_builder.py | Adds three targeted regression tests: basic round-trip, DropColumnsProcessor side-effects, and full field coverage for DataDesignerConfig |

Sequence Diagram

sequenceDiagram
    participant C as Caller
    participant F as from_config()
    participant B as DataDesignerConfigBuilder

    C->>F: BuilderConfig (with processors/profilers)
    F->>B: cls(model_configs, tool_configs)
    loop columns
        F->>B: add_column(col)
    end
    loop constraints
        F->>B: add_constraint(constraint)
    end
    opt seed_config present
        F->>B: with_seed_dataset(source, strategy)
    end
    loop processors (NEW)
        F->>B: add_processor(processor)
        Note over B: upsert by name, applies DROP side-effects
    end
    loop profilers (NEW)
        F->>B: add_profiler(profiler)
        Note over B: appends to _profilers list
    end
    F-->>C: populated builder

Reviews (3): Last reviewed commit: "Merge branch 'main' into johnny/fix-conf..."

andreatgretel previously approved these changes May 5, 2026
Contributor

@andreatgretel andreatgretel left a comment

LGTM, just noting a couple of things that review flagged:

  • The new test skips DropColumnsProcessor, which is the one processor type with side effects on column state. Could perhaps add a case with a glob like col_* that asserts the per-column drop flags survive the round-trip.

  • Not for this PR, but from_config enumerates DataDesignerConfig fields by hand, which is what caused this bug. Might be worth a follow-up that asserts every field round-trips so the next field addition fails loudly. Claude Code flagged this one.

Smoke-tested locally with DropColumnsProcessor (incl. globs), multi-processor ordering, and YAML file round-trip - all good.

Signed-off-by: Johnny Greco <jogreco@nvidia.com>
@johnnygreco
Contributor Author

@andreatgretel thanks for the notes. Addressed both in 28e4c293:

  • Added a DropColumnsProcessorConfig round-trip case with a glob (col_*) and assertions that the restored builder preserves the expected per-column drop flags.
  • Added broader DataDesignerConfig field round-trip coverage. The test derives the field set from DataDesignerConfig.model_fields, populates each current field, round-trips through from_config, and compares every restored field value so future field additions fail loudly if they are not carried through.
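
The field-coverage idea in the second bullet can be illustrated roughly as follows. This is a sketch under assumptions, not the test added in the PR: it uses a hypothetical `ToyConfig` dataclass and `dataclasses.fields` in place of `DataDesignerConfig.model_fields`, and two hypothetical round-trip functions to contrast the old and new behavior.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ToyConfig:
    """Hypothetical stand-in for DataDesignerConfig."""

    columns: list[str] = field(default_factory=list)
    processors: list[str] = field(default_factory=list)
    profilers: list[str] = field(default_factory=list)


def round_trip_buggy(config: ToyConfig) -> ToyConfig:
    # Mimics the pre-fix behavior: fields are copied by hand, two are missed.
    return ToyConfig(columns=list(config.columns))


def round_trip_fixed(config: ToyConfig) -> ToyConfig:
    # Mimics the post-fix behavior: every field is carried through.
    return ToyConfig(
        columns=list(config.columns),
        processors=list(config.processors),
        profilers=list(config.profilers),
    )


def dropped_fields(original: ToyConfig, restored: ToyConfig) -> list[str]:
    # Derive the field set from the class itself so that a newly added
    # field is checked automatically instead of being listed by hand.
    return [
        f.name
        for f in fields(original)
        if getattr(original, f.name) != getattr(restored, f.name)
    ]


# Every field is populated with a non-default value; a field left at its
# default would compare equal even if the round-trip ignored it.
populated = ToyConfig(columns=["name"], processors=["drop_tmp"], profilers=["judge_score"])

buggy_report = dropped_fields(populated, round_trip_buggy(populated))
fixed_report = dropped_fields(populated, round_trip_fixed(populated))
```

In a real test the report would be asserted empty, so a future field that the round-trip forgets shows up by name in the failure message.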

Focused checks are still passing locally:

  • uv run pytest packages/data-designer-config/tests/config/test_config_builder.py -q
  • uv run ruff check packages/data-designer-config/tests/config/test_config_builder.py
  • uv run ruff format --check packages/data-designer-config/tests/config/test_config_builder.py
  • git diff --check

@johnnygreco johnnygreco merged commit 8fb1320 into main May 5, 2026
49 checks passed

2 participants