Skip to content

feat(models): deprecate implicit default provider routing#594

Open
nabinchha wants to merge 3 commits intomainfrom
nmulepati/refactor/589-deprecate-default-provider-routing
Open

feat(models): deprecate implicit default provider routing#594
nabinchha wants to merge 3 commits intomainfrom
nmulepati/refactor/589-deprecate-default-provider-routing

Conversation

@nabinchha
Copy link
Copy Markdown
Contributor

@nabinchha nabinchha commented Apr 30, 2026

📋 Summary

Deprecates the legacy "implicit default provider" routing before it's removed in a future release. Every entry point that exercises the implicit default — ModelConfig.provider=None, the registry-level ModelProviderRegistry.default, the YAML default: key, and the CLI's "Change default provider" workflow — now emits a DeprecationWarning pointing users at the explicit provider= migration. Continues the work started in #591 / tracked under issue #589.

🔗 Related Issue

Refs #589

🔄 Changes

✨ Added

🔧 Changed

  • resolve_model_provider_registry skips passing default= in the single-provider case so the common construction path stays quiet under the new warning. Multi-provider registries still pass default (per check_implicit_default) and warn accordingly.
  • stub_model_configs fixture and existing ModelConfig-constructing tests now pass provider= explicitly so they don't trip the new warning
  • Docstrings on ModelConfig.provider and ModelProviderRegistry.default annotated as deprecated

📚 Docs

  • Deprecation admonitions added to docs/concepts/models/model-providers.md, default-model-settings.md, custom-model-settings.md, and configure-model-settings-with-the-cli.md
  • Code examples in docs/concepts/architecture-and-performance.md, inference-parameters.md, and the data-designer-config README updated to set provider= explicitly

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

🧪 Testing

  • make test passes (3,112 tests: 539 config + 1,921 engine + 652 interface)
  • Unit tests added/updated — new regression tests for all 4 warning entry points
  • E2E tests added/updated — N/A (no behavior change beyond warnings)

✅ Checklist

Emit DeprecationWarning whenever the legacy "implicit default
provider" path is exercised: `ModelConfig.provider=None`, the
registry-level `ModelProviderRegistry.default`, the YAML
`default:` key in `~/.data-designer/model_providers.yaml`, and
the CLI's "Change default provider" workflow.

`resolve_model_provider_registry` skips passing `default=` in the
single-provider case so the common construction path stays quiet.
Multi-provider registries still pass `default` (per
`check_implicit_default`) and warn accordingly.

Update docs, the package README, and test fixtures to specify
`provider=` explicitly on every `ModelConfig`. New tests cover
each warning entry point and pin the post-deprecation happy paths.

Refs #589

Made-with: Cursor
@nabinchha nabinchha requested a review from a team as a code owner April 30, 2026 22:21
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

Docs preview: https://bd23c6e8.dd-docs-preview.pages.dev

Notebook tutorials are placeholder-only in previews.

@github-actions
Copy link
Copy Markdown
Contributor

Review: PR #594feat(models): deprecate implicit default provider routing

Summary

Adds DeprecationWarnings to every entry point of the legacy "implicit default
provider" routing, ahead of its removal (issue #589). Four call sites are
instrumented: ModelConfig construction when provider=None, explicit
ModelProviderRegistry(default=...), YAML default: key loading
(get_default_provider_name / ProviderRepository.load), and the CLI
"Change default provider" workflow. resolve_model_provider_registry is
tweaked so the single-provider common-case stays quiet. Docs are annotated
throughout, code examples updated to pass provider= explicitly, and
regression tests added for every warning entry point plus happy-path "stays
quiet" pins. Pure-additive: no behavior change beyond the warnings.

Findings

🟡 Double warning on ProviderRepository.load with YAML default:

packages/data-designer/src/data_designer/cli/repositories/provider_repository.py:37-44

The repository emits its own DeprecationWarning when config_dict["default"]
is non-None, then immediately calls ModelProviderRegistry.model_validate(config_dict).
Because pydantic v2 records "default" in model_fields_set when the key is
present in the validated dict, _warn_on_explicit_default will fire a second
time. Users see the same deprecation twice per load. Either gate the
repository-level warning behind a guard (e.g., simplefilter("ignore") around
model_validate), or drop the repository-level warning and rely on the
validator-level one.

🟡 Double warning on DataDesigner.__init__ startup path

packages/data-designer/src/data_designer/interface/data_designer.py:161-166

When model_providers=None and the user's YAML has default: "foo":

  1. get_default_provider_name() warns (default_model_settings.py:107).
  2. resolve_model_provider_registry(..., default_provider_name="foo") takes
    the multi-provider/explicit-default branch and constructs
    ModelProviderRegistry(providers=..., default="foo") → fires
    _warn_on_explicit_default.

Same UX issue as the repository path: one user action → two warnings. Worth
suppressing at one of the layers.

🟡 Warning storm from legacy on-disk ModelConfig entries

packages/data-designer-config/src/data_designer/config/default_model_settings.py:71

get_default_model_configs() iterates ModelConfig.model_validate(mc) over
every entry in ~/.data-designer/model_configs.yaml. Any entry serialized
before this change lacks provider=, so _warn_on_implicit_provider fires
once per entry at every startup. The PR description calls out this blast
radius ("including legacy serialized configs loaded via model_validate"),
but worth confirming intent — a user with five legacy configs will see five
identical warnings every run. Consider either deduping by alias in the
validator (emit once per (alias,) using functools.lru_cache or a module
set) or having the YAML loader rewrite missing provider= keys with the
resolved registry default before validation.

🟢 _warn_on_explicit_default fires even when default=None is passed explicitly

packages/data-designer-engine/src/data_designer/engine/model_provider.py:57-71

Using "default" in self.model_fields_set means
ModelProviderRegistry(providers=[...], default=None) warns, even though
it's semantically identical to omitting the argument. Probably fine (callers
shouldn't pass default=None in new code anyway), but a belt-and-braces
tightening would be if self.default is not None — which also happens to
eliminate one of the double-warning cases above naturally. Trade-off: loses
the "you passed the deprecated kwarg" signal on the None path. Flag, not
block.

🟢 Validator stacklevel=2 won't surface user call site under pydantic v2

Both _warn_on_implicit_provider and _warn_on_explicit_default use
stacklevel=2. Pydantic v2 wraps @model_validator(mode="after") functions
through several internal frames, so stacklevel=2 lands inside pydantic
internals, not the user's ModelConfig(...) call. Users will still see the
message but the attributed source line will be unhelpful. A higher
stacklevel or explicit warnings.warn_explicit(..., filename=..., lineno=...)
would help, though it's fragile across pydantic versions. Not blocking.

🟢 Tests

Coverage is solid — each warning entry point has both a "warns" case and a
"stays quiet" pin using simplefilter("error", DeprecationWarning). The
"stays quiet" pattern is particularly good at preventing silent regressions.
Existing tests that construct ModelConfig without provider= are updated
thoroughly (5 test files, stub_model_configs fixture, _make_model helper
in fingerprint tests).

Two minor gaps:

  • No test pins the post-deprecation behavior of get_default_model_configs()
    when model_configs.yaml contains legacy entries without provider=
    i.e., the "warning storm" case from the third finding. Worth at least one
    regression showing N-entries → N-warnings (or whatever the intended
    behavior is once decided).
  • test_resolve_model_provider_registry_with_explicit_default
    (packages/data-designer-engine/tests/engine/test_model_provider.py:82)
    now exercises the deprecation path but isn't wrapped in
    pytest.warns/filterwarnings("ignore"). It passes today because pytest
    doesn't escalate DeprecationWarning by default, but if the project ever
    turns on -W error::DeprecationWarning this test will fail. Low priority.

🟢 Docs

Admonitions are consistent in tone and all point to issue #589. Code examples
in the three docs/concepts/ files and the data-designer-config README are
updated to set provider= explicitly. Nothing else in docs/ constructs
ModelConfig without a provider — scanned the remaining model-docs pages and
they either already pass provider= or show YAML not Python.

🟢 Style / conventions

Verdict

Approve with non-blocking comments. The deprecation machinery is correct,
the tests pin both directions, and the docs make the migration obvious. The
main thing I'd ask the author to consider before merging is whether the
double-warning cases (repository + validator, get_default_provider_name +
registry construction) and the N-entries-N-warnings case on legacy
model_configs.yaml loads are intended UX or accidental. If the plan is
"warn loudly and repeatedly until users migrate," this is already correct;
if the plan is "warn once per user action," a small dedup pass would help.
Either stance is defensible — worth making it explicit in a comment on the
PR.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR deprecates the implicit default provider routing across all four entry points: ModelConfig.provider=None, ModelProviderRegistry.default, the YAML default: key, and the CLI "Change default provider" workflow. Each entry point now emits a DeprecationWarning pointing users to explicit provider= usage. The new warn_at_caller helper correctly bypasses pydantic's internal frames so warnings attribute to user call-sites and deduplicate per user location rather than per pydantic-internal line. The DataDesigner.__init__ path appropriately suppresses the redundant registry-level warning when the YAML-level warning has already fired.

Confidence Score: 5/5

Safe to merge — additive deprecation warnings only, no behavior changes, comprehensive test coverage

No bugs found. The warn_at_caller helper correctly walks past pydantic frames to attribute warnings to user call-sites and deduplicate per user location. The model_fields_set guard in _warn_on_explicit_default correctly distinguishes explicit opt-in from field defaults. The duplicate-warning suppression in DataDesigner.__init__ is logically sound. The ProviderRepository.load split into two try/except blocks correctly avoids swallowing the DeprecationWarning under strict filter settings. All four deprecated paths have regression tests plus "stays quiet" happy-path pins.

No files require special attention

Important Files Changed

Filename Overview
packages/data-designer-config/src/data_designer/config/utils/warning_helpers.py New helper warn_at_caller correctly walks the call stack past pydantic-internal frames to attribute warnings to user call-sites; fallback path handles environments without sys._getframe
packages/data-designer-engine/src/data_designer/engine/model_provider.py _warn_on_explicit_default correctly guards with model_fields_set to distinguish explicit default= from the field's implicit None; single-provider fast-path in resolve_model_provider_registry correctly avoids passing default= to keep common construction quiet
packages/data-designer-config/src/data_designer/config/models.py _warn_on_implicit_provider post-validator correctly fires on provider=None construction and model_validate; warn_at_caller replaces the previously inadequate stacklevel=2 approach
packages/data-designer/src/data_designer/interface/data_designer.py Correctly suppresses the registry-level DeprecationWarning inside resolve_model_provider_registry when the YAML-default warning has already fired, preventing a confusing duplicate-warning cascade for the same root cause
packages/data-designer/src/data_designer/cli/repositories/provider_repository.py Correctly places the DeprecationWarning outside both try/except blocks so it isn't swallowed under filterwarnings("error"); local ModelProviderRegistry is a thin Pydantic model with no validators, avoiding a double-warning chain
packages/data-designer/src/data_designer/cli/controllers/provider_controller.py Emits both a console print_warning and a DeprecationWarning when the deprecated "Change default provider" workflow is entered; no logic issues
packages/data-designer-config/src/data_designer/config/default_model_settings.py get_default_provider_name now emits DeprecationWarning only when the YAML default: key is set (non-None); stacklevel=2 correctly points to the caller of this function
packages/data-designer-engine/tests/engine/test_model_provider.py Comprehensive regression coverage for all four warning entry points plus "stays quiet" happy-path pins using warnings.simplefilter("error")

Sequence Diagram

sequenceDiagram
    participant User as User Code
    participant MC as ModelConfig(provider=None)
    participant PY as Pydantic Internals
    participant VAL as _warn_on_implicit_provider
    participant WH as warn_at_caller
    participant W as warnings module

    User->>MC: ModelConfig(alias=..., model=...)
    MC->>PY: pydantic validation
    PY->>VAL: model_validator(mode=after)
    VAL->>WH: warn_at_caller(msg, DeprecationWarning)
    WH->>WH: sys._getframe(2) walk past pydantic frames
    WH->>W: warn_explicit(msg, file=user_file, lineno=user_line)
    W-->>User: DeprecationWarning attributed to user call-site

    Note over User,W: Registry-level path

    participant RI as resolve_model_provider_registry
    participant MPR as ModelProviderRegistry
    participant VALD as _warn_on_explicit_default
    participant DD as DataDesigner.__init__

    User->>DD: DataDesigner() with YAML default
    DD->>DD: get_default_provider_name() warns YAML default deprecated
    DD->>DD: catch_warnings() suppress ModelProviderRegistry.default warning
    DD->>RI: resolve_model_provider_registry(providers, default)
    RI->>MPR: ModelProviderRegistry(providers, default=X)
    MPR->>PY: pydantic validation
    PY->>VALD: _warn_on_explicit_default suppressed by catch_warnings
    VALD-->>DD: suppressed YAML warning already fired
Loading

Reviews (3): Last reviewed commit: "Merge branch 'main' into nmulepati/refac..." | Re-trigger Greptile

Comment thread packages/data-designer-config/src/data_designer/config/models.py
@johnnygreco
Copy link
Copy Markdown
Contributor

Thanks for putting this together, @nabinchha — the four entry points are mapped cleanly to issue #589 and the regression tests pin each one. I had a few thoughts after a careful read with edge cases in mind.

Summary

This PR lands the deprecation phase for the implicit-default-provider concept tracked in #589: ModelConfig.provider=None, ModelProviderRegistry.default, the YAML default: key, and the CLI's "Change default provider" flow now emit a DeprecationWarning, with docs/tests/fixtures updated to the explicit-provider= pattern. The implementation faithfully covers the five items the issue called out, and stays additive (no behavior change beyond warnings).

Findings

Greptile has already flagged two issues that I won't repeat here:

  • warnings.warn inside provider_repository.py's try/except Exception block (P1 — real correctness bug under filterwarnings("error", DeprecationWarning)).
  • stacklevel=2 inside Pydantic mode="after" validators pointing into pydantic internals (P2).

Both are legitimate; please address them before merge.

Warnings — Worth addressing

packages/data-designer/src/data_designer/cli/repositories/provider_repository.py:40-46 — Warning copy mixes two YAML files

  • What: The message says "Remove it and refer to providers by name in your ModelConfig entries." But this code path is reading model_providers.yaml, where ModelConfig entries don't live (those are in model_configs.yaml). A user following the message literally goes looking in the wrong file.
  • Why: The same default: key is also surfaced by default_model_settings.get_default_provider_name(), and that copy is clearer ("Remove it and specify provider= explicitly on each ModelConfig instead"). Inconsistency between the two messages doesn't help either.
  • Suggestion: Mirror the wording from default_model_settings.py:107-113 so both YAML-default warning sites give the same migration instruction. Could also extract the message into a module-level constant if we expect it to live in two places.

packages/data-designer/src/data_designer/interface/data_designer.py:161-166 — Cascade of two warnings on a single DataDesigner() call

  • What: When model_providers=None and the YAML carries default:, get_default_provider_name() emits warning docs: adding initial mkdocs structure #1, then resolve_model_provider_registry(..., default_provider_name="...") falls into the multi-provider branch and triggers _warn_on_explicit_default for warning DataDesigner.make_seed_reference_from_file doesn't support paths with multiple parquet partition #2. Two distinct deprecations fire for the same underlying cause (the user has a YAML default).
  • Why: They do point at different remediations (edit the YAML vs. specify provider= per ModelConfig), so showing both is defensible — but a user instantiating DataDesigner() with the legacy on-disk config will see two separate DeprecationWarnings with overlapping language and may be confused about whether they're the same problem.
  • Suggestion: At minimum, worth confirming this is intended. If we'd rather emit only the most specific one, resolve_model_provider_registry could skip passing default= when the value came from get_default_provider_name() (since that site already warned), or the resolve-side warn could be conditional on whether the default came from a user-passed argument vs. the YAML fallback. Happy to leave as-is if the team prefers two nudges over one.

Suggestions — Take it or leave it

packages/data-designer-engine/src/data_designer/engine/model_provider.py:57-71model_fields_set triggers on explicit default=None

  • What: The validator fires on "default" in self.model_fields_set. That set includes the field whenever a caller passes it explicitly — including ModelProviderRegistry(providers=[...], default=None), where the caller is explicitly opting out of having a default. The deprecation message ("ModelProviderRegistry.default is deprecated") would be a bit misleading there.
  • Why: No production caller does this today (I checked — provider_service.py:32 constructs the CLI-local ModelProviderRegistry, not this one), so this is theoretical. But the inline comment claims the warn "fires only when the caller actually passed default=", and a future reader who tightens that contract has a small footgun waiting.
  • Suggestion: Tighten the predicate to "default" in self.model_fields_set and self.default is not None, and update the inline comment to match. Trivial change, removes the only misleading-warning case I could construct.

packages/data-designer-config/tests/config/test_models.py — No regression test for the model_validate deserialization path

  • What: The PR description explicitly calls out the intended blast radius: "any ModelConfig built without provider= (including legacy serialized configs loaded via model_validate) will now emit a warning." The construction path is covered by test_model_config_provider_none_emits_deprecation_warning, but there's no analogous test pinning the deserialization path (ModelConfig.model_validate({"alias": ..., "model": ...}) without provider).
  • Why: Both paths funnel through the same validator today, so the test coverage is implicit — but pinning it explicitly protects against a future refactor that, say, only runs the validator on construction and not on revalidation.
  • Suggestion: Add a one-liner asserting ModelConfig.model_validate({"alias": "x", "model": "y"}) emits the warning, alongside the existing construction test.

packages/data-designer-config/src/data_designer/config/models.py:517 and packages/data-designer-engine/src/data_designer/engine/model_provider.py:19 — Pydantic-native deprecation marker

  • What: Pydantic v2 supports Field(..., deprecated=True) which produces an automatic DeprecationWarning plus IDE/JSON-schema metadata for downstream tooling. The PR uses docstring notes plus custom validators.
  • Why: Validator-based warnings only fire at construction time; Field(deprecated=True) also fires on attribute access, and integrates with schema/doc generators. Not a behavioral upgrade for users today, but it's what tooling tends to look for.
  • Suggestion: Optional. If we want the deprecation to be discoverable from generated schemas/IDE tooltips, swap to provider: str | None = Field(default=None, deprecated="...") and default: str | None = Field(default=None, deprecated="..."). The custom validator can stay or be retired depending on whether you want the per-call warning semantics that Field(deprecated=...) doesn't replicate.

What Looks Good

  • The single-provider carve-out in resolve_model_provider_registry is the right call — it keeps DataDesigner(model_providers=[one_provider]) quiet while still nudging users on the multi-provider path. The accompanying test_resolve_single_provider_quiet_under_deprecation test pins this nicely.
  • model_fields_set for _warn_on_explicit_default is a clean way to distinguish caller intent from the field default — that's a thoughtful detail.
  • Coverage is thorough: each of the four entry points has both a positive (warns) and negative (stays quiet under simplefilter("error", DeprecationWarning)) test. The "stays quiet" pins are particularly valuable for catching future regressions.
  • Docs admonitions land in the right places and consistently link back to issue refactor: deprecate implicit default model provider routing #589, which makes the deprecation easy to follow from any entry point.

Verdict

Needs changes — Greptile's two findings (the swallowed-warning bug and the stacklevel issue) should be addressed before merge; everything else above is non-blocking. Once those are in, this is a tidy deprecation cycle.


This review was generated by an AI assistant.

Greptile P1: ProviderRepository.load emitted its DeprecationWarning
inside a `try/except Exception` block. Under
`filterwarnings("error", DeprecationWarning)` the warn would raise,
the except would swallow it, and `load()` would silently return None
(losing the registry). Move the warn outside the catch-all so the
strict-warning path no longer drops valid configs.

Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and
`_warn_on_explicit_default` use `stacklevel=2`, which lands inside
pydantic v2's validator dispatch rather than at the user's
`ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke
both attribution (the source line was unhelpful) and Python's
once-per-location dedup (every call collapsed to the same
pydantic-internal key, suppressing all but the first warning).
Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`,
which walks past the helper, validator, and any pydantic frames to
find the user's call site and emits via `warnings.warn_explicit` with
the user frame's `__warningregistry__`. Keeps attribution accurate
and dedup keyed on the user's (filename, lineno).

johnnygreco: align the `provider_repository.py` warning copy with the
sibling site in `default_model_settings.py` ("specify provider=
explicitly on each ModelConfig instead") so both YAML-default warning
sites give the same migration instruction. The previous wording
pointed users at "ModelConfig entries" inside `model_providers.yaml`,
where ModelConfig entries don't actually live.

johnnygreco: dedup the cascade in `DataDesigner.__init__`. With
`model_providers=None` and a YAML `default:`, the user previously saw
two DeprecationWarnings for the same root cause —
`get_default_provider_name()` warns about the YAML key, then
`resolve_model_provider_registry(...)` re-warns from
`_warn_on_explicit_default`. Suppress the registry-level duplicate in
the YAML-fallback branch via `warnings.catch_warnings()` so users see
exactly one warning per user action.

johnnygreco: tighten `_warn_on_explicit_default` to fire only when
`default is not None`. Passing `default=None` explicitly is
semantically equivalent to omitting it (caller is opting *out* of a
registry-level default), and shouldn't trigger the deprecation
nudge.

johnnygreco: add a `model_validate({...})` regression test for
`ModelConfig` so the deserialization path (legacy on-disk configs)
is pinned alongside the construction path.

Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the
  roundtrip stops exercising the deprecated YAML-default path
  unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`,
  `test_get_provider`, and
  `test_init_user_supplied_providers_preserve_first_wins_over_yaml_default`
  in `pytest.warns` so the suite stays green under
  `-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning`
  to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to
  pin the cascade-dedup behavior.

Refs #589

Made-with: Cursor
@nabinchha
Copy link
Copy Markdown
Contributor Author

Thanks for the careful reads, @greptile-apps and @johnnygreco. Pushed 17a48acc addressing the feedback. Summary:

Blockers (P1/P2) — fixed

Greptile P1 / johnnygreco — warnings.warn inside try/except Exception in ProviderRepository.load. Moved the warn outside the catch-all. load_config_file exceptions still return None silently, the deprecation warn fires unconditionally if default: is present, and model_validate errors continue to fall back to None. Confirmed under filterwarnings("error", DeprecationWarning) the registry now loads correctly instead of being silently dropped.

Greptile P2 / johnnygreco — pydantic validator stacklevel. Introduced data_designer.config.utils.warning_helpers.warn_at_caller, which walks past the helper, validator, and pydantic-internal frames to find the user's call site and emits via warnings.warn_explicit using the user frame's __warningregistry__. Both attribution and dedup now key on the user's (filename, lineno). Verified with a quick python -c repro: warning is now attributed to <string>:22 (the user's model_validate call) rather than a pydantic-internal frame, and Python's once-per-location dedup behaves correctly per user call site.

Note on module_globals: I deliberately don't pass it to warn_explicit. Including it triggers linecache source lookup, which fails for __main__ scripts run with python -c because the loader is BuiltinImporter (ImportError: '__main__' is not a built-in module). The test_fingerprint_deterministic_across_processes test caught this. Skipping module_globals keeps the warning robust at the cost of an empty source line in formatted output — a fair trade.

Worth addressing — fixed

johnnygreco — provider_repository.py warning copy mixed two YAML files. Mirrored the wording from default_model_settings.py so both YAML-default warning sites give the same migration instruction ("Remove it and specify provider= explicitly on each ModelConfig instead"). The previous copy pointed users at "ModelConfig entries" inside model_providers.yaml, where they don't live.

johnnygreco — cascade of two warnings on a single DataDesigner() call. Suppressed the registry-level duplicate in the YAML-fallback branch via a scoped warnings.catch_warnings() filter on the \"ModelProviderRegistry.default is deprecated\" message. Users now see exactly one DeprecationWarning (the more specific YAML-default one) instead of two for the same root cause. Added test_init_yaml_default_emits_single_deprecation_warning to pin the behavior.

Take-it-or-leave-it — taken

johnnygreco — _warn_on_explicit_default triggers on default=None explicitly. Tightened the predicate to \"default\" in self.model_fields_set and self.default is not None. Updated the inline comment. Added test_explicit_default_none_does_not_emit_deprecation_warning to pin.

johnnygreco — no regression test for the model_validate deserialization path. Added test_model_config_provider_none_via_model_validate_emits_deprecation_warning.

Test hygiene (Greptile notes, addressed)

  • Wrapped test_resolve_model_provider_registry_with_explicit_default, test_get_provider, and test_init_user_supplied_providers_preserve_first_wins_over_yaml_default in pytest.warns so the suite stays green under -W error::DeprecationWarning.
  • Updated test_load_exists and test_save to omit default= from the roundtrip stub, since they weren't testing the default field — keeps these tests clear of the deprecated path while leaving test_load_with_yaml_default_emits_deprecation_warning to pin the warning.

Take-it-or-leave-it — left

johnnygreco — Field(deprecated=True) for tooling integration. Left as-is. Field(deprecated=True) fires on attribute access, which would warn every time internal callers (e.g. get_default_provider_name) read self.default — false positives for code paths that aren't deprecated by themselves, only the user setting them is. The validator-with-model_fields_set distinction is the right shape for this deprecation. Happy to revisit if there's a tooling-discoverability story I'm missing.

Greptile — "warning storm" from get_default_model_configs() over legacy YAML entries. Naturally deduped now: with warn_at_caller attributing to the call site (default_model_settings.py:71), all N legacy entries dedupe to the same (filename, lineno) key. So a user with five provider-less entries sees one warning, not five — without a special-case dedup pass. Confirmed in make test output (single warning per call-site, multiple test cases).

johnnygreco — "two distinct nudges" debate on the cascade. I chose dedup (one warning per user action) over two separate nudges. The YAML-default warning already explains the migration; firing the registry-level deprecation immediately afterward for the same root cause was confusing without adding information. Easy to flip if the team prefers the louder option.

Test counts

make test: 3,114 passed (540 config + 1,921 engine + 653 interface). 106 deprecation warnings emitted as expected — all attributed to user call sites under the new helper.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 1, 2026

The implementation in 17a48acc looks clean across all the changes. Quick rundown:

P1 (provider_repository.py): Correctly split into two separate try/except blocks — load_config_file has its own catch, and the deprecation warn fires unconditionally between it and the model_validate catch. The filterwarnings("error") path now works correctly.

P2 (warn_at_caller): The stack-walk approach is the right tool here. Skipping frames 0+1 then walking past pydantic.* modules to warn_explicit with the user frame's __warningregistry__ gives correct attribution and per-call-site dedup. The rationale for omitting module_globals (avoiding BuiltinImporter failures for python -c scripts) is sound.

Cascade dedup (data_designer.py): The catch_warnings() scoping is tight and conditional — only adds the "ignore" filter when default_provider_name is not None, so the no-YAML-default path remains unaffected. Message-prefix matching on "ModelProviderRegistry.default is deprecated" is stable enough since it's internal.

_warn_on_explicit_default tightening: "default" in self.model_fields_set and self.default is not None is the right predicate. default=None is semantically "opt out of a registry default" and shouldn't warn.

resolve_model_provider_registry single-provider path: Avoids passing default= in the one-provider case, keeping the common construction path silent. Multi-provider still passes default and warns as expected.

One minor observation: warn_at_caller checks module_name.startswith("pydantic") to skip internal frames. This works for standard pydantic dispatch but wouldn't catch frames from annotated_types or other pydantic-adjacent packages if they appear in the chain. In practice this isn't a problem given the current validator structure, but worth keeping in mind if the validator depth changes.

Everything else — dedup via warn_at_caller naturally collapsing N legacy YAML entries to one warning, the default=None regression test, the model_validate deserialization path test, and the pytest.warns wrapping for -W error hygiene — all look correct. The "take-it-or-leave-it" justifications are solid. LGTM.

Tip: You can customize Greptile's behavior for this repo with .greptile/rules.md and .greptile/config.json.

@johnnygreco
Copy link
Copy Markdown
Contributor

No blockers from me — both Greptile points are addressed correctly and the warn_at_caller helper is a clean fix for the pydantic-stacklevel problem. Happy to ship as-is.

A couple of small follow-ups (do them here or in a fast-follow, your call):

  1. Add the missing test for the tightened predicate. The commit message mentions test_explicit_default_none_does_not_emit_deprecation_warning, but it's not in tests/engine/test_model_provider.py — only the field-default case (test_no_default_does_not_emit_deprecation_warning) is pinned. The new self.default is not None clause is currently untested. One-liner:

    def test_explicit_default_none_does_not_emit_deprecation_warning(stub_foo_provider):
        with warnings.catch_warnings():
            warnings.simplefilter(\"error\", DeprecationWarning)
            ModelProviderRegistry(providers=[stub_foo_provider], default=None)
  2. Call out the ProviderRepository.load() behavior change in the PR description. Moving the warn outside the catch-all is the right fix, but it does mean that under filterwarnings(\"error\", DeprecationWarning) load() now raises instead of silently returning None. That's the correct semantics, but worth flagging for anyone reading the changelog.

Smaller nits, only worth chasing if you're already in the file:

  • warn_at_caller matches module_name.startswith(\"pydantic\"), which would also skip a user module named e.g. pydantic_helpers.py. Tightening to == \"pydantic\" / startswith(\"pydantic.\") (+ the pydantic_core variant) would be safer.
  • The catch_warnings() dedup in DataDesigner.__init__ is a fine band-aid, but the underlying issue is that resolve_model_provider_registry synthesizes default= for the multi-provider case and trips the library's own deprecation. Worth a follow-up issue once the registry no longer requires default for multi-provider — at that point the suppression can come out.

"""
default = _get_default_providers_file_content(MODEL_PROVIDERS_FILE_PATH).get("default")
if default is not None:
warnings.warn(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

follow-up to johnnygreco's warn_at_caller work: this site still uses warnings.warn(stacklevel=2), so on the only real call path (DataDesigner.__init__:162) the warning is attributed to the data_designer library, not user code. python's default filter is default::DeprecationWarning:__main__ + ignore::DeprecationWarning, so library-attributed deprecations get silenced — verified empirically: a normal DataDesigner() call with a YAML default: set shows nothing under default filters. could either fire the warning from the __init__ boundary, or call warn_at_caller here too (with a small skip-list extension for data_designer.). non-blocking but worth doing in the same cycle while the deprecation messaging is fresh.

frame = sys._getframe(2) if hasattr(sys, "_getframe") else None
while frame is not None:
module_name = frame.f_globals.get("__name__", "")
if not module_name.startswith("pydantic"):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to johnnygreco's nit about startswith("pydantic") matching pydantic_helpers.py, there's a related issue going the other direction: when a ModelConfig or ModelProviderRegistry is constructed inside a data_designer helper (e.g. config builders, YAML loaders, resolve_model_provider_registry), the first non-pydantic frame is data_designer code, not the user's call site. the warning gets stamped at the library and silenced under default DeprecationWarning filters. confirmed via repro: resolve_model_provider_registry([a, b]) ends up attributed to model_provider.py:108. extending the skip to data_designer. (or accepting caller-supplied prefixes) would close the gap. easy to add a regression test asserting warning.filename lands on the test file rather than a library module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants