feat(models): deprecate implicit default provider routing #594
Emit DeprecationWarning whenever the legacy "implicit default provider" path is exercised: `ModelConfig.provider=None`, the registry-level `ModelProviderRegistry.default`, the YAML `default:` key in `~/.data-designer/model_providers.yaml`, and the CLI's "Change default provider" workflow. `resolve_model_provider_registry` skips passing `default=` in the single-provider case so the common construction path stays quiet. Multi-provider registries still pass `default` (per `check_implicit_default`) and warn accordingly. Update docs, the package README, and test fixtures to specify `provider=` explicitly on every `ModelConfig`. New tests cover each warning entry point and pin the post-deprecation happy paths. Refs #589 Made-with: Cursor
Docs preview: https://bd23c6e8.dd-docs-preview.pages.dev
Review: PR #594 —
Greptile Summary
This PR deprecates the implicit default provider routing across all four entry points:
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/config/utils/warning_helpers.py | New helper warn_at_caller correctly walks the call stack past pydantic-internal frames to attribute warnings to user call-sites; fallback path handles environments without sys._getframe |
| packages/data-designer-engine/src/data_designer/engine/model_provider.py | _warn_on_explicit_default correctly guards with model_fields_set to distinguish explicit default= from the field's implicit None; single-provider fast-path in resolve_model_provider_registry correctly avoids passing default= to keep common construction quiet |
| packages/data-designer-config/src/data_designer/config/models.py | _warn_on_implicit_provider post-validator correctly fires on provider=None construction and model_validate; warn_at_caller replaces the previously inadequate stacklevel=2 approach |
| packages/data-designer/src/data_designer/interface/data_designer.py | Correctly suppresses the registry-level DeprecationWarning inside resolve_model_provider_registry when the YAML-default warning has already fired, preventing a confusing duplicate-warning cascade for the same root cause |
| packages/data-designer/src/data_designer/cli/repositories/provider_repository.py | Correctly places the DeprecationWarning outside both try/except blocks so it isn't swallowed under filterwarnings("error"); local ModelProviderRegistry is a thin Pydantic model with no validators, avoiding a double-warning chain |
| packages/data-designer/src/data_designer/cli/controllers/provider_controller.py | Emits both a console print_warning and a DeprecationWarning when the deprecated "Change default provider" workflow is entered; no logic issues |
| packages/data-designer-config/src/data_designer/config/default_model_settings.py | get_default_provider_name now emits DeprecationWarning only when the YAML default: key is set (non-None); stacklevel=2 correctly points to the caller of this function |
| packages/data-designer-engine/tests/engine/test_model_provider.py | Comprehensive regression coverage for all four warning entry points plus "stays quiet" happy-path pins using warnings.simplefilter("error") |
Sequence Diagram
sequenceDiagram
participant User as User Code
participant MC as ModelConfig(provider=None)
participant PY as Pydantic Internals
participant VAL as _warn_on_implicit_provider
participant WH as warn_at_caller
participant W as warnings module
User->>MC: ModelConfig(alias=..., model=...)
MC->>PY: pydantic validation
PY->>VAL: model_validator(mode=after)
VAL->>WH: warn_at_caller(msg, DeprecationWarning)
WH->>WH: sys._getframe(2) walk past pydantic frames
WH->>W: warn_explicit(msg, file=user_file, lineno=user_line)
W-->>User: DeprecationWarning attributed to user call-site
Note over User,W: Registry-level path
participant RI as resolve_model_provider_registry
participant MPR as ModelProviderRegistry
participant VALD as _warn_on_explicit_default
participant DD as DataDesigner.__init__
User->>DD: DataDesigner() with YAML default
DD->>DD: get_default_provider_name() warns YAML default deprecated
DD->>DD: catch_warnings() suppress ModelProviderRegistry.default warning
DD->>RI: resolve_model_provider_registry(providers, default)
RI->>MPR: ModelProviderRegistry(providers, default=X)
MPR->>PY: pydantic validation
PY->>VALD: _warn_on_explicit_default suppressed by catch_warnings
VALD-->>DD: suppressed YAML warning already fired
Reviews (3): Last reviewed commit: "Merge branch 'main' into nmulepati/refac..."
Thanks for putting this together, @nabinchha. The four entry points are mapped cleanly to issue #589 and the regression tests pin each one. I had a few thoughts after a careful read with edge cases in mind.
Summary
This PR lands the deprecation phase for the implicit-default-provider concept tracked in #589:
Findings
Warnings — Worth addressing
Suggestions — Take it or leave it
What Looks Good
Verdict
Needs changes: Greptile's two findings (the swallowed-warning bug and the
This review was generated by an AI assistant.
Greptile P1: ProviderRepository.load emitted its DeprecationWarning
inside a `try/except Exception` block. Under
`filterwarnings("error", DeprecationWarning)` the warn would raise,
the except would swallow it, and `load()` would silently return None
(losing the registry). Move the warn outside the catch-all so the
strict-warning path no longer drops valid configs.
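A minimal standard-library sketch of the failure mode described above. `load_swallowing`, `load_fixed`, and `run_strict` are hypothetical stand-ins, not the real `ProviderRepository.load`. The only fact relied on is that `DeprecationWarning` is an `Exception` subclass, so once a filter escalates it, a catch-all `except` treats it like any other load failure.

```python
import warnings
from typing import Callable, Optional, Tuple


def load_swallowing(config: dict) -> Optional[dict]:
    """Buggy shape: the warn sits inside the catch-all."""
    try:
        if config.get("default") is not None:
            warnings.warn("'default' is deprecated", DeprecationWarning, stacklevel=2)
        return config
    except Exception:
        # An escalated DeprecationWarning lands here and the config is lost.
        return None


def load_fixed(config: dict) -> Optional[dict]:
    """Fixed shape: only the parsing is guarded; the warn happens afterwards."""
    try:
        parsed = dict(config)
    except Exception:
        return None
    if parsed.get("default") is not None:
        warnings.warn("'default' is deprecated", DeprecationWarning, stacklevel=2)
    return parsed


def run_strict(loader: Callable[[dict], Optional[dict]], config: dict) -> Tuple[str, Optional[dict]]:
    """Call loader with DeprecationWarning escalated to an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        try:
            return ("returned", loader(config))
        except DeprecationWarning:
            return ("raised", None)
```

With a `default` key present, `run_strict(load_swallowing, ...)` quietly returns `None`, while `run_strict(load_fixed, ...)` lets the `DeprecationWarning` propagate to the caller, which is what a strict-warning run is supposed to surface.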
Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and
`_warn_on_explicit_default` used `stacklevel=2`, which landed inside
pydantic v2's validator dispatch rather than at the user's
`ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke
both attribution (the source line was unhelpful) and Python's
once-per-location dedup (every call collapsed to the same
pydantic-internal key, suppressing all but the first warning).
Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`,
which walks past the helper, validator, and any pydantic frames to
find the user's call site and emits via `warnings.warn_explicit` with
the user frame's `__warningregistry__`. Keeps attribution accurate
and dedup keyed on the user's (filename, lineno).
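The frame walk can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `warning_helpers.py`: the name `warn_at_caller_sketch`, the `skip_prefixes` parameter, and the simulated `fake_lib` module are all hypothetical, and the walk starts at frame 1 because the sketch has no separate validator frame to step over.

```python
import sys
import warnings


def warn_at_caller_sketch(message, category, skip_prefixes=("pydantic",)):
    """Attribute a warning to the first calling frame outside skip_prefixes."""
    # Frame 0 is this helper, so the walk starts at its direct caller.
    frame = sys._getframe(1) if hasattr(sys, "_getframe") else None
    while frame is not None:
        module_name = frame.f_globals.get("__name__", "")
        if not module_name.startswith(skip_prefixes):
            break
        frame = frame.f_back
    if frame is None:
        # No usable frame (or no sys._getframe): fall back to a plain warn.
        warnings.warn(message, category, stacklevel=2)
        return
    warnings.warn_explicit(
        message,
        category,
        filename=frame.f_code.co_filename,
        lineno=frame.f_lineno,
        module=frame.f_globals.get("__name__", "<unknown>"),
        # The caller's own registry keys once-per-location dedup on the caller.
        registry=frame.f_globals.setdefault("__warningregistry__", {}),
    )


# Simulate library dispatch: a function whose globals report a pydantic-like name.
fake_lib = {"__name__": "pydantic._fake_dispatch", "emit": warn_at_caller_sketch}
exec(compile("def dispatch(msg, cat):\n    emit(msg, cat)\n", "<fake_pydantic>", "exec"), fake_lib)


def user_code():
    fake_lib["dispatch"]("provider=None is deprecated", DeprecationWarning)
```

Calling `user_code()` produces a warning whose `filename` and `lineno` point at the line inside `user_code`, not at the simulated dispatch frame, so two different call sites would get two distinct dedup keys.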
johnnygreco: align the `provider_repository.py` warning copy with the
sibling site in `default_model_settings.py` ("specify provider=
explicitly on each ModelConfig instead") so both YAML-default warning
sites give the same migration instruction. The previous wording
pointed users at "ModelConfig entries" inside `model_providers.yaml`,
where ModelConfig entries don't actually live.
johnnygreco: dedup the cascade in `DataDesigner.__init__`. With
`model_providers=None` and a YAML `default:`, the user previously saw
two DeprecationWarnings for the same root cause —
`get_default_provider_name()` warns about the YAML key, then
`resolve_model_provider_registry(...)` re-warns from
`_warn_on_explicit_default`. Suppress the registry-level duplicate in
the YAML-fallback branch via `warnings.catch_warnings()` so users see
exactly one warning per user action.
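A rough sketch of the cascade dedup, assuming simplified stand-ins for the real functions. `get_default_name`, `build_registry`, and `init_with_yaml_fallback` are hypothetical; they only mirror the order of operations described above: warn once about the YAML key, then silence the registry-level warning that has the same root cause.

```python
import warnings


def get_default_name(yaml_default):
    """Stand-in for the YAML lookup: warns when a default key is present."""
    if yaml_default is not None:
        warnings.warn("YAML 'default:' is deprecated", DeprecationWarning, stacklevel=2)
    return yaml_default


def build_registry(providers, default=None):
    """Stand-in for the registry constructor: warns on a non-None default."""
    if default is not None:
        warnings.warn("registry-level default is deprecated", DeprecationWarning, stacklevel=2)
    return {"providers": list(providers), "default": default}


def init_with_yaml_fallback(providers, yaml_default):
    default = get_default_name(yaml_default)  # first (and only visible) warning
    if default is None:
        return build_registry(providers)
    # Same root cause as the warning above, so suppress the duplicate.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        return build_registry(providers, default=default)
```

One trade-off worth keeping in mind: `warnings.catch_warnings()` mutates process-wide filter state and is documented as not thread-safe, so the suppression window should stay as narrow as it is here.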
johnnygreco: tighten `_warn_on_explicit_default` to fire only when
`default is not None`. Passing `default=None` explicitly is
semantically equivalent to omitting it (caller is opting *out* of a
registry-level default), and shouldn't trigger the deprecation
nudge.
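The tightened predicate can be written out as a standalone function. This is only a sketch: `should_warn_on_default` is a hypothetical name, and `fields_set` stands in for pydantic's `model_fields_set`, which the real validator reads from the model instance.

```python
from typing import Optional, Set


def should_warn_on_default(fields_set: Set[str], default: Optional[str]) -> bool:
    """Warn only when the caller explicitly passed a non-None default."""
    return "default" in fields_set and default is not None
```

The three cases then separate cleanly: omitting `default` and passing `default=None` both stay quiet, and only an explicit non-`None` value triggers the nudge.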
johnnygreco: add a `model_validate({...})` regression test for
`ModelConfig` so the deserialization path (legacy on-disk configs)
is pinned alongside the construction path.
Tests:
- Update `test_load_exists` and `test_save` to omit `default=` so the
roundtrip stops exercising the deprecated YAML-default path
unguarded (Greptile note).
- Wrap `test_resolve_model_provider_registry_with_explicit_default`,
`test_get_provider`, and
`test_init_user_supplied_providers_preserve_first_wins_over_yaml_default`
in `pytest.warns` so the suite stays green under
`-W error::DeprecationWarning` (Greptile note).
- Add `test_explicit_default_none_does_not_emit_deprecation_warning`
to pin the tightened predicate.
- Add `test_init_yaml_default_emits_single_deprecation_warning` to
pin the cascade-dedup behavior.
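The two test shapes mentioned above ("warns" and "stays quiet") can be approximated without pytest. The sketch below assumes a hypothetical `make_registry` stand-in and uses `warnings.catch_warnings` directly; in the real suite `pytest.warns` plays the role of the recording context manager.

```python
import warnings


def make_registry(providers, default=None):
    """Hypothetical stand-in for the registry constructor."""
    if default is not None:
        warnings.warn("registry-level default is deprecated", DeprecationWarning, stacklevel=2)
    return {"providers": list(providers), "default": default}


def test_explicit_default_warns():
    # Record everything, then assert exactly one DeprecationWarning fired.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        make_registry(["a", "b"], default="a")
    assert [w.category for w in caught] == [DeprecationWarning]


def test_default_none_stays_quiet():
    # Escalate to errors: the call raises if any DeprecationWarning fires.
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        registry = make_registry(["a"], default=None)
    assert registry["default"] is None
```

The "stays quiet" form is the stricter of the two: it fails loudly on any unexpected deprecation instead of relying on an empty recorded list.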
Refs #589
Made-with: Cursor
Thanks for the careful reads, @greptile-apps and @johnnygreco. Pushed
Blockers (P1/P2): fixed
Greptile P1 / johnnygreco:
Greptile P2 / johnnygreco: pydantic validator stacklevel. Introduced
Worth addressing: fixed
johnnygreco:
johnnygreco: cascade of two warnings on a single
Take-it-or-leave-it: taken
johnnygreco:
johnnygreco: no regression test for the
Test hygiene (Greptile notes, addressed)
Take-it-or-leave-it: left
johnnygreco:
Greptile: "warning storm" from
johnnygreco: "two distinct nudges" debate on the cascade. I chose dedup (one warning per user action) over two separate nudges. The YAML-default warning already explains the migration; firing the registry-level deprecation immediately afterward for the same root cause was confusing without adding information. Easy to flip if the team prefers the louder option.
Test counts
The implementation in P1 ( P2 ( Cascade dedup (
One minor observation:
Everything else, dedup via
No blockers from me: both Greptile points are addressed correctly and the
A couple of small follow-ups (do them here or in a fast-follow, your call):
Smaller nits, only worth chasing if you're already in the file:
    """
    default = _get_default_providers_file_content(MODEL_PROVIDERS_FILE_PATH).get("default")
    if default is not None:
        warnings.warn(
follow-up to johnnygreco's warn_at_caller work: this site still uses warnings.warn(stacklevel=2), so on the only real call path (DataDesigner.__init__:162) the warning is attributed to the data_designer library, not user code. python's default filter is default::DeprecationWarning:__main__ + ignore::DeprecationWarning, so library-attributed deprecations get silenced — verified empirically: a normal DataDesigner() call with a YAML default: set shows nothing under default filters. could either fire the warning from the __init__ boundary, or call warn_at_caller here too (with a small skip-list extension for data_designer.). non-blocking but worth doing in the same cycle while the deprecation messaging is fresh.
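a small stdlib sketch of why library-attributed deprecations go quiet. it rebuilds only the two default filters named above (assumption: nothing else, such as pytest or `-W` flags, has changed the filter list), and `shown_under_default_filters` is a hypothetical helper, not something in the repo.

```python
import warnings


def shown_under_default_filters(module_name):
    """Return True if a DeprecationWarning attributed to module_name is displayed."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.resetwarnings()
        # filterwarnings() inserts at the front, so the later call wins on a match.
        warnings.filterwarnings("ignore", category=DeprecationWarning)
        warnings.filterwarnings("default", category=DeprecationWarning, module="__main__")
        warnings.warn_explicit(
            "deprecated",
            DeprecationWarning,
            filename="example.py",
            lineno=1,
            module=module_name,
            registry={},
        )
    return len(caught) == 1
```

a warning attributed to `__main__` is shown, while the same warning attributed to a module under `data_designer` is dropped, which matches the behaviour described in the comment.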
    frame = sys._getframe(2) if hasattr(sys, "_getframe") else None
    while frame is not None:
        module_name = frame.f_globals.get("__name__", "")
        if not module_name.startswith("pydantic"):
related to johnnygreco's nit about startswith("pydantic") matching pydantic_helpers.py: there's a similar gap going the other direction. when a ModelConfig or ModelProviderRegistry is constructed inside a data_designer helper (e.g. config builders, YAML loaders, resolve_model_provider_registry), the first non-pydantic frame is data_designer code, not the user's call site, so the warning gets stamped at the library and silenced under default DeprecationWarning filters. confirmed via repro: resolve_model_provider_registry([a, b]) ends up attributed to model_provider.py:108. extending the skip to data_designer. (or accepting caller-supplied prefixes) would close the gap. easy to add a regression test asserting warning.filename lands on the test file rather than a library module.
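one possible shape for the stricter prefix check, as a sketch only. `in_package` and `should_skip` are hypothetical names, and the default package tuple is an assumption about what the skip-list might contain.

```python
def in_package(module_name: str, package: str) -> bool:
    """True if module_name is the package itself or one of its submodules."""
    return module_name == package or module_name.startswith(package + ".")


def should_skip(module_name: str, packages=("pydantic", "data_designer")) -> bool:
    """True if the frame's module belongs to any package on the skip-list."""
    return any(in_package(module_name, pkg) for pkg in packages)
```

matching on the exact name or a dotted prefix keeps `pydantic.main` and `data_designer.engine.model_provider` on the skip-list without also catching an unrelated `pydantic_helpers` module.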
📋 Summary
Deprecates the legacy "implicit default provider" routing before it's removed in a future release. Every entry point that exercises the implicit default (`ModelConfig.provider=None`, the registry-level `ModelProviderRegistry.default`, the YAML `default:` key, and the CLI's "Change default provider" workflow) now emits a `DeprecationWarning` pointing users at the explicit `provider=` migration. Continues the work started in #591 / tracked under issue #589.
🔗 Related Issue
Refs #589
🔄 Changes
✨ Added
- `ModelConfig._warn_on_implicit_provider`: pydantic post-validator that warns whenever `provider` is `None` (`packages/data-designer-config/src/data_designer/config/models.py`)
- `ModelProviderRegistry._warn_on_explicit_default`: fires only when caller actually passed `default=` (uses `model_fields_set` so the field-default `None` path stays quiet) (`packages/data-designer-engine/src/data_designer/engine/model_provider.py`)
- `get_default_provider_name()`, `ProviderRepository.load`, and `ProviderController._handle_change_default`
🔧 Changed
- `resolve_model_provider_registry` skips passing `default=` in the single-provider case so the common construction path stays quiet under the new warning. Multi-provider registries still pass `default` (per `check_implicit_default`) and warn accordingly.
- `stub_model_configs` fixture and existing `ModelConfig`-constructing tests now pass `provider=` explicitly so they don't trip the new warning
- `ModelConfig.provider` and `ModelProviderRegistry.default` annotated as deprecated
📚 Docs
- `docs/concepts/models/model-providers.md`, `default-model-settings.md`, `custom-model-settings.md`, and `configure-model-settings-with-the-cli.md`
- `docs/concepts/architecture-and-performance.md`, `inference-parameters.md`, and the `data-designer-config` README updated to set `provider=` explicitly
🔍 Attention Areas
- `packages/data-designer-engine/src/data_designer/engine/model_provider.py`: `_warn_on_explicit_default` uses `model_fields_set` to distinguish "caller passed `default=`" from "field at default `None`". The single-provider `resolve_model_provider_registry` tweak relies on this distinction so common construction paths stay quiet. Worth a careful read.
- `packages/data-designer-config/src/data_designer/config/models.py`: `_warn_on_implicit_provider` runs at construction time, so any `ModelConfig` built without `provider=` (including legacy serialized configs loaded via `model_validate`) will now emit a warning. Confirm this is the intended blast radius.
🧪 Testing
- `make test` passes (3,112 tests: 539 config + 1,921 engine + 652 interface)
✅ Checklist