Skip to content

Commit f73da19

Browse files
authored
feat(models): deprecate implicit default provider routing (#594)
* feat(models): deprecate implicit default provider routing Emit DeprecationWarning whenever the legacy "implicit default provider" path is exercised: `ModelConfig.provider=None`, the registry-level `ModelProviderRegistry.default`, the YAML `default:` key in `~/.data-designer/model_providers.yaml`, and the CLI's "Change default provider" workflow. `resolve_model_provider_registry` skips passing `default=` in the single-provider case so the common construction path stays quiet. Multi-provider registries still pass `default` (per `check_implicit_default`) and warn accordingly. Update docs, the package README, and test fixtures to specify `provider=` explicitly on every `ModelConfig`. New tests cover each warning entry point and pin the post-deprecation happy paths. Refs #589 Made-with: Cursor * fix(models): address PR #594 review feedback Greptile P1: ProviderRepository.load emitted its DeprecationWarning inside a `try/except Exception` block. Under `filterwarnings("error", DeprecationWarning)` the warn would raise, the except would swallow it, and `load()` would silently return None (losing the registry). Move the warn outside the catch-all so the strict-warning path no longer drops valid configs. Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and `_warn_on_explicit_default` use `stacklevel=2`, which lands inside pydantic v2's validator dispatch rather than at the user's `ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke both attribution (the source line was unhelpful) and Python's once-per-location dedup (every call collapsed to the same pydantic-internal key, suppressing all but the first warning). Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`, which walks past the helper, validator, and any pydantic frames to find the user's call site and emits via `warnings.warn_explicit` with the user frame's `__warningregistry__`. Keeps attribution accurate and dedup keyed on the user's (filename, lineno). johnnygreco: align the `provider_repository.py` warning copy with the sibling site in `default_model_settings.py` ("specify provider= explicitly on each ModelConfig instead") so both YAML-default warning sites give the same migration instruction. The previous wording pointed users at "ModelConfig entries" inside `model_providers.yaml`, where ModelConfig entries don't actually live. johnnygreco: dedup the cascade in `DataDesigner.__init__`. With `model_providers=None` and a YAML `default:`, the user previously saw two DeprecationWarnings for the same root cause — `get_default_provider_name()` warns about the YAML key, then `resolve_model_provider_registry(...)` re-warns from `_warn_on_explicit_default`. Suppress the registry-level duplicate in the YAML-fallback branch via `warnings.catch_warnings()` so users see exactly one warning per user action. johnnygreco: tighten `_warn_on_explicit_default` to fire only when `default is not None`. Passing `default=None` explicitly is semantically equivalent to omitting it (caller is opting *out* of a registry-level default), and shouldn't trigger the deprecation nudge. johnnygreco: add a `model_validate({...})` regression test for `ModelConfig` so the deserialization path (legacy on-disk configs) is pinned alongside the construction path. Tests: - Update `test_load_exists` and `test_save` to omit `default=` so the roundtrip stops exercising the deprecated YAML-default path unguarded (Greptile note). - Wrap `test_resolve_model_provider_registry_with_explicit_default`, `test_get_provider`, and `test_init_user_supplied_providers_preserve_first_wins_over_yaml_default` in `pytest.warns` so the suite stays green under `-W error::DeprecationWarning` (Greptile note). - Add `test_explicit_default_none_does_not_emit_deprecation_warning` to pin the tightened predicate. - Add `test_init_yaml_default_emits_single_deprecation_warning` to pin the cascade-dedup behavior. Refs #589 Made-with: Cursor * fix(models): make deprecation warnings visible under default filters andreatgretel (PR #594): the YAML-default warning in `get_default_provider_name` and the registry-default warning emitted from inside DataDesigner helpers were attributing to data_designer library frames, not user code. Python's default filter chain includes `ignore::DeprecationWarning`, so library-attributed entries are silenced — meaning a normal `DataDesigner()` call with a YAML `default:` set showed nothing, and `resolve_model_provider_registry` warnings were similarly invisible. Two related changes: 1. `warn_at_caller`: extend the default skip-list from `("pydantic",)` to `("pydantic", "pydantic_core", "data_designer")` so the walk escapes both pydantic's validator-dispatch frames and data_designer helper frames before attributing. Also tighten the prefix predicate to exact-or-dotted-prefix matching (`name == p or name.startswith(p + ".")`) so e.g. `pydantic_helpers` is not falsely matched as part of `pydantic` (johnnygreco nit). Allow callers to pass a custom `skip_prefixes` for flexibility. Drop the "skip frame 0+1 unconditionally" guard now that prefix matching covers it. 2. `get_default_provider_name`: switch from `warnings.warn(stacklevel=2)` to `warn_at_caller`. The previous stacklevel pointed into `default_model_settings.py`, which is a library file → silenced under default filters. Verified the fix empirically with `python -W default`: warning is now attributed to the user's call site and rendered. johnnygreco (PR #594): add the missing `test_explicit_default_none_does_not_emit_deprecation_warning` regression for the `self.default is not None` predicate landed in the prior round. Tests: - New `test_warning_helpers.py` pins prefix-matching precision (rejects `pydantic_helpers` / `data_designer_other`), default skip-list contents, attribution past skip-prefix frames, and per-call-site dedup behavior. - `test_get_default_provider_name_warning_attributes_to_user_frame` pins andreatgretel's repro for the YAML-default site. - `test_explicit_default_warning_attributes_to_user_frame` pins the multi-frame case: construction goes through `resolve_model_provider_registry`, so the walk has to escape both pydantic and data_designer before landing on the test file. - `test_explicit_default_none_does_not_emit_deprecation_warning` pins johnnygreco's predicate-tightening regression. 3,124 tests pass (540 config + 1,923 engine + 653 interface; +10 net from this round). Refs #589 Made-with: Cursor * fix(models): apply warn_at_caller to remaining deprecation sites greptile-apps (PR #594, r3189904028): `ProviderRepository.load`'s YAML-default `DeprecationWarning` was using `warnings.warn(stacklevel=2)`, which attributes to whichever data_designer frame called `load()` — controllers, services, list/reset commands, agent introspection. Every real call path lands on `data_designer.cli.*`, which falls under Python's default `ignore::DeprecationWarning` filter and is silenced. Audit found two more sites with the same problem: - `DatasetBuilder._resolve_async_compatibility` (`allow_resize` / issue #552) — was using `stacklevel=4` to walk past `_resolve_async_compatibility -> build/build_preview -> interface -> user`. Brittle: any added frame (decorator, async wrapping, the `try/except DeprecationWarning: raise` boundary) shifts attribution silently. The existing test passed only because it used `simplefilter("always") + record=True`, which records warnings regardless of attribution. - `ProviderController._handle_change_default` — was using `stacklevel=2`, which lands on the menu dispatcher in the same controller module. `print_warning` already shows the message visually, but programmatic observers (`pytest.warns`, `filterwarnings("error", ...)`) saw a library-attributed entry that default filters silenced. All three migrated to `warn_at_caller` (the helper from 247fa30) so attribution lands on the user's call site regardless of internal chain shape. `data_designer` is already in `DEFAULT_INTERNAL_PREFIXES`, so the walk escapes the entire library in one pass. Added attribution regression tests at each site asserting `warning.filename == __file__`. A future regression to `warnings.warn(stacklevel=N)` now fails CI instead of silently silencing the user-facing nudge: - `test_load_with_yaml_default_attributes_warning_to_caller` (test_provider_repository.py) - `test_resolve_async_compatibility` extended with the same assertion - `test_handle_change_default_emits_deprecation_warning` rewritten from `pytest.warns(...)` to a `catch_warnings(record=True)` block that filters for the message and asserts `filename == __file__` (`pytest.warns` does not check attribution, so the rewrite is required to actually catch the regression). 3,125 tests pass (548 config + 1,923 engine + 654 interface). Refs #589
1 parent 8fb1320 commit f73da19

26 files changed

Lines changed: 753 additions & 25 deletions

docs/concepts/architecture-and-performance.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -161,6 +161,7 @@ import data_designer.config as dd
161161
model = dd.ModelConfig(
162162
alias="my-model",
163163
model="nvidia/nemotron-3-nano-30b-a3b",
164+
provider="nvidia",
164165
inference_parameters=dd.ChatCompletionInferenceParams(
165166
max_parallel_requests=8,
166167
),

docs/concepts/models/configure-model-settings-with-the-cli.md

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,12 @@ data-designer config providers
6767

6868
**Change default provider**: Set which provider is used by default. This option is only available when multiple providers are configured.
6969

70+
!!! warning "Deprecated: 'Change default provider' workflow"
71+
The "Change default provider" workflow is **deprecated** and will be removed in a future
72+
release alongside the registry-level default. Specify `provider=` explicitly on each
73+
`ModelConfig` instead — the workflow now emits a `DeprecationWarning` when entered.
74+
See [issue #589](https://github.com/NVIDIA-NeMo/DataDesigner/issues/589).
75+
7076
## Managing Model Configurations
7177

7278
Run the interactive model configuration command:
@@ -111,7 +117,7 @@ data-designer config list
111117
This command displays:
112118

113119
- **Model Providers**: All configured providers with their endpoints (API keys are masked)
114-
- **Default Provider**: The currently selected default provider
120+
- **Default Provider**: The currently selected default provider _(deprecated; see issue #589)_
115121
- **Model Configurations**: All configured models with their settings
116122

117123
## Resetting Configurations

docs/concepts/models/custom-model-settings.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -90,6 +90,13 @@ preview_result.display_sample_record()
9090
!!! note "Default Providers Always Available"
9191
When you only specify `model_configs`, the default model providers (NVIDIA, OpenAI, and OpenRouter) are still available. You only need to create custom providers if you want to connect to different endpoints or modify provider settings.
9292

93+
!!! warning "Always specify `provider=` on `ModelConfig`"
94+
Leaving `provider` unset (or passing `provider=None`) on `ModelConfig` is **deprecated**.
95+
The legacy "implicit default provider" routing — used when `provider` is omitted — emits
96+
a `DeprecationWarning` and will be removed in a future release. Always reference the
97+
intended provider by name, as the examples below do. See
98+
[issue #589](https://github.com/NVIDIA-NeMo/DataDesigner/issues/589).
99+
93100
!!! tip "Mixing Custom and Default Models"
94101
When you provide custom `model_configs` to `DataDesignerConfigBuilder`, they **replace** the defaults entirely. To use custom model configs in addition to the default configs, use the add_model_config method:
95102

docs/concepts/models/default-model-settings.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,13 @@ Both methods operate on the same files, ensuring consistency across your entire
107107
!!! warning "API Key Requirements"
108108
While default model configurations are always available, you need to set the appropriate API key environment variable (`NVIDIA_API_KEY`, `OPENAI_API_KEY`, or `OPENROUTER_API_KEY`) to actually use the corresponding models for data generation. Without a valid API key, any attempt to generate data using that provider's models will fail.
109109

110+
!!! warning "Deprecated: implicit default provider routing"
111+
The `default:` key in `~/.data-designer/model_providers.yaml` and the registry-level
112+
"default provider" concept are **deprecated** and will be removed in a future release.
113+
Specify `provider=` explicitly on every `ModelConfig` instead — the built-in defaults
114+
above already do this, and a `DeprecationWarning` is now emitted whenever the legacy
115+
routing is exercised. See [issue #589](https://github.com/NVIDIA-NeMo/DataDesigner/issues/589).
116+
110117
!!! tip "Environment Variables"
111118
Store your API keys in environment variables rather than hardcoding them in your scripts:
112119

docs/concepts/models/inference-parameters.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -167,6 +167,7 @@ dd.ModelConfig(
167167
dd.ModelConfig(
168168
alias="dalle",
169169
model="dall-e-3",
170+
provider="openai",
170171
inference_parameters=dd.ImageInferenceParams(
171172
extra_body={"size": "1024x1024", "quality": "hd"}
172173
),

docs/concepts/models/model-providers.md

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,14 @@ Model providers are external services that host and serve models. Data Designer
66

77
A `ModelProvider` defines how Data Designer connects to a provider's API endpoint. When you create a `ModelConfig`, you reference a provider by name, and Data Designer uses that provider's settings to make API calls to the appropriate endpoint.
88

9+
!!! warning "Deprecated: implicit default provider routing"
10+
Earlier versions of Data Designer let you omit `provider=` on `ModelConfig` and
11+
fall back to a registry-level default — including the `default:` key in
12+
`~/.data-designer/model_providers.yaml`. That implicit routing is **deprecated**
13+
and will be removed in a future release. Always reference a provider by name on
14+
every `ModelConfig`. A `DeprecationWarning` is now emitted when the legacy path
15+
is exercised. See [issue #589](https://github.com/NVIDIA-NeMo/DataDesigner/issues/589).
16+
917
## ModelProvider Configuration
1018

1119
The `ModelProvider` class has the following fields:

packages/data-designer-config/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ config_builder = dd.DataDesignerConfigBuilder(
2121
dd.ModelConfig(
2222
alias="my-model",
2323
model="nvidia/nemotron-3-nano-30b-a3b",
24+
provider="nvidia",
2425
inference_parameters=dd.ChatCompletionInferenceParams(temperature=0.7),
2526
),
2627
]

packages/data-designer-config/src/data_designer/config/default_model_settings.py

Lines changed: 23 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@
2424
PREDEFINED_PROVIDERS_MODEL_MAP,
2525
)
2626
from data_designer.config.utils.io_helpers import load_config_file, save_config_file
27+
from data_designer.config.utils.warning_helpers import warn_at_caller
2728

2829
logger = logging.getLogger(__name__)
2930

@@ -95,7 +96,28 @@ def get_default_providers() -> list[ModelProvider]:
9596

9697

9798
def get_default_provider_name() -> str | None:
98-
return _get_default_providers_file_content(MODEL_PROVIDERS_FILE_PATH).get("default")
99+
"""Return the YAML's ``default:`` provider name, if set.
100+
101+
Deprecated: this function and the underlying YAML key are deprecated and
102+
will be removed in a future release. Specify ``provider=`` explicitly on
103+
each ``ModelConfig`` instead. See issue #589.
104+
"""
105+
default = _get_default_providers_file_content(MODEL_PROVIDERS_FILE_PATH).get("default")
106+
if default is not None:
107+
# ``warn_at_caller`` (rather than ``warnings.warn(stacklevel=2)``) so the
108+
# warning attributes to the user's call site rather than this library
109+
# module. The only real call path is ``DataDesigner.__init__``, which
110+
# is itself a ``data_designer`` frame; under default Python filters,
111+
# library-attributed ``DeprecationWarning`` entries are silenced
112+
# (``ignore::DeprecationWarning``), so library attribution = invisible
113+
# warning. See PR #594 review.
114+
warn_at_caller(
115+
f"The 'default:' key in {MODEL_PROVIDERS_FILE_PATH} is deprecated and will "
116+
"be removed in a future release. Remove it and specify provider= explicitly "
117+
"on each ModelConfig instead. See issue #589.",
118+
DeprecationWarning,
119+
)
120+
return default
99121

100122

101123
def resolve_seed_default_model_settings() -> None:

packages/data-designer-config/src/data_designer/config/models.py

Lines changed: 21 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@
3131
load_image_path_to_base64,
3232
)
3333
from data_designer.config.utils.io_helpers import smart_load_yaml
34+
from data_designer.config.utils.warning_helpers import warn_at_caller
3435

3536
logger = logging.getLogger(__name__)
3637

@@ -503,7 +504,10 @@ class ModelConfig(ConfigBase):
503504
model: Model identifier (e.g., from build.nvidia.com or other providers).
504505
inference_parameters: Inference parameters for the model (temperature, top_p, max_tokens, etc.).
505506
The generation_type is determined by the type of inference_parameters.
506-
provider: Optional model provider name if using custom providers.
507+
provider: Name of the model provider. Required in a future release. Leaving
508+
``provider`` unset (or ``None``) currently routes through the registry's
509+
implicit default and is **deprecated**; specify ``provider=`` explicitly.
510+
See issue #589.
507511
skip_health_check: Whether to skip the health check for this model. Defaults to False.
508512
"""
509513

@@ -535,6 +539,22 @@ def _convert_inference_parameters(cls, value: Any) -> Any:
535539
return ChatCompletionInferenceParams(**value)
536540
return value
537541

542+
@model_validator(mode="after")
543+
def _warn_on_implicit_provider(self) -> Self:
544+
if self.provider is None:
545+
# Use ``warn_at_caller`` so the warning is attributed to the user's
546+
# ``ModelConfig(...)`` / ``model_validate(...)`` call rather than a
547+
# pydantic-internal frame. Without this, every call dedupes to the
548+
# same pydantic line and only the first emission is shown. See
549+
# PR #594 review.
550+
warn_at_caller(
551+
f"ModelConfig.provider=None is deprecated and will be required in a future release. "
552+
f"Specify provider= explicitly on ModelConfig(alias={self.alias!r}, ...). "
553+
"See issue #589.",
554+
DeprecationWarning,
555+
)
556+
return self
557+
538558

539559
class ModelProvider(ConfigBase):
540560
"""Configuration for a custom model provider.

packages/data-designer-config/src/data_designer/config/testing/fixtures.py

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -139,6 +139,7 @@ def stub_model_configs() -> list[ModelConfig]:
139139
ModelConfig(
140140
alias="stub-model",
141141
model="stub-model",
142+
provider="provider-1",
142143
inference_parameters=ChatCompletionInferenceParams(
143144
temperature=0.9,
144145
top_p=0.9,

0 commit comments

Comments
 (0)