fix(inference): enable routing of models with provider_data alone #3928
     Merged
      
      
    Conversation
  
    
    
  
  
    
This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack.

Here's the situation: assume a remote inference provider which works only when users provide their own API keys via the `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias; we just don't outright fail when we are unable to look it up.

Also updated the inference router so that responses carry the _exact_ model that the request had.

## Test Plan

Added an integration test

Closes #3929
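To make the behavior concrete, here is a rough sketch of such a request against a locally running Stack (the endpoint path, port, and the credential field name inside the provider data are assumptions for illustration; only the `X-LlamaStack-Provider-Data` header and the `provider_id/model_id` form come from this PR):

```python
import json

import requests

# Hypothetical per-provider credential; the field name inside the provider-data
# JSON depends on the remote provider's config and is an assumption here.
provider_data = {"openai_api_key": "sk-..."}

resp = requests.post(
    # Assumed local Stack URL and OpenAI-compatible chat completions path.
    "http://localhost:8321/v1/openai/v1/chat/completions",
    headers={"X-LlamaStack-Provider-Data": json.dumps(provider_data)},
    json={
        # Fully qualified model ID: "provider_id/model_id". Even if this model was
        # never registered with the Stack, the prefix identifies the provider to
        # route to, and that provider decides whether it can serve the model.
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
# With this PR, the response echoes back the exact model ID the request used.
print(resp.json()["model"])  # -> "openai/gpt-4o-mini"
```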
              
ehhuang reviewed Oct 28, 2025
            
              
ehhuang reviewed Oct 28, 2025
            
              
ehhuang approved these changes Oct 28, 2025
            
The telemetry module was moved from `llama_stack.apis.telemetry` to `llama_stack.core.telemetry` in PR llamastack#3919. This updates the import in the provider data routing test to use the new location.
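For reference, the change amounts to swapping the module path in the test's import, roughly like this (a sketch only; the comment shows the module move itself, since the exact symbols the test pulls in are not shown here):

```python
# Old location (pre llamastack#3919), kept as a comment for comparison:
# from llama_stack.apis import telemetry

# New location used by the provider data routing test after the module move:
from llama_stack.core import telemetry  # noqa: F401
```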
    
ashwinb added a commit that referenced this pull request Oct 28, 2025
fix(inference): enable routing of models with provider_data alone (#3928)

This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack.

Here's the situation: assume a remote inference provider which works only when users provide their own API keys via the `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias; we just don't outright fail when we are unable to look it up.

Also updated the inference router so that responses have the _exact_ model that the request had.

## Test Plan

Added an integration test

Closes #3929

---------

Co-authored-by: ehhuang <[email protected]>
    
ashwinb added a commit to ashwinb/llama-stack that referenced this pull request Oct 30, 2025
fix(inference): enable routing of models with provider_data alone (llamastack#3928)

This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack.

Here's the situation: assume a remote inference provider which works only when users provide their own API keys via the `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias; we just don't outright fail when we are unable to look it up.

Also updated the inference router so that responses have the _exact_ model that the request had.

Added an integration test

Closes llamastack#3929

---------

Co-authored-by: ehhuang <[email protected]>
  This was referenced Oct 30, 2025 
      
    
ashwinb added a commit that referenced this pull request Oct 31, 2025
## Summary

Cherry-picks 5 critical fixes from main to the release-0.3.x branch for the v0.3.1 release, plus CI workflow updates.

**Note**: This recreates the cherry-picks from the closed PR #3991, now targeting the renamed `release-0.3.x` branch (previously `release-0.3.x-maint`).

## Commits

1. **2c56a8560** - fix(context): prevent provider data leak between streaming requests (#3924)
   - **CRITICAL SECURITY FIX**: Prevents provider credentials from leaking between requests
   - Fixed import path for 0.3.0 compatibility
2. **ddd32b187** - fix(inference): enable routing of models with provider_data alone (#3928)
   - Enables routing for fully qualified model IDs with provider_data
   - Resolved merge conflicts, adapted for 0.3.0 structure
3. **f7c2973aa** - fix: Avoid BadRequestError due to invalid max_tokens (#3667)
   - Fixes failures with Gemini and other providers that reject max_tokens=0
   - Non-breaking API change
4. **d7f9da616** - fix(responses): sync conversation before yielding terminal events in streaming (#3888)
   - Ensures conversation sync executes even when streaming consumers break early
5. **0ffa8658b** - fix(logging): ensure logs go to stderr, loggers obey levels (#3885)
   - Fixes logging infrastructure
6. **75b49cb3c** - ci: support release branches and match client branch (#3990)
   - Updates CI workflows to support release-X.Y.x branches
   - Matches client branch from llama-stack-client-python for release testing
   - Fixes artifact name collisions

## Adaptations for 0.3.0

- Fixed import paths: `llama_stack.core.telemetry.tracing` → `llama_stack.providers.utils.telemetry.tracing`
- Fixed import paths: `llama_stack.core.telemetry.telemetry` → `llama_stack.apis.telemetry`
- Changed `self.telemetry_enabled` → `self.telemetry` (0.3.0 attribute name)
- Removed `rerank()` method that doesn't exist in 0.3.0

## Testing

All imports verified; tests should pass once CI is set up.
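As an illustration of the import-path adaptation listed above (the backport rewrites the paths directly; the guarded import below is just one way to show the mapping between the two layouts, not the approach used in the PR):

```python
# Illustration only: maps the main-branch telemetry layout to the 0.3.0 layout
# on the release-0.3.x branch, per the "Adaptations for 0.3.0" notes above.
try:
    # Layout on main (post llamastack#3919): telemetry lives under llama_stack.core
    from llama_stack.core.telemetry import tracing
except ImportError:
    # Layout on release-0.3.x (0.3.0): tracing is under providers.utils.telemetry
    from llama_stack.providers.utils.telemetry import tracing
```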
  
  
    
  
    