fix(inference): enable routing of models with provider_data alone #3928
     Merged
      
      
    Conversation
  
    
    
  
  
    
This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack.

Here's the situation: assume a remote inference provider which works only when users provide their own API keys via the `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias; we just don't outright fail when we are unable to look it up.

Also updated the inference router so that responses carry the _exact_ model that the request had.

## Test Plan

Added an integration test

Closes #3929
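To make the behavior concrete, here is a rough sketch of such a request against a locally running Stack (the endpoint path, port, and the credential field name inside the provider data are assumptions for illustration; only the `X-LlamaStack-Provider-Data` header and the `provider_id/model_id` form come from this PR):

```python
import json

import requests

# Hypothetical per-provider credential; the field name inside the provider-data
# JSON depends on the remote provider's config and is an assumption here.
provider_data = {"openai_api_key": "sk-..."}

resp = requests.post(
    # Assumed local Stack URL and OpenAI-compatible chat completions path.
    "http://localhost:8321/v1/openai/v1/chat/completions",
    headers={"X-LlamaStack-Provider-Data": json.dumps(provider_data)},
    json={
        # Fully qualified model ID: "provider_id/model_id". Even if this model was
        # never registered with the Stack, the prefix identifies the provider to
        # route to, and that provider decides whether it can serve the model.
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
resp.raise_for_status()
# With this PR, the response echoes back the exact model ID the request used.
print(resp.json()["model"])  # -> "openai/gpt-4o-mini"
```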
              
ehhuang reviewed Oct 28, 2025
            
              
ehhuang reviewed Oct 28, 2025
            
              
ehhuang approved these changes Oct 28, 2025
            
The telemetry module was moved from `llama_stack.apis.telemetry` to `llama_stack.core.telemetry` in PR llamastack#3919. This updates the import in the provider data routing test to use the new location.
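For reference, the change amounts to swapping the module path in the test's import, roughly like this (a sketch only; the comment shows the module move itself, since the exact symbols the test pulls in are not shown here):

```python
# Old location (pre llamastack#3919), kept as a comment for comparison:
# from llama_stack.apis import telemetry

# New location used by the provider data routing test after the module move:
from llama_stack.core import telemetry  # noqa: F401
```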
    
ashwinb added a commit that referenced this pull request Oct 28, 2025
fix(inference): enable routing of models with provider_data alone (#3928)

This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack.

Here's the situation: assume a remote inference provider which works only when users provide their own API keys via the `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias; we just don't outright fail when we are unable to look it up.

Also updated the inference router so that responses have the _exact_ model that the request had.

## Test Plan

Added an integration test

Closes #3929

---------

Co-authored-by: ehhuang <[email protected]>
    
ashwinb added a commit to ashwinb/llama-stack that referenced this pull request Oct 30, 2025
fix(inference): enable routing of models with provider_data alone (llamastack#3928)

This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack.

Here's the situation: assume a remote inference provider which works only when users provide their own API keys via the `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias; we just don't outright fail when we are unable to look it up.

Also updated the inference router so that responses have the _exact_ model that the request had.

Added an integration test

Closes llamastack#3929

---------

Co-authored-by: ehhuang <[email protected]>
  This was referenced Oct 30, 2025 
      
    
ashwinb added a commit that referenced this pull request Oct 31, 2025
## Summary

Cherry-picks 5 critical fixes from main to the release-0.3.x branch for the v0.3.1 release, plus CI workflow updates.

**Note**: This recreates the cherry-picks from the closed PR #3991, now targeting the renamed `release-0.3.x` branch (previously `release-0.3.x-maint`).

## Commits

1. **2c56a8560** - fix(context): prevent provider data leak between streaming requests (#3924)
   - **CRITICAL SECURITY FIX**: Prevents provider credentials from leaking between requests
   - Fixed import path for 0.3.0 compatibility
2. **ddd32b187** - fix(inference): enable routing of models with provider_data alone (#3928)
   - Enables routing for fully qualified model IDs with provider_data
   - Resolved merge conflicts, adapted for 0.3.0 structure
3. **f7c2973aa** - fix: Avoid BadRequestError due to invalid max_tokens (#3667)
   - Fixes failures with Gemini and other providers that reject max_tokens=0
   - Non-breaking API change
4. **d7f9da616** - fix(responses): sync conversation before yielding terminal events in streaming (#3888)
   - Ensures conversation sync executes even when streaming consumers break early
5. **0ffa8658b** - fix(logging): ensure logs go to stderr, loggers obey levels (#3885)
   - Fixes logging infrastructure
6. **75b49cb3c** - ci: support release branches and match client branch (#3990)
   - Updates CI workflows to support release-X.Y.x branches
   - Matches client branch from llama-stack-client-python for release testing
   - Fixes artifact name collisions

## Adaptations for 0.3.0

- Fixed import paths: `llama_stack.core.telemetry.tracing` → `llama_stack.providers.utils.telemetry.tracing`
- Fixed import paths: `llama_stack.core.telemetry.telemetry` → `llama_stack.apis.telemetry`
- Changed `self.telemetry_enabled` → `self.telemetry` (0.3.0 attribute name)
- Removed `rerank()` method that doesn't exist in 0.3.0

## Testing

All imports verified; tests should pass once CI is set up.
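As an illustration of the import-path adaptation listed above (the backport rewrites the paths directly; the guarded import below is just one way to show the mapping between the two layouts, not the approach used in the PR):

```python
# Illustration only: maps the main-branch telemetry layout to the 0.3.0 layout
# on the release-0.3.x branch, per the "Adaptations for 0.3.0" notes above.
try:
    # Layout on main (post llamastack#3919): telemetry lives under llama_stack.core
    from llama_stack.core.telemetry import tracing
except ImportError:
    # Layout on release-0.3.x (0.3.0): tracing is under providers.utils.telemetry
    from llama_stack.providers.utils.telemetry import tracing
```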
  
  
    
  
    