How to deploy multiple models using an NVIDIA Dynamo graph of one architecture #3882

Nikhil-sarvam · 2025-10-24T18:54:26Z

For example, I'm using a vLLM backend with the agg_router.yaml deployment, which follows a specific architecture. As far as I know, the router in this setup is global. Now I need to deploy a second model with the same architecture on the same cluster. How can I create a separate deployment and routing mechanism so that requests are not misrouted to the other model? Also, how should the KGateway and EPP components be used to route requests to the correct model when multiple models are deployed?

Replies: 0 comments

Select a reply

Uh oh!

How to deploy multiple model using nvidia dynamo graph of one architecture #3882

Uh oh!

Nikhil-sarvam Oct 24, 2025

Replies: 0 comments

Nikhil-sarvam
Oct 24, 2025