# How to Author and Run Dynamo Components

**Status**: Under Review

**Authors**: Ishan Dhanani, Alec Flowers

**Category**: Architecture

**Required Reviewers**: Itay Neeman, Kyle Kranen, Mohammed Abdulwahhab, Maksim Khadkevich, Biswa Panda Rajan, Ryan McCormick

**Review Date**: 06/09/2025

**Implementation PR / Tracking Issue**: WIP

# Background: Running a component using dynamo-serve vs dynamo-run

Right now we have 2 different ways of running components that go through different code paths:

## dynamo-run

- **Purpose**: CLI tool for running OAI frontend/processor and python engines
- **Usage**: `dynamo run in=http out=vllm ~/models/Llama-3.2-3B`
- **Architecture**: Direct component execution - each engine runs as a subprocess
- **Control**: Explicit configuration via CLI flags

## dynamo-serve

- **Purpose**: Orchestrates inference graphs composed of multiple interdependent services
- **Usage**: `dynamo serve graphs.agg:Frontend -f ./configs/agg.yaml`
- **Architecture**: Service graph execution using circus process manager + BentoML decorators
- **Control**: High-level, via YAML configs, dependency/env-var injection, `depends()`, and `.link()`

## Current Hybrid Approach

The frontend actually uses `dynamo-run` internally to run the OpenAI-compatible HTTP server and Rust-based processor, while `dynamo-serve` manages the overall service orchestration.
**Reviewer:** What does it mean that the frontend uses dynamo-run?

**Author (@ishandhanani):** The frontend component of a graph runs `dynamo-run` as a subprocess in order to start the Rust server/processor. Here's an example: https://github.com/ai-dynamo/dynamo/blob/main/examples/sglang/components/frontend.py


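For illustration, here is a minimal sketch of that pattern. This is an assumed shape, not the actual `frontend.py` linked above:

```python
# Minimal sketch (assumption, not the real frontend.py): the Frontend
# component shells out to dynamo-run to start the Rust HTTP server/processor.
import subprocess


class Frontend:
    def __init__(self):
        # `in=http out=dyn` serves the OpenAI-compatible HTTP frontend and
        # routes requests to workers registered on the distributed runtime.
        self.process = subprocess.Popen(["dynamo-run", "in=http", "out=dyn"])

    def shutdown(self):
        self.process.terminate()
        self.process.wait()
```
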
## The Problem

Dynamo is meant to be a distributed, modular framework for building inference graphs with modern strategies like disaggregated serving and KV-aware routing. While `dynamo serve` provides a clean UX via `circusd` process management, we've created a problematic split:

**Historical Context**: Pre-GTC, our components demonstrated rust↔python interoperability and were runnable with `python3 component.py <flags>` - they were pythonic and easy to hack on.

**Current Issues**:

1. We maintain two separate sets of examples - [dynamo serve](https://github.com/ai-dynamo/dynamo/blob/main/examples/sglang/components/worker.py) and [dynamo run](https://github.com/ai-dynamo/dynamo/blob/main/launch/dynamo-run/src/subprocess/sglang_inc.py) - with duplicated logic and no code sharing.
2. Users must use `dynamo-serve` for all components - they can no longer run any example code with `python3 component.py <flags>`. This breaks the pythonic, hackable experience that made Dynamo accessible.
3. Decorators hide critical logic, making debugging nearly impossible. Runtime injection of `CUDA_VISIBLE_DEVICES`, namespace overrides in K8s, and other "magic" configurations surprise users with no clear error messages.
4. Large model deployment requires unintuitive flags and `.link` files.
5. We've hidden the rust↔python interoperability that differentiates Dynamo. Users can't see or interact with our core bindings (component, namespace, endpoint).

As we ramp up to production, fixing this UX split is critical. This proposal provides a path to keep `dynamo serve`'s developer experience while staying true to our Rust core and making components standalone and runnable.

## Principles

1. Each Dynamo component **MUST** be runnable using `python3 component.py <flags>`
**Reviewer:** Can we explain why this is critical? The explanation of "this is more Pythonic/hackable" isn't enough - what is a user trying to do for which they need this, what is the workflow that the lack of this is blocking?

**Author (@ishandhanani, Jun 10, 2025):** An example is running a single model split across 2 nodes. Let's use SGLang disaggregated serving as an example.

In order to run TP>8, you have to run the SGLang worker with `node_rank=0` on the head node and then `node_rank=1` on the child node. In the current paradigm, that means you have to do:

```bash
# node 1 - spin up the frontend, processor, and shard 1 of the prefill worker, but none of the decode workers
dynamo serve graphs.agg:Frontend -f configs/agg_head.yaml

# node 2 - spin up shard 2 of the prefill worker
dynamo serve graphs.agg:Frontend -f configs/agg_child.yaml --service-name SGLangWorker

# node 3 - spin up only shard 1 of the decode worker
dynamo serve graphs.disagg:Frontend -f configs/disagg_head.yaml --service-name SGLangDecodeWorker

# node 4 - spin up only shard 2 of the decode worker
dynamo serve graphs.disagg:Frontend -f configs/disagg_child.yaml --service-name SGLangDecodeWorker
```

Note that I am having to run the agg graph on the first 2 nodes, the disagg graph on the next 2, and specify `--service-name` on nodes 2 through 4. It would be great if I could just do:

```bash
# node 1
dynamo-run in=http out=dyn &
python3 prefill.py -f configs/disagg_head.yaml

# node 2
python3 prefill.py -f configs/disagg_child.yaml

# node 3
python3 decode.py -f configs/disagg_head.yaml

# node 4
python3 decode.py -f configs/disagg_child.yaml
```

**Author (@ishandhanani, Jun 10, 2025):** Going to use vllm_v1 to demonstrate the following. Another example is running each component on a different node. Say I have 4 H100 nodes and I am trying to run:

1. Frontend
2. Load Balancer
3. 2 PrefillWorkers
4. 2 DecodeWorkers

I'll try a different approach to the one above and instead try to just run each component separately without picking and choosing from some link graph. Here's what I have to do to spin things up on node 1:

```bash
# node 1 - spin up the CPU processes (frontend and load balancer) and prefill worker 1
# I have to keep specifying --service-name or else `depends` will force everything to spin up,
# unless I sed the `depends` out... bad UX
dynamo serve components.frontend:Frontend -f config.yaml --service-name Frontend
dynamo serve components.simple_load_balancer:SimpleLoadBalancer -f config.yaml --service-name SimpleLoadBalancer
dynamo serve components.worker:VllmPrefillWorker -f config.yaml
```

What I want to do on node 1:

```bash
dynamo run in=http out=dyn
python3 simple_load_balancer.py -f config.yaml
python3 prefill_worker.py -f config.yaml
```

**Reviewer:** Thanks for the details. On the first example, I am not sure I see a huge delta between the two cases. On the second, I can see some of the issues. In general, kind of odd to see config.yaml used in both cases; I'd expect the pure Python version to be just command line args.

Perhaps more importantly, the use case for this seems to be more for the "Dynamo developer" rather than a normal user? Not to say that isn't important, but just want to make sure we're talking about the same persona.

**Author (@ishandhanani, Jun 10, 2025):**

> On the first example, I am not sure I see a huge delta between the two cases.

I guess something like `dynamo serve graphs.agg:Frontend -f configs/agg_child.yaml --service-name SGLangWorker` is a bit unintuitive to me. Why am I accessing the entire graph and selecting the PrefillWorker service from it? Is it not easier just to run the prefill worker Python code?

Additionally, in order to run this full example across nodes, I have to run the agg graph and then the disagg graph in order to avoid the `depends` statement in the prefill worker. If I don't, the decode worker spins up and causes an error because it has no GPUs available.

> In general, kind of odd to see config.yaml used in both cases, I'd expect the pure Python version to be just command line args.

Yeah, CLI args also work. A YAML/JSON parser would be helpful because these frameworks sometimes require many, many args to get things spun up. But this might just be the Dynamo Developer persona talking :)

> Perhaps more importantly, the use case for this seems to be more for the "Dynamo developer", rather than a normal user?

This is an interesting thought, and it does bring up the question of who our ICP is for Dynamo. In my eyes, people who will be deploying our software at data center scale will want maximum flexibility to tinker with our framework (something I don't believe is present if I cannot run Python code with `python3`). Ping to @harryskim for viz on this discussion.

**Contributor:** I would think the form should be something like:

```bash
dynamo serve graph {optional config} --node-list node_list.yaml
```

If you are serving a graph and you need multiple nodes for that graph, that should be encapsulated in `serve`. Having to run the command on multiple nodes with different slices does seem to defeat the purpose of `dynamo serve`: `dynamo serve` serves a graph with the required resources, so by definition it should be able to handle multi-node graphs.

It doesn't today. And maybe, with things like microk8s, we don't need this intermediate state of serving a graph without k8s or Slurm.

**Author (@ishandhanani, Jun 10, 2025):** vLLM and SGLang both let you run disaggregated serving + KV routing using just raw Python in single/multinode settings. To be clear: if I can run `python3 -m sglang.launch_server ...` to run an SGLang prefill worker, why can't I do `python3 prefill_worker.py ...` in Dynamo? That is a degradation in UX to me.


2. `dynamo serve` **MUST** launch components using `python3 component.py <flags>` internally

3. We **MUST** expose dynamo core bindings (component, namespace, endpoint) directly in each component - no more hiding behind decorators

4. We **MUST** unify argument parsing across standalone and serve modes (similar to current `ServiceConfig` but shared)

5. We **MUST** maintain single-source components compatible with both `dynamo-run` and `dynamo serve`

6. We **MUST** provide transparent configuration - no runtime injection or environment variable magic

7. We **MUST** deliver clear error messages with actionable fix instructions instead of silent overrides (a sketch of the intended behavior follows this list)

8. We **SHOULD** enable seamless local → K8s deployment patterns matching industry standards (AIBrix, VLLM production stack, llmd)
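
To make principles 6 and 7 concrete, here is a minimal sketch of the intended behavior. The helper name and messages are illustrative assumptions, not existing Dynamo APIs:

```python
# Illustrative sketch of principles 6-7: respect explicit user configuration
# and fail loudly with an actionable message instead of silently overriding.
import os


def resolve_gpu_allocation(requested_gpus: int) -> str:
    """Return the CUDA_VISIBLE_DEVICES value a component should use."""
    user_setting = os.environ.get("CUDA_VISIBLE_DEVICES")
    if user_setting is not None:
        visible = user_setting.split(",")
        if len(visible) < requested_gpus:
            raise ValueError(
                f"CUDA_VISIBLE_DEVICES={user_setting} exposes {len(visible)} "
                f"GPU(s) but this component requests {requested_gpus}. "
                "Unset CUDA_VISIBLE_DEVICES or lower the 'gpu' resource."
            )
        return user_setting  # never silently override the user's choice
    # No user setting: allocate the first N devices, and say so out loud.
    allocation = ",".join(str(i) for i in range(requested_gpus))
    print(f"CUDA_VISIBLE_DEVICES not set; allocating GPUs {allocation}")
    return allocation
```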

## Non Goals

1. We **SHOULD NOT** deprecate `dynamo serve` - the orchestration UX remains valuable
**Reviewer:** I will contend that this is incorrect, or rather, that an explicit goal is that there is only one way to run a Dynamo "graph", and we cannot have both dynamo-run and dynamo serve.

**Author (@ishandhanani):** Agreed with you here. We've been using `dynamo-run` to describe the launcher for the Rust frontend and the processor, and less as a way to launch the other components. That's why it's also used in the frontend. It's not really a launcher for the graph; it's a launcher for the Rust bits.

**Reviewer:** Understood - just not a good situation :)

**Contributor:** There should only be one way to:

- define and implement a component in Python and Rust
- define a graph in Python and Rust (maybe YAML?)
- launch a graph locally
- launch a graph in k8s and Slurm

**Reviewer:** Agreed there should be one way. I'm not sure I understand why dynamo-run is needed if the components can run via their main function and dynamo serve launches a graph.

2. This proposal will not address SDK layer abstractions like `depends()` and `link` that are currently present in the SDK

## Proposal

### `main` per component

- Replace `serve_dynamo` with each component's own `main` function. An example is shown below for the SGLang Decode Worker.
- Remove the `@service` decorator (a BentoML construct) that held `resources` and dynamo info like `namespace`.
- Add a `BaseDeployment` class to store:
  - Resources configuration
  - Namespace info
  - K8s deployment helper functions
  - Default values that can be overridden
- Eliminate confusing code:
  - Remove `async_on_start`
  - Remove `dynamo_endpoint`
  - Remove runtime injection via `dynamo_context`
- Start using abstract classes for our request handlers to address repeated code for things like `Router` and `Frontend` (a rough sketch follows below).
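
The SGLang Decode Worker example shown after the next review thread subclasses a `BaseDecodeClass`. As a rough sketch, and assuming the interface stays as thin as discussed here (it is not settled), such a base class might look like:

```python
# Hypothetical sketch of a shared request-handler base class (see
# common/base_classes.py in the Appendix); the exact interface is still open.
from abc import ABC, abstractmethod
from typing import Any, AsyncIterator


class BaseDecodeClass(ABC):
    """Typed contract for decode workers, so engine backends
    (sglang, vllm, trtllm) stay swappable behind one interface."""

    @abstractmethod
    async def generate(self, req: Any) -> AsyncIterator[Any]:
        """Stream generation results for a preprocessed request."""
        ...
```
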
**Contributor:** I think the issue for abstraction is not the definition of the request handlers, but the proxy/use objects - that is, how to abstract the component and namespace from the caller in a typed way. Abstraction for the request handlers is great for reuse at the implementation of the request handler, but I think it doesn't complete the 'proxy objects' on the client side.

**Author (@ishandhanani, Jun 10, 2025):**

> That is, how to abstract the component, and namespace from the caller in a typed way.

I think this is addressed in the Deployment class, no? I also don't think I fully understand what you mean by proxy/use objects.

**Contributor:** Something like: https://github.com/ai-dynamo/dynamo/blob/8115584715b8571b0e2c1216a6e6c9135a13b364/examples/llm/components/processor.py#L58

The specific implementation of the "LLMWorker" type is hidden from the client but still typed. To me, that was the key point of the abstraction. Does the Deployment class do that?

**Author (@ishandhanani):**

> The specific implementation of "LLMWorker" type is hidden from the client but still typed ....

This `depends` is not even used for that purpose. It's a remnant of BentoML. It's only there so that the LLMWorker will spin up under `dynamo serve`.

**Reviewer:** As others have said, the point of having the LLMWorker interface was to define types, i.e. so I can swap out a different LLM backend with the same interface. In addition, I think we can remove a lot of boilerplate in setting up the endpoints if we have some base component or decorators. What would `BaseDeployment` do?


```python
import logging

import sglang as sgl
from sglang.srt.server_args import ServerArgs

from dynamo.deploy import BaseDeployment  # adding this class turns this into a valid deployment
from dynamo.sglang.utils import parse_sglang_args

# BaseDecodeClass and DisaggPreprocessedRequest are assumed to come from the
# shared base classes/protocol proposed elsewhere in this DEP.

logger = logging.getLogger(__name__)


class Deployment(BaseDeployment):
    namespace = "dynamo"
    name = "sgldecode"
    resources = {"gpu": 1}


class DecodeHandler(BaseDecodeClass):
    def __init__(self, engine_args: ServerArgs):
        self.engine_args = engine_args
        self.engine = sgl.Engine(server_args=self.engine_args)
        logger.info("Decode worker initialized")

    async def generate(self, req: DisaggPreprocessedRequest):
        g = await self.engine.async_generate(
            input_ids=req.request.token_ids,
            sampling_params=req.sampling_params,
            stream=True,
            bootstrap_host=req.bootstrap_host,
            bootstrap_port=req.bootstrap_port,
            bootstrap_room=req.bootstrap_room,
        )

        async for result in g:
            yield result


if __name__ == "__main__":
    import asyncio

    import uvloop
    from dynamo.runtime import DistributedRuntime, dynamo_worker

    engine_args = parse_sglang_args()

    @dynamo_worker()
    async def worker(runtime: DistributedRuntime, engine_args):
        handler = DecodeHandler(engine_args)
        deploy = Deployment()

        component = runtime.namespace(deploy.namespace).component(deploy.name)
        await component.create_service()

        # serve the generate endpoint
        endpoint = component.endpoint("generate")
        await endpoint.serve_endpoint(handler.generate)

    uvloop.install()
    asyncio.run(worker(engine_args))
```

Review discussion on this example:

**Reviewer:** Is this code going to be 90% the same in every component? Not saying that's a problem, but want to make sure I understand.

**Author (@ishandhanani):** Yeah, 90% is about right.

**Reviewer:** Something for us to think about then.

**Author (@ishandhanani, Jun 10, 2025):** The current approach has these bits in `serve_dynamo.py`, which is also why we cannot run each component without `dynamo-serve`. I personally am not a fan of abstractions, especially since we have not found a stable API yet, which is why I'm advocating for this being explicitly written out. I think it very much helps readability as well. If we can find a middle ground, I'm 100% open to looking at things differently.

**Reviewer:** To be clear, that wasn't a "this is bad and we should think how to fix this", but rather something for us to decide how we feel about. An answer might be "we are fine with this for now and will revisit when we see stability or it starts really hurting".

**Contributor:** I would separate the "how" from the "what" - meaning we have a set of requirements; the exact look and feel of the decorators can change. They should all be part of the runtime and not part of deploy. That is, I'm not sure the boilerplate of defining a component, a service, and attaching a method to an endpoint is really necessary if it can be hidden behind a decorator:

```python
@dynamo_component(namespace="...")
class Foo:
    @dynamo_endpoint()
    def process_image(x: type): ...
```

**Author (@ishandhanani, Jun 10, 2025):** I think the abstractions need to be ironed out.

**Reviewer:** Agreed, it's a lot of repeated boilerplate, and a simple method to wrap these common patterns would follow DRY.

**Reviewer (on the component/endpoint setup in `__main__`):** If we have the correct abstractions, this boilerplate can completely go away. This gets worse if I want to have multiple endpoints in a component/service.

**Reviewer (on `asyncio.run(worker(engine_args))`):** You passed in `engine_args` but not `runtime` - where is that defined?

**Author (@ishandhanani):** The runtime is injected via the `dynamo_worker()` decorator.

**Reviewer:** Why keep that decorator - what purpose is it serving? It seemed like the document was making an argument that these magic decorators are not helping.

**Author (@ishandhanani, Jun 10, 2025):** Happy to revisit this, and maybe we should. That decorator comes from the original rust<>python bindings. Code for it is here for reference: https://github.com/ai-dynamo/dynamo/blob/47c3dad73957c2c7b8b802109bac1c96b859b9de/lib/bindings/python/src/dynamo/runtime/__init__.py#L34

### FlexibleArgumentParser for Dynamo
Currently, we pass in arguments for each worker via a YAML file. This YAML file is combined with any CLI overrides, saved in an environment variable, and then exported into each worker's process. An end user has no idea how this works unless they dive into the `ServiceConfig` class. Instead, we propose a `DynamoFlexibleArgumentParser`. This works similarly to the current `ServiceConfig`, but is expanded so it can also be used when a user runs `python3 component.py <flags>`.

**Reviewer:** Can you show an example of this?
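
As a rough illustration of what this could look like (hypothetical API, not a settled design; it assumes YAML keys match the argument names), the parser might layer YAML-provided defaults under explicit CLI flags:

```python
# Illustrative sketch of a DynamoFlexibleArgumentParser: YAML config plus CLI
# overrides, usable both standalone and under `dynamo serve`.
import argparse

import yaml


class DynamoFlexibleArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that treats `-f/--config-file` as a set of defaults,
    with explicit CLI flags taking precedence."""

    def parse_args(self, args=None, namespace=None):
        # First pass: pull out the config file, if any.
        pre = argparse.ArgumentParser(add_help=False)
        pre.add_argument("-f", "--config-file")
        known, remaining = pre.parse_known_args(args)
        if known.config_file:
            with open(known.config_file) as fh:
                # YAML keys are assumed to use underscores, matching arg dests.
                self.set_defaults(**(yaml.safe_load(fh) or {}))
        # Second pass: CLI flags override any YAML-provided defaults.
        return super().parse_args(remaining, namespace)


if __name__ == "__main__":
    parser = DynamoFlexibleArgumentParser()
    parser.add_argument("--model-path")
    parser.add_argument("--tp-size", type=int, default=1)
    # Works as `python3 component.py -f config.yaml --tp-size 2`:
    # tp_size comes from the CLI, model_path falls back to the YAML value.
    print(parser.parse_args())
```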

# Team Discussion Notes

## 6/10/2025

We had an initial discussion about the DEP, where we primarily discussed the issues with the current component writing/running experience.

### Issues we brought up

**Running components**

- Impossible to run components using python3. Major issue for development
- Inflexibility for power users

### Specific examples we discussed

<details>
<summary>Expand for pain points</summary>

Pain Points:

1. `dynamo_context: dict[str, Any] = {}`

   - This is the definition of `dynamo_context`, a variable you have to import and which then gets populated at runtime
   - You can't tell what it gets populated with, when it gets populated, or where it gets populated

2. `VLLM_WORKER_ID = dynamo_context["endpoints"][0].lease_id()`

   - To get endpoints, you have to know that there is a list of endpoints
   - You have to guess which one is the endpoint you actually want

3. Too much going on under the hood

   - If I try to set `CUDA_VISIBLE_DEVICES` on the command line, `dynamo-serve` silently overwrites my selection
   - Had to search the source code to find the fix via the "magic" environment variable `DYN_DISABLE_AUTO_GPU_ALLOCATION`
     - See: https://github.com/ai-dynamo/dynamo/blob/main/deploy/sdk/src/dynamo/sdk/cli/allocator.py#L37
   - All of our dynamo logic is hidden behind the `@service` decorator, which decides which function of `serve_dynamo.py` to call
   - The decorators currently stem from an abstracted version of BentoML; none of them were architected with dynamo in mind

4. The `@endpoint()` decorator is an example of the above issue

   - Expected a simple boilerplate replacement like:

     ```python
     endpoint = namespace().component().endpoint()
     endpoint.serve_endpoint(fn)
     ```

   - Instead we get an opaque class implementation:

     ```python
     class DynamoEndpoint(DynamoEndpointInterface):
         """Base class for dynamo endpoints.

         Dynamo endpoints are methods decorated with @endpoint."""
     ```

   - Unclear what functionality is actually being added
   - The SDK contains a lot of this

5. `dynamo-serve` deployment issues

   - Doesn't work well with bare metal, Slurm, or non-k8s deployments
   - These environments require a manual process launch per node
   - Conflicts with the "create graph and launch once" model
   - This also means you cannot run any sort of profiling on each component
   - Running multinode is not intuitive and requires workarounds and convoluted commands from the end user

6. Duplicated code

   - We have a lot of code duplicated between the `dynamo-run` and `dynamo-serve` components
   - This is confusing and makes the code difficult to maintain
   - We should have a single source of truth for each component, but that isn't possible while `dynamo-serve` is the only way to run them

</details>

### Open Questions

- Do we need `dynamo serve`?

### Things we agreed on as a team

- Components should be runnable using `python3 component.py <flags>`
- Handling of CLI flags and the ServiceConfig should be much simpler

### Next Steps

- Small group discussion between component writers and SDK team
- Daily standup starting Thursday to track
- Goal is to reach a shared understanding and conclusion by the end of the week, with the aim of shipping a unified dev experience

# Appendix

## Unified README/examples

Work in progress. Would love feedback!

```bash
examples/
├── README.md
├── common/
│   ├── frontend.py
│   └── base_classes.py
├── sglang/
│   ├── sglang_engine.py
│   └── utils.py
├── vllm/
│   ├── vllm_engine.py
│   └── utils.py
└── trtllm/
    ├── trtllm_engine.py
    └── utils.py
```

Each engine's `utils.py` would contain things like argument parsing/validation and any other helper functions. The `common/base_classes.py` could contain abstract classes for the `BaseDecodeWorker` or the `BaseWorker` class.
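
For illustration, and assuming the `DynamoFlexibleArgumentParser` sketched earlier lands (the `dynamo.common.args` path is hypothetical), an engine's `utils.py` could stay very thin; `ServerArgs.add_cli_args` and `ServerArgs.from_cli_args` are existing SGLang helpers:

```python
# Hypothetical sketch of examples/sglang/utils.py.
from sglang.srt.server_args import ServerArgs

from dynamo.common.args import DynamoFlexibleArgumentParser  # hypothetical path


def parse_sglang_args(argv=None) -> ServerArgs:
    """Build SGLang engine args from a YAML config and/or CLI flags."""
    parser = DynamoFlexibleArgumentParser(description="SGLang dynamo worker")
    ServerArgs.add_cli_args(parser)
    args = parser.parse_args(argv)
    return ServerArgs.from_cli_args(args)
```
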
**Reviewer:** Thanks for adding this. I think we need to find a way where the "common" path (i.e. running one of these engines) is not dependent on "examples", but rather properly in the product surface area. But likely a separate discussion.

**Contributor:** Once we establish a direction here in terms of how to define a component, and any changes we want to make there, the goal is to move production components to the top level:

```
components/engines
components/engines/vllm
components/common/
components/
```

This can then be a subpackage, `dynamo.components`, alongside `dynamo.runtime` and `dynamo.llm`.