# How to Author and Run Dynamo Components

**Status**: Under Review

**Authors**: Ishan Dhanani, Alec Flowers

**Category**: Architecture

**Required Reviewers**: Itay Neeman, Kyle Kranen, Mohammed Abdulwahhab, Maksim Khadkevich, Biswa Panda Rajan, Ryan McCormick

**Review Date**: 06/09/2025

**Implementation PR / Tracking Issue**: WIP

# Background: Running a component using dynamo-serve vs dynamo-run

Right now we have 2 different ways of running components that go through different code paths:

## dynamo-run

- **Purpose**: CLI tool for running OAI frontend/processor and python engines
- **Usage**: `dynamo run in=http out=vllm ~/models/Llama-3.2-3B`
- **Architecture**: Direct component execution - each engine runs as a subprocess
- **Control**: Explicit configuration via CLI flags

## dynamo-serve

- **Purpose**: Orchestrates inference graphs composed of multiple interdependent services
- **Usage**: `dynamo serve graphs.agg:Frontend -f ./configs/agg.yaml`
- **Architecture**: Service graph execution using circus process manager + BentoML decorators
- **Control**: High-level, via YAML configs, dependency/env-var injection, `depends()`, and `.link()`

## Current Hybrid Approach

The frontend actually uses `dynamo-run` internally to run the OpenAI-compatible HTTP server and Rust-based processor, while `dynamo-serve` manages the overall service orchestration.
**Reviewer:** What does it mean that the frontend uses dynamo-run?

**Author (@ishandhanani):** The frontend component of a graph runs `dynamo-run` as a subprocess in order to start the Rust server/processor. Here's an example: https://github.com/ai-dynamo/dynamo/blob/main/examples/sglang/components/frontend.py


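For illustration, here is a minimal sketch of that pattern. This is an assumed shape, not the actual `frontend.py` linked above:

```python
# Minimal sketch (assumption, not the real frontend.py): the Frontend
# component shells out to dynamo-run to start the Rust HTTP server/processor.
import subprocess


class Frontend:
    def __init__(self):
        # `in=http out=dyn` serves the OpenAI-compatible HTTP frontend and
        # routes requests to workers registered on the distributed runtime.
        self.process = subprocess.Popen(["dynamo-run", "in=http", "out=dyn"])

    def shutdown(self):
        self.process.terminate()
        self.process.wait()
```
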
## The Problem

Dynamo is meant to be a distributed, modular framework for building inference graphs with modern strategies like disaggregated serving and KV-aware routing. While `dynamo serve` provides a clean UX via `circusd` process management, we've created a problematic split:

**Historical Context**: Pre-GTC, our components demonstrated rust↔python interoperability and were runnable with `python3 component.py <flags>` - they were pythonic and easy to hack on.

**Current Issues**:

1. We maintain two separate sets of examples - [dynamo serve](https://github.com/ai-dynamo/dynamo/blob/main/examples/sglang/components/worker.py) and [dynamo run](https://github.com/ai-dynamo/dynamo/blob/main/launch/dynamo-run/src/subprocess/sglang_inc.py) - with duplicated logic and no code sharing.
2. Users must use `dynamo-serve` for all components - they can no longer run any example code with `python3 component.py <flags>`. This breaks the pythonic, hackable experience that made Dynamo accessible.
3. Decorators hide critical logic, making debugging nearly impossible. Runtime injection of `CUDA_VISIBLE_DEVICES`, namespace overrides in K8s, and other "magic" configurations surprise users with no clear error messages.
4. Large model deployment requires unintuitive flags and `.link` files.
5. We've hidden the rust↔python interoperability that differentiates Dynamo. Users can't see or interact with our core bindings (component, namespace, endpoint).

As we ramp up to production, fixing this UX split is critical. This proposal provides a path to keep `dynamo serve`'s developer experience while staying true to our Rust core and making components standalone and runnable.

## Principles

1. Each Dynamo component **MUST** be runnable using `python3 component.py <flags>`
**Reviewer:** Can we explain why this is critical? The explanation of "this is more Pythonic/hackable" isn't enough - what is a user trying to do for which they need this, what is the workflow that the lack of this is blocking?

**Author (@ishandhanani, Jun 10, 2025):** An example is running a single model split across 2 nodes. Let's use SGLang disaggregated serving as an example.

In order to run TP>8, you have to run the SGLang worker with `node_rank=0` on the head node and then `node_rank=1` on the child node. In the current paradigm, that means you have to do:

```bash
# node 1 - spin up the frontend, processor, and shard 1 of the prefill worker, but none of the decode workers
dynamo serve graphs.agg:Frontend -f configs/agg_head.yaml

# node 2 - spin up shard 2 of the prefill worker
dynamo serve graphs.agg:Frontend -f configs/agg_child.yaml --service-name SGLangWorker

# node 3 - spin up only shard 1 of the decode worker
dynamo serve graphs.disagg:Frontend -f configs/disagg_head.yaml --service-name SGLangDecodeWorker

# node 4 - spin up only shard 2 of the decode worker
dynamo serve graphs.disagg:Frontend -f configs/disagg_child.yaml --service-name SGLangDecodeWorker
```

Note that I am having to run the agg graph on the first 2 nodes, the disagg graph on the next 2, and specify `--service-name` on nodes 2 through 4. It would be great if I could just do:

```bash
# node 1
dynamo-run in=http out=dyn &
python3 prefill.py -f configs/disagg_head.yaml

# node 2
python3 prefill.py -f configs/disagg_child.yaml

# node 3
python3 decode.py -f configs/disagg_head.yaml

# node 4
python3 decode.py -f configs/disagg_child.yaml
```

**Author (@ishandhanani, Jun 10, 2025):** Going to use vllm_v1 to demonstrate the following. Another example is running each component on a different node. Say I have 4 H100 nodes and I am trying to run:

1. Frontend
2. Load Balancer
3. 2 PrefillWorkers
4. 2 DecodeWorkers

I'll try a different approach to the one above and instead try to just run each component separately without picking and choosing from some link graph. Here's what I have to do to spin things up on node 1:

```bash
# node 1 - spin up the CPU processes (frontend and load balancer) and prefill worker 1
# I have to keep specifying --service-name or else `depends` will force everything to spin up,
# unless I sed the `depends` out... bad UX
dynamo serve components.frontend:Frontend -f config.yaml --service-name Frontend
dynamo serve components.simple_load_balancer:SimpleLoadBalancer -f config.yaml --service-name SimpleLoadBalancer
dynamo serve components.worker:VllmPrefillWorker -f config.yaml
```

What I want to do on node 1:

```bash
dynamo run in=http out=dyn
python3 simple_load_balancer.py -f config.yaml
python3 prefill_worker.py -f config.yaml
```

**Reviewer:** Thanks for the details. On the first example, I am not sure I see a huge delta between the two cases. On the second, I can see some of the issues. In general, kind of odd to see config.yaml used in both cases; I'd expect the pure Python version to be just command line args.

Perhaps more importantly, the use case for this seems to be more for the "Dynamo developer" rather than a normal user? Not to say that isn't important, but just want to make sure we're talking about the same persona.

**Author (@ishandhanani, Jun 10, 2025):**

> On the first example, I am not sure I see a huge delta between the two cases.

I guess something like `dynamo serve graphs.agg:Frontend -f configs/agg_child.yaml --service-name SGLangWorker` is a bit unintuitive to me. Why am I accessing the entire graph and selecting the PrefillWorker service from it? Is it not easier just to run the prefill worker Python code?

Additionally, in order to run this full example across nodes, I have to run the agg graph and then the disagg graph in order to avoid the `depends` statement in the prefill worker. If I don't, the decode worker spins up and causes an error because it has no GPUs available.

> In general, kind of odd to see config.yaml used in both cases, I'd expect the pure Python version to be just command line args.

Yeah, CLI args also work. A YAML/JSON parser would be helpful because these frameworks sometimes require many, many args to get things spun up. But this might just be the Dynamo Developer persona talking :)

> Perhaps more importantly, the use case for this seems to be more for the "Dynamo developer", rather than a normal user?

This is an interesting thought, and it does bring up the question of who our ICP is for Dynamo. In my eyes, people who will be deploying our software at data center scale will want maximum flexibility to tinker with our framework (something I don't believe is present if I cannot run Python code with `python3`). Ping to @harryskim for viz on this discussion.

**Contributor:** I would think the form should be something like:

```bash
dynamo serve graph {optional config} --node-list node_list.yaml
```

If you are serving a graph and you need multiple nodes for that graph, that should be encapsulated in `serve`. Having to run the command on multiple nodes with different slices does seem to defeat the purpose of `dynamo serve`: `dynamo serve` serves a graph with the required resources, so by definition it should be able to handle multi-node graphs.

It doesn't today. And maybe, with things like microk8s, we don't need this intermediate state of serving a graph without k8s or Slurm.

**Author (@ishandhanani, Jun 10, 2025):** vLLM and SGLang both let you run disaggregated serving + KV routing using just raw Python in single/multinode settings. To be clear: if I can run `python3 -m sglang.launch_server ...` to run an SGLang prefill worker, why can't I do `python3 prefill_worker.py ...` in Dynamo? That is a degradation in UX to me.


2. `dynamo serve` **MUST** launch components using `python3 component.py <flags>` internally

3. We **MUST** expose dynamo core bindings (component, namespace, endpoint) directly in each component - no more hiding behind decorators

4. We **MUST** unify argument parsing across standalone and serve modes (similar to current `ServiceConfig` but shared)

5. We **MUST** maintain single-source components compatible with both `dynamo-run` and `dynamo serve`

6. We **MUST** provide transparent configuration - no runtime injection or environment variable magic

7. We **MUST** deliver clear error messages with actionable fix instructions instead of silent overrides (a sketch of the intended behavior follows this list)

8. We **SHOULD** enable seamless local → K8s deployment patterns matching industry standards (AIBrix, VLLM production stack, llmd)
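
To make principles 6 and 7 concrete, here is a minimal sketch of the intended behavior. The helper name and messages are illustrative assumptions, not existing Dynamo APIs:

```python
# Illustrative sketch of principles 6-7: respect explicit user configuration
# and fail loudly with an actionable message instead of silently overriding.
import os


def resolve_gpu_allocation(requested_gpus: int) -> str:
    """Return the CUDA_VISIBLE_DEVICES value a component should use."""
    user_setting = os.environ.get("CUDA_VISIBLE_DEVICES")
    if user_setting is not None:
        visible = user_setting.split(",")
        if len(visible) < requested_gpus:
            raise ValueError(
                f"CUDA_VISIBLE_DEVICES={user_setting} exposes {len(visible)} "
                f"GPU(s) but this component requests {requested_gpus}. "
                "Unset CUDA_VISIBLE_DEVICES or lower the 'gpu' resource."
            )
        return user_setting  # never silently override the user's choice
    # No user setting: allocate the first N devices, and say so out loud.
    allocation = ",".join(str(i) for i in range(requested_gpus))
    print(f"CUDA_VISIBLE_DEVICES not set; allocating GPUs {allocation}")
    return allocation
```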

## Non Goals

1. We **SHOULD NOT** deprecate `dynamo serve` - the orchestration UX remains valuable
**Reviewer:** I will contend that this is incorrect, or rather, that an explicit goal is that there is only one way to run a Dynamo "graph", and we cannot have both dynamo-run and dynamo serve.

**Author (@ishandhanani):** Agreed with you here. We've been using `dynamo-run` to describe the launcher for the Rust frontend and the processor, and less as a way to launch the other components. That's why it's also used in the frontend. It's not really a launcher for the graph; it's a launcher for the Rust bits.

**Reviewer:** Understood - just not a good situation :)

**Contributor:** There should only be one way to:

- define and implement a component in Python and Rust
- define a graph in Python and Rust (maybe YAML?)
- launch a graph locally
- launch a graph in k8s and Slurm

**Reviewer:** Agreed there should be one way. I'm not sure I understand why dynamo-run is needed if the components can run via their main function and dynamo serve launches a graph.

2. This proposal will not address SDK layer abstractions like `depends()` and `link` that are currently present in the SDK

## Proposal

### `main` per component

- Replace `serve_dynamo` with each component's own `main` function. An example is shown below for the SGLang Decode Worker.
- Remove the `@service` decorator (a BentoML construct) that held `resources` and dynamo info like `namespace`.
- Add a `BaseDeployment` class to store:
  - Resources configuration
  - Namespace info
  - K8s deployment helper functions
  - Default values that can be overridden
- Eliminate confusing code:
  - Remove `async_on_start`
  - Remove `dynamo_endpoint`
  - Remove runtime injection via `dynamo_context`
- Start using abstract classes for our request handlers to address repeated code for things like `Router` and `Frontend` (a rough sketch follows below).
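
The SGLang Decode Worker example shown after the next review thread subclasses a `BaseDecodeClass`. As a rough sketch, and assuming the interface stays as thin as discussed here (it is not settled), such a base class might look like:

```python
# Hypothetical sketch of a shared request-handler base class (see
# common/base_classes.py in the Appendix); the exact interface is still open.
from abc import ABC, abstractmethod
from typing import Any, AsyncIterator


class BaseDecodeClass(ABC):
    """Typed contract for decode workers, so engine backends
    (sglang, vllm, trtllm) stay swappable behind one interface."""

    @abstractmethod
    async def generate(self, req: Any) -> AsyncIterator[Any]:
        """Stream generation results for a preprocessed request."""
        ...
```
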
**Contributor:** I think the issue for abstraction is not the definition of the request handlers, but the proxy/use objects - that is, how to abstract the component and namespace from the caller in a typed way. Abstraction for the request handlers is great for reuse at the implementation of the request handler, but I think it doesn't complete the 'proxy objects' on the client side.

**Author (@ishandhanani, Jun 10, 2025):**

> That is, how to abstract the component, and namespace from the caller in a typed way.

I think this is addressed in the Deployment class, no? I also don't think I fully understand what you mean by proxy/use objects.

**Contributor:** Something like: https://github.com/ai-dynamo/dynamo/blob/8115584715b8571b0e2c1216a6e6c9135a13b364/examples/llm/components/processor.py#L58

The specific implementation of the "LLMWorker" type is hidden from the client but still typed. To me, that was the key point of the abstraction. Does the Deployment class do that?

**Author (@ishandhanani):**

> The specific implementation of "LLMWorker" type is hidden from the client but still typed ....

This `depends` is not even used for that purpose. It's a remnant of BentoML. It's only there so that the LLMWorker will spin up under `dynamo serve`.

**Reviewer:** As others have said, the point of having the LLMWorker interface was to define types, i.e. so I can swap out a different LLM backend with the same interface. In addition, I think we can remove a lot of boilerplate in setting up the endpoints if we have some base component or decorators. What would `BaseDeployment` do?


```python
import logging

import sglang as sgl
from sglang.srt.server_args import ServerArgs

from dynamo.deploy import BaseDeployment  # adding this class turns this into a valid deployment
from dynamo.sglang.utils import parse_sglang_args

# BaseDecodeClass and DisaggPreprocessedRequest are assumed to come from the
# shared base classes/protocol proposed elsewhere in this DEP.

logger = logging.getLogger(__name__)


class Deployment(BaseDeployment):
    namespace = "dynamo"
    name = "sgldecode"
    resources = {"gpu": 1}


class DecodeHandler(BaseDecodeClass):
    def __init__(self, engine_args: ServerArgs):
        self.engine_args = engine_args
        self.engine = sgl.Engine(server_args=self.engine_args)
        logger.info("Decode worker initialized")

    async def generate(self, req: DisaggPreprocessedRequest):
        g = await self.engine.async_generate(
            input_ids=req.request.token_ids,
            sampling_params=req.sampling_params,
            stream=True,
            bootstrap_host=req.bootstrap_host,
            bootstrap_port=req.bootstrap_port,
            bootstrap_room=req.bootstrap_room,
        )

        async for result in g:
            yield result


if __name__ == "__main__":
    import asyncio

    import uvloop
    from dynamo.runtime import DistributedRuntime, dynamo_worker

    engine_args = parse_sglang_args()

    @dynamo_worker()
    async def worker(runtime: DistributedRuntime, engine_args):
        handler = DecodeHandler(engine_args)
        deploy = Deployment()

        component = runtime.namespace(deploy.namespace).component(deploy.name)
        await component.create_service()

        # serve the generate endpoint
        endpoint = component.endpoint("generate")
        await endpoint.serve_endpoint(handler.generate)

    uvloop.install()
    asyncio.run(worker(engine_args))
```

Review discussion on this example:

**Reviewer:** Is this code going to be 90% the same in every component? Not saying that's a problem, but want to make sure I understand.

**Author (@ishandhanani):** Yeah, 90% is about right.

**Reviewer:** Something for us to think about then.

**Author (@ishandhanani, Jun 10, 2025):** The current approach has these bits in `serve_dynamo.py`, which is also why we cannot run each component without `dynamo-serve`. I personally am not a fan of abstractions, especially since we have not found a stable API yet, which is why I'm advocating for this being explicitly written out. I think it very much helps readability as well. If we can find a middle ground, I'm 100% open to looking at things differently.

**Reviewer:** To be clear, that wasn't a "this is bad and we should think how to fix this", but rather something for us to decide how we feel about. An answer might be "we are fine with this for now and will revisit when we see stability or it starts really hurting".

**Contributor:** I would separate the "how" from the "what" - meaning we have a set of requirements; the exact look and feel of the decorators can change. They should all be part of the runtime and not part of deploy. That is, I'm not sure the boilerplate of defining a component, a service, and attaching a method to an endpoint is really necessary if it can be hidden behind a decorator:

```python
@dynamo_component(namespace="...")
class Foo:
    @dynamo_endpoint()
    def process_image(x: type): ...
```

**Author (@ishandhanani, Jun 10, 2025):** I think the abstractions need to be ironed out.

**Reviewer:** Agreed, it's a lot of repeated boilerplate, and a simple method to wrap these common patterns would follow DRY.

**Reviewer (on the component/endpoint setup in `__main__`):** If we have the correct abstractions, this boilerplate can completely go away. This gets worse if I want to have multiple endpoints in a component/service.

**Reviewer (on `asyncio.run(worker(engine_args))`):** You passed in `engine_args` but not `runtime` - where is that defined?

**Author (@ishandhanani):** The runtime is injected via the `dynamo_worker()` decorator.

**Reviewer:** Why keep that decorator - what purpose is it serving? It seemed like the document was making an argument that these magic decorators are not helping.

**Author (@ishandhanani, Jun 10, 2025):** Happy to revisit this, and maybe we should. That decorator comes from the original rust<>python bindings. Code for it is here for reference: https://github.com/ai-dynamo/dynamo/blob/47c3dad73957c2c7b8b802109bac1c96b859b9de/lib/bindings/python/src/dynamo/runtime/__init__.py#L34

### FlexibleArgumentParser for Dynamo
Currently, we pass in arguments for each worker via a YAML file. This YAML file is combined with any CLI overrides, saved in an environment variable, and then exported into each worker's process. An end user has no idea how this works unless they dive into the `ServiceConfig` class. Instead, we propose a `DynamoFlexibleArgumentParser`. This works similarly to the current `ServiceConfig`, but is expanded so it can also be used when a user runs `python3 component.py <flags>`.

**Reviewer:** Can you show an example of this?
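
As a rough illustration of what this could look like (hypothetical API, not a settled design; it assumes YAML keys match the argument names), the parser might layer YAML-provided defaults under explicit CLI flags:

```python
# Illustrative sketch of a DynamoFlexibleArgumentParser: YAML config plus CLI
# overrides, usable both standalone and under `dynamo serve`.
import argparse

import yaml


class DynamoFlexibleArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that treats `-f/--config-file` as a set of defaults,
    with explicit CLI flags taking precedence."""

    def parse_args(self, args=None, namespace=None):
        # First pass: pull out the config file, if any.
        pre = argparse.ArgumentParser(add_help=False)
        pre.add_argument("-f", "--config-file")
        known, remaining = pre.parse_known_args(args)
        if known.config_file:
            with open(known.config_file) as fh:
                # YAML keys are assumed to use underscores, matching arg dests.
                self.set_defaults(**(yaml.safe_load(fh) or {}))
        # Second pass: CLI flags override any YAML-provided defaults.
        return super().parse_args(remaining, namespace)


if __name__ == "__main__":
    parser = DynamoFlexibleArgumentParser()
    parser.add_argument("--model-path")
    parser.add_argument("--tp-size", type=int, default=1)
    # Works as `python3 component.py -f config.yaml --tp-size 2`:
    # tp_size comes from the CLI, model_path falls back to the YAML value.
    print(parser.parse_args())
```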

# Team Discussion Notes

## 6/10/2025

We had an initial discussion about the DEP, where we primarily discussed the issues with the current component writing/running experience.

### Issues we brought up

**Running components**

- Impossible to run components using python3. Major issue for development
- Inflexibility for power users

### Specific examples we discussed

<details>
<summary>Expand for pain points</summary>

Pain Points:

1. `dynamo_context: dict[str, Any] = {}`

   - This is the definition of `dynamo_context`, a variable you have to import and which then gets populated at runtime
   - You can't tell what it gets populated with, when it gets populated, or where it gets populated

2. `VLLM_WORKER_ID = dynamo_context["endpoints"][0].lease_id()`

   - To get endpoints, you have to know that there is a list of endpoints
   - You have to guess which one is the endpoint you actually want

3. Too much going on under the hood

   - If I try to set `CUDA_VISIBLE_DEVICES` on the command line, `dynamo-serve` silently overwrites my selection
   - Had to search the source code to find the fix via the "magic" environment variable `DYN_DISABLE_AUTO_GPU_ALLOCATION`
     - See: https://github.com/ai-dynamo/dynamo/blob/main/deploy/sdk/src/dynamo/sdk/cli/allocator.py#L37
   - All of our dynamo logic is hidden behind the `@service` decorator, which decides which function of `serve_dynamo.py` to call
   - The decorators currently stem from an abstracted version of BentoML; none of them were architected with dynamo in mind

4. The `@endpoint()` decorator is an example of the above issue

   - Expected a simple boilerplate replacement like:

     ```python
     endpoint = namespace().component().endpoint()
     endpoint.serve_endpoint(fn)
     ```

   - Instead we get an opaque class implementation:

     ```python
     class DynamoEndpoint(DynamoEndpointInterface):
         """Base class for dynamo endpoints.

         Dynamo endpoints are methods decorated with @endpoint."""
     ```

   - Unclear what functionality is actually being added
   - The SDK contains a lot of this

5. `dynamo-serve` deployment issues

   - Doesn't work well with bare metal, Slurm, or non-k8s deployments
   - These environments require a manual process launch per node
   - Conflicts with the "create graph and launch once" model
   - This also means you cannot run any sort of profiling on each component
   - Running multinode is not intuitive and requires workarounds and convoluted commands from the end user

6. Duplicated code

   - We have a lot of code duplicated between the `dynamo-run` and `dynamo-serve` components
   - This is confusing and makes the code difficult to maintain
   - We should have a single source of truth for each component, but that isn't possible while `dynamo-serve` is the only way to run them

</details>

### Open Questions

- Do we need `dynamo serve`?

### Things we agreed on as a team

- Components should be runnable using `python3 component.py <flags>`
- Handling of CLI flags and the ServiceConfig should be much simpler

### Next Steps

- Small group discussion between component writers and SDK team
- Daily standup starting Thursday to track
- Goal is to reach a shared understanding and conclusion by the end of the week, with the aim of shipping a unified dev experience

# Appendix

## Unified README/examples

Work in progress. Would love feedback!

```bash
examples/
├── README.md
├── common/
│   ├── frontend.py
│   └── base_classes.py
├── sglang/
│   ├── sglang_engine.py
│   └── utils.py
├── vllm/
│   ├── vllm_engine.py
│   └── utils.py
└── trtllm/
    ├── trtllm_engine.py
    └── utils.py
```

Each engine's `utils.py` would contain things like argument parsing/validation and any other helper functions. The `common/base_classes.py` could contain abstract classes for the `BaseDecodeWorker` or the `BaseWorker` class.
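
For illustration, and assuming the `DynamoFlexibleArgumentParser` sketched earlier lands (the `dynamo.common.args` path is hypothetical), an engine's `utils.py` could stay very thin; `ServerArgs.add_cli_args` and `ServerArgs.from_cli_args` are existing SGLang helpers:

```python
# Hypothetical sketch of examples/sglang/utils.py.
from sglang.srt.server_args import ServerArgs

from dynamo.common.args import DynamoFlexibleArgumentParser  # hypothetical path


def parse_sglang_args(argv=None) -> ServerArgs:
    """Build SGLang engine args from a YAML config and/or CLI flags."""
    parser = DynamoFlexibleArgumentParser(description="SGLang dynamo worker")
    ServerArgs.add_cli_args(parser)
    args = parser.parse_args(argv)
    return ServerArgs.from_cli_args(args)
```
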
**Reviewer:** Thanks for adding this. I think we need to find a way where the "common" path (i.e. running one of these engines) is not dependent on "examples", but rather properly in the product surface area. But likely a separate discussion.

**Contributor:** Once we establish a direction here in terms of how to define a component, and any changes we want to make there, the goal is to move production components to the top level:

```
components/engines
components/engines/vllm
components/common/
components/
```

This can then be a subpackage, `dynamo.components`, alongside `dynamo.runtime` and `dynamo.llm`.