ai-dynamo · dagil-nvidia · Oct 16, 2025
diff --git a/README.md b/README.md
@@ -59,9 +59,9 @@ Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLa
 | [**Disaggregated Serving**](/docs/architecture/disagg_serving.md)                                 | ✅   | ✅     | ✅           |
 | [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧   | 🚧     | 🚧           |
 | [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md)                                    | ✅   | ✅     | ✅           |
-| [**Load Based Planner**](/docs/architecture/load_planner.md)                                      | 🚧   | 🚧     | 🚧           |
-| [**SLA-Based Planner**](/docs/architecture/sla_planner.md)                                        | ✅   | ✅     | ✅           |
-| [**KVBM**](/docs/architecture/kvbm_architecture.md)                                               | ✅   | 🚧     | ✅           |
+| [**Load Based Planner**](docs/planner/load_planner.md)                                      | 🚧   | 🚧     | 🚧           |
+| [**SLA-Based Planner**](docs/planner/sla_planner.md)                                        | ✅   | ✅     | ✅           |
+| [**KVBM**](docs/kvbm/kvbm_architecture.md)                                               | ✅   | 🚧     | ✅           |
 
 To learn more about each framework and their capabilities, check out each framework's README!
 
@@ -74,7 +74,7 @@ Built in Rust for performance and in Python for extensibility, Dynamo is fully o
 # Installation
 
 The following examples require a few system level packages.
-Recommended to use Ubuntu 24.04 with a x86_64 CPU. See [docs/support_matrix.md](docs/support_matrix.md)
+We recommend Ubuntu 24.04 with an x86_64 CPU. See [docs/reference/support-matrix.md](docs/reference/support-matrix.md).
 
 ## 1. Initial setup
 

diff --git a/components/backends/sglang/prometheus.md b/components/backends/sglang/prometheus.md
@@ -10,7 +10,7 @@ When running SGLang through Dynamo, SGLang engine metrics are automatically pass
 
 For the complete and authoritative list of all SGLang metrics, always refer to the official documentation linked above.
 
-Dynamo runtime metrics are documented in [docs/guides/metrics.md](../../../docs/guides/metrics.md).
+Dynamo runtime metrics are documented in [docs/observability/metrics.md](../../../docs/observability/metrics.md).
 
 ## Metric Reference
 
@@ -91,7 +91,7 @@ sglang:cache_hit_rate{model_name="meta-llama/Llama-3.1-8B-Instruct"} 0.0075
 - [SGLang GitHub - Metrics Collector](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/metrics/collector.py)
 
 ### Dynamo Metrics
-- **Dynamo Metrics Guide**: See `docs/guides/metrics.md` for complete documentation on Dynamo runtime metrics
+- **Dynamo Metrics Guide**: See [docs/observability/metrics.md](../../../docs/observability/metrics.md) for complete documentation on Dynamo runtime metrics
 - **Dynamo Runtime Metrics**: Metrics prefixed with `dynamo_*` for runtime, components, endpoints, and namespaces
   - Implementation: `lib/runtime/src/metrics.rs` (Rust runtime metrics)
   - Metric names: `lib/runtime/src/metrics/prometheus_names.rs` (metric name constants)
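
Hunks like the one above rewrite relative Markdown links, and a relative link is resolved against the directory of the file that contains it. The following is a minimal sketch for checking where such a link lands. `resolve_link` is a hypothetical helper, not part of Dynamo, and treating a leading `/` as the repository root is an assumption about how the links are rendered.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Return the repository-relative path a Markdown link points to.

    source_file: repository-relative path of the file containing the link.
    link: the link target as written, optionally with a #fragment.
    """
    # Drop any "#anchor" fragment; only the file path matters here.
    target = link.split("#", 1)[0]
    if target.startswith("/"):
        # Assumption: a leading slash means "relative to the repository root".
        return posixpath.normpath(target.lstrip("/"))
    # Otherwise resolve against the directory of the containing file.
    base = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base, target))


# The pre-change link from the hunk above resolves into docs/guides/.
print(resolve_link(
    "components/backends/sglang/prometheus.md",
    "../../../docs/guides/metrics.md",
))
```

Running the same helper over each rewritten link, and checking that the result exists on disk, is one way to catch a target that no longer resolves after files move.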

diff --git a/components/backends/vllm/deploy/README.md b/components/backends/vllm/deploy/README.md
@@ -237,7 +237,7 @@ args:
 - **Deployment Guide**: [Creating Kubernetes Deployments](../../../../docs/kubernetes/create_deployment.md)
 - **Quickstart**: [Deployment Quickstart](../../../../docs/kubernetes/README.md)
 - **Platform Setup**: [Dynamo Cloud Installation](../../../../docs/kubernetes/installation_guide.md)
-- **SLA Planner**: [SLA Planner Quickstart Guide](../../../../docs/kubernetes/sla_planner_quickstart.md)
+- **SLA Planner**: [SLA Planner Quickstart Guide](../../../../docs/planner/sla_planner_quickstart.md)
 - **Examples**: [Deployment Examples](../../../../docs/examples/README.md)
 - **Architecture Docs**: [Disaggregated Serving](../../../../docs/architecture/disagg_serving.md), [KV-Aware Routing](../../../../docs/architecture/kv_cache_routing.md)
 

diff --git a/components/backends/vllm/prometheus.md b/components/backends/vllm/prometheus.md
@@ -10,7 +10,7 @@ When running vLLM through Dynamo, vLLM engine metrics are automatically passed t
 
 For the complete and authoritative list of all vLLM metrics, always refer to the official documentation linked above.
 
-Dynamo runtime metrics are documented in [docs/guides/metrics.md](../../../docs/guides/metrics.md).
+Dynamo runtime metrics are documented in [docs/observability/metrics.md](../../../docs/observability/metrics.md).
 
 ## Metric Reference
 
@@ -96,7 +96,7 @@ vllm:time_to_first_token_seconds_sum{model_name="meta-llama/Llama-3.1-8B"} 89.38
 - [vLLM GitHub - Metrics Implementation](https://github.com/vllm-project/vllm/tree/main/vllm/engine/metrics)
 
 ### Dynamo Metrics
-- **Dynamo Metrics Guide**: See `docs/guides/metrics.md` for complete documentation on Dynamo runtime metrics
+- **Dynamo Metrics Guide**: See [docs/observability/metrics.md](../../../docs/observability/metrics.md) for complete documentation on Dynamo runtime metrics
 - **Dynamo Runtime Metrics**: Metrics prefixed with `dynamo_*` for runtime, components, endpoints, and namespaces
   - Implementation: `lib/runtime/src/metrics.rs` (Rust runtime metrics)
   - Metric names: `lib/runtime/src/metrics/prometheus_names.rs` (metric name constants)

@@ -15,4 +15,4 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
 
-Please refer to [planner docs](../../docs/architecture/planner_intro.rst) for planner documentation.
+Please refer to [planner docs](../../../../docs/planner/planner_intro.rst) for planner documentation.
@@ -3,7 +3,7 @@
 This directory contains configuration for visualizing metrics from the metrics aggregation service using Prometheus and Grafana.
 
 > [!NOTE]
-> For detailed information about Dynamo's metrics system, including hierarchical metrics, automatic labeling, and usage examples, see the [Metrics Guide](../../docs/guides/metrics.md).
+> For detailed information about Dynamo's metrics system, including hierarchical metrics, automatic labeling, and usage examples, see the [Metrics Guide](../../docs/observability/metrics.md).
 
 ## Overview
 

@@ -19,7 +19,7 @@ Dynamo supports OpenTelemetry-based distributed tracing, allowing you to visuali
 
 ## Environment Variables
 
-Dynamo's tracing is configured via environment variables. For complete logging documentation, see [docs/guides/logging.md](../../docs/guides/logging.md).
+Dynamo's tracing is configured via environment variables. For complete logging documentation, see [docs/observability/logging.md](../../docs/observability/logging.md).
 
 ### Required Environment Variables
 

diff --git a/docs/API/nixl_connect/README.md → docs/api/nixl_connect/README.md b/docs/API/nixl_connect/README.md → docs/api/nixl_connect/README.md
diff --git a/docs/API/nixl_connect/connector.md → docs/api/nixl_connect/connector.md b/docs/API/nixl_connect/connector.md → docs/api/nixl_connect/connector.md
diff --git a/docs/API/nixl_connect/descriptor.md → docs/api/nixl_connect/descriptor.md b/docs/API/nixl_connect/descriptor.md → docs/api/nixl_connect/descriptor.md
diff --git a/docs/API/nixl_connect/device.md → docs/api/nixl_connect/device.md b/docs/API/nixl_connect/device.md → docs/api/nixl_connect/device.md
diff --git a/docs/API/nixl_connect/device_kind.md → docs/api/nixl_connect/device_kind.md b/docs/API/nixl_connect/device_kind.md → docs/api/nixl_connect/device_kind.md
diff --git a/docs/API/nixl_connect/operation_status.md → docs/api/nixl_connect/operation_status.md b/docs/API/nixl_connect/operation_status.md → docs/api/nixl_connect/operation_status.md
diff --git a/docs/API/nixl_connect/rdma_metadata.md → docs/api/nixl_connect/rdma_metadata.md b/docs/API/nixl_connect/rdma_metadata.md → docs/api/nixl_connect/rdma_metadata.md
diff --git a/docs/API/nixl_connect/read_operation.md → docs/api/nixl_connect/read_operation.md b/docs/API/nixl_connect/read_operation.md → docs/api/nixl_connect/read_operation.md
diff --git a/docs/API/nixl_connect/readable_operation.md → docs/api/nixl_connect/readable_operation.md b/docs/API/nixl_connect/readable_operation.md → docs/api/nixl_connect/readable_operation.md
diff --git a/docs/API/nixl_connect/writable_operation.md → docs/api/nixl_connect/writable_operation.md b/docs/API/nixl_connect/writable_operation.md → docs/api/nixl_connect/writable_operation.md
diff --git a/docs/API/nixl_connect/write_operation.md → docs/api/nixl_connect/write_operation.md b/docs/API/nixl_connect/write_operation.md → docs/api/nixl_connect/write_operation.md
diff --git a/docs/architecture/architecture.md b/docs/architecture/architecture.md
@@ -54,8 +54,8 @@ The following diagram outlines Dynamo's high-level architecture. To enable large
 
 - [Dynamo Disaggregated Serving](disagg_serving.md)
 - [Dynamo Smart Router](kv_cache_routing.md)
-- [Dynamo KV Cache Block Manager](kvbm_intro.rst)
-- [Planner](planner_intro.rst)
+- [Dynamo KV Cache Block Manager](../kvbm/kvbm_intro.rst)
+- [Planner](../planner/planner_intro.rst)
 - [NVIDIA Inference Transfer Library (NIXL)](https://github.com/ai-dynamo/nixl/blob/main/docs/nixl.md)
 
 Every component in the Dynamo architecture is independently scalable and portable. The API server can adapt to task-specific deployment. A smart router processes user requests to route them to the optimal worker for performance. Specifically, for Large Language Models (LLMs), Dynamo employs KV cache-aware routing, which directs requests to the worker with the highest cache hit rate while maintaining load balance, expediting decoding. This routing strategy leverages a KV cache manager that maintains a global radix tree registry for hit rate calculation. The KV cache manager also oversees a multi-tiered memory system, enabling rapid KV cache storage and eviction. This design results in substantial TTFT reductions, increased throughput, and the ability to process extensive context lengths.
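
The KV cache-aware routing idea in the paragraph above (prefer the worker with the most cached prefix blocks while keeping load balanced) can be illustrated with a toy model. This is a sketch under stated assumptions, not Dynamo's implementation: `PrefixIndex`, `pick_worker`, the block-level trie standing in for the radix tree, and the cost formula with `load_weight` are all hypothetical.

```python
from collections import defaultdict


class PrefixIndex:
    """Toy block-level trie recording which workers cache which prefixes."""

    def __init__(self):
        self.children = {}    # block id -> child node
        self.workers = set()  # workers that cache the prefix ending here

    def insert(self, blocks, worker):
        # Mark the worker on every node along the path, since caching a
        # sequence implies caching each of its prefixes.
        node = self
        for block in blocks:
            node = node.children.setdefault(block, PrefixIndex())
            node.workers.add(worker)

    def overlap(self, blocks):
        """Return {worker: number of leading blocks already cached}."""
        hits = defaultdict(int)
        node = self
        for block in blocks:
            node = node.children.get(block)
            if node is None:
                break
            for worker in node.workers:
                hits[worker] += 1
        return dict(hits)


def pick_worker(index, blocks, load, load_weight=1.0):
    """Pick the worker with the lowest (uncached blocks + weighted load)."""
    hits = index.overlap(blocks)

    def cost(worker):
        return (len(blocks) - hits.get(worker, 0)) + load_weight * load[worker]

    # Iterate in sorted order so ties are broken deterministically.
    return min(sorted(load), key=cost)


index = PrefixIndex()
index.insert(["a", "b", "c"], "w0")
index.insert(["a"], "w1")
request = ["a", "b", "c", "d"]

print(pick_worker(index, request, {"w0": 1, "w1": 0}))  # cache hit dominates
print(pick_worker(index, request, {"w0": 5, "w1": 0}))  # load dominates
```

Raising `load_weight` shifts the trade-off toward load balance, and lowering it favors cache reuse.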

diff --git a/docs/architecture/kv_cache_routing.md b/docs/architecture/kv_cache_routing.md
@@ -154,7 +154,7 @@ For improved fault tolerance, you can launch multiple frontend + router replicas
 
 ### Router State Management
 
-The KV Router tracks two types of state (see [KV Router Architecture](../components/router/README.md) for details):
+The KV Router tracks two types of state (see [KV Router Architecture](../router/README.md) for details):
 
 1. **Prefix blocks (cached KV blocks)**: Maintained in a radix tree, tracking which blocks are cached on each worker. This state is **persistent** - backed by NATS JetStream events and object store snapshots. New router replicas automatically sync this state on startup, ensuring consistent cache awareness across restarts.
 
@@ -506,4 +506,4 @@ This approach gives you complete control over routing decisions, allowing you to
 - **Maximize cache reuse**: Use `best_worker_id()` which considers both prefill and decode loads
 - **Balance load**: Consider both `potential_prefill_tokens` and `potential_decode_blocks` together
 
-See [KV Router Architecture](../components/router/README.md) for performance tuning details.
+See [KV Router Architecture](../router/README.md) for performance tuning details.
diff --git a/docs/backends/sglang/README.md b/docs/backends/sglang/README.md
@@ -37,9 +37,9 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 | [**Disaggregated Serving**](../../architecture/disagg_serving.md) | ✅ |  |
 | [**Conditional Disaggregation**](../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
 | [**KV-Aware Routing**](../../architecture/kv_cache_routing.md) | ✅ |  |
-| [**SLA-Based Planner**](../../architecture/sla_planner.md) | ✅ |  |
+| [**SLA-Based Planner**](../../planner/sla_planner.md) | ✅ |  |
 | [**Multimodal EPD Disaggregation**](multimodal_epd.md) | ✅ |  |
-| [**KVBM**](../../architecture/kvbm_architecture.md) | ❌ | Planned |
+| [**KVBM**](../../kvbm/kvbm_architecture.md) | ❌ | Planned |
 
 
 ## Dynamo SGLang Integration

diff --git a/docs/backends/trtllm/README.md b/docs/backends/trtllm/README.md
@@ -55,9 +55,9 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 | [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ |  |
 | [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
 | [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ |  |
-| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | ✅ |  |
-| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | Planned |
-| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |
+| [**SLA-Based Planner**](../../../docs/planner/sla_planner.md) | ✅ |  |
+| [**Load Based Planner**](../../../docs/planner/load_planner.md) | 🚧 | Planned |
+| [**KVBM**](../../../docs/kvbm/kvbm_architecture.md) | 🚧 | Planned |
 
 ### Large Scale P/D and WideEP Features
 
@@ -308,4 +308,4 @@ For detailed instructions on running comprehensive performance sweeps across bot
 
 Dynamo with TensorRT-LLM currently supports integration with the Dynamo KV Block Manager. This integration can significantly reduce time-to-first-token (TTFT) latency, particularly in usage patterns such as multi-turn conversations and repeated long-context requests.
 
-Here is the instruction: [Running KVBM in TensorRT-LLM](./../../../docs/guides/run_kvbm_in_trtllm.md) .
+For instructions, see [Running KVBM in TensorRT-LLM](./../../../docs/kvbm/trtllm-setup.md).
diff --git a/docs/backends/vllm/README.md b/docs/backends/vllm/README.md
@@ -38,9 +38,9 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 | [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ |  |
 | [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
 | [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ |  |
-| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | ✅ |  |
-| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | WIP |
-| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | ✅ |  |
+| [**SLA-Based Planner**](../../../docs/planner/sla_planner.md) | ✅ |  |
+| [**Load Based Planner**](../../../docs/planner/load_planner.md) | 🚧 | WIP |
+| [**KVBM**](../../../docs/kvbm/kvbm_architecture.md) | ✅ |  |
 | [**LMCache**](./LMCache_Integration.md) | ✅ |  |
 
 ### Large Scale P/D and WideEP Features

diff --git a/docs/benchmarks/pre_deployment_profiling.md b/docs/benchmarks/pre_deployment_profiling.md
@@ -1,7 +1,7 @@
 # Pre-Deployment Profiling
 
 > [!TIP]
-> **New to SLA Planner?** For a complete workflow including profiling and deployment, see the [SLA Planner Quick Start Guide](/docs/kubernetes/sla_planner_quickstart.md).
+> **New to SLA Planner?** For a complete workflow including profiling and deployment, see the [SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md).
 
 ## Profiling Script
 
@@ -99,7 +99,7 @@ SLA planner can work with any interpolation data that follows the above format.
 ## Detailed Kubernetes Profiling Instructions
 
 > [!TIP]
-> For a complete step-by-step workflow, see the [SLA Planner Quick Start Guide](/docs/kubernetes/sla_planner_quickstart.md).
+> For a complete step-by-step workflow, see the [SLA Planner Quick Start Guide](/docs/planner/sla_planner_quickstart.md).
 
 This section provides detailed technical information for advanced users who need to customize the profiling process.
 

diff --git a/docs/deploy/metrics/docker-compose.yml b/docs/deploy/metrics/docker-compose.yml
diff --git a/docs/guides/backend.md → docs/development/backend-guide.md b/docs/guides/backend.md → docs/development/backend-guide.md
diff --git a/docs/runtime/README.md → docs/development/runtime-guide.md b/docs/runtime/README.md → docs/development/runtime-guide.md
diff --git a/docs/guides/tool_calling.md → docs/guides/tool-calling.md b/docs/guides/tool_calling.md → docs/guides/tool-calling.md
diff --git a/docs/hidden_toctree.rst b/docs/hidden_toctree.rst
@@ -11,18 +11,18 @@
    :maxdepth: 2
    :hidden:
 
-   runtime/README.md
-   API/nixl_connect/connector.md
-   API/nixl_connect/descriptor.md
-   API/nixl_connect/device.md
-   API/nixl_connect/device_kind.md
-   API/nixl_connect/operation_status.md
-   API/nixl_connect/rdma_metadata.md
-   API/nixl_connect/readable_operation.md
-   API/nixl_connect/writable_operation.md
-   API/nixl_connect/read_operation.md
-   API/nixl_connect/write_operation.md
-   API/nixl_connect/README.md
+   development/runtime-guide.md
+   api/nixl_connect/connector.md
+   api/nixl_connect/descriptor.md
+   api/nixl_connect/device.md
+   api/nixl_connect/device_kind.md
+   api/nixl_connect/operation_status.md
+   api/nixl_connect/rdma_metadata.md
+   api/nixl_connect/readable_operation.md
+   api/nixl_connect/writable_operation.md
+   api/nixl_connect/read_operation.md
+   api/nixl_connect/write_operation.md
+   api/nixl_connect/README.md
 
    kubernetes/api_reference.md
    kubernetes/create_deployment.md
@@ -32,14 +32,14 @@
    kubernetes/grove.md
    kubernetes/model_caching_with_fluid.md
    kubernetes/README.md
-   guides/dynamo_run.md
-   guides/metrics.md
-   guides/run_kvbm_in_vllm.md
-   guides/run_kvbm_in_trtllm.md
-   guides/tool_calling.md
+   reference/cli.md
+   observability/metrics.md
+   kvbm/vllm-setup.md
+   kvbm/trtllm-setup.md
+   guides/tool-calling.md
 
    architecture/kv_cache_routing.md
-   architecture/load_planner.md
+   planner/load_planner.md
    architecture/request_migration.md
    architecture/request_cancellation.md
 

diff --git a/docs/index.rst b/docs/index.rst
@@ -42,7 +42,7 @@ Quickstart
 
    Quickstart <self>
    Installation <_sections/installation>
-   Support Matrix <support_matrix.md>
+   Support Matrix <reference/support-matrix.md>
    Architecture <_sections/architecture>
    Examples <_sections/examples>
 
@@ -63,18 +63,18 @@ Quickstart
    :caption: Components
 
    Backends <_sections/backends>
-   Router <components/router/README>
-   Planner <architecture/planner_intro>
-   KVBM <architecture/kvbm_intro>
+   Router <router/README>
+   Planner <planner/planner_intro>
+   KVBM <kvbm/kvbm_intro>
 
 .. toctree::
    :hidden:
    :caption: Developer Guide
 
    Benchmarking Guide <benchmarks/benchmarking.md>
-   SLA Planner (Autoscaling) Quickstart <kubernetes/sla_planner_quickstart>
-   Logging <guides/logging.md>
-   Health Checks <guides/health_check.md>
-   Tuning Disaggregated Serving Performance <guides/disagg_perf_tuning.md>
-   Writing Python Workers in Dynamo <guides/backend.md>
-   Glossary <dynamo_glossary.md>
+   SLA Planner (Autoscaling) Quickstart <planner/sla_planner_quickstart>
+   Logging <observability/logging.md>
+   Health Checks <observability/health-checks.md>
+   Tuning Disaggregated Serving Performance <performance/tuning.md>
+   Writing Python Workers in Dynamo <development/backend-guide.md>
+   Glossary <reference/glossary.md>
diff --git a/docs/kubernetes/create_deployment.md b/docs/kubernetes/create_deployment.md
@@ -90,7 +90,7 @@ Consult the corresponding sh file. Each of the python commands to launch a compo
 
 The front end is launched with `python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]`.
 Each worker is launched with a `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags` command.
-If you are a Dynamo contributor the [dynamo run guide](/docs/guides/dynamo_run.md) for details on how to run this command.
+If you are a Dynamo contributor, see the [dynamo run guide](/docs/reference/cli.md) for details on how to run this command.
 
 
 ## Step 3: Key Customization Points

diff --git a/docs/kubernetes/installation_guide.md b/docs/kubernetes/installation_guide.md
@@ -196,7 +196,7 @@ kubectl get pods -n ${NAMESPACE}
 
 3. **Optional:**
    - [Set up Prometheus & Grafana](metrics.md)
-   - [SLA Planner Quickstart Guide](sla_planner_quickstart.md) (for SLA-aware scheduling and autoscaling)
+   - [SLA Planner Quickstart Guide](../planner/sla_planner_quickstart.md) (for SLA-aware scheduling and autoscaling)
 
 ## Troubleshooting
 

diff --git a/docs/kubernetes/metrics.md b/docs/kubernetes/metrics.md
@@ -65,7 +65,7 @@ This will create two components:
 
 Both components expose a `/metrics` endpoint following the OpenMetrics format, but with different metrics appropriate to their roles. For details about:
 - Deployment configuration: See the [vLLM README](/docs/backends/vllm/README.md)
-- Available metrics: See the [metrics guide](/docs/guides/metrics.md)
+- Available metrics: See the [metrics guide](/docs/observability/metrics.md)
 
 ### Validate the Deployment
 

diff --git a/docs/architecture/kvbm_architecture.md → docs/kvbm/kvbm_architecture.md b/docs/architecture/kvbm_architecture.md → docs/kvbm/kvbm_architecture.md
diff --git a/docs/architecture/kvbm_components.md → docs/kvbm/kvbm_components.md b/docs/architecture/kvbm_components.md → docs/kvbm/kvbm_components.md
diff --git a/docs/architecture/kvbm_intro.rst → docs/kvbm/kvbm_intro.rst b/docs/architecture/kvbm_intro.rst → docs/kvbm/kvbm_intro.rst
diff --git a/docs/architecture/kvbm_motivation.md → docs/kvbm/kvbm_motivation.md b/docs/architecture/kvbm_motivation.md → docs/kvbm/kvbm_motivation.md
diff --git a/docs/architecture/kvbm_reading.md → docs/kvbm/kvbm_reading.md b/docs/architecture/kvbm_reading.md → docs/kvbm/kvbm_reading.md
diff --git a/docs/guides/run_kvbm_in_trtllm.md → docs/kvbm/trtllm-setup.md b/docs/guides/run_kvbm_in_trtllm.md → docs/kvbm/trtllm-setup.md
@@ -19,7 +19,7 @@ limitations under the License.
 
 This guide explains how to leverage KVBM (KV Block Manager) to manage the KV cache and perform KV offloading in TensorRT-LLM (trtllm).
 
-To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/architecture/kvbm_intro.html)
+To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)
 
 > [!Note]
 > - Ensure that `etcd` and `nats` are running before starting.

diff --git a/docs/guides/run_kvbm_in_vllm.md → docs/kvbm/vllm-setup.md b/docs/guides/run_kvbm_in_vllm.md → docs/kvbm/vllm-setup.md
@@ -19,7 +19,7 @@ limitations under the License.
 
 This guide explains how to leverage KVBM (KV Block Manager) to manage the KV cache and perform KV offloading in vLLM.
 
-To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/architecture/kvbm_intro.html)
+To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest/kvbm/kvbm_intro.html)
 
 ## Quick Start
 

diff --git a/docs/guides/health_check.md → docs/observability/health-checks.md b/docs/guides/health_check.md → docs/observability/health-checks.md
@@ -197,4 +197,4 @@ date: Wed, 03 Sep 2025 13:42:45 GMT
 
 - [Distributed Runtime Architecture](../architecture/distributed_runtime.md)
 - [Dynamo Architecture Overview](../architecture/architecture.md)
-- [Backend Guide](backend.md)
+- [Backend Guide](../development/backend-guide.md)
diff --git a/docs/guides/logging.md → docs/observability/logging.md b/docs/guides/logging.md → docs/observability/logging.md
@@ -187,5 +187,5 @@ curl -d '{"model": "Qwen/Qwen3-0.6B", "max_completion_tokens": 2049, "messages":
 
 - [Distributed Runtime Architecture](../architecture/distributed_runtime.md)
 - [Dynamo Architecture Overview](../architecture/architecture.md)
-- [Backend Guide](backend.md)
+- [Backend Guide](../development/backend-guide.md)
 - [Log Aggregation in Kubernetes](../kubernetes/logging.md)
diff --git a/docs/guides/metrics.md → docs/observability/metrics.md b/docs/guides/metrics.md → docs/observability/metrics.md
@@ -96,6 +96,6 @@ The metrics system includes a pre-configured Grafana dashboard for visualizing s
 
 - [Distributed Runtime Architecture](../architecture/distributed_runtime.md)
 - [Dynamo Architecture Overview](../architecture/architecture.md)
-- [Backend Guide](backend.md)
+- [Backend Guide](../development/backend-guide.md)
 - [Metrics Implementation Examples](../../deploy/metrics/README.md#implementation-examples)
 - [Complete Metrics Setup Guide](../../deploy/metrics/README.md)
diff --git a/docs/guides/disagg_perf_tuning.md → docs/performance/tuning.md b/docs/guides/disagg_perf_tuning.md → docs/performance/tuning.md
diff --git a/docs/architecture/load_planner.md → docs/planner/load_planner.md b/docs/architecture/load_planner.md → docs/planner/load_planner.md