diff --git a/deps/0007-test-strategy.md b/deps/0007-test-strategy.md
new file mode 100644
index 0000000..4ca0ba2
--- /dev/null
+++ b/deps/0007-test-strategy.md
@@ -0,0 +1,482 @@
+# Test Guideline for Dynamo
+
+## Summary
+
+This document defines the comprehensive testing strategy for the Dynamo distributed inference framework. It establishes testing standards, organizational patterns, and best practices for validating a complex multi-language system with Rust core components, Python bindings, and multiple backend integrations.
+
+## Motivation
+
+Currently, the Dynamo project has a number of different test strategies and implementations, which can be confusing, in particular with respect to what tests run, when, and where. There is no guide for developers, QA, or operations teams covering the general theory, the basic set of tools and tests, or when and how they should be run. We need a set of guidelines and an overarching structure to form the basis for test plans.
+
+## Requirements
+
+1. Tests MUST be able to run locally as well as in CI. This is subject to appropriate hardware being available in the environment.
+2. Tests MUST be deterministic. Tests deemed "flaky" will be removed.
+3. Tests SHOULD be written before beginning development of a new feature.
+
+## Test Characteristics
+- **Fast**: Unit tests < 10ms, Integration tests < 1s
+- **Reliable**: No flaky tests, deterministic outcomes
+- **Isolated**: Tests don't affect each other
+- **Clear**: Test intent obvious from name and structure
+- **Maintainable**: Tests updated with code changes
+
+## Code Coverage Requirements
+- **Rust**: Minimum 80% line coverage, 90% for critical paths
+- **Python**: Minimum 85% line coverage, 95% for public APIs
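+
+One way these floors might be enforced is sketched below. This is a minimal sketch, not the project's actual configuration: it assumes coverage.py/pytest-cov is used for Python, and the `85` value simply mirrors the requirement above. For Rust, a tool such as `cargo-llvm-cov` can produce the line-coverage numbers that a CI job checks against the 80% floor.
+
+```toml
+# pyproject.toml (sketch, not the project's actual configuration)
+[tool.coverage.report]
+# Fail the coverage step when total line coverage drops below the floor above
+fail_under = 85
+show_missing = true
+```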
+
+---
+
+## Testing Directory Structure
+```
+dynamo/
+├── lib/
+│   ├── runtime/
+│   │   ├── src/
+│   │   │   └── lib.rs      # Rust code + unit tests inside
+│   │   ├── tests/          # Optional Rust integration tests specific to runtime
+│   │   └── benches/
+│   ├── llm/
+│   │   ├── src/
+│   │   │   └── lib.rs      # Unit tests here
+│   │   ├── tests/          # Optional Rust integration tests specific to llm
+│   │   └── benches/
+│   └── ...
+├── components/
+│   ├── planner/
+│   │   └── tests/          # Python unit tests for planner module
+│   ├── backend/
+│   │   └── tests/          # Python unit tests for backend module
+│   └── ...
+└── tests/                  # Top-level integration tests (Rust and Python)
+    ├── ...                 # Python end-to-end tests
+    ├── benchmark/
+    └── fault_tolerance/
+```
+
+---
+
+## Test Categories and Levels
+
+### 1. Unit Tests
+
+#### Rust Unit Tests
+**Location**: Inline with source code using `#[cfg(test)]`
+**Purpose**: Test individual functions, structs, and modules in isolation
+**Characteristics**: Fast (<1ms), deterministic, no I/O, no network
+
+```rust
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_sync_function() {
+        let result = my_function(input);
+        assert_eq!(result, expected);
+    }
+
+    #[tokio::test]
+    async fn test_async_function() {
+        let result = async_function().await;
+        assert!(result.is_ok());
+    }
+
+    #[test]
+    fn test_error_conditions() {
+        let result = function_with_errors(invalid_input);
+        assert!(matches!(result, Err(ErrorType::InvalidInput)));
+    }
+}
+```
+
+#### Python Unit Tests
+**Location**: `component_module/tests/`
+
+**Purpose**: Test individual Python functions and classes
+
+**Characteristics**: Fast (<10ms), isolated, mocked dependencies
+
+```python
+import pytest
+from unittest.mock import Mock, patch
+
+@pytest.mark.unit
+def test_function_behavior():
+    """Test specific function behavior in isolation"""
+    result = my_function(test_input)
+    assert result == expected_output
+
+@pytest.mark.unit
+@patch('external_dependency')
+def test_with_mocked_dependency(mock_dep):
+    """Test with external dependencies mocked"""
+    mock_dep.return_value = mock_response
+    result = function_using_dependency()
+    assert result.is_valid()
+
+@pytest.mark.unit
+@pytest.mark.parametrize("input,expected", [
+    ("valid_input", True),
+    ("invalid_input", False),
+])
+def test_input_validation(input, expected):
+    """Parameterized test for various inputs"""
+    assert validate_input(input) == expected
+```
+
+### 2. Integration Tests
+
+#### Rust Integration Tests
+**Location**: `tests/` directory in each crate
+**Purpose**: Test public APIs and component interactions
+**Characteristics**: Medium speed (<100ms), realistic data, limited scope
+
+```rust
+// tests/component_integration.rs
+use dynamo_runtime::Runtime;
+use dynamo_llm::LLMEngine;
+
+#[tokio::test]
+async fn test_runtime_llm_integration() {
+    let runtime = Runtime::new().await.unwrap();
+    let engine = LLMEngine::new(&runtime).await.unwrap();
+
+    let result = engine.process_request(test_request()).await;
+    assert!(result.is_ok());
+}
+
+#[tokio::test]
+async fn test_error_propagation() {
+    let runtime = Runtime::new().await.unwrap();
+    let engine = LLMEngine::new(&runtime).await.unwrap();
+
+    let result = engine.process_request(invalid_request()).await;
+    assert!(matches!(result, Err(LLMError::InvalidRequest(_))));
+}
+```
+
+#### Python Integration Tests
+**Location**: `tests/` directory in each component.
+
+**Purpose**: Test component interactions and Python-Rust bindings
+
+**Characteristics**: Medium speed (<1s), real components, controlled environment
+
+```python
+@pytest.mark.integration
+@pytest.mark.asyncio
+async def test_python_rust_integration():
+    """Test Python-Rust binding integration"""
+    runtime = await Runtime.create()
+    context = runtime.create_context()
+
+    result = await context.process(test_data)
+    assert result.status == "success"
+
+    await runtime.shutdown()
+
+@pytest.mark.integration
+def test_multi_component_workflow():
+    """Test workflow across multiple components"""
+    planner = Planner()
+    frontend = Frontend()
+    backend = Backend("vllm")
+
+    plan = planner.create_plan(request)
+    processed = frontend.process(plan)
+    result = backend.execute(processed)
+
+    assert result.is_valid()
+```
+
+### 3. End-to-End Tests
+
+#### Python: System E2E Tests
+**Location**: `tests/` in root directory.
+
+**Purpose**: Validate complete system behavior.
+
+**Characteristics**: Slow (>5s), realistic scenarios, full system.
+
+```python
+@pytest.mark.e2e
+@pytest.mark.slow
+@pytest.mark.gpu_required
+async def test_complete_inference_workflow():
+    """Test complete inference from request to response"""
+    # Start full system
+    system = await DynamoSystem.start(config)
+
+    # Send realistic request
+    request = InferenceRequest(
+        model="test-model",
+        prompt="Test prompt",
+        max_tokens=100
+    )
+
+    response = await system.process_request(request)
+
+    assert response.status == "completed"
+    assert len(response.tokens) > 0
+    assert response.latency_ms < MAX_ACCEPTABLE_LATENCY
+
+    await system.shutdown()
+
+@pytest.mark.e2e
+@pytest.mark.multi_gpu
+async def test_distributed_inference():
+    """Test inference across multiple GPUs"""
+    system = await DynamoSystem.start_distributed(gpu_count=2)
+
+    # Test load balancing
+    requests = [create_test_request() for _ in range(10)]
+    responses = await system.process_batch(requests)
+
+    assert all(r.status == "completed" for r in responses)
+    assert_gpu_utilization_balanced()
+```
+
+### 4. Performance Tests
+
+#### Rust Benchmarks
+**Location**: `benches/` in each crate
+**Tool**: Criterion.rs
+**Purpose**: Track performance regressions
+
+```rust
+// benches/tokenizer_bench.rs
+use criterion::{black_box, criterion_group, criterion_main, Criterion};
+use dynamo_tokens::Tokenizer;
+
+fn tokenizer_benchmark(c: &mut Criterion) {
+    let tokenizer = Tokenizer::new("test-model").unwrap();
+    let text = "Sample text for tokenization";
+
+    c.bench_function("tokenize", |b| {
+        b.iter(|| tokenizer.encode(black_box(text)))
+    });
+
+    c.bench_function("decode", |b| {
+        let tokens = tokenizer.encode(text).unwrap();
+        b.iter(|| tokenizer.decode(black_box(&tokens)))
+    });
+}
+
+criterion_group!(benches, tokenizer_benchmark);
+criterion_main!(benches);
+```
+
+#### Python Performance Tests
+**Location**: `tests/benchmarks/` in root directory
+
+**Purpose**: Validate system performance characteristics
+
+```python
+@pytest.mark.benchmark
+@pytest.mark.performance
+def test_throughput_benchmark(benchmark):
+    """Benchmark system throughput"""
+    system = setup_test_system()
+
+    def process_batch():
+        requests = [create_test_request() for _ in range(100)]
+        return system.process_batch_sync(requests)
+
+    result = benchmark(process_batch)
+
+    # Assert performance requirements
+    assert result.throughput > MIN_THROUGHPUT_RPS
+    assert result.p95_latency < MAX_P95_LATENCY
+
+@pytest.mark.stress
+@pytest.mark.slow
+def test_sustained_load():
+    """Test system under sustained load"""
+    system = setup_test_system()
+
+    start_time = time.time()
+    duration = 300  # 5 minutes
+
+    while time.time() - start_time < duration:
+        response = system.process_request(create_test_request())
+        assert response.status == "success"
+
+        # Monitor resource usage
+        assert_memory_usage_stable()
+        assert_cpu_usage_reasonable()
+```
+
+### 5. Security Tests
+
+#### Security/OSRB Test Framework
+
+**Location**: `tests/security/` in root directory
+
+**Purpose**: Validate security controls and detect OSRB exceptions.
+
+```python
+@pytest.mark.security
+def test_input_sanitization():
+    """Test that malicious inputs are properly sanitized"""
+    malicious_inputs = [
+        "'; DROP TABLE users; --",        # SQL injection
+        "<script>alert(1)</script>",      # Cross-site scripting
+        "../../../etc/passwd",            # Path traversal
+        "{{7*7}}",                        # Template injection
+    ]
+
+    for malicious_input in malicious_inputs:
+        response = system.process_request(malicious_input)
+        assert response.status == "error"
+        assert "sanitized" in response.message.lower()
+```
+
+### 6. Fault Tolerance Tests
+
+#### Reliability Testing
+
+**Location**: `tests/fault_tolerance/` in root directory
+
+**Purpose**: Validate system behavior under failure conditions
+
+```python
+@pytest.mark.fault_tolerance
+@pytest.mark.slow
+async def test_network_partition_recovery():
+    """Test system recovery from network partitions"""
+    system = await create_distributed_system(nodes=3)
+
+    # Introduce network partition
+    await system.partition_network(nodes=[0], isolated_nodes=[1, 2])
+
+    # System should continue operating with reduced capacity
+    response = await system.process_request(test_request)
+    assert response.status in ["success", "degraded"]
+
+    # Heal partition
+    await system.heal_network_partition()
+
+    # System should return to full capacity
+    await wait_for_system_recovery()
+    response = await system.process_request(test_request)
+    assert response.status == "success"
+
+@pytest.mark.fault_tolerance
+def test_graceful_degradation():
+    """Test system degradation under resource pressure"""
+    system = setup_test_system()
+
+    # Gradually increase load
+    for load_level in [10, 50, 100, 200, 500]:
+        responses = system.process_concurrent_requests(load_level)
+
+        success_rate = sum(1 for r in responses if r.status == "success") / len(responses)
+
+        if load_level <= 100:
+            assert success_rate >= 0.99  # High success rate under normal load
+        else:
+            assert success_rate >= 0.80  # Graceful degradation under high load
+```
+
+---
+
+## Test Segmentation and Grouping
+
+This section explains how tests are organized, segmented, and run within this project for both **Python** and **Rust** codebases. It covers the usage of **pytest markers** for Python tests and **Cargo features** for Rust tests, along with guidelines on running segmented tests efficiently. Please ensure that marker names and feature names are kept consistent across the Rust and Python TOML files (a sketch of matching TOML configuration appears at the end of this document).
+
+---
+
+### Python Tests Segmentation (pytest)
+
+We use **pytest markers** to categorize tests by their purpose, requirements, and execution characteristics. This helps selectively run relevant tests during development, CI/CD, and nightly/weekly runs.
+
+#### Test Types and Markers
+
+| Marker                     | Description                                                            |
+|----------------------------|------------------------------------------------------------------------|
+| `@pytest.mark.unit`        | Marks **unit tests**, testing individual components.                   |
+| `@pytest.mark.integration` | Marks **integration tests**, testing interactions between components.  |
+| `@pytest.mark.e2e`         | Marks **end-to-end (E2E) tests**, simulating user workflows.           |
+| `@pytest.mark.stress`      | Marks **stress tests** designed for load and robustness.               |
+
+### Further Classification (Integration Test Examples)
+
+- **System Configuration Marks (Hardware Requirements):**
+  - `@pytest.mark.gpus_needed_0` – No GPUs required.
+  - `@pytest.mark.gpus_needed_1` – Requires 1 GPU.
+  - `@pytest.mark.gpus_needed_2` – Requires 2 GPUs.
+
+- **Life-Cycle Marks:**
+  - `@pytest.mark.premerge` – Tests to run before code merge.
+  - `@pytest.mark.postmerge` – Tests to run after merge.
+  - `@pytest.mark.nightly` – Tests scheduled to run nightly.
+  - `@pytest.mark.release` – Tests to run before releases.
+
+- **Worker Framework Marks:**
+  - `@pytest.mark.vllm`
+  - `@pytest.mark.tensorrt_llm`
+  - `@pytest.mark.sglang`
+  - `@pytest.mark.dynamo`
+
+- **Execution Specific Marks:**
+  - `@pytest.mark.fast` – Quick tests, often using small models.
+  - `@pytest.mark.slow` – Tests that take a long time (>10 minutes).
+  - `@pytest.mark.skip(reason="...")` – Skip tests, with a reason.
+  - `@pytest.mark.xfail(reason="...")` – Tests that are expected to fail.
+
+- **Component Specific Marks:**
+  - `@pytest.mark.kvbm` – Tests for KVBM behavior.
+  - `@pytest.mark.planner` – Tests for planner behavior.
+  - `@pytest.mark.router` – Tests for router behavior.
+
+- **Infrastructure Specific Marks:**
+  - `@pytest.mark.h100` – Tests (for example, wideep tests) that must be run on H100 and cannot be run on L40, such as when the PyTorch version in use requires a compute capability higher than the L40 provides.
+
+NOTE: The markers/features will be updated as required.
+
+### How to Run Python Tests by Marker
+
+Run all tests with a specific marker:
+
+```bash
+pytest -m <marker_name>
+```
+
+Markers can also be combined with boolean expressions, for example `pytest -m "integration and not slow"`.
+
+### Rust Tests Segmentation using Cargo features
+
+Tests can be conditionally compiled using `#[cfg(feature = "feature_name")]`. For example:
+
+```rust
+#[cfg(feature = "gpu")]
+#[test]
+fn test_gpu_acceleration() {
+    // GPU-specific test code here
+}
+```
+```rust
+#[cfg(feature = "nightly")]
+#[test]
+fn test_nightly_only_feature() {
+    // Nightly-only test code here
+}
+```
+
+To combine features:
+
+```rust
+#[cfg(all(feature = "gpu", feature = "vllm"))]
+#[test]
+fn test_gpu_and_vllm() {
+    // Test requiring both features
+}
+```
+
+Run the tests with the required features enabled:
+
+```bash
+cargo test --features "gpu vllm"
+```
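+
+To keep the Python markers and the Rust features aligned, both are declared in TOML. The snippets below are a minimal sketch, not the project's actual configuration: the feature names (`gpu`, `nightly`, `vllm`) and the marker descriptions are illustrative assumptions based on the lists above.
+
+```toml
+# Cargo.toml (sketch) – declare the features used to gate Rust tests
+[features]
+default = []
+gpu = []
+nightly = []
+vllm = []
+```
+
+```toml
+# pyproject.toml (sketch) – register the pytest markers so they can be enforced
+[tool.pytest.ini_options]
+markers = [
+    "unit: unit tests for individual components",
+    "integration: tests for interactions between components",
+    "e2e: end-to-end tests simulating user workflows",
+    "vllm: tests requiring the vLLM worker framework",
+]
+```
+
+With the markers registered, running pytest with `--strict-markers` will flag any marker that is not declared, which helps keep the Python marker names and the Rust feature names from drifting apart.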