# Cooperative Vector DirectX Feature - Test Plan

<a name="top"></a>

## Executive Summary

**DISCLAIMER: This is based on the WIP cooperative vector spec. Some details may change.**

**TODO: Update naming once spec is finalized.**

**Current status is: UNDER EXTERNAL REVIEW**

This test plan outlines the comprehensive validation strategy for the DirectX
Cooperative Vector feature, which enables hardware-accelerated vector-matrix
operations within DirectX 12 shaders using HLSL. The feature supports neural
network computations and other machine learning workloads through optimized
HLSL intrinsics for matrix-vector operations.

The plan defines a systematic testing approach covering:
- **Functionality validation** for all HLSL cooperative vector intrinsics
- **Type support testing** for both mandatory and optional combinations
- **Comprehensive matrix/vector parameter testing** across layouts, dimensions
  and memory patterns
- **Execution environment verification** across shader stages and control flow
  patterns
- **Precision validation** to ensure correctness within defined tolerances

The test methodology combines feature detection with conformance testing on
supported hardware. This document serves as the reference for implementing,
validating, and maintaining HLK execution tests for the DirectX Cooperative
Vector feature through Microsoft's ExecTest/HLK framework.

## Table of Contents

- [1. Test Scope](#1-test-scope)
  - [1.1 Feature Components](#11-feature-components)
  - [1.2 Target Environment](#12-target-environment)
  - [1.3 Test Types](#13-test-types)
- [2. Test Methodology](#2-test-methodology)
  - [2.1 Feature Detection](#21-feature-detection)
  - [2.2 Functionality Testing for Matrix-Vector Operations](#22-functionality-testing-for-matrix-vector-operations)
    - [2.2.1 MatrixVectorMul/MulAdd Tests](#221-matrixvectormulmuladd-tests)
    - [2.2.2 OuterProductAccumulate Tests](#222-outerproductaccumulate-tests)
    - [2.2.3 InterlockedAdd Tests](#223-interlockedadd-tests)
    - [2.2.4 Input Vector Interpretation Tests](#224-input-vector-interpretation-tests)
  - [2.3 Matrix Conversion Testing](#23-matrix-conversion-testing)
    - [2.3.1 GetCooperativeMatrixVectorConversionDestinationInfo](#231-getcooperativematrixvectorconversiondestinationinfo)
    - [2.3.2 CooperativeVectorConvertMatrix](#232-cooperativevectorconvertmatrix)
  - [2.4 Control Flow Tests](#24-control-flow-tests)
  - [2.5 Shader Stages to Test](#25-shader-stages-to-test)
  - [2.6 Multi-Layer Neural Network Tests](#26-multi-layer-neural-network-tests)
  - [2.7 Non-mandatory Configuration Testing](#27-non-mandatory-configuration-testing)
- [3. Test Infrastructure](#3-test-infrastructure)
  - [3.1 Test Framework](#31-test-framework)
  - [3.2 Shader Generation](#32-shader-generation)
  - [3.3 Result Validation](#33-result-validation)

## 1. Test Scope

### 1.1 Feature Components
**Mandatory Operations for `D3D12_COOPERATIVE_VECTOR_TIER_1_0`**
- `MatrixVectorMul` - Matrix-Vector Multiply
- `MatrixVectorMulAdd` - Matrix-Vector Multiply-Add
- `ID3D12Device::GetCooperativeMatrixVectorConversionDestinationInfo` - API to
  query destination buffer size for matrix conversion
- `ID3D12CommandList::CooperativeVectorConvertMatrix` - API for matrix layout
  and type conversion

**Mandatory Operations for `D3D12_COOPERATIVE_VECTOR_TIER_1_1`**
- `OuterProductAccumulate` - Vector-Vector Outer Product and Accumulate
- `InterlockedAdd` - Atomically add each component of a vector to the
  corresponding element in memory

### 1.2 Target Environment
- **OS Versions**: Windows 11, Windows 10 (latest versions)
- **Hardware**: All GPUs supporting `D3D12_COOPERATIVE_VECTOR_TIER_1_0`; GPUs
  that also support `D3D12_COOPERATIVE_VECTOR_TIER_1_1` are additionally tested
  on the optional features added in that tier

### 1.3 Test Types
- Functionality tests
  - Basic functionality tests for all mandatory operations and type
    combinations in the minimum support set for
    `D3D12_COOPERATIVE_VECTOR_TIER_1_0`
  - Basic functionality tests for all mandatory operations and type
    combinations in the minimum support set for
    `D3D12_COOPERATIVE_VECTOR_TIER_1_1`
- Extended functionality tests
  - Extended functionality tests for other type combinations supported by the
    driver
- Edge case tests
  - Test with values at the limits of the representable range for the given
    type
  - Test with special values (NaN, Infinity, Denormal)
  - Test with various control flow patterns
- Multi-Layer tests
  - Test a subset of the test variable configurations in more complex,
    realistic use cases, e.g., `MatrixVectorMul` with interleaved activation
    functions.

[Back to Top](#top)

## 2. Test Methodology

### 2.1 Feature Detection

- For devices reporting `D3D12_COOPERATIVE_VECTOR_TIER_1_0`, all mandatory
  operations and type combinations in that tier's minimum support set must be
  supported.
- For devices reporting `D3D12_COOPERATIVE_VECTOR_TIER_1_1`, all mandatory
  operations and type combinations in that tier's minimum support set must be
  supported.

- Before running each test, check whether the driver reports support for the
  operation and its type combination.
  - If the driver reports that a mandatory test configuration is not supported,
    the test should fail.
  - If the driver reports that an optional test configuration is supported,
    the test runs, and any failure fails the conformance test even though the
    configuration is optional. The driver should report support correctly.
  - Otherwise, skip the test.
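
The skip and fail rules above form a small decision table. The C++ sketch below encodes it. `ConfigKind`, `TestOutcome`, and `ClassifyConfig` are hypothetical harness names, not D3D12 API. The boolean input stands in for whatever support query the final spec provides.

```cpp
#include <cassert>

// Hypothetical harness types; nothing here is part of the D3D12 API.
enum class ConfigKind { Mandatory, Optional };
enum class TestOutcome { Run, Fail, Skip };

// Decide what to do with one operation/type configuration before executing
// it. driverReportsSupport is the result of the (spec-defined) support query.
TestOutcome ClassifyConfig(ConfigKind kind, bool driverReportsSupport) {
    if (driverReportsSupport) {
        // Reported support means the configuration is executed and validated;
        // a mismatch fails conformance even for an optional configuration.
        return TestOutcome::Run;
    }
    if (kind == ConfigKind::Mandatory) {
        // A mandatory configuration must never be reported as unsupported.
        return TestOutcome::Fail;
    }
    // Optional and not reported as supported: nothing to validate.
    return TestOutcome::Skip;
}
```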

### 2.2 Functionality Testing for Matrix-Vector Operations

#### 2.2.1 MatrixVectorMul/MulAdd Tests
- Test all mandatory type combinations in the minimum support set
- Test various optional type combinations if the driver reports support
- Test with and without matrix transposition if the driver reports support
- Test with all matrix layouts
- Test matrices of different dimensions (small, common ML sizes, non-power-of-2)
- Test different values for the `MatrixOffset` and `MatrixStride` parameters
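
A CPU reference for these tests must account for layout, transposition, offset, and stride. The following is a minimal sketch under several assumptions. It uses `float` data throughout, counts offset and stride in elements rather than bytes, and adds an optional dense bias vector. `Layout`, `LoadElement`, and `RefMatrixVectorMulAdd` are hypothetical names. Type handling and parameter units for the real intrinsics should come from the spec.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Layout { RowMajor, ColumnMajor };

// Read element (r, c) of a stored matrix. Offset and stride are in elements
// (a simplification; the shader-side parameters may use different units).
float LoadElement(const std::vector<float>& buf, std::size_t offset,
                  std::size_t stride, Layout layout, std::size_t r,
                  std::size_t c) {
    const std::size_t index = layout == Layout::RowMajor
                                  ? offset + r * stride + c
                                  : offset + c * stride + r;
    assert(index < buf.size());
    return buf[index];
}

// Reference for result = op(M) * input (+ bias). M is stored as rows x cols;
// op(M) is M itself, or its transpose when `transpose` is set.
std::vector<float> RefMatrixVectorMulAdd(
    const std::vector<float>& matrix, std::size_t offset, std::size_t stride,
    Layout layout, std::size_t rows, std::size_t cols, bool transpose,
    const std::vector<float>& input, const std::vector<float>* bias) {
    // The stride must span at least one full row (row-major) or column.
    assert(stride >= (layout == Layout::RowMajor ? cols : rows));
    const std::size_t outDim = transpose ? cols : rows;
    const std::size_t inDim = transpose ? rows : cols;
    assert(input.size() == inDim);
    assert(bias == nullptr || bias->size() == outDim);

    std::vector<float> result(outDim);
    for (std::size_t i = 0; i < outDim; ++i) {
        float acc = bias ? (*bias)[i] : 0.0f;  // MulAdd starts from the bias
        for (std::size_t k = 0; k < inDim; ++k) {
            // op(M)[i][k] is M[i][k], or M[k][i] when transposed.
            const float m = transpose
                                ? LoadElement(matrix, offset, stride, layout, k, i)
                                : LoadElement(matrix, offset, stride, layout, i, k);
            acc += m * input[k];
        }
        result[i] = acc;
    }
    return result;
}
```

Filling the offset region and the padding between rows with a sentinel value is one way to make an implementation that ignores `MatrixOffset` or `MatrixStride` produce an obviously wrong result.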

#### 2.2.2 OuterProductAccumulate Tests
- Test mandatory type combination: `FP16`→`FP16`
- Test various optional type combinations if the driver reports support
- Test with various matrix layouts
- Test matrices of different dimensions (small, common ML sizes, non-power-of-2)
- Test different values for the `ResultMatrixOffset` and `ResultMatrixStride`
  parameters
- Test atomic accumulation behavior with multiple threads/waves
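
Because the accumulation is atomic, the expected matrix after all threads finish is the initial contents plus the sum of every thread's outer product. Apart from floating-point rounding, this result does not depend on the order in which threads reach memory. The sketch below computes that expectation for a row-major result matrix only. `OuterProductInput` and `RefOuterProductAccumulate` are hypothetical names, and offset and stride are counted in elements for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One thread's contribution: a has `rows` entries, b has `cols` entries.
struct OuterProductInput {
    std::vector<float> a;
    std::vector<float> b;
};

// Expected result after several threads each call OuterProductAccumulate on
// the same row-major matrix: element (i, j) gains a[i] * b[j] per thread.
// Offset and stride are in elements (a simplification of the real parameters).
void RefOuterProductAccumulate(std::vector<float>& result, std::size_t offset,
                               std::size_t stride, std::size_t rows,
                               std::size_t cols,
                               const std::vector<OuterProductInput>& threads) {
    assert(stride >= cols);
    for (const OuterProductInput& t : threads) {
        assert(t.a.size() == rows && t.b.size() == cols);
        for (std::size_t i = 0; i < rows; ++i) {
            for (std::size_t j = 0; j < cols; ++j) {
                const std::size_t index = offset + i * stride + j;
                assert(index < result.size());
                result[index] += t.a[i] * t.b[j];
            }
        }
    }
}
```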

#### 2.2.3 InterlockedAdd Tests
- Test mandatory type combination: `FP16`→`FP16`
- Test various optional type combinations if the driver reports support
- Test vectors of different lengths (small, common ML sizes, non-power-of-2)
- Test different values for the `ResultOffset` parameter
- Test atomic accumulation behavior with multiple threads/waves
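
For `InterlockedAdd`, element `k` at `ResultOffset` should end up holding its initial value plus the sum of component `k` across all threads. Floating-point addition is not associative, so the choice of inputs matters. The sketch below pairs a reference with a hypothetical input generator (`MakeExactInputs`) whose small integer values keep every partial sum exact in FP16, which represents all integers up to 2048 exactly. `RefInterlockedAdd` is also a hypothetical name, and `ResultOffset` is counted in elements here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expected destination after every thread calls InterlockedAdd with a vector
// of the same length at the same ResultOffset (counted in elements here).
std::vector<float> RefInterlockedAdd(
    std::vector<float> dest, std::size_t resultOffset,
    const std::vector<std::vector<float>>& perThread) {
    for (const std::vector<float>& v : perThread) {
        assert(resultOffset + v.size() <= dest.size());
        for (std::size_t k = 0; k < v.size(); ++k)
            dest[resultOffset + k] += v[k];
    }
    return dest;
}

// Hypothetical input pattern: thread t contributes (t % 4) + k to component k.
// While the totals stay at or below 2048, every partial sum is exact even in
// FP16, so the expected result is independent of atomic ordering.
std::vector<std::vector<float>> MakeExactInputs(std::size_t threads,
                                                std::size_t length) {
    std::vector<std::vector<float>> inputs(threads, std::vector<float>(length));
    for (std::size_t t = 0; t < threads; ++t)
        for (std::size_t k = 0; k < length; ++k)
            inputs[t][k] = static_cast<float>(t % 4 + k);
    return inputs;
}
```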

#### 2.2.4 Input Vector Interpretation Tests
- The functionality tests should cover the conversion of the input vector type
  to the input interpretation type.
  - Test arithmetic conversions (e.g., FP16 -> FP8) using input values that are
    exactly representable in the interpretation type, so the conversion
    preserves them
  - Test bitcast conversions, which reinterpret the bits without changing them
    (e.g., HLSL packed type/`uint` -> `SignedInt8x4Packed`)
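
For bitcast interpretations, the CPU reference must decode the raw bits exactly as the hardware does. A sketch for `SignedInt8x4Packed` follows. It assumes that element 0 occupies the least significant byte of each 32-bit word, which should be confirmed against the final spec. `UnpackSignedInt8x4` and `PackSignedInt8x4` are hypothetical helper names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Unpack a 32-bit word interpreted as four packed signed 8-bit integers.
// ASSUMPTION: element 0 lives in the least significant byte.
std::array<int, 4> UnpackSignedInt8x4(std::uint32_t packed) {
    std::array<int, 4> out{};
    for (int i = 0; i < 4; ++i) {
        const int byte = static_cast<int>((packed >> (8 * i)) & 0xFFu);
        // Reinterpret the byte as two's complement without relying on
        // implementation-defined narrowing conversions.
        out[i] = byte >= 128 ? byte - 256 : byte;
    }
    return out;
}

// Inverse helper for building test inputs on the CPU.
std::uint32_t PackSignedInt8x4(const std::array<int, 4>& values) {
    std::uint32_t packed = 0;
    for (int i = 0; i < 4; ++i) {
        assert(values[i] >= -128 && values[i] <= 127);
        packed |= (static_cast<std::uint32_t>(values[i]) & 0xFFu) << (8 * i);
    }
    return packed;
}
```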

### 2.3 Matrix Conversion Testing

#### 2.3.1 GetCooperativeMatrixVectorConversionDestinationInfo
- Test queries for all destination layouts (row-major, column-major,
  inferencing-optimal, training-optimal) and types in the minimum support set
- Verify returned sizes are sufficient for subsequent conversion operations
- Validate that returned sizes match the actual required size when performing
  conversion
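
For row-major and column-major destinations, the CPU can compute a lower bound on the required size. The bound is every stride except the last, plus one tightly packed row (or column). The optimal layouts are opaque, and only the driver knows their size. For those layouts, the check reduces to a nonzero size followed by a successful conversion. The helper names and the byte-based stride below are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>

enum class DestLayout { RowMajor, ColumnMajor, InferencingOptimal, TrainingOptimal };

// Smallest byte count that can hold a rows x cols matrix in an explicit
// layout. Returns 0 for the opaque optimal layouts.
std::size_t MinimumExplicitLayoutSize(DestLayout layout, std::size_t rows,
                                      std::size_t cols, std::size_t elementSize,
                                      std::size_t strideBytes) {
    assert(rows > 0 && cols > 0 && elementSize > 0);
    switch (layout) {
    case DestLayout::RowMajor:
        assert(strideBytes >= cols * elementSize);
        return (rows - 1) * strideBytes + cols * elementSize;
    case DestLayout::ColumnMajor:
        assert(strideBytes >= rows * elementSize);
        return (cols - 1) * strideBytes + rows * elementSize;
    default:
        return 0;  // only the driver knows the size of optimal layouts
    }
}

// Sanity check for a size reported by the destination-info query.
bool ReportedSizeIsPlausible(DestLayout layout, std::size_t reportedSize,
                             std::size_t rows, std::size_t cols,
                             std::size_t elementSize, std::size_t strideBytes) {
    if (reportedSize == 0) return false;  // a real matrix never needs 0 bytes
    return reportedSize >=
           MinimumExplicitLayoutSize(layout, rows, cols, elementSize, strideBytes);
}
```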

#### 2.3.2 CooperativeVectorConvertMatrix
- Test all mandatory source and destination type combinations in the minimum
  support set
- Test all source and destination layout combinations
- Test with various matrix dimensions
- Test with different stride values for row-major and column-major layouts
- Test multiple conversions in a single API call, i.e., pass multiple
  `D3D12_COOPERATIVE_VECTOR_MATRIX_CONVERSION_INFO` structures in one call.
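
Layout and stride variations multiply quickly, so generating the cases is easier than listing them by hand. The sketch below enumerates every source/destination layout pair and expands row-major and column-major into a packed stride and a padded stride. Several details are assumptions for illustration: the names, the 16-byte padding, and the use of one element size for both sides (a real conversion may change the type).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class MatrixLayout { RowMajor, ColumnMajor, InferencingOptimal, TrainingOptimal };

// One side (source or destination) of a hypothetical conversion test case.
struct LayoutVariant {
    MatrixLayout layout;
    std::size_t strideBytes;  // 0 for optimal layouts, which take no stride
};

struct ConversionCase {
    LayoutVariant src;
    LayoutVariant dst;
};

// Expand every layout into the strides worth testing: tightly packed and
// padded (the 16-byte padding is an arbitrary illustrative choice).
std::vector<LayoutVariant> LayoutVariants(std::size_t rows, std::size_t cols,
                                          std::size_t elementSize) {
    assert(rows > 0 && cols > 0 && elementSize > 0);
    const std::size_t rowPacked = cols * elementSize;
    const std::size_t colPacked = rows * elementSize;
    return {
        {MatrixLayout::RowMajor, rowPacked},
        {MatrixLayout::RowMajor, rowPacked + 16},
        {MatrixLayout::ColumnMajor, colPacked},
        {MatrixLayout::ColumnMajor, colPacked + 16},
        {MatrixLayout::InferencingOptimal, 0},
        {MatrixLayout::TrainingOptimal, 0},
    };
}

// Cartesian product of source and destination variants.
std::vector<ConversionCase> ConversionCases(std::size_t rows, std::size_t cols,
                                            std::size_t elementSize) {
    const std::vector<LayoutVariant> variants = LayoutVariants(rows, cols, elementSize);
    std::vector<ConversionCase> cases;
    for (const LayoutVariant& src : variants)
        for (const LayoutVariant& dst : variants)
            cases.push_back({src, dst});
    return cases;
}
```

Consecutive cases can then be grouped into batches to exercise multiple `D3D12_COOPERATIVE_VECTOR_MATRIX_CONVERSION_INFO` entries in a single `CooperativeVectorConvertMatrix` call.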

### 2.4 Control Flow Tests

The vector-matrix tests should cover the following control flow patterns:

| Pattern Type          | Description                                      |
|-----------------------|--------------------------------------------------|
| Uniform execution     | All lanes in a wave execute the same code path   |
| Divergent execution   | 50% of lanes take a different branch             |
| Non-uniform offsets   | Different lanes use different matrix offsets     |
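
Describing each lane's behavior once on the CPU keeps the shader generator and the reference in agreement. In a generated shader, for example, the divergent branch could key off `WaveGetLaneIndex() % 2`. The struct, the function, and the lane-to-matrix mapping below are hypothetical choices for illustration.

```cpp
#include <cassert>
#include <cstddef>

enum class ControlFlowPattern { Uniform, Divergent, NonUniformOffsets };

// What a single lane does under a given pattern; the CPU reference evaluates
// the same description to know which result each lane should produce.
struct LaneBehavior {
    bool takesAlternateBranch;  // true for lanes that branch away
    std::size_t matrixOffset;   // byte offset of the matrix this lane reads
};

// Hypothetical assignment: odd lanes diverge (50% of the wave), and in the
// non-uniform-offset pattern lanes cycle through matrixCount matrices stored
// back to back, each matrixBytes long.
LaneBehavior DescribeLane(ControlFlowPattern pattern, std::size_t laneIndex,
                          std::size_t matrixBytes, std::size_t matrixCount) {
    assert(matrixCount > 0);
    LaneBehavior b{false, 0};
    switch (pattern) {
    case ControlFlowPattern::Uniform:
        break;
    case ControlFlowPattern::Divergent:
        b.takesAlternateBranch = (laneIndex % 2) == 1;
        break;
    case ControlFlowPattern::NonUniformOffsets:
        b.matrixOffset = (laneIndex % matrixCount) * matrixBytes;
        break;
    }
    return b;
}
```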

### 2.5 Shader Stages to Test

- Tests must cover all supported shader stages.
- Test compute shaders comprehensively, with all type combinations and
  dimensions.
- For other shader stages, use a more limited set of tests with:
  - A subset of key types
  - A subset of key dimensions
  - Only basic functionality tests (no advanced or special cases)

This approach ensures we cover all shader stages without a combinatorial
explosion of test cases.
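
The stage-dependent reduction can be expressed as a filter over the same configuration list that the compute tests use. In this sketch, the key type and dimension lists are placeholder labels, not decisions from this plan. `TestConfig` and `IncludeConfig` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one generated test.
struct TestConfig {
    std::string typeCombo;  // label for the type combination under test
    std::size_t dimension;  // matrix dimension (square for simplicity)
    bool basic;             // false for edge-case and special-value tests
};

// Placeholder "key" subsets for non-compute stages.
const std::vector<std::string> kKeyTypeCombos = {"FP16->FP16"};
const std::vector<std::size_t> kKeyDimensions = {16, 37};

// Compute shaders run every configuration; all other stages run only basic
// tests on the key type combinations and dimensions.
bool IncludeConfig(bool isComputeStage, const TestConfig& config) {
    if (isComputeStage) return true;
    const bool keyType = std::find(kKeyTypeCombos.begin(), kKeyTypeCombos.end(),
                                   config.typeCombo) != kKeyTypeCombos.end();
    const bool keyDim = std::find(kKeyDimensions.begin(), kKeyDimensions.end(),
                                  config.dimension) != kKeyDimensions.end();
    return config.basic && keyType && keyDim;
}
```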

### 2.6 Multi-Layer Neural Network Tests

- Test chained `MatrixVectorMul` and `MatrixVectorMulAdd` operations with
  interleaved activation functions
- Test with different numbers of layers
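
A CPU reference for the chained case applies the `MatrixVectorMulAdd` math one layer at a time, with the activation function in between. This sketch assumes dense row-major `float` weights and uses ReLU as the activation. Both are illustrative choices, and `Layer` and `RefForward` are hypothetical names. As described in Section 3.3, the reference output would be compared against the GPU result using a relative-error threshold rather than bit-exactly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One fully connected layer: weights are outDim x inDim, dense and row-major.
struct Layer {
    std::size_t inDim;
    std::size_t outDim;
    std::vector<float> weights;
    std::vector<float> bias;
};

// Reference forward pass: each layer computes W * x + b (the math of
// MatrixVectorMulAdd), and ReLU is applied between layers. ReLU is an
// illustrative choice of activation.
std::vector<float> RefForward(const std::vector<Layer>& layers,
                              std::vector<float> x) {
    for (std::size_t l = 0; l < layers.size(); ++l) {
        const Layer& layer = layers[l];
        assert(x.size() == layer.inDim);
        assert(layer.weights.size() == layer.inDim * layer.outDim);
        assert(layer.bias.size() == layer.outDim);
        std::vector<float> y(layer.outDim);
        for (std::size_t i = 0; i < layer.outDim; ++i) {
            float acc = layer.bias[i];
            for (std::size_t k = 0; k < layer.inDim; ++k)
                acc += layer.weights[i * layer.inDim + k] * x[k];
            y[i] = acc;
        }
        // Activation only between layers; the final layer stays linear.
        if (l + 1 < layers.size())
            for (float& v : y) v = std::max(v, 0.0f);
        x = std::move(y);
    }
    return x;
}
```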

### 2.7 Non-mandatory Configuration Testing

This section outlines the approach for testing optional type combinations that
go beyond the mandatory requirements.

**REMINDER**: If the driver reports that an optional type combination is
supported, a test failure results in failing the conformance test.

These conformance tests focus on the mandatory configurations, but the
optional configurations still need test coverage. The optional configurations
will be tested using the parameterized shader generator and the basic
functionality tests. To prevent a combinatorial explosion, they will be more
limited in scope and will not be required to cover every test variable, but
they should at least cover the allowed types across a representative subset of
the test variables used for the basic functionality tests.

[Back to Top](#top)

## 3. Test Infrastructure

### 3.1 Test Framework
- Tests will be implemented using the DirectX ExecTest/HLK testing framework
- Parameterized test generation will be used to cover the extensive
  configuration space

### 3.2 Shader Generation
- Create a shader generator framework that can produce test shaders with
  configurable parameters
- The shader generator should be able to produce shaders for all of the tests
  above
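
One low-maintenance way to structure the generator is to keep a small number of hand-written HLSL templates and vary only a generated block of `#define`s. Passing the same values to DXC with `-D` would work equally well. The sketch below emits such a prelude, which a template could consume, for example through `[numthreads(THREADS_X, 1, 1)]`. The macro names and the `ShaderParams` fields are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical generator parameters; names are illustrative only.
struct ShaderParams {
    std::string inputType;  // HLSL element type of the input vector, e.g. "half"
    std::size_t rows;
    std::size_t cols;
    std::size_t matrixOffset;
    std::size_t matrixStride;
    bool transpose;
    std::size_t threadsX;
};

// Emit the #define block that a fixed, hand-written shader template consumes.
// Varying only this prelude keeps every generated shader easy to review.
std::string GeneratePrelude(const ShaderParams& p) {
    assert(p.rows > 0 && p.cols > 0 && p.threadsX > 0);
    std::ostringstream out;
    out << "#define INPUT_TYPE " << p.inputType << "\n"
        << "#define M_ROWS " << p.rows << "\n"
        << "#define M_COLS " << p.cols << "\n"
        << "#define MATRIX_OFFSET " << p.matrixOffset << "\n"
        << "#define MATRIX_STRIDE " << p.matrixStride << "\n"
        << "#define TRANSPOSE " << (p.transpose ? 1 : 0) << "\n"
        << "#define THREADS_X " << p.threadsX << "\n";
    return out.str();
}
```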

### 3.3 Result Validation

When implementing result validation for the DirectX Cooperative Vector feature,
use the following approaches:

- **Validate Using Reference Implementations**:
  - Create CPU reference implementations for each intrinsic that provide
    reference results for comparison

- **Define Precision Requirements by Type and Operation**:
  - Use value patterns that are exactly representable to allow bit-exact
    comparison for basic functionality and special value handling tests.
  - Use relative error thresholds for more complex operations like multi-layer
    tests.

- **Focus on Key Special Value Handling**:
  - **NaN Propagation**: Test that NaN inputs lead to NaN outputs across
    operations
  - **Infinity Handling**: Test basic infinity handling according to DirectX
    rules
  - **Basic Denormal Handling**: Test denormal input and output behavior
    according to precision requirements

This approach ensures that tests validate functional correctness while
accommodating reasonable implementation-specific variations in precision,
particularly for lower-precision formats or operations that involve multiple
calculation steps.
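
These rules can be combined into a single per-element check. The sketch below uses hypothetical names and leaves tolerance values to the caller. It applies four rules:

- A NaN reference requires a NaN result, with any payload.
- Infinities must match exactly, including sign.
- Bit-exact mode compares raw bits.
- Relative-error mode uses an absolute floor so that references near zero are not held to an impossible standard.

The sketch does not model denormal flushing, which would need its own rule once the precision requirements are settled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

enum class CompareMode { BitExact, RelativeError };

// Compare one GPU result against the CPU reference value.
bool ResultMatches(float result, float reference, CompareMode mode,
                   float relTolerance = 0.0f, float absFloor = 0.0f) {
    // NaN propagation: any NaN payload is accepted for a NaN reference.
    if (std::isnan(reference)) return std::isnan(result);
    if (std::isnan(result)) return false;
    // Infinities must match exactly, including sign.
    if (std::isinf(reference) || std::isinf(result)) return result == reference;
    if (mode == CompareMode::BitExact) {
        // Raw bit comparison; note this also distinguishes +0 from -0.
        std::uint32_t a = 0, b = 0;
        std::memcpy(&a, &result, sizeof a);
        std::memcpy(&b, &reference, sizeof b);
        return a == b;
    }
    // Relative error with an absolute floor for references near zero.
    const float diff = std::fabs(result - reference);
    return diff <= std::fmax(relTolerance * std::fabs(reference), absFloor);
}
```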

[Back to Top](#top)