Commit 40f5470

Clarify matrix/bias alignment and size restrictions, rename reducesumaccumulate to vectoraccumulate
1 parent 1cba954 commit 40f5470

1 file changed, with 21 additions and 29 deletions.

proposals/0029-cooperative-vector.md

+21-29
@@ -122,7 +122,7 @@ specification we add four operations:
122122
* **Vector-Vector Outer Product and Accumulate:** Compute the outerproduct of
123123
two vectors and accumulate the result matrix atomically-elementwise in
124124
memory.
125-
* **Reduce and Accumulate:** Accumulate elements of a vector
125+
* **Vector Accumulate:** Accumulate elements of a vector
126126
atomically-elementwise to corresponding elements in memory.
127127

128128

@@ -218,8 +218,10 @@ For optimal layouts, **matrix stride** is ignored.
218218

219219
Only non-packed interpretations are valid for matrices.
220220

221-
The base address of **matrix resource** and **matrix offset** must be 64 byte
222-
aligned.
221+
The base address of **matrix resource** and **matrix offset** must be 128 byte
222+
aligned. Also note that the size of the underlying allocation is guaranteed to
223+
be a multiple of 16 bytes ensuring that the 16 bytes access of the last
224+
row/column of the matrix is valid memory.
223225

224226
The **matrix stride** is 16 byte aligned.
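
The alignment rules in this hunk can be checked on the host before a matrix is bound. The following is a minimal sketch, not part of the specification: `IsAligned` and `IsValidMatrixPlacement` are hypothetical helper names, and it assumes the rule means that the resource base address and the matrix offset must each be a multiple of 128 bytes, with the stride a multiple of 16 bytes.

```cpp
#include <cstdint>

// Hypothetical helper (not a D3D12 API): true when `value` is a multiple of
// `alignment`. `alignment` must be a power of two.
constexpr bool IsAligned(std::uint64_t value, std::uint64_t alignment) {
    return (value & (alignment - 1)) == 0;
}

// Values taken from the revised spec text in this commit.
constexpr std::uint64_t kMatrixAlignment = 128; // base address and matrix offset
constexpr std::uint64_t kStrideAlignment = 16;  // matrix stride

// Hypothetical validation routine. Assumption: base address and offset are
// checked independently, which also makes their sum 128-byte aligned.
constexpr bool IsValidMatrixPlacement(std::uint64_t baseAddress,
                                      std::uint64_t matrixOffset,
                                      std::uint64_t matrixStride) {
    return IsAligned(baseAddress, kMatrixAlignment) &&
           IsAligned(matrixOffset, kMatrixAlignment) &&
           IsAligned(matrixStride, kStrideAlignment);
}
```

Under these assumptions, an offset of 64 bytes satisfied the previous 64-byte rule but is rejected by the 128-byte rule introduced here.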
225227

@@ -300,8 +302,10 @@ resource**, with **matrix offset**, **matrix stride**, **matrix
300302
interpretation** and **matrix layout** behaving as described [above]
301303
(#matrix-vector-multiply-and-multiply-add-operations).
302304

303-
The base address of **matrix resource** and **matrix offset** must be 64 byte
304-
aligned.
305+
The base address of **matrix resource** and **matrix offset** must be 128 byte
306+
aligned. Also note that the size of the underlying allocation is guaranteed to
307+
be a multiple of 16 bytes ensuring that the 16 bytes access of the last
308+
row/column of the matrix is valid memory
305309

306310
The **matrix stride** is 16 byte aligned.
307311

@@ -318,12 +322,12 @@ guaranteed to be supported on all implementations can be found in
318322
`I8`, `F8_E4M3`, `F8_E5M2`,
319323

320324

321-
### Reduce Sum Accumulate
325+
### Vector Accumulate
322326

323327
#### Syntax
324328

325329
``` llvm
326-
declare void @dx.op.vecreducesumacc.v[NUM][TY](
330+
declare void @dx.op.vectoraccumulate.v[NUM][TY](
327331
immarg i32, ; opcode
328332
<[NUM] x [TY]>, ; input vector
329333
%dx.types.Handle, ; output array resource
@@ -666,7 +670,7 @@ typedef struct D3D12_COOPERATIVE_VECTOR_PROPERTIES_INFERENCE
666670
BOOL TransposeSupported;
667671
};
668672
669-
// Used for OuterProductAccumulate and ReduceSumAccumulate intrinsics
673+
// Used for OuterProductAccumulate and VectorAccumulate intrinsics
670674
typedef struct D3D12_COOPERATIVE_VECTOR_PROPERTIES_TRAINING
671675
{
672676
D3D12_COOPERATIVE_VECTOR_DATATYPE InputType;
@@ -679,8 +683,8 @@ typedef struct D3D12_FEATURE_DATA_COOPERATIVE_VECTOR
679683
Out D3D12_COOPERATIVE_VECTOR_PROPERTIES_INFERENCE* pMatrixVectorMulAddProperties;
680684
InOut UINT OuterProductAccPropCount;
681685
Out D3D12_COOPERATIVE_VECTOR_PROPERTIES_TRAINING* pOuterProductAccProperties;
682-
InOut UINT ReduceSumAccPropCount;
683-
Out D3D12_COOPERATIVE_VECTOR_PROPERTIES_TRAINING* pReduceSumAccProperties;
686+
InOut UINT VectorAccumulatePropCount;
687+
Out D3D12_COOPERATIVE_VECTOR_PROPERTIES_TRAINING* pVectorAccumulateProperties;
684688
};
685689
686690
```
@@ -705,10 +709,10 @@ the operation fails and `E_INVALIDARG` is returned.
705709

706710
**D3D12_COOPERATIVE_VECTOR_TIER_1_0**: Device supports *MatrixVectorMul*
707711
and *MatrixVectorMulAdd* intrinsics. `OuterProductAccPropCount` and
708-
`ReduceSumAccPropCount` are 0 in this case.
712+
`VectorAccumulatePropCount` are 0 in this case.
709713

710714
**D3D12_COOPERATIVE_VECTOR_TIER_1_1**: Device supports previous
711-
tiers, *OuterProductAccumulate* and *ReduceSumAccumulate* functions.
715+
tiers, *OuterProductAccumulate* and *VectorAccumulate* functions.
712716

713717
#### Minimum Support Set
714718

@@ -739,7 +743,7 @@ explicitly checked for the combinations below.
739743
| FP16 | FP16 |
740744
| FP16 | FP32 |
741745

742-
##### For ReduceSumAccumulate
746+
##### For VectorAccumulate
743747

744748
| InputType | AccumulationType |
745749
|-----------|------------------|
@@ -811,7 +815,8 @@ the inputs required to calculate the necessary size. The same descriptor,
811815
updated with the calculated output size, is then passed to the conversion
812816
API.
813817

814-
The `DestStride` must be a multiple of 16 bytes.
818+
The `DestSize` and `DestStride` must be a multiple of 16 bytes. The `DestVA`
819+
must be 128B aligned.
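
One way to satisfy the size and stride rules is to round values up to the required multiple when the destination is laid out. This is a sketch under stated assumptions: `AlignUp` and `PaddedRowStride` are hypothetical helpers, and the stride computation assumes a row-major layout in which a row occupies `columns * elementSize` bytes before padding. For optimal layouts the size comes from the size-query path described above, not from this arithmetic.

```cpp
#include <cstdint>

// Hypothetical helper (not a D3D12 API): rounds `value` up to the next
// multiple of `alignment`. `alignment` must be a power of two.
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Assumption: row-major layout, one row = columns * elementSize bytes.
// The result is padded so that it is a multiple of 16 bytes.
constexpr std::uint64_t PaddedRowStride(std::uint64_t columns,
                                        std::uint64_t elementSize) {
    return AlignUp(columns * elementSize, 16);
}
```

For example, 10 columns of 2-byte elements occupy 20 bytes per row, which this sketch pads to a 32-byte stride. The same `AlignUp` with an alignment of 128 can be used to pick a suitably aligned destination address inside a larger buffer.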
815820

816821
```c++
817822

@@ -987,22 +992,9 @@ Various combinations of enums for specifying interpretations were considered
987992
with varying trade-offs of complexity versus typesafety and simplicity, before
988993
deciding to extend the existing `ComponentType` enum.
989994

990-
## Open Issues
991-
992-
* Q: Type interpretations to use HLSL conversion rules of ML best practices?
993-
* A: This spec uses the ML best practices like the SpirV spec. // TODO: get
994-
approval
995-
* Q: More details on formats and their precision requirements
996-
* A: Implementation Dependent
997-
* Q: How do you handle cases where different implementations may not produce bit
998-
identical results?
999-
* A: Some combination of exactly representable results/ epsilon ranges.
1000-
* Q: Using MatrixView and VectorView as a wrapper for the BAB containing the
1001-
matrix/bias vectors and their corresponding interpretations.
1002-
1003995
## Acknowledgments
1004996

1005-
We would like to thank Jeff Bolz, Yury Uralsky and Patrick Neill for their
1006-
contributions to this specification.
997+
We would like to thank Jeff Bolz, Yury Uralsky, Patrick Neill, Tex Riddell and
998+
Amar Patel for their contributions to this specification.
1007999

10081000
<!-- {% endraw %} -->
