Migrating from AffineQuantizedTensor + Layouts to new structure of tensor subclasses

Update: Our team will evaluate this further before opening up the migration to more people in the community.

Context:
Previously we used AffineQuantizedTensor for many of our use cases, including int4, float8, intx, and floatx. It introduces some complicated abstractions such as Layout; people have told us it is hard to understand and that the code has many indirections.

In an effort to simplify the code base and make it easier to contribute to, we have been adding new features with a different structure in mind. We now want to structure tensors by "dtype" and "packing_format": for example, we'll have Int4PreshuffledTensor, Int8Tensor, and Float8Tensor instead of AffineQuantizedTensor with multiple layouts.
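
To make the contrast concrete, here is a minimal, torch-free sketch of the two organizations. Every class here (`ToyLayout`, `ToyAffineQuantizedTensor`, `ToyInt8Tensor`) is a hypothetical toy written for illustration, not torchao's actual implementation; the real subclasses live in the workflows directory linked below.

```python
from dataclasses import dataclass
from typing import List


# --- Old organization (toy): one generic tensor class + a Layout object ---

@dataclass
class ToyLayout:
    """Hypothetical stand-in for a Layout: describes how data is packed."""
    name: str


@dataclass
class ToyAffineQuantizedTensor:
    """Hypothetical generic class: what it is depends on dtype + layout."""
    data: List[int]
    scale: float
    dtype: str
    layout: ToyLayout

    def dequantize(self) -> List[float]:
        # A real implementation would dispatch on self.layout here,
        # which is the extra indirection the issue refers to.
        return [q * self.scale for q in self.data]


# --- New organization (toy): one class per dtype / packing format ---

@dataclass
class ToyInt8Tensor:
    """Hypothetical class whose name already says dtype and packing."""
    qdata: List[int]
    scale: float

    @classmethod
    def from_hp(cls, values: List[float]) -> "ToyInt8Tensor":
        # Symmetric per-tensor int8 quantization of high-precision values.
        max_abs = max(abs(v) for v in values) or 1.0
        scale = max_abs / 127.0
        qdata = [max(-128, min(127, round(v / scale))) for v in values]
        return cls(qdata, scale)

    def dequantize(self) -> List[float]:
        return [q * self.scale for q in self.qdata]


if __name__ == "__main__":
    values = [1.0, -0.5, 0.25]
    old = ToyAffineQuantizedTensor([127, -64, 32], 1.0 / 127.0, "int8", ToyLayout("plain"))
    new = ToyInt8Tensor.from_hp(values)
    print(type(old).__name__, old.layout.name, old.dequantize())
    print(type(new).__name__, new.dequantize())
```

In the toy version, the old style needs a reader to inspect both `dtype` and `layout` to know what they are holding, while the new style carries that information in the class name itself.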

Please check out our updated docs for the new tensor subclass organization structure and guide for design:
* quantization overview: https://docs-preview.pytorch.org/pytorch/ao/2723/quantization_overview.html
* contributor guide: https://docs-preview.pytorch.org/pytorch/ao/2723/contributor_guide.html
* Examples of tensor subclasses following new design: https://github.com/pytorch/ao/tree/main/torchao/quantization/quantize_/workflows

## migration status

| inference config name | current status | plan | POC | status |
| ----------------------------------------- | ---- |  --- | --- | --- |
| `MXFPInferenceConfig` | built on v2 | n/a | - | done |
| `NVFP4InferenceConfig` | built on v2 | n/a | - | done |
| `Float8DynamicActivationInt4WeightConfig` | built on v2 | n/a | - | done |
| `Int4WeightOnlyConfig` | v2 and v1 exist | deprecate v1 | ? | ? |
| `Int8DynamicActivationIntxWeightConfig` | v2 and v1 exist | deprecate v1 | ? | ? |
| `Float8WeightOnlyConfig` | v2 and v1 exist | deprecate v1 | ? | ? |
| `Float8DynamicActivationFloat8WeightConfig` | v2 and v1 exist | deprecate v1 | ? | ? |
| `IntxWeightOnlyConfig` | v2 and v1 exist | deprecate v1 | ? | ? |
| `Float8DynamicActivationFloat8SemiSparseWeightConfig` | v1 exists | create v2, then deprecate v1 | ? | ? |
| `Int8WeightOnlyConfig` | v1 exists | create v2, then deprecate v1 | ? | ? |
| `Int8DynamicActivationInt8WeightConfig` | v1 exists | create v2, then deprecate v1 | ? | ? |
| `Int8DynamicActivationInt4WeightConfig` | v1 exists | move to prototype | ? | ? |
| `Int4DynamicActivationInt4WeightConfig` | v1 exists | move to prototype | ? | ? |
| `GemliteUIntXWeightOnlyConfig` | v1 exists | move to prototype | ? | ? |
| `Float8StaticActivationFloat8WeightConfig` | v1 exists | move to prototype | ? | ? |
| `UIntXWeightOnlyConfig` | v1 exists | move to prototype | ? | ? |
| `FPXWeightOnlyConfig` | v1 exists | move to prototype | ? | ? |

## appendix

List of things to migrate:
INT8
* [x] [move to prototype] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/block_sparse_layout.py @jainapurva https://github.com/pytorch/ao/pull/3276
* [ ] [migrate] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/plain_layout.py @namgyu-youn https://github.com/pytorch/ao/pull/3241
* [ ] [move to prototype] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/semi_sparse_layout.py @namgyu-youn  https://github.com/pytorch/ao/pull/3258 (no need to migrate to new tensor structure)


[migration done, TODO: delete old path after all migration is done] INT4 weight only
* [x] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/int4_cpu_layout.py @Xia-Weiwen  https://github.com/pytorch/ao/blob/main/torchao/quantization/quantize_/workflows/int4/int4_opaque_tensor.py
* [x] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/int4_xpu_layout.py @liangan1 https://github.com/pytorch/ao/pull/2845
* [x] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/marlin_sparse_layout.py @liangel-02 https://github.com/pytorch/ao/pull/2771
* [x] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/tensor_core_tiled_layout.py @jerryzh168  https://github.com/pytorch/ao/pull/2791
* [x] HQQ support for tensor core tiled layout @jerryzh168 https://github.com/pytorch/ao/pull/2912/

[move to prototype] INT4 weight + int8 activation
* [x] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/cutlass_int4_packed_layout.py @jainapurva https://github.com/pytorch/ao/pull/3277
* [ ] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/dyn_int8_act_int4_wei_cpu_layout.py @jainapurva https://github.com/pytorch/ao/pull/3299
* [ ] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/marlin_qqq_tensor.py @jainapurva 


UINTx Weight Only
* [ ] [move to prototype] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/gemlite_layout.py
* [ ] [move to prototype] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/uintx_layout.py

[migration done, TODO: delete old path after all migration is done] Int8DynamicActivationIntxWeightConfig
* [x] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/packed_linear_int8_dynamic_activation_intx_weight_layout.py @metascroy https://github.com/pytorch/ao/pull/2742
* [x] https://github.com/pytorch/ao/blob/main/torchao/dtypes/uintx/q_dq_layout.py @metascroy https://github.com/pytorch/ao/pull/2732

FP8
* [ ] [migrate] https://github.com/pytorch/ao/blob/main/torchao/dtypes/floatx/cutlass_semi_sparse_layout.py @namgyu-youn  https://github.com/pytorch/ao/pull/3258 and @bbeckca #3182 

FPx
* [ ] [move to prototype] https://github.com/pytorch/ao/blob/main/torchao/dtypes/floatx/floatx_tensor_core_layout.py
