Is your feature request related to a problem? Please describe.
Scale swizzling is currently entangled with GEMM and should be disentangled from it.
Describe the solution you'd like
- Add logic to the C++ tensor class to track whether the row-wise/col-wise scales are swizzled
- Add logic to the PyTorch quantized tensors to track whether the row-wise/col-wise scales are swizzled
- Add hints to the quantizer indicating whether pre-swizzling is helpful
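
To make the proposal concrete, here is a rough sketch of how the three pieces could fit together on the Python side. Every name in it (`QuantizedTensorSketch`, `QuantizerSketch`, `prefer_swizzled_scales`, `ensure_swizzled`, `placeholder_swizzle`) is hypothetical and not an existing API, and the swizzle itself is a stand-in rather than the real layout transform. Plain lists stand in for scale tensors so the sketch stays self-contained.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

Scales = List[float]  # stand-in for a real scale tensor


def placeholder_swizzle(scales: Scales) -> Scales:
    # Hypothetical stand-in: the real swizzle is a hardware-specific
    # layout change, not a reversal.
    return list(reversed(scales))


@dataclass(frozen=True)
class QuantizedTensorSketch:
    """Hypothetical tensor that records the layout of each scale buffer."""
    rowwise_scale_inv: Optional[Scales] = None
    columnwise_scale_inv: Optional[Scales] = None
    rowwise_scale_swizzled: bool = False
    columnwise_scale_swizzled: bool = False


def ensure_swizzled(
    t: QuantizedTensorSketch,
    swizzle: Callable[[Scales], Scales] = placeholder_swizzle,
) -> QuantizedTensorSketch:
    """Swizzle only the scale buffers that are present and not yet swizzled."""
    updates = {}
    if t.rowwise_scale_inv is not None and not t.rowwise_scale_swizzled:
        updates["rowwise_scale_inv"] = swizzle(t.rowwise_scale_inv)
        updates["rowwise_scale_swizzled"] = True
    if t.columnwise_scale_inv is not None and not t.columnwise_scale_swizzled:
        updates["columnwise_scale_inv"] = swizzle(t.columnwise_scale_inv)
        updates["columnwise_scale_swizzled"] = True
    return replace(t, **updates)


@dataclass
class QuantizerSketch:
    """Hypothetical quantizer carrying a hint that pre-swizzling is helpful."""
    prefer_swizzled_scales: bool = False

    def quantize_scales(
        self, rowwise: Scales, columnwise: Optional[Scales] = None
    ) -> QuantizedTensorSketch:
        t = QuantizedTensorSketch(rowwise, columnwise)
        # With the hint set, the swizzle happens here instead of at GEMM time.
        return ensure_swizzled(t) if self.prefer_swizzled_scales else t
```

Under these assumptions, a GEMM wrapper would call `ensure_swizzled` on its inputs: tensors that were pre-swizzled by the quantizer pass through unchanged, and the rest are swizzled on demand.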
Describe alternatives you've considered
Additional context