### Description
<!-- Describe your request. -->
Hi, are there plans to support bfloat16, fp8, and fp4?

### Context
<!-- What is the motivation for this request? -->
bfloat16 would help with numerical stability, and fp8 and fp4 would bring performance boosts.