These divisibility requirements come from the FP8 Tensor Cores. The simplest fix is to pad to the nearest multiple of 32, but this computation also seems too small to reach full GPU utilization, so it may be better to disable FP8 for small layers and avoid the extra overhead.
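A minimal sketch of both workarounds is below. The padding helper `run_with_row_padding` is a hypothetical name for illustration, not a Transformer Engine API, and the second option assumes a nested `fp8_autocast(enabled=False)` context is honored for the wrapped layers:

```python
import torch
import torch.nn.functional as F
import transformer_engine.pytorch as te

# Option 1: pad the row (height) dimension up to the next multiple of 32 before
# the FP8 GEMM, then strip the padding from the output. Hypothetical helper.
def run_with_row_padding(layer, x, multiple=32):
    rows = x.shape[0]
    pad = (-rows) % multiple           # extra rows needed to reach the next multiple
    if pad:
        x = F.pad(x, (0, 0, 0, pad))   # zero-pad rows at the bottom of the 2D input
    out = layer(x)
    return out[:rows]                  # discard outputs for the padded rows

# Option 2: fall back to higher precision for the small block by nesting an
# fp8_autocast with enabled=False around it.
def time_embed_no_fp8(time_embedding, t):
    with te.fp8_autocast(enabled=False):
        return time_embedding(t)
```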
When I use xx, the Linear layers can be converted to te.Linear because the feature sizes are multiples of 8 and 16, as follows:
(time_embedding): Sequential(
  (0): Linear(in_features=256, out_features=5120, bias=True)
  (1): SiLU()
  (2): Linear(in_features=5120, out_features=5120, bias=True)
)
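For reference, a minimal sketch of what that conversion looks like, assuming te.Linear is used as a drop-in replacement here (copying the weights from the original nn.Linear modules is omitted):

```python
import torch
import transformer_engine.pytorch as te

# Rebuild the time_embedding block with Transformer Engine layers; both
# feature sizes (256 and 5120) satisfy the multiple-of-8/16 weight constraints.
time_embedding = torch.nn.Sequential(
    te.Linear(256, 5120, bias=True),
    torch.nn.SiLU(),
    te.Linear(5120, 5120, bias=True),
)
```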
But when I feed in the data, the input is a 2D tensor of shape (1, 256), which triggers the following error:
AssertionError: FP8 execution requires 2D input matrices with height divisible by 8 and width divisible by 16, but got tensor with dims=[1, 256]
Is there any other solution besides disabling TE?