Have you run into the problem where the loss quickly converges to zero within two epochs, even with very large swap noise (>0.5) or dropout? Meanwhile, the transformed features do not contain any useful information. I am not sure whether this is caused by the dataset or not...
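To make the question concrete, here is a minimal sketch of swap noise as it is commonly described: each cell is replaced, with probability `p`, by the value from the same column of a randomly chosen row. The helper names `swap_noise` and `changed_fraction` are hypothetical, and this is an assumption about the setup rather than the actual training code. It may help separate a dataset issue from a model issue: if many columns are constant or nearly constant, a swap often returns the same value, so the effective corruption can be far below `p`.

```python
import random


def swap_noise(rows, p, rng):
    """Return (noisy, mask). Each cell is replaced, with probability p,
    by the value in the same column of a uniformly chosen row."""
    n_rows = len(rows)
    noisy = [list(r) for r in rows]
    mask = [[False] * len(r) for r in rows]
    for i, row in enumerate(rows):
        for j in range(len(row)):
            if rng.random() < p:
                donor = rng.randrange(n_rows)  # may pick row i itself
                noisy[i][j] = rows[donor][j]
                mask[i][j] = True
    return noisy, mask


def changed_fraction(rows, noisy):
    """Fraction of cells whose value actually differs after the noise."""
    total = sum(len(r) for r in rows)
    changed = sum(a != b for r, n in zip(rows, noisy) for a, b in zip(r, n))
    return changed / total


if __name__ == "__main__":
    # Toy data (illustrative only): the middle column is constant,
    # so swapping it never changes anything.
    data = [[i, 7, i * 10] for i in range(6)]
    noisy, _ = swap_noise(data, 0.6, random.Random(0))
    print("nominal p = 0.6, effective change rate =",
          changed_fraction(data, noisy))
```

Comparing the nominal `p` with the measured change rate on the real data would show whether the noise is actually corrupting the inputs as much as intended.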