
Conversation

@mdchuc mdchuc commented May 30, 2024

While trying to train Vision Mamba in bidirectional mode inside a masked autoencoder network, I ran into NaN loss. Switching from mixed precision to full precision fixed the problem but significantly increased training time (nearly doubling it). Looking at the code, summing the forward and backward hidden_states/residuals roughly doubles their magnitude compared to the original hidden states (after patch embedding). Dividing the sum by 2 resolved the NaN loss and allowed mixed-precision training to continue.

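A minimal sketch of the proposed change, using dummy tensors in place of the actual per-direction block outputs (tensor names and shapes here are illustrative assumptions, not the repository's code):

```python
import torch

# Toy shapes: (batch, tokens, dim), standing in for the outputs of the
# forward and backward passes of a bidirectional Vision Mamba layer.
B, L, D = 2, 196, 192
hidden_states_f = torch.randn(B, L, D)   # forward-direction hidden states
hidden_states_b = torch.randn(B, L, D)   # backward-direction hidden states (already un-flipped)
residual_f = torch.randn(B, L, D)
residual_b = torch.randn(B, L, D)

# Current combination: a plain sum, which grows the magnitude of both
# tensors relative to the post-patch-embedding hidden states.
hidden_states_sum = hidden_states_f + hidden_states_b
residual_sum = residual_f + residual_b

# Proposed fix: average the two directions so the scale stays comparable
# to the unidirectional case, keeping fp16 activations from overflowing to NaN.
hidden_states = (hidden_states_f + hidden_states_b) / 2
residual = (residual_f + residual_b) / 2

# The averaged output is exactly half the magnitude of the summed one.
print(hidden_states_sum.abs().mean() / hidden_states.abs().mean())  # 2.0
```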