Getting jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED error while performing inference on gemma-2b on TPU #85
Comments
Hi @adityarajsahu, I reproduced this issue. The crash on the second run is likely caused by residual memory from the previous run that was never released. This is a common issue with large models and memory-intensive workloads in environments like Google Colab. Thank you.
Hi @Gopi-Uppari,
Hi @adityarajsahu, To ensure all memory is released after the process terminates, you can restart the runtime/session in Colab. Thank you.
Hi @Gopi-Uppari, I am working on a GCP TPU VM, not Colab.
You can release the device memory explicitly. Otherwise, JAX will release it once there are no remaining Python references to the array.
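A minimal sketch of the two release paths mentioned above, assuming a current JAX install (the array shapes here are arbitrary illustrations, not values from this thread):

```python
import gc
import jax.numpy as jnp

# Allocate a device buffer (on a TPU this lives in HBM).
x = jnp.ones((1024, 1024))
x.block_until_ready()

# Option 1: explicitly free the buffer backing the array.
x.delete()
print(x.is_deleted())  # the buffer is gone; using x now raises an error

# Option 2: drop every Python reference and let JAX reclaim the buffer.
y = jnp.ones((1024, 1024))
del y
gc.collect()  # reclaimed once no references remain
```

Note that explicit `delete()` frees the buffer immediately, while the reference-dropping path depends on Python's garbage collector, so lingering references (e.g. in a Colab cell's output history) can keep HBM occupied.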
Hi @adityarajsahu, Could you please confirm whether the above comments resolve this issue for you? If so, please feel free to close it. Thank you.
I ran the whole colab script on my TPU server - https://colab.research.google.com/github/google-deepmind/gemma/blob/main/colabs/sampling_tutorial.ipynb#scrollTo=tqbJ1SUcESaN
The script ran properly the first time, but when I ran it again some time later, I got the following error:
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Error allocating device buffer: Attempting to allocate 15.65G. That was not possible. There are 11.08G free.; (0x0x0_HBM0)
Can anyone please tell me the exact cause of this error and how to fix it?
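For diagnosing this, one option is to inspect device memory before loading the model, to see whether an earlier run is still holding HBM. A hedged sketch using JAX's device API (`memory_stats()` returns a dict of allocator counters on TPU/GPU backends and may return `None` on CPU; the key names shown are the common ones, not guaranteed on every backend):

```python
import jax

# Print per-device memory usage, if the backend reports it.
for dev in jax.local_devices():
    stats = dev.memory_stats()
    if stats is not None:
        print(dev, stats.get("bytes_in_use"), stats.get("bytes_limit"))
```

If `bytes_in_use` is already large before the model loads, a previous process (or stale references in the current one) is still occupying HBM, which matches the ~4.5 GB shortfall in the error above (15.65 GB requested vs. 11.08 GB free).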