
Getting jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED error while performing inference on gemma-2b on TPU #85

Open
adityarajsahu opened this issue Feb 7, 2025 · 6 comments

Comments

@adityarajsahu

I ran the whole Colab script on my TPU server: https://colab.research.google.com/github/google-deepmind/gemma/blob/main/colabs/sampling_tutorial.ipynb#scrollTo=tqbJ1SUcESaN

The script ran properly the first time, but when I ran it again some time later, I got the following error:

jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Error allocating device buffer: Attempting to allocate 15.65G. That was not possible. There are 11.08G free.; (0x0x0_HBM0)

Can anyone please tell me the exact cause of this error and how to fix it?

@Gopi-Uppari

Hi @adityarajsahu,

I reproduced this issue. The crash on the second run is likely caused by residual memory from the previous run not being released. This is a common issue with large models and memory-intensive processes in environments like Google Colab.
To resolve it, batch your inputs to reduce memory usage: if the input batch is too large, split it into smaller batches and process them sequentially, as sketched below. Please refer to this gist, where you will find the modified code.
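A minimal sketch of that pattern, assuming a `sampler` callable like the one built in the sampling tutorial (`sample_in_batches` and the `batch_size` / `total_generation_steps` values here are illustrative, not taken from the gist):

```python
def sample_in_batches(sampler, prompts, batch_size=2, total_generation_steps=100):
    """Run the sampler over `prompts` a few at a time to cap peak HBM usage."""
    outputs = []
    for i in range(0, len(prompts), batch_size):
        batch = prompts[i:i + batch_size]
        # Each call only materializes activations for `batch_size` prompts.
        out = sampler(input_strings=batch,
                      total_generation_steps=total_generation_steps)
        outputs.extend(out.text)
    return outputs
```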

Thank you.

@adityarajsahu
Author

Hi @Gopi-Uppari,
Thanks for the help; I will try the modified code. Apart from that, is there any way to ensure all memory is released after the process terminates?

@Gopi-Uppari

Hi @adityarajsahu,

To ensure all memory is released after the process terminates, you can restart the runtime/session in Colab.

Thank you.

@adityarajsahu
Author

Hi @Gopi-Uppari,

I am working on a GCP TPU VM, not Colab.

@Conchylicultor
Collaborator

You can call x.delete() on an array to explicitly release its memory on the TPU. Use it inside jax.tree.map to release a whole tree of arrays.

Otherwise, JAX will release the memory once there are no remaining Python references to the array.
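For example, a minimal sketch of both approaches (the `x` and `params` values here are placeholders for whatever device arrays you are holding):

```python
import jax
import jax.numpy as jnp

x = jnp.ones((1024, 1024))        # placeholder device array
params = {"w": jnp.ones((8, 8))}  # placeholder pytree of arrays

# Explicitly free a single array's device buffer; using `x` afterwards raises.
x.delete()

# Free every leaf of a pytree (e.g. model params) in one call.
jax.tree.map(lambda a: a.delete(), params)

# Otherwise, once no Python references remain, JAX frees the memory itself.
del params
```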

@Gopi-Uppari

Hi @adityarajsahu,

Could you please confirm whether this issue is resolved for you by the above comments? Please feel free to close the issue if it is.

Thank you.
