
OOM during inference with HJB optimization #119

@Bingyin-Pixocial

Description


Thank you for the great work!

May I ask what GPU configuration is required for inference with HJB optimization, and whether there are any constraints on the reference image or target video? I hit an OOM error when running command_op_infer.sh on an H100 (80 GB), even with PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True already set. Could you please advise how to resolve this? Thank you very much!

<PROJECT_PATH>/animation/helper/backbones/iresnet.py:149: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(self.fp16):
  4%|█████                                                                                                                        | 1/25 [00:28<11:14, 28.08s/it] current iteration: 0
  4%|█████                                                                                                                        | 1/25 [00:36<14:30, 36.26s/it]

Traceback (most recent call last):
  File "<PROJECT_PATH>/inference_op.py", line 471, in <module>
    video_frames = pipeline(
  File "<PROJECT_PATH>/animation/pipelines/inference_pipeline_animation_pro.py", line 692, in __call__
    latents = self.scheduler.step(
  File "<PROJECT_PATH>/animation/pipelines/euler_discrete_pro.py", line 702, in step
    pred_frames = decode_latents_scheduler_new(
        latents=z0,
        num_frames=num_frames,
        decode_chunk_size=decode_chunk_size,
        vae=vae,
        device=device
    )
  File "<PROJECT_PATH>/animation/pipelines/euler_discrete_pro.py", line 107, in decode_latents_scheduler_new
    frame = vae.decode(latents[i: i + decode_chunk_size], **decode_kwargs).sample
  File "<ENV_PATH>/lib/python3.10/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "<PROJECT_PATH>/animation/modules/refined_vae.py", line 355, in decode
    decoded = self.decoder(
        z,
        num_frames=num_frames,
        image_only_indicator=image_only_indicator
    )
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "<PROJECT_PATH>/animation/modules/refined_vae.py", line 107, in forward
    sample = torch.utils.checkpoint.checkpoint(
        custom_forward,
        *inputs
    )
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
    return disable_fn(*args, **kwargs)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
    return fn(*args, **kwargs)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
    ret = function(*args, **kwargs)
  File "<PROJECT_PATH>/animation/modules/refined_vae.py", line 91, in custom_forward
    return module(*inputs)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "<ENV_PATH>/lib/python3.10/site-packages/diffusers/models/unets/unet_3d_blocks.py", line 1000, in forward
    hidden_states = resnet(hidden_states, temb)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "<ENV_PATH>/lib/python3.10/site-packages/diffusers/models/resnet.py", line 693, in forward
    hidden_states = self.spatial_res_block(hidden_states, temb)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "<ENV_PATH>/lib/python3.10/site-packages/diffusers/models/resnet.py", line 327, in forward
    hidden_states = self.norm1(hidden_states)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 313, in forward
    return F.group_norm(input, self.num_groups, self.weight, self.bias, self.eps)
  File "<ENV_PATH>/lib/python3.10/site-packages/torch/nn/functional.py", line 2955, in group_norm
    return torch.group_norm(input, self.num_groups, self.weight, self.bias, self.eps)

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU 0 has a total capacity of 79.19 GiB, of which 913.06 MiB is free. Including non-PyTorch memory, this process has 78.29 GiB in use. Of that, 77.22 GiB is allocated by PyTorch and 374.74 MiB is reserved but unallocated.  
If reserved but unallocated memory is large, try setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` to avoid fragmentation.  
See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
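For context on where the allocation fails: the traceback shows the OOM occurs inside `decode_latents_scheduler_new`, which decodes `decode_chunk_size` latent frames per `vae.decode()` call, so peak decoder memory scales with the chunk size rather than the total frame count. Lowering `decode_chunk_size` (even to 1) would be one thing to try, trading speed for memory. A minimal sketch of that chunking pattern (the helper name, toy values, and the stand-in `decode_fn` here are illustrative, not the repo's actual API):

```python
def decode_in_chunks(latents, decode_chunk_size, decode_fn):
    """Decode latent frames chunk by chunk, mirroring the loop structure
    in decode_latents_scheduler_new: only `decode_chunk_size` frames are
    live in the decoder at any one time."""
    frames = []
    for i in range(0, len(latents), decode_chunk_size):
        # Slice at most decode_chunk_size frames; this bounds peak memory.
        chunk = latents[i : i + decode_chunk_size]
        frames.extend(decode_fn(chunk))
    return frames

# Toy "decoder" that doubles each value, standing in for vae.decode().
decoded = decode_in_chunks(
    list(range(8)), decode_chunk_size=2, decode_fn=lambda c: [2 * x for x in c]
)
# decoded == [0, 2, 4, 6, 8, 10, 12, 14], identical for any chunk size.
```

With a smaller chunk size the output is unchanged; only the number of `decode_fn` calls (and hence the per-call activation footprint) differs.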
