benchmarks_51849.out
ERROR: Unable to locate a modulefile for 'cuda/11.7'
1.13.1
[2023-11-20 18:43:27,785] [INFO] [distributed.py:36:init_distributed] Not using the DeepSpeed or torch.distributed launchers, attempting to detect MPI environment...
[2023-11-20 18:43:28,251] [INFO] [distributed.py:83:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=26.0.144.205, master_port=6000
[2023-11-20 18:43:28,251] [INFO] [distributed.py:46:init_distributed] Initializing torch distributed with backend: nccl
[2023-11-20 18:43:29,480] [INFO] [checkpointing.py:223:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
num_attention_heads: 128, hidden_size: 32768, train_micro_batch_size_per_gpu: 4, tensor_mp_size: 1, pipeline_mp_size: 1, dp_size: 1
Actual
------
Traceback (most recent call last):
  File "/fsx/home-jacob/TransformerSizing/torch_transformer_flops.py", line 490, in <module>
    benchmark_transformer(configuration, seq_length, train_batch_size)
  File "/fsx/home-jacob/TransformerSizing/torch_transformer_flops.py", line 401, in benchmark_transformer
    transformer_layer = ParallelTransformerLayer(args,attention_mask_func=attention_mask_func,init_method=init_method,output_layer_init_method=init_method,layer_number=0).half().to("cuda:0")
  File "/fsx/home-jacob/TransformerSizing/megatron/model/transformer.py", line 823, in __init__
    self.mlp = ParallelMLP(
  File "/fsx/home-jacob/TransformerSizing/megatron/model/transformer.py", line 98, in __init__
    self.dense_h_to_4h = mpu.ColumnParallelLinear(
  File "/fsx/home-jacob/TransformerSizing/megatron/mpu/layers.py", line 458, in __init__
    torch.empty(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 0; 39.56 GiB total capacity; 32.00 GiB already allocated; 6.68 GiB free; 32.00 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
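
The failed 8.00 GiB allocation is consistent with the fp16 weight of the dense_h_to_4h projection at this configuration. A minimal sketch of the arithmetic, assuming the standard 4x MLP hidden expansion and no tensor parallelism (tensor_mp_size=1), which are not confirmed by the log itself:

# Rough check of the 8.00 GiB allocation (assumptions: 4x MLP expansion, fp16 after .half(), tensor_mp_size=1)
hidden_size = 32768
ffn_hidden_size = 4 * hidden_size                # 131072
bytes_per_param = 2                              # fp16
weight_bytes = hidden_size * ffn_hidden_size * bytes_per_param
print(weight_bytes / 2**30, "GiB")               # -> 8.0 GiB, matching the failed torch.empty allocation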