CUDA: out of memory running test #3

@CrhistyanSilva

Description

Hi! I'm trying to run the test for the CLIC.mobile dataset, but it fails with "RuntimeError: CUDA out of memory." Is there any option or change that would help with this? Here is the command I'm running:

# Note: depending on your environment, adapt CUDA_VISIBLE_DEVICES.
CUDA_VISIBLE_DEVICES=0 python -u run_test.py \
    "$MODELS_DIR" 1109_1715 \
    "AUTOEXPAND:$DATASET_DIR/mobile_valid" \
    --restore_itr 1000000 \
    --tau \
    --clf_p "$MODELS_DIR/1115_1729*/ckpts/*.pt" \
    --qstrategy CLF_ONLY

I was able to run on Open Images 500 without problems; maybe my GPU doesn't have enough memory for this other test? The GPU is a GeForce RTX 2070 with Max-Q Design with 8192 MiB of memory. OS: Linux Mint 2021.

Here is the full output:

WRAN: using Agg backend linux
*** AC_NEEDS_CROP_DIM = 3000000
*** AUTOEXPAND ->
9: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q9
10: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q10
11: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q11
12: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q12
13: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q13
14: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q14
15: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q15
16: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q16
17: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q17
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00040
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q9/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00030
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q9_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q10/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q10_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q11/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q11_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00030
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q12/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q12_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00030
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q13/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q13_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q14/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00027
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q14_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q15/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q15_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q16/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q16_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q17/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q17_None_dS=False_professional_valid_None_dS=False)...
Starting Q 13
Got 1 datasets.
Testing 1109_1715 at -1 ---
Got keep test.
After filter:
GlobalConfig()
*** global_config fw_s=3
*** global_config long_means
*** global_config long_pis
*** global_config long_sigma
*** global_config gdn
*** global_config no_norm_final
*** global_config lr.initial=5e-05
*** global_config down_up=deconv
Updating config.lr.initial = 5e-05
Using global_config: GlobalConfig(
	down_up=deconv
	fw_s=3
	gdn
	long_means
	long_pis
	long_sigma
	lr.initial=5e-05
	no_norm_final
	unet_skip)
*** no norm for final
*** DownUp, adding DeconvUp()
filter_width for sigma = 3
Did set tail_networks.sigmas
Did set tail_networks.means
Did set tail_networks.pis
Setting tail_networks[ dict_keys(['sigmas', 'means', 'pis']) ]
EB: self.cin_style = None
******************************
*** Padding by a factor 2
******************************
*** Setting classifier...
Using classifier with config configs/ms/clf/down2_nonorm_down.cf
ClassifierNetwork(
  (head): Sequential(
    (0): Conv2d(3, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
    (1): IdentityModule()
    (2): ReLU(inplace)
    (3): Conv2d(64, 128, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
    (4): IdentityModule()
    (5): ReLU(inplace)
  )
  (model): Sequential(
    (0): ResBlock(Conv(128x3)/N(128)/ReLU(inplace)/Conv(128x3)/N(128))
    (1): ResBlock(Conv(128x3)/N(128)/ReLU(inplace)/Conv(128x3)/N(128))
    (2): ResBlock(Conv(128x3)/N(128)/ReLU(inplace)/Conv(128x3)/N(128))
    (3): ResBlock(Conv(128x3)/N(128)/ReLU(inplace)/Conv(128x3)/N(128))
    (4): Conv2d(128, 256, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
    (5): ResBlock(Conv(256x3)/N(256)/ReLU(inplace)/Conv(256x3)/N(256))
    (6): ResBlock(Conv(256x3)/N(256)/ReLU(inplace)/Conv(256x3)/N(256))
    (7): ResBlock(Conv(256x3)/N(256)/ReLU(inplace)/Conv(256x3)/N(256))
    (8): ResBlock(Conv(256x3)/N(256)/ReLU(inplace)/Conv(256x3)/N(256))
    (9): ChannelAverage()
  )
  (tail): Sequential(
    (0): Linear(in_features=256, out_features=7, bias=True)
  )
)
Restoring /home/crhistyan/proyecto_grado/bpg/models/1115_1729 clf@down2_nonorm_down clf@model1715 exp_min=6.25e-06 lr.initial=0.0001 lr.schedule=exp_0.25_i50000 n_resblock=4/ckpts/ckpt_0000106000.pt
Loaded!
*** Enabling QSTRATEGY=CLF_ONLY...
*** Ignoring 0 ckpts after 1612829659.0823963
Restoring /home/crhistyan/proyecto_grado/bpg/models/1109_1715 gdn_wide_deep3 new_oi_q12_14_128 unet_skip/ckpts/ckpt_0000998500.pt... (strict=True)
Testing <dataloaders.compressed_images_loader.MetaResidualDataset object at 0x7f9e40b37690>
*** MetaResidualDataset professional_valid_None_dS=False_m2_multi_q9_10_11_12_13_14_15_16_17
Traceback (most recent call last):
  File "run_test.py", line 303, in <module>
    main()
  File "run_test.py", line 84, in main
    results += tester.test_all(datasets)
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 309, in test_all
    return self._get_results(datasets, self.test)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/fjcommon/functools_ext.py", line 32, in composed
    return f1(f2(*args_c, **kwargs_c))
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 291, in _get_results
    results = [fn(ds) for ds in datasets]
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 291, in <listcomp>
    results = [fn(ds) for ds in datasets]
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 333, in test
    results = self._test(ds)
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 420, in _test
    out = self.blueprint.forward(x_n_crop, bpps)  # Note: bpps only used for conditional IN!
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/blueprints/enhancement_blueprint.py", line 236, in forward
    network_out: prob_clf.NetworkOutput = self.net(x_l, side_information)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/modules_enh/enhancement_network.py", line 288, in forward
    x = self.unet_skip_conv(torch.cat((x, x_after_head), dim=1))
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 1.33 GiB (GPU 0; 7.80 GiB total capacity; 5.56 GiB already allocated; 1.14 GiB free; 17.20 MiB cached)
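For reference, here is a rough back-of-the-envelope estimate of the activation memory involved. All shapes below are hypothetical (CLIC.mobile images are roughly 2000x1500 px, and I'm assuming 128-channel float32 feature maps near full resolution), but it suggests why full-resolution CLIC images can exhaust 8 GiB where smaller Open Images crops do not:

```python
# Rough estimate of activation memory for one conv feature map.
# All shapes are hypothetical illustration values, not taken from the model.

def feature_map_bytes(batch, channels, height, width, dtype_bytes=4):
    """Memory of a single float32 activation tensor, in bytes."""
    return batch * channels * height * width * dtype_bytes

GIB = 1024 ** 3

# One 128-channel float32 map for a ~2000x1500 image:
single = feature_map_bytes(1, 128, 1500, 2000)
print(f"one activation: {single / GIB:.2f} GiB")   # ~1.43 GiB

# The torch.cat((x, x_after_head), dim=1) in the traceback doubles
# the channel count, so the concatenated tensor alone would need:
concat = feature_map_bytes(1, 256, 1500, 2000)
print(f"concatenated:   {concat / GIB:.2f} GiB")   # ~2.86 GiB
```

A single such tensor is already close to the 1.33 GiB allocation the error reports, and the network holds several of them at once, so 8 GiB fills up quickly at full resolution.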

Thanks!!
