
1 GPU vs multi-GPU: good convergence, but no results when testing a single image #1602

Closed
RSly opened this issue Apr 26, 2017 · 3 comments

Comments


RSly commented Apr 26, 2017

Hi,

I have a question; maybe it is a bug, or maybe I am missing something:

Case 1: I train a model using the latest DIGITS + NV-Caffe with batch_size=1 on 1 GPU.
=> The curves look great, the loss is low, and when I test a single image the objects are correctly recognized.

Case 2: same configuration, data, and network as above, but with batch_size=3 for training, batch_size=1 for testing, running on 3 GPUs.
=> The curves look just as good and the loss is low, BUT when I test using 'Test single image' no object is recognized, even after many epochs.

I am surprised that the curves show the same low loss, yet the actual test on images returns nothing.
Any suggestions?
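
A minimal pycaffe sketch for running the multi-GPU snapshot on one image outside of DIGITS (file names, the GPU index, and the preprocessing values are only placeholders, not my actual job settings) would look roughly like this, to check whether the model itself or the 'Test single image' page is at fault:

```python
# Minimal sketch, assuming a standard pycaffe install; file names, the GPU index,
# and the preprocessing values are placeholders, not the actual job's settings.
import caffe

caffe.set_device(0)      # run inference on a single, explicit GPU
caffe.set_mode_gpu()

net = caffe.Net('deploy.prototxt',                  # placeholder deploy definition
                'snapshot_iter_1000.caffemodel',    # placeholder multi-GPU snapshot
                caffe.TEST)

# Preprocess the same way the training data was prepared.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))      # HWC -> CHW
transformer.set_channel_swap('data', (2, 1, 0))   # RGB -> BGR
transformer.set_raw_scale('data', 255)            # [0, 1] -> [0, 255]

image = caffe.io.load_image('test.jpg')           # placeholder test image
net.blobs['data'].data[...] = transformer.preprocess('data', image)
out = net.forward()
print(out)  # raw network output, bypassing DIGITS' post-processing
```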

@ontheway16

Hi, are all 3 GPUs the same?


RSly commented Apr 27, 2017

Yes, would that matter?

@ontheway16

Although I have not yet gotten an answer on GPU selection for inference in #1418, it appears that only gpu-0 is used for inference. I was just wondering whether swapping the GPUs might help, in case the boards are different.
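
As a software-side alternative to physically swapping the boards, something along these lines should let inference see a different card as gpu-0 when testing outside DIGITS (sketch only; the index is just an example):

```python
# Sketch only: remap which physical board caffe sees as gpu-0, without swapping
# the cards physically (the '1' below is just an example index).
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'   # must be set before CUDA is initialized

import caffe
caffe.set_mode_gpu()
caffe.set_device(0)   # device 0 now refers to the board selected above
```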

RSly closed this as completed Jun 8, 2017