
1 GPU vs multi-GPU: good convergence, but no results when testing a single image #1602

Closed
RSly opened this issue Apr 26, 2017 · 3 comments

Comments


RSly commented Apr 26, 2017

Hi,

I have a question; maybe it is a bug, or maybe I am missing something:

Case 1: I train a model using the latest DIGITS + NV-Caffe with batch_size=1 on 1 GPU.
=> The curves look great, the loss is low, and when I test a single image the objects are correctly recognized.

Case 2: same configuration, data, and network as above, but with batch_size=3 for training, batch_size=1 for testing, running on 3 GPUs.
=> The curves look just as good and the loss is low, BUT when I test using 'Test single image' no object is recognized, even after many epochs.

I am surprised that the curves show the same low loss, yet the actual test on images returns nothing.
Any suggestions?
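
A minimal pycaffe sketch for running the multi-GPU snapshot on one image outside of DIGITS (file names, the GPU index, and the preprocessing values are only placeholders, not my actual job settings) would look roughly like this, to check whether the model itself or the 'Test single image' page is at fault:

```python
# Minimal sketch, assuming a standard pycaffe install; file names, the GPU index,
# and the preprocessing values are placeholders, not the actual job's settings.
import caffe

caffe.set_device(0)      # run inference on a single, explicit GPU
caffe.set_mode_gpu()

net = caffe.Net('deploy.prototxt',                  # placeholder deploy definition
                'snapshot_iter_1000.caffemodel',    # placeholder multi-GPU snapshot
                caffe.TEST)

# Preprocess the same way the training data was prepared.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))      # HWC -> CHW
transformer.set_channel_swap('data', (2, 1, 0))   # RGB -> BGR
transformer.set_raw_scale('data', 255)            # [0, 1] -> [0, 255]

image = caffe.io.load_image('test.jpg')           # placeholder test image
net.blobs['data'].data[...] = transformer.preprocess('data', image)
out = net.forward()
print(out)  # raw network output, bypassing DIGITS' post-processing
```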

@ontheway16

Hi, are all 3 GPUs the same?


RSly commented Apr 27, 2017

Yes, would that matter?

@ontheway16

Although I have not yet gotten an answer on GPU selection for inference in #1418, it appears that only gpu-0 is used for inference. I was just wondering whether swapping the GPUs might help, in case the boards are different.
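
As a software-side alternative to physically swapping the boards, something along these lines should let inference see a different card as gpu-0 when testing outside DIGITS (sketch only; the index is just an example):

```python
# Sketch only: remap which physical board caffe sees as gpu-0, without swapping
# the cards physically (the '1' below is just an example index).
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'   # must be set before CUDA is initialized

import caffe
caffe.set_mode_gpu()
caffe.set_device(0)   # device 0 now refers to the board selected above
```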

RSly closed this as completed Jun 8, 2017