How to run CaffeOnSpark with a pre-existing model? #265
Comments
Yes, you can test with an existing model. hadoop fs -rm -f /cifar10.model.h5 /cifar10_features_result
What about for LMDB on YARN? I imagine it would be similar but we would remove the lines reading and replace it with just a -test, right? Are there any other files we would need to modify? And where would we store the model (is there a need to remove it from hadoop?) and how would we tell CaffeOnSpark to read from that model instead of generating a new one? I thought we would remove the line reading Thank you!! hadoop fs -rm -r -f hdfs:///mnist_features_result
"LMDB" is a data format. To use it, you need to change "source_class" in lenet_memory_train_test.prototxt. We do not recommend LMDB for large data sets, since it is not a distributed data format. "hadoop fs -rm" deletes the file or directory; if you don't want to delete it, don't run that command. Note that the job will fail if your program writes to an existing directory, since overwriting is not allowed. Only "-train" generates a new model; "-test" and "-features" read the provided model. Don't delete the existing model if you use either "-test" or "-features", since the job won't be able to read it.
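A rough sketch of the cleanup rule described above, in case it helps: the function only prints the "hadoop fs -rm" commands for a given phase so they can be reviewed before anything is deleted. The function name cleanup_plan and the two HDFS paths are placeholders chosen for illustration, and removing the old model before "-train" is an assumption based on the no-overwrite note, not a confirmed requirement.

```shell
#!/bin/sh
# Placeholder paths for illustration; substitute your own.
MODEL="hdfs:///mnist.model"
OUTPUT="hdfs:///mnist_features_result"

# Print (do not run) the cleanup commands that are safe for a phase.
cleanup_plan() {
    case "$1" in
        train)
            # -train generates a new model, so the old model and the
            # old output directory can both be cleared (assumption).
            echo "hadoop fs -rm -r -f $MODEL"
            echo "hadoop fs -rm -r -f $OUTPUT"
            ;;
        test|features)
            # -test and -features read the model: keep it, and clear
            # only the output directory, which must not already exist.
            echo "hadoop fs -rm -r -f $OUTPUT"
            ;;
        *)
            echo "unknown phase: $1" >&2
            return 1
            ;;
    esac
}
```

Running "cleanup_plan test" prints a single command for the output directory and leaves the model alone, which matches the advice above.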
Thank you, I did all that and it runs fine now :). Thank you
CaffeOnSpark will copy the entire LMDB file to all executors, since we cannot really partition it without reading it first, as opposed to a DataFrame or SequenceFile, where you can read part of the file. Spark does partition the file afterwards, so each executor only processes its own partitions.
Hi,
I'm looking for a way to separate the training and testing phases in CaffeOnSpark. In other words, I'd like to create an MNIST Model and train it in one phase and then test it in another (and save that model for testing with different data). Is it possible to do this without interleaving the data (as is done in the wiki example)? For example, first I would train the model and generate it without testing anything. Then, I could use that existing model (without training a new one on the same training data all over again) on multiple different test datasets.
Is there a way to do this? Additionally, regardless of the separation of the phases, is there a way to use an existing/trained CaffeOnSpark model on new data (instead of creating an entirely new model on training data each time you wish to run it)? How could I do this/what commands do I need to modify?
Thanks!
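For anyone landing here with the same question, a minimal sketch of the two-phase flow might look like the following. It only prints the CaffeOnSpark application arguments that would follow the jar path in a spark-submit command. The helper name caffe_args, the paths, and the output directory names are hypothetical. The "-train" and "-test" switches come from this thread, while "-conf", "-model" and "-output" are assumed from the project's getting-started examples, so please check them against your version. The sketch also assumes that the test data location comes from the data layer in the network prototxt, so a new test set means pointing that layer at the new data.

```shell
#!/bin/sh
# Placeholder values for illustration; substitute your own.
MODEL="hdfs:///mnist.model"
SOLVER="lenet_memory_solver.prototxt"

# Print the application arguments for one phase.
#   $1 = phase (train or test)
#   $2 = output directory (test only; it must not exist yet)
caffe_args() {
    case "$1" in
        train)
            # Train once; the resulting model is written to $MODEL.
            echo "-train -conf $SOLVER -model $MODEL"
            ;;
        test)
            # Reuse the existing model; nothing is retrained.
            echo "-test -conf $SOLVER -model $MODEL -output $2"
            ;;
        *)
            echo "unknown phase: $1" >&2
            return 1
            ;;
    esac
}

# Phase 1: training only.
caffe_args train
# Phase 2: one run per test set, each with a fresh output directory.
caffe_args test hdfs:///mnist_test_result_a
caffe_args test hdfs:///mnist_test_result_b
```

Keeping the model path fixed and varying only the output directory is what lets the same trained model be reused across several test runs.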