Sockeye is training much faster than Marian.
I ran a one-epoch training run on a small data set of 4.7M training examples with each framework. To the best of my knowledge, I used comparable training parameters for both frameworks. But the results were 21 min vs 36 min, favoring Sockeye.
What I do not know is whether this is a problem with my old setup (Ubuntu 18.04.6 and everything that follows from it, e.g. an old compiler) or something to do with Marian.
How to reproduce
A typical way of training Sockeye systems is to run a data prep step before training.
sockeye-prepare-data --source train.bpe.en --target --output . --max-seq-len 128 --shared-vocab --num-words 25000
Data prep time was not included in training time.
To measure Sockeye's training time, I used the timestamps between the start and end of training, which worked out to 21 min.
touch sockeye.start & torchrun --no_python --nproc_per_node 2 sockeye-train --prepared-data . --output models --validation-source dev.bpe.en --validation-target --max-num-epochs 1 --shared-vocab --dist --amp --update-interval 12 --batch-size 18000 --max-seq-len 128 > training.log 2>&1 & touch sockeye.end
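A note on the timing command: with a single `&`, the shell starts each command in the background, so `touch sockeye.end` does not wait for training to finish if the line is run exactly as written. A minimal sketch of a sequential alternative follows. `time_cmd` and `ELAPSED_SECONDS` are hypothetical names introduced here for illustration; they are not part of Sockeye or torchrun.

```shell
# Hypothetical helper: run a command in the foreground and record its
# wall-clock duration (whole seconds) in the global ELAPSED_SECONDS.
time_cmd() {
    local start status
    start=$(date +%s)          # seconds since the epoch, before the run
    "$@"                       # run the command and wait for it to finish
    status=$?
    ELAPSED_SECONDS=$(( $(date +%s) - start ))
    return "$status"           # pass the command's exit status through
}
```

For example, `time_cmd sleep 2; echo "$ELAPSED_SECONDS"` prints the number of seconds the wrapped command took. The training command could be wrapped the same way, with its output redirection placed after the wrapped command.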
For Marian I used
/marian-vocab --max-size 25000
marian --devices 0 1 --type transformer --model /tmp/toms/sockeye-test/model.npz --train-sets /tmp/toms/sockeye-test/train.bpe.en /tmp/toms/sockeye-test/ --vocabs en-lv-shared-vocab.yml en-lv-shared-vocab.yml --max-length 128 --max-length-factor 1.5 --mini-batch-fit --workspace 18000 --maxi-batch 2000 --early-stopping 10 --valid-freq 1000000 --save-freq 2000000 --disp-freq 100 --keep-best --overwrite --valid-metrics cross-entropy translation --valid-sets /tmp/toms/sockeye-test/dev.bpe.en /tmp/toms/sockeye-test/ --valid-script-path /tmp/toms/sockeye-test/ --log /tmp/toms/sockeye-test/train.log --valid-log /tmp/toms/sockeye-test/valid.log --seed 347155 --exponential-smoothing --normalize 0.6 --beam-size 6 --quiet-translation --valid-translation-output /tmp/toms/sockeye-test/valid.output.txt --valid-mini-batch 16 --enc-depth 6 --dec-depth 6 --transformer-heads 8 --transformer-preprocess d --transformer-postprocess-emb d --transformer-postprocess dan --optimizer-delay 12 --learn-rate 0.0005 --lr-warmup 16000 --lr-decay-inv-sqrt 16000 --lr-report --clip-norm 5 --tied-embeddings-all --sync-sgd --transformer-dropout 0.1 --transformer-dropout-attention 0.1 --transformer-dropout-ffn 0.1 --optimizer adam --optimizer-params 0.9 0.98 1e-09 --sqlite /tmp/en-lv-W69bwc2f6meuT-combined.db -e 1 --fp16
To measure Marian's training time, I used the timestamps of the log messages Training started and Training finished, which worked out to around 36 min. This was with Marian version: v1.10.24; 4dd30b5 2021-09-08 14:02:21 +0100
I also tried Marian v1.11.0 f00d062 2022-02-08 08:39:24 -0800, but it was even slower: 43 min.
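The log-based measurement can be sketched as a small script. This is only an illustration under two assumptions: that each log line begins with a timestamp of the form `[YYYY-MM-DD HH:MM:SS]`, and that GNU `date` is available. `log_minutes_between` is a hypothetical helper, not a Marian tool.

```shell
# Hypothetical helper: print the whole minutes between the first log line
# matching pattern $2 and the first log line matching pattern $3 in file $1.
# Assumes lines start with "[YYYY-MM-DD HH:MM:SS]" and GNU date is installed.
log_minutes_between() {
    local start_ts end_ts start_s end_s
    # Take the text before the first "]" and drop the leading "[".
    start_ts=$(grep -m1 -- "$2" "$1" | cut -d']' -f1 | cut -c2-)
    end_ts=$(grep -m1 -- "$3" "$1" | cut -d']' -f1 | cut -c2-)
    # Interpret both timestamps as UTC so daylight-saving shifts cannot skew the difference.
    start_s=$(date -u -d "$start_ts" +%s)
    end_s=$(date -u -d "$end_ts" +%s)
    echo $(( (end_s - start_s) / 60 ))
}
```

Called as `log_minutes_between /tmp/toms/sockeye-test/train.log 'Training started' 'Training finished'`, it would report the duration in minutes if the log matches the assumed format.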
I do realize that Marian's --workspace 18000 (a workspace memory size in MB) and Sockeye's --batch-size 18000 aren't the same thing. However, running with different --batch-size values didn't affect the time it took Sockeye to train for one epoch.
I also checked whether both frameworks saw the same number of sentences during their respective training runs. The numbers were about the same.
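Since both runs saw roughly the same number of sentences, the timings can be turned into an approximate throughput. The sketch below uses the rounded figure of 4.7M sentences from above, so the results are rough estimates only. `sentences_per_second` is a hypothetical helper name.

```shell
# Hypothetical helper: print sentences per second (rounded) for a run that
# processed $1 sentences in $2 minutes.
sentences_per_second() {
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.0f\n", n / (m * 60) }'
}

sentences_per_second 4700000 21   # Sockeye: about 3730
sentences_per_second 4700000 36   # Marian v1.10.24: about 2176
sentences_per_second 4700000 43   # Marian v1.11.0: about 1822
```

Under these assumptions, Sockeye processed roughly 1.7 times as many sentences per second as Marian v1.10.24 (36 / 21 ≈ 1.71).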
Marian version: v1.10.24; 4dd30b5 2021-09-08 14:02:21 +0100
Marian version: v1.11.0 f00d062 2022-02-08 08:39:24 -0800
CMake command:
cmake ..
-- The CXX compiler identification is GNU 7.5.0
-- The C compiler identification is GNU 7.5.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Project name: marian
-- Project version: v1.11.0+f00d0621
Submodule 'examples' ( registered for path 'examples'
Submodule 'regression-tests' ( registered for path 'regression-tests'
Submodule 'src/3rd_party/fbgemm' ( registered for path 'src/3rd_party/fbgemm'
Submodule 'src/3rd_party/intgemm' ( registered for path 'src/3rd_party/intgemm'
Submodule 'src/3rd_party/nccl' ( registered for path 'src/3rd_party/nccl'
Submodule 'src/3rd_party/sentencepiece' ( registered for path 'src/3rd_party/sentencepiece'
Submodule 'src/3rd_party/simple-websocket-server' ( registered for path 'src/3rd_party/simple-websocket-server'
Cloning into '/tmp/toms/sockeye-test/marian/examples'...
Cloning into '/tmp/toms/sockeye-test/marian/regression-tests'...
Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/fbgemm'...
Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/intgemm'...
Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/nccl'...
Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/sentencepiece'...
Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/simple-websocket-server'...
Submodule path 'examples': checked out '6d5921cc7de91f4e915b59e9c52c9a76c4e99b00'
Submodule path 'regression-tests': checked out '0716f4e012d1e3f7543bffa8aecc97ce9c903e17'
Submodule path 'src/3rd_party/fbgemm': checked out '6f45243cb8ab7d7ab921af18d313ae97144618b8'
Submodule 'third_party/asmjit' ( registered for path 'src/3rd_party/fbgemm/third_party/asmjit'
Submodule 'third_party/cpuinfo' ( registered for path 'src/3rd_party/fbgemm/third_party/cpuinfo'
Submodule 'third_party/googletest' ( registered for path 'src/3rd_party/fbgemm/third_party/googletest'
Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/fbgemm/third_party/asmjit'...
Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/fbgemm/third_party/cpuinfo'...
Cloning into '/tmp/toms/sockeye-test/marian/src/3rd_party/fbgemm/third_party/googletest'...
Submodule path 'src/3rd_party/fbgemm/third_party/asmjit': checked out '4da474ac9aa2689e88d5e40a2f37628f302d7e3c'
Submodule path 'src/3rd_party/fbgemm/third_party/cpuinfo': checked out 'd5e37adf1406cf899d7d9ec1d317c47506ccb970'
Submodule path 'src/3rd_party/fbgemm/third_party/googletest': checked out '0fc5466dbb9e623029b1ada539717d10bd45e99e'
Submodule path 'src/3rd_party/intgemm': checked out '8abde25b13c3ab210c0dec8e23f4944e3953812d'
Submodule path 'src/3rd_party/nccl': checked out '5dcf7751494f9d04057bfc6b4a2b64611bc12253'
Submodule path 'src/3rd_party/sentencepiece': checked out 'c307b874deb5ea896db8f93506e173353e66d4d3'
Submodule path 'src/3rd_party/simple-websocket-server': checked out '1d7e84aeb3f1ebdc78f6965d79ad3ca3003789fe'
CMake Warning at CMakeLists.txt:79 (message):
CMAKE_BUILD_TYPE not set; setting to Release
-- Building with -march=native and intrinsics will be chosen automatically by the compiler to match the current machine.
-- Checking support for CPU intrinsics
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: software/anaconda3/envs/sockeye3 (found suitable version "10.0", minimum required is "9.0")
-- Compiling code for Pascal GPUs
-- Compiling code for Volta GPUs
-- Compiling code for Turing GPUs
-- Found CUDA libraries: software/anaconda3/envs/sockeye3/lib64/; software/anaconda3/envs/sockeye3/lib64/; software/anaconda3/envs/sockeye3/lib64/
-- Found Tcmalloc: /usr/lib/x86_64-linux-gnu/
-- Found MKL: -Wl,--start-group;/opt/intel/mkl/lib/intel64/libmkl_intel_ilp64.a;/opt/intel/mkl/lib/intel64/libmkl_sequential.a;/opt/intel/mkl/lib/intel64/libmkl_core.a;-Wl,--end-group
CMake Warning at src/3rd_party/intgemm/CMakeLists.txt:33 (message):
Not building AVX512VNNI-based multiplication because your compiler is
too old.
For details rerun cmake with --debug-trycompile then try to build in
-- VERSION: 0.1.94
-- Found TCMalloc: /usr/lib/x86_64-linux-gnu/
-- Found Doxygen: /usr/bin/doxygen (found version "1.8.13") found components: doxygen dot
-- Configuring done
-- Generating done
-- Build files have been written to: /tmp/toms/sockeye-test/marian/build
Both frameworks use CUDA version 10, although there could be minor differences, since Sockeye 3 is installed via Conda and uses Conda's own CUDA installation.
Ubuntu 18.04.6