# smith_waterman_parallel

## TODO list

- MPI for multiple cores ✅
- pthreads for single core multiple threads ✅
- CUDA for GPU accelerator ✅
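
For reference, here is a minimal serial sketch of the Smith-Waterman scoring recurrence that the parallel variants listed above would accelerate. It is not taken from this repository: the function name `sw_score` and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical scoring parameters, not taken from the repository. */
enum { MATCH = 2, MISMATCH = -1, GAP = -1 };

static int max2(int a, int b) { return a > b ? a : b; }

/*
 * Returns the best local alignment score of strings a and b using a
 * linear gap penalty, or -1 if memory allocation fails.
 * Only two rows of the DP matrix are kept, so memory is O(strlen(b)).
 */
int sw_score(const char *a, const char *b)
{
    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof *prev);  /* row i-1, zero-initialised */
    int *curr = calloc(m + 1, sizeof *curr);  /* row i */
    int best = 0;

    if (!prev || !curr) {
        free(prev);
        free(curr);
        return -1;
    }

    for (size_t i = 1; i <= n; i++) {
        curr[0] = 0;  /* first column is always 0 for local alignment */
        for (size_t j = 1; j <= m; j++) {
            int diag = prev[j - 1] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            int up   = prev[j] + GAP;      /* gap in b */
            int left = curr[j - 1] + GAP;  /* gap in a */
            int h = max2(0, max2(diag, max2(up, left)));
            curr[j] = h;
            if (h > best)
                best = h;
        }
        /* Swap rows so curr becomes prev for the next iteration. */
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    free(prev);
    free(curr);
    return best;
}
```

Each cell depends only on its upper, left, and upper-left neighbours, so cells on the same anti-diagonal are independent of one another. That property is what parallel implementations of this algorithm commonly exploit, though the sketch above makes no claim about how the MPI, pthreads, or CUDA versions here are structured.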