Hi,
I was wondering: does the current algorithm automatically parallelise over the elements in the batch? I have an RTX 3090 (24 GiB) and I run out of memory immediately for any sequence longer than 512 samples.
If CUDA is parallelising across the sequences in the batch automatically, I think it'd be good to run them in series when there isn't enough memory, seeing as they should be independent. DTW seems to have inherently high memory use, and if serial execution is possible I'd rather have the loss take longer than be limited in my sequence length.
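To illustrate what I mean by running in series, here is a rough sketch in plain Python. It is not this library's code: `dtw_cost` and `batch_dtw_serial` are names I made up, and it uses classic DTW with a squared-difference cost on 1-D sequences rather than whatever the CUDA kernel actually does. The point is only that each pair builds its own N x M table, which can be freed before the next pair starts, so peak memory would not have to grow with the batch size.

```python
import math


def dtw_cost(a, b):
    """Classic DTW cost between two 1-D sequences (hypothetical helper).

    Uses a full (N+1) x (M+1) table, so memory is O(N * M) for one pair.
    """
    n, m = len(a), len(b)
    # table[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    table = [[math.inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # assumed local cost: squared difference
            table[i][j] = d + min(
                table[i - 1][j],      # step in a only
                table[i][j - 1],      # step in b only
                table[i - 1][j - 1],  # step in both
            )
    return table[n][m]


def batch_dtw_serial(xs, ys):
    """Process the batch one pair at a time (hypothetical helper).

    Only one DTW table is alive at any moment, because each table goes out
    of scope when dtw_cost returns.
    """
    return [dtw_cost(x, y) for x, y in zip(xs, ys)]
```

This trades speed for memory, which is the trade-off I'd be happy to make.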
Cheers