Is your feature request related to a problem? Please describe.
Reducing memory allocations is one of the key techniques for improving performance. This often requires pre-allocating memory and then updating it in place. ITensors already provides in-place operations such as A .= B .* C, but A .+= B .* C, which would add the output of B * C to A in place rather than overwrite A, is missing. This would be useful for performant ML applications, in particular for gradient accumulation.
Describe the solution you'd like
indices = [Index(2), Index(3)]
A = ITensor(indices)
B = randomITensor(indices[1])
C = randomITensor(indices[2])
A .+= B .* C # (similarly to A .= B .* C)
Describe alternatives you've considered
A simple alternative is to introduce a buffer tensor, though this doubles the amount of pre-allocated memory:
indices = [Index(2), Index(3)]
A = ITensor(indices)
A_buffer = ITensor(indices)
B = randomITensor(indices[1])
C = randomITensor(indices[2])
A_buffer .= B .* C
A .+= A_buffer # in-place add; plain A += A_buffer would allocate a new ITensor