1) Allocators inside parallelized loops are thread-local. 2) With nested parallelized loops, one thread might access another thread's thread-local buffer, which results in unexpected behavior. 3) We need to hoist such allocators to get correct results.
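
The problem and the fix can be sketched with plain Python threads. This is only an analogy under stated assumptions, not the actual implementation: `tls`, `run_threads`, `nested_thread_local`, and `nested_hoisted` are hypothetical names, and `threading.local()` stands in for a thread-local allocator. In Python the cross-thread access fails loudly with `AttributeError`; in generated code the same mistake would more likely show up as silently wrong data.

```python
import threading

# Hypothetical stand-in for a thread-local allocator: each thread sees
# its own independent set of attributes on this object.
tls = threading.local()


def run_threads(target, args_list):
    """Run target(*args) on one new thread per entry and wait for all of them."""
    threads = [threading.Thread(target=target, args=args) for args in args_list]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def nested_thread_local(n_outer, n_inner):
    """Allocate the buffer inside the outer parallel loop, in thread-local storage."""
    failures = []  # list.append is thread-safe in CPython

    def inner(i, j):
        try:
            # The inner thread looks up `buf` in its own thread-local
            # namespace, which is empty: the outer thread's buffer is invisible.
            tls.buf[j] = i * n_inner + j
        except AttributeError:
            failures.append((i, j))

    def outer(i):
        tls.buf = [0] * n_inner  # visible only to this outer thread
        run_threads(inner, [(i, j) for j in range(n_inner)])

    run_threads(outer, [(i,) for i in range(n_outer)])
    return failures


def nested_hoisted(n_outer, n_inner):
    """Allocate the buffer once, before both parallel loops, and share it."""
    buf = [[0] * n_inner for _ in range(n_outer)]  # hoisted allocation

    def inner(i, j):
        buf[i][j] = i * n_inner + j  # each (i, j) writes a distinct slot

    def outer(i):
        run_threads(inner, [(i, j) for j in range(n_inner)])

    run_threads(outer, [(i,) for i in range(n_outer)])
    return buf
```

In `nested_thread_local`, every inner thread fails to reach the buffer that its parent thread allocated. In `nested_hoisted`, the allocation happens before any thread starts, so all threads write into the same buffer and each slot is written by exactly one thread.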