We should be able to take two level scenes and combine them, letting the diffusion model find a suitable way to stitch them together. I'm not sure how hard this will be. However, because the diffusion model can generate scenes of any size, there is some flexibility in the size of the combined scene.
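One possible way to set this up is inpainting-style: place the two scenes on a shared canvas with a gap between them, and mark the gap as the only region the model may fill. This is a minimal sketch under that assumption, not a confirmed design. The tile-string level format, the `EMPTY` placeholder, and the `build_stitch_canvas` helper are all hypothetical, and the sketch assumes both scenes have the same height.

```python
from typing import List, Tuple

# Hypothetical placeholder tile for cells the model is expected to fill.
EMPTY = "-"


def build_stitch_canvas(
    left: List[str], right: List[str], gap: int
) -> Tuple[List[str], List[List[int]]]:
    """Place two level scenes side by side with `gap` blank columns between them.

    Returns the combined canvas and a mask of the same shape, where 1 marks
    cells to be generated and 0 marks cells copied from the input scenes.
    Assumes (for simplicity) that both scenes are rectangular and equally tall.
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    if len(left) != len(right):
        raise ValueError("scenes must have the same height in this sketch")
    if len({len(row) for row in left}) > 1 or len({len(row) for row in right}) > 1:
        raise ValueError("each scene must be rectangular")

    left_width = len(left[0]) if left else 0
    right_width = len(right[0]) if right else 0

    # Join each pair of rows with a run of placeholder tiles in between.
    canvas = [l_row + EMPTY * gap + r_row for l_row, r_row in zip(left, right)]

    # Only the gap columns are open for generation; the originals stay fixed.
    mask = [[0] * left_width + [1] * gap + [0] * right_width for _ in left]
    return canvas, mask


if __name__ == "__main__":
    scene_a = ["..", "XX"]
    scene_b = ["...", "XXX"]
    canvas, mask = build_stitch_canvas(scene_a, scene_b, gap=2)
    for row in canvas:
        print(row)
```

The canvas and mask would then be handed to whatever conditional sampling the model supports, so that only the masked cells are generated. Since the model can generate scenes of any size, the `gap` width is a free parameter here: a wider gap gives the model more room to build a transition, at the cost of a longer combined scene.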