Reward model inference

Need to add reward model (RM) inference support for cases where the RM is a sizable model. The current code attempts to place a copy of the RM on every GPU. This is problematic because in many cases the RM is too big to fit alongside the denoiser model. In the LLM setting, the usual solution is either to serve the RM from a Triton Inference Server or to put the RM on one GPU while the main model uses the remaining GPUs. Both options should be explored further.
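As a starting point for the second option, here is a minimal sketch of how GPUs could be split between the denoiser and the RM. It is not the project's existing code: `DevicePlan`, `plan_devices`, and the `num_reward_gpus` parameter are hypothetical names introduced for illustration, and the choice to reserve the highest-numbered GPUs for the RM is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DevicePlan:
    """Hypothetical container describing which GPUs each model uses."""
    denoiser_devices: List[str]  # GPUs that run the denoiser (main model)
    reward_devices: List[str]    # GPUs reserved for reward model inference


def plan_devices(num_gpus: int, num_reward_gpus: int = 1) -> DevicePlan:
    """Split `num_gpus` GPUs between the denoiser and the reward model.

    num_reward_gpus == 0 reproduces the current behaviour: the RM is
    colocated with the denoiser on every GPU. Otherwise the last
    `num_reward_gpus` GPUs are reserved for the RM (an assumed convention).
    """
    if num_gpus < 1:
        raise ValueError("need at least one GPU")
    if num_reward_gpus < 0:
        raise ValueError("num_reward_gpus must be non-negative")
    if num_reward_gpus >= num_gpus:
        raise ValueError("at least one GPU must remain for the denoiser")

    all_devices = [f"cuda:{i}" for i in range(num_gpus)]
    if num_reward_gpus == 0:
        # Colocated layout: every GPU holds both models.
        return DevicePlan(denoiser_devices=all_devices,
                          reward_devices=list(all_devices))

    split = num_gpus - num_reward_gpus
    return DevicePlan(denoiser_devices=all_devices[:split],
                      reward_devices=all_devices[split:])


if __name__ == "__main__":
    plan = plan_devices(num_gpus=4, num_reward_gpus=1)
    print("denoiser:", plan.denoiser_devices)
    print("reward model:", plan.reward_devices)
```

The device strings are in the form accepted by PyTorch's `torch.device`, so the RM could be moved with `.to(plan.reward_devices[0])` and generated samples sent to that device for scoring. The Triton route would replace the reserved GPU with a separate serving process and is not covered by this sketch.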