Open
Description
We need to support reward model (RM) inference for cases where the RM is a sizable model. The code currently tries to load a copy of the RM on every GPU. This is a problem because in many cases the RM is too big to fit alongside the denoiser model. In the LLM setting, the usual solutions are to serve the RM from Triton Inference Server or to put the RM on one GPU while the main model uses the remaining GPUs. Both options should be explored further.
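As a starting point for the second option, here is a minimal sketch of how the visible GPUs could be split between denoiser replicas and a dedicated RM. Everything in it is hypothetical and not taken from this repo: the names `DevicePlan`, `plan_devices`, and `reward_device_for_rank`, and the choice to reserve the last GPU indices for the RM, are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DevicePlan:
    """Hypothetical split of visible GPUs between training and reward scoring."""
    denoiser_devices: tuple  # GPU indices that hold denoiser replicas
    reward_devices: tuple    # GPU indices reserved for the reward model


def plan_devices(num_gpus: int, num_reward_gpus: int = 1) -> DevicePlan:
    """Reserve the last `num_reward_gpus` GPUs for the RM; the rest train."""
    if num_reward_gpus < 1:
        raise ValueError("need at least one GPU for the reward model")
    if num_gpus <= num_reward_gpus:
        raise ValueError("no GPUs would be left for the denoiser")
    split = num_gpus - num_reward_gpus
    return DevicePlan(
        denoiser_devices=tuple(range(split)),
        reward_devices=tuple(range(split, num_gpus)),
    )


def reward_device_for_rank(plan: DevicePlan, rank: int) -> int:
    """Round-robin denoiser ranks over the reserved reward GPUs."""
    return plan.reward_devices[rank % len(plan.reward_devices)]


plan = plan_devices(num_gpus=8)
print(plan.denoiser_devices)  # (0, 1, 2, 3, 4, 5, 6)
print(plan.reward_devices)    # (7,)
```

With a plan like this, each denoiser rank would send its generated samples to the GPU returned by `reward_device_for_rank` and receive scores back, instead of holding its own RM copy. How the samples and scores are actually transferred (collectives, a queue, or an external server such as Triton) is left open here.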