Fix wrong inputs/model placement when using a single core #725

Merged 1 commit on Nov 15, 2024
9 changes: 7 additions & 2 deletions optimum/neuron/modeling_traced.py
@@ -53,6 +53,7 @@
 NEURON_COMPILER_VERSION = get_neuroncc_version()
 
 if is_neuronx_available():
+    import torch_neuronx
     from torch_neuronx import move_trace_to_device
 
     NEURON_COMPILER_TYPE = "neuronx-cc"
@@ -127,8 +128,12 @@ def load_model(
         if path.is_file():
             model = torch.jit.load(path)
             # For non-inlined models, send the module manually to device. This is important for weights/neff non-inlined module since when loading the module, the neff is automatically moved to Neuron but not the weights. We need to move the weights to Neuron as well manually to avoid great host to device IO penalty.
-            if is_neuronx_available() and to_neuron:
-                move_trace_to_device(model, device_id)
+            if is_neuronx_available():
+                torch_neuronx.experimental.set_neuron_cores(
+                    model, start_nc=0, nc_count=1
Collaborator

What happens if the model has tp_degree > 1?

Collaborator Author

There is no tp support for SD models right now, and only one Neuron device is supported. Since we want both cores to be used, we apply ddp with the entire model loaded on both cores.

+                )  # The inputs are allocated to nc:0 by default, this line ensures both input tensors and the model are on the same core.
+                if to_neuron:
+                    move_trace_to_device(model, device_id)
             return model
 
     def replace_weights(self, weights: Optional[Union[Dict[str, torch.Tensor], torch.nn.Module]] = None):
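
For readers without Neuron hardware, the order of calls in the patched `load_model` can be sketched roughly as below. This is only an illustration, not the code from the PR: `place_traced_model` and `make_recording_stub` are hypothetical names, and the stub merely records calls in place of the real `torch_neuronx` module. The two calls it stands in for, `experimental.set_neuron_cores` and `move_trace_to_device`, are the ones used in the diff above.

```python
from types import SimpleNamespace


def place_traced_model(model, neuronx, to_neuron=False, device_id=0):
    """Pin `model` to the first core, then optionally move its weights.

    Hypothetical helper that mirrors the call order of the patched
    `load_model`. `neuronx` is the torch_neuronx module, or None if absent.
    """
    if neuronx is None:
        return model  # no Neuron runtime available: nothing to place
    # Inputs are allocated to nc:0 by default, so keep the model there too.
    neuronx.experimental.set_neuron_cores(model, start_nc=0, nc_count=1)
    if to_neuron:
        # Weights of non-inlined modules are not moved automatically on load.
        neuronx.move_trace_to_device(model, device_id)
    return model


def make_recording_stub(calls):
    """Stand-in for torch_neuronx that only records calls (illustration only)."""

    def set_neuron_cores(model, start_nc, nc_count):
        calls.append(("set_neuron_cores", start_nc, nc_count))

    def move_trace_to_device(model, device_id):
        calls.append(("move_trace_to_device", device_id))

    return SimpleNamespace(
        experimental=SimpleNamespace(set_neuron_cores=set_neuron_cores),
        move_trace_to_device=move_trace_to_device,
    )


calls = []
place_traced_model(object(), make_recording_stub(calls), to_neuron=True)
print(calls)  # [('set_neuron_cores', 0, 1), ('move_trace_to_device', 0)]
```

On a real Neuron instance one would presumably pass the imported `torch_neuronx` module and a model returned by `torch.jit.load` instead of the stub. The point of the sketch is that core pinning now happens whenever torch_neuronx is available, while the weight move still depends on `to_neuron`.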