Use AWS Neuron sdk 2.21 #754
@dacorvo For the failing vision model tests, it seems that every case with batch_size != 1 fails (our tests use batch_size = 2). I will open a ticket in the Neuron SDK repo. I could set the batch size to 1 to make the CI green, but I doubt we should bump to that version...
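As a rough illustration of the workaround mentioned above (restricting the tests to batch_size = 1 only on the affected SDK), here is a minimal sketch. The helper names `parse_version` and `batch_sizes_for` and the constant `AFFECTED_SDK` are hypothetical and do not come from this PR or from the Neuron SDK; the only facts assumed are the ones reported in this thread (failures with batch_size != 1 on SDK 2.21).

```python
# Hypothetical helpers for illustration only; not part of this PR or the Neuron SDK.

def parse_version(version: str) -> tuple:
    """Turn '2.21.0' into (2, 21, 0), ignoring a local suffix such as '+cpu'."""
    core = version.split("+", 1)[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


# Assumption: only the 2.21 series is affected, as reported in this thread.
AFFECTED_SDK = (2, 21)


def batch_sizes_for(sdk_version: str, requested=(1, 2)) -> list:
    """Return the batch sizes worth testing for a given SDK version string."""
    if parse_version(sdk_version)[:2] == AFFECTED_SDK:
        # Keep only batch_size == 1 where larger batches are reported to fail.
        return [size for size in requested if size == 1]
    return list(requested)


if __name__ == "__main__":
    print(batch_sizes_for("2.21.0"))
    print(batch_sizes_for("2.20.2"))
```

Gating on the version rather than hard-coding batch_size = 1 would keep the batch_size = 2 coverage on unaffected SDK versions, at the cost of hiding the regression on 2.21.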
Export to float leads to compilation errors in AWS Neuron SDK 2.21.0
What does this PR do?
This PR bumps the AWS Neuron SDK to version 2.21, the first SDK release compatible with trn2 instances.
The underlying pytorch version is now 2.5.1, which implies significant changes in the XLA stack. This leads to compilation errors in:
[TEN404] (_divide.1146) Internal tensorizer error: BirCodeGenLoop:Too many strides! {{{{0,+,1}[4],+,0}[2],+,4}[16],+,0}[2]