feat: Add Hyperpod Optimum-neuron LoRA example #631
Conversation
Thank you for your contribution.
Could you please:
- join the AWS organization on GitHub
- organize the folder as pytorch/fine-tuning/optimum-neuron
- add the Kubernetes manifest into a kubernetes folder; the repo structure is organized by scheduler
Thank you for the quick review!
aws-samples organization. 2 and 3 are good.
```dockerfile
# Update Neuron Compiler and Framework
RUN python -m pip install --upgrade neuronx-cc==2.* torch-neuronx==2.1.* torchvision
RUN python -m pip install --upgrade neuronx-distributed neuronx-distributed-training
```
Can we pin the library versions? A new release could break compatibility.
Thanks Jianying, I have pinned the dependencies in this CR. Ideally, once this PR is merged and released, we will switch to the optimum-neuron 0.1.0 SageMaker image: https://github.com/aws/deep-learning-containers/pull/4670/files#diff-0f776bad437279bcc3d6005ec1b29170f0b4e53dfbc5c3c234fb202817f707a3. It has the required dependencies too.
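For illustration only, a fully pinned variant of the install step could look like the sketch below. The version numbers are placeholders chosen to show the pattern, not the versions actually pinned in this PR.

```dockerfile
# Illustrative sketch: pin exact package versions so a new Neuron release
# cannot silently change the image. Version numbers below are placeholders;
# substitute the versions validated for this example.
RUN python -m pip install \
        neuronx-cc==2.13.66.0 \
        torch-neuronx==2.1.2.2.1.0 \
        torchvision==0.16.2
RUN python -m pip install \
        neuronx-distributed==0.7.0 \
        neuronx-distributed-training==1.0.0
```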
3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/Dockerfile (outdated; resolved)
3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/src/peft_train.py (resolved)
Left a comment
3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/README.md (outdated; resolved)
3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/README.md (outdated; resolved)
3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/README.md (resolved)
3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/README.md (outdated; resolved)
…ing/README.md Co-authored-by: Keita Watanabe <[email protected]>
…ing/README.md Co-authored-by: Keita Watanabe <[email protected]>
…ing/README.md Co-authored-by: Keita Watanabe <[email protected]>
3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/Dockerfile (outdated; resolved)
Thank you, I have been added to the aws-samples organization.
3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/generate-jobspec.sh (outdated; resolved)
Looks good to me!
* Add Hyperpod Optimum-neuron LoRA example
* fix README
* restructure files
* fix
* Update to use newer version of peft and llama 3.8
* Update 3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/README.md (Co-authored-by: Keita Watanabe <[email protected]>)
* Update 3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/README.md (Co-authored-by: Keita Watanabe <[email protected]>)
* Update 3.test_cases/pytorch/optimum-neuron/llama3/kubernetes/fine-tuning/README.md (Co-authored-by: Keita Watanabe <[email protected]>)
* pin dependencies and address comments
* fix
* switch model, remove HF token, update compile steps

Co-authored-by: Keita Watanabe <[email protected]>
Issue #, if available: N/A, new feature
Description of changes:
This PR adds an example to test_cases that uses the Hugging Face optimum-neuron library for PEFT fine-tuning. optimum-neuron is the interface between the Hugging Face Transformers library and AWS accelerators, including AWS Trainium and AWS Inferentia. It provides a set of tools enabling easy model loading, training, and inference in single- and multi-accelerator settings for different downstream tasks.
The example targets a HyperPod EKS environment, and the code will be used in the HyperPod workshop.
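As a rough illustration of the pattern this example follows, a LoRA fine-tuning script on Trainium with optimum-neuron might look like the sketch below. It is a minimal sketch only: the model id, dataset, and hyperparameters are placeholders, and the actual peft_train.py in this PR may structure things differently.

```python
# Minimal sketch (not the exact peft_train.py in this PR): LoRA fine-tuning on
# Trainium using optimum-neuron's drop-in Trainer. Model id, dataset, and
# hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)
from optimum.neuron import NeuronTrainer, NeuronTrainingArguments

model_id = "meta-llama/Meta-Llama-3-8B"  # placeholder; the example may use a different model

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Attach LoRA adapters so only small low-rank matrices are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Placeholder dataset; the example may use a different corpus and prompt format.
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def tokenize(example):
    text = example["instruction"] + "\n" + example["response"]
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

# The collator pads batches and copies input_ids into labels for the causal LM loss.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# NeuronTrainingArguments/NeuronTrainer mirror the Transformers Trainer API but
# compile and run the training loop on Trainium devices.
training_args = NeuronTrainingArguments(
    output_dir="./llama3-lora",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
)

trainer = NeuronTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("./llama3-lora")
```

Per the directory layout above, a script along these lines is packaged into the container built from the Dockerfile and launched on the HyperPod EKS cluster through the manifests in the kubernetes folder.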
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.