can it be fine-tuned on a smaller GPU? #8
Comments
I will try to fine tune with some smaller scale resources and let you know what I see. I think running with Flash Attention will help a lot with GPU memory issues ... |
Thanks. I did successfully fine-tune using DeepSpeed with much less resource.
|
BTW, I turned on Flash Attention with --use_flash True and got a runtime exception. Ideas? |
It seems flash_attn only supports a head dimension that is a multiple of 8, and pubmedgpt has 20 heads. |
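A quick sanity check may help here: the flash_attn constraint is on the per-head dimension (hidden size divided by the number of heads), not on the head count itself. The sketch below assumes the published BioMedLM config (hidden size 2560, 20 heads); verify against your actual model config.

```python
# Sketch: check the flash-attn head-dimension constraint.
# The 2560/20 figures are an assumption based on the published
# BioMedLM config, not taken from this thread.
def head_dim(hidden_size: int, num_heads: int) -> int:
    assert hidden_size % num_heads == 0, "hidden size must divide evenly"
    return hidden_size // num_heads

d = head_dim(2560, 20)      # 20 heads over a 2560-wide hidden state
print(d, d % 8 == 0)        # the per-head dim, not the head count,
                            # is what must be a multiple of 8
```

If the per-head dimension comes out as 128 here, the 20-head count alone would not violate the multiple-of-8 rule, so the runtime exception likely has another cause.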
We trained the model with Flash Attention so it should definitely work ... I will get a working example going and get back to you with what I did ... |
I've fine tuned it with Flash Attention before ... |
Btw, do you mind sharing your fine-tuned model? Thanks
|
As compute resources become available we should fine-tune some models and release new fine-tuned versions! Just to provide an update, the NIH has asked us to rename the model since they hold the trademark on "PubMed" and OpenAI is trademarking GPT ... so from now on the model is BioMedLM ! |
Hi anyili, I am trying to fine-tune on a seqcls task with DeepSpeed but I encountered an error: "RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.". I have just raised this issue for help. I saw that you managed to fine-tune with DeepSpeed successfully. Are you able to share how you did it? Thanks! |
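For context on that error, here is a minimal, hypothetical reproduction unrelated to any specific line in the seqcls code: PyTorch raises it whenever a leaf tensor with requires_grad=True is modified in place while autograd is tracking, and the usual fix is to do the update inside torch.no_grad().

```python
# Hypothetical minimal reproduction of the error in the comment above;
# this is not the actual seqcls code, just the general pattern.
import torch

w = torch.ones(3, requires_grad=True)  # a leaf tensor tracked by autograd
try:
    w += 1.0  # in-place op on a leaf that requires grad -> RuntimeError
except RuntimeError as e:
    print(e)

# Fix: perform the in-place update outside of autograd tracking.
with torch.no_grad():
    w += 1.0
print(w)  # updated values; w still has requires_grad=True
```

In a fine-tuning script the offending in-place op is often buried in custom weight initialization or a manual parameter tweak, so searching for `+=`/`.copy_` on parameters is a reasonable first step.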
@anyili can you tell me how to fine-tune seqcls with DeepSpeed? And has anyone fine-tuned seqcls successfully with --use_flash? |
Hi, could the model be fine-tuned on just a few smaller GPUs, like 4 A40s with 48 GB memory each? I am trying to use DeepSpeed, but I still hit OOM.
Thanks
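For OOM cases like this, DeepSpeed ZeRO stage 3 with CPU offload is the usual lever. Below is a sketch of such a config; every value (batch size, accumulation steps, file name) is illustrative, not a setting anyone in this thread reported using, so tune them for your hardware.

```python
# Sketch: write a hypothetical DeepSpeed ZeRO-3 config with CPU offload.
# All numbers here are illustrative placeholders.
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 1,     # smallest possible per-GPU batch
    "gradient_accumulation_steps": 16,       # recover effective batch size
    "fp16": {"enabled": True},               # halve activation/param memory
    "zero_optimization": {
        "stage": 3,                          # partition params, grads, optimizer
        "offload_optimizer": {"device": "cpu"},  # optimizer state off-GPU
        "offload_param": {"device": "cpu"},      # stream params from CPU RAM
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

A config like this would then be passed to the training script, e.g. via the trainer's deepspeed argument; gradient checkpointing on top of it usually buys the most additional headroom.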