can it be fine-tuned on a smaller GPU? #8

Open
anyili opened this issue Jan 25, 2023 · 10 comments

Comments

anyili commented Jan 25, 2023

Hi, could the model be fine-tuned on just a few smaller GPUs, like 4 A40s with 48 GB memory each? I am trying to use DeepSpeed, but still get OOM.
Thanks
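For reference, a minimal sketch of a DeepSpeed config that trades throughput for memory on a setup like 4 x A40 (assumptions: ZeRO stage 3 with CPU offload and bf16; the key names follow DeepSpeed's JSON config schema, and the batch-size values are placeholders, not tested settings):

```python
import json

# Hypothetical DeepSpeed config for a ~2.7B-parameter model on 48 GB GPUs.
# ZeRO-3 shards params/gradients/optimizer state across GPUs; CPU offload
# moves optimizer state and parameters to host RAM at a throughput cost.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # placeholder; raise if memory allows
    "gradient_accumulation_steps": 16,     # placeholder effective-batch knob
    "gradient_clipping": 1.0,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
        "overlap_comm": True,
    },
}

# DeepSpeed consumes this as a JSON file (e.g. ds_config.json).
print(json.dumps(ds_config, indent=2))
```

Whether this fits in 4 x 48 GB depends on sequence length and batch size, so treat the numbers as starting points.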

J38 (Contributor) commented Jan 26, 2023

I will try to fine-tune with some smaller-scale resources and let you know what I see.

I think running with Flash Attention will help a lot with GPU memory issues ...

anyili (Author) commented Jan 26, 2023 via email

anyili (Author) commented Jan 26, 2023

BTW, when I turn on Flash Attention with --use_flash True, I get a runtime exception:
RuntimeError: Expected is_sm80 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

Any ideas?
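For context on that assertion: sm_80 is the compute capability of A100-class GPUs, while the A40 reports sm_86. A plausible reading (an assumption, not confirmed in this thread) is that the fused attention kernel behind this check was only enabled for sm_80 parts. A small illustration, with capabilities taken from NVIDIA's published specs:

```python
# Compute capabilities from NVIDIA's published specs (illustrative subset).
COMPUTE_CAPABILITY = {
    "A100": (8, 0),      # sm_80
    "A40": (8, 6),       # sm_86
    "RTX 3090": (8, 6),  # sm_86
    "V100": (7, 0),      # sm_70
}

def is_sm80(gpu_name: str) -> bool:
    """True only for sm_80 parts, which the failing kernel appears to require."""
    return COMPUTE_CAPABILITY[gpu_name] == (8, 0)

print(is_sm80("A100"))  # True
print(is_sm80("A40"))   # False -> matches the "Expected is_sm80 to be true" failure
```

At runtime you can check the actual device with `torch.cuda.get_device_capability()`.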

anyili (Author) commented Jan 26, 2023

It seems flash_attn only supports head dimensions that are a multiple of 8, and pubmedgpt's is 20.
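One note of caution: flash_attn's multiple-of-8 rule applies to the per-head dimension, not the number of heads. Assuming pubmedgpt uses a GPT-2-style config with hidden_size=2560 and 20 heads (values not stated in this thread), the per-head dimension works out to a multiple of 8:

```python
# Assumed GPT-2-style config values for pubmedgpt (not stated in this thread):
hidden_size = 2560
num_heads = 20

head_dim = hidden_size // num_heads  # per-head dimension that flash_attn checks
print(head_dim)           # 128
print(head_dim % 8 == 0)  # True: 128 satisfies the multiple-of-8 constraint
```

So if those config values hold, the head count of 20 should not by itself rule out flash_attn.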

J38 (Contributor) commented Jan 27, 2023

We trained the model with Flash Attention, so it should definitely work ... I will get a working example going and get back to you with what I did ...

J38 (Contributor) commented Jan 27, 2023

I've fine-tuned it with Flash Attention before ...

anyili (Author) commented Jan 28, 2023 via email

J38 (Contributor) commented Jan 30, 2023

As compute resources become available we should fine-tune some models and release new fine-tuned versions! Just to provide an update: the NIH has asked us to rename the model, since they hold the trademark on "PubMed", and OpenAI is trademarking "GPT" ... so from now on the model is BioMedLM!

guathwa commented Feb 1, 2023

Hi anyili, I am trying to fine-tune on a seqcls task with DeepSpeed, but I encountered an error: "RuntimeError: a leaf Variable that requires grad is being used in an in-place operation." I have just raised this issue for help.

I saw that you managed to fine-tune with DeepSpeed successfully. Are you able to share how you did it? Thanks!
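For anyone hitting the same message, here is a minimal reproduction of that PyTorch error and a common workaround (a sketch only; the actual fix in the seqcls script may differ):

```python
import torch

# Reproduce the error: an in-place op on a leaf tensor that requires grad.
w = torch.ones(3, requires_grad=True)
try:
    w += 1.0  # in-place update of a grad-requiring leaf -> RuntimeError
except RuntimeError as err:
    print(err)

# Common workaround: perform the update where autograd is not recording.
with torch.no_grad():
    w += 1.0  # fine: no graph is being built here
print(w)  # now all 2s, still requires_grad=True
```

The general rule is that leaf tensors tracked by autograd must not be mutated in place outside a `torch.no_grad()` block (or an optimizer step, which does this internally).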

zhengbiqing commented

@anyili can you tell me how to fine-tune seqcls with DeepSpeed?

And has anyone fine-tuned seqcls successfully with --use_flash?
