can it be fine-tuned on a smaller GPU? #8

Open
anyili opened this issue Jan 25, 2023 · 10 comments

Comments

anyili commented Jan 25, 2023

Hi, could the model be fine-tuned on just a few smaller GPUs, like 4 A40s with 48 GB memory each? I am trying to use DeepSpeed, but still get OOM.
Thanks
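For reference, a minimal sketch of a DeepSpeed config that trades throughput for memory on a setup like 4 x A40 (assumptions: ZeRO stage 3 with CPU offload and bf16; the key names follow DeepSpeed's JSON config schema, and the batch-size values are placeholders, not tested settings):

```python
import json

# Hypothetical DeepSpeed config for a ~2.7B-parameter model on 48 GB GPUs.
# ZeRO-3 shards params/gradients/optimizer state across GPUs; CPU offload
# moves optimizer state and parameters to host RAM at a throughput cost.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # placeholder; raise if memory allows
    "gradient_accumulation_steps": 16,     # placeholder effective-batch knob
    "gradient_clipping": 1.0,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
        "overlap_comm": True,
    },
}

# DeepSpeed consumes this as a JSON file (e.g. ds_config.json).
print(json.dumps(ds_config, indent=2))
```

Whether this fits in 4 x 48 GB depends on sequence length and batch size, so treat the numbers as starting points.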

J38 (Contributor) commented Jan 26, 2023

I will try to fine-tune with some smaller-scale resources and let you know what I see.

I think running with Flash Attention will help a lot with GPU memory issues ...

anyili (Author) commented Jan 26, 2023 via email

anyili (Author) commented Jan 26, 2023

BTW, when I turn on Flash Attention with --use_flash True, I get a runtime exception:
RuntimeError: Expected is_sm80 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

Any ideas?
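For context on that assertion: sm_80 is the compute capability of A100-class GPUs, while the A40 reports sm_86. A plausible reading (an assumption, not confirmed in this thread) is that the fused attention kernel behind this check was only enabled for sm_80 parts. A small illustration, with capabilities taken from NVIDIA's published specs:

```python
# Compute capabilities from NVIDIA's published specs (illustrative subset).
COMPUTE_CAPABILITY = {
    "A100": (8, 0),      # sm_80
    "A40": (8, 6),       # sm_86
    "RTX 3090": (8, 6),  # sm_86
    "V100": (7, 0),      # sm_70
}

def is_sm80(gpu_name: str) -> bool:
    """True only for sm_80 parts, which the failing kernel appears to require."""
    return COMPUTE_CAPABILITY[gpu_name] == (8, 0)

print(is_sm80("A100"))  # True
print(is_sm80("A40"))   # False -> matches the "Expected is_sm80 to be true" failure
```

At runtime you can check the actual device with `torch.cuda.get_device_capability()`.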

anyili (Author) commented Jan 26, 2023

It seems flash_attn only supports head dimensions that are a multiple of 8, and pubmedgpt's is 20.
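One note of caution: flash_attn's multiple-of-8 rule applies to the per-head dimension, not the number of heads. Assuming pubmedgpt uses a GPT-2-style config with hidden_size=2560 and 20 heads (values not stated in this thread), the per-head dimension works out to a multiple of 8:

```python
# Assumed GPT-2-style config values for pubmedgpt (not stated in this thread):
hidden_size = 2560
num_heads = 20

head_dim = hidden_size // num_heads  # per-head dimension that flash_attn checks
print(head_dim)           # 128
print(head_dim % 8 == 0)  # True: 128 satisfies the multiple-of-8 constraint
```

So if those config values hold, the head count of 20 should not by itself rule out flash_attn.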

J38 (Contributor) commented Jan 27, 2023

We trained the model with Flash Attention, so it should definitely work ... I will get a working example going and get back to you with what I did ...

J38 (Contributor) commented Jan 27, 2023

I've fine-tuned it with Flash Attention before ...

anyili (Author) commented Jan 28, 2023 via email

J38 (Contributor) commented Jan 30, 2023

As compute resources become available we should fine-tune some models and release new fine-tuned versions! Just to provide an update: the NIH has asked us to rename the model, since they hold the trademark on "PubMed", and OpenAI is trademarking "GPT" ... so from now on the model is BioMedLM!

guathwa commented Feb 1, 2023

Hi anyili, I am trying to fine-tune on a seqcls task with DeepSpeed, but I encountered an error: "RuntimeError: a leaf Variable that requires grad is being used in an in-place operation." I have just raised this issue for help.

I saw that you managed to fine-tune with DeepSpeed successfully. Are you able to share how you did it? Thanks!
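For anyone hitting the same message, here is a minimal reproduction of that PyTorch error and a common workaround (a sketch only; the actual fix in the seqcls script may differ):

```python
import torch

# Reproduce the error: an in-place op on a leaf tensor that requires grad.
w = torch.ones(3, requires_grad=True)
try:
    w += 1.0  # in-place update of a grad-requiring leaf -> RuntimeError
except RuntimeError as err:
    print(err)

# Common workaround: perform the update where autograd is not recording.
with torch.no_grad():
    w += 1.0  # fine: no graph is being built here
print(w)  # now all 2s, still requires_grad=True
```

The general rule is that leaf tensors tracked by autograd must not be mutated in place outside a `torch.no_grad()` block (or an optimizer step, which does this internally).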

zhengbiqing commented

@anyili can you tell me how to fine-tune seqcls with DeepSpeed?

And has anyone fine-tuned seqcls successfully with --use_flash?
