Model card issue #1125 (#1129)

Closed · wants to merge 4 commits
10 changes: 10 additions & 0 deletions docs/hub/model-card-annotated.md
@@ -158,6 +158,9 @@ _Write 1-2 sentences on what the training data is. Ideally this links to a Datas

## Training Procedure [optional]

_To estimate the hardware you'll need to train or fine-tune a model, consider two factors: the number of parameters in the model and the training regime you plan to use._

_e.g A model with 3B parameters and fp32 precision format needs at least 48GB of GPU memory, while bf16 requires at least 24GB of memory with Amphere or higher hardware. Mixed pf16 requires at least 54GB of GPU memory._
Suggested change
_e.g A model with 3B parameters and fp32 precision format needs at least 48GB of GPU memory, while bf16 requires at least 24GB of memory with Amphere or higher hardware. Mixed pf16 requires at least 54GB of GPU memory._
_e.g A model with 3B parameters and fp32 precision format needs at least 48GB of GPU memory, while bf16 requires at least 24GB of memory with Ampere or higher hardware. Mixed fp16 requires at least 54GB of GPU memory._

These numbers sound a bit high to me. In any case, they depend on a number of factors like optimizer choice. Should the recommended optimizer be a part of the training_regime data?

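The reviewer's point can be made concrete with a back-of-the-envelope sketch. Below is a minimal Python estimate, assuming an Adam-style optimizer (two fp32 state tensors per parameter) and ignoring activations, batch size, and framework overhead; the byte counts are one common accounting convention, not the guide's definitive figures, which is why they don't all reproduce the numbers quoted above.

```python
# Back-of-the-envelope GPU memory for training: weights + gradients
# + optimizer states per parameter. Activations and framework overhead
# are ignored, so real usage is strictly higher.

def training_memory_gb(n_params: float, regime: str) -> float:
    """Estimate GPU memory in (decimal) GB for a given training regime."""
    if regime == "fp32":
        # 4 B weights + 4 B gradients + 8 B Adam states (m and v, fp32)
        bytes_per_param = 4 + 4 + 8
    elif regime == "bf16":
        # 2 B weights + 2 B gradients + 8 B fp32 Adam states
        bytes_per_param = 2 + 2 + 8
    elif regime == "fp16-mixed":
        # 2 B fp16 weights + 2 B gradients + 4 B fp32 master weights
        # + 8 B fp32 Adam states
        bytes_per_param = 2 + 2 + 4 + 8
    else:
        raise ValueError(f"unknown training regime: {regime}")
    return n_params * bytes_per_param / 1e9

for regime in ("fp32", "bf16", "fp16-mixed"):
    print(f"3B params, {regime}: ~{training_memory_gb(3e9, regime):.0f} GB")
# fp32 gives ~48 GB, matching the text; bf16 and fp16-mixed come out
# differently (~36 GB and ~48 GB), illustrating that the totals depend
# on what you count.
```

Swapping in a memory-light optimizer (e.g. 8-bit Adam) or accounting for activations changes these totals substantially, which supports the suggestion of making the optimizer part of the `training_regime` data.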

### Preprocessing

@@ -166,6 +169,13 @@ _Write 1-2 sentences on what the training data is. Ideally this links to a Datas

_Detail tokenization, resizing/rewriting (depending on the modality), etc._

### Training Hyperparameters


* **Training regime:** `training_regime`

_Detail the model training process, specifically the type of precision used (**fp32/fp16/bf16**) and whether it is **mixed or non-mixed precision**._

### Speeds, Sizes, Times

