Update llama2-7b-fine-tuning.ipynb
Directory and Spelling changes
Paulie631 authored and dacorvo committed Jan 27, 2025
1 parent 3a2bbe7 commit 62495b1
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions notebooks/text-generation/llama2-7b-fine-tuning.ipynb
@@ -7,7 +7,7 @@
"source": [
"# Fine-tune Llama on AWS Trainium \n",
"\n",
"This tutorial will teach how to fine-tune open LLMs like [Llama 2](https://huggingface.co/meta-llama/Llama-2-70b-hf) on AWS Trainium. In our example, we are going to leverage Hugging Face Optimum Neuron, [Transformers](https://huggingface.co/docs/transformers/index)and datasets. \n",
"This tutorial will teach how to fine-tune open LLMs like [Llama 2](https://huggingface.co/meta-llama/Llama-2-70b-hf) on AWS Trainium. In our example, we are going to leverage Hugging Face Optimum Neuron, [Transformers](https://huggingface.co/docs/transformers/index) and datasets. \n",
"\n",
"You will learn how to:\n",
"\n",
@@ -54,7 +54,7 @@
"git clone https://github.com/huggingface/optimum-neuron.git\n",
"```\n",
"\n",
"Next we can change our directory to `notbooks/text-generation` and launch the `jupyter` environment.``\n",
"Next we can change our directory to `notebooks/text-generation` and launch the `jupyter` environment.``\n",
"\n",
"\n",
"```bash\n",
@@ -363,7 +363,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"_Note: Compiling without a cache can take ~40 minutes. It will also create dummy files in the `dolly_llama_sharded` during compilation you we have to remove them afterwards. We also need to add `MALLOC_ARENA_MAX=64` to limit the CPU allocation to avoid potential crashes, don't remove it for now._ "
"_Note: Compiling without a cache can take ~40 minutes. It will also create dummy files in the `dolly_llama` directory during compilation you we have to remove them afterwards. We also need to add `MALLOC_ARENA_MAX=64` to limit the CPU allocation to avoid potential crashes, don't remove it for now._ "
]
},
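For illustration, a hedged sketch of how that variable would be applied; the `torchrun` invocation and script name are placeholders, only the environment variable itself comes from the note above:

```bash
# The torchrun call and train.py are illustrative placeholders;
# MALLOC_ARENA_MAX=64 is the part the note above prescribes.
MALLOC_ARENA_MAX=64 torchrun --nproc_per_node=32 train.py
```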
{
@@ -414,7 +414,7 @@
"source": [
"Thats it, we successfully trained Llama 7B on AWS Trainium. The training took for 3 epochs on dolly (15k samples) took 43:24 minutes where the raw training time was only 31:46 minutes. This leads to a cost of ~$15.5 for the e2e training on the trn1.32xlarge instance. Not Bad! \n",
"\n",
"But before we can share and test our model we need to consolidate our model. Since we used Tensor Parallelism during training, we need to consolidate the model weights before we can use it. Tensor Parallelism shards the model weights accross different workers, only sharded checkpoints will be saved during training.\n",
"But before we can share and test our model we need to consolidate our model. Since we used Tensor Parallelism during training, we need to consolidate the model weights before we can use it. Tensor Parallelism shards the model weights across different workers, only sharded checkpoints will be saved during training.\n",
"\n",
"The Optimum CLI provides a way of doing that very easily via the `optimum neuron consolidate`` command:"
]
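As a minimal sketch, assuming the sharded checkpoints live in `dolly_llama` and are consolidated in place (the notebook's actual cell may pass different paths):

```bash
# Merge the Tensor Parallelism shards into a single set of model weights.
optimum-cli neuron consolidate dolly_llama dolly_llama
```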
@@ -481,7 +481,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"_Note: Inference compilation can take ~25minutes. Luckily, you need to only run this onces. Since you can save the model afterwards. If you are going to run on Inferentia2 you need to recompile again. The compilation is parameter and hardware specific._"
"_Note: Inference compilation can take ~25minutes. Luckily, you need to only run this once. Since you can save the model afterwards. If you are going to run on Inferentia2 you need to recompile again. The compilation is parameter and hardware specific._"
]
},
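A minimal Python sketch of that compile-and-save step, assuming the consolidated checkpoint lives in `dolly_llama`; the static shapes are illustrative, since compilation is shape-specific:

```python
from optimum.neuron import NeuronModelForCausalLM

# export=True triggers Neuron compilation for the given static shapes.
model = NeuronModelForCausalLM.from_pretrained(
    "dolly_llama",        # assumed path to the consolidated checkpoint
    export=True,
    batch_size=1,         # illustrative; compilation is fixed to these shapes
    sequence_length=2048,
)
model.save_pretrained("dolly_llama_neuron")  # reload later without recompiling
```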
{
@@ -532,7 +532,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets test inference. First we test without a context.\n",
"Let's test inference. First we test without a context.\n",
"\n",
"_Note: Inference is not expected to be super fast on AWS Trainium using 2 cores. For Inference we recommend using Inferentia2._"
]
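A hedged example of such a test, reusing the compiled `model` from the sketch above; the Dolly-style prompt and decoding parameters are assumptions:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dolly_llama")  # assumed checkpoint path

# Illustrative Dolly-style prompt with no context field.
prompt = "### Instruction:\nWhat can I do with AWS Trainium?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```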
@@ -562,7 +562,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That looks correct. Now, lets add some context, e.g. as you would do for RAG applications"
"That looks correct. Now, let's add some context, e.g. as you would do for RAG applications"
]
},
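Under the same assumptions, adding context only changes the prompt, for example:

```python
# Illustrative prompt with a retrieved context field, as in a RAG setup.
prompt = (
    "### Instruction:\nWhat is Optimum Neuron?\n\n"
    "### Context:\nOptimum Neuron is the interface between Hugging Face "
    "Transformers and the AWS Trainium and Inferentia accelerators.\n\n"
    "### Response:\n"
)
```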
{
