From 62495b1a571ac77e290dee5efdcdbd79a8a570e1 Mon Sep 17 00:00:00 2001
From: Paulie631
Date: Fri, 10 Jan 2025 11:56:07 -0500
Subject: [PATCH] Update llama2-7b-fine-tuning.ipynb

Directory and Spelling changes
---
 .../text-generation/llama2-7b-fine-tuning.ipynb | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/notebooks/text-generation/llama2-7b-fine-tuning.ipynb b/notebooks/text-generation/llama2-7b-fine-tuning.ipynb
index c8db71270..08e41a58b 100644
--- a/notebooks/text-generation/llama2-7b-fine-tuning.ipynb
+++ b/notebooks/text-generation/llama2-7b-fine-tuning.ipynb
@@ -7,7 +7,7 @@
 "source": [
 "# Fine-tune Llama on AWS Trainium \n",
 "\n",
- "This tutorial will teach how to fine-tune open LLMs like [Llama 2](https://huggingface.co/meta-llama/Llama-2-70b-hf) on AWS Trainium. In our example, we are going to leverage Hugging Face Optimum Neuron, [Transformers](https://huggingface.co/docs/transformers/index)and datasets. \n",
+ "This tutorial will teach you how to fine-tune open LLMs like [Llama 2](https://huggingface.co/meta-llama/Llama-2-70b-hf) on AWS Trainium. In our example, we are going to leverage Hugging Face Optimum Neuron, [Transformers](https://huggingface.co/docs/transformers/index) and datasets. \n",
 "\n",
 "You will learn how to:\n",
 "\n",
@@ -54,7 +54,7 @@
 "git clone https://github.com/huggingface/optimum-neuron.git\n",
 "```\n",
 "\n",
- "Next we can change our directory to `notbooks/text-generation` and launch the `jupyter` environment.``\n",
+ "Next, we can change our directory to `notebooks/text-generation` and launch the `jupyter` environment.\n",
 "\n",
 "\n",
 "```bash\n",
@@ -363,7 +363,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "_Note: Compiling without a cache can take ~40 minutes. It will also create dummy files in the `dolly_llama_sharded` during compilation you we have to remove them afterwards. We also need to add `MALLOC_ARENA_MAX=64` to limit the CPU allocation to avoid potential crashes, don't remove it for now._ "
+ "_Note: Compiling without a cache can take ~40 minutes. It will also create dummy files in the `dolly_llama` directory during compilation, which we have to remove afterwards. We also need to add `MALLOC_ARENA_MAX=64` to limit the CPU allocation and avoid potential crashes; don't remove it for now._ "
 ]
 },
 {
@@ -414,7 +414,7 @@
 "source": [
 "Thats it, we successfully trained Llama 7B on AWS Trainium. The training took for 3 epochs on dolly (15k samples) took 43:24 minutes where the raw training time was only 31:46 minutes. This leads to a cost of ~$15.5 for the e2e training on the trn1.32xlarge instance. Not Bad! \n",
 "\n",
- "But before we can share and test our model we need to consolidate our model. Since we used Tensor Parallelism during training, we need to consolidate the model weights before we can use it. Tensor Parallelism shards the model weights accross different workers, only sharded checkpoints will be saved during training.\n",
+ "But before we can share and test our model, we need to consolidate it. Since we used Tensor Parallelism during training, we need to consolidate the model weights before we can use them. Tensor Parallelism shards the model weights across different workers; only sharded checkpoints will be saved during training.\n",
 "\n",
 "The Optimum CLI provides a way of doing that very easily via the `optimum neuron consolidate`` command:"
 ]
 },
 {
@@ -481,7 +481,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "_Note: Inference compilation can take ~25minutes. 
Luckily, you need to only run this onces. Since you can save the model afterwards. If you are going to run on Inferentia2 you need to recompile again. The compilation is parameter and hardware specific._"
+ "_Note: Inference compilation can take ~25 minutes. Luckily, you only need to run this once, since you can save the model afterwards. If you are going to run on Inferentia2, you need to recompile. The compilation is parameter and hardware specific._"
 ]
 },
 {
@@ -532,7 +532,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Lets test inference. First we test without a context.\n",
+ "Let's test inference. First we test without a context.\n",
 "\n",
 "_Note: Inference is not expected to be super fast on AWS Trainium using 2 cores. For Inference we recommend using Inferentia2._"
 ]
 },
 {
@@ -562,7 +562,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "That looks correct. Now, lets add some context, e.g. as you would do for RAG applications"
+ "That looks correct. Now, let's add some context, e.g. as you would do for RAG applications"
 ]
 },
 {