This project demonstrates how to fine-tune a pre-trained LLaMA model with PEFT (Parameter-Efficient Fine-Tuning) and then generate text responses with it. The model is fine-tuned on a medical terms dataset and is then prompted with medical questions from the user.
To get started, clone this repository and install the required dependencies:
git clone https://github.com/yourusername/llama-finetuning.git
cd llama-finetuning
pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7
pip install huggingface_hub
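Because the install command pins exact versions, it can be useful to confirm what actually ended up in the environment. The following is a minimal sketch using only the standard library; the names `PINNED` and `find_mismatches` are illustrative and not part of this repository.

```python
from importlib import metadata

# Versions pinned by the install command above.
PINNED = {
    "accelerate": "0.21.0",
    "peft": "0.4.0",
    "bitsandbytes": "0.40.2",
    "transformers": "4.31.0",
    "trl": "0.4.7",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for packages that differ.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

If the script prints nothing, every pinned package is installed at the expected version.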
- Import Necessary Modules:
import torch
from trl import SFTTrainer
from peft import LoraConfig
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    pipeline,
)
- Load the Pre-trained Model:
llama_model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
    # Load the weights in 4-bit NF4 and run computation in float16.
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
    ),
)
llama_model.config.use_cache = False  # disable the KV cache during training
llama_model.config.pretraining_tp = 1
- Load the Tokenizer:
llama_tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path="aboonaji/llama2finetune-v2",
    trust_remote_code=True,
)
# Reuse the end-of-sequence token for padding and pad on the right.
llama_tokenizer.pad_token = llama_tokenizer.eos_token
llama_tokenizer.padding_side = "right"
- Set Up Training Arguments:
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,
    max_steps=100,
)
- Load the Dataset:
train_dataset = load_dataset(
    path="aboonaji/wiki_medical_terms_llam2_format",
    split="train",
)
- Define PEFT Configuration:
peft_config = LoraConfig(
    task_type="CAUSAL_LM",
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
)
- Initialize and Train with SFTTrainer:
llama_sft_trainer = SFTTrainer(
    model=llama_model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=llama_tokenizer,
    peft_config=peft_config,
    dataset_text_field="text",
)
llama_sft_trainer.train()
- Define the User Prompt:
user_prompt = "Tell me about scoliosis"
- Initialize Text Generation Pipeline:
text_generation_pipeline = pipeline(
    task="text-generation",
    model=llama_model,
    tokenizer=llama_tokenizer,
    max_length=300,
)
- Generate and Print the Model's Answer:
# Wrap the prompt in the [INST] ... [/INST] instruction markers.
model_answer = text_generation_pipeline(f"<s> [INST] {user_prompt} [/INST]")
print(model_answer[0]['generated_text'])
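The generation step wraps the user prompt in `[INST] ... [/INST]` markers, and the text-generation pipeline normally returns the prompt together with the completion. A minimal sketch of two helpers that build the prompt and strip it from the output is shown here; `build_prompt` and `extract_answer` are hypothetical names, not part of this repository, and the sketch assumes the closing `[/INST]` marker survives in the generated text.

```python
INST_OPEN = "<s> [INST] "
INST_CLOSE = " [/INST]"


def build_prompt(user_prompt):
    """Wrap a user prompt in the instruction markers used by the steps above."""
    return f"{INST_OPEN}{user_prompt.strip()}{INST_CLOSE}"


def extract_answer(generated_text):
    """Return the text after the first [/INST] marker.

    If the marker is missing, the whole text is returned unchanged
    apart from surrounding whitespace.
    """
    return generated_text.split("[/INST]", 1)[-1].strip()


if __name__ == "__main__":
    print(build_prompt("Tell me about scoliosis"))
```

With the pipeline from the steps above, this would be used as `extract_answer(model_answer[0]['generated_text'])`.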
- Model: aboonaji/llama2finetune-v2
- Tokenizer: aboonaji/llama2finetune-v2
- Dataset: aboonaji/wiki_medical_terms_llam2_format
- Training Arguments:
  - Output Directory: ./results
  - Batch Size: 4
  - Max Steps: 100
- PEFT Configuration:
  - Task Type: CAUSAL_LM
  - r: 64
  - LoRA Alpha: 16
  - LoRA Dropout: 0.1
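To make the LoRA settings listed above more concrete, the sketch below works out two quantities that follow from them: the scaling factor applied to the adapter update (`lora_alpha / r` in standard LoRA) and the number of trainable parameters one adapter adds to a single linear layer. The hidden size of 4096 is an assumption used for illustration; the source does not state the model's dimensions.

```python
def lora_scaling(r, lora_alpha):
    """Scaling applied to the low-rank update in standard LoRA."""
    return lora_alpha / r


def lora_param_count(d_in, d_out, r):
    """Trainable parameters for one adapted linear layer.

    The adapter consists of two matrices: A (r x d_in) and B (d_out x r).
    """
    return r * (d_in + d_out)


R, LORA_ALPHA = 64, 16  # values from the PEFT configuration
HIDDEN = 4096           # ASSUMPTION: hidden size, not stated in this README

scaling = lora_scaling(R, LORA_ALPHA)
adapter_params = lora_param_count(HIDDEN, HIDDEN, R)
full_params = HIDDEN * HIDDEN

print(f"scaling factor: {scaling}")
print(f"adapter parameters per layer: {adapter_params}")
print(f"fraction of the full weight matrix: {adapter_params / full_params}")
```

Under these assumptions each adapted square layer trains only a small fraction of the parameters in the frozen weight matrix, which is what makes the approach parameter-efficient.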
The dataset used for fine-tuning is a collection of medical terms in the LLaMA2 format. It is available on the Hugging Face Hub as aboonaji/wiki_medical_terms_llam2_format. Training reads the text field of each record, from which the model learns to generate medical information.
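The exact layout of each record is not described here. Assuming that "LLaMA2 format" means an instruction wrapped in `[INST] ... [/INST]` followed by the answer, a record could be checked with a sketch like the following. The helper `split_record` and the sample string are illustrative and are not taken from the dataset.

```python
import re

# ASSUMED record shape: "<s>[INST] question [/INST] answer </s>"
# (the <s> and </s> markers are treated as optional).
_RECORD = re.compile(
    r"\[INST\]\s*(?P<prompt>.+?)\s*\[/INST\]\s*(?P<answer>.+)",
    re.DOTALL,
)


def split_record(text):
    """Return (prompt, answer) if text matches the assumed template, else None."""
    match = _RECORD.search(text)
    if match is None:
        return None
    answer = match.group("answer").replace("</s>", "").strip()
    if not answer:
        return None
    return match.group("prompt"), answer


# Synthetic example, written for this sketch only.
sample = "<s>[INST] What is scoliosis? [/INST] Scoliosis is a sideways curvature of the spine. </s>"
print(split_record(sample))
```

Running such a check over a few records before training is a cheap way to catch a mismatch between the data and the prompt format used at inference time.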
The LLaMA model used in this project is a causal language model fine-tuned with PEFT, using LoRA adapters. It is loaded with 4-bit NF4 quantization and float16 compute, which lowers the memory needed for training and inference on limited hardware.
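A rough back-of-the-envelope estimate shows why 4-bit loading matters. The sketch below computes the memory taken by the weights alone at a given precision. The parameter count of 7 billion is an assumption for illustration, since the model size is not stated here, and the estimate ignores quantization constants, activations, and optimizer state.

```python
def weight_bytes(num_params, bits_per_param):
    """Approximate bytes needed to store the weights alone."""
    return num_params * bits_per_param // 8


ASSUMED_PARAMS = 7_000_000_000  # ASSUMPTION: model size is not given in this README

for label, bits in (("float16", 16), ("4-bit", 4)):
    gib = weight_bytes(ASSUMED_PARAMS, bits) / 2**30
    print(f"{label}: about {gib:.1f} GiB for weights")
```

Whatever the true parameter count, the 4-bit weights take one quarter of the space of the float16 weights.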
- Memory Issues: Ensure your machine has sufficient memory to handle the dataset and model. Consider using cloud-based solutions if local resources are insufficient.
- Dependency Conflicts: Make sure all dependencies are installed with the specified versions to avoid conflicts.
- Training Problems: Double-check the dataset path and ensure the training arguments are set correctly.
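Some of the checks above can be automated before a training run starts. The sketch below is one possible pre-flight check for the dataset column and the training arguments; `preflight_problems` is a hypothetical helper, and the column list passed to it stands in for `train_dataset.column_names`.

```python
def preflight_problems(column_names, text_field, batch_size, max_steps):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if text_field not in column_names:
        problems.append(
            f"dataset has no '{text_field}' column (found: {sorted(column_names)})"
        )
    if batch_size < 1:
        problems.append("per_device_train_batch_size must be at least 1")
    # This project caps training by step count, so a positive value is expected.
    if max_steps < 1:
        problems.append("max_steps must be positive to cap training by steps")
    return problems


# Values used in this project; ["text"] is a stand-in for the real column list.
print(preflight_problems(["text"], "text", batch_size=4, max_steps=100))
```

An empty list means the basic settings are consistent; it does not guarantee that the run will fit in memory.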
- Hugging Face for providing the transformer models and datasets.
- The PEFT library for parameter-efficient fine-tuning.
- The contributors and maintainers of the LLaMA project.