Can MPS use FP16 when training? Why can't I? #32648
Comments
Keeping the other issue closed and commenting over here: #32035 (comment) TL;DR: it's in the torch nightlies; PyTorch only merged support last week. Once it's in a stable release we'll enable it.
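For anyone who wants to verify, here is a minimal sketch to check fp16 autocast on MPS — assuming a recent torch nightly where the support mentioned above has been merged (torch.autocast accepts device_type="mps" on those builds):

import torch

# Sketch: confirm fp16 autocast works on this torch build (assumes a
# nightly with MPS autocast support; older stable releases error here).
assert torch.backends.mps.is_available(), "MPS backend not available"
with torch.autocast(device_type="mps", dtype=torch.float16):
    x = torch.randn(4, 4, device="mps")
    y = x @ x
print(y.dtype)  # torch.float16 when autocast is active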
OK! Thanks a lot!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Please look into this issue, I am getting the same problem.
@muellerzr I asked PyTorch and they told me to ask Hugging Face, since PyTorch has already added this feature. So what is the status now?
System Info
Device: Apple M3 Pro
OS: macOS Sonoma 14.1
packages:
datasets 2.20.1.dev0
evaluate 0.4.2
huggingface-hub 0.23.5
tokenizers 0.19.1
torch 2.5.0.dev20240717
torchaudio 2.4.0.dev20240717
torchvision 0.20.0.dev20240717
Who can help?
@ArthurZucker @muellerzr
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
import os
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    GenerationConfig,
    DataCollatorForSeq2Seq
)
from datasets import Dataset, load_dataset
from peft import LoraConfig, TaskType, get_peft_model, PeftModel, PeftConfig
import torch
ds_name = input('Enter the name of the dataset to train on (csv file, without the extension): ')
model_name = input('Enter the name of the model to train (subfolder name): ')
save_name = input('Enter the name to save the LoRA under: ')
current_dir = os.getcwd()
save_dir = os.path.join(current_dir, 'model_saved', save_name)
os.makedirs(save_dir, exist_ok=True)
target_file_path = os.path.join(current_dir, 'datasets', ds_name + '.csv')
model_dir = os.path.join(current_dir, 'model', model_name)
dataset = load_dataset("csv", data_files=target_file_path, split="train")
tokenizer = AutoTokenizer.from_pretrained(model_dir)
tokenizer.padding_side = "right"
tokenizer.pad_token_id = 2
def process_func(example):
    MAX_LENGTH = 384
    instruction = example.get("instruction", "")
    input_text = example.get("input", "")
    prompt = f"Human: {instruction}\n{input_text}".strip() if input_text else f"Human: {instruction}".strip()
    instruction_tokenized = tokenizer(prompt + "\n\nAssistant: ", add_special_tokens=False)
    response_tokenized = tokenizer(example["output"], add_special_tokens=False)
    # Mask the prompt tokens with -100 so only the response contributes to the loss
    input_ids = instruction_tokenized["input_ids"] + response_tokenized["input_ids"] + [tokenizer.eos_token_id]
    attention_mask = instruction_tokenized["attention_mask"] + response_tokenized["attention_mask"] + [1]
    labels = [-100] * len(instruction_tokenized["input_ids"]) + response_tokenized["input_ids"] + [tokenizer.eos_token_id]
    # Truncate to the maximum sequence length
    if len(input_ids) > MAX_LENGTH:
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }
tokenized_dataset = dataset.map(process_func, remove_columns=dataset.column_names)
print(tokenized_dataset)
device = torch.device("mps")
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.half
)
model = model.to(device)
config = LoraConfig(task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, config)
model.print_trainable_parameters()
model = model.half()  # must call .half(); assigning the bound method (model.half) passes a function, not a model, to the Trainer
args = TrainingArguments(
    output_dir=save_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    logging_steps=10,
    num_train_epochs=2,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
trainer.train()
Expected behavior
Please make transformers stop raising the error below. Thanks, everyone!
ValueError Traceback (most recent call last)
Cell In[16], line 1
----> 1 trainer = Trainer(
2 model=model,
3 args=args,
4 train_dataset=tokenized_dataset,
5 data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
6 )
File ~/Data/AIHub/Trans-Penv/transformers/src/transformers/trainer.py:409, in Trainer.__init__(self, model, args, data_collator, train_dataset, eval_dataset, tokenizer, model_init, compute_metrics, callbacks, optimizers, preprocess_logits_for_metrics)
406 self.deepspeed = None
407 self.is_in_train = False
--> 409 self.create_accelerator_and_postprocess()
411 # memory metrics - must set up as early as possible
412 self._memory_tracker = TrainerMemoryTracker(self.args.skip_memory_metrics)
File ~/Data/AIHub/Trans-Penv/transformers/src/transformers/trainer.py:4648, in Trainer.create_accelerator_and_postprocess(self)
4645 args.update(accelerator_config)
4647 # create accelerator object
-> 4648 self.accelerator = Accelerator(**args)
   4649 # some Trainer classes need to use `gather` instead of `gather_for_metrics`, thus we store a flag
   4650 self.gather_function = self.accelerator.gather_for_metrics
File /opt/anaconda3/envs/tfs/lib/python3.12/site-packages/accelerate/accelerator.py:467, in Accelerator.__init__(self, device_placement, split_batches, mixed_precision, gradient_accumulation_steps, cpu, dataloader_config, deepspeed_plugin, fsdp_plugin, megatron_lm_plugin, rng_types, log_with, project_dir, project_config, gradient_accumulation_plugin, dispatch_batches, even_batches, use_seedable_sampler, step_scheduler_with_optimizer, kwargs_handlers, dynamo_backend)
...
--> 467 raise ValueError(f"fp16 mixed precision requires a GPU (not {self.device.type!r}).")
468 kwargs = self.scaler_handler.to_kwargs() if self.scaler_handler is not None else {}
469 if self.distributed_type == DistributedType.FSDP:
ValueError: fp16 mixed precision requires a GPU (not 'mps').
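Until that Accelerate support lands in a stable release (per the maintainer comment above), one hedged workaround sketch is to train without fp16 mixed precision on MPS — load the base model in full precision and leave fp16 out of TrainingArguments. This is an assumption-based sketch, not a confirmed fix; variable names reuse the reproduction above:

# Workaround sketch (assumption, not a confirmed fix): skip fp16 mixed
# precision so Accelerator's device check never triggers on 'mps'.
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float32,  # full precision avoids half/mixed-precision issues on MPS
).to(torch.device("mps"))
model = get_peft_model(model, LoraConfig(task_type=TaskType.CAUSAL_LM))
args = TrainingArguments(
    output_dir=save_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    logging_steps=10,
    num_train_epochs=2,
    # do not pass fp16=True here: Accelerate rejects fp16 mixed precision on 'mps'
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
trainer.train()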