
Commit 6bd51b1

docs: add more docs and updates related to TP
Signed-off-by: Mehant Kammakomati <[email protected]>
1 parent ebc811a commit 6bd51b1

3 files changed: +9 -7 lines changed


docs/source/en/llm_tutorial_optimization.md (+1 -1)

@@ -55,7 +55,7 @@ To give some examples of how much VRAM it roughly takes to load a model in bfloa
 As of writing this document, the largest GPU chip on the market is the A100 & H100, offering 80GB of VRAM. Most of the models listed before require more than 80GB just to be loaded and therefore necessarily require [tensor parallelism](https://huggingface.co/docs/transformers/perf_train_gpu_many#tensor-parallelism) and/or [pipeline parallelism](https://huggingface.co/docs/transformers/perf_train_gpu_many#naive-model-parallelism-vertical-and-pipeline-parallelism).
 
-🤗 Transformers does not support tensor parallelism out of the box as it requires the model architecture to be written in a specific way. If you're interested in writing models in a tensor-parallelism-friendly way, feel free to have a look at [the text-generation-inference library](https://github.com/huggingface/text-generation-inference/tree/main/server/text_generation_server/models/custom_modeling).
+🤗 Transformers now supports tensor parallelism for supported models that have a `base_tp_plan` in their respective config classes. Learn more about tensor parallelism [here](perf_train_gpu_many#tensor-parallelism). Furthermore, if you're interested in writing models in a tensor-parallelism-friendly way, feel free to have a look at [the text-generation-inference library](https://github.com/huggingface/text-generation-inference/tree/main/server/text_generation_server/models/custom_modeling).
 
 Naive pipeline parallelism is supported out of the box. For this, simply load the model with `device_map="auto"`, which will automatically place the different layers on the available GPUs as explained [here](https://huggingface.co/docs/accelerate/v0.22.0/en/concept_guides/big_model_inference).
 Note, however, that while very effective, this naive pipeline parallelism does not tackle the issues of GPU idling. For this, more advanced pipeline parallelism is required as explained [here](https://huggingface.co/docs/transformers/en/perf_train_gpu_many#naive-model-parallelism-vertical-and-pipeline-parallelism).
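
To make the updated guidance above concrete, here is a minimal sketch of the two multi-GPU loading paths it mentions. The checkpoint name, the `tp_plan="auto"` keyword, and the `torchrun` launch command are illustrative assumptions rather than something specified in this commit; the tensor-parallel path only applies to models whose config class defines a `base_tp_plan`.

```python
# Minimal sketch (not part of this commit) of the two multi-GPU loading paths
# described above. The checkpoint name is a hypothetical example, and
# `tp_plan="auto"` is assumed to be the tensor-parallel entry point; it only
# works for models whose config class defines a `base_tp_plan`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B"  # hypothetical example checkpoint

# Option 1: naive pipeline parallelism -- whole layers are spread over the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
)

# Option 2: tensor parallelism -- each layer's weight matrices are sharded across GPUs.
# Launch with something like: torchrun --nproc-per-node 4 run_tp.py
# model = AutoModelForCausalLM.from_pretrained(
#     model_id, tp_plan="auto", torch_dtype=torch.bfloat16
# )

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```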

docs/source/en/perf_train_gpu_many.md (+4 -4)

@@ -450,13 +450,13 @@ Implementations:
 - [parallelformers](https://github.com/tunib-ai/parallelformers) (only inference at the moment)
 - [SageMaker](https://arxiv.org/abs/2111.05972) - this is a proprietary solution that can only be used on AWS.
 - [OSLO](https://github.com/tunib-ai/oslo) has a tensor parallelism implementation based on the Transformers library.
-- [`transformers` integration](main_classes/trainer)
+- [`transformers` integration](main_classes/trainer) - tensor parallelism is available through the `tp_size` attribute for models that have a `base_tp_plan`. See the [example usage](perf_infer_gpu_multi).
 
 SageMaker combines TP with DP for more efficient processing.
 
 🤗 Transformers status:
-- core: not yet implemented in the core
-- but if you want inference [parallelformers](https://github.com/tunib-ai/parallelformers) provides this support for most of our models. So until this is implemented in the core you can use theirs. And hopefully training mode will be supported too.
+- core: uses PyTorch 2 APIs to support tensor parallelism for models that have a `base_tp_plan` in their respective config classes.
+- Alternatively, you can also try [parallelformers](https://github.com/tunib-ai/parallelformers), which provides this support for most of our models. Training mode with TP is also supported natively in Transformers.
 - Deepspeed-Inference also supports our BERT, GPT-2, and GPT-Neo models in their super-fast CUDA-kernel-based inference mode; see more [here](https://www.deepspeed.ai/tutorials/inference-tutorial/)
 
 🤗 Accelerate integrates with [TP from Megatron-LM](https://huggingface.co/docs/accelerate/v0.23.0/en/usage_guides/megatron_lm).

@@ -536,7 +536,7 @@ Important papers:
 - [Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model](
 https://arxiv.org/abs/2201.11990)
 
-🤗 Transformers status: not yet implemented, since we have no PP and TP.
+🤗 Transformers status: not yet implemented, since we have no PP.
 
 ## FlexFlow
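
As a rough illustration of the `tp_size` integration referenced above, the following sketch wires it into the Trainer. The checkpoint, the toy dataset, and the `torchrun` launch command are assumptions for illustration only; it presumes a model with a `base_tp_plan` in its config class and accelerate > 1.3.0.

```python
# Minimal sketch (assumptions: a `base_tp_plan`-enabled model, accelerate>1.3.0,
# and a multi-GPU launch such as `torchrun --nproc-per-node 2 train_tp.py`).
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "meta-llama/Llama-3.1-8B"  # hypothetical example checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Tiny toy dataset, just enough for the Trainer to run a few steps.
texts = ["Tensor parallelism shards each weight matrix across GPUs."] * 16
train_dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="tp-demo",
    per_device_train_batch_size=1,
    max_steps=10,
    tp_size=2,  # >1 activates tensor parallelism and sizes the device mesh
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```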

src/transformers/training_args.py (+4 -2)

@@ -570,8 +570,9 @@ class TrainingArguments:
             used when the xla flag is set to true, and an auto wrapping policy is specified through
             fsdp_min_num_params or fsdp_transformer_layer_cls_to_wrap.
         tp_size (`int`, *optional*):
-            Use tp_size to enable PyTorch tensor parallelism. Set a value greater than 1 to activate TP. The same is
-            used to prepare device mesh internally. Requires accelerate>1.3.0.
+            Use tp_size to enable PyTorch tensor parallelism. Tensor parallelism support is only available to models having `base_tp_plan`
+            in their respective config classes.
+            Set a value greater than 1 to activate TP. The same is used to prepare device mesh internally. Requires accelerate>1.3.0.
         deepspeed (`str` or `dict`, *optional*):
             Use [Deepspeed](https://github.com/deepspeedai/DeepSpeed). This is an experimental feature and its API may
             evolve in the future. The value is either the location of DeepSpeed json config file (e.g.,

@@ -1257,6 +1258,7 @@ class TrainingArguments:
         metadata={
             "help": (
                 "Use tp_size to enable pytorch tensor parallelism."
+                "Tensor parallelism support is only available to models having `base_tp_plan` in their respective config classes."
                 "Set a value greater than 1 to activate TP."
                 "The same is used to prepare device mesh internally."
                 "Requires accelerate>1.3.0."

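The docstring's note that the same value "is used to prepare device mesh internally" loosely corresponds to the PyTorch 2 device-mesh API. The sketch below only illustrates that mapping under assumed mesh dimension names ("dp", "tp"); it is not the code path added by this commit.

```python
# Illustrative sketch only (not the commit's code path): how a tp_size value
# can be turned into a PyTorch 2 device mesh. Run under torchrun so the
# distributed environment variables are set, e.g.:
#   torchrun --nproc-per-node 4 mesh_demo.py
import os

import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

tp_size = 2  # same role as TrainingArguments.tp_size (assumed mapping)
world_size = int(os.environ.get("WORLD_SIZE", "1"))
dp_size = world_size // tp_size  # remaining ranks act as data-parallel replicas

# 2D mesh: one axis for data parallelism, one for tensor parallelism.
mesh = init_device_mesh("cuda", (dp_size, tp_size), mesh_dim_names=("dp", "tp"))
tp_mesh = mesh["tp"]  # sub-mesh over which a model's `base_tp_plan` would shard weights

if dist.get_rank() == 0:
    print(f"world_size={world_size} dp={dp_size} tp={tp_size} tp_mesh={tp_mesh}")
```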