Description
🚀 Feature
Support inputs with dynamic shapes.
Motivation
During the training of Large Language Models (LLMs), the sequence lengths of input data are typically variable, so inputs must be padded before training. Common padding strategies include "longest", which pads to the length of the longest sequence within a batch, and "max_length", which pads to a predetermined maximum length. Padding to max_length introduces substantial inefficiency, because the padding positions take part in the computation and degrade overall performance.

Padding to the longest sequence, although it greatly reduces redundant computation, makes the input shapes dynamic: the sequence_length dimension of a two-dimensional [batch_size, sequence_length] input varies from batch to batch. Each new shape triggers another XLA compilation, which slows down the overall training process.

While PyTorch/XLA has incorporated some dynamic operations, such as masked_select and nonzero, it does not yet accommodate inputs with dynamic shapes. Consequently, enabling support for dynamic-shape inputs is essential.
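For reference, the two padding strategies above correspond to standard tokenizer options. This is a minimal illustration using the Hugging Face transformers API; the checkpoint and example texts are arbitrary choices for the sketch, not part of this proposal:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 defines no pad token by default

texts = ["a short sequence", "a noticeably longer input sequence for padding"]

# "longest": pad only to the longest sequence in this batch,
# so the resulting shape varies from batch to batch.
batch_longest = tokenizer(texts, padding="longest", return_tensors="pt")

# "max_length": pad every batch to a fixed length, so the shape is static,
# but the padding positions still take part in the computation.
batch_fixed = tokenizer(texts, padding="max_length", max_length=128,
                        return_tensors="pt")

print(batch_longest["input_ids"].shape)  # [2, longest]; depends on tokenization
print(batch_fixed["input_ids"].shape)    # torch.Size([2, 128])
```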
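The recompilation behavior itself can be sketched as follows. This is a minimal, self-contained example assuming torch_xla is installed; the mean reduction is a stand-in for a real model forward pass, and the lengths are made up for illustration:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# Three batches whose raw sequence lengths differ. Padding each batch to its
# own longest sequence ("longest" padding) yields three distinct input shapes.
raw_lengths = [(5, 9, 7), (12, 3, 12), (21, 18, 6)]

for lengths in raw_lengths:
    longest = max(lengths)
    # Build a padded [batch_size, longest] batch; 0 stands in for the pad id.
    batch = torch.zeros(len(lengths), longest, dtype=torch.long)
    for i, n in enumerate(lengths):
        batch[i, :n] = torch.randint(1, 32000, (n,))
    batch = batch.to(device)

    out = batch.float().mean(dim=-1)  # stand-in for a model forward pass

    # Cut the lazily traced graph here. Because each batch has a previously
    # unseen [batch_size, sequence_length] shape, XLA compiles a fresh graph
    # on every iteration instead of reusing a cached one.
    xm.mark_step()
```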