Sampling parameters, generalize batch config. #230
Merged
✨ Description
Deal with some structural issues and technical debt causing trouble for ongoing work.
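For context, a minimal sketch of the `SamplingParameter` idea described below — bundling the trainer-provided sampling parameters into one structure instead of an ever-growing `__init__` argument list. Class and field names here are illustrative, not the actual Fast-LLM API:

```python
from dataclasses import dataclass


@dataclass
class SamplingParameter:
    """Hypothetical bundle of data-sampling parameters coming from the trainer."""

    num_samples: int
    sequence_length: int
    seed: int = 0
    use_loss_masking_spans: bool = False


class GPTData:
    # Before: __init__(self, num_samples, sequence_length, seed, ...) grew with
    # every new feature; now a single structured argument carries them all.
    def __init__(self, parameters: SamplingParameter):
        self._parameters = parameters


data = GPTData(SamplingParameter(num_samples=1000, sequence_length=2048))
print(data._parameters.sequence_length)  # -> 2048
```

Adding a new sampling option then means adding one field with a default, rather than threading a new argument through every call site.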
- Add a `SamplingParameter` structure for holding the data sampling parameters that come from the trainer (batch config, model, etc.). This is needed as an alternative to the fast-growing argument list to `GPTData.__init__` and the associated bloat (ex. optionally prevent cross-document attention #177, improvements to MTP implementation #218, DPO #223).
- Generalize `BatchConfig`, extracting the model-specific parameters into `GPTBatchConfig`.
- Rename `num_micro_sequences` -> `micro_batch_splits`, since the generic batch config and schedule runner shouldn't have to know about model-specific sequences.
- Move `use_loss_masking_spans` to the batch config. This will make it easier to know if loss masking is enabled, ex. to prevent it in Knowledge distillation, fix and improve cross-entropy #229. @sohamparikh this may require some config changes. I added backward compatibility, but it will only work if set globally (`data.sampling.use_loss_masking_spans`).
- Use `cached_property` in a few places, following the discussion in Make the specified config parameters update the pretrained config #211. If it works we can use it all over the place to simplify derived fields.

🔍 Type of change
Select all that apply:
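For reference, a minimal sketch of the `cached_property` pattern mentioned above for derived config fields; the class and fields are illustrative, not Fast-LLM's actual `BatchConfig`:

```python
from functools import cached_property


class BatchConfig:
    """Illustrative config with a derived field, not the real Fast-LLM class."""

    def __init__(self, batch_size: int, micro_batch_splits: int):
        self.batch_size = batch_size
        self.micro_batch_splits = micro_batch_splits

    @cached_property
    def micro_batch_size(self) -> int:
        # Derived field: computed lazily on first access, then cached on the
        # instance, so validation code doesn't have to precompute and store it.
        assert self.batch_size % self.micro_batch_splits == 0
        return self.batch_size // self.micro_batch_splits


config = BatchConfig(batch_size=512, micro_batch_splits=4)
print(config.micro_batch_size)  # -> 128
```

Unlike a plain `property`, the division and validation run only once per instance, which is the simplification hoped for across the other derived fields.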