1 parent a49c6db commit f8035dd
torchft/local_sgd.py
@@ -561,8 +561,8 @@ def _allreduce_bucketized(self) -> None:
 
 class DiLoCo:
     """
-    DiLoCo is a subclass of LocalSGD that overrides the synchronization
-    mechanism to average and synchronize the pseudogradients (delta of the previous global weight and current local weights).
+    DiLoCo implements distributed optimization by averaging and synchronizing
+    pseudogradients (delta of the previous global weight and current local weights).
 
     The class implements a more general version of DiLoco, Streaming DiLoCo,
     which synchronizes fragments of pseudogradients at different steps.
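
For readers unfamiliar with the term, the docstring's "pseudogradient" can be illustrated with a minimal, framework-free sketch. This is not the torchft implementation: the helper names (`pseudogradient`, `average`, `outer_step`), the sign convention (previous global minus current local), and the plain SGD outer update with `outer_lr` are all assumptions made for illustration only.

```python
from typing import List

Vector = List[float]


def pseudogradient(global_weights: Vector, local_weights: Vector) -> Vector:
    # Assumed sign convention: previous global weight minus current local weight.
    return [g - l for g, l in zip(global_weights, local_weights)]


def average(vectors: List[Vector]) -> Vector:
    # Element-wise mean across replicas (stands in for an allreduce average).
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]


def outer_step(
    global_weights: Vector, replica_weights: List[Vector], outer_lr: float = 1.0
) -> Vector:
    # Hypothetical outer update: plain SGD applied to the averaged pseudogradient.
    deltas = [pseudogradient(global_weights, w) for w in replica_weights]
    avg_delta = average(deltas)
    return [g - outer_lr * d for g, d in zip(global_weights, avg_delta)]


prev_global = [1.0, 2.0]
replicas = [[0.5, 2.5], [1.5, 1.0]]  # local weights after the inner steps
new_global = outer_step(prev_global, replicas)
print(new_global)
```

Under these assumptions, with `outer_lr=1.0` the update reduces to averaging the replicas' local weights; a different outer optimizer or learning rate would change that.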